Exploring how complex systems can work better.
Practical engineering and thoughtful research on the problems that sit between people and technology.
Research · Open Source · Fractional & Advisory
Currently exploring: LLM infrastructure, including memory systems, agent benchmarking, and the gap between what vendors promise and what actually works in production.
Writing
All posts →
Design Your LLM Memory Around How It Fails
Not all context is sacred. Design your agent's memory around what happens when critical information gets dropped.
Universal LLM Memory Does Not Exist
I benchmarked Mem0 and Zep on MemBench to understand why production agents were failing. Memory systems cost 14-77x more and were 31-33% less accurate than a naive long-context baseline.
LLM Memory Systems Explained
An introductory guide to how LLMs handle 'memory', from context windows to retrieval systems and everything in between.
Introducing Context-Store
Why we built infrastructure for LLM context management, and what problems it solves.
Open Source
GitHub →
pacabench
Benchmarking agents shouldn't mean wrestling with brittle scripts and lost progress.
Local-first, reproducible benchmarks with isolated execution and persistent state. No SDK lock-in.
context-store
Users expect full message history, but LLMs have hard limits.
A reliable Elixir service for context management. Raft consensus, horizontal scaling, deterministic compaction.