Exploring how complex systems can work better.

Practical engineering and thoughtful research on the problems that sit between people and technology.

Research · Open Source · Fractional & Advisory

Currently exploring LLM infrastructure: memory systems, agent benchmarking, and the gap between what vendors promise and what actually works in production.

Writing

All posts →

Dec 05, 2025

Design Your LLM Memory Around How It Fails

Not all context is sacred. Design your agent's memory around what happens when critical information gets dropped.

Nov 21, 2025

Universal LLM Memory Does Not Exist

I benchmarked Mem0 and Zep on MemBench to understand why production agents were failing. The memory systems cost 14-77x more and were 31-33% less accurate than a naive long-context baseline.

Nov 07, 2025

LLM Memory Systems Explained

An introductory guide to how LLMs handle 'memory', from context windows to retrieval systems and everything in between.

Oct 31, 2025

Introducing Context-Store

Why we built infrastructure for LLM context management, and what problems it solves.

Open Source

GitHub →

pacabench

Benchmarking agents shouldn't mean wrestling with brittle scripts and lost progress.

Local-first, reproducible benchmarks with isolated execution and persistent state. No SDK lock-in.

context-store

Users expect full message history, but LLMs have hard limits.

A reliable Elixir service for context management. Raft consensus, horizontal scaling, deterministic compaction.