Introducing Context-Store
Why we built infrastructure for LLM context management, and what problems it solves.
Every team building LLM apps hits the same wall: users expect full message history, but LLMs have hard context limits.
The Problem
- Users often need to review and trust the full conversation history.
- LLMs need compaction to meet latency and token budgets.
Those two requirements pull in opposite directions. So we built context-store: it keeps model calls fast and within budget while preserving the full history for your users. It's simple, predictable, and works well for AI assistants.
The Pattern Everyone Rebuilds
We kept seeing teams build the same stack:
- Redis for hot message storage
- Postgres for persistence
- Pub/sub for real-time updates
- Custom compaction logic to manage the window (see the sketch below)
Every implementation is slightly different. Every team makes similar mistakes. The infrastructure becomes a distraction from the actual product.
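To make that concrete, here's a minimal sketch of the compaction piece from that list, in the spirit of what teams hand-roll. Everything in it is illustrative: the message shape mirrors the API example later in this post, and countTokens and trimToBudget are hypothetical stand-ins, not part of any library.

// Illustrative only: trim history to the newest messages that fit a token budget.
type Message = {
  role: 'user' | 'assistant' | 'system';
  parts: { type: 'text'; text: string }[];
};

function countTokens(msg: Message): number {
  // Rough heuristic: ~4 characters per token. Swap in a real tokenizer.
  const chars = msg.parts.reduce((n, p) => n + p.text.length, 0);
  return Math.ceil(chars / 4);
}

function trimToBudget(history: Message[], budget: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  // Walk backwards from the newest message until the budget is spent.
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = countTokens(history[i]);
    if (used + cost > budget) break;
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}

Note what this doesn't do: store the full history, fan out updates, or survive a restart. Those are the Redis, pub/sub, and Postgres pieces of the stack above, and they're where the real maintenance burden lives.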
What Context-Store Does
Context-store turns that pattern into a single service. Optionally, a write-behind to Postgres backs everything up, giving you cold storage and a queryable copy for analytics.
You set a token budget. You pick a compaction policy, or write your own (sketched after the example below). The service handles:
- Storing full message history
- Enforcing token budgets
- Compacting context automatically
- Horizontal scaling: add more nodes to handle more data and traffic
- Optional cold storage in Postgres
// Create (or open) a context with a 1M-token budget, a compaction trigger
// of 0.7, and a last_n policy that keeps the 400 most recent messages.
const ctx = await fastpaca.context('chat_42', {
  budget: 1_000_000,
  trigger: 0.7,
  policy: { strategy: 'last_n', config: { limit: 400 } }
});

// Appends go into the full history; compaction happens inside the service.
await ctx.append({ role: 'user', parts: [{ type: 'text', text: 'Hi' }] });

// context() returns the compacted window to send to your model.
const { messages } = await ctx.context();
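And if last_n doesn't fit your app, you can supply your own policy. The plugin contract isn't covered in this post, so the shape below is hypothetical rather than the real API; check the docs for the actual interface.

// Hypothetical custom policy: assumes a policy's config can carry a
// compact function over the message window. The real contract may differ.
const ctx2 = await fastpaca.context('support_7', {
  budget: 1_000_000,
  trigger: 0.7,
  policy: {
    strategy: 'custom',
    config: {
      // Illustrative rule: drop system messages, keep everything else.
      compact: (messages) => messages.filter((m) => m.role !== 'system')
    }
  }
});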
Example: Next.js Chat
We took this architecture and built a minimal chat app in Next.js that shows the pattern end-to-end: append messages, keep full history, and compact deterministically before calling your model.
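The heart of that app is a single route handler. Here's a minimal sketch, assuming the same fastpaca client as above; the import paths and the callModel helper are illustrative stand-ins for however you wire up your client and your LLM call.

// app/api/chat/route.ts -- minimal sketch; import paths are illustrative.
import { NextResponse } from 'next/server';
import { fastpaca } from '@/lib/fastpaca'; // hypothetical client setup
import { callModel } from '@/lib/model';   // hypothetical LLM wrapper

export async function POST(req: Request) {
  const { conversationId, text } = await req.json();
  const ctx = await fastpaca.context(conversationId, {
    budget: 1_000_000,
    trigger: 0.7,
    policy: { strategy: 'last_n', config: { limit: 400 } }
  });

  // Append to the full history; the service compacts behind the scenes.
  await ctx.append({ role: 'user', parts: [{ type: 'text', text }] });

  // Read back the compacted, budget-respecting window and call the model.
  const { messages } = await ctx.context();
  const reply = await callModel(messages);

  await ctx.append({ role: 'assistant', parts: [{ type: 'text', text: reply }] });
  return NextResponse.json({ reply });
}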

Why Elixir
We built this in Elixir because:
- Raft consensus for distributed state is table stakes
- Process supervision makes failure handling clean
- Hot code upgrades enable zero-downtime deployments
- Memory efficiency holds up at scale
In short, Elixir's concurrency model and reliability characteristics fit this problem well.
What's Next
Context-store is production-ready and Apache 2.0 licensed.
The research continues:
- Compression techniques in token space
- Semantic compaction (embedding-based strategies)
- Multi-modal context handling
If you're building LLM apps and this problem sounds familiar, check out the docs or browse the code.
_This is part of the broader fastpaca research into hard problems at the intersection of humans and technology. More posts coming!_