Introducing Context-Store
Why we built infrastructure for LLM context management, and what problems it solves.
Every team building LLM apps hits the same wall: users expect full message history, but LLMs have hard context limits.
The Problem
- Users often need to review and trust the full conversation history.
- LLMs need compaction to meet latency and token budgets.
Those two requirements pull in opposite directions. So we built context-store: it keeps model calls fast and within budget while preserving the full history for your users. It's simple, predictable, and works well for AI assistants.
The Pattern Everyone Rebuilds
We kept seeing teams build the same stack:
- Redis for hot message storage
- Postgres for persistence
- Pub/sub for real-time updates
- Custom compaction logic to manage the window (see the sketch below)
Every implementation is slightly different. Every team makes similar mistakes. The infrastructure becomes a distraction from the actual product.
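To make that concrete, here's a minimal sketch of the compaction piece from that list, in the spirit of what teams hand-roll. Everything in it is illustrative: the message shape mirrors the API example later in this post, and countTokens and trimToBudget are hypothetical stand-ins, not part of any library.

// Illustrative only: trim history to the newest messages that fit a token budget.
type Message = {
  role: 'user' | 'assistant' | 'system';
  parts: { type: 'text'; text: string }[];
};

function countTokens(msg: Message): number {
  // Rough heuristic: ~4 characters per token. Swap in a real tokenizer.
  const chars = msg.parts.reduce((n, p) => n + p.text.length, 0);
  return Math.ceil(chars / 4);
}

function trimToBudget(history: Message[], budget: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  // Walk backwards from the newest message until the budget is spent.
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = countTokens(history[i]);
    if (used + cost > budget) break;
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}

Note what this doesn't do: store the full history, fan out updates, or survive a restart. Those are the Redis, pub/sub, and Postgres pieces of the stack above, and they're where the real maintenance burden lives.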
What Context-Store Does
Context-store turns that pattern into a single service. Optionally, a write-behind to Postgres backs everything up, giving you cold storage and a queryable copy for analytics.
You set a token budget. You pick a compaction policy, or write your own (sketched after the example below). The service handles:
- Storing full message history
- Enforcing token budgets
- Compacting context automatically
- Horizontal scaling: add more nodes to handle more data and traffic
- Optional cold storage in Postgres
// Create (or open) a context with a 1M-token budget, a compaction trigger
// of 0.7, and a last_n policy that keeps the 400 most recent messages.
const ctx = await fastpaca.context('chat_42', {
  budget: 1_000_000,
  trigger: 0.7,
  policy: { strategy: 'last_n', config: { limit: 400 } }
});

// Appends go into the full history; compaction happens inside the service.
await ctx.append({ role: 'user', parts: [{ type: 'text', text: 'Hi' }] });

// context() returns the compacted window to send to your model.
const { messages } = await ctx.context();
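And if last_n doesn't fit your app, you can supply your own policy. The plugin contract isn't covered in this post, so the shape below is hypothetical rather than the real API; check the docs for the actual interface.

// Hypothetical custom policy: assumes a policy's config can carry a
// compact function over the message window. The real contract may differ.
const ctx2 = await fastpaca.context('support_7', {
  budget: 1_000_000,
  trigger: 0.7,
  policy: {
    strategy: 'custom',
    config: {
      // Illustrative rule: drop system messages, keep everything else.
      compact: (messages) => messages.filter((m) => m.role !== 'system')
    }
  }
});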
Example: Next.js Chat
We took this architecture and built a minimal chat app in Next.js that shows the pattern end-to-end: append messages, keep full history, and compact deterministically before calling your model.
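The heart of that app is a single route handler. Here's a minimal sketch, assuming the same fastpaca client as above; the import paths and the callModel helper are illustrative stand-ins for however you wire up your client and your LLM call.

// app/api/chat/route.ts -- minimal sketch; import paths are illustrative.
import { NextResponse } from 'next/server';
import { fastpaca } from '@/lib/fastpaca'; // hypothetical client setup
import { callModel } from '@/lib/model';   // hypothetical LLM wrapper

export async function POST(req: Request) {
  const { conversationId, text } = await req.json();
  const ctx = await fastpaca.context(conversationId, {
    budget: 1_000_000,
    trigger: 0.7,
    policy: { strategy: 'last_n', config: { limit: 400 } }
  });

  // Append to the full history; the service compacts behind the scenes.
  await ctx.append({ role: 'user', parts: [{ type: 'text', text }] });

  // Read back the compacted, budget-respecting window and call the model.
  const { messages } = await ctx.context();
  const reply = await callModel(messages);

  await ctx.append({ role: 'assistant', parts: [{ type: 'text', text: reply }] });
  return NextResponse.json({ reply });
}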

Why Elixir
We built this in Elixir because:
- Raft consensus for distributed state is table stakes
- Process supervision makes failure handling clean
- Hot code upgrades enable zero-downtime deployments
- Memory efficiency holds up at scale
In short, Elixir's concurrency model and reliability characteristics fit this problem well.
What's Next
Context-store is production-ready and Apache 2.0 licensed.
The research continues:
- Compression techniques in token space
- Semantic compaction (embedding-based strategies)
- Multi-modal context handling
If you're building LLM apps and this problem sounds familiar, check out the docs or browse the code.
_This is part of the broader fastpaca research into hard problems at the intersection of humans and technology. More posts coming!_