Architecture
This is the conceptual walkthrough. For API details, see the command reference. For strategy, see VISION.md.
The one-sentence pitch
Section titled “The one-sentence pitch”CTXone is a versioned, searchable, structured memory store that gives AI agents persistent context without the token cost of carrying that context on every turn.
Why “O(log n)” beats “flat memory files”
Section titled “Why “O(log n)” beats “flat memory files””A flat memory file grows linearly with your project knowledge. Every message to the agent ships the whole file — useful facts, dead facts, and everything in between. More knowledge makes the agent smarter but every turn costs more.
turn 1: [whole memory file] + [message] → cost O(n)turn 2: [whole memory file] + [message] → cost O(n)turn 3: [whole memory file] + [message] → cost O(n)...CTXone inverts this. The graph is stored once, on disk. Every recall fetches only the facts relevant to the current question, plus a small set of always-included “pinned” facts. Adding more knowledge to the graph doesn’t increase per-turn cost.
turn 1: [pinned + relevant 3 facts] + [message] → cost O(log n)turn 2: [pinned + relevant 5 facts] + [message] → cost O(log n)turn 3: [pinned + relevant 2 facts] + [message] → cost O(log n)...The engine is AgentStateGraph — a branching, versioned state store with intent metadata. CTXone layers a memory-oriented API on top.
The four kinds of memory
Section titled “The four kinds of memory”Every fact in the graph falls into one of these buckets:
1. Pinned memory (/memory/pinned/<source>/<slug>)
Section titled “1. Pinned memory (/memory/pinned/<source>/<slug>)”Always included in every recall response, regardless of topic. Use for critical context the agent must always see: project conventions, licensing, current status, team norms.
Added via: ctx prime <file.md> --pin
Storage: Structured {title, body} pairs. Grouped by source so you can
re-prime from the same markdown file idempotently.
2. Primed memory (/memory/primed/<source>/<slug>)
Section titled “2. Primed memory (/memory/primed/<source>/<slug>)”Searchable like normal facts but grouped into a named source. Use for reference material you want the agent to find when relevant but not pay for on every call: API docs, design notes, historical decisions.
Added via: ctx prime <file.md> (no --pin)
Storage: Same as pinned but under /memory/primed/*.
3. Free-form facts (/memory/<context>/<id>)
Section titled “3. Free-form facts (/memory/<context>/<id>)”Individual facts added incrementally. Use for accumulating project knowledge as you work: decisions, bug fixes, preferences, status updates.
Added via: ctx remember "fact" --context <category>
Storage: String values under /memory/<context>/<timestamp>.
4. Sessions (/sessions/<session_id>/{summary,decisions,details})
Section titled “4. Sessions (/sessions/<session_id>/{summary,decisions,details})”End-of-session commits capturing what was learned in a working session. Enables the “close a session, open a new one, context preserved” workflow.
Added via: MCP summarize_session tool (agents call this automatically)
How recall ranks results
Section titled “How recall ranks results”When you (or an agent) runs recall "topic", here’s what the Hub does:
-
Fetch all pinned memories. These go into the response first, up to half the token budget. No matter what you searched for, pinned content is always included.
-
Tokenize the query. Split on whitespace, drop stopwords and short tokens.
"licensing decisions"becomes["licensing", "decisions"]. -
Run a full-phrase search for the exact query string. Hits get marked
full_match: trueand win ties. -
Run a per-token search for each query token. Aggregate by path: each path’s score is the number of tokens that matched, weighted so full-phrase hits outrank same-count token hits.
-
Deduplicate — don’t repeat a pinned section if its title or body also matches the topic search.
-
Budget-cap — add results in rank order until the remaining token budget is spent. Half for pinned, half for topic matches.
-
Return the combined list plus metadata: pinned count, topic match count, tokens sent, flat-memory baseline, savings ratio.
How savings are measured
Section titled “How savings are measured”Every response carries three numbers:
ctx_tokens_sent: how many tokens this specific response usedctx_tokens_estimated_flat: how many tokens would have been sent if the whole graph were loaded instead (the flat-memory baseline)ctx_savings_ratio:flat / sent
The baseline is computed by serializing the entire graph to JSON and dividing the character count by 4 (a rough tokens-per-char estimate). It’s cached between writes and only recomputed when the graph changes.
See TOKEN_SAVINGS.md for the full derivation and how to read the numbers.
Branches (multi-context memory)
Section titled “Branches (multi-context memory)”The graph supports git-style branches. Every read and write takes an optional
--branch parameter. Common patterns:
- Main branch — your shared project memory
- Experiment branches — try priming a large doc, see if recall improves, merge or discard
- Per-session branches — give a one-off session its own memory namespace without polluting main
- Per-agent branches — each agent writes to its own branch, with periodic merges
ctx branch experimentctx --branch experiment prime ./big-proposal.md --pinctx --branch experiment recall "architecture" # see if it helpsctx diff main experiment # compareProvenance (blame)
Section titled “Provenance (blame)”Every write carries intent metadata: an agent ID, a category (Observe,
Checkpoint, Refine, etc.), a description, and a confidence score. ctx blame <path> returns the full chain of commits that touched a path, so you can
answer “where did this fact come from?” and “when was it last updated?”
This turns the memory graph from a black box into an auditable, accountable ledger.
Token tracking is built in, not bolted on
Section titled “Token tracking is built in, not bolted on”The Hub maintains session-wide counters for tokens sent and tokens saved.
Every recall updates them. ctx stats and GET /api/stats/tokens return
the cumulative totals — making the savings claim measurable, not a vibes
improvement.
The full stack
Section titled “The full stack”┌─────────────────────────────────────────────────────────┐│ Your AI tools ││ ┌──────────────┐ ┌──────────┐ ┌────────┐ ┌───────┐ ││ │ Claude Code │ │ Cursor │ │ VSCode │ │ Codex │ ││ └──────┬───────┘ └────┬─────┘ └────┬───┘ └───┬───┘ ││ │ │ │ │ ││ │ MCP stdio │ │ ││ │ │ │ │ │└─────────┼───────────────┼──────────────┼──────────┼──────┘ │ │ │ │ ▼ ▼ ▼ ▼ ┌───────────────────────────────────────────────────┐ │ CTXone Hub (ctxone-hub) │ │ MCP tools: remember / recall / prime / ... │ │ HTTP API: /api/memory/*, /api/state/* │ │ Token tracker: session savings │ └──────────────────────┬────────────────────────────┘ │ ▼ ┌───────────────────────────────────────────────────┐ │ CTXone Engine (AgentStateGraph) │ │ Versioned state store, branches, blame, │ │ intent metadata, confidence scoring │ └──────────────────────┬────────────────────────────┘ │ ▼ ┌─────────────────┐ │ SQLite / │ │ Postgres / │ │ in-memory │ └─────────────────┘Alongside the stdio MCP path, two other surfaces talk to the Hub over HTTP:
ctxCLI — scripting and inspection (remember,recall,search,log,diff,tail, …)- CTXone Lens — SvelteKit web UI for browsing the graph visually
All three surfaces hit the same Hub. There’s no parallel memory.
What the Hub is not
Section titled “What the Hub is not”- Not a vector store. Retrieval is structural path + substring search, not similarity. This keeps it predictable and blame-able — you know exactly why each result came back.
- Not a message queue. The graph is state, not events. Use a log/stream for audit trails beyond commit-level provenance.
- Not a Claude-specific tool. MCP is the default surface, but the HTTP
API works for any client. The
ctxCLI proves this.