Use case

Real memory for self-hosted chat.

You run Open WebUI at home. Between sessions, the model forgets which framework you picked last week. You paste the same context again. And again. The whole point of self-hosting was to own your data — but the model isn't using it.

CTXone ships a native Open WebUI plugin with two parts: a Tool the model can call explicitly (remember, recall, forget, list_pinned), and a Filter that runs around every chat turn and auto-injects relevant memory into the system prompt. The Filter works even with models that don't support tool calling at all.
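The Tool half is the explicit interface: four named actions the model can invoke when it decides memory is relevant. A minimal sketch of that shape — the `HubClient` here is a toy in-memory stand-in, not the real HTTP client, and the method bodies are assumptions; only the four action names come from the plugin:

```python
class HubClient:
    """Toy stand-in for the CTXone Hub API (assumption, for illustration only)."""

    def __init__(self):
        self.facts = {}

    def remember(self, key, value, pinned=False):
        self.facts[key] = {"value": value, "pinned": pinned}

    def recall(self, topic):
        # Naive substring match; the real Hub does actual retrieval.
        return [k for k, v in self.facts.items() if topic in k or topic in v["value"]]

    def forget(self, key):
        self.facts.pop(key, None)

    def list_pinned(self):
        return [k for k, v in self.facts.items() if v["pinned"]]


class Tools:
    """The four explicit memory actions exposed to tool-calling models."""

    def __init__(self, hub):
        self.hub = hub

    def remember(self, key: str, value: str) -> str:
        self.hub.remember(key, value)
        return f"Stored '{key}'."

    def recall(self, topic: str) -> str:
        hits = self.hub.recall(topic)
        return "\n".join(hits) if hits else "No matching memory."

    def forget(self, key: str) -> str:
        self.hub.forget(key)
        return f"Forgot '{key}'."

    def list_pinned(self) -> str:
        return "\n".join(self.hub.list_pinned()) or "Nothing pinned."
```

Each method returns a plain string because that is what flows back into the model's tool-result message.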

The one-command self-hosted stack

We ship a reference docker-compose at examples/self-hosted-stack/ that stands up the full environment in one command:

git clone https://github.com/ctxone/ctxone
cd ctxone/examples/self-hosted-stack
cp .env.example .env        # edit WEBUI_SECRET_KEY
docker compose up -d

This gives you:

  • CTXone Hub on port 3001 — the memory layer
  • CTXone Lens on port 5173 — the web UI for browsing, blame, diff, and forget
  • Ollama on port 11434 — local LLM runtime
  • Open WebUI on port 8080 — chat UI with the CTXone plugin file bind-mounted at /app/ctxone-plugin.py, ready to drop into Admin Panel → Functions
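Once the containers are up, a quick way to confirm each service is actually listening — the port map comes from the list above; the script itself is a sketch, not part of the repo:

```python
import socket

# Service -> port map for the reference stack (ports as listed above).
SERVICES = {
    "CTXone Hub": 3001,
    "CTXone Lens": 5173,
    "Ollama": 11434,
    "Open WebUI": 8080,
}


def is_listening(port: int, host: str = "localhost", timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "up" if is_listening(port) else "DOWN"
        print(f"{name:12} :{port}  {status}")
```

If Open WebUI shows `DOWN` right after `docker compose up -d`, give it a few seconds — it is usually the slowest container to start.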

What the Filter does, in practice

On every message the user sends, the Filter intercepts the request in its inlet() hook, calls hub.recall(topic=last_user_message), and prepends a new system message to the conversation that looks like this:

## Relevant memory from CTXone
Retrieved for topic: 'licensing'

- [pinned] **Vision** — ship a BSL-1.1 product with MIT clients
- [fact] CTXone uses BSL-1.1
- [fact] Converts to Apache 2.0 four years after release

_(CTXone: this retrieval is 14.2× smaller than loading
the full memory graph.)_

The model sees that before it generates its reply. It doesn't have to ask for it. The user doesn't have to know CTXone exists. It just works — and because the Filter runs entirely in Open WebUI's Python process talking to the Hub over HTTP, there's no extra wire protocol to debug.

Per-user memory isolation

The plugin sets both the X-CTXone-Session and X-CTXone-Agent request headers to the Open WebUI user's email. That means:

  • Each user gets their own session in the Hub's token stats (GET /api/stats/tokens/<[email protected]>), so you can tell who's using the most tokens and which users are getting the most value from memory.
  • Every fact written via the Tool is attributed to that user in ctx blame, so a shared install still has clear provenance.
  • Users can optionally pin a private branch (via UserValves) so their memory doesn't leak into anyone else's chat.

Opt-in self-teaching

The Filter has an outlet() hook that's off by default. Enable it per user and it stores the assistant's replies as low-importance facts, so every conversation seeds memory for the next one. It stays off by default because auto-capture is useful but can also fill your memory with junk — your call.

Ready to try it?