Open-source Rust/Axum backend service for storing and retrieving long-term user memory from conversations. The service persists raw turns and messages, extracts structured memories, tracks supersession, and serves recall/search results with evidence while filtering common memory-pollution cases such as third-party facts, hypotheticals, quoted or sample text, unsupported OpenAI candidates, and stale current-state facts.
memory-service is a single Rust 2021 binary built with Axum, Tokio, and SQLite. It exposes HTTP endpoints for ingesting conversation turns, recalling user memory, searching stored knowledge, listing memories, and deleting session or user data.
The default path is deterministic and local: rule-based extraction, SQLite persistence, FTS5 lexical search, RRF-lite rank fusion, and structured ranking. OpenAI extraction and embeddings are optional overlays when an API key is configured.
- Ingests multi-message turns with user, assistant, and tool messages.
- Stores raw turns/messages as provenance and structured memories as queryable state.
- Extracts current employment, role/title, legal employer names, current location, pets, household/family facts, hobbies, caffeine preferences, scheduled or recent events, dietary restrictions, allergies, answer style, favorites, and selected programming preferences.
- Tracks mutable fact evolution through active/inactive records, evidence, and audit operations.
- Handles explicit same-message corrections, contextual cross-turn corrections, and current-state negations for mutable facts such as employment, role/title, location, diet, caffeine, and allergies.
- Rejects many low-signal inputs: examples, quoted sample facts, draft/translation/summarization traps, JSON/test-fixture payload traps, fictional framing, future wishes, hypotheticals, generic corrections, pure negations, third-party claims, and digit-only durable values.
- Uses SQLite WAL mode, foreign keys, FTS5 tables, and optional sqlite-vec vector search.
- Supports optional bearer authentication for memory endpoints.
- Includes fixture-driven recall evaluation, adversarial weakness evaluation, an HTTP black-box script, and Docker restart persistence verification.
flowchart LR
Client["HTTP client"] --> Router["Axum router"]
Router --> Handlers["HTTP handlers and DTO validation"]
Handlers --> Services["Turn, recall, search, memory services"]
Services --> Extractors["Deterministic rules plus optional OpenAI extraction"]
Services --> Repo["Repository layer"]
Repo --> SQLite["SQLite database"]
SQLite --> FTS["FTS5 tables"]
SQLite --> Vec["sqlite-vec table"]
Services --> OpenAI["Optional OpenAI APIs"]
See docs/ARCHITECTURE.md for the detailed component model.
- Rust stable toolchain with Cargo.
- Docker and Docker Compose for containerized runs.
- Optional: OPENAI_API_KEY for OpenAI Structured Outputs extraction and embeddings.
git clone https://github.com/Berektassuly/memory-service.git
cd memory-service
cargo test --locked -j 1

Set a writable local database path before running. The compiled default path is /data/memory.sqlite3, which is primarily intended for Docker.
MEMORY_HOST=127.0.0.1 MEMORY_PORT=8080 MEMORY_DB_PATH="$PWD/memory.sqlite3" cargo run --locked

Verify health:
curl -sS http://127.0.0.1:8080/health

The health endpoint is public and checks SQLite readiness with a lightweight query.
Ingest a minimal turn:
curl -sS -X POST http://127.0.0.1:8080/turns \
-H "Content-Type: application/json" \
-d '{
"session_id": "session-1",
"user_id": "user-1",
"messages": [
{ "role": "user", "content": "I work at Notion as a product manager" }
],
"timestamp": "2026-05-12T10:00:00Z",
"metadata": { "source": "quick-start" }
}'

Recall the stored memory:
curl -sS -X POST http://127.0.0.1:8080/recall \
-H "Content-Type: application/json" \
-d '{
"query": "Where does the user work now?",
"session_id": "session-1",
"user_id": "user-1",
"max_tokens": 1024
}'

To run with Docker Compose instead:
docker compose up -d --build
curl -sS http://127.0.0.1:8080/health

Compose stores SQLite data in the named memory-data volume mounted at /data. Stop without deleting persisted data:
docker compose down

For local PowerShell runs, set environment variables before cargo run:
$env:MEMORY_HOST = "127.0.0.1"
$env:MEMORY_PORT = "8080"
$env:MEMORY_DB_PATH = "$PWD\memory.sqlite3"
cargo run --locked

| Variable | Default | Source | Description |
|---|---|---|---|
| MEMORY_HOST | 0.0.0.0 | src/config.rs, Docker | Bind host. |
| MEMORY_PORT | 8080 | src/config.rs, Docker | Bind port. |
| MEMORY_DB_PATH | /data/memory.sqlite3 | src/config.rs, Docker | SQLite database file path. |
| MEMORY_AUTH_TOKEN | unset | .env.example, src/config.rs, Docker | Optional bearer token for memory endpoints. |
| OPENAI_API_KEY | unset | .env.example, src/config.rs | Enables configured OpenAI features when present. |
| ENABLE_OPENAI_EXTRACTION | true | .env.example, src/config.rs | Enables OpenAI extraction only when an API key is present. |
| ENABLE_OPENAI_EMBEDDINGS | true | .env.example, src/config.rs | Enables embeddings only when an API key is present and dimensions are 1536. |
| OPENAI_EXTRACTION_MODEL | gpt-4o-mini | .env.example, src/config.rs | Chat Completions model used for structured extraction. |
| OPENAI_EMBEDDING_MODEL | text-embedding-3-small | .env.example, src/config.rs | Embeddings model. |
| OPENAI_EMBEDDING_DIMENSIONS | 1536 | .env.example, src/config.rs | Vector dimensions. Other values disable embeddings in current code. |
| OPENAI_REQUEST_TIMEOUT_MS | 5000 | .env.example, src/config.rs | OpenAI request timeout. |
| OPENAI_MAX_RETRIES | 1 | .env.example, src/config.rs | OpenAI retry count after the first attempt. |
| RUST_LOG | info,memory_service=debug | src/telemetry.rs | Tracing filter. |
MEMORY_HOST_PORT is a Docker Compose host-port override, not an application setting.
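As a rough illustration of how the defaults in the table could be resolved, the sketch below builds a small settings struct from a key/value lookup. The `Config` struct and `load_config` function are hypothetical names for this example and are not taken from src/config.rs; treating an empty token as unset is also an assumption.

```rust
/// Subset of the runtime settings listed in the configuration table.
/// Hypothetical type for illustration; not the service's actual struct.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub host: String,
    pub port: u16,
    pub db_path: String,
    pub auth_token: Option<String>,
}

/// Builds a `Config` from a lookup function, falling back to the documented defaults.
/// Taking a closure instead of reading `std::env` directly keeps the sketch easy to test.
pub fn load_config(get: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
    let port = match get("MEMORY_PORT") {
        Some(raw) => raw
            .parse::<u16>()
            .map_err(|e| format!("invalid MEMORY_PORT {raw:?}: {e}"))?,
        None => 8080,
    };
    Ok(Config {
        host: get("MEMORY_HOST").unwrap_or_else(|| "0.0.0.0".to_string()),
        port,
        db_path: get("MEMORY_DB_PATH").unwrap_or_else(|| "/data/memory.sqlite3".to_string()),
        // Assumption of this sketch: an empty token is treated the same as an unset one.
        auth_token: get("MEMORY_AUTH_TOKEN").filter(|t| !t.is_empty()),
    })
}
```

In a real binary the lookup would typically be `load_config(|key| std::env::var(key).ok())`.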
| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | /health | SQLite readiness check. | Public |
| POST | /turns | Ingest a conversation turn. | Optional bearer |
| POST | /recall | Build recall context with citations. | Optional bearer |
| POST | /search | Return ranked memory/message results. | Optional bearer |
| GET | /users/{user_id}/memories | List active and inactive memories for a user. | Optional bearer |
| DELETE | /sessions/{session_id} | Delete a session and related data. | Optional bearer |
| DELETE | /users/{user_id} | Delete all data for a user. | Optional bearer |
If MEMORY_AUTH_TOKEN is unset, memory endpoints are open. If it is set, send Authorization: Bearer <token>. GET /health is always public.
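The optional bearer rule can be sketched as a pure function. `AuthDecision`, `check_bearer`, and the constant-time comparison are choices made for this example and are not confirmed details of the service's middleware; an `Unauthorized` result would correspond to the documented 401 response.

```rust
/// Outcome of the optional bearer check for memory endpoints (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum AuthDecision {
    Allowed,
    Unauthorized,
}

/// `configured` is the value of MEMORY_AUTH_TOKEN (None when unset);
/// `header` is the raw `Authorization` header value, if the client sent one.
pub fn check_bearer(configured: Option<&str>, header: Option<&str>) -> AuthDecision {
    let Some(expected) = configured else {
        // No token configured: memory endpoints are open.
        return AuthDecision::Allowed;
    };
    match header.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(presented) if constant_time_eq(presented.as_bytes(), expected.as_bytes()) => {
            AuthDecision::Allowed
        }
        _ => AuthDecision::Unauthorized,
    }
}

/// Compares two byte strings without stopping at the first mismatching byte.
/// Using this instead of `==` is an assumption of the sketch.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```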
JSON request bodies are capped at 1 MiB, and each message content field is capped at 65,536 characters. Oversized bodies or oversized individual messages return 413 Payload Too Large; malformed JSON and missing required fields return Axum or application 4xx errors.
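A minimal sketch of the two size limits, assuming that "characters" means Unicode scalar values rather than bytes (the source does not say which). `LimitError` and `check_limits` are hypothetical names; both error variants would map to 413 Payload Too Large.

```rust
pub const MAX_BODY_BYTES: usize = 1024 * 1024; // 1 MiB
pub const MAX_MESSAGE_CHARS: usize = 65_536;

/// Which limit was exceeded (hypothetical type for this example).
#[derive(Debug, PartialEq)]
pub enum LimitError {
    BodyTooLarge,
    MessageTooLarge { index: usize },
}

/// Checks the raw body length in bytes, then each message's content length.
pub fn check_limits(body_len: usize, messages: &[&str]) -> Result<(), LimitError> {
    if body_len > MAX_BODY_BYTES {
        return Err(LimitError::BodyTooLarge);
    }
    for (index, content) in messages.iter().enumerate() {
        // Assumption: the cap counts Unicode scalar values, not UTF-8 bytes.
        if content.chars().count() > MAX_MESSAGE_CHARS {
            return Err(LimitError::MessageTooLarge { index });
        }
    }
    Ok(())
}
```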
GET /users/{user_id}/memories returns canonical memory rows plus compatibility aliases for common legacy-style keys when they can be derived from canonical records: employer, lives_in, has_pet, diet, and allergic_to. Alias IDs are prefixed with alias: and alias supersession pointers mirror the canonical lifecycle.
/recall.max_tokens must be between 1 and 4096; the examples use 1024. Recall context packing uses approximate token counting and may include a small internal packing margin.
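To make the budget rule concrete, here is a hedged sketch of range validation plus greedy context packing. The four-characters-per-token estimate and the function names are assumptions of this example; the service's actual estimator and its internal packing margin are not shown here.

```rust
/// Accepts only budgets in the documented 1..=4096 range.
pub fn validate_max_tokens(max_tokens: u32) -> Result<u32, String> {
    if (1..=4096).contains(&max_tokens) {
        Ok(max_tokens)
    } else {
        Err(format!("max_tokens must be between 1 and 4096, got {max_tokens}"))
    }
}

/// Rough estimate: about four characters per token, rounded up.
/// This heuristic is an assumption of the sketch.
pub fn approx_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Keeps lines in priority order while they fit the budget.
/// A line that does not fit is skipped, so a shorter later line may still be packed.
pub fn pack_context<'a>(lines: &[&'a str], max_tokens: u32) -> Vec<&'a str> {
    let budget = max_tokens as usize;
    let mut used = 0;
    let mut kept = Vec::new();
    for line in lines {
        let cost = approx_tokens(line);
        if used + cost > budget {
            continue;
        }
        used += cost;
        kept.push(*line);
    }
    kept
}
```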
Full contract details are in docs/API.md. An OpenAPI specification is not currently implemented.
curl -sS -X POST http://127.0.0.1:8080/turns \
-H "Content-Type: application/json" \
-d '{"session_id":"session-1","user_id":"user-1","messages":[{"role":"user","content":"I work at Notion as a product manager"}],"timestamp":"2026-05-12T10:00:00Z","metadata":{"source":"manual"}}'

Response:
{
  "id": "generated-turn-id"
}

curl -sS -X POST http://127.0.0.1:8080/recall \
-H "Content-Type: application/json" \
-d '{"query":"Where does the user work now?","session_id":"session-1","user_id":"user-1","max_tokens":1024}'

Response shape:
{
  "context": "## Known facts about this user\n- Current employment: Works at Notion as a product manager.",
  "citations": [
    {
      "turn_id": "generated-turn-id",
      "score": 17.0,
      "snippet": "I work at Notion as a product manager"
    }
  ]
}

curl -sS -X POST http://127.0.0.1:8080/search \
-H "Content-Type: application/json" \
-d '{"query":"Notion product manager","session_id":"session-1","user_id":"user-1","limit":10}'

The service separates raw episodic data from structured semantic state:
- sessions, turns, and messages preserve conversation provenance.
- memories stores structured facts, preferences, opinions, and events.
- memory_evidence links memories to source messages and quotes.
- memory_ops records add, supersede, and noop operations.
- Mutable keys such as employment.current and location.current keep only the latest value active while preserving superseded rows.
- Cleared allergies use allergy_cleared.<substance> instead of active allergy.<substance> so current recall does not imply the allergy is still present.
- User memory listing may include derived compatibility aliases alongside canonical keys; canonical rows remain the storage source of truth.
See docs/MEMORY_MODEL.md.
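The supersession rule for mutable keys can be sketched with an in-memory store. This is an illustration under assumptions: the `Memory`, `MemoryOp`, and `MemoryStore` types are hypothetical, the real service persists rows in SQLite, and evidence links are omitted.

```rust
/// One structured memory row (hypothetical shape for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct Memory {
    pub id: u64,
    pub key: String,
    pub value: String,
    pub active: bool,
    pub superseded_by: Option<u64>,
}

/// Mirrors the add, supersede, and noop operations named above.
#[derive(Debug, PartialEq)]
pub enum MemoryOp {
    Add,
    Supersede,
    Noop,
}

#[derive(Default)]
pub struct MemoryStore {
    rows: Vec<Memory>,
    next_id: u64,
}

impl MemoryStore {
    /// Writes a value for a mutable key, keeping only the latest row active.
    pub fn upsert_mutable(&mut self, key: &str, value: &str) -> MemoryOp {
        let new_id = self.next_id + 1;
        let op = match self.rows.iter_mut().find(|m| m.active && m.key == key) {
            // Same value already active: nothing to write.
            Some(existing) if existing.value == value => return MemoryOp::Noop,
            // Different value: deactivate the old row but keep it as history.
            Some(existing) => {
                existing.active = false;
                existing.superseded_by = Some(new_id);
                MemoryOp::Supersede
            }
            None => MemoryOp::Add,
        };
        self.next_id = new_id;
        self.rows.push(Memory {
            id: new_id,
            key: key.to_string(),
            value: value.to_string(),
            active: true,
            superseded_by: None,
        });
        op
    }

    /// Returns the single active row for a key, if any.
    pub fn active(&self, key: &str) -> Option<&Memory> {
        self.rows.iter().find(|m| m.active && m.key == key)
    }

    /// Returns every row for a key, including superseded ones, oldest first.
    pub fn history(&self, key: &str) -> Vec<&Memory> {
        self.rows.iter().filter(|m| m.key == key).collect()
    }
}
```

Calling `upsert_mutable("employment.current", "Stripe")` and then the same key with `"Notion"` leaves one active row while the Stripe row stays in `history` with a pointer to its replacement.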
The deterministic extractor is always available and inspects user messages only. It recognizes:
- supported current facts;
- oblique employment like "my paycheck comes from Anthropic";
- conservative badge employment like "my Figma badge says design systems lead";
- employment paraphrases such as "I work for Stripe", "I left Stripe for Notion", and "I'm now at Linear";
- legal employer-name corrections;
- role/title phrases such as "my current role is..." or "promoted to...";
- location phrases like "I'm currently based in Raleigh", "I'm in Porto these days", "my current city is Lisbon", and "home base is Prague";
- replacement location negations such as "I don't live in Lisbon anymore; I live in Porto now";
- explicit correction clauses inside a message;
- contextual cross-turn corrections such as "I meant Berlin, not Munich";
- diet-state negations, cleared allergies, and household allergy facts;
- matcha/caffeine preference changes, hobbies, and family roles;
- implicit pet ownership from vet/leash/litter-box evidence, and multi-pet phrasing;
- episodic context such as React debugging, interview prep, relocation reasons, and marathon goals/results;
- non-Latin current locations such as 東京;
- scoped programming opinion arcs such as React/Svelte or TypeScript/Python preferences;
- answer-style requests for concise bullets/no fluff or terse practical replies;
- low-signal/numeric-only traps.

Draft/email, translation, summarization, grammar/example, sample-copy, JSON/test-fixture payload, and fictional screenplay/script/dialogue frames are treated as non-durable even when they contain first-person text.

Optional OpenAI extraction runs after deterministic extraction when configured, but candidates are validated against the same low-signal source-message guard, and incomplete current-employment candidates must include or recover an employer before they can become memories. OpenAI failures degrade to deterministic extraction.
Details and examples are in docs/EXTRACTION.md.
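To give a feel for what a narrow deterministic rule looks like, the sketch below handles only "I work at/for X [as a Y]" and rejects a few non-durable framings and digit-only values. It is a deliberately tiny stand-in written for this README: the marker list, the `Employment` struct, and `extract_employment` are hypothetical and far simpler than the rules in docs/EXTRACTION.md.

```rust
/// Extracted employment fact (hypothetical shape for illustration).
#[derive(Debug, PartialEq)]
pub struct Employment {
    pub employer: String,
    pub role: Option<String>,
}

/// Framing markers this sketch treats as non-durable. A made-up, minimal subset.
const NON_DURABLE_PREFIXES: [&str; 4] = ["if ", "imagine", "translate", "for example"];

pub fn extract_employment(message: &str) -> Option<Employment> {
    let text = message.trim();
    let lower = text.to_lowercase();
    // Hypotheticals, translation requests, and examples never become memories.
    if NON_DURABLE_PREFIXES.iter().any(|p| lower.starts_with(p)) {
        return None;
    }
    // Locate the first supported first-person employment phrase.
    let start = ["I work at ", "I work for "]
        .iter()
        .find_map(|p| text.find(p).map(|i| i + p.len()))?;
    let rest = text[start..].trim_end_matches(|c: char| c == '.' || c == '!');
    let (employer, role) = match rest.split_once(" as ") {
        Some((employer, role)) => {
            // Drop a leading article from the role, if present.
            let role = role
                .strip_prefix("an ")
                .or_else(|| role.strip_prefix("a "))
                .unwrap_or(role);
            (employer, Some(role.trim().to_string()))
        }
        None => (rest, None),
    };
    let employer = employer.trim();
    // Reject empty and digit-only durable values.
    if employer.is_empty() || employer.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }
    Some(Employment { employer: employer.to_string(), role })
}
```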
Recall uses a candidate-first structured retrieval pipeline. Candidates from five sources are unioned before memory rows and evidence are batch-loaded for scoring: bounded memory FTS; evidence/message FTS mapped through memory_evidence; sqlite-vec KNN with scope/model/dimension/active metadata filters applied before top-k selection; direct key/entity SQL lookups; and profile/history anchors. Ranking prioritizes key/entity intent, active-state semantics, recency, same-session hints, one-hop associations, RRF-lite over independent lexical/evidence/vector/direct lists, and optional vector similarity. Tight profile summaries reserve tiny budgets for core stable facts such as current employment, role/title, and location before lower-priority pet/style/family facts. Query-key inference also covers household allergies, hobbies, caffeine/drink preference, family roles, relocation reasons, debugging, planning, and marathon context. Recall citations are deduplicated by source turn and snippet while keeping the highest score. Search returns ranked memory results first, then falls back to raw message FTS when useful; digit-only, meta-task/sample, and fixture/payload raw messages are treated as low-signal fallback material.
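For readers unfamiliar with rank fusion, the sketch below shows textbook reciprocal rank fusion (RRF) over independent ranked lists. It is not the service's RRF-lite implementation, whose exact formula is not given here; the function name and the conventional constant k = 60 are assumptions of this example.

```rust
use std::collections::HashMap;

/// Fuses ranked lists of memory IDs with standard reciprocal rank fusion:
/// each list contributes 1 / (k + rank) for every ID it contains (rank starts at 1).
/// Assumes an ID appears at most once per list.
pub fn rrf_fuse(lists: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (idx, id) in list.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by ID so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Because only ranks are used, lexical, evidence, vector, and direct-lookup lists can be combined without normalizing their raw scores against each other.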
See docs/RECALL_AND_SEARCH.md.
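Citation deduplication by source turn and snippet, keeping the highest score, can be sketched as follows. The `Citation` struct mirrors the fields in the recall response shape, while `dedup_citations` and the final sort order are assumptions of this example.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Same fields as a citation in the /recall response.
#[derive(Debug, Clone, PartialEq)]
pub struct Citation {
    pub turn_id: String,
    pub score: f64,
    pub snippet: String,
}

/// Collapses citations that share (turn_id, snippet), keeping the highest score.
pub fn dedup_citations(citations: Vec<Citation>) -> Vec<Citation> {
    let mut best: HashMap<(String, String), Citation> = HashMap::new();
    for citation in citations {
        let key = (citation.turn_id.clone(), citation.snippet.clone());
        match best.entry(key) {
            Entry::Occupied(mut slot) => {
                if citation.score > slot.get().score {
                    slot.insert(citation);
                }
            }
            Entry::Vacant(slot) => {
                slot.insert(citation);
            }
        }
    }
    let mut out: Vec<Citation> = best.into_values().collect();
    // Assumption: present the strongest citations first, with a deterministic tie-break.
    out.sort_by(|a, b| {
        b.score
            .total_cmp(&a.score)
            .then_with(|| a.turn_id.cmp(&b.turn_id))
            .then_with(|| a.snippet.cmp(&b.snippet))
    });
    out
}
```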
OpenAI integration is optional. Extraction uses Chat Completions with strict JSON Schema response format and sends only user-authored turn messages to the model. Memory embeddings are batched once per ingested turn, use text-embedding-3-small by default, and are persisted through sqlite-vec only for 1536-dimensional vectors. Vec rows carry the memory active flag and embedding model/dimension metadata used by vector retrieval filters. Structured memories and rendered memory text remain canonical; embedding rows are rebuildable derived indexes. OpenAI HTTP retries are limited to transient failures and use bounded backoff. Without an API key, OpenAI extraction and embeddings are disabled even if their enable flags are true.
See docs/EXTRACTION.md and docs/OPERATIONS.md.
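The gating rules for the OpenAI overlays can be summarized in a few lines. This is a sketch with hypothetical names (`OpenAiSettings` and its methods); treating a blank key as absent is an assumption, not documented behavior.

```rust
pub const SUPPORTED_EMBEDDING_DIMENSIONS: u32 = 1536;

/// Inputs that decide whether the optional OpenAI features run (hypothetical type).
pub struct OpenAiSettings<'a> {
    pub api_key: Option<&'a str>,
    pub enable_extraction: bool,
    pub enable_embeddings: bool,
    pub embedding_dimensions: u32,
}

impl OpenAiSettings<'_> {
    /// Assumption: a key consisting only of whitespace counts as missing.
    fn has_key(&self) -> bool {
        self.api_key.is_some_and(|k| !k.trim().is_empty())
    }

    /// Extraction needs both the enable flag and an API key.
    pub fn extraction_enabled(&self) -> bool {
        self.enable_extraction && self.has_key()
    }

    /// Embeddings additionally require the single supported vector dimension.
    pub fn embeddings_enabled(&self) -> bool {
        self.enable_embeddings
            && self.has_key()
            && self.embedding_dimensions == SUPPORTED_EMBEDDING_DIMENSIONS
    }
}
```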
SQLite is opened with foreign keys, WAL mode, normal synchronous mode, and a 5000 ms busy timeout. The repository uses parameterized queries. FTS rows are maintained manually by repository methods, with repository rebuild helpers for memory_fts, message_fts, or both if those derived rows become stale. sqlite-vec is registered on startup and used for optional vector retrieval with eligibility metadata stored in the vec0 table.
See docs/OPERATIONS.md.
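The connection settings described above correspond to standard SQLite PRAGMA statements. The helper below only assembles the statements as strings; `connection_pragmas` is a hypothetical name, and how the service actually issues them through its SQLite driver is not shown here.

```rust
/// Returns the PRAGMA statements matching the documented connection settings.
pub fn connection_pragmas(busy_timeout_ms: u32) -> Vec<String> {
    vec![
        // Enforce foreign-key constraints (off by default in SQLite).
        "PRAGMA foreign_keys = ON;".to_string(),
        // Write-ahead logging lets readers proceed while a writer is active.
        "PRAGMA journal_mode = WAL;".to_string(),
        // NORMAL syncs less often than FULL.
        "PRAGMA synchronous = NORMAL;".to_string(),
        // Wait up to this many milliseconds on a locked database before failing.
        format!("PRAGMA busy_timeout = {busy_timeout_ms};"),
    ]
}
```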
CI runs:
cargo fmt -- --check
cargo test --locked -j 1 fixture_recall_self_eval -- --nocapture
cargo test --locked -j 1 weakness_recall_eval -- --nocapture
cargo test --locked -j 1
cargo clippy --locked -j 1 --all-targets --all-features -- -D warnings

Optional operational checks are also documented in docs/TESTING.md: the Bash Docker restart persistence script and the committed Python HTTP black-box evaluator.
fixtures/eval_scenarios.json defines the golden recall self-evaluation. Current fixture thresholds require full fact recall, full noise rejection, full citation recall, and 23 passing scenarios. fixtures/weakness_scenarios.json adds 21 adversarial scenarios with 36 recall probes and 5 search checks covering meta-task quoted facts, clause-boundary cleanup, non-Latin locations, contradiction chains, third-party pollution, tight budgets, citation provenance, and recall/search shape. HTTP regression tests also cover oblique and badge employment, role/title extraction, household allergy scoping, matcha/caffeine preferences, hobbies, episodic context, compatibility aliases, replacement location negation, cross-turn contextual corrections, litter-box cat inference, citation deduplication, "scratch that" corrections, JSON/test-fixture pollution, pet-owner location recall, single-message content limits, and bare coordinated live/work clauses. The fixture runners write reports under target/fixture-eval/ and target/weakness-eval/.
For a service-level check that does not import Rust test code, run scripts/blackbox_http_eval.py against a live server:
docker compose up -d --build
mkdir -p target/blackbox-http
python scripts/blackbox_http_eval.py --base-url http://127.0.0.1:8080 --json-output target/blackbox-http/latest.json

- Optional bearer auth protects all memory endpoints when MEMORY_AUTH_TOKEN is set.
- User and session scopes are enforced in repository queries.
- user_id: null normalizes to anon:session:<session_id>.
- Reusing a session_id for a different user returns 409 Conflict.
- OpenAI API keys are read from environment variables and redacted in debug formatting.
- The system filters common memory-pollution cases, but it is not a complete security boundary for hostile natural-language input.
See docs/SECURITY.md.
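The anonymous-user normalization and the session-ownership conflict can be sketched together. `normalize_user_id`, `SessionOwners`, and `SessionConflict` are hypothetical names; the service enforces these rules in its repository queries rather than in an in-memory map, and a `SessionConflict` would correspond to the documented 409 Conflict.

```rust
use std::collections::HashMap;

/// Maps a missing user ID to the documented anonymous form.
pub fn normalize_user_id(user_id: Option<&str>, session_id: &str) -> String {
    match user_id {
        Some(id) => id.to_string(),
        None => format!("anon:session:{session_id}"),
    }
}

/// Returned when a session is reused by a different normalized user.
#[derive(Debug, PartialEq)]
pub struct SessionConflict;

/// In-memory stand-in for the session ownership check (illustration only).
#[derive(Default)]
pub struct SessionOwners {
    owners: HashMap<String, String>,
}

impl SessionOwners {
    /// Records the first user to use a session and rejects any other user afterwards.
    pub fn claim(&mut self, session_id: &str, user_id: Option<&str>) -> Result<String, SessionConflict> {
        let normalized = normalize_user_id(user_id, session_id);
        if let Some(owner) = self.owners.get(session_id) {
            return if *owner == normalized {
                Ok(normalized)
            } else {
                Err(SessionConflict)
            };
        }
        self.owners.insert(session_id.to_string(), normalized.clone());
        Ok(normalized)
    }
}
```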
- There are no background workers. Ingestion performs extraction, an optional batched memory-embedding call, SQLite writes, FTS updates, and vector writes in the request path.
- Docker runs as a non-root memory user and declares /data as a volume.
- Vector storage currently supports only 1536 dimensions at runtime. Dimension/model changes require an explicit new-table, backfill, dual-read/switch, and cleanup migration path; see docs/MEMORY_MODEL.md.
| Scenario | Behavior |
|---|---|
| Cold session or no matching memory | /recall returns {"context":"","citations":[]}. /search returns {"results":[]} when nothing is relevant or the request has no effective user_id or session_id scope. |
| Missing OpenAI API key | Deterministic extraction still runs. OpenAI extraction and embeddings are disabled, so structured rule memories, FTS, direct recall, and lexical search still work. |
| OpenAI timeout, rate limit, refusal, or malformed response | Extraction failures are logged and fall back to deterministic extraction. Embedding failures are logged; the turn and structured memories can still persist without vector rows, so /turns can return 201 if local validation and the SQLite transaction succeed. |
| Unsupported embedding dimensions or incompatible vector store | Embeddings and vector candidates are skipped or refused. Structured memories plus lexical and direct retrieval still work; vector recall quality may degrade. |
| Malformed JSON, missing fields, invalid timestamp, empty messages, oversized body, or oversized message content | Invalid client input returns a 4xx response; oversized JSON and message content over 65,536 characters return 413 Payload Too Large. Unicode content is valid JSON input and is persisted like other message text. |
| Auth token absent or invalid | If MEMORY_AUTH_TOKEN is unset, memory endpoints are open. If it is set, missing or invalid bearer tokens return 401; GET /health remains public. |
| Cross-user session reuse | Reusing a session_id with a different normalized user_id returns 409 Conflict, preventing session data bleed. |
| SQLite busy, slow disk, or storage error | SQLite uses WAL mode, foreign keys, and a 5000 ms busy timeout. If storage still fails, clients receive a generic 500 internal error; disk corruption is not automatically repaired. |
| Derived indexes out of sync | FTS and vector rows are derived from canonical tables. Normal repository writes keep them in sync; FTS rebuild helpers can repair stale derived FTS rows. |
| Container restart | Data persists when the Docker named volume mounted at /data is kept. Deleting the volume deletes stored memory. |
| Synchronous ingestion | There are no background workers. After POST /turns returns 201, the turn, FTS rows, and extracted memories written by that transaction are immediately queryable. If the final SQLite transaction fails, no queryability of extracted memories is claimed. |
| Token budget pressure | /recall uses approximate token counting, and max_tokens must be 1..=4096. Tight profile summaries prioritize stable/current facts such as employment and location before lower-priority context. |
- Rule-based extraction is deterministic but incomplete for phrasing outside implemented patterns.
- Correction and negation handling is deliberately phrase-sensitive; unsupported wording should be added as narrow regression-backed rules.
- OpenAI live behavior is optional and not exercised by default tests; mocked tests cover integration behavior.
- FTS consistency depends on repository methods because there are no FTS triggers; rebuild helpers provide the repair path.
- Recall ranking is heuristic, not learned; candidate caps are explicit and conservative, and RRF-lite is rank-level fusion with an internal future-reranker boundary.
- OpenAPI is not currently implemented.
src/
  config.rs      Runtime configuration
  http/          Axum routes, handlers, DTOs
  service/       Ingest, memory write, recall, search orchestration
  extraction/    Deterministic rule extractors
  openai.rs      Optional OpenAI extraction and embeddings
  storage/       SQLite DB, migrations, repository, models
tests/           Unit and integration tests
fixtures/        Recall self-evaluation scenarios
scripts/         Operational verification scripts
docs/            Architecture, API, model, operations, testing, security docs
- Architecture
- API Reference
- Memory Model
- Extraction
- Recall And Search
- Operations
- Testing
- Security
- Original Architecture Decision
See CONTRIBUTING.md for local setup, quality gates, fixture guidance, and pull request expectations.
For questions or review issues, contact Mukhammedali at mukhammedali@berektassuly.com.
Licensed under the MIT License.