memory-service

Open-source Rust/Axum backend service for storing and retrieving long-term user memory from conversations. The service persists raw turns and messages, extracts structured memories, tracks supersession, and serves recall/search results with evidence while filtering common memory-pollution cases such as third-party facts, hypotheticals, quoted or sample text, unsupported OpenAI candidates, and stale current-state facts.

Overview

memory-service is a single Rust 2021 binary built with Axum, Tokio, and SQLite. It exposes HTTP endpoints for ingesting conversation turns, recalling user memory, searching stored knowledge, listing memories, and deleting session or user data.

The default path is deterministic and local: rule-based extraction, SQLite persistence, FTS5 lexical search, RRF-lite rank fusion, and structured ranking. OpenAI extraction and embeddings are optional overlays when an API key is configured.

Key Capabilities

  • Ingests multi-message turns with user, assistant, and tool messages.
  • Stores raw turns/messages as provenance and structured memories as queryable state.
  • Extracts current employment, role/title, legal employer names, current location, pets, household/family facts, hobbies, caffeine preferences, scheduled or recent events, dietary restrictions, allergies, answer style, favorites, and selected programming preferences.
  • Tracks mutable fact evolution through active/inactive records, evidence, and audit operations.
  • Handles explicit same-message corrections, contextual cross-turn corrections, and current-state negations for mutable facts such as employment, role/title, location, diet, caffeine, and allergies.
  • Rejects many low-signal inputs: examples, quoted sample facts, draft/translation/summarization traps, JSON/test-fixture payload traps, fictional framing, future wishes, hypotheticals, generic corrections, pure negations, third-party claims, and digit-only durable values.
  • Uses SQLite WAL mode, foreign keys, FTS5 tables, and optional sqlite-vec vector search.
  • Supports optional bearer authentication for memory endpoints.
  • Includes fixture-driven recall evaluation, adversarial weakness evaluation, an HTTP black-box script, and Docker restart persistence verification.

Architecture At A Glance

flowchart LR
    Client["HTTP client"] --> Router["Axum router"]
    Router --> Handlers["HTTP handlers and DTO validation"]
    Handlers --> Services["Turn, recall, search, memory services"]
    Services --> Extractors["Deterministic rules plus optional OpenAI extraction"]
    Services --> Repo["Repository layer"]
    Repo --> SQLite["SQLite database"]
    SQLite --> FTS["FTS5 tables"]
    SQLite --> Vec["sqlite-vec table"]
    Services --> OpenAI["Optional OpenAI APIs"]

See docs/ARCHITECTURE.md for the detailed component model.

Quick Start

Prerequisites

  • Rust stable toolchain with Cargo.
  • Docker and Docker Compose for containerized runs.
  • Optional: OPENAI_API_KEY for OpenAI Structured Outputs extraction and embeddings.

Clone And Test

git clone https://github.com/Berektassuly/memory-service.git
cd memory-service
cargo test --locked -j 1

Run Locally

Set a writable local database path before running. The compiled default path is /data/memory.sqlite3, which is primarily intended for Docker.

MEMORY_HOST=127.0.0.1 MEMORY_PORT=8080 MEMORY_DB_PATH="$PWD/memory.sqlite3" cargo run --locked

Verify health:

curl -sS http://127.0.0.1:8080/health

The health endpoint is public and checks SQLite readiness with a lightweight query.

Ingest a minimal turn:

curl -sS -X POST http://127.0.0.1:8080/turns \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "session-1",
    "user_id": "user-1",
    "messages": [
      { "role": "user", "content": "I work at Notion as a product manager" }
    ],
    "timestamp": "2026-05-12T10:00:00Z",
    "metadata": { "source": "quick-start" }
  }'

Recall the stored memory:

curl -sS -X POST http://127.0.0.1:8080/recall \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Where does the user work now?",
    "session_id": "session-1",
    "user_id": "user-1",
    "max_tokens": 1024
  }'

Run With Docker Compose

docker compose up -d --build
curl -sS http://127.0.0.1:8080/health

Compose stores SQLite data in the named memory-data volume mounted at /data. Stop without deleting persisted data:

docker compose down

Windows Notes

For local PowerShell runs, set environment variables before cargo run:

$env:MEMORY_HOST = "127.0.0.1"
$env:MEMORY_PORT = "8080"
$env:MEMORY_DB_PATH = "$PWD\memory.sqlite3"
cargo run --locked

Configuration

| Variable | Default | Source | Description |
| --- | --- | --- | --- |
| MEMORY_HOST | 0.0.0.0 | src/config.rs, Docker | Bind host. |
| MEMORY_PORT | 8080 | src/config.rs, Docker | Bind port. |
| MEMORY_DB_PATH | /data/memory.sqlite3 | src/config.rs, Docker | SQLite database file path. |
| MEMORY_AUTH_TOKEN | unset | .env.example, src/config.rs, Docker | Optional bearer token for memory endpoints. |
| OPENAI_API_KEY | unset | .env.example, src/config.rs | Enables configured OpenAI features when present. |
| ENABLE_OPENAI_EXTRACTION | true | .env.example, src/config.rs | Enables OpenAI extraction only when an API key is present. |
| ENABLE_OPENAI_EMBEDDINGS | true | .env.example, src/config.rs | Enables embeddings only when an API key is present and dimensions are 1536. |
| OPENAI_EXTRACTION_MODEL | gpt-4o-mini | .env.example, src/config.rs | Chat Completions model used for structured extraction. |
| OPENAI_EMBEDDING_MODEL | text-embedding-3-small | .env.example, src/config.rs | Embeddings model. |
| OPENAI_EMBEDDING_DIMENSIONS | 1536 | .env.example, src/config.rs | Vector dimensions. Other values disable embeddings in current code. |
| OPENAI_REQUEST_TIMEOUT_MS | 5000 | .env.example, src/config.rs | OpenAI request timeout. |
| OPENAI_MAX_RETRIES | 1 | .env.example, src/config.rs | OpenAI retry count after the first attempt. |
| RUST_LOG | info,memory_service=debug | src/telemetry.rs | Tracing filter. |

MEMORY_HOST_PORT is a Docker Compose host-port override, not an application setting.
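
The gating rules in the table can be sketched as follows. This is a minimal Python illustration, not the service's Rust code in src/config.rs: the Settings class, the load_settings and _flag helpers, and the accepted boolean spellings are all assumptions made for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

SUPPORTED_EMBEDDING_DIMENSIONS = 1536  # the only dimension count the table lists as supported


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean flag; the accepted spellings are an assumption."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:  # hypothetical container, for illustration only
    host: str
    port: int
    db_path: str
    auth_token: Optional[str]
    openai_extraction: bool
    openai_embeddings: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    api_key = env.get("OPENAI_API_KEY") or None
    dims = int(env.get("OPENAI_EMBEDDING_DIMENSIONS", "1536"))
    return Settings(
        host=env.get("MEMORY_HOST", "0.0.0.0"),
        port=int(env.get("MEMORY_PORT", "8080")),
        db_path=env.get("MEMORY_DB_PATH", "/data/memory.sqlite3"),
        auth_token=env.get("MEMORY_AUTH_TOKEN") or None,
        # OpenAI features need an API key even when their enable flags are true.
        openai_extraction=api_key is not None
        and _flag(env, "ENABLE_OPENAI_EXTRACTION", True),
        # Embeddings additionally require the supported dimension count.
        openai_embeddings=api_key is not None
        and _flag(env, "ENABLE_OPENAI_EMBEDDINGS", True)
        and dims == SUPPORTED_EMBEDDING_DIMENSIONS,
    )
```

With an empty environment both OpenAI features resolve to disabled, which matches the deterministic default path.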

API Overview

| Method | Path | Purpose | Auth |
| --- | --- | --- | --- |
| GET | /health | SQLite readiness check. | Public |
| POST | /turns | Ingest a conversation turn. | Optional bearer |
| POST | /recall | Build recall context with citations. | Optional bearer |
| POST | /search | Return ranked memory/message results. | Optional bearer |
| GET | /users/{user_id}/memories | List active and inactive memories for a user. | Optional bearer |
| DELETE | /sessions/{session_id} | Delete a session and related data. | Optional bearer |
| DELETE | /users/{user_id} | Delete all data for a user. | Optional bearer |

If MEMORY_AUTH_TOKEN is unset, memory endpoints are open. If it is set, send Authorization: Bearer <token>. GET /health is always public.
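
For clients that are not using curl, a request with the optional bearer header could be assembled as in the sketch below. It uses only the Python standard library; build_request is a hypothetical helper name, not part of the service or its scripts.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_request(base_url: str, method: str, path: str,
                  body: Optional[Dict[str, Any]] = None,
                  token: Optional[str] = None) -> urllib.request.Request:
    """Build a request; the Authorization header is added only when a token is given."""
    headers = {}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token:
        headers["Authorization"] = f"Bearer {token}"
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Example (needs a running server):
# with urllib.request.urlopen(build_request("http://127.0.0.1:8080", "GET", "/health")) as resp:
#     print(resp.status)
```

Passing token=None reproduces the open-endpoint case where MEMORY_AUTH_TOKEN is unset.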

JSON request bodies are capped at 1 MiB, and each message content field is capped at 65,536 characters. Oversized bodies or oversized individual messages return 413 Payload Too Large; malformed JSON and missing required fields return Axum or application 4xx errors.
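
A client can check these limits before sending a turn. The sketch below is an assumption-laden illustration: precheck_turn is a hypothetical helper, and Python's len counts Unicode code points, which may not match how the service counts characters exactly.

```python
import json
from typing import Any, Dict, List

MAX_BODY_BYTES = 1024 * 1024  # 1 MiB JSON body cap
MAX_CONTENT_CHARS = 65_536    # per-message content cap


def precheck_turn(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload looks sendable."""
    problems = []
    messages = payload.get("messages") or []
    if not messages:
        problems.append("messages must not be empty")
    for i, message in enumerate(messages):
        if len(message.get("content", "")) > MAX_CONTENT_CHARS:
            problems.append(f"messages[{i}].content exceeds {MAX_CONTENT_CHARS} characters")
    # Measure the body as it would be sent: UTF-8 encoded JSON.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        problems.append(f"body is {len(body)} bytes, over the {MAX_BODY_BYTES}-byte cap")
    return problems
```

The server remains the authority; a local check like this only avoids an obvious 413 round trip.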

GET /users/{user_id}/memories returns canonical memory rows plus compatibility aliases for common legacy-style keys when they can be derived from canonical records: employer, lives_in, has_pet, diet, and allergic_to. Alias IDs are prefixed with alias: and alias supersession pointers mirror the canonical lifecycle.

/recall.max_tokens must be between 1 and 4096; the examples use 1024. Recall context packing uses approximate token counting and may include a small internal packing margin.
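
To make the budget behavior concrete, here is a rough sketch of packing context lines under max_tokens. The four-characters-per-token estimate, the margin parameter, and the skip-and-continue policy are assumptions for illustration; the service's actual counting and packing rules are not specified here.

```python
import math
from typing import Iterable, List

MIN_TOKENS, MAX_TOKENS = 1, 4096  # documented range for /recall max_tokens


def approx_tokens(text: str) -> int:
    """Rough estimate: about four characters per token (an assumption)."""
    return max(1, math.ceil(len(text) / 4))


def pack_context(lines: Iterable[str], max_tokens: int, margin: int = 0) -> List[str]:
    """Greedily keep lines, given in priority order, that fit the token budget."""
    if not MIN_TOKENS <= max_tokens <= MAX_TOKENS:
        raise ValueError("max_tokens must be between 1 and 4096")
    budget = max_tokens - margin
    packed, used = [], 0
    for line in lines:
        cost = approx_tokens(line)
        if used + cost > budget:
            continue  # skip what does not fit; a shorter line may still fit later
        packed.append(line)
        used += cost
    return packed
```

Because callers pass lines in priority order, high-priority facts claim the budget first and lower-priority context is dropped under pressure.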

Full contract details are in docs/API.md. An OpenAPI specification is not currently implemented.

Example Usage

Ingest A Turn

curl -sS -X POST http://127.0.0.1:8080/turns \
  -H "Content-Type: application/json" \
  -d '{"session_id":"session-1","user_id":"user-1","messages":[{"role":"user","content":"I work at Notion as a product manager"}],"timestamp":"2026-05-12T10:00:00Z","metadata":{"source":"manual"}}'

Response:

{
  "id": "generated-turn-id"
}

Recall User Memory

curl -sS -X POST http://127.0.0.1:8080/recall \
  -H "Content-Type: application/json" \
  -d '{"query":"Where does the user work now?","session_id":"session-1","user_id":"user-1","max_tokens":1024}'

Response shape:

{
  "context": "## Known facts about this user\n- Current employment: Works at Notion as a product manager.",
  "citations": [
    {
      "turn_id": "generated-turn-id",
      "score": 17.0,
      "snippet": "I work at Notion as a product manager"
    }
  ]
}

Search

curl -sS -X POST http://127.0.0.1:8080/search \
  -H "Content-Type: application/json" \
  -d '{"query":"Notion product manager","session_id":"session-1","user_id":"user-1","limit":10}'

Memory Model

The service separates raw episodic data from structured semantic state:

  • sessions, turns, and messages preserve conversation provenance.
  • memories stores structured facts, preferences, opinions, and events.
  • memory_evidence links memories to source messages and quotes.
  • memory_ops records add, supersede, and noop operations.
  • Mutable keys such as employment.current and location.current keep only the latest value active while preserving superseded rows.
  • Cleared allergies use allergy_cleared.<substance> instead of active allergy.<substance> so current recall does not imply the allergy is still present.
  • User memory listing may include derived compatibility aliases alongside canonical keys; canonical rows remain the storage source of truth.
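
The supersession behavior for mutable keys can be illustrated with a small in-memory model. This is only a sketch: the Memory and MemoryStore classes and the superseded_by field are hypothetical names, and the real service persists these rows in SQLite.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Memory:  # hypothetical row shape
    id: int
    key: str
    value: str
    active: bool = True
    superseded_by: Optional[int] = None


@dataclass
class MemoryStore:
    memories: List[Memory] = field(default_factory=list)
    ops: List[Tuple[str, int]] = field(default_factory=list)  # (operation, memory id)

    def active(self, key: str) -> Optional[Memory]:
        return next((m for m in self.memories if m.key == key and m.active), None)

    def write(self, key: str, value: str) -> Memory:
        current = self.active(key)
        if current is not None and current.value == value:
            self.ops.append(("noop", current.id))  # same value: nothing changes
            return current
        new = Memory(id=len(self.memories) + 1, key=key, value=value)
        self.memories.append(new)
        if current is None:
            self.ops.append(("add", new.id))
        else:
            # Keep the old row for history, but mark it inactive.
            current.active = False
            current.superseded_by = new.id
            self.ops.append(("supersede", new.id))
        return new
```

Writing Stripe and then Notion to employment.current leaves one active row and one preserved, superseded row, with add and supersede entries in the operation log.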

See docs/MEMORY_MODEL.md.

Extraction Pipeline

The deterministic extractor is always available and inspects user messages only. It recognizes:

  • Supported current facts.
  • Oblique employment, such as "my paycheck comes from Anthropic".
  • Conservative badge employment, such as "my Figma badge says design systems lead".
  • Employment paraphrases, such as "I work for Stripe", "I left Stripe for Notion", and "I'm now at Linear".
  • Legal employer-name corrections.
  • Role/title phrases, such as "my current role is..." or "promoted to...".
  • Location phrases, such as "I'm currently based in Raleigh", "I'm in Porto these days", "my current city is Lisbon", and "home base is Prague".
  • Replacement location negations, such as "I don't live in Lisbon anymore; I live in Porto now".
  • Explicit correction clauses inside a message.
  • Contextual cross-turn corrections, such as "I meant Berlin, not Munich".
  • Diet-state negations, cleared allergies, and household allergy facts.
  • Matcha/caffeine preference changes, hobbies, and family roles.
  • Implicit pet ownership from vet/leash/litter-box evidence, and multi-pet phrasing.
  • Episodic context such as React debugging, interview prep, relocation reasons, and marathon goals/results.
  • Non-Latin current locations such as 東京.
  • Scoped programming opinion arcs such as React/Svelte or TypeScript/Python preferences.
  • Answer-style requests for concise bullets/no fluff or terse practical replies.
  • Low-signal/numeric-only traps.

Draft/email, translation, summarization, grammar/example, sample-copy, JSON/test-fixture payload, and fictional screenplay/script/dialogue frames are treated as non-durable even when they contain first-person text.

Optional OpenAI extraction runs after deterministic extraction when configured, but candidates are validated against the same low-signal source-message guard, and incomplete current-employment candidates must include or recover an employer before they can become memories. OpenAI failures degrade to deterministic extraction.
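
The general shape of rule-based extraction with a low-signal guard might look like the sketch below. The regular expressions, the LOW_SIGNAL list, and the extract function are hypothetical and far narrower than the service's real rules; only the key names employment.current and location.current come from this README.

```python
import re
from typing import List, Tuple

# Hypothetical guard: frames that mark text as non-durable, plus digit-only input.
LOW_SIGNAL = re.compile(
    r"\b(translate|summarize|for example|draft an? (email|message)|screenplay)\b"
    r"|^\s*\d+\s*$",
    re.IGNORECASE,
)

# Hypothetical rules: (memory key, pattern with one capture group for the value).
RULES: List[Tuple[str, re.Pattern]] = [
    ("employment.current", re.compile(r"\bI(?: now)? work (?:at|for) ([A-Z][\w&-]*)")),
    ("employment.current", re.compile(r"\bI'm now at ([A-Z][\w&-]*)")),
    ("location.current", re.compile(r"\bI(?: now)? live in ([A-Z][\w-]*)")),
    ("location.current", re.compile(r"\bI'm (?:currently )?based in ([A-Z][\w-]*)")),
]


def extract(role: str, content: str) -> List[Tuple[str, str]]:
    """Return (key, value) candidates from one message."""
    if role != "user":  # only user messages are inspected
        return []
    if LOW_SIGNAL.search(content):  # meta-task or digit-only text is not durable
        return []
    found = []
    for key, pattern in RULES:
        match = pattern.search(content)
        if match:
            found.append((key, match.group(1)))
    return found
```

Because the rules require a first-person subject, a third-party sentence such as "My sister works at Google" yields no candidate in this sketch.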

Details and examples are in docs/EXTRACTION.md.

Recall And Search Pipeline

Recall uses a candidate-first structured retrieval pipeline. Candidates from the following sources are unioned before memory rows and evidence are batch-loaded for scoring:

  • Bounded memory FTS.
  • Evidence/message FTS mapped through memory_evidence.
  • sqlite-vec KNN, with scope/model/dimension/active metadata filters applied before top-k selection.
  • Direct key/entity SQL lookups.
  • Profile/history anchors.

Ranking prioritizes key/entity intent, active-state semantics, recency, same-session hints, one-hop associations, RRF-lite over independent lexical/evidence/vector/direct lists, and optional vector similarity. Tight profile summaries reserve tiny budgets for core stable facts such as current employment, role/title, and location before lower-priority pet/style/family facts. Query-key inference also covers household allergies, hobbies, caffeine/drink preference, family roles, relocation reasons, debugging, planning, and marathon context. Recall citations are deduplicated by source turn and snippet while keeping the highest score.

Search returns ranked memory results first, then falls back to raw message FTS when useful. Digit-only, meta-task/sample, and fixture/payload raw messages are treated as low-signal fallback material.
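
Rank-level fusion over independent candidate lists can be illustrated with standard reciprocal rank fusion. How RRF-lite differs from the textbook form is not described here, so treat this as a generic sketch; rrf_fuse and the constant k = 60 are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def rrf_fuse(ranked_lists: Iterable[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked id lists: each list contributes 1 / (k + rank) per id."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id for a deterministic order.
    return sorted(scores, key=lambda memory_id: (-scores[memory_id], memory_id))


lexical = ["m1", "m2", "m3"]
evidence = ["m2", "m1"]
vector = ["m2", "m4"]
fused = rrf_fuse([lexical, evidence, vector])
```

Only ranks are used, never raw scores, so lists from FTS and vector search can be combined without normalizing their score scales.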

See docs/RECALL_AND_SEARCH.md.
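
Citation deduplication by source turn and snippet, keeping the highest score, could be sketched as follows. The field names match the recall response shape in this README; dedupe_citations and the final sort by descending score are assumptions.

```python
from typing import Dict, List, Tuple


def dedupe_citations(citations: List[dict]) -> List[dict]:
    """Keep one citation per (turn_id, snippet), preferring the highest score."""
    best: Dict[Tuple[str, str], dict] = {}
    for citation in citations:
        key = (citation["turn_id"], citation["snippet"])
        if key not in best or citation["score"] > best[key]["score"]:
            best[key] = citation
    # Ordering by score is an assumption made for readability.
    return sorted(best.values(), key=lambda c: c["score"], reverse=True)
```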

OpenAI Integration

OpenAI integration is optional. Extraction uses Chat Completions with strict JSON Schema response format and sends only user-authored turn messages to the model. Memory embeddings are batched once per ingested turn, use text-embedding-3-small by default, and are persisted through sqlite-vec only for 1536-dimensional vectors. Vec rows carry the memory active flag and embedding model/dimension metadata used by vector retrieval filters. Structured memories and rendered memory text remain canonical; embedding rows are rebuildable derived indexes. OpenAI HTTP retries are limited to transient failures and use bounded backoff. Without an API key, OpenAI extraction and embeddings are disabled even if their enable flags are true.
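
Retrying only transient failures with bounded backoff can be sketched like this. TransientError, call_with_retries, and the delay values are hypothetical; the default of one retry mirrors the OPENAI_MAX_RETRIES default listed under Configuration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as timeouts or rate limits."""


def call_with_retries(call: Callable[[], T], max_retries: int = 1,
                      base_delay: float = 0.25, max_delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); retry transient failures up to max_retries times."""
    attempt = 0
    while True:
        try:
            return call()
        except TransientError:
            if attempt >= max_retries:
                raise  # retries exhausted; the caller decides how to degrade
            # Exponential backoff, capped at max_delay.
            sleep(min(max_delay, base_delay * (2 ** attempt)))
            attempt += 1
```

Any other exception type propagates immediately, so non-transient failures are never retried. The sleep parameter is injectable so tests can run without waiting.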

See docs/EXTRACTION.md and docs/OPERATIONS.md.

Persistence

SQLite is opened with foreign keys, WAL mode, normal synchronous mode, and a 5000 ms busy timeout. The repository uses parameterized queries. FTS rows are maintained manually by repository methods, with repository rebuild helpers for memory_fts, message_fts, or both if those derived rows become stale. sqlite-vec is registered on startup and used for optional vector retrieval with eligibility metadata stored in the vec0 table.
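
The connection settings described above correspond to standard SQLite PRAGMA statements. The sketch below applies them with Python's sqlite3 module purely to show the statements; open_database, pragma, and demo are hypothetical helpers, and the service itself configures SQLite from Rust.

```python
import os
import sqlite3
import tempfile


def open_database(path: str) -> sqlite3.Connection:
    """Open a SQLite file with the settings listed in this section."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("PRAGMA synchronous = NORMAL")
    conn.execute("PRAGMA busy_timeout = 5000")  # milliseconds
    return conn


def pragma(conn: sqlite3.Connection, name: str):
    # PRAGMA names cannot be bound as parameters; pass only trusted constants here.
    return conn.execute(f"PRAGMA {name}").fetchone()[0]


def demo() -> dict:
    """Open a throwaway database and read the settings back."""
    names = ("foreign_keys", "journal_mode", "synchronous", "busy_timeout")
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_database(os.path.join(tmp, "memory.sqlite3"))
        try:
            return {name: pragma(conn, name) for name in names}
        finally:
            conn.close()
```

WAL mode needs a file-backed database, which is why the demo uses a temporary directory instead of an in-memory connection.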

See docs/OPERATIONS.md.

Testing

CI runs:

cargo fmt -- --check
cargo test --locked -j 1 fixture_recall_self_eval -- --nocapture
cargo test --locked -j 1 weakness_recall_eval -- --nocapture
cargo test --locked -j 1
cargo clippy --locked -j 1 --all-targets --all-features -- -D warnings

Optional operational checks are also documented in docs/TESTING.md: the Bash Docker restart persistence script and the committed Python HTTP black-box evaluator.

Evaluation Fixtures

fixtures/eval_scenarios.json defines the golden recall self-evaluation. Current fixture thresholds require full fact recall, full noise rejection, full citation recall, and 23 passing scenarios.

fixtures/weakness_scenarios.json adds 21 adversarial scenarios with 36 recall probes and 5 search checks covering meta-task quoted facts, clause-boundary cleanup, non-Latin locations, contradiction chains, third-party pollution, tight budgets, citation provenance, and recall/search shape.

HTTP regression tests also cover oblique and badge employment, role/title extraction, household allergy scoping, matcha/caffeine preferences, hobbies, episodic context, compatibility aliases, replacement location negation, cross-turn contextual corrections, litter-box cat inference, citation deduplication, "scratch that" corrections, JSON/test-fixture pollution, pet-owner location recall, single-message content limits, and bare coordinated live/work clauses.

The fixture runners write reports under target/fixture-eval/ and target/weakness-eval/.

For a service-level check that does not import Rust test code, run scripts/blackbox_http_eval.py against a live server:

docker compose up -d --build
mkdir -p target/blackbox-http
python scripts/blackbox_http_eval.py --base-url http://127.0.0.1:8080 --json-output target/blackbox-http/latest.json

Security And Isolation

  • Optional bearer auth protects all memory endpoints when MEMORY_AUTH_TOKEN is set.
  • User and session scopes are enforced in repository queries.
  • user_id: null normalizes to anon:session:<session_id>.
  • Reusing a session_id for a different user returns 409 Conflict.
  • OpenAI API keys are read from environment variables and redacted in debug formatting.
  • The system filters common memory-pollution cases, but it is not a complete security boundary for hostile natural-language input.
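
The scope rules in the list above can be sketched as follows: a null user_id normalizes to an anonymous per-session identity, and a session already claimed by one user rejects another. SessionRegistry and SessionConflict are hypothetical names; in the service these checks happen in repository queries and surface as 409 Conflict.

```python
from typing import Dict, Optional


class SessionConflict(Exception):
    """Stands in for the HTTP 409 Conflict response."""


def normalize_user_id(user_id: Optional[str], session_id: str) -> str:
    """Map a missing user_id to the anonymous per-session identity."""
    return user_id if user_id is not None else f"anon:session:{session_id}"


class SessionRegistry:
    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def claim(self, session_id: str, user_id: Optional[str]) -> str:
        """Bind a session to its first normalized user and reject any other user."""
        owner = normalize_user_id(user_id, session_id)
        existing = self._owners.setdefault(session_id, owner)
        if existing != owner:
            raise SessionConflict(f"session {session_id} belongs to another user")
        return owner
```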

See docs/SECURITY.md.

Operational Notes

  • There are no background workers. Ingestion performs extraction, an optional batched memory-embedding call, SQLite writes, FTS updates, and vector writes in the request path.
  • Docker runs as a non-root memory user and declares /data as a volume.
  • Vector storage currently supports only 1536 dimensions at runtime. Dimension/model changes require an explicit new-table, backfill, dual-read/switch, and cleanup migration path; see docs/MEMORY_MODEL.md.

Failure Modes

| Scenario | Behavior |
| --- | --- |
| Cold session or no matching memory | /recall returns {"context":"","citations":[]}. /search returns {"results":[]} when nothing is relevant or the request has no effective user_id or session_id scope. |
| Missing OpenAI API key | Deterministic extraction still runs. OpenAI extraction and embeddings are disabled, so structured rule memories, FTS, direct recall, and lexical search still work. |
| OpenAI timeout, rate limit, refusal, or malformed response | Extraction failures are logged and fall back to deterministic extraction. Embedding failures are logged; the turn and structured memories can still persist without vector rows, so /turns can return 201 if local validation and the SQLite transaction succeed. |
| Unsupported embedding dimensions or incompatible vector store | Embeddings and vector candidates are skipped or refused. Structured memories plus lexical and direct retrieval still work; vector recall quality may degrade. |
| Malformed JSON, missing fields, invalid timestamp, empty messages, oversized body, or oversized message content | Invalid client input returns a 4xx response; oversized JSON and message content over 65,536 characters return 413 Payload Too Large. Unicode content is valid JSON input and is persisted like other message text. |
| Auth token absent or invalid | If MEMORY_AUTH_TOKEN is unset, memory endpoints are open. If it is set, missing or invalid bearer tokens return 401; GET /health remains public. |
| Cross-user session reuse | Reusing a session_id with a different normalized user_id returns 409 Conflict, preventing session data bleed. |
| SQLite busy, slow disk, or storage error | SQLite uses WAL mode, foreign keys, and a 5000 ms busy timeout. If storage still fails, clients receive a generic 500 internal error; disk corruption is not automatically repaired. |
| Derived indexes out of sync | FTS and vector rows are derived from canonical tables. Normal repository writes keep them in sync; FTS rebuild helpers can repair stale derived FTS rows. |
| Container restart | Data persists when the Docker named volume mounted at /data is kept. Deleting the volume deletes stored memory. |
| Synchronous ingestion | There are no background workers. After POST /turns returns 201, the turn, FTS rows, and extracted memories written by that transaction are immediately queryable. If the final SQLite transaction fails, the extracted memories are not guaranteed to be queryable. |
| Token budget pressure | /recall uses approximate token counting, and max_tokens must be 1..=4096. Tight profile summaries prioritize stable/current facts such as employment and location before lower-priority context. |

Limitations

  • Rule-based extraction is deterministic but incomplete for phrasing outside implemented patterns.
  • Correction and negation handling is deliberately phrase-sensitive; unsupported wording should be added as narrow regression-backed rules.
  • OpenAI live behavior is optional and not exercised by default tests; mocked tests cover integration behavior.
  • FTS consistency depends on repository methods because there are no FTS triggers; rebuild helpers provide the repair path.
  • Recall ranking is heuristic, not learned; candidate caps are explicit and conservative, and RRF-lite is rank-level fusion with an internal future-reranker boundary.
  • OpenAPI is not currently implemented.

Repository Layout

src/
  config.rs              Runtime configuration
  http/                  Axum routes, handlers, DTOs
  service/               Ingest, memory write, recall, search orchestration
  extraction/            Deterministic rule extractors
  openai.rs              Optional OpenAI extraction and embeddings
  storage/               SQLite DB, migrations, repository, models
tests/                   Unit and integration tests
fixtures/                Recall self-evaluation scenarios
scripts/                 Operational verification scripts
docs/                    Architecture, API, model, operations, testing, security docs

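Further Documentation

  • docs/ARCHITECTURE.md: detailed component model.
  • docs/API.md: full API contract details.
  • docs/MEMORY_MODEL.md: memory model and vector storage migration path.
  • docs/EXTRACTION.md: extraction details and examples, including OpenAI extraction.
  • docs/RECALL_AND_SEARCH.md: recall and search pipeline.
  • docs/OPERATIONS.md: persistence and operations.
  • docs/TESTING.md: test suite and optional operational checks.
  • docs/SECURITY.md: security and isolation.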

Contributing

See CONTRIBUTING.md for local setup, quality gates, fixture guidance, and pull request expectations.

Contact

For questions or review issues, contact Mukhammedali at mukhammedali@berektassuly.com.

License

Licensed under the MIT License.
