Project Recall is a contextual memory and re-engagement prototype for an AI emotional support companion. It ingests structured session summaries, extracts memory cards, retrieves relevant memories, judges their relevance, applies safety policies, and generates warm, memory-aware LLM responses.
⚠️ Prototype status: This is a technical assessment project, not a production clinical system.
The problem: returning users feel the AI does not remember prior sessions. Generic reminders are weak. The goal is to remember naturally, safely, and selectively — without stuffing all history into the prompt.
Project Recall solves this by:
- Ingesting structured session JSON (not raw transcripts)
- Extracting rich memory cards with emotion metadata
- Using vector DB retrieval to find candidate memories
- Applying an LLM relevance judge to confirm topical match
- Applying response policy rules per emotion and memory type
- Injecting only approved memories into a turn-local prompt
- Supporting direct canonical lookup for exact recall questions
- Generating re-engagement notifications based on unresolved topics
| Problem | Why It Happens | Our Solution |
|---|---|---|
| AI forgets past sessions | No persistent memory | Structured memory cards + vector DB |
| Generic "check-ins" | No emotional context | Emotion-aware retrieval + policy |
| Wrong memory injected | Semantic-only retrieval | LLM judge + topic-aware selection |
| Unsafe memory leaks | No safety controls | Sensitivity scoring + YAML policy |
| Prompt bloat | Full history stuffed | Turn-local injection of top-1 memory |
Session JSON
↓
Memory Extraction → Memory Cards (with emotion + safety metadata)
↓
Vector DB / Hybrid Retrieval → Top candidates
↓
LLM Memory Relevance Judge → Approved / Rejected
↓
Response Policy YAML → Detail level, tone, safety
↓
Turn-Local Prompt Injection
↓
LLM Response
Direct recall path:
"What was my grounding phrase?"
↓
Canonical lookup (exact match)
↓
Answer with exact value
Re-engagement path:
Unresolved memories + user state
↓
Rule engine
↓
Safe notification preview (vague, no exact values)
The system expects session history in this compact JSON format:
{
"user_id": "demo_user",
"session_id": "sess_001",
"timestamp": "2026-05-19T09:00:00Z",
"theme": "work stress and burnout",
"emotional_tone": ["anxious", "overwhelmed", "hopeful"],
"key_moments": [
"User reported panic before meetings",
"Discussed grounding exercise",
"User committed to sleep routine"
],
"summary": "User described increasing stress...",
"risk_flags": [],
"follow_up_topics": [
"sleep hygiene",
"manager conflict"
]
}Field mapping:
| Field | Becomes |
|---|---|
theme |
topic tags for retrieval |
emotional_tone |
emotion metadata (primary, secondary, intensity) |
key_moments |
memory cards (exact_value, summary, type) |
summary |
session_summary memory card |
follow_up_topics |
follow_up_intent memory cards |
risk_flags |
sensitivity + safety controls |
| Term | Definition |
|---|---|
| Memory card | A structured object with summary, emotion, metadata, and optional exact value |
| Session summary card | A memory card capturing the overall session theme |
| Key moment card | A memory card from a specific moment in the session |
| Follow-up card | A memory card tracking unresolved topics for re-engagement |
| Exact canonical memory | A memory the user explicitly asked to remember, with is_canonical=true |
| Vector DB | ChromaDB storing memory embeddings for semantic search |
| LLM relevance judge | An LLM that validates whether retrieved memories match the user's query |
| Response policy YAML | config/response_policy.yaml — controls behavior per emotion |
| Turn-local prompt injection | Approved memories injected near the current user message, not buried in system prompt |
| Re-engagement | Rule-based notification generation for inactive users |
# Create virtual environment
python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txtConfigure .env:
cp .env.example .env| Variable | Default | Purpose |
|---|---|---|
LLM_PROVIDER |
gemini |
ollama or gemini |
OLLAMA_HOST |
http://localhost:11434 |
Local Ollama server |
OLLAMA_MODEL |
llama3.1 |
Ollama model name |
GEMINI_API_KEY |
— | Gemini API key |
GEMINI_MODEL |
gemini-3.1-flash-lite-preview |
Gemini model name |
# 1. Generate sample sessions
python generate_project_recall_sessions.py
# 2. Validate schema
python app/validate_project_recall_sessions.py
# 3. Extract memories
python app/memory_extractor.py
# 4. Build vector index
python app/build_memory_index.py
# 5. Verify end-to-end
python app/validate_active_project_recall_pipeline.pyuvicorn app.main:app --reloadOpen http://localhost:8000 for the chat UI.
Main endpoints:
| Endpoint | Description |
|---|---|
POST /chat |
Main chat endpoint (with memory support) |
GET /new-session/{user_id}?memory=true |
Start new session |
GET /debug/memories/{user_id} |
List extracted memories |
GET /debug/retrieve-memory/{user_id}?q=... |
Test vector retrieval |
GET /debug/reengagement/{user_id} |
Preview re-engagement decision |
# Schema + ingestion
python app/project_recall_ingestion_test.py
# Pipeline validation
python app/validate_active_project_recall_pipeline.py
# Policy adherence (requires LLM)
python app/policy_adherence_test.py --provider ollama --model llama3.1
# Re-engagement
python app/reengagement_test.py
# Reliability guardrails
python app/reliability_guardrail_test.py- Generate sessions → Creates 11 varied sessions
- Extract memories → 66 structured memory cards
- Build vector DB → Embedded + indexed
- Exact recall: "What was my grounding phrase?" → "Quiet room, soft blanket, slow breath."
- Episode recall: "Do you remember that family dinner?" → Contextual summary
- Emotional query: "I've been feeling anxious" → Validates + offers related memory
- Edit YAML: Change
anxiety→ask_direct_question: true→ Responses become more direct - Re-engagement preview:
/debug/reengagement/demo_user→ Safe notification copy
- Prototype, not production — no auth, no clinical escalation
- LLM judge can fail — handled by confidence gate + no-memory fallback
- Extraction depends on summary quality — better
key_moments= better memories - No full cross-session graph reasoning — memories are isolated cards
- No temporal trend analysis — basic timestamp tracking only
- No user-facing memory edit/delete UI — memories are read-only
- No production database — local JSON + ChromaDB files
- LLM extraction with stronger evaluation metrics
- Hybrid dense + sparse retrieval (BM25 + vector + reranker)
- Memory edit/delete UI for users
- Temporal/session graph for cross-session reasoning
- Better policy adherence monitoring + A/B testing
- Clinical review workflow
- Analytics dashboard for memory quality
Detailed documentation is in ./docs/:
| Doc | Topic |
|---|---|
docs/01-system-overview.md |
Architecture and modules |
docs/02-setup-and-running.md |
Full setup guide |
docs/03-data-schema-and-memory-cards.md |
Schema and card types |
docs/04-memory-extraction.md |
Extraction pipeline |
docs/05-vector-db-and-retrieval.md |
Retrieval system |
docs/06-llm-relevance-judge.md |
Relevance judge |
docs/07-response-policy-yaml.md |
YAML policy reference |
docs/08-prompt-injection.md |
Turn-local injection |
docs/09-direct-memory-lookup.md |
Exact recall |
docs/10-reengagement-logic.md |
Notifications |
docs/11-testing-and-reports.md |
Test suite |
This is a technical assessment project. Not for production use.
This project was developed with assistance from Kimi-k2.6 (Moonshot AI), an AI coding assistant that helped with architecture design, implementation, testing, and documentation.