A comprehensive, production-ready guide to implementing memory systems in AI agents. From fundamental concepts to multi-tenant security, these notebooks provide everything you need to build intelligent, context-aware agents.
Memory is what transforms a simple chatbot into an intelligent assistant. This repository contains three progressive notebooks that take you from basic concepts to production-grade, secure memory implementations.
Perfect for:
- AI/ML Engineers building agentic systems
- Software Engineers implementing conversational AI
- Product teams designing multi-user AI applications
- Anyone interested in production-grade agent architecture
- Short-term vs long-term memory architecture
- Session state management
- Cross-session memory persistence
- Memory search and retrieval patterns
- Production deployment strategies
- Multi-tenant security and isolation
- GDPR/privacy compliance patterns
- Security testing and validation
1. Memory & Session Management
Foundation: Understanding Memory Fundamentals
Learn the core concepts of agent memory through practical, runnable examples.
Topics Covered:
- Session State (Short-Term Memory)
  - Storing temporary data within a conversation
  - Using `session.state` for working memory
  - Tools: `save_preference()`, `get_preference()`
- Memory Service (Long-Term Memory)
  - Persisting information across sessions
  - `InMemoryMemoryService` for development
  - Multi-session conversation examples
- Memory Tools
  - `load_memory` tool for agent-controlled recall
  - Automatic memory search and retrieval
  - When to use which memory type
- Conversation History
  - Automatic context maintenance
  - Multi-turn conversations
  - Context accumulation patterns
Real-World Examples:
- User preference storage and retrieval
- Multi-session project tracking
- Birthday party planning with context accumulation (8-year-old, unicorns, purple theme)
- Contact management system
Key Takeaway: Master the difference between session state and memory service, and learn when to use each.
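The distinction can be sketched without any framework. The classes below (`ToySession`, `ToyMemoryStore`) are hypothetical stand-ins, not ADK's session or memory service classes. They only illustrate the behaviour described above: session state disappears when a new session starts, while the memory store can still be searched.

```python
from dataclasses import dataclass, field


@dataclass
class ToySession:
    """Hypothetical stand-in for one conversation; state lives only here."""
    session_id: str
    state: dict = field(default_factory=dict)
    events: list = field(default_factory=list)


class ToyMemoryStore:
    """Hypothetical long-term store that outlives any single session."""

    def __init__(self):
        self._entries = []

    def add_session(self, session: ToySession) -> None:
        # Copy the conversation text so it survives after the session ends.
        self._entries.extend(session.events)

    def search(self, query: str) -> list:
        # Naive keyword match: keep entries sharing at least one word with the query.
        words = set(query.lower().split())
        return [e for e in self._entries if words & set(e.lower().split())]


memory = ToyMemoryStore()

# Session 1: working memory plus conversation events.
first = ToySession("s1")
first.state["user_preference"] = "dark_mode"
first.events.append("User is planning a birthday party with a purple theme")
memory.add_session(first)

# Session 2: a fresh session starts with empty state...
second = ToySession("s2")
state_carried_over = "user_preference" in second.state
# ...but long-term memory can still be searched.
recalled = memory.search("party theme")
```

Here `state_carried_over` is `False`, while `recalled` still contains the party note from the first session.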
Production: Testing, Validation & Deployment
Comprehensive testing patterns and production-ready memory implementations.
Topics Covered:
- Memory Service Comparison (InMemoryMemoryService vs Production Memory Bank)
  - Storage: Full text vs Extracted facts
  - Search: Keyword vs Semantic
  - Persistence: RAM vs Cloud Storage
  - Best for: Development vs Production
- Testing Patterns
  - Memory recall accuracy testing
  - Multi-session memory accumulation
  - Search quality validation
  - Memory consolidation testing
- Automatic Persistence
  - `after_agent_callback` for auto-save
  - Session-to-memory pipeline
  - Error handling and recovery
- Production Memory Bank
  - VertexAI Memory Bank setup (production-ready)
  - Semantic search with embeddings
  - Async memory generation
  - Memory consolidation and deduplication
Real-World Testing:
- Test 1: InMemory baseline (food preferences)
- Test 2: Memory recall with load_memory tool
- Test 3: Multi-session accumulation (work, hobbies, language learning)
- Test 4: Automatic persistence with callbacks
- Test 5: Memory search accuracy
Key Takeaway: Learn production deployment patterns and comprehensive testing strategies.
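A recall accuracy test can be as small as a list of queries paired with the fragment each one should surface. The sketch below is framework-free and uses hypothetical helpers (`keyword_search`, `recall_accuracy`), not the notebook's actual test code. Its third case shows the kind of query a keyword search misses and a semantic search would be expected to catch.

```python
def keyword_search(entries, query):
    """Toy keyword search: return entries sharing any word with the query."""
    words = set(query.lower().split())
    return [e for e in entries if words & set(e.lower().split())]


def recall_accuracy(entries, cases, search=keyword_search):
    """Fraction of (query, expected_fragment) cases whose fragment appears in a result."""
    hits = 0
    for query, expected in cases:
        results = search(entries, query)
        if any(expected.lower() in r.lower() for r in results):
            hits += 1
    return hits / len(cases)


stored = [
    "I love spicy Thai food",
    "I work as a data engineer",
    "I am learning Spanish on weekends",
]
cases = [
    ("favorite food", "Thai"),
    ("what language am I learning", "Spanish"),
    ("occupation", "data engineer"),  # no shared keyword, so keyword search misses it
]
accuracy = recall_accuracy(stored, cases)
```

With these inputs two of the three cases are recalled. Passing a different `search` function lets the same cases score another backend.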
Security: Multi-User, Multi-Tenant Protection
Critical security patterns for production multi-user applications.
Topics Covered:
- Understanding Memory Scope
  - Memory isolation = `app_name` + `user_id`. Memories are ONLY retrieved when BOTH match:
    - Same app + Same user = Access granted
    - Same app + Different user = Blocked
    - Different app + Same user = Blocked
    - Different app + Different user = Blocked
- User-Level Isolation
  - Preventing cross-user data leakage
  - Healthcare example: Alice vs Bob allergies
  - Search isolation validation
- App-Level Isolation
  - Multi-tenant SaaS security
  - Banking vs Health app separation
  - Same user, different apps = complete isolation
- Multi-Tenant Security Matrix
  - Company A vs Company B isolation
  - User 1 vs User 2 within each company
  - 4-way isolation testing
- Security Attack Simulation
  - Password/credit card search attempts (blocked)
  - Wildcard search attacks (blocked)
  - User impersonation attempts (blocked)
  - SQL injection-style queries (blocked)
Real-World Security Tests:
- Test 1: User-level isolation (healthcare data)
- Test 2: App-level isolation (banking vs health)
- Test 3: Multi-tenant matrix (Company A vs B)
- Test 4: Security attack simulation
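The scoping rule (isolation = `app_name` + `user_id`) can be illustrated with a plain dictionary keyed by that pair. `ScopedMemoryStore` below is a hypothetical toy, not the ADK memory service. It only shows why a search from one scope cannot reach another scope's data, whatever the query says.

```python
from collections import defaultdict


class ScopedMemoryStore:
    """Hypothetical store that keys every memory by (app_name, user_id)."""

    def __init__(self):
        self._by_scope = defaultdict(list)

    def add(self, app_name: str, user_id: str, text: str) -> None:
        self._by_scope[(app_name, user_id)].append(text)

    def search(self, app_name: str, user_id: str, query: str) -> list:
        # Only the caller's own scope is ever scanned, whatever the query says.
        scope = self._by_scope.get((app_name, user_id), [])
        needle = query.lower()
        return [t for t in scope if needle in t.lower()]


store = ScopedMemoryStore()
store.add("health_app", "alice", "Allergic to penicillin")
store.add("health_app", "bob", "Allergic to peanuts")
store.add("banking_app", "alice", "Prefers low-risk investments")

# The four combinations from the isolation matrix.
same_app_same_user = store.search("health_app", "alice", "allergic")
same_app_other_user = store.search("health_app", "bob", "penicillin")
other_app_same_user = store.search("banking_app", "alice", "allergic")
other_app_other_user = store.search("banking_app", "bob", "allergic")
```

Only `same_app_same_user` returns Alice's allergy note. The other three searches come back empty because their scope never contained it.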
Production Security Best Practices:
```python
# CORRECT: Use authenticated user from session
current_user_id = get_authenticated_user_id()  # From auth middleware
results = await memory_service.search_memory(
    app_name=APP_NAME,
    user_id=current_user_id,  # Never trust user input!
    query=user_query
)

# WRONG: Never use user-provided IDs
user_id = request.params.get('user_id')  # Can be manipulated!
results = await memory_service.search_memory(
    app_name=APP_NAME,
    user_id=user_id,  # Security vulnerability!
    query=user_query
)
```

Key Takeaway: Memory isolation is automatic when you always use authenticated user_id from server-side sessions.
```bash
pip install google-adk google-cloud-aiplatform
```

- Start with Notebook 1 to understand fundamentals
- Progress to Notebook 2 for production patterns
- Complete with Notebook 3 for security mastery
Each notebook is standalone and fully runnable with example code.
```python
import os
os.environ["GOOGLE_CLOUD_PROJECT"] = "your-project-id"
os.environ["GOOGLE_CLOUD_LOCATION"] = "us-central1"
os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "TRUE"
```

```python
# Use for: Temporary conversation context
session.state['user_preference'] = "dark_mode"
```

```python
# Use for: Cross-session persistence
await memory_service.add_session_to_memory(session)
```

```python
# Use for: Automatic memory persistence
async def auto_save_callback(callback_context):
    session = callback_context._invocation_context.session
    await memory_service.add_session_to_memory(session)

agent = Agent(after_agent_callback=auto_save_callback)
```

```python
# Use for: Agent-controlled memory search
agent = Agent(
    tools=[load_memory],
    instruction="Use load_memory to recall past information"
)
```

- Remember customer preferences across sessions
- Recall past issues and solutions
- Provide personalized support experiences
- Store patient preferences securely (HIPAA-compliant isolation)
- Maintain medical history context
- User-level isolation prevents data leakage
- Track investment preferences
- Remember risk tolerance
- Maintain transaction context
- Remember shopping preferences
- Track past purchases
- Personalize product recommendations
- Complete tenant isolation
- Secure multi-user environments
- GDPR/privacy compliance
- Use InMemoryMemoryService for testing
- Implement session state for temporary data
- Add memory service for persistence
- Add load_memory tool to agents
- Test memory recall accuracy
- Switch to persistent memory storage
- Implement auto-save callbacks
- Configure memory consolidation
- Set up monitoring and logging
- Test memory search performance
- Always use authenticated user_id from server-side sessions
- Implement app-level isolation for multi-tenant systems
- Test against common attacks (covered in Notebook 3)
- Audit memory access patterns
- Ensure GDPR/privacy compliance
- Monitor memory usage and growth
- Track search performance metrics
- Handle memory search failures gracefully
- Implement memory cleanup/TTL policies
- Log memory access for security audits
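Two of the items above, TTL cleanup and graceful handling of search failures, can be sketched in plain Python. `ExpiringMemory` and `safe_search` are hypothetical names for illustration. A managed memory service may offer its own expiry settings, which this sketch does not model.

```python
import time


class ExpiringMemory:
    """Hypothetical memory list where each entry carries an expiry timestamp."""

    def __init__(self, ttl_seconds: float, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries = []   # list of (expires_at, text)

    def add(self, text: str) -> None:
        self._entries.append((self._clock() + self._ttl, text))

    def purge_expired(self) -> int:
        """Drop entries past their TTL and return how many were removed."""
        now = self._clock()
        before = len(self._entries)
        self._entries = [(exp, t) for exp, t in self._entries if exp > now]
        return before - len(self._entries)

    def texts(self) -> list:
        return [t for _, t in self._entries]


def safe_search(search_fn, query, fallback=()):
    """Return search results, or a fallback if the backend raises."""
    try:
        return list(search_fn(query))
    except Exception:
        # In a real service this is where the failure would be logged.
        return list(fallback)


fake_now = [1000.0]
mem = ExpiringMemory(ttl_seconds=60, clock=lambda: fake_now[0])
mem.add("old note")
fake_now[0] += 61          # move past the first entry's TTL
mem.add("fresh note")
removed = mem.purge_expired()


def broken_backend(query):
    raise TimeoutError("memory backend unavailable")


degraded = safe_search(broken_backend, "anything")
```

After the purge only the fresh note remains, and the failing backend yields an empty result instead of an exception, so the agent can continue without recalled context.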
- Never trust client-provided user IDs
  - Always use server-side authenticated sessions
  - Validate user identity before memory operations
- Use app-level isolation for multi-tenant systems
  - Separate `app_name` for each tenant
  - Prevents cross-tenant data leakage
- Test security thoroughly
  - Run all tests from Notebook 3
  - Simulate attack scenarios
  - Validate isolation boundaries
- Comply with privacy regulations
  - GDPR: Right to be forgotten (memory deletion)
  - HIPAA: Secure health data storage
  - PCI DSS: Separate payment data contexts
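As a rough illustration of the "right to be forgotten" item, the sketch below deletes everything stored for one `(app_name, user_id)` scope and records the erasure for later audit. `ErasableMemoryStore` and its `forget` method are hypothetical. A production memory backend would expose its own deletion mechanism, and erasure is only one part of compliance.

```python
class ErasableMemoryStore:
    """Hypothetical store supporting per-user erasure within one app."""

    def __init__(self):
        self._by_scope = {}
        self.audit_log = []  # entries are (action, app_name, user_id, count)

    def add(self, app_name, user_id, text):
        self._by_scope.setdefault((app_name, user_id), []).append(text)

    def count(self, app_name, user_id):
        return len(self._by_scope.get((app_name, user_id), []))

    def forget(self, app_name, user_id):
        """Delete all memories in one scope and record the erasure."""
        removed = self._by_scope.pop((app_name, user_id), [])
        self.audit_log.append(("forget", app_name, user_id, len(removed)))
        return len(removed)


store = ErasableMemoryStore()
store.add("health_app", "alice", "Allergic to penicillin")
store.add("health_app", "alice", "Prefers morning appointments")
store.add("health_app", "bob", "Allergic to peanuts")

deleted = store.forget("health_app", "alice")
```

Alice's two memories are removed while Bob's entry is untouched, and the audit log keeps a count of what was erased without keeping the content.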
Production memory systems automatically:
- Extract key facts from conversations
- Deduplicate redundant information
- Consolidate related memories
- Update outdated information
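A toy version of deduplication and updating, assuming facts have already been extracted as `(key, value)` pairs. Real systems typically use a model for extraction and consolidation, so `normalize` and `consolidate` below are illustrative helpers only.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def consolidate(facts):
    """Merge (key, value) facts: later values overwrite earlier ones per key,
    and duplicates collapse because each normalized key is stored once."""
    merged = {}
    for key, value in facts:
        merged[normalize(key)] = value.strip()
    return merged


raw_facts = [
    ("favorite color", "blue"),
    ("Favorite  Color", "blue"),   # duplicate with different casing and spacing
    ("city", "Paris"),
    ("city", "Berlin"),            # newer information replaces the old value
]
profile = consolidate(raw_facts)
```

The four raw facts reduce to two entries: the duplicate colour collapses and the city reflects the most recent value.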
Production systems use:
- Embedding-based similarity search
- Context-aware retrieval
- Relevance ranking
- Multi-query support
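Embedding-based retrieval comes down to ranking stored memories by vector similarity to the query. The sketch below uses hand-made three-dimensional vectors in place of real embeddings, which is an assumption made for readability. In practice the vectors would come from an embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, memories, top_k=2):
    """Return the top_k memory texts ordered by similarity to the query."""
    scored = [(cosine(query_vec, vec), text) for text, vec in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Hand-made vectors stand in for embeddings (axes: food, work, travel).
memories = [
    ("Loves spicy Thai food", (0.9, 0.0, 0.1)),
    ("Works as a data engineer", (0.0, 1.0, 0.0)),
    ("Planning a trip to Bangkok", (0.3, 0.0, 0.9)),
]
query = (0.8, 0.0, 0.3)  # roughly "where should I eat on holiday?"
top = rank(query, memories)
```

The food memory ranks first and the travel memory second, even though the query shares no keywords with either. The work memory scores zero and is dropped.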
- Creation: Auto-save with callbacks
- Storage: Persistent cloud storage
- Retrieval: Semantic search with load_memory
- Consolidation: Automatic deduplication
- Expiration: TTL policies and cleanup
| Metric | InMemory | Production |
|---|---|---|
| Latency | < 10ms | 50-200ms |
| Storage | RAM-limited | Cloud-scale |
| Search Quality | Keyword | Semantic |
| Persistence | None | Permanent |
| Cost | Free | Pay-per-use |
- Use InMemory for development/testing
- Implement caching for frequent queries
- Batch memory saves when possible
- Monitor search performance
- Set appropriate TTL policies
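Caching and batching can be sketched with the standard library alone. `slow_memory_search` is a hypothetical stand-in for a remote search call and `SaveBatcher` is an illustrative helper. Neither is part of ADK.

```python
from functools import lru_cache

backend_calls = 0


def slow_memory_search(user_id: str, query: str) -> tuple:
    """Hypothetical stand-in for a remote search call."""
    global backend_calls
    backend_calls += 1
    return (f"result for {user_id}: {query}",)


@lru_cache(maxsize=256)
def cached_search(user_id: str, query: str) -> tuple:
    # user_id is part of the cache key so one user's results never serve another.
    return slow_memory_search(user_id, query)


class SaveBatcher:
    """Collects session IDs and flushes them to the store in groups."""

    def __init__(self, flush_fn, batch_size=3):
        self._flush_fn = flush_fn
        self._batch_size = batch_size
        self._pending = []

    def add(self, session_id):
        self._pending.append(session_id)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        if self._pending:
            self._flush_fn(list(self._pending))
            self._pending.clear()


cached_search("alice", "food")
cached_search("alice", "food")     # served from cache
cached_search("bob", "food")       # different user, separate cache entry

flushed = []
batcher = SaveBatcher(flushed.append, batch_size=3)
for sid in ["s1", "s2", "s3", "s4"]:
    batcher.add(sid)
batcher.flush()                    # flush the remainder
```

Three searches trigger only two backend calls, and four saves reach the store as two batches. A cache like this has to be cleared (for example with `cached_search.cache_clear()`) whenever a user's memories change, otherwise it can serve stale results.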
Found an issue or have a suggestion?
- Open an issue for bugs or questions
- Submit a PR for improvements
- Share your use cases and learnings
This project is provided as educational material. Feel free to use, modify, and distribute with attribution.
Built with:
- Google ADK (Agent Development Kit)
- Vertex AI
- Gemini 2.5 Flash
- Author: Saurabh
- Email: saurabh.m14@gmail.com
- GitHub: @analyticsrepo01
If you found this helpful, please star the repository and share it with others building agentic systems!
Happy Building!
Memory is what makes agents truly intelligent. Use it wisely, secure it properly, and build amazing experiences.