The Antara AI is a sophisticated conversational AI system built on the LangGraph framework, featuring persistent memory capabilities through MongoDB and a modular, extensible architecture. The agent implements four distinct memory types (Episodic, Semantic, Procedural, and Associative) using the LangMem library, enabling it to learn from interactions, remember user preferences, and apply procedural knowledge across conversations.
- Stateless LLM with Persistent Memory: The underlying language model is stateless, but the agent maintains persistent memories across conversations
- Modular Design: Clean separation of concerns with distinct layers for service, core logic, tools, and configuration
- Memory-Driven Intelligence: Leverages multiple memory types to provide contextual, personalized responses
- Tool-Based Extensibility: Easily extensible through a standardized tool interface
- Framework: LangGraph for state management and execution flow
- Language Models: Support for both Groq (cloud) and Ollama (local) providers
- Memory Storage: MongoDB with vector indexing for semantic search
- Embeddings: HuggingFace embeddings (all-MiniLM-L6-v2, 384 dimensions)
- Tools: Memory management tools and internet search capabilities
┌─────────────────────────────────────────────────────┐
│ User Interface Layer │
│ (CLI, Streamlit, Web Interface) │
├─────────────────────────────────────────────────────┤
│ Service Layer │
│ (LTMService) │
├─────────────────────────────────────────────────────┤
│ Core Layer │
│ Agent Logic │ Graph Builder │ State Manager │
├─────────────────────────────────────────────────────┤
│ Memory Layer │
│ Memory Manager │ MongoDB Store │ Vector Index │
├─────────────────────────────────────────────────────┤
│ Tools Layer │
│ Memory Tools │ Internet Search │
├─────────────────────────────────────────────────────┤
│ Configuration Layer │
│ App Config │ Prompt Templates │ Environment │
└─────────────────────────────────────────────────────┘
Purpose: Provides a clean, high-level interface between UI components and core functionality.
Key Responsibilities:
- Model initialization and configuration
- User and thread management
- Message processing orchestration
- Error handling and logging
Key Methods:
get_model_info(): Returns current model configurationcreate_user_id()/create_thread_id(): Generate unique identifiersprocess_message(): Main entry point for message processing
class State(MessagesState):
recall_memories: List[str]- Extends LangGraph's MessagesState
- Maintains conversation messages and recalled memories
- Thread-safe and serializable for checkpointing
Core Functions:
agent(): Main processing function that combines messages with recalled memoriesload_memories(): Placeholder for automatic memory loading (currently delegated to tools)route_tools(): Determines whether to execute tools or end conversation
Creates the execution graph with the following flow:
START → load_memories → agent → [route_tools] → [tools → agent] → END
Features:
- MongoDB checkpointer for persistent state
- Conditional edges for tool execution
- Stream-based output for real-time responses
Purpose: Captures specific learning experiences and interaction patterns
class Episode(BaseModel):
observation: str # Context and situation
thoughts: str # Agent's reasoning process
action: str # What was done and how
result: str # Outcome and analysisUse Cases:
- Learning from successful interactions
- Recording problem-solving approaches
- Building experience base for future reference
Purpose: Stores factual information as structured relationships
class Triple(BaseModel):
subject: str # Entity being described
predicate: str # Relationship or property
object: str # Target of relationship
context: str # Additional clarificationUse Cases:
- User preferences and facts
- Relationships between concepts
- Structured knowledge representation
Purpose: Stores instructions, rules, and repeatable procedures
class Procedural(BaseModel):
task: str # Task or process name
steps: List[str] # Step-by-step instructions
conditions: str # When to apply procedure
outcome: str # Expected resultsUse Cases:
- How-to knowledge
- Standard operating procedures
- Rule-based decision making
Purpose: Flexible memory type for associations and mixed content
Use Cases:
- Cross-domain associations
- Contextual relationships
- Mixed-type memory storage
┌─────────────────────────────────────────────────────┐
│ MongoDB Database │
│ (ltm_agent) │
├─────────────────────────────────────────────────────┤
│ Collections Structure │
│ │
│ memories/ │
│ ├── {user_id}/ │
│ │ ├── episodes/ (Episodic memories) │
│ │ ├── triples/ (Semantic memories) │
│ │ ├── procedures/ (Procedural memories) │
│ │ └── general/ (Associative memories) │
│ └── vector_indexes/ (Vector search indexes)│
└─────────────────────────────────────────────────────┘
Key Features:
- User-scoped memory isolation
- Vector indexing for semantic search
- HuggingFace embeddings (384 dimensions)
- Automated index creation and management
Each memory type has dedicated management and search tools:
manage_episodic_memory_tool: CRUD operations for episodesmanage_semantic_memory_tool: CRUD operations for triplesmanage_procedural_memory_tool: CRUD operations for proceduresmanage_general_memory_tool: CRUD operations for general memories
search_episodic_memory_tool: Semantic search through experiencessearch_semantic_memory_tool: Search facts and relationshipssearch_procedural_memory_tool: Search procedures and rulessearch_general_memory_tool: General associative search
- Tool:
SearxSearchResults - Purpose: Web search capabilities for real-time information retrieval
- Configuration: Configurable Searx host for privacy-focused search
- Integration: Seamlessly integrated with agent decision-making process
All tools follow a standardized integration pattern:
# Tool Registration
all_tools = [
search_internet_tool,
*memory_tools, # 8 memory management tools
]
# Model Binding
model_with_tools = model.bind_tools(all_tools)CONFIG = {
# Model Configuration
"model_provider": "groq|ollama",
"model_name": "meta-llama/llama-4-maverick-17b-128e-instruct",
# Services
"searx_host": "http://127.0.0.1:8080",
"ollama_host": "http://localhost:11434",
# Memory Configuration
"mongodb_uri": "mongodb://localhost:27017",
"mongodb_db": "agent-memory",
"vector_k_results": 3,
# Embedding Configuration
"memory_index": {
"dims": "384",
"embed": "hf:sentence-transformers/all-MiniLM-L6-v2"
}
}- Sensitive Data: Database URIs and service endpoints stored in environment variables
- Runtime Configuration: Development vs. production settings
- Security: No hardcoded credentials in source code
- Input Reception: User message received through UI layer
- Service Orchestration: LTMService coordinates processing
- State Initialization: Current state loaded from MongoDB checkpoint
- Memory Loading: Relevant memories retrieved (currently tool-driven)
- Agent Processing: LLM processes messages with memory context
- Tool Execution: If needed, memory tools or internet search are executed for additional functionality
- Response Generation: Final response generated and returned
- State Persistence: Updated state saved to MongoDB checkpoint
- Tool Invocation: Agent calls appropriate memory management tool
- Schema Validation: Input validated against memory type schema
- Embedding Generation: Content embedded using HuggingFace model
- Database Storage: Memory stored in MongoDB with vector index
- Confirmation: Success/failure status returned to agent
- Search Query: Agent calls appropriate search tool
- Vector Search: Query embedded and compared against stored vectors
- Similarity Ranking: Results ranked by semantic similarity
- Context Assembly: Retrieved memories formatted for agent context
- Response Integration: Memories integrated into agent response
The agent's behavior is guided by a comprehensive system prompt that includes:
- Identity and Capabilities: Clear definition of the agent's role and memory capabilities
- Memory Guidelines: Specific instructions for using different memory types
- Tool Usage Instructions: Guidance on when and how to use memory tools
- Personality and Tone: Jarvis-inspired communication style
- Interaction Patterns: Natural conversation flow with seamless memory integration
- Proactive Memory Management: Agent actively stores important information
- Context-Aware Retrieval: Search memories before responding
- Personalization: Use memories to tailor responses to user preferences
- Learning from Experience: Build knowledge base through episodic memory
- Adaptive Behavior: Recognize and adapt to changing user needs
- Features:
- User and thread management
- Real-time conversation
- Model configuration display
- Stream-based output
- Features:
- Web-based interaction
- Visual conversation history
- Configuration management
- Multi-user support
The service layer provides a clean interface for adding new UI implementations:
- Desktop applications
- Mobile interfaces
- API endpoints
- Chatbot integrations
- User Scoping: All memories are scoped to individual users
- Thread Isolation: Conversations are isolated by thread ID
- Access Control: No cross-user data access
- Environment Variables: Sensitive data stored securely
- Service Authentication: Secure authentication for search services
- Input Validation: Schema-based validation for all memory operations
- Error Handling: Graceful error handling without information leakage
- Vector Indexing: Efficient semantic search through MongoDB vector indexes
- Streaming Responses: Real-time output through LangGraph streaming
- Connection Pooling: MongoDB connection optimization
- Embedding Caching: Efficient embedding generation and storage
- Horizontal Scaling: MongoDB supports clustering and sharding
- Model Flexibility: Support for both cloud (Groq) and local (Ollama) models
- Memory Management: Configurable memory retention and cleanup
- Resource Monitoring: Built-in system resource monitoring
- Local MongoDB: Development with local MongoDB instance
- Environment Configuration:
.envfile for development settings - Model Options: Support for local and cloud model providers
- Core Tools: Memory management and internet search capabilities
- MongoDB Atlas: Cloud-based MongoDB for production
- Environment Management: Production-specific configuration
- Service Configuration: Secure endpoint and service management
- Monitoring and Logging: Comprehensive logging and error tracking
- Advanced Memory Analytics: Memory usage and effectiveness metrics
- Multi-Modal Memory: Support for image and audio memories
- Memory Sharing: Controlled memory sharing between users
- Enhanced Search Capabilities: More sophisticated information retrieval
- Microservices: Potential evolution to microservices architecture
- Event-Driven Processing: Asynchronous memory processing
- Advanced AI Integration: Multi-agent collaboration capabilities
- Enhanced Security: Advanced authentication and authorization
The AI Agent represents a sophisticated approach to building conversational AI with persistent memory capabilities. Its modular architecture, comprehensive memory system, and extensible tool framework provide a robust foundation for building intelligent, context-aware applications that can learn and adapt over time.
The combination of multiple memory types, vector-based semantic search, and a clean service-oriented architecture makes this system both powerful and maintainable, suitable for a wide range of applications from personal assistants to specialized domain experts.