AutoBot is a sophisticated ReAct (Reason + Act) agent system powered by local LLMs, designed to provide intelligent, context-aware responses through semantic search and tool integration. It combines reasoning, action execution, and memory management into a cohesive autonomous agent framework.
- π§ ReAct Agent Architecture: Implements Reason + Act pattern with iterative reasoning loops
- π Semantic Search: FAISS-based vector similarity search with ChromaDB support
- πΎ 3-Tier Memory System: Short-term, working, and long-term memory with episodic storage
- π οΈ Tool Integration: Web search pipeline with extensible tool registry
- π Document Processing: Multi-format ingestion (PDF, DOCX, HTML, CSV, Code files)
- π€ Local LLM Support: LFM2.5-1.2B-Instruct with GGUF quantization
- β‘ Parallel Processing: Multi-threaded document ingestion and processing
- π― Quality Filtering: Intelligent content scoring and deduplication
- Architecture Overview
- Installation
- Quick Start
- Usage Examples
- Configuration
- Project Structure
- Components
- API Reference
- Performance
- Contributing
- License
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β USER INPUT (Natural Language) β
ββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββΌβββββββββββββββββ
β ReAct ORCHESTRATOR β
β (Reason + Act Pattern) β
βββββββββββββ¬βββββββββββββββββ
β
βββββββββββββββββΌββββββββββββββββββ
β LLM INTERFACE β
β (LFM2.5-1.2B-Instruct) β
β - GGUF Support β
β - Chat Templates β
βββββββββββββββββ¬ββββββββββββββββββ
β
βββββββββββββββββΌββββββββββββββββββ
β TOOL REGISTRY β
β - Web Search β
β - Extensible Framework β
βββββββββββββββββ¬ββββββββββββββββββ
β
βββββββββββββββββΌββββββββββββββββββ
β MEMORY MANAGER β
β - Short-term (Session) β
β - Working (Shared State) β
β - Long-term (Persistent) β
β - Vector Store (FAISS/Chroma) β
βββββββββββββββββββββββββββββββββββ
- User Input β ReAct Orchestrator
- First Pass: LLM analyzes query and decides if tools are needed
- Tool Execution: If needed, executes web search or other tools
- Second Pass: LLM generates final answer grounded in tool results
- Memory Storage: Conversation stored in multi-tier memory system
- Python 3.9 or higher
- 8GB+ RAM (recommended for local LLM)
- CUDA-compatible GPU (optional, for faster inference)
git clone https://github.com/gajjalaashok75-UI/gakrai.git
cd gakrai# Create virtual environment
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activatepip install -r requirements.txtThe model will be downloaded automatically on first run, or you can pre-download:
# The LFM2.5-1.2B-Instruct model will be downloaded to ./models/
# Approximately 1.2GB download# Configuration is automatically loaded from config/settings.yaml
# Customize settings if needed (see Configuration section)import asyncio
from core.react_orchestrator import ReActOrchestrator
import yaml
# Load configuration
with open('config/settings.yaml', 'r') as f:
config = yaml.safe_load(f)
# Initialize orchestrator
orchestrator = ReActOrchestrator(config)
async def main():
# Initialize all components
await orchestrator.initialize()
# Ask a question
response = await orchestrator.handle_input(
"What are the latest developments in artificial intelligence?"
)
print("Response:", response.response)
print("Steps taken:", response.total_steps)
print("Execution time:", f"{response.execution_time:.2f}s")
# Run the example
asyncio.run(main())# Run the interactive command-line interface
python main.py
# Example interaction:
> What is machine learning?
[AutoBot analyzes the question, searches for current information, and provides a comprehensive answer]
> history
[Shows conversation history]
> clear
[Clears session memory]# Run with ctransformers demo
python main.py --ctransformers-demo "Explain quantum computing"
# Custom configuration
python main.py --config custom_config.yaml# AutoBot automatically determines when to search the web
response = await orchestrator.handle_input(
"What are the current Python 3.12 features?"
)
# AutoBot will:
# 1. Recognize this needs current information
# 2. Execute web search
# 3. Analyze results
# 4. Provide comprehensive answer with sourcesfrom memory.ingestion_pipeline import AdvancedIngestionPipeline
# Ingest documents into vector store
pipeline = AdvancedIngestionPipeline(
store_path="./memory/vector_store",
min_quality_score=0.4
)
# Process documents
stats = pipeline.ingest(
input_dirs=["./documents/pdfs", "./documents/code"],
max_workers=4
)
print(f"Indexed {stats['chunks_indexed']} chunks from {stats['new_files']} files")
# Now AutoBot can answer questions about your documents
response = await orchestrator.handle_input(
"Based on my documents, explain the main concepts"
)# AutoBot maintains conversation context
await orchestrator.handle_input("What is Python?")
await orchestrator.handle_input("What are its main advantages?") # Refers to Python
await orchestrator.handle_input("Show me some code examples") # Still about Python
# Access conversation history
history = await orchestrator.memory.get_recent_interactions(limit=10)
for interaction in history:
print(f"Q: {interaction['user_input']}")
print(f"A: {interaction['response'][:100]}...")# AutoBot can be extended with custom tools
from tools.tool_registry import ToolRegistry
# The tool registry is extensible - add your own tools
# Tools are automatically detected and used by the ReAct agentassistant:
name: "AutoBot"
version: "0.2.0"
llm:
intent_model:
name: "LFM2.5-1.2B-Instruct-Q5_K_M"
local_path: "./models/LFM2.5-1.2B-Instruct-Q5_K_M.gguf"
max_tokens: 2048
temperature: 0.3
memory:
short_term_limit: 100
long_term_db: "./memory/long_term.db"
short_term_db: "./memory/short_term.db"
vector_store: "./memory/vector_store"
tools:
enabled:
- "web_search"
agentic:
react_max_steps: 8
max_context_length: 32768
max_tokens_hard_limit: 4096
debug:
enabled: true# Optional environment variables
export RAG_VECTOR_STORE_PATH="./memory/vector_store"
export RAG_EMBEDDING_MODEL="BAAI/bge-large-en-v1.5"
export RAG_MIN_QUALITY="0.4"
export RAG_WORKERS="4"# Adjust for your hardware
performance:
cache_ttl: 300
batch_size: 5
# Memory settings
memory:
short_term_limit: 50 # Reduce for lower memory usage
# LLM settings
llm:
intent_model:
max_tokens: 1024 # Reduce for faster responses
temperature: 0.1 # Lower for more deterministic responsesautobot/
βββ π main.py # Entry point & CLI interface
βββ π core/ # Core agent logic
β βββ π§ react_orchestrator.py # ReAct agent implementation
β βββ π€ llm_interface.py # Local LLM management
β βββ π __init__.py
βββ π memory/ # Memory & knowledge management
β βββ πΎ memory_manager.py # 3-tier memory system
β βββ π ingestion_pipeline.py # Document processing
β βββ π rag_pipeline.py # Retrieval-augmented generation
β βββ π INGESTION_PIPELINE.md # Detailed ingestion docs
β βββ π RAG_PIPELINE.md # Detailed RAG docs
β βββ ποΈ long_term.db # Persistent memory
β βββ ποΈ short_term.db # Session memory
β βββ π vector_store/ # FAISS/ChromaDB index
βββ π tools/ # Tool integrations
β βββ π§ tool_registry.py # Tool management
β βββ π tool_detector.py # Parse tool calls from LLM
β βββ π web_search/ # Web search implementation
β βββ π search.py # Main search pipeline
β βββ β‘ quick_scrape.py # Search execution
β βββ π§Ή main_content_cleaner.py # Content extraction
βββ π models/ # LLM model management
β βββ π₯ load-autobot-instruct.py # Model loading utilities
β βββ βοΈ generate-autobot-instruct.py # Generation logic
β βββ π€ LFM2.5-1.2B-Instruct-Q5_K_M.gguf # Model weights (downloaded)
βββ π config/ # Configuration
β βββ βοΈ settings.yaml # Main configuration file
βββ π logs/ # Application logs
β βββ π autobot.log # Execution logs
βββ π requirements.txt # Python dependencies
- Purpose: Implements the Reason + Act pattern for intelligent decision making
- Features: Multi-step reasoning, tool integration, conversation management
- Models: Uses LFM2.5-1.2B-Instruct for reasoning and action decisions
- Purpose: Manages local model loading and inference
- Support: GGUF via ctransformers (CPU) and transformers (GPU)
- Features: Chat templates, bfloat16 precision, streaming generation
- Short-term: Session-level interactions (in-memory + SQLite)
- Working: Shared state for current reasoning loops
- Long-term: Persistent SQLite with semantic search via ChromaDB
- Features: Episodic memory, semantic memory, conversation history
- Current Tools: Web search (DuckDuckGo-based)
- Architecture: Extensible framework for adding new tools
- Features: Automatic tool detection, parallel execution, retry logic
- Formats: PDF, DOCX, HTML, CSV, Code files (Python, JS, Java), TXT, Markdown
- Features: Adaptive chunking, quality scoring, deduplication, parallel processing
- Output: FAISS vector index with metadata for RAG retrieval
- Search: FAISS-based semantic similarity search
- Features: Metadata filtering, source deduplication, context building
- Integration: Works seamlessly with ReAct agent for knowledge retrieval
class ReActOrchestrator:
async def initialize() -> bool
async def handle_input(user_input: str) -> ReActResult
async def get_conversation_history() -> List[Dict]
async def clear_session_memory() -> boolclass MemoryManager:
async def store_interaction(user_input: str, response: str, intent: str)
async def get_recent_interactions(limit: int = 10) -> List[Dict]
async def search_memories(query: str, limit: int = 5) -> List[Dict]
async def flush_short_to_long_term() -> Dictclass ToolRegistry:
async def execute_tool(tool_name: str, **kwargs) -> Dict
def get_available_tools() -> List[Dict]
def register_tool(name: str, function: Callable, schema: Dict)class AdvancedIngestionPipeline:
def ingest(input_dirs: List[str], max_workers: int = 4) -> Dict
def process_file(file_path: str) -> int
def get_stats() -> Dictclass RAGPipeline:
def query(query: str, top_k: int = 3, temperature: float = 0.1) -> Dict
def retrieve_context(query: str, top_k: int = 5) -> Tuple[RAGContext, List[RAGResult]]
def get_stats() -> Dict- Model Size: 1.2B parameters (quantized to ~800MB)
- Inference Speed: 1-5 seconds per response
- Memory Usage: 4-8 GB (including model and search indices)
- Search Latency: 100-500ms for semantic search
- Web Search: 2-10 seconds depending on query complexity
- GPU Acceleration: Use CUDA for 3-5x faster inference
- Memory Management: Adjust
short_term_limitfor memory constraints - Parallel Processing: Use
max_workers=4-8for document ingestion - Quality Filtering: Set
min_quality_score=0.6+for better results - Context Length: Reduce
max_context_lengthfor faster responses
Minimum:
- 8GB RAM
- 4-core CPU
- 2GB storage
Recommended:
- 16GB RAM
- 8-core CPU
- NVIDIA GPU with 4GB+ VRAM
- 10GB storage
Optimal:
- 32GB RAM
- 16-core CPU
- NVIDIA GPU with 8GB+ VRAM
- SSD storage
# 1. Create tool function
async def my_custom_tool(param1: str, param2: int) -> Dict:
# Your tool logic here
return {"result": "success", "data": "..."}
# 2. Register tool
tool_schema = {
"name": "my_custom_tool",
"description": "Description of what the tool does",
"parameters": {
"type": "object",
"properties": {
"param1": {"type": "string", "description": "Parameter 1"},
"param2": {"type": "integer", "description": "Parameter 2"}
},
"required": ["param1", "param2"]
}
}
# 3. Add to tool registry
tool_registry.register_tool("my_custom_tool", my_custom_tool, tool_schema)# Custom memory backend
class CustomMemoryBackend:
async def store(self, key: str, value: Any):
# Custom storage logic
pass
async def retrieve(self, key: str) -> Any:
# Custom retrieval logic
pass
# Integrate with memory manager
memory_manager.add_backend("custom", CustomMemoryBackend())# Support for custom models
class CustomLLMInterface(LLMInterface):
def _load_model(self):
# Load your custom model
pass
async def generate(self, messages: List[Dict]) -> str:
# Custom generation logic
passWe welcome contributions! Please see our Contributing Guidelines for details.
# Clone repository
git clone https://github.com/gajjalaashok75-UI/gakrai.git
cd gakrai
# Create development environment
python -m venv dev-env
source dev-env/bin/activate # or dev-env\Scripts\activate on Windows
# Install development dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt # If available
# Run tests
python -m pytest tests/
# Run linting
flake8 .
black .- π οΈ Tool Development: Add new tools (email, calendar, databases, APIs)
- π§ Model Integration: Support for new LLM models and providers
- π Document Formats: Add support for new file formats
- π Search Improvements: Enhanced semantic search and ranking
- π¨ UI/UX: Web interface, mobile app, desktop GUI
- π Analytics: Usage metrics, performance monitoring
- π Security: Authentication, authorization, data privacy
This project is licensed under the MIT License - see the LICENSE file for details.
- LFM2.5: Local language model for reasoning and generation
- FAISS: Efficient similarity search and clustering
- ChromaDB: Vector database for semantic search
- LangChain/LangGraph: Agent orchestration framework
- Transformers: HuggingFace model integration
- DuckDuckGo: Privacy-focused web search
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: See
/memory/folder for detailed component docs
- Web Interface: React-based web UI
- API Server: REST API for external integrations
- Plugin System: Dynamic tool loading
- Multi-Modal: Image and audio processing
- Distributed: Multi-agent collaboration
- Cloud Integration: AWS/Azure/GCP deployment
- Mobile App: iOS/Android applications
AutoBot - Intelligent automation through local AI reasoning and action. Built with β€οΈ for developers and researchers. by Gakr team