The Vector Search Agent is an enterprise-grade conversational AI platform designed to transform how organizations access and utilize technical automotive documentation. Built for automotive manufacturers, dealership networks, and technical service centers, this solution enables employees at all levels to retrieve precise technical information through natural language queries.
The Business Problem: Automotive technical documentation is vast, complex, and constantly evolving. Service technicians, support staff, and engineers spend significant time searching through manuals, diagnostic codes, and technical specifications. Traditional keyword-based search often fails to understand context, leading to frustration and inefficiency.
Our Solution: This platform uses advanced AI to understand the intent behind questions, search through technical documents intelligently, and provide accurate answers with traceable sources. When a technician asks "What is the torque specification for the cylinder head bolts on a Fiat Linea?", the system understands the technical context, retrieves relevant documentation, and provides a precise answer with source citations.
Key Business Benefits:
- Reduced Resolution Time: Technical staff find answers in seconds rather than minutes or hours
- Improved Accuracy: AI-powered responses include source citations, reducing errors from misinterpreted documentation
- Knowledge Preservation: Institutional knowledge becomes searchable and accessible to all team members
- Scalable Support: Handle increased query volumes without proportional staffing increases
- Audit Trail: All responses are traceable to source documents for compliance and quality assurance
Intended Users:
- Technical Service Staff: Mechanics and technicians seeking repair procedures and specifications
- Support Centers: Call center agents helping customers with vehicle issues
- Training Departments: Creating and validating training materials
- Quality Assurance Teams: Verifying technical information accuracy
Table of Contents:
- Technical Overview
- System Architecture
- Agent Workflow
- Installation and Setup
- Application Usage
- Container Management
- Development Guide
- Monitoring and Diagnostics
- Security Considerations
- Troubleshooting
- Performance Optimization
- Contributing
- License and Acknowledgments
This section provides the technical foundation for IT teams and developers who will implement and maintain the system.
| Capability | Description |
|---|---|
| Specialized Agent | LangGraph-based system with specialized processing nodes for planning, retrieval, evaluation, and formatting |
| Automotive Domain | Knowledge base covering Fiat Linea vehicles, DTC diagnostic codes, technical procedures, and specifications |
| Intelligent Vector Search | Document processing (PDF and JSON) with semantic embedding techniques |
| Scope Validation | Automatic validation ensuring questions fall within the automotive knowledge domain |
| Real-time Streaming | Server-Sent Events (SSE) interface showing agent progress at each processing step |
| Traceable Sources | All responses include detailed references to consulted documents |
AI and Machine Learning:
- Google Gemini (gemini-2.0-flash-exp) for language model processing
- Google Gemini Embeddings (768 dimensions) for semantic document search
- LangChain framework for LLM application integration
- LangGraph for agent workflow orchestration
Backend Infrastructure:
- FastAPI asynchronous web framework with automatic OpenAPI documentation
- ChromaDB vector database for embeddings and metadata storage
- Pydantic for data validation and schema management
- Python 3.12.0 runtime environment
Document Processing:
- PyPDF for PDF text extraction
- JSON loaders for structured data
- Text splitters with configurable chunking and overlap
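The chunking-with-overlap idea can be sketched in a few lines of plain Python. This is an illustration only (the project relies on LangChain's text splitters); the `chunk_size` and `overlap` values here are arbitrary.

```python
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into fixed-size chunks whose edges overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # how far each window advances
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = chunk_text("A" * 250, chunk_size=100, overlap=20)
print(len(chunks))  # 3 chunks: positions 0-100, 80-180, 160-250
```

Overlap preserves context that would otherwise be cut at chunk boundaries, at the cost of some storage duplication.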
Frontend and Interface:
- Chainlit conversational interface
- Server-Sent Events for real-time streaming
- Command-based knowledge management
Infrastructure:
- Docker Compose for container orchestration
- Persistent volume storage for ChromaDB
- Health check monitoring
- Hot reload for development
The system uses a containerized microservices architecture designed for reliability, scalability, and ease of deployment.
The application runs in two primary Docker containers that communicate over an internal network.
Container 1: FastAPI Backend (Port 8000)
This container houses the core application logic including the REST API, AI agent, and vector database.
- Complete REST API with endpoints for queries, file management, and knowledge base operations
- LangGraph-based agent with structured workflow nodes
- Integrated ChromaDB for persistent vector storage
- Google Gemini API integration for language model and embeddings
- Logging, metrics collection, and performance monitoring
Container 2: Chainlit Frontend (Port 8001)
This container provides the user-facing conversational interface.
- Interactive chat with real-time response streaming
- Visual progress tracking of agent processing steps
- Knowledge base management through chat commands
- HTTP communication with the FastAPI backend
The agent uses a graph-based workflow where specialized nodes handle different aspects of query processing.
```mermaid
graph TD
    A[User Question] --> B[Scope Validation]
    B -->|Out of Scope| C[Guidance Response]
    B -->|In Scope| D[Planner Node]
    D --> E[Retriever Node]
    E --> F[Joiner Node]
    F -->|Insufficient| D
    F -->|Sufficient| G[Output Formatter]
    G --> H[Final Response + Sources]
    subgraph "Automotive Specialization"
        I[Automotive Knowledge]
        J[Technical Specs]
        K[Vehicle Database]
    end
    D -.-> I
    E -.-> J
    G -.-> K
```
The following diagram illustrates how data moves through the system layers.
```mermaid
graph LR
    subgraph "Frontend Layer"
        A[Chainlit UI] --> C[Chat Interface]
        A --> D[Command Interface]
    end
    subgraph "API Layer"
        E[FastAPI Endpoints] --> F[Authentication]
        E --> G[Validation]
        E --> H[Streaming SSE]
    end
    subgraph "Agent Layer"
        I[LangGraph Workflow] --> J[Scope Validation]
        I --> K[Knowledge Retrieval]
        I --> L[Response Generation]
    end
    subgraph "Data Layer"
        M[ChromaDB] --> N[Vector Embeddings]
        M --> O[Document Metadata]
        P[Knowledge Files] --> Q[PDF Documents]
        P --> R[JSON Solutions]
    end
    subgraph "AI Services"
        S[Google Gemini] --> T[LLM Processing]
        S --> U[Embeddings Generation]
    end
    A --> E
    E --> I
    I --> M
    I --> S
    P --> M
```
```text
vector-agent-search/
├── app/                          # FastAPI Main Application
│   ├── agents/                   # LangGraph Agent System
│   │   ├── agent.py              # Main agent class and workflow
│   │   └── nodes.py              # Specialized nodes (Planner, Retriever, etc.)
│   ├── api/                      # REST APIs and Endpoints
│   │   └── endpoints.py          # Complete routes (query, upload, knowledge)
│   ├── core/                     # Core Configurations
│   │   └── config.py             # Application and environment settings
│   ├── models/                   # Models and Schemas
│   │   ├── schemas.py            # Pydantic models (State, Request/Response)
│   │   ├── prompts.py            # Specialized prompt templates
│   │   ├── automotive_knowledge.py # Automotive knowledge base
│   │   └── scope_prompts.py      # Scope validation prompts
│   ├── services/                 # Core Application Services
│   │   ├── llm_service.py        # Google Gemini (LLM + Embeddings)
│   │   ├── vector_service.py     # ChromaDB and vector operations
│   │   ├── scope_service.py      # Question scope validation
│   │   └── intelligent_scope_service.py # LLM-based intelligent validation
│   ├── evaluation/               # Evaluation System
│   │   ├── metrics.py            # Quality metrics
│   │   ├── response_evaluator.py # Response evaluation
│   │   └── retrieval_evaluator.py # Retrieval evaluation
│   ├── monitoring/               # Monitoring and Observability
│   │   ├── logger.py             # Structured logging system
│   │   ├── metrics_tracker.py    # Metrics tracking
│   │   └── performance_monitor.py # Performance monitoring
│   ├── utils/                    # Utilities and Helpers
│   │   ├── file_utils.py         # File manipulation and extraction
│   │   ├── response_utils.py     # Response formatting
│   │   └── prompt_enhancer.py    # Prompt improvement
│   ├── config/                   # Data Configurations
│   │   └── escopo.json           # Automotive scope definitions
│   └── main.py                   # FastAPI main application
├── frontend/                     # Chainlit Interface
│   └── app.py                    # Chainlit application with streaming
├── docker/                       # Docker Configurations
│   ├── Dockerfile.fastapi        # FastAPI + Agent container
│   └── Dockerfile.chainlit       # Chainlit Frontend container
├── scripts/                      # Automation Scripts
│   ├── setup_gemini.md           # Google Gemini configuration guide
│   ├── reset_chromadb.bat        # ChromaDB reset (Windows)
│   ├── reset_chromadb.sh         # ChromaDB reset (Linux/Mac)
│   └── chainlit_autoreload.py    # Development auto-reload
├── tests/                        # Automated Tests
│   ├── test_agent.py             # Main agent tests
│   ├── test_api.py               # API tests
│   ├── test_vector_service.py    # Vector service tests
│   ├── test_automotive_cases.py  # Automotive test cases
│   ├── test_evaluation.py        # Evaluation system tests
│   ├── test_monitoring.py        # Monitoring tests
│   ├── conftest.py               # Pytest configurations
│   └── run_full_validation.py    # Complete system validation
├── knowledge/                    # Knowledge Base
│   ├── files_id/                 # Technical PDF documents
│   └── direct_hits_id/           # Direct solutions (JSON)
├── Guides/                       # Usage Guides
│   ├── DOCKER_TESTING_GUIDE.md   # Docker testing guide
│   └── RESET_CHROMADB_GUIDE.md   # Vector database reset guide
├── docker-compose.yml            # Complete container orchestration
├── requirements.txt              # FastAPI dependencies
├── requirements.chainlit.txt     # Chainlit dependencies
├── start.bat / start.sh          # Startup scripts
└── README.md                     # Main documentation
```
The agent workflow represents the intelligent decision-making process that transforms user questions into accurate, sourced responses. This section explains how the system processes queries.
When a user submits a question, it passes through a series of specialized processing nodes, each designed to handle a specific aspect of the query.
```mermaid
graph TD
    A[User Question] --> B[Scope Validator]
    B -->|Out of Scope| C[Guidance Response]
    B -->|In Scope| D[Planner - Strategy]
    D --> E[Retriever - Vector Search]
    E --> F[Joiner - Evaluation]
    F -->|Insufficient| D
    F -->|Sufficient| G[Output Formatter]
    G --> H[Final Response + Sources]
    subgraph "Automotive Specialization"
        I[Automotive Knowledge]
        J[Technical Specs DB]
        K[DTC Codes]
        L[Vehicle Context]
    end
    D -.-> I
    E -.-> J
    F -.-> K
    G -.-> L
```
Scope Validator
The Scope Validator determines whether a question falls within the automotive domain that the system is designed to answer.
Technical Details:
- Uses LLM combined with a scope knowledge base for validation
- Outputs include: `in_scope` (boolean), `category` (topic area), `confidence` (certainty level), `reason` (explanation)
- Specialized recognition for Fiat Linea context, DTC codes, and technical procedures
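The validator's output contract can be pictured with a small dataclass and a toy keyword check standing in for the real LLM call. The term list and scoring below are invented purely for illustration; they are not the project's actual logic.

```python
from dataclasses import dataclass

@dataclass
class ScopeResult:
    in_scope: bool
    category: str
    confidence: float
    reason: str

# Invented term list; the real system consults an LLM plus a scope knowledge base.
AUTOMOTIVE_TERMS = {"torque", "dtc", "engine", "fiat", "linea", "cylinder", "injector"}

def validate_scope(question: str) -> ScopeResult:
    """Toy keyword-based stand-in for the LLM-backed scope check."""
    words = {w.strip("?.,!").lower() for w in question.split()}
    hits = sorted(words & AUTOMOTIVE_TERMS)
    if hits:
        return ScopeResult(True, "automotive", min(1.0, 0.5 + 0.2 * len(hits)),
                           f"matched terms: {hits}")
    return ScopeResult(False, "out_of_domain", 0.9, "no automotive terms found")

print(validate_scope("What is the torque for the cylinder head bolts?"))
```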
Planner
The Planner analyzes validated questions and creates an optimized search strategy.
Technical Details:
- Identifies relevant automotive systems (engine, electrical, fuel)
- Optimizes search by document type (procedure, specification, diagnostic)
- Outputs a structured plan with specific technical terms for retrieval
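The shape of such a plan can be sketched with a toy keyword-to-system mapping. The real Planner delegates this analysis to an LLM; the keyword table and field names below are illustrative assumptions.

```python
# Illustrative keyword table only; the real Planner uses an LLM.
SYSTEM_KEYWORDS = {
    "engine": ["torque", "cylinder", "piston", "oil"],
    "electrical": ["battery", "fuse", "relay", "sensor"],
    "fuel": ["injector", "pump", "fuel pressure"],
}

def build_search_plan(question: str) -> dict:
    """Derive target systems, search terms, and preferred document types."""
    q = question.lower()
    systems = [name for name, kws in SYSTEM_KEYWORDS.items()
               if any(kw in q for kw in kws)]
    doc_types = ["specification"] if any(t in q for t in ("torque", "specification")) \
        else ["procedure"]
    return {
        "systems": systems or ["general"],
        "search_terms": [w.strip("?.,") for w in q.split() if len(w) > 3],
        "doc_types": doc_types,
    }

plan = build_search_plan("What is the torque specification for the cylinder head bolts?")
print(plan["systems"], plan["doc_types"])  # ['engine'] ['specification']
```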
Retriever
The Retriever performs semantic search across the vectorized document database.
Technical Details:
- Uses ChromaDB with Google Gemini Embeddings (768 dimensions)
- Applies specialized filters by document type and automotive relevance
- Implements diversification to avoid redundant chunks from the same source
- Extracts metadata including technical specifications (torques, pressures, codes)
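The diversification step might look like the following sketch. The function name, chunk shape, and `max_per_source` limit are assumptions for illustration, not the project's actual code.

```python
from collections import defaultdict

def diversify(results: list[dict], max_per_source: int = 2) -> list[dict]:
    """Keep at most max_per_source chunks per document, preserving rank order."""
    counts: dict[str, int] = defaultdict(int)
    kept = []
    for chunk in results:  # assumed already sorted by similarity score
        if counts[chunk["source"]] < max_per_source:
            kept.append(chunk)
            counts[chunk["source"]] += 1
    return kept

hits = [
    {"source": "manual.pdf", "score": 0.91},
    {"source": "manual.pdf", "score": 0.89},
    {"source": "manual.pdf", "score": 0.85},
    {"source": "dtc_codes.json", "score": 0.80},
]
print([h["source"] for h in diversify(hits)])
# ['manual.pdf', 'manual.pdf', 'dtc_codes.json']
```

Capping chunks per source trades a little raw similarity for broader coverage of the corpus.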
Joiner
The Joiner evaluates whether the retrieved documents contain sufficient information for a complete response.
Technical Details:
- Uses AI to determine if more information or replanning is needed
- Implements a confidence scoring system for response quality
- Includes loop prevention logic to avoid infinite retrieval cycles
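The loop-prevention idea reduces to a simple predicate: replan only while results are weak and an iteration budget remains. The threshold and budget values below are illustrative, not the system's configured defaults.

```python
MAX_ITERATIONS = 3         # illustrative budget, not the real configured value
CONFIDENCE_THRESHOLD = 0.7

def should_replan(confidence: float, iteration: int) -> bool:
    """Replan only while results are weak AND the iteration budget remains."""
    return confidence < CONFIDENCE_THRESHOLD and iteration < MAX_ITERATIONS

print(should_replan(0.4, 1), should_replan(0.9, 1), should_replan(0.4, 3))
# True False False
```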
Output Formatter
The Output Formatter assembles the final response with automotive expertise and source citations.
Technical Details:
- Formats responses with proper technical structure (procedures, specifications, diagnostics)
- Creates traceable source list with file names and relevant excerpts
- Includes relevant technical metadata
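As a rough illustration of assembling an answer with a traceable source list (the `file` and `excerpt` field names are assumed for the example, not taken from the project's schemas):

```python
def format_response(answer: str, sources: list[dict]) -> str:
    """Append a numbered, traceable source list to the answer text."""
    lines = [answer, "", "Sources:"]
    for i, src in enumerate(sources, start=1):
        lines.append(f'{i}. {src["file"]}: "{src["excerpt"]}"')
    return "\n".join(lines)

print(format_response(
    "Tighten the cylinder head bolts to 40 Nm.",
    [{"file": "manual.pdf", "excerpt": "head bolt torque: 40 Nm"}],
))
```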
The system provides real-time feedback on processing progress through Server-Sent Events.
```mermaid
sequenceDiagram
    participant U as User
    participant C as Chainlit
    participant A as Agent
    participant L as LLM
    participant V as VectorDB
    U->>C: Question
    C->>A: Stream Request
    A->>C: Validating scope...
    A->>L: Scope validation
    A->>C: Creating plan...
    A->>L: Planning
    A->>C: Searching documents...
    A->>V: Vector search
    A->>C: Evaluating results...
    A->>L: Evaluation
    A->>C: Formatting response...
    A->>L: Final formatting
    A->>C: Complete response
    C->>U: Final result
```
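On the wire, each progress update is delivered as one Server-Sent Events message. A minimal sketch of the framing (the `step`/`message` payload shape is an assumption about this project's events):

```python
import json

def sse_event(step: str, message: str) -> str:
    """Frame one progress update in Server-Sent Events wire format."""
    payload = json.dumps({"step": step, "message": message})
    return f"data: {payload}\n\n"  # blank line terminates the event

print(sse_event("planner", "Creating plan..."), end="")
# data: {"step": "planner", "message": "Creating plan..."}
```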
The system collects detailed metrics for each query to enable performance analysis and quality assurance.
Example metrics collected:
```json
{
  "query_type": "automotive",
  "processing_time": "2.3s",
  "documents_retrieved": 6,
  "confidence_level": "high",
  "automotive_terms_matched": 8,
  "technical_specs_found": true,
  "sources_with_downloads": 4
}
```

This section provides step-by-step instructions for deploying the Vector Search Agent.
Before installation, ensure the following are available:
- Docker and Docker Compose
- Git
- Python 3.12.0 (for local development only)
- Google Gemini API key
```bash
git clone <repository-url>
cd vector-agent-search
```

Copy the example configuration file and customize as needed:

```bash
cp .env.example .env
```

Edit the `.env` file to set your Google Gemini API key and other configuration options. Default settings work for the Docker environment.
Obtain an API key from Google AI Studio:
- Visit https://aistudio.google.com/app/apikey
- Create a new API key
- Add the key to your `.env` file:

```bash
echo "GOOGLE_API_KEY=your_api_key_here" >> .env
```

- Verify configuration:

```bash
python tests/check_gemini_config.py
```

```bash
# Start all services
docker-compose up -d

# Check logs to verify startup
docker-compose logs -f
```

After containers are running, load the knowledge base:
```bash
curl -X POST "http://localhost:8000/api/v1/knowledge/load"
```

This application uses Google Gemini embeddings with 768 dimensions for optimal vector quality.
If migrating from a previous embedding configuration (such as 384 dimensions), reset ChromaDB:
```bash
# Stop containers and reset volume
docker-compose down
docker volume rm vector-agent-search_chroma_data
docker-compose up -d

# Or use the migration script
docker exec -it vector-search-fastapi bash -c "cd /app && python reset_for_gemini_embeddings.py"
```

All tests should be executed inside Docker containers to ensure consistency with the production environment.
Automated Testing (Recommended):
```bash
# Windows
run_tests_docker.bat

# Linux/Mac
./run_tests_docker.sh
```

Manual Testing:
```bash
# Test Gemini embeddings
docker exec -it vector-search-fastapi bash -c "cd /app && python test_gemini_embeddings.py"

# Check configuration
docker exec -it vector-search-fastapi bash -c "cd /app && python tests/check_gemini_config.py"

# Run all tests
docker exec -it vector-search-fastapi bash -c "cd /app && python -m pytest tests/ -v"
```

This section covers day-to-day usage of the Vector Search Agent through both the web interface and REST API.
The web interface provides an intuitive chat-based experience for interacting with the knowledge base.
Access the interface at: http://localhost:8001
Loading Documents:
- Use the `/carregar` command to load the existing knowledge base
- Documents can be added via API endpoints or by placing files in the knowledge directory
Asking Questions:
- Type questions about the loaded documents in natural language
- The agent processes queries in real time with visible progress indicators
- View the search plan, retrieved documents, and final response
The REST API enables programmatic integration with other systems and applications.
Access API documentation at: http://localhost:8000/docs
Simple Query:
```bash
curl -X POST "http://localhost:8000/api/v1/query" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is the main content of the documents?",
    "use_confidence": true
  }'
```

Streaming Query:
```bash
curl -N -X POST "http://localhost:8000/api/v1/query/stream" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "question": "Explain the main points of the documents",
    "use_confidence": true
  }'
```

File Upload:
```bash
curl -X POST "http://localhost:8000/api/v1/files/upload" \
  -F "files=@document.pdf" \
  -F "files=@data.json"
```

Scope Validation:
```bash
curl -X POST "http://localhost:8000/api/v1/scope/check" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Question about the documents"
  }'
```

Knowledge Base Operations:
```bash
# Load knowledge base
curl -X POST "http://localhost:8000/api/v1/knowledge/load"

# Vectorize documents
curl -X POST "http://localhost:8000/api/v1/knowledge/vectorize"

# Check available files
curl http://localhost:8000/api/v1/files/knowledge

# Reset ChromaDB
curl -X DELETE "http://localhost:8000/api/v1/knowledge/reset"
```

Health Check:
```bash
curl http://localhost:8000/health
```

This section provides commands for managing the Docker containers that run the application.
```bash
# Start all services
docker-compose up -d

# Stop all services
docker-compose down

# Restart all containers
docker-compose restart

# Check container status
docker-compose ps

# View logs from all services
docker-compose logs -f
```

```bash
# Complete rebuild with clean cache
docker-compose down
docker-compose build --no-cache --pull
docker-compose up -d

# Rebuild specific service
docker-compose up -d --build --force-recreate fastapi-agent

# Rebuild frontend only
docker-compose up -d --build --force-recreate chainlit-frontend

# Restart without rebuild
docker-compose restart fastapi-agent
docker-compose restart chainlit-frontend
```

```bash
# Access main container shell
docker exec -it vector-search-fastapi bash

# Access frontend shell
docker exec -it vector-search-chainlit bash

# View specific service logs
docker-compose logs -f fastapi-agent
docker-compose logs -f chainlit-frontend

# Monitor resource usage
docker stats

# Execute commands in container
docker exec -it vector-search-fastapi python -c "print('Hello from container')"
```

```bash
# Basic cleanup (removes stopped containers)
docker-compose down
docker system prune -f

# Complete cleanup (WARNING: deletes volumes and persistent data)
docker-compose down -v --rmi all
docker system prune -a --volumes -f
docker volume prune -f

# Selective image cleanup (quote the pattern so the shell does not expand it)
docker rmi $(docker images -q "vector-agent-search*")
```

Vector Database Reset:
```bash
# Method 1: Automated Scripts (Recommended)
# Windows
scripts\reset_chromadb.bat
# Linux/Mac
./scripts/reset_chromadb.sh

# Method 2: Via REST API
curl -X DELETE "http://localhost:8000/api/v1/knowledge/reset"

# Method 3: Docker Volume Reset
docker-compose down
docker volume rm vector-agent-search_chroma_data
docker-compose up -d

# Method 4: Local Reset (development without Docker)
# Windows PowerShell
Remove-Item -Recurse -Force .\chroma_store
# Linux/Mac
rm -rf ./chroma_store
```

Diagnosis and Verification:
```bash
# Check application and ChromaDB status
curl http://localhost:8000/health

# Verify persisted data
docker exec -it vector-search-fastapi ls -la /app/chroma_store

# Check collection status
docker exec -it vector-search-fastapi python -c "
from app.services.vector_service import vector_service
print('Vector DB ready:', vector_service._is_vectordb_ready())
print('Scope DB ready:', vector_service._is_scope_db_ready())
print('Statistics:', vector_service.get_automotive_statistics())
"

# Check Docker volumes
docker volume ls | grep chroma
docker volume inspect vector-agent-search_chroma_data
```

Data Reloading:
```bash
# Load existing knowledge base
curl -X POST "http://localhost:8000/api/v1/knowledge/load"

# Vectorize loaded documents
curl -X POST "http://localhost:8000/api/v1/knowledge/vectorize"

# Upload new documents
curl -X POST "http://localhost:8000/api/v1/files/upload" \
  -F "files=@document.pdf" \
  -F "files=@data.json"

# Check available files
curl http://localhost:8000/api/v1/files/knowledge
```

| Volume | Purpose |
|---|---|
| `chroma_data` | Persistent ChromaDB database |
| `./app` | Application code (mounted for development) |
| `./knowledge` | Knowledge base documents (mounted as volume) |
This section provides guidance for developers working on the Vector Search Agent codebase.
For development without Docker (requires Python 3.12.0):
Install Dependencies:
```bash
# Verify Python version
python --version  # Should show Python 3.12.0

pip install -r requirements.txt
pip install -r requirements.chainlit.txt
```

Configure Environment:

```bash
export GOOGLE_API_KEY=your_api_key_here
export CHROMA_PERSIST_DIRECTORY=./chroma_store
```

Run Services:

```bash
# FastAPI
uvicorn app.main:app --reload --port 8000

# Chainlit
chainlit run frontend/app.py --port 8001
```

The FastAPI container is configured with a volume bind mount for hot reload during development. Changes to code in `./app/` are reflected automatically without a container restart.
This application uses Python 3.12.0 features for improved performance and code clarity.
Performance Improvements:
- Runtime optimizations for faster code execution
- Better memory management with reduced RAM consumption
- Improved bytecode cache for faster application startup
Modern Syntax:
- Native type hints: `list[str]` instead of `List[str]`
- Union syntax: `str | None` instead of `Union[str, None]`
- Match statements for complex conditional logic
Example of Modern Syntax:
```python
def extract_confidence_from_text(text: str) -> str | None:
    """Extract confidence using modern match syntax"""
    text_lower = text.lower()
    match text_lower:
        case s if "high confidence" in s:
            return "High"
        case s if "medium confidence" in s:
            return "Medium"
        case s if "low confidence" in s:
            return "Low"
        case _:
            return None
```

This section covers monitoring capabilities and diagnostic procedures for maintaining system health.
```bash
# View all logs
docker-compose logs

# View specific service logs
docker-compose logs fastapi-agent
docker-compose logs chainlit-frontend

# Follow logs in real-time
docker-compose logs -f

# Filtered logs
docker-compose logs fastapi-agent | grep -E "(ERROR|WARNING|INFO)"
docker-compose logs fastapi-agent | grep -E "(POST|GET|PUT|DELETE)"
```

| Endpoint | Purpose |
|---|---|
| http://localhost:8000/health | FastAPI service health |
| http://localhost:8000/docs | API documentation |
```bash
# Real-time resource monitoring
docker stats --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"

# Check space used by volumes
docker system df
docker volume ls --format "table {{.Name}}\t{{.Driver}}\t{{.Scope}}"

# Container performance analysis
docker exec -it vector-search-fastapi python -c "
import psutil
print(f'CPU Usage: {psutil.cpu_percent()}%')
print(f'Memory Usage: {psutil.virtual_memory().percent}%')
print(f'Disk Usage: {psutil.disk_usage(\"/\").percent}%')
"
```

This section outlines security recommendations for production deployments.
For production environments, implement the following security measures:
| Area | Recommendation |
|---|---|
| CORS | Configure allowed origins in app/main.py |
| Authentication | Implement authentication on all endpoints |
| Rate Limiting | Add request rate limiting to prevent abuse |
| HTTPS | Configure SSL/TLS certificates for encrypted communication |
| Firewall | Restrict access to container ports |
Store all sensitive configuration in environment variables:
- API keys (Google Gemini, etc.)
- Authentication tokens
- Database connection strings
- Any other credentials
Never commit sensitive information to version control.
This section addresses common issues and their solutions.
Error Message:
```
chromadb.errors.InvalidArgumentError: Collection expecting embedding with dimension of 384, got 768
```

Solution:

```bash
docker-compose down
docker volume rm vector-agent-search_chroma_data
docker-compose up -d
```

Diagnosis:

```bash
docker volume ls
docker exec -it vector-search-fastapi ls -la /app/chroma_store
```

Solution: Verify volume mounts are correctly configured in docker-compose.yml.
Diagnosis:
```bash
python tests/check_gemini_config.py
python tests/test_gemini_integration.py
echo $GOOGLE_API_KEY
```

Solution: Verify the API key is correctly set in environment variables.

Diagnosis:

```bash
docker exec -it vector-search-chainlit curl http://fastapi-agent:8000/health
```

Solution: Verify both containers are on the same Docker network and FastAPI is running.
This section provides guidance for optimizing system performance.
| Parameter | Purpose | Recommendation |
|---|---|---|
| Embeddings | Vector dimension affects quality and speed | Use smaller models for high-volume production |
| CHUNK_SIZE | Document chunk size | Adjust based on document characteristics |
| CHUNK_OVERLAP | Overlap between chunks | Balance context preservation with storage |
| GEMINI_TEMPERATURE | Response creativity | Lower for more deterministic responses |
| GEMINI_MAX_TOKENS | Response length limit | Set based on expected response size |
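For reference, a hypothetical `.env` fragment combining the tuning parameters above with the keys introduced during setup. The values are illustrative starting points, not project recommendations.

```env
GOOGLE_API_KEY=your_api_key_here
CHROMA_PERSIST_DIRECTORY=./chroma_store
# Hypothetical tuning values; adjust to your document corpus
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
GEMINI_TEMPERATURE=0.2
GEMINI_MAX_TOKENS=2048
```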
```bash
# Container resource usage
docker stats

# Volume space usage
docker system df

# Cleanup unused resources
docker system prune
```

We welcome contributions to improve the Vector Search Agent.
- Fork the project
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License. See the LICENSE file for details.
This project builds upon several excellent open-source technologies:
| Technology | Purpose |
|---|---|
| LangChain | Framework for LLM applications |
| LangGraph | Building agents with graphs |
| FastAPI | Modern web framework for APIs |
| Chainlit | Chat interface for LLM applications |
| ChromaDB | Vector database |
| Google Gemini | Large language model and embeddings |