Multi-agent system for querying OEWS employment data using natural language, built with LangGraph, LangChain, and FastAPI.
The system uses a Planner-Executor pattern with specialized sub-agents:
- Planner - Creates execution plan from user query (DeepSeek-R1 reasoning model)
- Executor - Routes to appropriate agents based on plan
- Cortex Researcher (Text2SQL) - Queries OEWS database with secure parameterized queries
- Chart Generator - Creates chart specifications for visualizations
- Synthesizer - Creates text summaries of findings
- Response Formatter - Formats final JSON response for API
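For orientation, here is a minimal sketch of how a Planner-Executor graph like this can be wired with LangGraph. Node names, state fields, and the routing logic are illustrative only, not the project's actual assembly in src/workflow/graph.py:

```python
# Minimal sketch only: node names, state fields, and routing are illustrative,
# not the project's actual assembly in src/workflow/graph.py.
from langgraph.graph import StateGraph, MessagesState, START, END


class WorkflowState(MessagesState):
    plan: list[str]   # ordered agent names produced by the planner
    step: int         # index of the next plan step to execute


def planner(state: WorkflowState) -> dict:
    # Would call the reasoning model (e.g. DeepSeek-R1); hard-coded here.
    return {"plan": ["text2sql", "synthesizer"], "step": 0}


def executor(state: WorkflowState) -> dict:
    # Bookkeeping only; the actual routing happens in the conditional edge.
    return {}


def route(state: WorkflowState) -> str:
    # Send the workflow to the next planned agent, or finish.
    if state["step"] >= len(state["plan"]):
        return END
    return state["plan"][state["step"]]


def text2sql(state: WorkflowState) -> dict:
    return {"step": state["step"] + 1}   # would run the SQL tools here


def synthesizer(state: WorkflowState) -> dict:
    return {"step": state["step"] + 1}   # would summarize findings here


builder = StateGraph(WorkflowState)
builder.add_node("planner", planner)
builder.add_node("executor", executor)
builder.add_node("text2sql", text2sql)
builder.add_node("synthesizer", synthesizer)
builder.add_edge(START, "planner")
builder.add_edge("planner", "executor")
builder.add_conditional_edges("executor", route)
builder.add_edge("text2sql", "executor")
builder.add_edge("synthesizer", "executor")
workflow = builder.compile()
```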
- Task 1.2: Schema metadata with LLM-optimized descriptions
- Task 1.3: YAML-based LLM configuration (DeepSeek, GPT-4o, Ollama support)
- Task 1.4: Multi-provider LLM factory (Azure AI, OpenAI, Anthropic, Ollama)
- Task 2.1: Parameterized query tools with SQL injection protection
  - `get_schema_info`, `validate_sql`, `execute_sql_query`, `search_areas`, `search_occupations`
- Task 5.1: LangGraph state management with MessagesState
- Task 6.3: Executor node with routing logic and replan handling
- Task 6.4: Complete workflow assembly with all agents
- Task 7.1: Pydantic models and REST endpoints
  - `/api/v1/query` - Process natural language queries
  - `/api/v1/models` - List available LLM models
  - `/health` - Health check endpoint
- Task 7.2: Server entry point and startup scripts
- Task 8.1: Large result set handling (>1000 rows auto-summarized)
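A minimal sketch of the large-result guard described in Task 8.1; the 1000-row threshold matches the task description, but the function name and summary fields are assumptions, not the project's actual implementation:

```python
# Sketch only: the real handling lives in the workflow, and the summary
# fields shown here are assumptions.
ROW_LIMIT = 1000


def summarize_if_large(rows: list[dict]) -> dict:
    """Pass small result sets through; summarize anything over ROW_LIMIT rows."""
    if len(rows) <= ROW_LIMIT:
        return {"rows": rows, "truncated": False}
    return {
        "rows": rows[:ROW_LIMIT],          # sample kept for the LLM
        "truncated": True,
        "total_row_count": len(rows),
        "columns": sorted(rows[0].keys()),
    }
```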
```bash
pip install -e .
```

```bash
# Required for Azure AI models (DeepSeek)
export AZURE_AI_API_KEY="your-key-here"
export AZURE_AI_ENDPOINT="https://your-endpoint.azure.com"

# Or use OpenAI
export OPENAI_API_KEY="your-key-here"

# Database configuration
export DATABASE_ENV="dev"  # or "prod" for Azure SQL
export SQLITE_DB_PATH="data/oews.db"
```

```bash
# Using the startup script
./scripts/start_server.sh

# Or directly with Python
python -m src.main
```

The API will be available at http://localhost:8000.
```bash
# Health check
curl http://localhost:8000/health

# Query example
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the median salaries for software developers in Seattle?",
    "enable_charts": false
  }'

# List available models
curl http://localhost:8000/api/v1/models
```
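The same query can also be issued from Python; a minimal sketch using the requests library (the response schema is defined by the Pydantic models in src/api/models.py):

```python
# Minimal sketch: POST the same query shown in the curl example above.
import requests

resp = requests.post(
    "http://localhost:8000/api/v1/query",
    json={
        "query": "What are the median salaries for software developers in Seattle?",
        "enable_charts": False,
    },
    timeout=120,  # LLM-backed queries can take a while
)
resp.raise_for_status()
print(resp.json())
```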
Interactive API docs are available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
```
src/
├── agents/                    # LangGraph workflow nodes
│   ├── planner.py             # Plan creation
│   ├── executor.py            # Agent routing
│   ├── text2sql_agent.py      # Database queries
│   ├── chart_generator.py     # Visualization specs
│   ├── response_formatter.py  # API output
│   └── state.py               # Workflow state
├── api/                       # FastAPI application
│   ├── endpoints.py           # REST endpoints
│   └── models.py              # Pydantic models
├── config/                    # Configuration
│   ├── llm_config.py          # Model registry
│   └── llm_factory.py         # LLM instantiation
├── database/                  # Data layer
│   ├── connection.py          # DB abstraction
│   └── schema.py              # Schema metadata
├── prompts/                   # Prompt templates
│   ├── planner_prompts.py
│   └── executor_prompts.py
├── tools/                     # LangChain tools
│   └── database_tools.py      # Secure SQL tools
├── workflow/                  # LangGraph assembly
│   └── graph.py               # Workflow graph
└── main.py                    # Server entry point

config/
└── llm_models.yaml            # Model configuration

tests/                         # Test suite
├── test_database.py
├── test_schema.py
├── test_llm_config.py
├── test_llm_factory.py
└── test_database_tools.py
```
- Parameterized Queries: All SQL queries use `?` placeholders to prevent SQL injection (see the sketch after this list)
- SQL Validation: Dangerous operations (DROP, DELETE, etc.) are rejected
- Connection Pooling: Production-ready database connection management
- API Key Management: Secure environment variable configuration
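A minimal sketch of how the first two safeguards fit together, assuming the SQLite backend; the real tools in src/tools/database_tools.py differ in detail, and the table and column names in the usage comment are illustrative:

```python
# Sketch only, assuming the SQLite backend; the real tools in
# src/tools/database_tools.py differ in detail.
import sqlite3

BLOCKED_KEYWORDS = ("DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "TRUNCATE")


def validate_sql(sql: str) -> None:
    # Reject anything that is not a read-only SELECT statement.
    upper = sql.strip().upper()
    if not upper.startswith("SELECT") or any(kw in upper for kw in BLOCKED_KEYWORDS):
        raise ValueError("Only read-only SELECT statements are allowed")


def execute_sql_query(db_path: str, sql: str, params: tuple = ()) -> list[tuple]:
    validate_sql(sql)
    with sqlite3.connect(db_path) as conn:
        # User-supplied values are bound via `?` placeholders, never
        # interpolated into the SQL string.
        return conn.execute(sql, params).fetchall()


# Example: the area name never touches the SQL string itself.
# rows = execute_sql_query(
#     "data/oews.db",
#     "SELECT occ_title, a_median FROM oews WHERE area_title = ?",
#     ("Seattle-Tacoma-Bellevue, WA",),
# )
```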
Edit `config/llm_models.yaml` to:
- Add/remove LLM models
- Change default reasoning/implementation models
- Configure cost tracking and model parameters
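As a sketch of how such a registry might be consumed, the snippet below loads the YAML and picks a model entry; the keys shown are assumptions, not the actual schema of config/llm_models.yaml:

```python
# Sketch only: the keys shown here are assumptions, not the actual schema
# of config/llm_models.yaml.
import yaml

with open("config/llm_models.yaml") as f:
    config = yaml.safe_load(f)

# e.g. look up the default reasoning model and hand its parameters to the
# matching provider client via the LLM factory.
default_name = config["default_reasoning_model"]
model_spec = config["models"][default_name]
print(model_spec["provider"], model_spec.get("temperature"))
```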
```bash
# Run all tests
pytest

# Run specific test file
pytest tests/test_database_tools.py -v

# Run with coverage
pytest --cov=src --cov-report=html
```

Recent commits implementing the plan:

- `aed862f` - FastAPI endpoints, server entry point, large result handling
- `abe032d` - Executor node, workflow assembly, agents
- `138568e` - Secure database tools with parameterized queries
- `44b934f` - LLM factory for multi-provider support
- `c4e790f` - YAML-based LLM configuration
- `ab16ba2` - Schema metadata with security notes
- `e817980` - Database connection with parameterized queries
- Populate Database: Import OEWS data using existing CLI tools
- Add Tests: Write integration tests for workflow execution
- Web Research: Implement Tavily integration for external data
- Frontend: Build Next.js UI consuming the API
- Deployment: Deploy to Azure with production SQL database
- LangGraph for workflow orchestration (vs. custom state machine)
- Planner-Executor pattern for flexible multi-agent routing
- Parameterized queries for security (vs. string formatting)
- Large result summarization for performance (>1000 rows)
- YAML configuration for model flexibility (vs. hardcoded)
- ReAct pattern for the tool-using agents (Text2SQL, Chart Generator)
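A sketch of the ReAct wiring for the Text2SQL agent using LangGraph's prebuilt helper; it assumes the tools in src/tools/database_tools.py are exposed as LangChain tools under the names listed earlier, and uses GPT-4o purely as an example model:

```python
# Sketch of the ReAct wiring; assumes the tools in src/tools/database_tools.py
# are exposed as LangChain tools under the names listed earlier, and uses
# GPT-4o purely as an example model.
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI

from src.tools.database_tools import (
    get_schema_info,
    validate_sql,
    execute_sql_query,
    search_areas,
    search_occupations,
)

llm = ChatOpenAI(model="gpt-4o")
text2sql_agent = create_react_agent(
    llm,
    tools=[get_schema_info, validate_sql, execute_sql_query,
           search_areas, search_occupations],
)
result = text2sql_agent.invoke(
    {"messages": [("user", "Median salary for software developers in Seattle?")]}
)
```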
The system tracks which models are used for each agent:
- Planner: reasoning model (default: DeepSeek-R1)
- Agents: implementation model (default: DeepSeek-V3)
- Overrides: either model can be overridden per request via the API
Metadata includes:
- Models used per agent
- Execution time
- Plan structure
- Number of replans
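For illustration only, a metadata payload might look like the following; the field names here are assumptions, and the actual schema is defined by the Pydantic models in src/api/models.py:

```python
# Illustrative shape only -- field names here are assumptions, not the
# actual response schema.
metadata = {
    "models": {"planner": "DeepSeek-R1", "text2sql": "DeepSeek-V3"},
    "execution_time_seconds": 4.2,
    "plan": ["query_database", "synthesize_results"],
    "replan_count": 0,
}
```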
- Workflow Graph Initialization: Compiled once at startup in the FastAPI lifespan context (see the sketch after this list)
- Connection Pooling: Reuses database connections in production
- Large Result Handling: Auto-summarizes queries returning >1000 rows
- Agent Caching: Agent instances reused across requests
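A sketch of the compile-once startup pattern using a FastAPI lifespan context; `build_workflow` is a hypothetical factory name standing in for the actual assembly in src/workflow/graph.py:

```python
# Sketch of the compile-once pattern; `build_workflow` is a hypothetical
# factory standing in for the assembly in src/workflow/graph.py.
from contextlib import asynccontextmanager

from fastapi import FastAPI


@asynccontextmanager
async def lifespan(app: FastAPI):
    from src.workflow.graph import build_workflow  # hypothetical name
    # Build and compile the LangGraph workflow once and keep it on app.state
    # so request handlers reuse the same instance.
    app.state.workflow = build_workflow()
    yield
    # Teardown (e.g. closing the database connection pool) would go here.


app = FastAPI(lifespan=lifespan)
```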
For issues or questions:
- Check the API documentation at `/docs`
- Review the test files for usage examples
- See the plan analysis in `docs/plans/`