- Python 3.11+
- PostgreSQL 16+
- Groq API key (console.groq.com)
- IBM AML dataset (HI-Small) from Kaggle placed in `data/raw/`
```bash
git clone https://github.com/yourusername/FIU_CoAgents.git
cd FIU_CoAgents
cp .env.example .env
```

Edit `.env` with your values:

```bash
GROQ_API_KEY=gsk_your_actual_key
DATABASE_URL=postgresql://fiu_user:fiu_pass@localhost:5432/fiu_coagents
```

Install dependencies:

```bash
pip install -e ".[dev]"
python -m spacy download en_core_web_lg
```

Set up PostgreSQL:

```bash
# macOS
brew install postgresql@16
brew services start postgresql@16
createdb fiu_coagents
createuser -s fiu_user
```
```bash
# Initialize schema and load data
make setup
```

This runs:

- `scripts/setup_db.py` — creates tables via Alembic migrations
- `scripts/load_data.py` — loads the IBM AML CSV into PostgreSQL
- `scripts/build_memory.py` — builds FAISS indices from regulatory docs
| Variable | Required | Default | Description |
|---|---|---|---|
| `GROQ_API_KEY` | Yes | — | Groq API key for LLM inference |
| `DATABASE_URL` | No | `postgresql://fiu_user:fiu_pass@localhost:5432/fiu_coagents` | PostgreSQL connection |
| `GROQ_MODEL_PRIMARY` | No | `llama-3.3-70b-versatile` | Primary LLM model |
| `GROQ_MODEL_FAST` | No | `llama-3.1-8b-instant` | Fast LLM model |
| `EMBEDDING_MODEL` | No | `all-MiniLM-L6-v2` | Sentence transformer model |
| `FAISS_INDEX_DIR` | No | `./data/faiss_indices` | FAISS index directory |
| `PRESIDIO_NLP_MODEL` | No | `en_core_web_lg` | spaCy model for Presidio |
| `MLFLOW_TRACKING_URI` | No | `sqlite:///mlflow.db` | MLflow tracking URI |
| `API_HOST` | No | `0.0.0.0` | FastAPI bind host |
| `API_PORT` | No | `8000` | FastAPI bind port |
| `STREAMLIT_PORT` | No | `8501` | Streamlit server port |
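The repository's own settings loader isn't shown here; as a minimal sketch, the table above maps onto plain `os.environ` lookups with defaults (`load_settings` and `DEFAULTS` are hypothetical names, not project code):

```python
import os

# Defaults mirror the table above; only GROQ_API_KEY has no default.
DEFAULTS = {
    "DATABASE_URL": "postgresql://fiu_user:fiu_pass@localhost:5432/fiu_coagents",
    "GROQ_MODEL_PRIMARY": "llama-3.3-70b-versatile",
    "GROQ_MODEL_FAST": "llama-3.1-8b-instant",
    "EMBEDDING_MODEL": "all-MiniLM-L6-v2",
    "FAISS_INDEX_DIR": "./data/faiss_indices",
    "PRESIDIO_NLP_MODEL": "en_core_web_lg",
    "MLFLOW_TRACKING_URI": "sqlite:///mlflow.db",
    "API_HOST": "0.0.0.0",
    "API_PORT": "8000",
    "STREAMLIT_PORT": "8501",
}

def load_settings(env=os.environ):
    """Merge environment overrides onto defaults; fail fast on the one required key."""
    if "GROQ_API_KEY" not in env:
        raise RuntimeError("GROQ_API_KEY is required (see table above)")
    settings = {k: env.get(k, v) for k, v in DEFAULTS.items()}
    settings["GROQ_API_KEY"] = env["GROQ_API_KEY"]
    return settings
```

A real implementation would more likely use `pydantic-settings` or similar for type coercion and validation.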
```bash
make run-api
# or
uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload
```

API docs available at http://localhost:8000/docs.
```bash
make run-ui
# or
streamlit run ui/app.py --server.port 8501
```

To run the API and UI together:

```bash
make run-all
```

```bash
# Full test suite with coverage
make test

# Unit tests only
pytest tests/unit/ -v

# Integration tests (requires PostgreSQL)
pytest tests/integration/ -v

# Specific test file
pytest tests/unit/test_typology_agents.py -v
```

Coverage reports are generated in `htmlcov/`.
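As a hedged sketch of what a unit test under `tests/unit/` might look like — the helper under test is hypothetical, standing in for project code that real tests would import from `src/`:

```python
# Hypothetical pure helper standing in for project code under test.
def structuring_score(amounts, threshold=10_000):
    """Fraction of transactions falling just under the reporting threshold."""
    if not amounts:
        return 0.0
    near = [a for a in amounts if 0.9 * threshold <= a < threshold]
    return len(near) / len(amounts)

def test_structuring_score_flags_near_threshold():
    # Two of three amounts sit in the [9000, 10000) band.
    assert abs(structuring_score([9500, 9900, 100]) - 2 / 3) < 1e-9

def test_structuring_score_empty_input():
    assert structuring_score([]) == 0.0
```

Functions named `test_*` are picked up automatically by pytest's discovery, so `pytest tests/unit/ -v` runs them without any registration.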
```bash
# Run full evaluation suite
make eval

# With options
python evaluation/run_eval.py --golden-dir data/golden --report-dir evaluation/reports -v
```

Results are logged to MLflow (`sqlite:///mlflow.db`) and written as JSON to `evaluation/reports/`.
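The JSON reports can be post-processed directly; a minimal sketch, assuming each report exposes a top-level `"metrics"` object (that schema is an assumption, not documented above):

```python
import json
from pathlib import Path

def summarize_reports(report_dir="evaluation/reports"):
    """Collect per-run metrics from the JSON reports written by run_eval.py.

    The top-level "metrics" key is an assumption about the report schema.
    """
    summaries = {}
    for path in sorted(Path(report_dir).glob("*.json")):
        with path.open() as f:
            report = json.load(f)
        summaries[path.stem] = report.get("metrics", {})
    return summaries
```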
```bash
make lint
# Runs: ruff check, ruff format --check, mypy
```

The following are not implemented (this is a portfolio project) but would be needed for production:
- Authentication: Add OAuth2/JWT middleware to FastAPI
- TLS: Terminate TLS at a reverse proxy (nginx/Caddy)
- Database: Use connection pooling (pgbouncer), enable WAL archiving
- Secrets: Use a vault (AWS Secrets Manager, HashiCorp Vault) instead of `.env`
- Containers: Dockerfile provided in `docker/` for containerized deployment
- Monitoring: Add Prometheus metrics, structured JSON logging, alerting
- Rate limiting: Add rate limiting to API endpoints
- PII encryption: Encrypt PII mappings at rest in PostgreSQL
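For the rate-limiting item, a stdlib token bucket illustrates the idea; this class is illustrative, not project code — a real deployment would more likely use a gateway-level limiter or a library such as slowapi backed by a shared store:

```python
import time

class TokenBucket:
    """Per-client token bucket: `capacity` bounds bursts, `rate` refills over time."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        """Consume one token if available; return False when the client is limited."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In a FastAPI app, one bucket per client key (e.g. API token or IP) could be checked in a dependency or middleware before the request handler runs.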