The Vector DB module of InboxAI runs ChromaDB as the vector database backend for storing and querying email embeddings. ChromaDB enables high-performance semantic search and similarity matching using OpenAI-generated embeddings.
This service is deployed via Docker Compose and is integrated with Airflow and MLflow for full-stack experiment tracking and query processing.
- Docker and Docker Compose installed on the server
- Port `8000` open and available on the deployment host
- External Docker network `chroma_network` (shared with Airflow)
| Secret Name | Description |
|---|---|
| `HOST` | Public IP or DNS of the target server |
| `SMTP_PASSWORD` | Gmail App Password for notifications |
    git clone https://github.com/yourusername/kprakhar27-inboxai.git
    cd kprakhar27-inboxai/vector_db

Ensure a shared Docker network exists so Chroma can interact with other services (e.g. Airflow):

    docker network create chroma_network
    docker compose up --build -d

This starts the Chroma server at:
📍 http://localhost:8000
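Running `docker network create` a second time fails because the network name is already taken, which can get in the way of repeat deployments. The sketch below is one way to make that step safe to re-run; the function name `ensure_network` is hypothetical and not part of this repository.

```shell
# Create the shared network only if it is not already present.
# The default name matches the network used in this README;
# ensure_network itself is an illustrative helper, not project code.
ensure_network() {
  network="${1:-chroma_network}"
  if docker network inspect "$network" >/dev/null 2>&1; then
    echo "network $network already exists"
  else
    docker network create "$network" >/dev/null && echo "network $network created"
  fi
}
```

A possible usage is `ensure_network && docker compose up --build -d`, so the stack only starts once the network is in place.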
You can check Chroma's health via:
    curl http://localhost:8000/api/v1/heartbeat

Expected response: HTTP 200 OK
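The container may need a few seconds after `docker compose up` before the heartbeat answers, so a single `curl` can fail on a healthy deployment. The following is a minimal polling sketch; the function name `wait_for_chroma` and the default attempt count and delay are assumptions chosen for illustration.

```shell
# Poll a heartbeat URL until it returns HTTP 200 or the attempts run out.
# Arguments: URL, max attempts (default 10), delay in seconds (default 3).
# wait_for_chroma is an illustrative helper, not part of the project.
wait_for_chroma() {
  url="${1:-http://localhost:8000/api/v1/heartbeat}"
  attempts="${2:-10}"
  delay="${3:-3}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s hides progress output, -o discards the body, -w prints only the status code.
    # "|| true" keeps a connection error from aborting scripts that use "set -e".
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    if [ "$code" = "200" ]; then
      echo "Chroma is up (attempt $i)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Chroma did not respond with 200 after $attempts attempts" >&2
  return 1
}
```

Calling `wait_for_chroma` with no arguments polls the local endpoint shown above; pass a different URL as the first argument to target another host or API path.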
The CI/CD pipeline for ChromaDB is defined in .github/workflows/vector-setup.yml. It includes the following steps:
- Checkout repository code
- Perform a health check on the Airflow scheduler & triggerer
- Clean up old ChromaDB deployment (if exists)
- Copy and deploy updated `docker-compose.yml` for Chroma
- Verify health of ChromaDB via `/api/v2/heartbeat`
- Send email notifications on success or failure
The workflow checks for successful deployment using:
    curl -s -o /dev/null -w "%{http_code}" http://<HOST>:8000/api/v2/heartbeat

Expected HTTP status: 200
If the check fails, the workflow exits and sends an error alert to the configured email.
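The workflow file itself is not reproduced here, so the following is only a sketch of how such a status gate could look in a shell step. The function name `verify_deployment` and the `DEPLOY_HOST` variable in the usage note are hypothetical stand-ins; the real workflow reads the target from the `HOST` secret.

```shell
# Return 0 only when the heartbeat endpoint answers with HTTP 200.
# The first argument is the target host (the HOST secret in the real workflow);
# verify_deployment is an illustrative helper, not the workflow's actual code.
verify_deployment() {
  target_host="$1"
  status="$(curl -s -o /dev/null -w '%{http_code}' \
    "http://${target_host}:8000/api/v2/heartbeat" || true)"
  if [ "$status" != "200" ]; then
    echo "ChromaDB health check failed: got HTTP ${status}" >&2
    return 1
  fi
  echo "ChromaDB health check passed"
}
```

In a workflow step this might be invoked as `verify_deployment "$DEPLOY_HOST" || exit 1`. The non-zero exit marks the step as failed, which a later notification step can react to.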