Docker Setup for DataDetox

This guide explains how to run the entire DataDetox application using Docker Compose.

Prerequisites

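- Docker Desktop installed (Get Docker)
- Docker Compose (included with Docker Desktop). If only Compose V2 is installed, use docker compose (with a space) wherever this guide shows docker-compose.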

Quick Start

1. Copy the environment file:
```
cp .env.example .env
```

2. Edit .env with your credentials:
```
# Required: your OpenAI API key
OPENAI_API_KEY=sk-proj-...

# Required: your HuggingFace token
HF_TOKEN=hf_...

# Required: your Neo4j credentials
NEO4J_URI=neo4j+s://your-instance.databases.neo4j.io
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password
```

3. Start all services:
```
docker-compose up --build
```

Or run in detached (background) mode:
```
docker-compose up -d --build
```

4. Access the application:
- Frontend: http://localhost:3000
- Chatbot: http://localhost:3000/chatbot
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
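If the frontend is opened before the backend has finished starting, its first requests may fail. The hypothetical wait_for_url helper below is a minimal sketch (not part of the DataDetox repository; it assumes curl is installed) that polls a URL once per second until it returns a successful HTTP status or a timeout in seconds expires.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DataDetox): poll a URL until it answers
# with a successful HTTP status, or give up after a timeout. Requires curl.
wait_for_url() {
  local url="$1" timeout="${2:-60}" waited=0
  # -f: treat HTTP 4xx/5xx as failure; -s: silent; --max-time: cap each attempt
  until curl -fs --max-time 5 -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "$url is up"
}
```

For example, after docker-compose up -d --build, run wait_for_url http://localhost:8000/docs 120 and open the frontend once it reports that the URL is up.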

Services

The docker-compose.yml starts the following services:

1. Backend (Port 8000)

- FastAPI application
- Handles search agent queries
- Connects to Neo4j and HuggingFace

2. Frontend (Port 3000)

- React + Vite application
- User interface for the chatbot
- Pre-configured to connect to the backend at http://localhost:8000

3. Neo4j (Ports 7474, 7687) - Optional

- Graph database for model lineage
- Neo4j Browser UI at http://localhost:7474 (7687 is Neo4j's Bolt port, used by drivers)
- You can use a cloud Neo4j instance (such as Neo4j Aura) instead by setting its URI and credentials in .env

4. Model Lineage Scraper - Optional

- Scrapes HuggingFace model relationships
- Populates the Neo4j database

Common Commands

Start services
```
docker-compose up
```

Start in background
```
docker-compose up -d
```

Rebuild images and start containers
```
docker-compose up --build
```

Stop and remove the containers
```
docker-compose down
```

View logs
```
# All services
docker-compose logs -f

# Specific service
docker-compose logs -f backend
docker-compose logs -f frontend
```

Restart a service
```
docker-compose restart backend
```

Troubleshooting

Port already in use

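If you get "port already in use" errors, first check whether containers from an earlier run are still holding the ports; docker-compose down releases them. Otherwise, stop the process using the port (macOS/Linux; lsof -i:3000 shows what it is before you kill it):
```
# Force-kill whatever is listening on ports 3000 and 8000
lsof -ti:3000 | xargs kill -9
lsof -ti:8000 | xargs kill -9
```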
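To see which ports are taken without killing anything, a check like the following can help. It is a hypothetical sketch, not part of the project: it relies on bash's /dev/tcp redirection (so it must run under bash, not sh) and treats a port as in use if a TCP connection to 127.0.0.1 on that port succeeds.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: report busy ports without killing any process.
# Needs bash, because /dev/tcp is a bash redirection feature.
port_in_use() {
  # Succeeds only if something accepts a TCP connection on 127.0.0.1:$1.
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

check_ports() {
  local busy=0 port
  for port in "$@"; do
    if port_in_use "$port"; then
      echo "Port $port is in use"
      busy=1
    fi
  done
  # Non-zero exit status if any port was busy
  return "$busy"
}
```

For example, check_ports 3000 8000 7474 7687 prints each busy port and returns a non-zero status if any of them is taken.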

Environment variables not loading

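Make sure:

- The .env file exists in the project root, next to docker-compose.yml
- All required variables are set
- Values are not wrapped in extra quotes
- After editing .env, you recreate the containers with docker-compose up -d (docker-compose restart reuses the existing container configuration, so it may not pick up the change)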
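Part of this checklist can be automated. The hypothetical check_env_file function below is a sketch, not part of the project: it assumes the five variables from the Quick Start are the complete required set, and it flags missing or empty values as well as values wrapped in quotes.

```shell
#!/usr/bin/env bash
# Hypothetical check: the required keys are assumed to be the five from Quick Start.
REQUIRED_VARS="OPENAI_API_KEY HF_TOKEN NEO4J_URI NEO4J_USER NEO4J_PASSWORD"

check_env_file() {
  local file="${1:-.env}" status=0 var line value
  if [ ! -f "$file" ]; then
    echo "$file not found (run: cp .env.example .env)" >&2
    return 1
  fi
  for var in $REQUIRED_VARS; do
    # Last uncommented assignment of this key wins; "# KEY=..." lines never match.
    line=$(grep -E "^${var}=" "$file" | tail -n 1)
    value=${line#*=}
    case "$value" in
      "") echo "Missing or empty: $var" >&2; status=1 ;;
      \"*\"|\'*\') echo "Quoted value (remove the quotes): $var" >&2; status=1 ;;
    esac
  done
  return "$status"
}
```

Run it from the project root as check_env_file .env; it prints one line per problem and returns 0 when the file passes.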

Backend can't connect to Neo4j

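Check your Neo4j credentials in .env:

- Ensure NEO4J_URI includes the URI scheme, e.g. neo4j+s:// for a Neo4j Aura cloud instance
- Verify that the username and password are correct
- Test the connection at https://neo4j.com/cloud (cloud instance) or in the Neo4j Browser at http://localhost:7474 (local container)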
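To catch a missing or mistyped scheme before starting the backend, the hypothetical check_neo4j_uri function below accepts only the URI schemes understood by the official Neo4j drivers (neo4j and bolt, plus their +s and +ssc variants). It is a sketch that only checks the prefix; it cannot tell whether the host or credentials are correct.

```shell
#!/usr/bin/env bash
# Hypothetical check: does NEO4J_URI start with a Neo4j driver scheme
# followed by at least one character of host?
check_neo4j_uri() {
  local uri="$1"
  case "$uri" in
    neo4j://?* | neo4j+s://?* | neo4j+ssc://?* | bolt://?* | bolt+s://?* | bolt+ssc://?*)
      return 0 ;;
    *)
      echo "NEO4J_URI '$uri' needs a scheme such as neo4j+s://host" >&2
      return 1 ;;
  esac
}
```

For example, to check the value in .env: check_neo4j_uri "$(grep -E '^NEO4J_URI=' .env | cut -d= -f2-)"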

Frontend shows API errors

- Check that the backend is running: docker-compose ps
- View the backend logs: docker-compose logs backend
- Test the API directly: http://localhost:8000/docs

Development Mode

For development with hot-reload, you can run services separately:

Backend only
```
docker-compose up backend
```

Frontend only (assumes the backend is already running)
```
docker-compose up frontend
```

Production Deployment

For production:

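- Remove the development volume mounts (used for hot reload) from docker-compose.yml
- Use production-grade secrets management instead of a plain .env file
- Set strong Neo4j authentication rather than default credentials
- Restrict CORS in the backend to your production frontend origin
- Use environment-specific .env files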

File Structure

```
.
├── docker-compose.yml      # Orchestrates all services
├── .env                    # Your secrets (git-ignored)
├── .env.example            # Template for .env
├── backend/
│   ├── Dockerfile          # Backend container config
│   └── .dockerignore       # Files to exclude from build
└── frontend/
    ├── Dockerfile          # Frontend container config
    └── .dockerignore       # Files to exclude from build
```
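Before running docker-compose, a quick check that the files in this tree are present can save a confusing build error. The hypothetical check_layout function below is a sketch that takes the project root as an argument (defaulting to the current directory); it skips the .dockerignore files, since an image builds without them.

```shell
#!/usr/bin/env bash
# Hypothetical check: files from the tree above that the setup relies on.
EXPECTED_FILES="docker-compose.yml .env .env.example backend/Dockerfile frontend/Dockerfile"

check_layout() {
  local root="${1:-.}" missing=0 f
  for f in $EXPECTED_FILES; do
    if [ ! -f "$root/$f" ]; then
      echo "Missing: $f" >&2
      missing=1
    fi
  done
  # 0 when every expected file exists, 1 otherwise
  return "$missing"
}
```

Run it as check_layout from the project root, or check_layout path/to/DataDetox from elsewhere.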

Next Steps

After starting the services:

- Visit http://localhost:3000/chatbot
- Try example queries like "Tell me about BERT models"
- Check the backend logs (docker-compose logs -f backend) to see the agent in action
- Explore the API at http://localhost:8000/docs
