AI-powered assistant for querying San Francisco government data, built for NVIDIA Spark Hack 2026.
SF GPT provides natural language access to San Francisco's 160,000+ government datasets from data.sfgov.org. Using NVIDIA AI models running locally on Dell GB10 (DGX Spark), it enables:
- Natural Language Queries - Ask questions about any SF government data
- Semantic Search - Find relevant information across all datasets
- Voice Input - Speak your questions using ASR
- SQL Generation - Generate SQL queries from natural language
- RAG Pipeline - Retrieval-augmented generation for accurate answers
| Model | HuggingFace ID | VRAM | Purpose |
|---|---|---|---|
| Omni-Embed-Nemotron-3B | nvidia/omni-embed-nemotron-3b | ~10 GB | Universal multimodal embeddings |
| Nemotron 3 Nano | nvidia/nemotron-3-nano | ~8 GB | Reasoning & generation |
| Nemotron Speech ASR | nvidia/nemotron-speech-asr | ~2 GB | Voice transcription |
| PersonaPlex-7B (optional) | nvidia/personaplex-7b-v1 | ~14 GB | Conversational voice |

Total VRAM: ~20 GB for the three core models (~34 GB with the optional voice model); the GB10 has 128 GB of unified memory.
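The VRAM budget above can be sanity-checked before downloading anything. This is an illustrative sketch, not part of the repo; the model keys mirror the `--models` names used by `download_models.py` below, and the sizes are the approximate figures from the table.

```python
# Sketch: check a set of models against the GB10's 128 GB unified memory.
# Sizes are the approximate VRAM figures from the table above.
MODEL_VRAM_GB = {
    "embeddings": 10,  # Omni-Embed-Nemotron-3B
    "llm": 8,          # Nemotron 3 Nano
    "asr": 2,          # Nemotron Speech ASR
    "voice": 14,       # PersonaPlex-7B (optional)
}

def vram_needed(models: list[str]) -> int:
    """Total approximate VRAM (GB) for a set of models."""
    return sum(MODEL_VRAM_GB[m] for m in models)

def fits_on_gb10(models: list[str], memory_gb: int = 128) -> bool:
    """True if the selected models fit in the GB10's unified memory."""
    return vram_needed(models) <= memory_gb

print(vram_needed(["embeddings", "llm", "asr"]))            # → 20
print(fits_on_gb10(["embeddings", "llm", "asr", "voice"]))  # → True
```

Even with the optional voice model (~34 GB total), the workload fits comfortably in the GB10's 128 GB.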
- uv - Fast Python package manager
- bun - Fast JavaScript runtime (for frontend)
- Python 3.12+
- CUDA-capable GPU (or run on CPU with reduced performance)
```bash
# Install uv (macOS/Linux)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and set up
cd ~/Documents/311ML
uv sync

# Activate the virtual environment
source .venv/bin/activate
```

```bash
# Download the required models (embeddings, llm, asr)
uv run python scripts/download_models.py

# Or download specific models
uv run python scripts/download_models.py --models embeddings llm

# Download everything, including the optional voice model
uv run python scripts/download_models.py --models all
```

```bash
# Start the FastAPI server
uv run uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload

# Or run it with Python directly
uv run python -m backend.main
```
```bash
# Health check
curl http://localhost:8000/api/health

# Query SF data (requires the index to be built)
curl -X POST http://localhost:8000/api/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the population of San Francisco?"}'

# Semantic search
curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "crime statistics Mission District"}'
```

| Endpoint | Method | Description |
|---|---|---|
| /api/health | GET | System health and model status |
| /api/health/gpu | GET | Detailed GPU information |
| /api/query | POST | Natural language query with RAG |
| /api/query/sql | POST | Generate SQL from natural language |
| /api/query/examples | GET | Example queries |
| /api/search | POST | Semantic search across datasets |
| /api/search/status | GET | Search index status |
| /api/voice/transcribe | POST | Audio-to-text transcription |
| /api/voice/query | POST | Voice-based query (ASR + RAG) |
| /api/datasets | GET | List available datasets |
| /api/datasets/{id} | GET | Get dataset details |
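For programmatic access, the query endpoint can be called with nothing but the standard library. A minimal sketch: the request body (`question`) matches the curl example above, but the helper names and the response shape are assumptions for illustration.

```python
# Sketch: calling POST /api/query with only the Python standard library.
# The payload field "question" matches the curl examples; everything else
# here (function names, response shape) is illustrative.
import json
from urllib import request

API_BASE = "http://localhost:8000"

def build_query(question: str) -> bytes:
    """Serialize the JSON body expected by POST /api/query."""
    return json.dumps({"question": question}).encode("utf-8")

def ask(question: str) -> dict:
    """Send a natural-language question to the RAG query endpoint."""
    req = request.Request(
        API_BASE + "/api/query",
        data=build_query(question),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# With the server running:
#   ask("What is the population of San Francisco?")
```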
```
311ML/
├── pyproject.toml                # Python dependencies (uv)
├── .python-version               # Python version (3.12)
├── backend/
│   ├── __init__.py
│   ├── main.py                   # FastAPI app entry point
│   ├── config.py                 # Model & app configuration
│   ├── api/
│   │   ├── schemas.py            # Pydantic models
│   │   └── routes/
│   │       ├── health.py         # Health endpoints
│   │       ├── query.py          # RAG query endpoints
│   │       ├── search.py         # Semantic search
│   │       ├── voice.py          # ASR endpoints
│   │       └── datasets.py       # Dataset management
│   ├── ml/
│   │   ├── embeddings.py         # Omni-Embed + FAISS
│   │   ├── llm.py                # Nemotron LLM
│   │   ├── asr.py                # Speech recognition
│   │   └── rag.py                # RAG pipeline orchestrator
│   └── ingestion/
│       ├── sf_data_client.py     # SF Open Data API client
│       └── document_processor.py # Document chunking
├── scripts/
│   ├── download_models.py        # Model downloader
│   ├── process_data.py           # Data processing
│   └── setup_gb10.sh             # GB10 setup
├── frontend-resident/            # Public-facing app (bun)
├── frontend-dashboard/           # Internal dashboard (bun)
└── data/
    ├── raw/                      # Downloaded datasets
    ├── processed/                # Processed data
    └── embeddings/               # FAISS index
```
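The ingestion layer's document chunking (handled by `document_processor.py`) can be sketched as a fixed-size word chunker with overlap. This is an illustration of the general technique, not the repo's actual API; `chunk_text` and its parameters are hypothetical.

```python
# Sketch: fixed-size chunking with overlap, the kind of preprocessing a
# document processor performs before embedding. Names and defaults are
# illustrative, not the repo's actual interface.
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of ~chunk_size words, overlapping by `overlap`."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already covers the tail
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both sides, at the cost of some duplicated embedding work.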
SF GPT can query data across all San Francisco government categories:
- City Infrastructure - Streets, utilities, facilities
- Public Safety - Police, fire, emergency services
- Health and Social Services - Healthcare, social programs
- Transportation - MUNI, parking, traffic
- Housing and Buildings - Permits, inspections, housing
- Economy and Community - Businesses, jobs, events
- Energy and Environment - Sustainability, parks
- Geographic Locations - Boundaries, districts, maps
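Underneath, data.sfgov.org exposes these datasets through the Socrata Open Data (SODA) API, which the ingestion client queries per dataset ID. A sketch of building such a request URL; the helper name is hypothetical, and the `$limit`/`$where` parameters are standard SODA query options.

```python
# Sketch: building a Socrata (SODA) request URL for data.sfgov.org.
# Each dataset has a short ID; pass any dataset's ID as dataset_id.
from urllib.parse import urlencode

SODA_BASE = "https://data.sfgov.org/resource"

def soda_url(dataset_id: str, limit: int = 100, where: str = "") -> str:
    """URL for fetching up to `limit` rows of a dataset, optionally filtered."""
    params = {"$limit": limit}
    if where:
        params["$where"] = where  # SoQL filter expression
    return f"{SODA_BASE}/{dataset_id}.json?{urlencode(params)}"

print(soda_url("xxxx-xxxx", limit=5))  # "xxxx-xxxx" is a placeholder ID
```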
```bash
# Install dependencies
uv sync

# Run tests
uv run pytest

# Format code
uv run ruff format .

# Lint code
uv run ruff check .
```

```bash
# Ingest SF Open Data (downloads and processes datasets)
uv run python scripts/ingest_sfdata.py

# Build the FAISS embeddings index
uv run python scripts/build_index.py
```

```bash
# SSH into the GB10
ssh <username>@<gb10-ip>

# Clone the repository
git clone <repo-url>
cd 311ML

# Run the setup script
bash scripts/setup_gb10.sh

# Start the API
uv run uvicorn backend.main:app --host 0.0.0.0 --port 8000
```

- Omni-Embed-Nemotron-3B for universal embeddings
- Nemotron 3 Nano for reasoning and generation
- All NVIDIA models running locally on GB10
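The embed-then-retrieve step of the RAG pipeline reduces to nearest-neighbor search over embedding vectors. A toy pure-Python version for illustration; the real pipeline uses Omni-Embed-Nemotron vectors and a FAISS index, and the tiny hand-made vectors below stand in for actual embeddings.

```python
# Sketch: the "retrieve" step of a RAG pipeline as cosine-similarity
# search. In production this is Omni-Embed vectors + FAISS; the vectors
# below are hand-made stand-ins.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, doc_vecs: dict, k: int = 2) -> list[str]:
    """IDs of the k documents most similar to the query vector."""
    ranked = sorted(doc_vecs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

docs = {
    "muni":    [0.9, 0.1, 0.0],
    "parks":   [0.1, 0.9, 0.0],
    "permits": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.0, 0.0], docs, k=2))  # → ['muni', 'parks']
```

The retrieved document IDs are then used to pull chunk text into the LLM prompt for generation.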
- Dell GB10 (DGX Spark) uses the NVIDIA GB10 Grace Blackwell Superchip with an Arm-based CPU
- 128 GB of unified memory eliminates explicit CPU-to-GPU transfer overhead
- Optimized for edge deployment
- Access to 160,000+ government datasets via natural language
- Democratizes access to city data
- Supports government transparency
| Role | Focus |
|---|---|
| Backend/ML Lead | NVIDIA models, RAG pipeline, API |
| Frontend - Resident | Public query interface |
| Frontend - Dashboard | Admin & analytics dashboard |
| Data Engineering | Ingestion, embeddings, infrastructure |
MIT License - Built for NVIDIA Spark Hack 2026