SF Intelligence Platform - FastAPI backend with RAG, FAISS, vLLM

bledden/sf-intelligence-backend

SF GPT - San Francisco Government AI Assistant

AI-powered assistant for querying San Francisco government data, built for NVIDIA Spark Hack 2026.

Overview

SF GPT provides natural language access to San Francisco's 160,000+ government datasets from data.sfgov.org. Using NVIDIA AI models running locally on Dell GB10 (DGX Spark), it enables:

  1. Natural Language Queries - Ask questions about any SF government data
  2. Semantic Search - Find relevant information across all datasets
  3. Voice Input - Speak your questions using ASR
  4. SQL Generation - Generate SQL queries from natural language
  5. RAG Pipeline - Retrieval-augmented generation for accurate answers
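The RAG flow in item 5 can be sketched end to end: retrieve relevant context, build a prompt around it, and generate an answer. Everything below is an illustrative stub — the real pipeline lives in backend/ml/rag.py, and retrieve and generate here are hypothetical stand-ins for the FAISS search and the Nemotron call.

```python
def retrieve(question: str, corpus: list[str], top_k: int = 3) -> list[str]:
    # Stand-in for the FAISS semantic search: naive keyword-overlap scoring.
    words = set(question.lower().split())
    scored = sorted(corpus, key=lambda d: -len(words & set(d.lower().split())))
    return scored[:top_k]

def generate(prompt: str) -> str:
    # Stand-in for the Nemotron 3 Nano call; just echoes the prompt's last line.
    return "ANSWER based on: " + prompt.splitlines()[-1]

def answer(question: str, corpus: list[str]) -> str:
    # Retrieval-augmented generation: retrieved context is prepended to the question.
    context = "\n".join(retrieve(question, corpus))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

docs = ["SF population estimates", "MUNI ridership data", "building permits"]
print(answer("What is the population of SF?", docs))
```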

Models Used (January 2026)

| Model | HuggingFace ID | VRAM | Purpose |
| --- | --- | --- | --- |
| Omni-Embed-Nemotron-3B | nvidia/omni-embed-nemotron-3b | ~10 GB | Universal multimodal embeddings |
| Nemotron 3 Nano | nvidia/nemotron-3-nano | ~8 GB | Reasoning & generation |
| Nemotron Speech ASR | nvidia/nemotron-speech-asr | ~2 GB | Voice transcription |
| PersonaPlex-7B (optional) | nvidia/personaplex-7b-v1 | ~14 GB | Conversational voice |

Total VRAM: ~20 GB for the required models (~34 GB with the optional PersonaPlex-7B); the GB10's 128 GB of unified memory leaves ample headroom.
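The totals can be tallied from the table (figures copied from above; this is only a budgeting sketch, not a runtime memory check):

```python
# Approximate VRAM budget per model, in GB (from the models table)
MODEL_VRAM_GB = {
    "omni-embed-nemotron-3b": 10,
    "nemotron-3-nano": 8,
    "nemotron-speech-asr": 2,
    "personaplex-7b-v1": 14,  # optional conversational voice model
}

def required_vram(include_optional: bool = False) -> int:
    """Sum the VRAM budget, optionally including PersonaPlex-7B."""
    models = ["omni-embed-nemotron-3b", "nemotron-3-nano", "nemotron-speech-asr"]
    if include_optional:
        models.append("personaplex-7b-v1")
    return sum(MODEL_VRAM_GB[m] for m in models)

print(required_vram())      # 20
print(required_vram(True))  # 34
```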

Quick Start

Prerequisites

  • uv - Fast Python package manager
  • bun - Fast JavaScript runtime (for frontend)
  • Python 3.12+
  • CUDA-capable GPU (or run on CPU with reduced performance)

1. Install Dependencies

# Install uv (macOS/Linux)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository and set up
git clone <repo-url>
cd 311ML
uv sync

# Activate virtual environment
source .venv/bin/activate

2. Download Models

# Download required models (embeddings, llm, asr)
uv run python scripts/download_models.py

# Or download specific models
uv run python scripts/download_models.py --models embeddings llm

# Download all including optional voice model
uv run python scripts/download_models.py --models all

3. Start the API

# Start FastAPI server
uv run uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload

# Or using python directly
uv run python -m backend.main

4. Test the API

# Health check
curl http://localhost:8000/api/health

# Query SF data (requires index to be built)
curl -X POST http://localhost:8000/api/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the population of San Francisco?"}'

# Semantic search
curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "crime statistics Mission District"}'
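The same calls can be made from Python using only the standard library. Paths and payloads below are taken from the curl examples above; the helper names are illustrative, not part of the project.

```python
import json
import urllib.request

API = "http://localhost:8000"  # adjust host/port to your deployment

def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the given API path."""
    return urllib.request.Request(
        API + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def post_json(path: str, payload: dict) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(post_json("/api/query", {"question": "What is the population of San Francisco?"}))
    print(post_json("/api/search", {"query": "crime statistics Mission District"}))
```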

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/health | GET | System health and model status |
| /api/health/gpu | GET | Detailed GPU information |
| /api/query | POST | Natural language query with RAG |
| /api/query/sql | POST | Generate SQL from natural language |
| /api/query/examples | GET | Example queries |
| /api/search | POST | Semantic search across datasets |
| /api/search/status | GET | Search index status |
| /api/voice/transcribe | POST | Audio to text transcription |
| /api/voice/query | POST | Voice-based query (ASR + RAG) |
| /api/datasets | GET | List available datasets |
| /api/datasets/{id} | GET | Get dataset details |
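The request shapes for the POST query endpoints can be sketched as typed dictionaries. Only question and query appear in the curl examples; the response fields shown are assumptions — the authoritative Pydantic schemas live in backend/api/schemas.py.

```python
from typing import TypedDict

class QueryRequest(TypedDict):
    question: str  # natural language question for /api/query

class SearchRequest(TypedDict):
    query: str  # free-text search for /api/search

class QueryResponse(TypedDict, total=False):
    # Hypothetical response fields; check backend/api/schemas.py for the real ones.
    answer: str
    sources: list[str]

req: QueryRequest = {"question": "What is the population of San Francisco?"}
print(req["question"])
```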

Project Structure

311ML/
├── pyproject.toml              # Python dependencies (uv)
├── .python-version             # Python version (3.12)
├── backend/
│   ├── __init__.py
│   ├── main.py                 # FastAPI app entry point
│   ├── config.py               # Model & app configuration
│   ├── api/
│   │   ├── schemas.py          # Pydantic models
│   │   └── routes/
│   │       ├── health.py       # Health endpoints
│   │       ├── query.py        # RAG query endpoints
│   │       ├── search.py       # Semantic search
│   │       ├── voice.py        # ASR endpoints
│   │       └── datasets.py     # Dataset management
│   ├── ml/
│   │   ├── embeddings.py       # Omni-Embed + FAISS
│   │   ├── llm.py              # Nemotron LLM
│   │   ├── asr.py              # Speech recognition
│   │   └── rag.py              # RAG pipeline orchestrator
│   └── ingestion/
│       ├── sf_data_client.py   # SF Open Data API client
│       └── document_processor.py # Document chunking
├── scripts/
│   ├── download_models.py      # Model downloader
│   ├── process_data.py         # Data processing
│   └── setup_gb10.sh           # GB10 setup
├── frontend-resident/          # Public-facing app (bun)
├── frontend-dashboard/         # Internal dashboard (bun)
└── data/
    ├── raw/                    # Downloaded datasets
    ├── processed/              # Processed data
    └── embeddings/             # FAISS index
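document_processor.py handles document chunking before embedding. A minimal sliding-window chunker looks like the following — the window and overlap sizes are illustrative, not the project's actual defaults:

```python
def chunk_text(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows for embedding.

    Overlap keeps sentences that straddle a boundary retrievable
    from at least one chunk.
    """
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

sample = "".join(str(i % 10) for i in range(1000))
print(len(chunk_text(sample)))  # 3 windows over 1000 characters
```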

Data Categories

SF GPT can query data across all San Francisco government categories:

  • City Infrastructure - Streets, utilities, facilities
  • Public Safety - Police, fire, emergency services
  • Health and Social Services - Healthcare, social programs
  • Transportation - MUNI, parking, traffic
  • Housing and Buildings - Permits, inspections, housing
  • Economy and Community - Businesses, jobs, events
  • Energy and Environment - Sustainability, parks
  • Geographic Locations - Boundaries, districts, maps

Development

Backend Commands

# Install dependencies
uv sync

# Run tests
uv run pytest

# Format code
uv run ruff format .

# Lint code
uv run ruff check .

Building the Search Index

# Ingest SF Open Data (downloads and processes datasets)
uv run python scripts/ingest_sfdata.py

# Build FAISS embeddings index
uv run python scripts/build_index.py
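What build_index.py produces is an embedding index queried by inner product. The dependency-free sketch below shows that flow with a toy hash-based embedder standing in for Omni-Embed-Nemotron-3B and a brute-force scan standing in for the FAISS index; it only illustrates the mechanics, not the project's code.

```python
import math

def embed(text: str, dim: int = 8) -> list[float]:
    # Toy stand-in for Omni-Embed-Nemotron-3B: hash character bigrams
    # into a fixed-size vector, then L2-normalize it.
    vec = [0.0] * dim
    for a, b in zip(text, text[1:]):
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def search(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    # Brute-force inner-product search, the same scoring FAISS's
    # IndexFlatIP performs over precomputed embeddings.
    q = embed(query)
    scored = sorted(
        corpus,
        key=lambda doc: -sum(a * b for a, b in zip(q, embed(doc))),
    )
    return scored[:top_k]

docs = ["crime reports by district", "muni arrival times", "crime statistics 2024"]
print(search("crime data", docs))
```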

GB10 Deployment

# SSH into GB10
ssh <username>@<gb10-ip>

# Clone repository
git clone <repo-url>
cd 311ML

# Run setup script
bash scripts/setup_gb10.sh

# Start API
uv run uvicorn backend.main:app --host 0.0.0.0 --port 8000

Hackathon Bounties

NVIDIA Nemotron Track

  • Omni-Embed-Nemotron-3B for universal embeddings
  • Nemotron 3 Nano for reasoning and generation
  • All NVIDIA models running locally on GB10

Arm Architecture Innovation

  • Dell GB10 (DGX Spark) is built on the NVIDIA GB10 Grace Blackwell superchip with an Arm-based CPU
  • 128GB unified memory shared by CPU and GPU, eliminating explicit host-device copies
  • Optimized for edge deployment

Human Impact

  • Access to 160,000+ government datasets via natural language
  • Democratizes access to city data
  • Supports government transparency

Team

| Role | Focus |
| --- | --- |
| Backend/ML Lead | NVIDIA models, RAG pipeline, API |
| Frontend - Resident | Public query interface |
| Frontend - Dashboard | Admin & analytics dashboard |
| Data Engineering | Ingestion, embeddings, infrastructure |

License

MIT License - Built for NVIDIA Spark Hack 2026
