@@ -36,11 +36,11 @@ The code in this repository implements the **NeuralNav Phase 2 MVP** with produc
 ## Prerequisites
 
 **Required before running `make setup`:**
+
 - **macOS or Linux** (Windows via WSL2)
 - **Docker Desktop** (must be running)
-
-**Installed automatically by `make setup`:**
-- **Python 3.11+**
+- **Python 3.13** - `brew install python@3.13`
+- **uv** - `curl -LsSf https://astral.sh/uv/install.sh | sh`
 - **Ollama** - `brew install ollama`
 - **kubectl** - `brew install kubectl`
 - **KIND** - `brew install kind`
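Since this hunk moves Python and uv from "installed automatically" to "required", a quick pre-flight check before `make setup` may be useful. The following is a minimal sketch, not part of the repository: the command names are assumptions derived from the tools listed above (`python3.13` in particular depends on how Python was installed), and finding `docker` on `PATH` does not confirm that Docker Desktop is actually running.

```python
import shutil

# Assumed executable names for the tools listed in the prerequisites;
# adjust them to match your installation.
REQUIRED_COMMANDS = ["docker", "python3.13", "uv", "ollama", "kubectl", "kind"]


def missing_commands(commands, which=shutil.which):
    """Return the commands from `commands` that cannot be found on PATH."""
    return [name for name in commands if which(name) is None]


if __name__ == "__main__":
    missing = missing_commands(REQUIRED_COMMANDS)
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All listed prerequisites were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.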
@@ -71,41 +71,14 @@ make cluster-stop # Delete cluster (optional)
 
 1. **Describe your use case** in the chat interface
    - Example: "I need a customer service chatbot for 5000 users with low latency"
-2. **Review recommendations** - Model, GPU configuration, SLO predictions, costs
-3. **Edit specifications** if needed (traffic, SLO targets, constraints)
-4. **Generate deployment YAML** - Click "Generate Deployment YAML"
-5. **Deploy to cluster** - Click "Deploy to Kubernetes"
-6. **Monitor deployment** - Switch to "Deployment Management" tab to see status
-7. **Test inference** - Send test prompts once deployment is Ready
-
-## Demo Scenarios
-
-The POC includes 3 pre-configured scenarios (see [data/configuration/demo_scenarios.json](data/configuration/demo_scenarios.json)):
-
-1. **Customer Service Chatbot** - High volume (5000 users), strict latency (<500ms)
-   - Expected: Llama 3.1 8B on 2x A100-80GB
-
-2. **Code Generation Assistant** - Developer team (500 users), quality > speed
-   - Expected: Llama 3.1 70B on 4x A100-80GB (tensor parallel)
+2. **Analyze use case** - Click "Analyze Use Case" to extract intent
+3. **Generate specification** - Click "Generate Specification" to create traffic profile and SLO targets
+4. **Review specification** - Edit SLO targets, priorities, or constraints if needed
+5. **Generate recommendations** - Click "Generate Recommendations" to find optimal configurations
+6. **Select a recommendation** - Review ranked options and click "Select"
+7. **Deploy** - Go to the "Deployment" tab to review, copy, or download generated deployment files
 
-3. **Document Summarization** - Batch processing (2000 users/day), cost-sensitive
-   - Expected: Mistral 7B on 2x A10G
-
-## Architecture Highlights
-
-NeuralNav implements an **8-component architecture** with:
-
-- **Conversational Interface** (Streamlit) - Chat-based requirement gathering with interactive exploration
-- **Context & Intent Engine** - LLM-powered extraction of deployment specs
-- **Recommendation Engine** - Traffic profiling, model scoring, capacity planning
-- **Deployment Automation** - YAML generation and Kubernetes deployment
-- **Knowledge Base** - Benchmarks, SLO templates, model catalog
-- **LLM Backend** - Ollama (qwen2.5:7b) for conversational AI and business context extraction
-- **Orchestration** - Multi-step workflow coordination
-- **Inference Observability** - Real-time deployment monitoring
-
-**Development Tools:**
-- **vLLM Simulator** - GPU-free local development and testing
+## Architecture
 
 See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for detailed system design.
 
@@ -128,10 +101,10 @@ See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for detailed system design.
 | Backend | FastAPI, Pydantic |
 | Frontend | Streamlit |
 | LLM | Ollama (qwen2.5:7b) |
-| Data | **PostgreSQL (Phase 2)**, psycopg2, JSON (Phase 1 - deprecated) |
+| Data | PostgreSQL |
 | YAML Generation | Jinja2 templates |
 | Kubernetes | KIND (local), KServe v0.13.0 |
-| Deployment | kubectl, Kubernetes Python client |
+| Deployment | kubectl |
 
 
 ## Development Commands
@@ -142,8 +115,8 @@ make start # Start all services (DB + Ollama + Backend + UI)
 make stop # Stop Backend + UI (leaves Ollama and DB running)
 make stop-all # Stop everything including Ollama and DB
 make restart # Restart all services
-make logs-backend # Tail backend logs
-make logs-ui # Tail UI logs
+make logs-backend # Show backend logs
+make logs-ui # Show UI logs
 
 # Database (PostgreSQL)
 make db-start # Start PostgreSQL (initializes schema on first run)
@@ -178,16 +151,10 @@ NeuralNav includes a **GPU-free simulator** for local development:
 - **Realistic latency** - Uses benchmark data to simulate TTFT/ITL
 - **Fast deployment** - Pods become Ready in ~10-15 seconds
 
-**Simulator Mode (default):**
-```python
-# In src/neuralnav/api/routes.py
-deployment_generator = DeploymentGenerator(simulator_mode=True)
-```
+The deployment mode defaults to **production** (real vLLM with GPUs). Switch between production and simulator modes at runtime using the **Configuration** tab in the UI, or via the REST API:
 
-**Production Mode (requires GPU cluster):**
-```python
-deployment_generator = DeploymentGenerator(simulator_mode=False)
-```
+- `GET /api/v1/deployment-mode` - Check current mode
+- `PUT /api/v1/deployment-mode` - Set mode (`{"mode": "simulator"}` or `{"mode": "production"}`)
 
 See [docs/DEVELOPER_GUIDE.md](docs/DEVELOPER_GUIDE.md#vllm-simulator-details) for details.
 
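The hunk above replaces the hard-coded `simulator_mode` flag with two REST endpoints. As a rough illustration of how a client might call them, here is a sketch using only the Python standard library. The endpoint path and the two JSON payloads come from the added README lines; the base URL `http://localhost:8000`, the helper names, and the assumption that the server replies with JSON are all hypothetical and should be checked against the actual backend.

```python
import json
import urllib.request

# Assumption: the backend is reachable here; change to match your setup.
BASE_URL = "http://localhost:8000"
# Path as documented in the README lines added by this hunk.
MODE_PATH = "/api/v1/deployment-mode"
VALID_MODES = ("production", "simulator")


def build_get_mode_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (without sending) the GET request that checks the current mode."""
    return urllib.request.Request(base_url + MODE_PATH, method="GET")


def build_set_mode_request(mode: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (without sending) the PUT request that sets the deployment mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    body = json.dumps({"mode": mode}).encode("utf-8")
    return urllib.request.Request(
        base_url + MODE_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request: urllib.request.Request, timeout: float = 5.0):
    """Send a prepared request and decode the reply as JSON.

    The response schema is not described in the README, so the decoded
    value is returned as-is.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `send(build_set_mode_request("simulator"))` would switch to the GPU-free simulator and `send(build_get_mode_request())` would read the mode back. Building and sending are kept separate so the request construction can be checked without a live server.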
@@ -196,8 +163,6 @@ See [docs/DEVELOPER_GUIDE.md](docs/DEVELOPER_GUIDE.md#vllm-simulator-details) fo
 - **[Developer Guide](docs/DEVELOPER_GUIDE.md)** - Development workflows, testing, debugging
 - **[Architecture](docs/ARCHITECTURE.md)** - Detailed system design and component specifications
 - **[Traffic and SLOs](docs/traffic_and_slos.md)** - Traffic profile framework and experience-driven SLOs (Phase 2)
-- **[PostgreSQL Migration Plan](docs/POSTGRESQL_MIGRATION_PLAN.md)** - Phase 2 migration details
-- **[Architecture Diagrams](docs/architecture-diagram.md)** - Visual system representations
 - **[Logging Guide](docs/LOGGING.md)** - Logging system and debugging
 - **[Claude Code Guidance](CLAUDE.md)** - AI assistant instructions for contributors
 