Commit 604a030

docs: update README and Makefile for accuracy

- Fix prerequisites: list all tools as manual installs, correct Python version to 3.13, add uv
- Update Using NeuralNav flow to match current UI workflow
- Replace inline architecture section with link to ARCHITECTURE.md
- Update vLLM simulator section: document runtime toggle via UI/API, correct default to production mode
- Fix log command descriptions (show, not tail)
- Makefile: suppress misleading note when running make stop-all
- Makefile: check for empty database after db-start and remind user to load benchmark data
- Delete obsolete docs

Signed-off-by: Andre Fredette <afredette@redhat.com>

1 parent b3d0361 commit 604a030

11 files changed, 24 additions and 3958 deletions

Makefile

Lines changed: 7 additions & 1 deletion
@@ -228,7 +228,9 @@ stop: ## Stop Backend + UI (leaves Ollama and DB running)
 	@pkill -9 -f "uvicorn neuralnav.api.app:app" 2>/dev/null || true
 	@printf "$(GREEN)✓ All NeuralNav services stopped$(NC)\n"
 	@# Don't stop Ollama or DB as they might be used by other apps/tools
-	@printf "$(YELLOW)Note: Ollama and PostgreSQL left running (use 'make stop-all' to stop everything)$(NC)\n"
+	@if [ "$(MAKECMDGOALS)" != "stop-all" ]; then \
+		printf "$(YELLOW)Note: Ollama and PostgreSQL left running (use 'make stop-all' to stop everything)$(NC)\n"; \
+	fi
 
 restart: stop start ## Restart all services
 
@@ -425,6 +427,7 @@ db-start: ## Start PostgreSQL (initializes schema on first run)
 			printf "$(YELLOW)PostgreSQL already running$(NC)\n"; \
 		else \
 			$(CONTAINER_TOOL) start neuralnav-postgres; \
+			sleep 2; \
 			printf "$(GREEN)✓ PostgreSQL started$(NC)\n"; \
 		fi \
 	else \
@@ -438,6 +441,9 @@ db-start: ## Start PostgreSQL (initializes schema on first run)
 		printf "$(BLUE)Initializing database schema...$(NC)\n"; \
 		$(CONTAINER_TOOL) exec -i neuralnav-postgres psql -U postgres -d neuralnav < scripts/schema.sql; \
 		printf "$(GREEN)✓ Schema initialized$(NC)\n"; \
+	fi
+	@BENCH_COUNT=$$($(CONTAINER_TOOL) exec -i neuralnav-postgres psql -U postgres -d neuralnav -t -c "SELECT COUNT(*) FROM exported_summaries;" 2>/dev/null | tr -d ' \n'); \
+	if [ "$$BENCH_COUNT" = "0" ] || [ -z "$$BENCH_COUNT" ]; then \
 		printf "$(YELLOW)Note: Database is empty. Load benchmark data with one of:$(NC)\n"; \
 		printf " make db-load-blis # BLIS benchmark data\n"; \
 		printf " make db-load-estimated # Estimated performance data\n"; \

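The empty-database test added to `db-start` treats two different outcomes as "empty": a count of `0`, and no output at all (which is what remains when the query fails and its error output is discarded by `2>/dev/null`). The following is a minimal Python sketch of that decision only, not code from the repository; the function name `benchmark_table_is_empty` is hypothetical.

```python
def benchmark_table_is_empty(psql_output: str) -> bool:
    """Mirror the recipe's check on the output of `psql -t -c "SELECT COUNT(*) ..."`.

    Hypothetical helper for illustration; the Makefile does this in shell.
    """
    # Same effect as `tr -d ' \n'`: drop spaces and newlines from the raw output.
    count = psql_output.replace(" ", "").replace("\n", "")
    # Same effect as `[ "$BENCH_COUNT" = "0" ] || [ -z "$BENCH_COUNT" ]`.
    return count == "" or count == "0"


if __name__ == "__main__":
    # Sample strings are illustrative, not captured from a real database.
    for sample in ["     0\n", "", "    42\n"]:
        print(repr(sample), "->", benchmark_table_is_empty(sample))
```

Because a failed query and an empty table are indistinguishable here, the reminder may also appear when the table does not exist or the server is not yet accepting connections; whether that matters depends on how the rest of the recipe behaves.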
README.md

Lines changed: 17 additions & 52 deletions
@@ -36,11 +36,11 @@ The code in this repository implements the **NeuralNav Phase 2 MVP** with produc
 ## Prerequisites
 
 **Required before running `make setup`:**
+
 - **macOS or Linux** (Windows via WSL2)
 - **Docker Desktop** (must be running)
-
-**Installed automatically by `make setup`:**
-- **Python 3.11+**
+- **Python 3.13** - `brew install python@3.13`
+- **uv** - `curl -LsSf https://astral.sh/uv/install.sh | sh`
 - **Ollama** - `brew install ollama`
 - **kubectl** - `brew install kubectl`
 - **KIND** - `brew install kind`
@@ -71,41 +71,14 @@ make cluster-stop # Delete cluster (optional)
 
 1. **Describe your use case** in the chat interface
    - Example: "I need a customer service chatbot for 5000 users with low latency"
-2. **Review recommendations** - Model, GPU configuration, SLO predictions, costs
-3. **Edit specifications** if needed (traffic, SLO targets, constraints)
-4. **Generate deployment YAML** - Click "Generate Deployment YAML"
-5. **Deploy to cluster** - Click "Deploy to Kubernetes"
-6. **Monitor deployment** - Switch to "Deployment Management" tab to see status
-7. **Test inference** - Send test prompts once deployment is Ready
-
-## Demo Scenarios
-
-The POC includes 3 pre-configured scenarios (see [data/configuration/demo_scenarios.json](data/configuration/demo_scenarios.json)):
-
-1. **Customer Service Chatbot** - High volume (5000 users), strict latency (<500ms)
-   - Expected: Llama 3.1 8B on 2x A100-80GB
-
-2. **Code Generation Assistant** - Developer team (500 users), quality > speed
-   - Expected: Llama 3.1 70B on 4x A100-80GB (tensor parallel)
+2. **Analyze use case** - Click "Analyze Use Case" to extract intent
+3. **Generate specification** - Click "Generate Specification" to create traffic profile and SLO targets
+4. **Review specification** - Edit SLO targets, priorities, or constraints if needed
+5. **Generate recommendations** - Click "Generate Recommendations" to find optimal configurations
+6. **Select a recommendation** - Review ranked options and click "Select"
+7. **Deploy** - Go to the "Deployment" tab to review, copy, or download generated deployment files
 
-3. **Document Summarization** - Batch processing (2000 users/day), cost-sensitive
-   - Expected: Mistral 7B on 2x A10G
-
-## Architecture Highlights
-
-NeuralNav implements an **8-component architecture** with:
-
-- **Conversational Interface** (Streamlit) - Chat-based requirement gathering with interactive exploration
-- **Context & Intent Engine** - LLM-powered extraction of deployment specs
-- **Recommendation Engine** - Traffic profiling, model scoring, capacity planning
-- **Deployment Automation** - YAML generation and Kubernetes deployment
-- **Knowledge Base** - Benchmarks, SLO templates, model catalog
-- **LLM Backend** - Ollama (qwen2.5:7b) for conversational AI and business context extraction
-- **Orchestration** - Multi-step workflow coordination
-- **Inference Observability** - Real-time deployment monitoring
-
-**Development Tools:**
-- **vLLM Simulator** - GPU-free local development and testing
+## Architecture
 
 See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for detailed system design.
 
@@ -128,10 +101,10 @@ See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for detailed system design.
 | Backend | FastAPI, Pydantic |
 | Frontend | Streamlit |
 | LLM | Ollama (qwen2.5:7b) |
-| Data | **PostgreSQL (Phase 2)**, psycopg2, JSON (Phase 1 - deprecated) |
+| Data | PostgreSQL |
 | YAML Generation | Jinja2 templates |
 | Kubernetes | KIND (local), KServe v0.13.0 |
-| Deployment | kubectl, Kubernetes Python client |
+| Deployment | kubectl |
 
 
 ## Development Commands
@@ -142,8 +115,8 @@ make start # Start all services (DB + Ollama + Backend + UI)
 make stop # Stop Backend + UI (leaves Ollama and DB running)
 make stop-all # Stop everything including Ollama and DB
 make restart # Restart all services
-make logs-backend # Tail backend logs
-make logs-ui # Tail UI logs
+make logs-backend # Show backend logs
+make logs-ui # Show UI logs
 
 # Database (PostgreSQL)
 make db-start # Start PostgreSQL (initializes schema on first run)
@@ -178,16 +151,10 @@ NeuralNav includes a **GPU-free simulator** for local development:
 - **Realistic latency** - Uses benchmark data to simulate TTFT/ITL
 - **Fast deployment** - Pods become Ready in ~10-15 seconds
 
-**Simulator Mode (default):**
-```python
-# In src/neuralnav/api/routes.py
-deployment_generator = DeploymentGenerator(simulator_mode=True)
-```
+The deployment mode defaults to **production** (real vLLM with GPUs). Switch between production and simulator modes at runtime using the **Configuration** tab in the UI, or via the REST API:
 
-**Production Mode (requires GPU cluster):**
-```python
-deployment_generator = DeploymentGenerator(simulator_mode=False)
-```
+- `GET /api/v1/deployment-mode` - Check current mode
+- `PUT /api/v1/deployment-mode` - Set mode (`{"mode": "simulator"}` or `{"mode": "production"}`)
 
 See [docs/DEVELOPER_GUIDE.md](docs/DEVELOPER_GUIDE.md#vllm-simulator-details) for details.
 
@@ -196,8 +163,6 @@ See [docs/DEVELOPER_GUIDE.md](docs/DEVELOPER_GUIDE.md#vllm-simulator-details) fo
 - **[Developer Guide](docs/DEVELOPER_GUIDE.md)** - Development workflows, testing, debugging
 - **[Architecture](docs/ARCHITECTURE.md)** - Detailed system design and component specifications
 - **[Traffic and SLOs](docs/traffic_and_slos.md)** - Traffic profile framework and experience-driven SLOs (Phase 2)
-- **[PostgreSQL Migration Plan](docs/POSTGRESQL_MIGRATION_PLAN.md)** - Phase 2 migration details
-- **[Architecture Diagrams](docs/architecture-diagram.md)** - Visual system representations
 - **[Logging Guide](docs/LOGGING.md)** - Logging system and debugging
 - **[Claude Code Guidance](CLAUDE.md)** - AI assistant instructions for contributors
 
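The updated README documents `GET` and `PUT` on `/api/v1/deployment-mode` with a JSON body of `{"mode": "simulator"}` or `{"mode": "production"}`. As a rough illustration of calling those endpoints, here is a standard-library Python sketch. The base URL `http://localhost:8000`, the helper names, and the assumption that responses are JSON are all hypothetical; the diff does not state the backend's host, port, or response shape.

```python
import json
import urllib.request

# Assumption: host and port are not given in the README diff.
BASE_URL = "http://localhost:8000"
# Path taken from the README text.
ENDPOINT = "/api/v1/deployment-mode"
# The two mode values named in the README.
VALID_MODES = ("simulator", "production")


def build_get_mode_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the request that checks the current mode."""
    return urllib.request.Request(base_url + ENDPOINT, method="GET")


def build_set_mode_request(mode: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the request that switches the deployment mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    body = json.dumps({"mode": mode}).encode("utf-8")
    return urllib.request.Request(
        base_url + ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the reply, assuming the server returns JSON."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `send(build_set_mode_request("simulator"))` followed by `send(build_get_mode_request())` would switch to simulator mode and read the mode back. Keeping request construction separate from sending makes the payload easy to inspect without a live server.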