This guide provides step-by-step instructions for developing and testing NeuralNav.
- Development Environment Setup
- Component Startup Sequence
- Development Workflows
- Testing
- Debugging
- Making Changes
- Simulator Development
- Clean Up
- Code Quality
- Useful Commands
- Alternative Setup Methods
- Running Services Manually
- Troubleshooting
- Manual Kubernetes Cluster Setup
- YAML Deployment Generation
- vLLM Simulator Details
- Testing Details
Ensure you have all required tools installed:
```
make check-prereqs
```

This checks for:
- Docker or Podman (running)
- Python 3.11+
- Ollama
- kubectl
- KIND
NeuralNav supports both Docker and Podman as container runtimes.
| Component | Docker | Podman | Notes |
|---|---|---|---|
| PostgreSQL (`db-*` targets) | ✅ | ✅ | Works with either |
| Simulator build/push/pull | ✅ | ✅ | Works with either |
| KIND cluster (`cluster-*` targets) | ✅ | ❌ | KIND requires Docker |
| Docker Compose (`docker-*` targets) | ✅ | ⚠️ | Requires podman-compose |
The Makefile automatically detects which container runtime is available and running:
- If Docker daemon is running: Docker is used (for KIND compatibility)
- If only Podman daemon is running: Podman is used automatically
- If neither daemon is running: Commands will fail with a helpful error
This means if you quit Docker Desktop, the Makefile will automatically use Podman (if its machine is running), and vice versa.
Option 1: Per-command override

```
CONTAINER_TOOL=podman make db-start
CONTAINER_TOOL=podman make db-load-guidellm
```

Option 2: Export for your shell session

```
export CONTAINER_TOOL=podman
make db-start
make db-load-guidellm
# All commands in this session will use Podman
```

Option 3: Create a .env file (persistent)

```
echo "CONTAINER_TOOL=podman" >> .env
```

The Makefile automatically loads `.env`, so all subsequent make commands will use Podman. The `.env` file is in `.gitignore`, so it won't affect other developers.
Podman on macOS requires a Linux VM to run containers:
```
# First time setup - initialize the Podman machine
podman machine init

# Start the Podman machine (required before each use, or after reboot)
podman machine start
```

Important: When starting the Podman machine, you may see this message:

```
Another process was listening on the default Docker API socket address.
You can still connect Docker API clients by setting DOCKER_HOST using the
following command in your terminal session:

export DOCKER_HOST='unix:///var/folders/.../podman-machine-default-api.sock'
```

Do NOT set DOCKER_HOST: this redirects the Docker CLI to Podman, which causes confusion. Instead, use `CONTAINER_TOOL=podman` as described above.
If you switch between Docker and Podman, you may encounter port conflicts:
```
Error: listen tcp :5432: bind: address already in use
```

To resolve:

1. Stop and remove the container in the other runtime:

   ```
   # If switching TO Podman, clean up Docker first:
   docker stop neuralnav-postgres && docker rm neuralnav-postgres

   # If switching TO Docker, clean up Podman first:
   podman stop neuralnav-postgres && podman rm neuralnav-postgres
   ```

2. If the port is still in use, check what's holding it:

   ```
   lsof -i :5432
   ```

3. You may need to restart Docker Desktop if it has a stale port binding.
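For a quick programmatic check of the port, a few lines of Python do the same job as `lsof` (illustrative helper, not part of NeuralNav):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        # connect_ex returns 0 on a successful connection (port occupied)
        return s.connect_ex((host, port)) == 0

print(port_in_use(5432))  # True while a Postgres container holds the port
```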
You can use Podman for simple containers while keeping Docker for KIND:
- Keep Docker Desktop installed (required for KIND clusters)
- Use `CONTAINER_TOOL=podman` or `.env` for database operations
- KIND cluster commands (`make cluster-*`) will use Docker automatically via `scripts/kind-cluster.sh`
Example workflow:
```
# Use Podman for database
export CONTAINER_TOOL=podman
make db-start
make db-load-guidellm

# KIND still uses Docker (no change needed)
make cluster-start
```

Create virtual environments and install dependencies:

```
make setup
```

This creates a single shared virtual environment in `venv/` (at project root) used by both the backend and UI.
The system consists of 4 main components that must start in order:
Purpose: LLM inference for intent extraction
Start:

```
make start-ollama
```

Manual start:

```
ollama serve
```

Verify:

```
curl http://localhost:11434/api/tags
ollama list   # Should show qwen2.5:7b
```

Purpose: Recommendation engine, workflow orchestration, API endpoints

Start:

```
make start-backend
```

Manual start:

```
source venv/bin/activate
uvicorn neuralnav.api.main:app --reload --host 0.0.0.0 --port 8000
```

Verify:

```
curl http://localhost:8000/health
# Should return: {"status":"healthy"}
```

API Documentation:

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc

Purpose: Conversational interface, recommendation display

Start:

```
make start-ui
```

Manual start:

```
source venv/bin/activate
streamlit run ui/app.py
```

Access: http://localhost:8501

Note: UI runs from project root to access `docs/` assets

Purpose: Local Kubernetes for deployment testing

Start:

```
make cluster-start
```

Manual start:

```
scripts/kind-cluster.sh start
```

Verify:

```
kubectl cluster-info
kubectl get pods -A
make cluster-status
```

Start all services:
```
make start
```

Make code changes, then:

- Backend changes: Auto-reloads (uvicorn `--reload` flag)
- UI changes: Refresh browser (Streamlit auto-detects changes)
- Data changes: Restart backend to reload JSON files

Stop services:

```
make stop       # Stop Backend + UI (leaves Ollama and DB running)
make stop-all   # Stop everything including Ollama and DB
```

Backend only:

```
make start-backend
make logs-backend   # Tail logs
```

UI only (requires backend running):

```
make start-ui
make logs-ui
```

Test API endpoints:
```
# Get recommendation
curl -X POST http://localhost:8000/api/v1/recommend \
  -H "Content-Type: application/json" \
  -d '{"message": "I need a chatbot for 1000 users"}'
```

Benchmark data can be managed via the CLI (make targets), the REST API, or the UI's Configuration tab.
CLI (local development):
```
make db-load-blis       # Load BLIS benchmark data
make db-load-guidellm   # Load GuideLLM benchmark data
make db-reset           # Reset database (remove all data and reinitialize)
```

REST API (remote/Kubernetes deployments):
```
# Check database status
curl http://localhost:8000/api/v1/db/status

# Upload a benchmark JSON file
curl -X POST -F 'file=@data/benchmarks/performance/benchmarks_BLIS.json' \
  http://localhost:8000/api/v1/db/upload-benchmarks

# Reset database (remove all benchmark data)
curl -X POST http://localhost:8000/api/v1/db/reset
```

UI (Configuration tab):
- Open the UI at http://localhost:8501
- Go to the Configuration tab
- Use Upload Benchmarks to load a JSON file with a top-level `benchmarks` array
- Use Reset Database to remove all benchmark data
- Database statistics (total benchmarks, models, hardware types) are displayed at the top and refresh after each action
All loading methods are append-mode: duplicates (same model/hardware/traffic/load config) are silently skipped via `ON CONFLICT (config_id) DO NOTHING`.
Core loading logic lives in `src/neuralnav/knowledge_base/loader.py` and is shared by the CLI script, API endpoints, and UI.
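The append-mode semantics can be demonstrated with a toy table (SQLite here for portability; the real loader targets PostgreSQL, and the column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE benchmarks (config_id TEXT PRIMARY KEY, ttft_p90_ms REAL)")

rows = [
    ("llama-7b|L4|chat|light", 210.0),
    ("llama-7b|L4|chat|light", 210.0),   # duplicate config -> silently skipped
    ("llama-7b|A100|chat|light", 95.0),
]
conn.executemany(
    "INSERT INTO benchmarks VALUES (?, ?) ON CONFLICT (config_id) DO NOTHING",
    rows,
)
print(conn.execute("SELECT COUNT(*) FROM benchmarks").fetchone()[0])  # 2
```

Re-running the same load is therefore safe: existing rows are left untouched and only new configurations are inserted.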
Create cluster:
```
make cluster-start
# Builds simulator, creates cluster, loads image
```

Deploy from UI:
- Get recommendation
- Generate YAML
- Click "Deploy to Kubernetes"
- Monitor in "Deployment Management" tab
Manual deployment:
```
# After generating YAML via UI
kubectl apply -f generated_configs/kserve-inferenceservice.yaml
kubectl get inferenceservices
kubectl get pods
```

Clean up deployments:

```
make clean-deployments   # Delete all InferenceServices
```

Restart cluster:

```
make cluster-restart   # Fresh cluster
```

Test individual components without external dependencies:
```
make test-unit
```

Test PostgreSQL benchmark queries using an isolated `neuralnav_test` database with static fixture data (your production database is never touched):

```
make test-db
```

Requires PostgreSQL running (`make db-start`).

Test the full recommendation workflow including LLM-powered intent extraction:

```
make test-integration
```

Requires Ollama running with the qwen2.5:7b model and PostgreSQL.

```
make test
```

Runs all three tiers: unit, database, and integration.
NeuralNav implements comprehensive logging to help you debug and monitor the system. For complete logging documentation, see docs/LOGGING.md.
Quick Start:
Enable debug logging to see full LLM prompts and responses:
```
# Enable debug mode
export NEURALNAV_DEBUG=true
make start-backend

# Or inline:
NEURALNAV_DEBUG=true make start-backend
```

Log Levels:

- INFO (default): User requests, workflow steps, LLM metadata, results
- DEBUG: Full LLM prompts, complete responses, detailed timing
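An env-var switch like this is a common way to implement the toggle; the snippet below is a sketch of the pattern, not the actual backend code:

```python
import logging
import os

# NEURALNAV_DEBUG=true flips the root logger from INFO to DEBUG
debug = os.getenv("NEURALNAV_DEBUG", "").lower() in ("1", "true", "yes")
level = logging.DEBUG if debug else logging.INFO
logging.basicConfig(level=level)

logging.getLogger(__name__).info("effective level: %s", logging.getLevelName(level))
```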
Log Locations:
- Console output (stdout/stderr)
- `logs/backend.log` - Main application logs
- `logs/neuralnav.log` - Structured detailed logs
Common Log Searches:
```
# View all user requests
grep "\[USER MESSAGE\]" logs/backend.log

# View LLM prompts (DEBUG mode only)
grep "\[LLM PROMPT\]" logs/backend.log

# View extracted intents
grep "\[EXTRACTED INTENT\]" logs/backend.log

# Follow a complete request flow
grep -A 50 "USER REQUEST" logs/backend.log
```

Log Tags:

- `[USER REQUEST]` - User request start
- `[USER MESSAGE]` - User's actual message
- `[LLM REQUEST]` - Request to LLM (metadata)
- `[LLM PROMPT]` - Full prompt text (DEBUG only)
- `[LLM RESPONSE]` - Response from LLM (metadata)
- `[LLM RESPONSE CONTENT]` - Full response text (DEBUG only)
- `[EXTRACTED INTENT]` - Parsed intent from LLM
- `Step 1`, `Step 2`, etc. - Workflow progress
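When grep is not enough, the bracketed tags are easy to parse programmatically (the log lines below are hypothetical examples):

```python
import re

log_lines = [
    "[USER REQUEST] id=42 started",
    "[USER MESSAGE] I need a chatbot for 1000 users",
    "[EXTRACTED INTENT] {'use_case': 'chatbot', 'users': 1000}",
]

# Tags are all-caps words (possibly with spaces) in square brackets
TAG_RE = re.compile(r"^\[([A-Z ]+)\]\s*(.*)")
for line in log_lines:
    m = TAG_RE.match(line)
    if m:
        tag, payload = m.groups()
        print(f"{tag:18} -> {payload}")
```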
Privacy Note: DEBUG mode logs contain full user messages and LLM interactions. Only use in development/testing.
Backend logs:
```
make logs-backend

# Or manually:
tail -f .pids/backend.pid.log

# Or for detailed logs:
tail -f logs/backend.log
```

UI logs:

```
make logs-ui

# Or manually:
tail -f .pids/ui.pid.log
```

Kubernetes pod logs:

```
kubectl logs -f <pod-name>
kubectl describe pod <pod-name>
```

```
make health
```

Checks:
- Backend: http://localhost:8000/health
- UI: http://localhost:8501
- Ollama: http://localhost:11434/api/tags
Test LLM client directly:
```
source venv/bin/activate
python -c "
from neuralnav.llm.ollama_client import OllamaClient
from neuralnav.intent_extraction.extractor import IntentExtractor
client = OllamaClient()
extractor = IntentExtractor(client)
message = 'I need a chatbot for 5000 users with low latency'
intent = extractor.extract_intent(message)
print(intent)
"
```

Test recommendation engine:
```
source venv/bin/activate
python -c "
from neuralnav.orchestration.workflow import RecommendationWorkflow
workflow = RecommendationWorkflow()
rec = workflow.generate_recommendation('I need a chatbot for 1000 users')
print(rec)
"
```

Check InferenceService status:

```
kubectl get inferenceservices
kubectl describe inferenceservice <deployment-id>
```

Check pod status:

```
kubectl get pods
kubectl describe pod <pod-name>
kubectl logs <pod-name>
```

Port-forward to service:

```
kubectl port-forward svc/<deployment-id>-predictor 8080:80
curl http://localhost:8080/health
```

- Add model to `data/configuration/model_catalog.json`:

```
{
  "model_id": "new-model-id",
  "name": "New Model Name",
  "size_parameters": "7B",
  "context_length": 8192,
  "supported_tasks": ["chat", "instruction_following"],
  "recommended_for": ["chatbot"],
  "domain_specialization": ["general"]
}
```

- Add benchmarks to the benchmark database
- Restart backend: `make restart`
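Before restarting the backend, a quick sanity check can catch missing fields in a new catalog entry. This is an illustrative helper (not part of NeuralNav); the required field names follow the JSON example above:

```python
import json

REQUIRED_FIELDS = {
    "model_id", "name", "size_parameters", "context_length",
    "supported_tasks", "recommended_for", "domain_specialization",
}

def missing_fields(entry: dict) -> set[str]:
    """Return the set of required catalog fields the entry lacks."""
    return REQUIRED_FIELDS - entry.keys()

entry = json.loads("""
{
  "model_id": "new-model-id",
  "name": "New Model Name",
  "size_parameters": "7B",
  "context_length": 8192,
  "supported_tasks": ["chat", "instruction_following"],
  "recommended_for": ["chatbot"],
  "domain_specialization": ["general"]
}
""")
print(missing_fields(entry) or "entry looks complete")
```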
- Add template to `data/configuration/slo_templates.json`:

```
{
  "use_case": "new_use_case",
  "description": "Description",
  "prompt_tokens_mean": 200,
  "generation_tokens_mean": 150,
  "ttft_p90_target_ms": 250,
  "tpot_p90_target_ms": 60,
  "e2e_p90_target_ms": 3000
}
```

- Update `USE_CASE_MAP` in `src/neuralnav/intent_extraction/extractor.py`
- Restart backend
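How a template is matched to an extracted use case can be sketched as a simple lookup with a fallback. The field names follow the JSON above; the lookup itself is illustrative, not the actual extractor code:

```python
slo_templates = [
    {"use_case": "chatbot", "ttft_p90_target_ms": 250, "tpot_p90_target_ms": 60},
    {"use_case": "new_use_case", "ttft_p90_target_ms": 250, "tpot_p90_target_ms": 60},
]

def find_template(use_case: str, default: str = "chatbot") -> dict:
    """Pick the SLO template for a use case, falling back to a default."""
    by_case = {t["use_case"]: t for t in slo_templates}
    return by_case.get(use_case, by_case[default])

print(find_template("new_use_case")["use_case"])  # new_use_case
print(find_template("unknown")["use_case"])       # chatbot
```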
UI code is in `ui/app.py`. Changes auto-reload in the browser.
Key sections:

- `render_chat_interface()` - Chat input/history
- `render_recommendation()` - Recommendation tabs
- `render_deployment_management_tab()` - Cluster management
- `render_configuration_tab()` - Database management (in `ui/components/settings.py`)
Model scoring: `src/neuralnav/recommendation/scorer.py`

- `Scorer` class - Adjust scoring weights
Capacity planning: `src/neuralnav/recommendation/config_finder.py`

- `plan_capacity()` - GPU sizing logic
- `_calculate_required_replicas()` - Scaling calculations
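Replica math of this kind typically reduces to a ceiling division with headroom. A hedged sketch (the real `_calculate_required_replicas()` may weigh more factors, and the headroom value is an assumption):

```python
import math

def required_replicas(target_rps: float, per_replica_rps: float,
                      headroom: float = 0.8) -> int:
    """Replicas needed so each runs at <= `headroom` of measured capacity."""
    return max(1, math.ceil(target_rps / (per_replica_rps * headroom)))

print(required_replicas(50, 12))  # 50 / (12 * 0.8) = 5.2 -> 6 replicas
print(required_replicas(1, 12))   # never below 1 replica
```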
Traffic profiling: `src/neuralnav/specification/traffic_profile.py`

- `generate_profile()` - Traffic estimation
- `generate_slo_targets()` - SLO target generation
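A rough request-rate estimate from a user count follows the same spirit. The per-user rate below is an assumed constant for illustration; `generate_profile()` presumably derives its own values from the request:

```python
def estimated_rps(concurrent_users: int,
                  requests_per_user_per_min: float = 2.0) -> float:
    """Very rough: users * per-user request rate, converted to per-second."""
    return concurrent_users * requests_per_user_per_min / 60.0

print(round(estimated_rps(1000), 1))  # 1000 users * 2 req/min -> ~33.3 req/s
```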
Lint code:
```
make lint
```

Format code:

```
make format
```

Both use the shared project venv at root.
```
make build-simulator
```

Creates the `vllm-simulator:latest` Docker image.

```
# Can use podman instead of docker
docker run -p 8080:8080 \
  -e MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.3 \
  -e GPU_TYPE=NVIDIA-L4 \
  -e TENSOR_PARALLEL_SIZE=1 \
  vllm-simulator:latest

# Test
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello", "max_tokens": 10}'
```

```
make push-simulator
```

Auto-prompts for login if not authenticated.
Remove generated files:
```
make clean
```

Remove everything (including venvs):

```
make clean-all
```

Remove cluster:

```
make cluster-stop
```

NeuralNav uses Ruff for linting and code formatting.
Run linter:
```
make lint
```

Or manually:

```
source venv/bin/activate
ruff check src/ ui/
```

Auto-fix issues:

```
source venv/bin/activate
ruff check src/ ui/ --fix
```

Format code:

```
source venv/bin/activate
ruff format src/ ui/
```

Configuration:
Ruff is configured in pyproject.toml with:
- Line length: 100 characters
- Python 3.11+ syntax
- Import sorting (isort)
- Modern Python upgrades
- Common bug detection
Before committing:
Always run `make lint` to catch issues early. Most issues can be auto-fixed with `ruff check --fix`.
See all available make targets:
```
make help
```

Show configuration:

```
make info
```

Open UI in browser:

```
make open-ui
```

Open API docs:

```
make open-backend
```

```
uv sync
```

The UI shares the same virtual environment as the backend (managed by uv):

```
uv sync   # Same command - all deps are in pyproject.toml
```

The POC uses qwen2.5:7b for intent extraction:
```
ollama pull qwen2.5:7b
```

Alternative models (if needed):

- `llama3.2:3b` - Smaller/faster, less accurate
- `mistral:7b` - Good balance of speed and quality

```
# Test Ollama is working
ollama list   # Should show qwen2.5:7b
```

The easiest way to use NeuralNav:
```
# Terminal 1 - Start Ollama (if not already running)
ollama serve

# Terminal 2 - Start FastAPI Backend
scripts/run_api.sh

# Terminal 3 - Start Streamlit UI
scripts/run_ui.sh
```

Then open http://localhost:8501 in your browser.
Test the complete recommendation workflow with demo scenarios.
Requires Ollama running with qwen2.5:7b and PostgreSQL with benchmark data:
```
uv run pytest tests/test_recommendation_workflow.py -v
```

This tests all 3 demo scenarios end-to-end.
Start the API server:
```
scripts/run_api.sh
```

Test the API:
```
# Health check
curl http://localhost:8000/health

# Full recommendation
curl -X POST http://localhost:8000/api/v1/recommend \
  -H "Content-Type: application/json" \
  -d '{"message": "I need a chatbot for 5000 users with low latency"}'
```

Test the LLM client:
```
source venv/bin/activate
python -c "
from neuralnav.llm.ollama_client import OllamaClient
client = OllamaClient(model='llama3.2:3b')
print('Ollama available:', client.is_available())
print('Pulling model...')
client.ensure_model_pulled()
print('Model ready!')
"
```

```
# Check Ollama is running
curl http://localhost:11434/api/tags

# If not running
ollama serve
```

```
ollama pull llama3.2:3b
```

```
# Reinstall dependencies
uv sync
```

Install KIND (if not already installed):
```
brew install kind
```

Create cluster with KServe:
```
# Ensure Docker Desktop is running

# Create cluster
kind create cluster --config config/kind-cluster.yaml

# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.yaml

# Wait for cert-manager
kubectl wait --for=condition=available --timeout=300s -n cert-manager deployment/cert-manager

# Install KServe
kubectl apply --server-side -f https://github.com/kserve/kserve/releases/download/v0.14.0/kserve.yaml
kubectl apply --server-side -f https://github.com/kserve/kserve/releases/download/v0.14.0/kserve-cluster-resources.yaml

# Wait for KServe
kubectl wait --for=condition=available --timeout=300s -n kserve deployment/kserve-controller-manager

# Configure KServe for RawDeployment mode
kubectl patch configmap/inferenceservice-config -n kserve --type=strategic -p '{"data": {"deploy": "{\"defaultDeploymentMode\": \"RawDeployment\"}"}}'
```

- Get a deployment recommendation from the chat interface
- Click "Generate Deployment YAML" in the Actions section
- If cluster is accessible, click "Deploy to Kubernetes"
- Go to Monitoring tab to see:
- Real Kubernetes deployment status
- InferenceService conditions
- Pod information
- Performance metrics
Deploy generated YAML:
```
# After generating YAML via UI
kubectl apply -f generated_configs/kserve-inferenceservice.yaml
kubectl get inferenceservices
kubectl get pods
```

View all resources:

```
kubectl get pods -A
```

View deployments:

```
kubectl get inferenceservices
kubectl get pods
```

Delete a specific deployment:

```
kubectl delete inferenceservice <deployment-id>
```

Check cluster info:

```
kubectl cluster-info
```

The system automatically generates production-ready Kubernetes configurations:
- ✅ KServe InferenceService YAML with vLLM configuration
- ✅ HorizontalPodAutoscaler (HPA) for autoscaling
- ✅ Prometheus ServiceMonitor for metrics collection
- ✅ Grafana Dashboard ConfigMap
- ✅ Full YAML validation before generation
- ✅ Files written to `generated_configs/` directory
How to use:
- Get a deployment recommendation from the chat interface
- Go to the Cost tab and click "Generate Deployment YAML"
- View generated YAML file paths
- Check `generated_configs/` directory for all YAML files
Simulator mode is enabled by default for all deployments:
```
# Start the UI
scripts/run_ui.sh

# In the UI:
# 1. Get a deployment recommendation
# 2. Click "Generate Deployment YAML"
# 3. Click "Deploy to Kubernetes"
# 4. Go to Monitoring tab
# 5. Pod should become Ready in ~10-15 seconds
```

Once deployed:
- Go to Monitoring tab
- See "🧪 Inference Testing" section
- Enter a test prompt
- Click "🚀 Send Test Request"
- View the simulated response and metrics
To use real vLLM with actual GPUs (requires GPU-enabled cluster):
```
# In src/neuralnav/api/routes.py
deployment_generator = DeploymentGenerator(simulator_mode=False)
```

Then deploy to a GPU-enabled cluster with:
- NVIDIA GPU Operator installed
- GPU nodes with appropriate labels
- Sufficient GPU resources
| Feature | Simulator Mode | Real vLLM Mode |
|---|---|---|
| GPU Required | ❌ No | ✅ Yes |
| Model Download | ❌ No | ✅ Yes (from HuggingFace) |
| Inference | Canned responses | Real generation |
| Latency | Simulated (from benchmarks) | Actual GPU performance |
| Use Case | Development, testing, demos | Production deployment |
| Cluster | Works on KIND (local) | Requires GPU-enabled cluster |
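The "simulated (from benchmarks)" latency row can be pictured as the simulator replaying benchmark numbers instead of running a GPU. This is a conceptual sketch; the actual simulator implementation may differ:

```python
def simulated_latency_ms(ttft_p90_ms: float, tpot_p90_ms: float,
                         generated_tokens: int) -> float:
    """End-to-end latency model: time-to-first-token plus per-token time."""
    return ttft_p90_ms + tpot_p90_ms * generated_tokens

# e.g. a benchmark reporting TTFT 200 ms and TPOT 50 ms, for a 10-token
# canned reply:
print(simulated_latency_ms(200, 50, 10))  # 700.0 ms
```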
Requires Ollama running with qwen2.5:7b and PostgreSQL with benchmark data:
```
# Test end-to-end workflow
uv run pytest tests/test_recommendation_workflow.py -v

# Test FastAPI endpoints
scripts/run_api.sh   # Start server in terminal 1

# In terminal 2:
curl -X POST http://localhost:8000/api/v1/test
```

For comprehensive testing instructions, see TESTING.md.