Commit a783c03

Merge pull request #82 from anfredette/refactor
refactor: Align backend module structure with updated architecture
2 parents 67c11ff + dcf2609 commit a783c03


58 files changed (+2396 −1431 lines)

CLAUDE.md

Lines changed: 52 additions & 18 deletions
@@ -23,17 +23,34 @@ This repository contains the architecture design for **NeuralNav**, an open-sour
   - Entity-relationship diagrams for data models

 - **backend/**: Python backend implementation
-  - **api/**: FastAPI REST endpoints with CORS support
-  - **context_intent/**: Intent extraction, traffic profiles, Pydantic schemas
-  - **recommendation/**: Multi-criteria scoring and ranking
-    - `solution_scorer.py`: 4-dimension scoring (accuracy, price, latency, complexity)
-    - `model_evaluator.py`: Use-case fit scoring
-    - `usecase_quality_scorer.py`: Artificial Analysis benchmark integration
-    - `ranking_service.py`: 5 ranked list generation
-    - `capacity_planner.py`: GPU capacity planning with SLO filtering
-  - **knowledge_base/**: Data access (benchmark database, JSON catalogs)
+  - **api/**: FastAPI REST API layer
+    - `app.py`: FastAPI app factory
+    - `dependencies.py`: Singleton dependency injection
+    - **routes/**: Modular endpoint handlers (health, intent, specification, recommendation, configuration, reference_data)
+  - **intent_extraction/**: Intent Extraction Service
+    - `extractor.py`: LLM-powered intent extraction from natural language
+    - `service.py`: IntentExtractionService facade
+  - **specification/**: Specification Service
+    - `traffic_profile.py`: Traffic profile and SLO target generation
+    - `service.py`: SpecificationService facade
+  - **recommendation/**: Recommendation Service
+    - `config_finder.py`: GPU capacity planning with SLO filtering
+    - `scorer.py`: 4-dimension scoring (accuracy, price, latency, complexity)
+    - `analyzer.py`: 5 ranked list generation
+    - `service.py`: RecommendationService facade
+    - **quality/**: Use-case quality scoring (Artificial Analysis benchmarks)
+  - **configuration/**: Configuration Service
+    - `generator.py`: Jinja2 YAML generation for KServe/vLLM
+    - `validator.py`: YAML validation
+    - `service.py`: ConfigurationService facade
+    - **templates/**: Jinja2 deployment templates
+  - **cluster/**: Kubernetes cluster management
+    - `manager.py`: K8s deployment lifecycle management
+  - **shared/**: Shared modules
+    - **schemas/**: Pydantic data models (intent, specification, recommendation)
+    - **utils/**: Shared utilities (GPU normalization)
+  - **knowledge_base/**: Data access layer (benchmark database, JSON catalogs)
   - **orchestration/**: Workflow coordination
-  - **deployment/**: Jinja2 templates for KServe/vLLM YAML generation
   - **llm/**: Ollama client for intent extraction

 - **ui/**: Streamlit UI
@@ -148,11 +165,11 @@ The recommendation engine uses **multi-criteria scoring** to rank configurations
 - `balanced`: Sorted by weighted composite score

 **Key Files**:
-- `backend/src/recommendation/solution_scorer.py` - Calculates 4 scores
-- `backend/src/recommendation/model_evaluator.py` - Legacy accuracy scoring (use-case fit)
-- `backend/src/recommendation/usecase_quality_scorer.py` - Artificial Analysis benchmark scoring
-- `backend/src/recommendation/ranking_service.py` - Generates 5 ranked lists
-- `backend/src/recommendation/capacity_planner.py` - Orchestrates scoring during capacity planning
+
+- `backend/src/recommendation/scorer.py` - Calculates 4 scores
+- `backend/src/recommendation/quality/usecase_scorer.py` - Artificial Analysis benchmark scoring
+- `backend/src/recommendation/analyzer.py` - Generates 5 ranked lists
+- `backend/src/recommendation/config_finder.py` - Orchestrates scoring during capacity planning

 ## Working with This Repository

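To make the `balanced` list concrete, here is a rough sketch of a weighted composite over the four score dimensions; the weights, field names, and candidate data are illustrative assumptions, not the actual `scorer.py` logic.

```python
# Illustrative weighted composite over the four dimensions named above.
# Weights and dict layout are assumptions for this sketch only.
WEIGHTS = {"accuracy": 0.4, "price": 0.3, "latency": 0.2, "complexity": 0.1}


def composite_score(scores: dict) -> float:
    """Weighted sum of per-dimension scores, each expected in [0, 1]."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)


candidates = [
    {"name": "2x L4", "accuracy": 0.8, "price": 0.9, "latency": 0.6, "complexity": 0.9},
    {"name": "4x A100-80GB", "accuracy": 0.95, "price": 0.4, "latency": 0.9, "complexity": 0.7},
]

# A "balanced" list: sorted by composite score, best first.
ranked = sorted(candidates, key=composite_score, reverse=True)
print([c["name"] for c in ranked])  # ['2x L4', '4x A100-80GB']
```

The real scorer may normalize, invert, or weight these dimensions differently; this only shows the shape of a composite-then-sort ranking.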
@@ -196,6 +213,16 @@ The recommendation engine uses **multi-criteria scoring** to rank configurations
 - Use "**p95**" for 95th percentile metrics (Phase 2 standard, more conservative than p90)
 - GPU configurations: "2x NVIDIA L4" or "4x A100-80GB" (not "2 L4s")

+### API Endpoint Conventions
+
+All API endpoints **must** follow these rules:
+
+- **Prefix**: Every route file uses `APIRouter(prefix="/api/v1")`. Individual route decorators use relative paths (e.g., `@router.post("/recommend")`), **not** full paths.
+- **Health check exception**: `/health` stays at root with no prefix (standard for load balancer probes). This is the only endpoint outside `/api/v1/`.
+- **Versioning**: All endpoints are under `/api/v1/`. When a v2 is needed, add new route files with `prefix="/api/v2"`.
+- **Naming**: Use kebab-case for multi-word paths (e.g., `/deploy-to-cluster`, `/ranked-recommend-from-spec`).
+- **When adding a new route file**: Set `prefix="/api/v1"` on the `APIRouter` and use relative paths in all decorators. Register the router in `backend/src/api/routes/__init__.py` and include it in `backend/src/api/app.py`.
+
 ### Common Editing Patterns

 **Adding a new use case template**:
@@ -214,6 +241,13 @@ The recommendation engine uses **multi-criteria scoring** to rank configurations
 5. Update dashboard example if applicable
 6. Update docs/architecture-diagram.md data model ERD

+**Adding a new API endpoint**:
+1. Add the route to the appropriate file in `backend/src/api/routes/` (or create a new route file)
+2. Use a relative path in the decorator (e.g., `@router.get("/my-endpoint")`) — the `/api/v1` prefix comes from the router
+3. If creating a new route file, set `APIRouter(prefix="/api/v1")` and register it in `routes/__init__.py` and `app.py`
+4. Update `ui/app.py` if the UI calls the new endpoint
+5. Update documentation (docs/DEVELOPER_GUIDE.md, docs/ARCHITECTUREv2.md) with the new endpoint
+
 **Adding a new component**:
 1. Add numbered section to docs/ARCHITECTURE.md (maintain sequential numbering)
 2. Update "Architecture Components" count in Overview
@@ -294,7 +328,7 @@ The system now supports two deployment modes:
 - **Purpose**: GPU-free development and testing on local machines
 - **Location**: `simulator/` directory contains the vLLM simulator service
 - **Docker Image**: `vllm-simulator:latest` (single image for all models)
-- **Configuration**: Set `DeploymentGenerator(simulator_mode=True)` in `backend/src/api/routes.py`
+- **Configuration**: Set `DeploymentGenerator(simulator_mode=True)` in `backend/src/api/dependencies.py`
 - **Benefits**:
   - No GPU hardware required
   - Fast deployment (~10-15 seconds to Ready)
@@ -304,7 +338,7 @@ The system now supports two deployment modes:

 ### Real vLLM Mode (Production)
 - **Purpose**: Actual model inference with GPUs
-- **Configuration**: Set `DeploymentGenerator(simulator_mode=False)` in `backend/src/api/routes.py`
+- **Configuration**: Set `DeploymentGenerator(simulator_mode=False)` in `backend/src/api/dependencies.py`
 - **Requirements**:
   - GPU-enabled Kubernetes cluster
   - NVIDIA GPU Operator installed
@@ -332,7 +366,7 @@ The system now supports two deployment modes:

 ### Technical Details

-The deployment template (`backend/src/deployment/templates/kserve-inferenceservice.yaml.j2`) uses Jinja2 conditionals:
+The deployment template (`backend/src/configuration/templates/kserve-inferenceservice.yaml.j2`) uses Jinja2 conditionals:
 - `{% if simulator_mode %}` - Uses `vllm-simulator:latest`, no GPU resources, fast health checks
 - `{% else %}` - Uses `vllm/vllm-openai:v0.6.2`, requests GPUs, longer health checks
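The conditional can be exercised on its own; here is a minimal sketch rendering an illustrative `image:` line rather than the full KServe template.

```python
from jinja2 import Template

# Illustrative fragment of the simulator_mode conditional; the real
# template renders a full KServe InferenceService, not just this line.
fragment = Template(
    "image: {% if simulator_mode %}vllm-simulator:latest"
    "{% else %}vllm/vllm-openai:v0.6.2{% endif %}"
)

print(fragment.render(simulator_mode=True))   # image: vllm-simulator:latest
print(fragment.render(simulator_mode=False))  # image: vllm/vllm-openai:v0.6.2
```

Because the switch lives in the template, flipping `simulator_mode` in `dependencies.py` changes the rendered YAML without touching generator code.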

Makefile

Lines changed: 3 additions & 3 deletions
@@ -186,7 +186,7 @@ start-backend: ## Start FastAPI backend
 		printf "$(YELLOW)Backend already running (PID: $$(cat $(BACKEND_PID)))$(NC)\n"; \
 	else \
 		cd $(BACKEND_DIR) && \
-		( uv run uvicorn src.api.routes:app --reload --host 0.0.0.0 --port 8000 > ../$(LOG_DIR)/backend.log 2>&1 & echo $$! > ../$(BACKEND_PID) ); \
+		( uv run uvicorn src.api.app:app --reload --host 0.0.0.0 --port 8000 > ../$(LOG_DIR)/backend.log 2>&1 & echo $$! > ../$(BACKEND_PID) ); \
 		sleep 2; \
 		printf "$(GREEN)✓ Backend started (PID: $$(cat $(BACKEND_PID)))$(NC)\n"; \
 	fi
@@ -215,12 +215,12 @@ stop: ## Stop all services
 	fi
 	@# Kill any remaining NeuralNav processes by pattern matching
 	@pkill -f "streamlit run ui/app.py" 2>/dev/null || true
-	@pkill -f "uvicorn src.api.routes:app" 2>/dev/null || true
+	@pkill -f "uvicorn src.api.app:app" 2>/dev/null || true
 	@# Give processes time to exit gracefully
 	@sleep 1
 	@# Force kill if still running
 	@pkill -9 -f "streamlit run ui/app.py" 2>/dev/null || true
-	@pkill -9 -f "uvicorn src.api.routes:app" 2>/dev/null || true
+	@pkill -9 -f "uvicorn src.api.app:app" 2>/dev/null || true
 	@printf "$(GREEN)✓ All NeuralNav services stopped$(NC)\n"
 	@# Don't stop Ollama as it might be used by other apps
 	@printf "$(YELLOW)Note: Ollama left running (use 'pkill ollama' to stop manually)$(NC)\n"

backend/TESTING.md

Lines changed: 4 additions & 4 deletions
@@ -36,7 +36,7 @@ cd backend
 source venv/bin/activate

 python -c "
-from src.context_intent.extractor import IntentExtractor
+from src.intent_extraction import IntentExtractor

 extractor = IntentExtractor()
 intent = extractor.extract_intent(
@@ -55,8 +55,8 @@ print(f' Cost Priority: {intent.cost_priority}')

 ```bash
 python -c "
-from src.context_intent.schema import DeploymentIntent
-from src.recommendation.traffic_profile import TrafficProfileGenerator
+from src.shared.schemas import DeploymentIntent
+from src.specification import TrafficProfileGenerator

 intent = DeploymentIntent(
 use_case='chatbot_conversational',
@@ -85,7 +85,7 @@ print(f' E2E p90: {slo.e2e_p90_target_ms}ms')

 ```bash
 python -c "
-from src.context_intent.schema import DeploymentIntent
+from src.shared.schemas import DeploymentIntent
 from src.recommendation.model_evaluator import ModelEvaluator

 intent = DeploymentIntent(

backend/src/api/app.py

Lines changed: 66 additions & 0 deletions
New file (all 66 lines added):

"""FastAPI application factory for NeuralNav API."""

import logging
import os

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from .routes import (
    configuration_router,
    health_router,
    intent_router,
    recommendation_router,
    reference_data_router,
    specification_router,
)

# Configure logging
debug_mode = os.getenv("NEURALNAV_DEBUG", "false").lower() == "true"
log_level = logging.DEBUG if debug_mode else logging.INFO
logging.basicConfig(
    level=log_level,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
logger = logging.getLogger(__name__)


def create_app() -> FastAPI:
    """Create and configure the FastAPI application."""
    app = FastAPI(
        title="NeuralNav API",
        description="API for LLM deployment recommendations",
        version="0.1.0",
    )

    # Add CORS middleware
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],  # In production, specify actual origins
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )

    # Include all routers
    app.include_router(health_router)
    app.include_router(intent_router)
    app.include_router(specification_router)
    app.include_router(recommendation_router)
    app.include_router(configuration_router)
    app.include_router(reference_data_router)

    logger.info(f"NeuralNav API starting with log level: {logging.getLevelName(log_level)}")

    return app


# Create the app instance for uvicorn
app = create_app()


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)

backend/src/api/dependencies.py

Lines changed: 103 additions & 0 deletions
New file (all 103 lines added):

"""Shared dependencies for API routes.

This module provides singleton instances and dependency injection
for the API routes. All shared state is initialized here.
"""

import logging
import os

from ..cluster import KubernetesClusterManager, KubernetesDeploymentError
from ..configuration import DeploymentGenerator, YAMLValidator
from ..knowledge_base.model_catalog import ModelCatalog
from ..knowledge_base.slo_templates import SLOTemplateRepository
from ..orchestration.workflow import RecommendationWorkflow

# Configure logging
debug_mode = os.getenv("NEURALNAV_DEBUG", "false").lower() == "true"
log_level = logging.DEBUG if debug_mode else logging.INFO
logging.basicConfig(
    level=log_level,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
logger = logging.getLogger(__name__)

# Singleton instances
_workflow: RecommendationWorkflow | None = None
_model_catalog: ModelCatalog | None = None
_slo_repo: SLOTemplateRepository | None = None
_deployment_generator: DeploymentGenerator | None = None
_yaml_validator: YAMLValidator | None = None
_cluster_manager: KubernetesClusterManager | None = None


def get_workflow() -> RecommendationWorkflow:
    """Get the recommendation workflow singleton."""
    global _workflow
    if _workflow is None:
        _workflow = RecommendationWorkflow()
    return _workflow


def get_model_catalog() -> ModelCatalog:
    """Get the model catalog singleton."""
    global _model_catalog
    if _model_catalog is None:
        _model_catalog = ModelCatalog()
    return _model_catalog


def get_slo_repo() -> SLOTemplateRepository:
    """Get the SLO template repository singleton."""
    global _slo_repo
    if _slo_repo is None:
        _slo_repo = SLOTemplateRepository()
    return _slo_repo


def get_deployment_generator() -> DeploymentGenerator:
    """Get the deployment generator singleton."""
    global _deployment_generator
    if _deployment_generator is None:
        # Use simulator mode by default (no GPU required for development)
        _deployment_generator = DeploymentGenerator(simulator_mode=True)
    return _deployment_generator


def get_yaml_validator() -> YAMLValidator:
    """Get the YAML validator singleton."""
    global _yaml_validator
    if _yaml_validator is None:
        _yaml_validator = YAMLValidator()
    return _yaml_validator


def get_cluster_manager(namespace: str = "default") -> KubernetesClusterManager | None:
    """Get or create a cluster manager.

    Returns None if cluster is not accessible.
    """
    global _cluster_manager
    if _cluster_manager is None:
        try:
            _cluster_manager = KubernetesClusterManager(namespace=namespace)
            logger.info("Kubernetes cluster manager initialized successfully")
        except KubernetesDeploymentError as e:
            logger.warning(f"Kubernetes cluster not accessible: {e}")
            return None
    return _cluster_manager


def get_cluster_manager_or_raise(namespace: str = "default") -> KubernetesClusterManager:
    """Get or create a cluster manager, raising an exception if not accessible."""
    manager = get_cluster_manager(namespace)
    if manager is None:
        try:
            return KubernetesClusterManager(namespace=namespace)
        except KubernetesDeploymentError as e:
            from fastapi import HTTPException

            raise HTTPException(
                status_code=503, detail=f"Kubernetes cluster not accessible: {str(e)}"
            ) from e
    return manager