@@ -23,10 +23,18 @@ This repository contains the architecture design for **Compass**, an open-source
   - Entity-relationship diagrams for data models
 
 - **backend/**: Python backend implementation
-  - Component modules: context_intent, recommendation, knowledge_base, orchestration, api
-  - LLM integration with Ollama client
-  - FastAPI REST endpoints with CORS support
-  - Pydantic schemas for type safety
+  - **api/**: FastAPI REST endpoints with CORS support
+  - **context_intent/**: Intent extraction, traffic profiles, Pydantic schemas
+  - **recommendation/**: Multi-criteria scoring and ranking
+    - `solution_scorer.py`: 4-dimension scoring (accuracy, price, latency, complexity)
+    - `model_evaluator.py`: Use-case fit scoring
+    - `usecase_quality_scorer.py`: Artificial Analysis benchmark integration
+    - `ranking_service.py`: Generation of the 5 ranked lists
+    - `capacity_planner.py`: GPU capacity planning with SLO filtering
+  - **knowledge_base/**: Data access (benchmark database, JSON catalogs)
+  - **orchestration/**: Workflow coordination
+  - **deployment/**: Jinja2 templates for KServe/vLLM YAML generation
+  - **llm/**: Ollama client for intent extraction
 
 - **ui/**: Streamlit UI
   - Chat interface for conversational requirement gathering
@@ -35,11 +43,18 @@ This repository contains the architecture design for **Compass**, an open-source
   - Action buttons for YAML generation and deployment
   - Monitoring dashboard with cluster status, SLO compliance, and inference testing
 
-- **data/**: Synthetic benchmark and catalog data for POC
-  - benchmarks.json: 24 model+GPU combinations with vLLM performance data
-  - model_catalog.json: 10 approved models with metadata
-  - slo_templates.json: 7 use case templates
-  - demo_scenarios.json: 3 test scenarios
+- **data/**: Benchmark and catalog data
+  - **model_catalog.json**: 47 curated models with task/domain metadata
+  - **slo_templates.json**: 9 use case templates with SLO targets
+  - **benchmarks/models/**: Model benchmark data
+    - `opensource_all_benchmarks.csv`: 204 open-source models from Artificial Analysis
+    - `model_pricing.csv`: GPU pricing data
+  - **business_context/use_case/**: Use-case specific quality scoring
+    - `weighted_scores/`: 9 CSV files with pre-ranked models per use case
+    - `configs/`: Use case configuration files (weights, SLOs, workloads)
+    - `USE_CASE_METHODOLOGY.md`: Explains the benchmark weighting strategy
+  - **benchmarks_BLIS.json**: Latency/throughput benchmarks from the BLIS simulator (loaded into PostgreSQL)
+  - **demo_scenarios.json**: 3 test scenarios
 
 ## Architecture Key Concepts
 
@@ -75,13 +90,14 @@ Compass is structured as a layered architecture:
 
 **Core Engines** (Vertical - Backend Services):
 1. **Intent & Specification Engine** - Transform conversation into complete deployment spec
-   - LLM-powered intent extraction (Ollama llama3.1:8b)
+   - LLM-powered intent extraction (Ollama qwen2.5:7b)
    - Use case → traffic profile mapping (4 GuideLLM standards)
    - SLO template lookup and specification generation
 2. **Recommendation Engine** - Find optimal model + GPU configurations
-   - Model selection and ranking
+   - Multi-criteria scoring (accuracy, price, latency, complexity)
    - Capacity planning (GPU count, deployment topology)
-   - SLO compliance filtering
+   - SLO compliance filtering with near-miss tolerance
+   - Generation of ranked lists (5 views: best accuracy, lowest cost, etc.)
 3. **Deployment Engine** - Generate and deploy Kubernetes configs
    - YAML generation (Jinja2 templates)
    - K8s deployment lifecycle management
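As a sketch of how engine 1's LLM-powered intent extraction could call a local Ollama server: the endpoint and payload follow Ollama's `/api/generate` REST API, but the prompt wording and the `build_intent_request`/`extract_intent` helpers are illustrative, not Compass's actual client code.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_intent_request(message: str, model: str = "qwen2.5:7b") -> dict:
    """Build a request body asking the LLM to emit a deployment intent as JSON."""
    return {
        "model": model,
        "prompt": f"Extract the deployment intent from this request as JSON: {message}",
        "format": "json",   # ask Ollama to constrain the reply to valid JSON
        "stream": False,    # return one complete response instead of token chunks
    }

def extract_intent(message: str) -> dict:
    """Send the request to Ollama and parse the JSON reply (requires a running server)."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_intent_request(message)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(json.load(resp)["response"])
```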
@@ -99,11 +115,41 @@ Compass is structured as a layered architecture:
 - **vLLM Simulator** - GPU-free development and testing
 
 ### Critical Data Collections (Knowledge Base)
-- **Model Benchmarks** (PostgreSQL): TTFT/ITL/E2E/throughput for (model, GPU, traffic_profile) combinations
+- **Model Benchmarks** (PostgreSQL): TTFT/ITL/E2E/throughput benchmarks for (model, GPU, tensor_parallel) combinations (source: BLIS simulator)
 - **Use Case SLO Templates** (JSON): 9 use cases mapped to 4 GuideLLM traffic profiles with experience-driven SLO targets
-- **Model Catalog** (JSON): 40 curated, approved models with task/domain metadata
+- **Model Catalog** (JSON): 47 curated, approved models with task/domain metadata
+- **Model Quality Scores** (CSV): Use-case specific scores from Artificial Analysis benchmarks (204 models)
+- **Use Case Configs** (JSON): Benchmark weights, SLO targets, and workload profiles per use case
 - **Deployment Outcomes** (PostgreSQL, future): Actual performance data for feedback loop
 
+### Solution Ranking System
+
+The recommendation engine uses **multi-criteria scoring** to rank configurations:
+
+**4 Scoring Dimensions** (each 0-100 scale):
+1. **Accuracy/Quality**: Use-case specific model capability from Artificial Analysis benchmarks
+   - Source: `data/business_context/use_case/weighted_scores/*.csv`
+   - Fallback: Parameter count heuristic if model not in benchmark data
+2. **Price**: Cost efficiency (inverse of monthly cost, normalized)
+3. **Latency**: SLO compliance and headroom from performance benchmark database
+4. **Complexity**: Deployment simplicity (fewer GPUs = higher score)
+
+**Default Weights**: 40% accuracy, 40% price, 10% latency, 10% complexity
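The composite under these default weights can be sketched as follows; this is a minimal illustration of a weighted sum over the four 0-100 dimension scores, not the actual `solution_scorer.py` implementation.

```python
# Default weights from the section above; each dimension score is on a 0-100 scale.
DEFAULT_WEIGHTS = {"accuracy": 0.40, "price": 0.40, "latency": 0.10, "complexity": 0.10}

def composite_score(scores: dict[str, float], weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of the four dimension scores (result is also on a 0-100 scale)."""
    return sum(weights[dim] * scores[dim] for dim in weights)

# e.g. a configuration with strong accuracy but middling price efficiency:
composite_score({"accuracy": 80, "price": 60, "latency": 90, "complexity": 100})  # → 75.0
```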
+
+**5 Ranked Views**:
+- `best_accuracy`: Sorted by model capability
+- `lowest_cost`: Sorted by price efficiency
+- `lowest_latency`: Sorted by SLO headroom
+- `simplest`: Sorted by deployment complexity
+- `balanced`: Sorted by weighted composite score
+
+**Key Files**:
+- `backend/src/recommendation/solution_scorer.py` - Calculates the 4 scores
+- `backend/src/recommendation/model_evaluator.py` - Legacy accuracy scoring (use-case fit)
+- `backend/src/recommendation/usecase_quality_scorer.py` - Artificial Analysis benchmark scoring
+- `backend/src/recommendation/ranking_service.py` - Generates the 5 ranked lists
+- `backend/src/recommendation/capacity_planner.py` - Orchestrates scoring during capacity planning
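A sketch of how a ranking service could derive the 5 views from scored solutions. The solution dict shape, names, and score values are illustrative; the actual `ranking_service.py` may differ.

```python
# Each solution carries its four 0-100 dimension scores plus a precomputed composite.
solutions = [
    {"name": "llama-3.1-8b on 1x A100", "accuracy": 70, "price": 90,
     "latency": 85, "complexity": 100, "balanced": 82.5},
    {"name": "llama-3.1-70b on 4x A100", "accuracy": 95, "price": 40,
     "latency": 70, "complexity": 40, "balanced": 65.0},
]

# Which score each view sorts on (descending: a higher score ranks first).
VIEW_KEYS = {
    "best_accuracy": "accuracy",
    "lowest_cost": "price",      # price score is cost *efficiency*, so higher = cheaper
    "lowest_latency": "latency",
    "simplest": "complexity",
    "balanced": "balanced",
}

def ranked_views(solutions: list[dict]) -> dict[str, list[dict]]:
    """Produce the 5 ranked lists, one per view."""
    return {
        view: sorted(solutions, key=lambda s: s[key], reverse=True)
        for view, key in VIEW_KEYS.items()
    }
```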
+
 ## Working with This Repository
 
 ### When Modifying Architecture Documents
@@ -149,10 +195,11 @@ Compass is structured as a layered architecture:
 ### Common Editing Patterns
 
 **Adding a new use case template**:
-1. Add to Intent & Specification Engine's USE_CASE_TEMPLATES in docs/ARCHITECTURE.md
-2. Add corresponding entry to data/slo_templates.json
-3. Update Knowledge Base → Use Case SLO Templates schema in docs/ARCHITECTURE.md
-4. Update examples if relevant
+1. Add corresponding entry to `data/slo_templates.json`
+2. Create weighted scores CSV in `data/business_context/use_case/weighted_scores/`
+3. Add use case to `UseCaseQualityScorer.USE_CASE_FILES` in `usecase_quality_scorer.py`
+4. Update `USE_CASE_METHODOLOGY.md` with benchmark weighting rationale
+5. Update docs/ARCHITECTURE.md if needed
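Step 3's registry edit might look like the sketch below. The class body, mapping entries, and CSV filenames are hypothetical; check the actual `UseCaseQualityScorer` in `usecase_quality_scorer.py` for the real shape.

```python
# Hypothetical shape of the use-case -> weighted-scores-CSV registry (step 3).
class UseCaseQualityScorer:
    USE_CASE_FILES = {
        "chatbot": "chatbot_weighted_scores.csv",
        "code_generation": "code_generation_weighted_scores.csv",
        # New use case: point it at the CSV created in step 2.
        "document_summarization": "document_summarization_weighted_scores.csv",
    }
```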
 
 **Adding a new SLO metric**:
 1. Update DeploymentIntent schema in Intent & Specification Engine (docs/ARCHITECTURE.md)
@@ -213,17 +260,20 @@ Signed-off-by: Your Name <your.email@example.com>
 
 - **Current Implementation Status**:
   - ✅ Project structure with synthetic data and LLM client
-  - ✅ Core recommendation engine (intent extraction, traffic profiling, model recommendation, capacity planning)
+  - ✅ Core recommendation engine (intent extraction, traffic profiling, capacity planning)
+  - ✅ Multi-criteria solution ranking with 4 scoring dimensions
+  - ✅ Use-case specific quality scoring from Artificial Analysis benchmarks
+  - ✅ 5 ranked recommendation views (best accuracy, lowest cost, etc.)
   - ✅ Orchestration workflow and FastAPI backend
   - ✅ Streamlit UI with chat interface, recommendation display, and editable specifications
   - ✅ YAML generation (KServe/vLLM/HPA/ServiceMonitor) and deployment automation
   - ✅ KIND cluster support with KServe installation
   - ✅ Kubernetes deployment automation and real cluster status monitoring
   - ✅ vLLM simulator for GPU-free development
   - ✅ Inference testing UI with end-to-end deployment validation
-- The Knowledge Base schemas are critical - any implementation must support all 7 collections
+- The Knowledge Base schemas are critical - any implementation must support all collections
 - SLO-driven capacity planning is the core differentiator - don't simplify this away
-- Use synthetic data in data/ directory for POC; production would use a database (e.g., PostgreSQL)
+- Use data in data/ directory for POC; production uses PostgreSQL for latency benchmarks
 - Benchmarks use vLLM default configuration with dynamic batching (no fixed batch_size)
 
 ## Simulator Mode vs Real vLLM