Summary
Users should be able to specify which GPUs they want to include or exclude in their deployment recommendations through natural language prompts.
Current Behavior
- The `DeploymentIntent` schema has a single `preferred_gpu_type` field that supports one GPU preference (e.g., "H100" or "Any GPU")
- The capacity planner filters by this single preferred GPU type in Python after querying the database
- There is no support for:
  - Multiple GPU preferences (e.g., "H100 or H200")
  - GPU exclusions (e.g., "don't use L4")
Desired Behavior
Users should be able to express GPU preferences in natural language like:
- "I would like to use H100 or H200 GPUs."
- "I don't want to use L4 hardware."
- "Only use A100 or H100 GPUs, but not L4s."
Proposed Implementation
1. Schema Changes (backend/src/context_intent/schema.py)
Replace the single `preferred_gpu_type` field with two new fields in `DeploymentIntent`:

```python
gpu_include: list[str] = Field(
    default_factory=list,
    description="GPU types to include (empty = all GPUs allowed)",
)
gpu_exclude: list[str] = Field(
    default_factory=list,
    description="GPU types to exclude",
)
```

2. LLM Prompt Changes (backend/src/llm/prompts.py)
Update the intent extraction prompt and schema to:
- Extract the `gpu_include` list when the user specifies preferred GPUs
- Extract the `gpu_exclude` list when the user specifies GPUs to avoid
- Add examples for both inclusion and exclusion patterns
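As a sketch of how the two fields fit together (assuming `DeploymentIntent` is a Pydantic model, as the use of `Field` suggests; any surrounding fields are omitted), here is the schema plus the extraction target for one of the example prompts:

```python
from pydantic import BaseModel, Field


class DeploymentIntent(BaseModel):
    # Only the two proposed GPU fields are shown; the real model has more fields.
    gpu_include: list[str] = Field(
        default_factory=list,
        description="GPU types to include (empty = all GPUs allowed)",
    )
    gpu_exclude: list[str] = Field(
        default_factory=list,
        description="GPU types to exclude",
    )


# What the LLM extraction should produce for
# "Only use A100 or H100 GPUs, but not L4s."
intent = DeploymentIntent(gpu_include=["A100", "H100"], gpu_exclude=["L4"])

# With no GPU preference in the prompt, both lists stay empty,
# which preserves the existing "all GPUs allowed" behavior.
default_intent = DeploymentIntent()
```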
3. Database Query Changes (backend/src/knowledge_base/benchmarks.py)
Modify `find_configurations_meeting_slo()` to accept optional GPU filter parameters:

```python
def find_configurations_meeting_slo(
    self,
    ...,
    gpu_include: list[str] | None = None,  # Filter to only these GPUs
    gpu_exclude: list[str] | None = None,  # Exclude these GPUs
) -> list[BenchmarkData]:
```

Add SQL WHERE clauses to filter at the database level:
- `AND hardware IN (...)` when `gpu_include` is provided
- `AND hardware NOT IN (...)` when `gpu_exclude` is provided
4. Capacity Planner Changes (backend/src/recommendation/capacity_planner.py)
- Pass the new GPU filter parameters from `DeploymentIntent` to `find_configurations_meeting_slo()`
- Remove the post-query Python filtering for `preferred_gpu_type` (lines 191-205)
Benefits
- More efficient: Filtering at DB level reduces data transfer
- More flexible: Supports both inclusions and exclusions
- Better UX: Users can express complex GPU preferences naturally
Acceptance Criteria
- User can specify multiple preferred GPU types in natural language
- User can exclude specific GPU types in natural language
- GPU filtering happens at the database query level
- Estimated benchmarks (JSON) also respect GPU include/exclude filters
- Existing behavior preserved when no GPU preferences specified