
Support GPU inclusion/exclusion preferences in natural language prompts #67

@anfredette

Summary

Users should be able to specify which GPUs they want to include or exclude in their deployment recommendations through natural language prompts.

Current Behavior

  • The DeploymentIntent schema has a single preferred_gpu_type field that supports one GPU preference (e.g., "H100" or "Any GPU")
  • The capacity planner filters by this single preferred GPU type in Python after querying the database
  • There is no support for:
    • Multiple GPU preferences (e.g., "H100 or H200")
    • GPU exclusions (e.g., "don't use L4")

Desired Behavior

Users should be able to express GPU preferences in natural language like:

  • "I would like to use H100 or H200 GPUs."
  • "I don't want to use L4 hardware."
  • "Only use A100 or H100 GPUs, but not L4s."

Proposed Implementation

1. Schema Changes (backend/src/context_intent/schema.py)

Replace the single preferred_gpu_type field with two new fields in DeploymentIntent:

```python
gpu_include: list[str] = Field(
    default_factory=list,
    description="GPU types to include (empty = all GPUs allowed)",
)
gpu_exclude: list[str] = Field(
    default_factory=list,
    description="GPU types to exclude",
)
```

2. LLM Prompt Changes (backend/src/llm/prompts.py)

Update the intent extraction prompt and schema to:

  • Extract gpu_include list when user specifies preferred GPUs
  • Extract gpu_exclude list when user specifies GPUs to avoid
  • Add examples for both inclusion and exclusion patterns
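The few-shot examples added to the prompt might map user phrasing to the new fields like this. These pairs are illustrative; the actual prompt wording in `backend/src/llm/prompts.py` is an assumption here:

```python
# Illustrative (user text -> extracted fields) pairs for the intent
# extraction prompt; not the real prompt contents.
EXTRACTION_EXAMPLES = [
    ("I would like to use H100 or H200 GPUs.",
     {"gpu_include": ["H100", "H200"], "gpu_exclude": []}),
    ("I don't want to use L4 hardware.",
     {"gpu_include": [], "gpu_exclude": ["L4"]}),
    ("Only use A100 or H100 GPUs, but not L4s.",
     {"gpu_include": ["A100", "H100"], "gpu_exclude": ["L4"]}),
]
```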

3. Database Query Changes (backend/src/knowledge_base/benchmarks.py)

Modify find_configurations_meeting_slo() to accept optional GPU filter parameters:

```python
def find_configurations_meeting_slo(
    self,
    ...,
    gpu_include: list[str] | None = None,  # Filter to only these GPUs
    gpu_exclude: list[str] | None = None,  # Exclude these GPUs
) -> list[BenchmarkData]:
```

Add SQL WHERE clauses to filter at the database level:

  • AND hardware IN (...) when gpu_include is provided
  • AND hardware NOT IN (...) when gpu_exclude is provided

4. Capacity Planner Changes (backend/src/recommendation/capacity_planner.py)

  • Pass the new GPU filter parameters from DeploymentIntent to find_configurations_meeting_slo()
  • Remove the post-query Python filtering for preferred_gpu_type (lines 191-205)
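A rough slice of the planner change, with the DB query stubbed out as a plain callable (the real method lives on the knowledge-base object; names here loosely mirror the code):

```python
def plan(intent: dict, find_configurations_meeting_slo):
    """Hypothetical planner fragment: forward the intent's GPU filters
    to the DB query instead of filtering rows in Python afterwards.

    Empty lists are normalized to None so an unspecified preference
    leaves the query unfiltered (preserving existing behavior).
    """
    return find_configurations_meeting_slo(
        gpu_include=intent.get("gpu_include") or None,
        gpu_exclude=intent.get("gpu_exclude") or None,
    )
```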

Benefits

  • More efficient: Filtering at DB level reduces data transfer
  • More flexible: Supports both inclusions and exclusions
  • Better UX: Users can express complex GPU preferences naturally

Acceptance Criteria

  • User can specify multiple preferred GPU types in natural language
  • User can exclude specific GPU types in natural language
  • GPU filtering happens at the database query level
  • Estimated benchmarks (JSON) also respect GPU include/exclude filters
  • Existing behavior preserved when no GPU preferences specified
