Commit 0531a53
fix: Prevent incorrect priority inference from use case and hardware
Problem:
- "Customer service chatbot for 30 users" incorrectly extracted latency_priority: "low"
- "chatbot with H100/H200 gpus" incorrectly extracted cost_priority: "low"
- LLM was inferring priorities from use case type and hardware choices instead of explicit user statements

Root causes:
1. Experience class guidance contained "low latency needed" text, confusing the LLM into thinking latency_priority should be "low" (deprioritized) instead of understanding it as a functional requirement
2. LLM inferred "H100 = expensive GPU = user doesn't care about cost" and set cost_priority: "low"
3. Weak prompt guidance allowed use case and hardware to influence priorities

Changes:
1. backend/src/llm/prompts.py:
   - Remove latency-related language from Experience class guidance to prevent confusion between functional requirements and priority weights
   - Strengthen cost_priority extraction with "EXPLICITLY" and "ONLY" keywords
   - Add "DO NOT infer from GPU choice" warning to prevent hardware → cost inference
   - Strengthen latency_priority extraction to require explicit user statements
   - Add "FOLLOW STRICTLY" section with clear rules and 4 concrete examples
   - Default to "medium" for all priorities unless user explicitly states otherwise
2. ui/app.py:
   - Fix rendering bug where an empty priorities_item string caused "</div>" to display as visible text
   - Use an HTML comment placeholder instead of an empty string

Testing:
- "Customer service chatbot for 30 users" → all priorities "medium" ✓
- "chatbot with 300 users, low latency important, H100 gpus" → latency_priority: "high", cost_priority: "medium" ✓

Assisted-by: Claude <noreply@anthropic.com>
Signed-off-by: Andre Fredette <afredette@redhat.com>
1 parent d327199 commit 0531a53

File tree

2 files changed: +19 −13 lines changed

backend/src/llm/prompts.py

Lines changed: 17 additions & 11 deletions

@@ -27,11 +27,11 @@
 - research_legal_analysis: Research/legal document analysis (very long prompts, detailed analysis)

 Experience class guidance:
-- instant: Extremely low latency required (<200ms TTFT) - code completion, autocomplete
-- conversational: Real-time user interaction (chatbots, interactive tools) - low latency needed
-- interactive: User waiting but can tolerate slight delay (RAG Q&A, content generation) - balanced
-- deferred: User can wait for quality (long summarization, detailed analysis) - quality over speed
-- batch: Background/async processing (research, legal analysis) - optimize for quality and cost
+- instant: Sub-200ms response time - code completion, autocomplete
+- conversational: Real-time user interaction - chatbots, interactive tools
+- interactive: User waiting but tolerates slight delay - RAG Q&A, content generation
+- deferred: Quality over speed - long summarization, detailed analysis
+- batch: Background/async processing - research, legal analysis
 """

@@ -87,15 +87,21 @@ def build_intent_extraction_prompt(user_message: str, conversation_history: list
 Priority extraction (for scoring weights - use "medium" as baseline, adjust based on context):
 - accuracy_priority: "high" if user mentions accuracy matters, quality is important, accuracy is critical, best model, or top quality. "low" if user says good enough or accuracy less important.
-- cost_priority: "high" if user mentions cost-effective, cost-sensitive, budget constrained, minimize cost, cost is important, or budget is tight. "low" if user says cost doesn't matter or budget is unlimited. Default to "medium" if not mentioned.
-- latency_priority: "high" if the use case requires fast responses (e.g., real-time, interactive, instant). "low" if async/batch is acceptable.
+- cost_priority: "high" if user EXPLICITLY says cost-effective, cost-sensitive, budget constrained, minimize cost, cost is important, or budget is tight. "low" ONLY if user EXPLICITLY says "cost doesn't matter" or "budget is unlimited" or "money is no object". Default to "medium" if not mentioned. DO NOT infer from GPU choice.
+- latency_priority: "high" if user mentions low latency needed, fast response critical, speed is important, real-time performance required, or instant responses needed. "low" if user says latency less important or async/batch is acceptable. Default to "medium" if not mentioned.
 - complexity_priority: "high" if user wants simple deployment, easy setup. "low" if they're okay with complex setups.
-IMPORTANT: Explicit user statements override inferences (e.g., "cost-effective preferred" → cost_priority: high)
+IMPORTANT - Priority Extraction Rules (FOLLOW STRICTLY):
+- Only extract priorities from EXPLICIT user statements about priorities, not from hardware choices or use case type
+- Hardware preference (H100, L4, etc.) does NOT imply cost_priority - user may have GPUs available or budget allocated
+- Use case type does NOT imply latency_priority - default to "medium" unless user explicitly mentions speed/latency concerns
+- When in doubt, use "medium" - only deviate when user EXPLICITLY states a priority
+Examples:
+- "chatbot with H100" → cost_priority: "medium" (H100 doesn't mean cost doesn't matter)
+- "low latency is important" → latency_priority: "high" (explicit statement)
+- "cost-effective solution" → cost_priority: "high" (explicit statement)
+- "chatbot for 300 users, low latency important, H100 gpus" → latency_priority: "high", cost_priority: "medium" (only latency explicitly mentioned)

 {INTENT_EXTRACTION_SCHEMA}
 """
 return prompt

-# NOTE: Experimental prompts for future conversational features have been
-# moved to prompts_experimental.py to keep this file focused on production code.
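The "explicit statements only" rules the revised prompt encodes can be illustrated with a small keyword-based sketch. This is a hypothetical helper for illustration, not code from backend/src/llm/prompts.py — the real extraction is done by the LLM against the prompt text, and the keyword lists here are assumptions drawn from the prompt's examples:

```python
def extract_priorities(message: str) -> dict:
    """Keyword sketch of the prompt's priority rules: default to "medium",
    deviate only on explicit statements, ignore GPU names and use case type."""
    text = message.lower()
    priorities = {"cost_priority": "medium", "latency_priority": "medium"}

    # cost_priority: only explicit cost language; GPU mentions (H100, L4, ...)
    # deliberately have no effect
    if any(kw in text for kw in ("cost-effective", "cost-sensitive",
                                 "minimize cost", "budget is tight",
                                 "cost is important")):
        priorities["cost_priority"] = "high"
    elif any(kw in text for kw in ("cost doesn't matter", "budget is unlimited",
                                   "money is no object")):
        priorities["cost_priority"] = "low"

    # latency_priority: only explicit speed/latency statements; "chatbot"
    # alone is not enough
    if any(kw in text for kw in ("low latency", "fast response",
                                 "speed is important", "real-time performance",
                                 "instant responses")):
        priorities["latency_priority"] = "high"
    elif any(kw in text for kw in ("latency less important",
                                   "batch is acceptable")):
        priorities["latency_priority"] = "low"

    return priorities
```

Against the commit's test phrases, this sketch reproduces the expected behavior: "chatbot with H100" yields all defaults, while "low latency important" flips only latency_priority to "high".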

ui/app.py

Lines changed: 2 additions & 2 deletions

@@ -4877,7 +4877,7 @@ def render_extraction_result(extraction: dict, priority: str):
     priorities_display = ", ".join(priority_badges) if priority_badges else None

     # Build priorities item HTML only if there are non-default priorities
-    priorities_item = ""
+    priorities_item = "<!-- no custom priorities -->"
     if priorities_display:
         priorities_item = f'''<div class="extraction-item"><div><div class="extraction-label">Priorities</div><div class="extraction-value">{priorities_display}</div></div></div>'''

@@ -4934,7 +4934,7 @@ def render_extraction_with_approval(extraction: dict, models_df: pd.DataFrame):
     priorities_display = ", ".join(priority_badges) if priority_badges else None

     # Build priorities item HTML only if there are non-default priorities
-    priorities_item = ""
+    priorities_item = "<!-- no custom priorities -->"
    if priorities_display:
         priorities_item = f'''<div class="extraction-item"><div><div class="extraction-label">Priorities</div><div class="extraction-value">{priorities_display}</div></div></div>'''
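The fix swaps the empty string for an HTML comment so the optional slot stays occupied with markup that renders to nothing. A minimal sketch of the pattern, with an assumed container shape (the real template in ui/app.py is more elaborate):

```python
from typing import Optional

def build_extraction_html(priorities_display: Optional[str]) -> str:
    # With an empty string here, the surrounding markup could be rendered
    # with a stray "</div>" showing as visible text; the HTML comment keeps
    # the slot non-empty without rendering anything.
    priorities_item = "<!-- no custom priorities -->"
    if priorities_display:
        priorities_item = (
            f'<div class="extraction-item">'
            f'<div class="extraction-label">Priorities</div>'
            f'<div class="extraction-value">{priorities_display}</div>'
            f'</div>'
        )
    return f'<div class="extraction-grid">{priorities_item}</div>'
```

Either branch leaves the container well-formed: the comment placeholder contributes no visible output, and the populated branch closes every tag it opens.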
