Commit 0531a53
fix: Prevent incorrect priority inference from use case and hardware
Problem:
- "Customer service chatbot for 30 users" incorrectly extracted latency_priority: "low"
- "chatbot with H100/H200 gpus" incorrectly extracted cost_priority: "low"
- LLM was inferring priorities from use case type and hardware choices instead of explicit user statements

Root causes:
1. Experience class guidance contained "low latency needed" text, confusing the LLM into thinking latency_priority should be "low" (deprioritized) instead of understanding it as a functional requirement
2. LLM inferred "H100 = expensive GPU = user doesn't care about cost" and set cost_priority: "low"
3. Weak prompt guidance allowed use case and hardware to influence priorities

Changes:
1. backend/src/llm/prompts.py:
   - Remove latency-related language from Experience class guidance to prevent confusion between functional requirements and priority weights
   - Strengthen cost_priority extraction with "EXPLICITLY" and "ONLY" keywords
   - Add "DO NOT infer from GPU choice" warning to prevent hardware → cost inference
   - Strengthen latency_priority extraction to require explicit user statements
   - Add "FOLLOW STRICTLY" section with clear rules and 4 concrete examples
   - Default to "medium" for all priorities unless user explicitly states otherwise
2. ui/app.py:
   - Fix rendering bug where an empty priorities_item string caused "</div>" to display as visible text
   - Use an HTML comment placeholder instead of an empty string

Testing:
- "Customer service chatbot for 30 users" → all priorities "medium" ✓
- "chatbot with 300 users, low latency important, H100 gpus" → latency_priority: "high", cost_priority: "medium" ✓

Assisted-by: Claude <noreply@anthropic.com>
Signed-off-by: Andre Fredette <afredette@redhat.com>
1 parent d327199 commit 0531a53

File tree

2 files changed: +19 −13 lines changed

backend/src/llm/prompts.py

Lines changed: 17 additions & 11 deletions

@@ -27,11 +27,11 @@
 - research_legal_analysis: Research/legal document analysis (very long prompts, detailed analysis)

 Experience class guidance:
-- instant: Extremely low latency required (<200ms TTFT) - code completion, autocomplete
-- conversational: Real-time user interaction (chatbots, interactive tools) - low latency needed
-- interactive: User waiting but can tolerate slight delay (RAG Q&A, content generation) - balanced
-- deferred: User can wait for quality (long summarization, detailed analysis) - quality over speed
-- batch: Background/async processing (research, legal analysis) - optimize for quality and cost
+- instant: Sub-200ms response time - code completion, autocomplete
+- conversational: Real-time user interaction - chatbots, interactive tools
+- interactive: User waiting but tolerates slight delay - RAG Q&A, content generation
+- deferred: Quality over speed - long summarization, detailed analysis
+- batch: Background/async processing - research, legal analysis
 """

@@ -87,15 +87,21 @@ def build_intent_extraction_prompt(user_message: str, conversation_history: list
 Priority extraction (for scoring weights - use "medium" as baseline, adjust based on context):
 - accuracy_priority: "high" if user mentions accuracy matters, quality is important, accuracy is critical, best model, or top quality. "low" if user says good enough or accuracy less important.
-- cost_priority: "high" if user mentions cost-effective, cost-sensitive, budget constrained, minimize cost, cost is important, or budget is tight. "low" if user says cost doesn't matter or budget is unlimited. Default to "medium" if not mentioned.
-- latency_priority: "high" if the use case requires fast responses (e.g., real-time, interactive, instant). "low" if async/batch is acceptable.
+- cost_priority: "high" if user EXPLICITLY says cost-effective, cost-sensitive, budget constrained, minimize cost, cost is important, or budget is tight. "low" ONLY if user EXPLICITLY says "cost doesn't matter" or "budget is unlimited" or "money is no object". Default to "medium" if not mentioned. DO NOT infer from GPU choice.
+- latency_priority: "high" if user mentions low latency needed, fast response critical, speed is important, real-time performance required, or instant responses needed. "low" if user says latency less important or async/batch is acceptable. Default to "medium" if not mentioned.
 - complexity_priority: "high" if user wants simple deployment, easy setup. "low" if they're okay with complex setups.
-IMPORTANT: Explicit user statements override inferences (e.g., "cost-effective preferred" → cost_priority: high)
+IMPORTANT - Priority Extraction Rules (FOLLOW STRICTLY):
+- Only extract priorities from EXPLICIT user statements about priorities, not from hardware choices or use case type
+- Hardware preference (H100, L4, etc.) does NOT imply cost_priority - user may have GPUs available or budget allocated
+- Use case type does NOT imply latency_priority - default to "medium" unless user explicitly mentions speed/latency concerns
+- When in doubt, use "medium" - only deviate when user EXPLICITLY states a priority
+Examples:
+- "chatbot with H100" → cost_priority: "medium" (H100 doesn't mean cost doesn't matter)
+- "low latency is important" → latency_priority: "high" (explicit statement)
+- "cost-effective solution" → cost_priority: "high" (explicit statement)
+- "chatbot for 300 users, low latency important, H100 gpus" → latency_priority: "high", cost_priority: "medium" (only latency explicitly mentioned)

 {INTENT_EXTRACTION_SCHEMA}
 """
 return prompt

-# NOTE: Experimental prompts for future conversational features have been
-# moved to prompts_experimental.py to keep this file focused on production code.
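The "explicit statements only" rules the revised prompt encodes can be illustrated with a small keyword-based sketch. This is a hypothetical helper for illustration, not code from backend/src/llm/prompts.py — the real extraction is done by the LLM against the prompt text, and the keyword lists here are assumptions drawn from the prompt's examples:

```python
def extract_priorities(message: str) -> dict:
    """Keyword sketch of the prompt's priority rules: default to "medium",
    deviate only on explicit statements, ignore GPU names and use case type."""
    text = message.lower()
    priorities = {"cost_priority": "medium", "latency_priority": "medium"}

    # cost_priority: only explicit cost language; GPU mentions (H100, L4, ...)
    # deliberately have no effect
    if any(kw in text for kw in ("cost-effective", "cost-sensitive",
                                 "minimize cost", "budget is tight",
                                 "cost is important")):
        priorities["cost_priority"] = "high"
    elif any(kw in text for kw in ("cost doesn't matter", "budget is unlimited",
                                   "money is no object")):
        priorities["cost_priority"] = "low"

    # latency_priority: only explicit speed/latency statements; "chatbot"
    # alone is not enough
    if any(kw in text for kw in ("low latency", "fast response",
                                 "speed is important", "real-time performance",
                                 "instant responses")):
        priorities["latency_priority"] = "high"
    elif any(kw in text for kw in ("latency less important",
                                   "batch is acceptable")):
        priorities["latency_priority"] = "low"

    return priorities
```

Against the commit's test phrases, this sketch reproduces the expected behavior: "chatbot with H100" yields all defaults, while "low latency important" flips only latency_priority to "high".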

ui/app.py

Lines changed: 2 additions & 2 deletions

@@ -4877,7 +4877,7 @@ def render_extraction_result(extraction: dict, priority: str):
     priorities_display = ", ".join(priority_badges) if priority_badges else None

     # Build priorities item HTML only if there are non-default priorities
-    priorities_item = ""
+    priorities_item = "<!-- no custom priorities -->"
     if priorities_display:
         priorities_item = f'''<div class="extraction-item"><div><div class="extraction-label">Priorities</div><div class="extraction-value">{priorities_display}</div></div></div>'''

@@ -4934,7 +4934,7 @@ def render_extraction_with_approval(extraction: dict, models_df: pd.DataFrame):
     priorities_display = ", ".join(priority_badges) if priority_badges else None

     # Build priorities item HTML only if there are non-default priorities
-    priorities_item = ""
+    priorities_item = "<!-- no custom priorities -->"
    if priorities_display:
         priorities_item = f'''<div class="extraction-item"><div><div class="extraction-label">Priorities</div><div class="extraction-value">{priorities_display}</div></div></div>'''
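The fix swaps the empty string for an HTML comment so the optional slot stays occupied with markup that renders to nothing. A minimal sketch of the pattern, with an assumed container shape (the real template in ui/app.py is more elaborate):

```python
from typing import Optional

def build_extraction_html(priorities_display: Optional[str]) -> str:
    # With an empty string here, the surrounding markup could be rendered
    # with a stray "</div>" showing as visible text; the HTML comment keeps
    # the slot non-empty without rendering anything.
    priorities_item = "<!-- no custom priorities -->"
    if priorities_display:
        priorities_item = (
            f'<div class="extraction-item">'
            f'<div class="extraction-label">Priorities</div>'
            f'<div class="extraction-value">{priorities_display}</div>'
            f'</div>'
        )
    return f'<div class="extraction-grid">{priorities_item}</div>'
```

Either branch leaves the container well-formed: the comment placeholder contributes no visible output, and the populated branch closes every tag it opens.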
