7 | 7 | "experience_class": "instant|conversational|interactive|deferred|batch", |
8 | 8 | "user_count": <integer>, |
9 | 9 | "domain_specialization": ["general"|"code"|"multilingual"|"enterprise"], |
| 10 | + "preferred_gpu_type": "<GPU type if mentioned (H100, H200, A100, L4), or 'Any GPU' if not specified>", |
10 | 11 | "accuracy_priority": "low|medium|high", |
11 | 12 | "cost_priority": "low|medium|high", |
12 | 13 | "latency_priority": "low|medium|high", |
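The schema fields above can be validated deterministically after the model replies. The sketch below is a hypothetical helper (not part of this PR) that parses the model's JSON output and falls back to the baselines the prompt describes; the field names and allowed values are taken from the schema in this diff, everything else is an assumption.

```python
import json

# Allowed values taken from the schema above; the validator itself
# is a hypothetical sketch, not code from this PR.
PRIORITY_LEVELS = {"low", "medium", "high"}
EXPERIENCE_CLASSES = {"instant", "conversational", "interactive", "deferred", "batch"}

def parse_intent(raw: str) -> dict:
    """Parse the model's JSON reply, falling back to safe defaults."""
    try:
        intent = json.loads(raw)
    except json.JSONDecodeError:
        intent = {}
    if intent.get("experience_class") not in EXPERIENCE_CLASSES:
        intent["experience_class"] = "conversational"
    for key in ("accuracy_priority", "cost_priority", "latency_priority"):
        if intent.get(key) not in PRIORITY_LEVELS:
            intent[key] = "medium"  # "medium" is the baseline per the prompt
    if not isinstance(intent.get("user_count"), int):
        intent["user_count"] = 1
    if not intent.get("preferred_gpu_type"):
        intent["preferred_gpu_type"] = "Any GPU"  # default when no GPU is mentioned
    return intent
```

Guarding every field this way means a malformed or truncated model reply degrades to neutral defaults instead of crashing downstream scoring.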
@@ -63,13 +64,23 @@ def build_intent_extraction_prompt(user_message: str, conversation_history: list |
63 | 64 | 1. **Use case**: What type of application (chatbot, customer service, code generation, summarization, etc.) |
64 | 65 | 2. **User count**: How many users or scale mentioned (estimate if not explicit) |
65 | 66 | 3. **Domain specialization**: Any specific domains mentioned (code, multilingual, enterprise, etc.) |
| 67 | +4. **Latency requirement**: How important is low latency? (very_high = sub-500ms, high = sub-2s, medium = 2-5s, low = >5s acceptable) |
| 68 | +5. **Throughput priority**: Is high request volume more important than low latency? |
| 69 | +6. **Budget constraint**: How price-sensitive are they? |
| 70 | +7. **Preferred GPU**: If the user mentions a specific GPU type (H100, H200, A100, A100-80, L4, B200), extract it
66 | 72 |
67 | 73 | Be intelligent about inference: |
68 | 74 | - "thousands of users" → estimate specific number |
69 | 75 | - "document Q&A" or "knowledge base" or "document search" → use_case: document_analysis_rag |
70 | 76 | - "RAG" or "retrieval" → use_case: document_analysis_rag |
71 | 77 | - "chatbot" or "customer service" or "conversational" → use_case: chatbot_conversational |
72 | 78 | - "summarize document" or "summarization" → use_case: summarization_short or long_document_summarization |
| 79 | +- "running on h200" or "h200" or "H200" → preferred_gpu_type: "H200" |
| 80 | +- "h100" or "H100" → preferred_gpu_type: "H100" |
| 81 | +- "a100" or "A100" → preferred_gpu_type: "A100" |
| 82 | +- "l4" or "L4" → preferred_gpu_type: "L4" |
| 83 | +- No GPU mentioned → preferred_gpu_type: "Any GPU" |
73 | 84 |
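The GPU-mention rules above are simple enough to back with a deterministic fallback in case the model misses one. This is a sketch of such a helper, not code from this PR; it mirrors the prompt's mapping rules, checking longer names first so "H200" is never reduced to a shorter match.

```python
import re

# Known GPU types from the prompt rules above, longest names first
# so "A100-80" is not swallowed by the plain "A100" pattern.
_GPU_PATTERNS = ["A100-80", "H200", "H100", "B200", "A100", "L4"]

def extract_gpu_type(message: str) -> str:
    """Return the first GPU type mentioned, or 'Any GPU' if none (sketch)."""
    for gpu in _GPU_PATTERNS:
        # \b word boundaries keep "L4" from matching inside e.g. "L40"
        if re.search(rf"\b{re.escape(gpu)}\b", message, re.IGNORECASE):
            return gpu
    return "Any GPU"
```

Running the regex fallback on the raw user message and preferring its result when the model returns an unknown GPU string keeps the extraction case-insensitive, exactly as the "h200"/"H200" examples above require.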
74 | 85 | Priority extraction (for scoring weights - use "medium" as baseline, adjust based on context): |
75 | 86 | - accuracy_priority: "high" if the user indicates accuracy matters (quality is important, accuracy is critical, wants the best model or top quality); "low" if the user says "good enough" or that accuracy is less important.