For comprehensive metric selection guidance, use these primary resources:
🎯 Decision Trees for Metric Selection (Primary Authority)
Task-specific metric selection paths with budget allocation and implementation priorities:
- Q&A Systems, RAG, Content Generation, Code Generation, and Specialized Tasks
- Budget allocation guidelines by risk level
- Implementation decision frameworks
Evaluation Selection Wizard: interactive guidance for selecting evaluation approaches, with specific implementation plans
Quality Dimensions Mapping: a comprehensive framework for understanding and measuring LLM quality dimensions
| Use Case | Primary Authority | Essential Metrics |
|---|---|---|
| Document Q&A (RAG) | RAG Metrics Path | Context Precision, Faithfulness, Answer Relevance |
| Customer Support | Q&A Metrics Path | Accuracy, Relevance, Safety |
| Content Creation | Creative Metrics Path | Creativity, Coherence, Style |
| Code Generation | Code Metrics Path | Execution, Correctness, Security |
💡 For detailed guidance: Each link above provides specific implementation priorities, budget allocation, and success criteria.
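The use-case table above can be captured as a simple lookup, handy when routing evaluations programmatically. This is an illustrative sketch, not part of any named framework; the keys and metric names are assumptions mirroring the table:

```python
# Hypothetical lookup mirroring the use-case table; names are illustrative.
ESSENTIAL_METRICS = {
    "document_qa_rag": ["context_precision", "faithfulness", "answer_relevance"],
    "customer_support": ["accuracy", "relevance", "safety"],
    "content_creation": ["creativity", "coherence", "style"],
    "code_generation": ["execution", "correctness", "security"],
}

def essential_metrics(use_case: str) -> list[str]:
    # Fall back to a safety-first minimal set for unrecognized tasks.
    return ESSENTIAL_METRICS.get(use_case, ["safety", "relevance"])
```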
Low-cost automated starter metrics:

| Metric | Purpose | Implementation | Cost |
|---|---|---|---|
| Safety Check | Prevent harmful outputs | HuggingFace toxicity models | ~$0.01/eval |
| Response Time | User experience baseline | Built-in timing | Free |
| Basic Relevance | Task completion check | Semantic similarity | ~$0.02/eval |
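A minimal sketch of wiring these three starter metrics into one evaluation call. The safety check is stubbed (a real implementation would call a HuggingFace toxicity classifier), and token overlap stands in for a proper semantic-similarity model; all function names are illustrative:

```python
import time

def token_overlap(a: str, b: str) -> float:
    """Crude stand-in for semantic similarity: Jaccard overlap of lowercase tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def evaluate(prompt: str, generate) -> dict:
    """Run a generation function and collect the three starter metrics."""
    start = time.perf_counter()
    output = generate(prompt)
    latency = time.perf_counter() - start
    return {
        "latency_s": round(latency, 3),       # response-time baseline
        "relevance": token_overlap(prompt, output),
        # Safety would call an external model (e.g. a HuggingFace toxicity
        # classifier); stubbed here as a placeholder.
        "safety_flagged": False,
    }

result = evaluate("What is the capital of France?",
                  lambda p: "The capital of France is Paris.")
```

Swapping the overlap function for an embedding-based similarity (e.g. via a sentence-embedding model) upgrades relevance without changing the surrounding harness.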
Zero-cost metrics you can add from day one:

| Metric | Purpose | Implementation | Cost |
|---|---|---|---|
| User Feedback | Direct quality signal | Simple rating system | Free |
| Length Check | Catch obvious failures | Word count validation | Free |
| Basic Coherence | Content quality | Simple heuristics | Free |
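The length check and coherence heuristics need nothing beyond the standard library. A sketch of what "simple heuristics" might look like; the thresholds and repetition signal are illustrative choices, not a standard:

```python
def length_ok(text: str, min_words: int = 5, max_words: int = 500) -> bool:
    """Catch obviously truncated or runaway outputs by word count."""
    n = len(text.split())
    return min_words <= n <= max_words

def coherence_heuristics(text: str) -> dict:
    """Cheap degenerate-output signals: sentence presence and word repetition."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    words = text.split()
    unique_ratio = len({w.lower() for w in words}) / len(words) if words else 0.0
    return {
        "has_sentences": len(sentences) > 0,
        # Heavy repetition (ratio near 1.0) is a cheap sign of looping output.
        "repetition_ratio": round(1 - unique_ratio, 2),
    }
```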
📋 For comprehensive metric catalogs and selection frameworks, see the Quality Dimensions Mapping.
- New to evaluation? → Start with Starter Evaluation Toolkit
- Know your task type? → Use Decision Trees for specific guidance
- Need ROI justification? → See Cost-Benefit Calculator
- ❌ Measuring everything - Start with 3-5 essential metrics
- ❌ Ignoring safety - Always include basic safety checks
- ❌ No user feedback - Include at least thumbs up/down
- ❌ Waiting for perfect metrics - an 80%-accurate metric you can run today beats a 99%-accurate one delivered months late
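The "at least thumbs up/down" feedback signal takes only a few lines to collect and aggregate. A minimal in-memory sketch; a real deployment would persist votes, but the aggregation logic is the same:

```python
from collections import Counter

class FeedbackLog:
    """Minimal thumbs-up/down collector with per-response approval rates."""

    def __init__(self):
        self.ups = Counter()     # thumbs-up count per response id
        self.totals = Counter()  # total votes per response id

    def record(self, response_id: str, thumbs_up: bool) -> None:
        self.totals[response_id] += 1
        if thumbs_up:
            self.ups[response_id] += 1

    def approval(self, response_id: str) -> float:
        """Fraction of thumbs-up votes; 0.0 when no votes recorded."""
        total = self.totals[response_id]
        return self.ups[response_id] / total if total else 0.0
```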
| Tool Category | Recommended Options | Cost Range |
|---|---|---|
| Safety | HuggingFace models, OpenAI Moderation | $0.01-0.05/eval |
| Automated Quality | RAGAS, BERTScore, Semantic similarity | $0.02-0.10/eval |
| LLM-as-Judge | GPT-4, Claude, Local models | $0.01-0.15/eval |
| Human Review | Built-in feedback, Scale AI, Expert panels | $0.50-15.00/eval |
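For the LLM-as-judge row, the scoring logic can be kept separate from the model call so it works identically with GPT-4, Claude, or a local model. A sketch with the model call injected as a function; the prompt wording and 1-5 scale are illustrative assumptions:

```python
import re
from typing import Callable, Optional

JUDGE_PROMPT = (
    "Rate the response on a 1-5 scale for relevance to the question.\n"
    "Question: {q}\nResponse: {r}\n"
    "Reply with only the number."
)

def parse_score(judge_reply: str) -> Optional[int]:
    """Extract the first standalone 1-5 integer from the judge's reply."""
    m = re.search(r"\b([1-5])\b", judge_reply)
    return int(m.group(1)) if m else None

def judge(question: str, response: str,
          call_model: Callable[[str], str]) -> Optional[int]:
    # call_model wraps whichever judge model you use (API or local);
    # injecting it keeps the parsing logic testable without network calls.
    return parse_score(call_model(JUDGE_PROMPT.format(q=question, r=response)))
```

Returning `None` on an unparseable reply, rather than guessing, lets the caller retry or fall back to human review.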
- Decision Trees: Task-specific metric selection with detailed guidance (Primary Authority)
- Quality Dimensions: Comprehensive metric definitions and measurement strategies
- Starter Evaluation Toolkit: Day 1 implementation guide with code examples
- Tool Comparison Matrix: Platform and vendor selection (Definitive Source)
- Implementation Guides: Technical setup instructions
- Evaluation Selection Wizard: Interactive approach selection
- Quick Assessment Tool: 2-minute evaluation readiness check
- Cost-Benefit Calculator: Budget planning and ROI analysis
- Master Roadmap: Strategic planning with specialized templates
This guide provides quick access to metric selection resources with clear navigation to detailed implementation guidance.