
# Metrics Selection Reference Guide

## Primary Resources

For comprehensive metric selection guidance, use these primary resources:

### 🎯 Decision Trees for Metric Selection (Primary Authority)

Task-specific metric selection paths with budget allocation and implementation priorities:

- Q&A Systems, RAG, Content Generation, Code Generation, and Specialized Tasks
- Budget allocation guidelines by risk level
- Implementation decision frameworks

Interactive guidance for selecting evaluation approaches with specific implementation plans.

Comprehensive framework for understanding and measuring LLM quality dimensions.

## Quick Reference

### Quick Task-to-Metrics Mapping

| Use Case | Primary Authority | Essential Metrics |
| --- | --- | --- |
| Document Q&A (RAG) | RAG Metrics Path | Context Precision, Faithfulness, Answer Relevance |
| Customer Support | Q&A Metrics Path | Accuracy, Relevance, Safety |
| Content Creation | Creative Metrics Path | Creativity, Coherence, Style |
| Code Generation | Code Metrics Path | Execution, Correctness, Security |

💡 For detailed guidance: Each link above provides specific implementation priorities, budget allocation, and success criteria.

## Essential Metrics Starter Kit

### Day 1 Metrics (Implement First)

| Metric | Purpose | Implementation | Cost |
| --- | --- | --- | --- |
| Safety Check | Prevent harmful outputs | HuggingFace toxicity models | ~$0.01/eval |
| Response Time | User experience baseline | Built-in timing | Free |
| Basic Relevance | Task completion check | Semantic similarity | ~$0.02/eval |
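The Day 1 metrics above can be wired up in a few lines. The sketch below is a deliberately simplified stand-in: a real deployment would call a HuggingFace toxicity model for safety and an embedding model for semantic similarity, whereas here a word-overlap proxy and stdlib timing keep the example self-contained. `timed_eval` and `lexical_relevance` are illustrative names, not from any framework.

```python
import time

def lexical_relevance(question: str, answer: str) -> float:
    """Crude relevance proxy: Jaccard overlap of lowercased word sets.
    A production setup would use embedding-based semantic similarity instead."""
    q, a = set(question.lower().split()), set(answer.lower().split())
    if not q or not a:
        return 0.0
    return len(q & a) / len(q | a)

def timed_eval(generate, prompt: str) -> dict:
    """Wrap any generate(prompt) -> str callable with the Day 1 metrics:
    response time plus a basic relevance score."""
    start = time.perf_counter()
    answer = generate(prompt)
    latency = time.perf_counter() - start
    return {
        "answer": answer,
        "response_time_s": round(latency, 3),
        "relevance": round(lexical_relevance(prompt, answer), 3),
    }

# Usage with a stubbed model in place of a real LLM call:
result = timed_eval(lambda p: "Paris is the capital of France.",
                    "What is the capital of France?")
```

Swapping the stub for a real model call leaves the metric wrapper unchanged, which is the point: instrument first, upgrade the scorer later.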

### Week 1 Additions

| Metric | Purpose | Implementation | Cost |
| --- | --- | --- | --- |
| User Feedback | Direct quality signal | Simple rating system | Free |
| Length Check | Catch obvious failures | Word count validation | Free |
| Basic Coherence | Content quality | Simple heuristics | Free |
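The free Week 1 checks really are trivial to implement. A minimal sketch of the length check and two coherence heuristics (the thresholds and the specific heuristics are assumptions to tune for your task, not fixed rules):

```python
def length_check(answer: str, min_words: int = 5, max_words: int = 400) -> bool:
    """Catch obviously broken outputs: empty, truncated, or runaway generations.
    Thresholds are illustrative defaults; tune them per task."""
    n = len(answer.split())
    return min_words <= n <= max_words

def coherence_heuristics(answer: str) -> dict:
    """Cheap structural signals, not true coherence: repeated sentences and
    unfinished endings are common failure modes worth catching early."""
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    return {
        "has_repeats": len(sentences) != len(set(sentences)),
        "ends_cleanly": answer.rstrip().endswith((".", "!", "?")),
    }
```

Heuristics like these catch the cheap, embarrassing failures so that paid metrics (and humans) spend budget on the subtle ones.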

📋 For comprehensive metric catalogs and selection frameworks, see the Quality Dimensions Mapping.

## Quick Selection Framework

### Step 1: Choose Your Starting Point

### Step 2: Avoid Common Mistakes

- **Measuring everything:** start with 3-5 essential metrics
- **Ignoring safety:** always include basic safety checks
- **No user feedback:** include at least a thumbs up/down rating
- **Chasing perfect metrics:** an 80%-accurate metric shipped quickly beats a 99%-accurate one shipped slowly
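The thumbs up/down feedback mentioned above needs almost no infrastructure. A minimal sketch (the `FeedbackLog` class and its method names are illustrative, not from any particular framework):

```python
from collections import Counter

class FeedbackLog:
    """Minimal thumbs up/down tracker: the simplest possible user-feedback metric."""

    def __init__(self):
        self.votes = Counter()

    def record(self, response_id: str, thumbs_up: bool) -> None:
        # response_id would let you join votes back to logged generations.
        self.votes["up" if thumbs_up else "down"] += 1

    def approval_rate(self) -> float:
        total = self.votes["up"] + self.votes["down"]
        return self.votes["up"] / total if total else 0.0

# Usage:
log = FeedbackLog()
log.record("resp-1", thumbs_up=True)
log.record("resp-2", thumbs_up=True)
log.record("resp-3", thumbs_up=False)
```

Even this coarse signal establishes a baseline approval rate you can watch across releases.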

## Implementation Tools Quick Reference

| Tool Category | Recommended Options | Cost Range |
| --- | --- | --- |
| Safety | HuggingFace models, OpenAI Moderation | $0.01-0.05/eval |
| Automated Quality | RAGAS, BERTScore, Semantic similarity | $0.02-0.10/eval |
| LLM-as-Judge | GPT-4, Claude, Local models | $0.01-0.15/eval |
| Human Review | Built-in feedback, Scale AI, Expert panels | $0.50-15.00/eval |
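For the LLM-as-Judge row, the judge call itself depends on your provider, but two pieces are provider-independent: prompt construction and defensive score parsing. A hedged sketch, where the template wording and the `SCORE:` reply format are assumptions rather than a standard; the actual API call to GPT-4, Claude, or a local model is left out:

```python
import re
from typing import Optional

JUDGE_TEMPLATE = """You are an impartial evaluator.
Rate the ASSISTANT ANSWER for relevance to the QUESTION on a 1-5 scale.
Reply with a single line of the form: SCORE: <1-5>

QUESTION: {question}
ASSISTANT ANSWER: {answer}"""

def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the judge template; the resulting string is what you send to the model."""
    return JUDGE_TEMPLATE.format(question=question, answer=answer)

def parse_judge_score(raw_reply: str) -> Optional[int]:
    """Extract the numeric rating; return None when the judge goes off-format,
    so malformed replies are counted rather than silently miscored."""
    match = re.search(r"SCORE:\s*([1-5])", raw_reply)
    return int(match.group(1)) if match else None
```

Returning `None` on off-format replies matters: judge models drift from instructed formats, and treating a parse failure as a score of 0 (or 5) quietly corrupts your metrics.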

## 📖 Related Framework Resources

### Primary Authorities

- **Decision Trees:** task-specific metric selection with detailed guidance (Primary Authority)
- **Quality Dimensions:** comprehensive metric definitions and measurement strategies

### Implementation & Tools

### Planning & Selection

This guide provides quick access to metric selection resources with clear navigation to detailed implementation guidance.