- 3.4 used to be "Ranking evaluation: vector search", but you will now do it as homework
- Videos 3.5 - 3.8 used to be a part of module 4, but now they are a part of the evaluation module (module 4 focuses only on monitoring)
- The data files for retrieval evaluation are in the search_evaluation folder
- The data files for RAG evaluation are in the rag_evaluation folder
- We kept the old data files - the ones generated using this old code
- The new notebook uses minsearch instead of Elasticsearch
- Also, install the sentence-transformers library. We will use it to generate embeddings in some of the videos:
pip install sentence-transformers
Plan for the section:
- Why do we need evaluation
- Evaluation metrics
- Ground truth / gold standard data
- Generating ground truth with LLM
- Evaluating the search results
Note: in the 2025 edition, we use Qdrant for vector search (not Elasticsearch).
For more details, see Module 2.
- Approaches for getting evaluation data
- Using OpenAI to generate evaluation data
Links:
- Elasticsearch with text results
- minsearch
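
The notes do not say which ranking metrics the videos compute, so the following is only a minimal sketch assuming two common ones: hit rate and mean reciprocal rank (MRR). The function names and the `relevance_total` layout (one list of booleans per query, one boolean per returned result) are illustrative assumptions, not taken from the course notebook.

```python
def hit_rate(relevance_total):
    """Fraction of queries whose results contain the relevant document."""
    hits = sum(1 for row in relevance_total if any(row))
    return hits / len(relevance_total)


def mrr(relevance_total):
    """Mean reciprocal rank of the first relevant result (0 if none found)."""
    total = 0.0
    for row in relevance_total:
        for rank, is_relevant in enumerate(row, start=1):
            if is_relevant:
                total += 1 / rank
                break
    return total / len(relevance_total)


# Hypothetical results for three queries, three results each.
example = [
    [True, False, False],   # relevant document at rank 1
    [False, False, False],  # relevant document not retrieved
    [False, True, False],   # relevant document at rank 2
]

print(hit_rate(example))  # 2 of 3 queries have a hit
print(mrr(example))       # (1 + 0 + 1/2) / 3
```

The same two functions work for any search backend, because they only look at whether each returned result matches the expected document.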
Links:
That's homework
- Modules recap
- Online vs offline evaluation
- Offline evaluation metrics
Note: The video talks about Elasticsearch because it was recorded for the 2024 edition. Skip to 03:40.
When following the video, use the new code in the notebook.
Links:
- notebook
- results-gpt4o.csv (answers from GPT-4o)
- results-gpt35.csv (answers from GPT-3.5-Turbo)
Content
- A->Q->A' cosine similarity
- Evaluating gpt-4o
- Evaluating gpt-3.5-turbo
- Evaluating gpt-4o-mini
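
As a rough illustration of the A->Q->A' idea, the sketch below compares an original answer vector with a generated answer vector using cosine similarity. The short toy vectors stand in for real sentence embeddings, and the helper name `cosine_similarity` is an assumption for this example, not code from the notebook.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen here: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


# Toy stand-ins for the embeddings of the original answer (A)
# and the answer generated by the LLM (A').
v_original = [0.2, 0.7, 0.1]
v_generated = [0.25, 0.6, 0.15]

print(cosine_similarity(v_original, v_generated))
```

A score near 1 means the generated answer points in almost the same direction as the original in embedding space. Averaging the score over all records gives one number per model to compare.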
Links:
- notebook
- results-gpt4o-cosine.csv (answers with cosine calculated from GPT-4o)
- results-gpt35-cosine.csv (answers with cosine calculated from GPT-3.5-Turbo)
- results-gpt4o-mini.csv (answers from GPT-4o-mini)
- results-gpt4o-mini-cosine.csv (answers with cosine calculated from GPT-4o-mini)
- LLM as a judge
- A->Q->A' evaluation
- Q->A evaluation
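
One way to set up a Q->A judge is sketched below. The prompt wording, the three relevance labels, and the helper names are all assumptions made for illustration. The call to the LLM is left out, so only the prompt construction and the parsing of the judge's reply are shown.

```python
import json

# Hypothetical label set; the actual labels used in the videos may differ.
ALLOWED_LABELS = {"NON_RELEVANT", "PARTLY_RELEVANT", "RELEVANT"}

JUDGE_PROMPT = """
You are an expert evaluator for a question-answering system.
Classify how relevant the generated answer is to the question.

Question: {question}
Generated answer: {answer_llm}

Reply with a JSON object that has two keys: "relevance"
(one of NON_RELEVANT, PARTLY_RELEVANT, RELEVANT) and "explanation".
""".strip()


def build_judge_prompt(question, answer_llm):
    """Fill the template with one question/answer pair."""
    return JUDGE_PROMPT.format(question=question, answer_llm=answer_llm)


def parse_judgement(raw):
    """Return the parsed verdict, or None if the reply is not usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or data.get("relevance") not in ALLOWED_LABELS:
        return None
    return data


prompt = build_judge_prompt("How do I enroll?", "Use the registration form.")
verdict = parse_judgement('{"relevance": "RELEVANT", "explanation": "On topic."}')
print(verdict["relevance"])
```

Returning `None` for malformed replies makes it easy to count and inspect failures separately instead of letting one bad reply stop the whole evaluation run.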
Links:
- notebook
- evaluations-aqa.csv (A->Q->A' evaluation results)
- evaluations-qa.csv (Q->A evaluation results)
See here
Cohort 2025 | Study notes and FAQ: LLM Evaluation
- Did you take notes? Add them above this line (Send a PR with links to your notes)