
RAG and LLM Evaluation

3.0 Content update explanation video

Video 3.4 used to be "Ranking evaluation: vector search", but you will now do it in the homework
Videos 3.5 - 3.8 used to be part of module 4, but they are now part of the evaluation module (module 4 focuses only on monitoring)
The data files for retrieval evaluation are in the search_evaluation folder
The data files for RAG evaluation are in the rag_evaluation folder
- We kept the old data files - the ones generated using this old code
- In the new notebook, you have minsearch instead of Elasticsearch
Also, install the sentence-transformers library. We will use it for generating embeddings in some of the videos
```
pip install sentence-transformers
```

3.1 Introduction

Plan for the section:

Why do we need evaluation
Evaluation metrics
Ground truth / gold standard data
Generating ground truth with LLM
Evaluating the search results

Note: in the 2025 edition, we use Qdrant for performing vector search (not Elasticsearch).

For more details, see Module 2.

3.2 Getting ground truth data

Approaches for getting evaluation data
Using OpenAI to generate evaluation data
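
As a rough illustration of the LLM-based approach, the sketch below builds a prompt from one FAQ record and turns the model's reply into ground truth rows. The record fields (`id`, `section`, `question`, `text`), the prompt wording, and the helper names are assumptions made for this example, and the LLM reply is hard-coded instead of coming from an actual OpenAI call.

```python
import json

# Hypothetical prompt; the wording used in the course notebook may differ.
PROMPT_TEMPLATE = """
You emulate a student taking a course.
Formulate {n} questions this student might ask based on the FAQ record below.
The record should contain the answer to each question.
Return a JSON list of strings and nothing else.

section: {section}
question: {question}
answer: {text}
""".strip()


def build_prompt(record, n=5):
    """Fill the template with one FAQ record (extra keys such as 'id' are ignored)."""
    return PROMPT_TEMPLATE.format(n=n, **record)


def parse_questions(llm_response, record_id):
    """Turn the JSON list returned by the LLM into (question, document id) rows."""
    questions = json.loads(llm_response)
    return [{"question": q, "document": record_id} for q in questions]


# Hypothetical FAQ record and a hard-coded stand-in for the LLM reply.
record = {
    "id": "doc-1",
    "section": "General",
    "question": "Can I join late?",
    "text": "Yes, you can still submit homework.",
}
prompt = build_prompt(record, n=2)
fake_response = '["Is late enrollment allowed?", "Can I submit homework after joining late?"]'
ground_truth = parse_questions(fake_response, record["id"])
print(ground_truth)
```

Each generated question is paired with the id of the record it came from, so a search result can later be checked against the expected document.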

Links:

3.3 Ranking evaluation: text search

Elasticsearch with text results
minsearch

Links:

Notebook
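
Two metrics commonly used for ranking evaluation are hit rate and mean reciprocal rank (MRR). The sketch below is a minimal illustration, assuming that each query has exactly one expected document and that the search results have already been converted to a list of boolean relevance flags per query. The function names and the example data are made up for this illustration.

```python
def hit_rate(relevance_total):
    """Fraction of queries whose expected document appears anywhere in the results."""
    hits = sum(1 for line in relevance_total if any(line))
    return hits / len(relevance_total)


def mrr(relevance_total):
    """Mean reciprocal rank: average of 1 / (rank of the first relevant result)."""
    total = 0.0
    for line in relevance_total:
        for rank, is_relevant in enumerate(line, start=1):
            if is_relevant:
                total += 1 / rank
                break  # only the first relevant result counts
    return total / len(relevance_total)


# Hypothetical relevance flags for three queries, top-3 results each.
example = [
    [True, False, False],   # expected document at rank 1
    [False, False, True],   # expected document at rank 3
    [False, False, False],  # expected document not retrieved
]

print(hit_rate(example))
print(mrr(example))
```

Hit rate only asks whether the expected document was retrieved at all, while MRR also rewards placing it near the top of the list.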

3.4 Evaluating Vector Search

That's homework

3.5 Offline vs Online (RAG) evaluation

Modules recap
Online vs offline evaluation
Offline evaluation metrics

3.6 Generating data for offline RAG evaluation

Note: In the video, we talk about using Elasticsearch, but that part is from the 2024 edition. Skip to 03:40.

When following the video, use the new code in the notebook.

Links:

notebook
results-gpt4o.csv (answers from GPT-4o)
results-gpt35.csv (answers from GPT-3.5-Turbo)

3.7 Offline RAG evaluation: cosine similarity

Content

A->Q->A' cosine similarity
Evaluating gpt-4o
Evaluating gpt-3.5-turbo
Evaluating gpt-4o-mini
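
The idea behind A->Q->A' is to compare the original answer (A) with the answer the RAG system generates (A') for a question (Q) derived from A. A minimal sketch of the comparison step follows. It assumes both answers have already been turned into embedding vectors, for example with a sentence-transformers model; the short vectors here are made-up placeholders, not real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Placeholder vectors standing in for embeddings of A and A'.
v_original = [0.2, 0.7, 0.1]
v_generated = [0.25, 0.65, 0.05]

score = cosine_similarity(v_original, v_generated)
print(score)
```

A score close to 1 suggests the generated answer is semantically close to the original. Computing the score for every row gives a distribution that can be compared across models.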

Links:

notebook
results-gpt4o-cosine.csv (answers with cosine calculated from GPT-4o)
results-gpt35-cosine.csv (answers with cosine calculated from GPT-3.5-Turbo)
results-gpt4o-mini.csv (answers from GPT-4o-mini)
results-gpt4o-mini-cosine.csv (answers with cosine calculated from GPT-4o-mini)

3.8 Offline RAG evaluation: LLM as a judge

LLM as a judge
A->Q->A' evaluation
Q->A evaluation
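
The sketch below shows one possible shape for the A->Q->A' judge step: format a prompt for each row, send it to an LLM, and tally the labels it returns. The prompt wording, the three labels, the row fields, and the helper names are assumptions for this example, and the judge replies are hard-coded rather than produced by a real model.

```python
import json
from collections import Counter

# Hypothetical judge prompt; doubled braces are literal braces for str.format.
JUDGE_PROMPT = """
You are an expert evaluator for a RAG system.
Compare the generated answer with the original answer and classify its relevance
as "NON_RELEVANT", "PARTLY_RELEVANT", or "RELEVANT".

Original answer: {answer_orig}
Generated question: {question}
Generated answer: {answer_llm}

Respond in JSON: {{"Relevance": "...", "Explanation": "..."}}
""".strip()


def build_judge_prompt(row):
    """Fill the judge prompt with one evaluation row."""
    return JUDGE_PROMPT.format(**row)


def tally(judge_responses):
    """Count how often each relevance label appears in the judge's JSON replies."""
    return Counter(json.loads(r)["Relevance"] for r in judge_responses)


# Hypothetical row and hard-coded stand-ins for the judge's replies.
row = {
    "answer_orig": "Yes, you can still submit homework.",
    "question": "Is late enrollment allowed?",
    "answer_llm": "Yes, late joiners can submit homework.",
}
judge_prompt = build_judge_prompt(row)
fake_responses = [
    '{"Relevance": "RELEVANT", "Explanation": "Same meaning."}',
    '{"Relevance": "PARTLY_RELEVANT", "Explanation": "Misses a detail."}',
    '{"Relevance": "RELEVANT", "Explanation": "Matches the original."}',
]
counts = tally(fake_responses)
print(counts)
```

For Q->A evaluation, the same pattern applies with a prompt that shows only the question and the generated answer, since no original answer is available.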

Links:

notebook
evaluations-aqa.csv (A->Q->A evaluation results)
evaluations-qa.csv (Q->A evaluation results)

Homework

See here

Notes

Cohort 2025 | Study notes and FAQ: LLM Evaluation

Did you take notes? Add them above this line (Send a PR with links to your notes)