+ "The objective of running evaluations is to measure your pipeline's performance and detect any regressions. To track progress, it is essential to establish baseline metrics using off-the-shelf approaches such as BM25 for keyword retrieval or \"sentence-transformers\" models for embeddings. Then, continue evaluating your pipeline with various parameters: adjust the `top_k` value, experiment with different embedding models, tweak the `temperature`, and benchmark the results to identify what works best for your use case. If labeled data is needed for evaluation, you can use datasets that include ground-truth documents and answers. Such datasets are available on [Hugging Face datasets](https://huggingface.co/datasets) or in the [haystack-evaluation](https://github.com/deepset-ai/haystack-evaluation/tree/main/datasets) repository.\n",