- Dataset Generator
- Embedding Model Evaluator
- Approximate Search Evaluator
Dataset Generator (DAGE)
DAGE is a flexible command-line tool for generating relevance datasets for search evaluation. It can retrieve documents from a search engine, generate synthetic queries, and score the relevance of document-query pairs using LLMs.
The Embedding Model Evaluator is a flexible tool for testing a HuggingFace embedding model to ensure that it works as expected with exact vector search.
The Approximate Search Evaluator is a flexible tool that deploys RRE and extracts metrics to test your search engine collection given a template.
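As a rough illustration of what exact vector search means for the embedding model evaluation described above, the sketch below scores a query vector against every document vector and returns the best matches. It is not the tool's implementation: the function names, the toy vectors, and the choice of cosine similarity as the metric are all assumptions made for the example. In a real setup the vectors would come from the HuggingFace model and the search would run in the search engine.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_search(query_vec, doc_vecs, top_k=3):
    """Hypothetical helper: brute-force search that compares the query with every document."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    # Highest similarity first; no index or approximation is involved.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for real embeddings (assumption).
    docs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [1.0, 1.0]}
    for doc_id, score in exact_search([1.0, 0.1], docs, top_k=2):
        print(doc_id, round(score, 3))
```

Because every document is compared with the query, the ranking is exact by construction, which makes it a useful baseline when checking that a model's embeddings behave as expected.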
- uv: A fast Python package installer and resolver. To install uv, follow the instructions here
- Python 3.10: the version is pinned and used throughout the project; see the .python-version file
First, create a virtual environment with uv, based on the pyproject.toml file. To do so, run:
# place yourself in the rre-tools folder
cd rre-tools
# install dependencies (for users)
uv sync
# install development dependencies as well (e.g., mypy and ruff)
uv sync --group dev
Before running the command below, you need a running search engine instance
(Solr/OpenSearch/Elasticsearch/Vespa). You can also use the test collections in the
docker-services folder for this.
For a detailed description of how to fill in your configuration file (e.g., Config), see the Dataset Generator README.
Execute the main script via CLI, pointing to your DAGE configuration file:
uv run dataset-generator --config <path-to-DAGE-config-yaml>
To learn more about all the available CLI parameters, execute:
uv run dataset-generator --help
For a detailed description of how to fill in the configuration file (e.g., Config), see the README.
Execute the main script via CLI, pointing to your configuration file:
uv run embedding-model-evaluator --config <path-to-config-yaml>
Execute the pytest command as follows:
uv run pytest
The Dataset Generator script will then:
- Fetch documents from the specified search engine.
- Generate or load queries.
- Score the relevance for each (document, query) pair.
- Save the output to the destination (specified in the config file).
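The four steps above can be sketched in a few lines of Python. This is only an illustration of the flow, not the Dataset Generator's code: the function names, the record fields (`doc_id`, `query`, `rating`), and the JSON output format are hypothetical, and the stand-in callables replace the search engine and LLM calls that the real tool makes.

```python
import json


def build_relevance_dataset(fetch_documents, generate_queries, score_relevance):
    """Hypothetical pipeline: fetch documents, get queries, rate every (document, query) pair."""
    documents = fetch_documents()            # step 1: fetch from the search engine
    queries = generate_queries(documents)    # step 2: generate or load queries
    records = []
    for doc in documents:
        for query in queries:
            # step 3: score the relevance of each (document, query) pair
            records.append(
                {
                    "doc_id": doc["id"],
                    "query": query,
                    "rating": score_relevance(doc, query),
                }
            )
    return records


def save_dataset(records, path):
    """Step 4: write the records to the destination (JSON is an assumption)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)


if __name__ == "__main__":
    # Toy stand-ins for the search engine and the LLM (assumptions).
    toy_docs = [
        {"id": "1", "text": "solar panels"},
        {"id": "2", "text": "wind turbines"},
    ]
    dataset = build_relevance_dataset(
        fetch_documents=lambda: toy_docs,
        generate_queries=lambda docs: ["solar"],
        score_relevance=lambda doc, query: int(query in doc["text"]),
    )
    save_dataset(dataset, "relevance_dataset.json")
    print(len(dataset), "rated pairs written")
```

Passing the three stages in as callables keeps the sketch independent of any particular search engine or LLM provider; in the actual tool these choices come from the configuration file.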
- ruff.toml: Configures Ruff's linting rules and settings
- mypy.ini: Configures Mypy's type checking settings
To run mypy type checks inside the environment, use:
uv run mypy .
To run the ruff linter inside the environment, use:
uv run ruff check