
🔍 chatnoir-pyterrier

Use the ChatNoir REST API in PyTerrier for retrieval and re-ranking against large corpora such as ClueWeb09, ClueWeb12, ClueWeb22, or MS MARCO.

Powered by the chatnoir-api package.

Installation

Install the package from PyPI:

pip install chatnoir-pyterrier

Usage

You can use the ChatNoirRetrieve module in any PyTerrier pipeline, just as you would use BatchRetrieve.

from chatnoir_pyterrier import ChatNoirRetrieve

chatnoir = ChatNoirRetrieve(index="msmarco-document-v2.1")
chatnoir.search("python library")
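
Since ChatNoirRetrieve is a regular PyTerrier transformer, it composes with the usual pipeline operators. As a minimal sketch (the cutoff value of 10 is arbitrary), a rank cutoff can be appended like this:

from chatnoir_pyterrier import ChatNoirRetrieve

chatnoir = ChatNoirRetrieve(index="msmarco-document-v2.1")
pipeline = chatnoir % 10  # PyTerrier rank-cutoff operator: keep the top 10 results per query
pipeline.search("python library")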

Features

ChatNoir provides an extensive set of extra features, such as the document's full text or its page rank and spam rank (available for some indices). These can easily be included in the result data frame for use in subsequent PyTerrier re-ranking stages like so:

from chatnoir_pyterrier import ChatNoirRetrieve, Feature

chatnoir_msmarco_snippet = ChatNoirRetrieve(index="msmarco-document-v2.1", features=Feature.SNIPPET_TEXT)
chatnoir_msmarco_snippet.search("python library")

chatnoir_cw09_page_spam_rank = ChatNoirRetrieve(index="clueweb09", features=Feature.PAGE_RANK | Feature.SPAM_RANK)
chatnoir_cw09_page_spam_rank.search("python library")
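
As a sketch of how such a feature can feed a subsequent re-ranking stage, the toy pipeline below re-scores results by their snippet length using pt.apply.doc_score. It assumes that Feature.SNIPPET_TEXT exposes the snippet in a "text" column; check the actual column names in the returned data frame for your setup.

import pyterrier as pt
from chatnoir_pyterrier import ChatNoirRetrieve, Feature

if not pt.started():  # only needed on older PyTerrier versions
    pt.init()

chatnoir = ChatNoirRetrieve(index="msmarco-document-v2.1", features=Feature.SNIPPET_TEXT)
# Toy re-ranker: score each retrieved document by the length of its snippet text.
rerank = pt.apply.doc_score(lambda row: float(len(row["text"])))
pipeline = chatnoir >> rerank
pipeline.search("python library")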

Advanced usage

Please check out our sample notebook or open it in Google Colab.

We also provide a hands-on guide for the Touché 2023 shared tasks here.

Experiments

With chatnoir-pyterrier, it is easy to run benchmarks on shared tasks that use large document collections. We demonstrate this by running ChatNoir retrieval on all supported TREC, CLEF, and NTCIR shared tasks available in ir_datasets (a minimal single-experiment sketch is shown at the end of this section).

First install the experiment dependencies:

pip install -e .[experiment]

To run the experiments, first create the runs by running:

ray job submit --runtime-env examples/ray-runtime-env.yml --no-wait -- python examples/experiment.py 

This will create runs for each shared task in parallel and save them to a cache.

After creating the runs, the experiment.ipynb notebook can be used to analyze the results.
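
For orientation, a single such experiment can also be run directly with pt.Experiment. The snippet below is a minimal sketch that assumes the ChatNoir clueweb09 index and the irds:clueweb09/en/trec-web-2012 test collection as an example pairing; adapt the index and dataset identifiers to the task you want to evaluate.

import pyterrier as pt
from chatnoir_pyterrier import ChatNoirRetrieve

if not pt.started():  # only needed on older PyTerrier versions
    pt.init()

# Example pairing of a ChatNoir index with a matching ir_datasets test collection.
dataset = pt.get_dataset("irds:clueweb09/en/trec-web-2012")
chatnoir = ChatNoirRetrieve(index="clueweb09")

pt.Experiment(
    [chatnoir],
    dataset.get_topics("query"),  # use the keyword query field of the topics
    dataset.get_qrels(),
    eval_metrics=["map", "ndcg_cut_10"],
    names=["ChatNoir"],
)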

Indexing

Head over to the ChatNoir ir_datasets indexer to learn more about how new ir_datasets-compatible datasets are indexed into ChatNoir.

Development

To build this package and contribute to its development, you need to install the build, setuptools, and wheel packages:

pip install build setuptools wheel

(On most systems, these packages are pre-installed.)

Development installation

Install package and test dependencies:

pip install -e .[test]

Testing

Configure the API key for testing:

export CHATNOIR_API_KEY="<API_KEY>"

Then, verify your changes against the test suite:

ruff check .                   # Code format and lint
mypy .                         # Static typing
bandit -c pyproject.toml -r .  # Security
pytest .                       # Unit tests

Please also add tests for your newly developed code.

Build wheels

Wheels for this package can be built with:

python -m build

Support

If you hit any problems using this package, please file an issue. We're happy to help!

License

This repository is released under the MIT license.