
🔍 chatnoir-pyterrier

Use the ChatNoir REST API in PyTerrier for retrieval and re-ranking against large corpora such as ClueWeb09, ClueWeb12, ClueWeb22, or MS MARCO.

Powered by the chatnoir-api package.

Installation

Install the package from PyPI:

pip install chatnoir-pyterrier

Usage

You can use the ChatNoirRetrieve module in any PyTerrier pipeline, just as you would use BatchRetrieve.

from chatnoir_pyterrier import ChatNoirRetrieve

chatnoir = ChatNoirRetrieve(index="msmarco-document-v2.1")
chatnoir.search("python library")
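
Since ChatNoirRetrieve is a regular PyTerrier transformer, it composes with the usual pipeline operators. As a minimal sketch (the cutoff value of 10 is arbitrary), a rank cutoff can be appended like this:

from chatnoir_pyterrier import ChatNoirRetrieve

chatnoir = ChatNoirRetrieve(index="msmarco-document-v2.1")
pipeline = chatnoir % 10  # PyTerrier rank-cutoff operator: keep the top 10 results per query
pipeline.search("python library")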

Features

ChatNoir provides an extensive set of extra features, such as the document's full text or its page rank and spam rank (available for some indices). These can easily be included in the result data frame for use in subsequent PyTerrier re-ranking stages like so:

from chatnoir_pyterrier import ChatNoirRetrieve, Feature

chatnoir_msmarco_snippet = ChatNoirRetrieve(index="msmarco-document-v2.1", features=Feature.SNIPPET_TEXT)
chatnoir_msmarco_snippet.search("python library")

chatnoir_cw09_page_spam_rank = ChatNoirRetrieve(index="clueweb09", features=Feature.PAGE_RANK | Feature.SPAM_RANK)
chatnoir_cw09_page_spam_rank.search("python library")
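
As a sketch of how such a feature can feed a subsequent re-ranking stage, the toy pipeline below re-scores results by their snippet length using pt.apply.doc_score. It assumes that Feature.SNIPPET_TEXT exposes the snippet in a "text" column; check the actual column names in the returned data frame for your setup.

import pyterrier as pt
from chatnoir_pyterrier import ChatNoirRetrieve, Feature

if not pt.started():  # only needed on older PyTerrier versions
    pt.init()

chatnoir = ChatNoirRetrieve(index="msmarco-document-v2.1", features=Feature.SNIPPET_TEXT)
# Toy re-ranker: score each retrieved document by the length of its snippet text.
rerank = pt.apply.doc_score(lambda row: float(len(row["text"])))
pipeline = chatnoir >> rerank
pipeline.search("python library")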

Advanced usage

Please check out our sample notebook or open it in Google Colab.

We also provide a hands-on guide for the Touché 2023 shared tasks here.

Experiments

With chatnoir-pyterrier, it is easy to run benchmarks on shared tasks that use large document collections. We demonstrate this by running ChatNoir retrieval on all supported TREC, CLEF, and NTCIR shared tasks available in ir_datasets (a minimal single-experiment sketch is shown at the end of this section).

First install the experiment dependencies:

pip install -e .[experiment]

To run the experiments, first create the runs by running:

ray job submit --runtime-env examples/ray-runtime-env.yml --no-wait -- python examples/experiment.py 

This will create runs for each shared task in parallel and save them to a cache.

After creating the runs, the experiment.ipynb notebook can be used to analyze the results.
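
For orientation, a single such experiment can also be run directly with pt.Experiment. The snippet below is a minimal sketch that assumes the ChatNoir clueweb09 index and the irds:clueweb09/en/trec-web-2012 test collection as an example pairing; adapt the index and dataset identifiers to the task you want to evaluate.

import pyterrier as pt
from chatnoir_pyterrier import ChatNoirRetrieve

if not pt.started():  # only needed on older PyTerrier versions
    pt.init()

# Example pairing of a ChatNoir index with a matching ir_datasets test collection.
dataset = pt.get_dataset("irds:clueweb09/en/trec-web-2012")
chatnoir = ChatNoirRetrieve(index="clueweb09")

pt.Experiment(
    [chatnoir],
    dataset.get_topics("query"),  # use the keyword query field of the topics
    dataset.get_qrels(),
    eval_metrics=["map", "ndcg_cut_10"],
    names=["ChatNoir"],
)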

Indexing

Head over to the ChatNoir ir_datasets indexer to learn more about how new ir_datasets-compatible datasets are indexed into ChatNoir.

Development

To build this package and contribute to its development, you need to install the build, setuptools, and wheel packages:

pip install build setuptools wheel

(On most systems, these packages are pre-installed.)

Development installation

Install package and test dependencies:

pip install -e .[test]

Testing

Configure the API key for testing:

export CHATNOIR_API_KEY="<API_KEY>"

Then, verify your changes against the test suite:

ruff check .                   # Code format and lint
mypy .                         # Static typing
bandit -c pyproject.toml -r .  # Security
pytest .                       # Unit tests

Please also add tests for your newly developed code.

Build wheels

Wheels for this package can be built with:

python -m build

Support

If you hit any problems using this package, please file an issue. We're happy to help!

License

This repository is released under the MIT license.