This repository demonstrates how to index and search TREC datasets with Pyserini. The workflow has three steps:
- Converting HTML documents to JSONL format;
- Indexing the JSONL data with Lucene;
- Performing BM25 searches.
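The conversion step can be sketched as follows. It is a minimal illustration, not this repository's converter. It uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra dependencies. It writes records in the `{"id": ..., "contents": ...}` shape that Pyserini's `JsonCollection` reads. The helper names (`html_to_text`, `html_to_jsonl_record`, `convert_directory`) and the choice of the file stem as the document id are assumptions made for this example.

```python
import json
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse every run of whitespace into a single space.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def html_to_jsonl_record(doc_id: str, html: str) -> str:
    # One JSON object per line with "id" and "contents" fields (JsonCollection layout).
    return json.dumps({"id": doc_id, "contents": html_to_text(html)}, ensure_ascii=False)


def convert_directory(html_dir: str, out_path: str) -> int:
    """Write one JSONL line per *.html file; the file stem is used as the doc id (assumption)."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(html_dir).glob("*.html")):
            html = path.read_text(encoding="utf-8", errors="replace")
            out.write(html_to_jsonl_record(path.stem, html) + "\n")
            count += 1
    return count
```

After conversion, place the resulting `.jsonl` file in its own directory. That directory can then be indexed with Pyserini's Lucene indexer, for example `python -m pyserini.index.lucene --collection JsonCollection --input <jsonl_dir> --index <index_dir> --generator DefaultLuceneDocumentGenerator --threads 4 --storeRaw`. The paths shown are placeholders.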
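The sketch below shows what a BM25 search computes. It is a from-scratch illustration of Okapi BM25 scoring, not Pyserini's code path. It uses a Lucene-style IDF, log(1 + (N - n + 0.5) / (n + 0.5)). The parameters k1 = 0.9 and b = 0.4 mirror Pyserini's default BM25 settings. Scores will not match Lucene's exactly, because Lucene encodes document lengths lossily and tokenizes text with its own analyzer. The function name `bm25_scores` and the pre-tokenized input are assumptions for this example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=0.9, b=0.4):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalization: longer-than-average documents are penalized via b.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Lucene-style IDF, which is always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term-frequency component controlled by k1.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores
```

In the repository itself, searching goes through Pyserini rather than through code like this. You open an index with `LuceneSearcher(index_dir)`, optionally set parameters with `searcher.set_bm25(k1=0.9, b=0.4)`, and call `searcher.search(query, k=10)`. Each returned hit exposes `docid` and `score`.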
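The project uses Python, Pyserini, a Java Development Kit (JDK), and several Python libraries: BeautifulSoup, tqdm, PyTorch, TorchVision, and Transformers. These components must be version-compatible with one another. In particular, Pyserini calls Lucene through the Java Virtual Machine, so the installed JDK must meet the Java version required by the installed Pyserini release.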
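The README does not pin specific versions, so a quick environment report can help when diagnosing compatibility problems. The script below is a hypothetical helper, not part of the repository. It lists the installed versions of the Python dependencies and the major version of the `java` on the `PATH`. You can then compare these against the requirements of your Pyserini release. The PyPI distribution names in `PACKAGES` are assumptions about how the dependencies were installed.

```python
import re
import shutil
import subprocess
from importlib import metadata

# PyPI distribution names (assumption: packages were installed under these names).
PACKAGES = ["pyserini", "beautifulsoup4", "tqdm", "torch", "torchvision", "transformers"]


def parse_java_major(version_output: str):
    """Extract the major Java version from `java -version` output, or return None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    major = int(match.group(1))
    # Java 8 and earlier report "1.8.0_xxx"; the real major version is the second field.
    if major == 1 and match.group(2):
        return int(match.group(2))
    return major


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def java_major_version():
    if shutil.which("java") is None:
        return None
    # `java -version` prints to stderr, so check both streams.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
    print(f"java major version: {java_major_version()}")
```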