Indrajit-hub/Indexing_And_Searching_Using_pyserini


Indexing_And_Searching_Using_pyserini

This repository demonstrates how to index and search TREC datasets with Pyserini.

The process involves three steps:

  1. Converting HTML documents to JSONL format;
  2. Indexing the JSONL data with Lucene;
  3. Performing BM25 searches.
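The three steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names (`html_to_text`, `convert_dir_to_jsonl`, `bm25_search`), the flat directory of `*.html` files, and the use of the file stem as the document ID are all assumptions. To stay runnable without extra dependencies, the sketch extracts text with the standard library's `html.parser` rather than BeautifulSoup. The `id` and `contents` field names follow what Pyserini's `JsonCollection` expects.

```python
import json
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()


def convert_dir_to_jsonl(html_dir, out_path):
    """Step 1: write one {"id", "contents"} JSON object per HTML file."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        for path in sorted(Path(html_dir).glob("*.html")):
            html = path.read_text(encoding="utf-8", errors="ignore")
            # Assumption: the file stem serves as the document ID.
            record = {"id": path.stem, "contents": html_to_text(html)}
            out.write(json.dumps(record) + "\n")
            count += 1
    return count


def bm25_search(index_dir, query, k=10):
    """Step 3: BM25 retrieval. Requires Pyserini and a JDK to be installed."""
    # Imported lazily so that step 1 can run without Pyserini.
    from pyserini.search.lucene import LuceneSearcher

    searcher = LuceneSearcher(index_dir)
    searcher.set_bm25(k1=0.9, b=0.4)  # illustrative parameter values
    return [(hit.docid, hit.score) for hit in searcher.search(query, k=k)]
```

Step 2 runs between these two functions from the command line, for example `python -m pyserini.index.lucene --collection JsonCollection --input <jsonl_dir> --index <index_dir> --generator DefaultLuceneDocumentGenerator`, where `<jsonl_dir>` and `<index_dir>` are placeholders for your own paths. The resulting index directory is what `bm25_search` takes as `index_dir`.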

The project uses Python, Pyserini, a JDK compatible with the installed Pyserini version, and several Python libraries (BeautifulSoup, tqdm, PyTorch, TorchVision, Transformers). These components must be version-compatible with one another for the pipeline to run.
