Information Retrieval System on Cranfield Dataset

Problem Definition

The goal is to build an information retrieval (IR) system that, given a query, retrieves the relevant documents from a collection. We start from a naive vector space model that implements the classical approach to IR. However, it fails to retrieve relevant documents for some queries. We need to find and implement methods that address these failures and make the model more efficient and accurate.

Code Description

This repo contains 6 folders: 5 for the different models trained on the Cranfield dataset, and a 6th for hypothesis testing between the models.

baseline_vsm

This is a naive vector space model which acts as the base model on which we make modifications.
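
As an illustration of what a naive vector space model does, here is a minimal sketch that ranks documents by TF-IDF cosine similarity. It is not the code in this folder: the whitespace tokenizer, the particular TF-IDF weighting, and the function names (tokenize, build_tfidf, cosine, rank) are all assumptions made for the example.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real preprocessing.
    return text.lower().split()

def build_tfidf(docs):
    """Return (idf, doc_vectors) for a list of document strings."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / df_t) for t, df_t in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in tokenized]
    return idf, vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def rank(query, docs):
    """Return document indices ordered from most to least similar."""
    idf, vectors = build_tfidf(docs)
    # Query terms that never occur in the collection are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(scores, key=lambda p: (-p[0], p[1]))]

docs = [
    "boundary layer flow over a flat plate",
    "heat transfer in supersonic flow",
    "wing flutter analysis",
]
ranking = rank("supersonic heat transfer", docs)
```

Because matching is purely lexical, a query whose words do not appear in a relevant document scores zero against it, which is one way such a model can miss relevant documents.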

vsm_with_corrections

This model uses better-tokenized and spell-corrected text.
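
The README does not say how tokenization and spell correction are done, so the following is only a sketch of one possible approach: split on non-alphanumeric characters, then map each out-of-vocabulary token to its closest vocabulary word using the standard library's difflib. The cutoff value and the function names are assumptions for illustration.

```python
import difflib
import re

def tokenize(text):
    # Keep runs of letters/digits, so "boundry-layer" becomes two tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def correct(tokens, vocabulary, cutoff=0.8):
    """Replace unknown tokens with the closest vocabulary word, if any."""
    vocab = sorted(vocabulary)  # sorted for deterministic behaviour
    fixed = []
    for tok in tokens:
        if tok in vocabulary:
            fixed.append(tok)
            continue
        match = difflib.get_close_matches(tok, vocab, n=1, cutoff=cutoff)
        # Leave the token unchanged when nothing is similar enough.
        fixed.append(match[0] if match else tok)
    return fixed

vocabulary = {"boundary", "layer", "supersonic", "flow"}
tokens = tokenize("Supersonic boundry-layer flow")
corrected = correct(tokens, vocabulary)
```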

lsa

This model implements latent semantic indexing (LSI), also known as latent semantic analysis (LSA).
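
A minimal sketch of the idea, assuming NumPy is available: take a truncated SVD of the term-document matrix, project both the documents and the query onto the top k left singular vectors, and compare them by cosine similarity in that reduced space. The function name lsa_scores and the toy matrix are hypothetical and not taken from this folder.

```python
import numpy as np

def lsa_scores(term_doc, query_vec, k):
    """Cosine similarity of a query to every document in a rank-k latent space.

    term_doc:  (n_terms, n_docs) matrix of term weights
    query_vec: (n_terms,) vector of query term weights
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]                 # top-k left singular vectors
    docs_k = term_doc.T @ U_k      # documents projected into latent space
    q_k = query_vec @ U_k          # query projected the same way
    norms = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k)
    norms[norms == 0] = 1.0        # avoid division by zero
    return (docs_k @ q_k) / norms

# Toy data. Rows: car, automobile, engine, flower, petal.
# Columns: d0 = "car engine", d1 = "automobile engine", d2 = "flower petal".
A = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
], dtype=float)
query = np.array([1, 0, 0, 0, 0], dtype=float)  # the single word "car"
scores = lsa_scores(A, query, k=2)
```

In this toy case d1 never contains the word "car", yet it scores as highly as d0 because both documents share "engine" and collapse onto the same latent dimension.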

query_expansion

This model uses expanded queries.
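
The README does not state where the expansion terms come from. As a sketch only, the example below appends related terms from a small hand-written dictionary; the SYNONYMS map and the expand_query function are hypothetical stand-ins for whatever resource the model actually uses.

```python
# Hypothetical synonym map, for illustration only.
SYNONYMS = {
    "plane": ["aircraft", "airplane"],
    "drag": ["resistance"],
}

def expand_query(query, synonyms):
    """Return the query tokens followed by any related terms not already present."""
    tokens = query.lower().split()
    expanded = list(tokens)
    for tok in tokens:
        for syn in synonyms.get(tok, []):
            if syn not in expanded:
                expanded.append(syn)
    return expanded

expanded = expand_query("plane drag", SYNONYMS)
```

The expanded token list can then be fed to the same ranking step as the original query, so that documents using a different word for the same concept can still match.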

word2vec

This model uses Google's pretrained word2vec neural network model.
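
One common way to use word embeddings for retrieval is to average the word vectors of a text and compare texts by cosine similarity. The sketch below assumes that approach; the README does not confirm it. The two-dimensional toy vectors stand in for the pretrained embeddings, and the function names are hypothetical.

```python
import math

def embed(text, vectors):
    """Average the vectors of the known words in text; None if none are known."""
    toks = [t for t in text.lower().split() if t in vectors]
    if not toks:
        return None
    dim = len(next(iter(vectors.values())))
    return [sum(vectors[t][i] for t in toks) / len(toks) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy vectors for illustration; real pretrained vectors have hundreds of dimensions.
vectors = {
    "plane": [1.0, 0.1],
    "aircraft": [0.9, 0.2],
    "flower": [0.0, 1.0],
}
q = embed("plane", vectors)
sim_aircraft = cosine(q, embed("aircraft", vectors))
sim_flower = cosine(q, embed("flower", vectors))
```

Here "plane" ends up closer to "aircraft" than to "flower" even though the strings share no tokens, which is the kind of match a purely lexical model cannot make.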

NOTE: Each model can be run independently of the others by running the main.py file in its folder.

hyp_testing_data

This folder contains randomly sampled nDCG scores for all 5 models. The t_test.py file performs a paired t-test between the models.
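
For readers unfamiliar with the two quantities involved, here is a minimal sketch of nDCG@k and of the paired t statistic, using only the standard library. It is not the contents of t_test.py. As a simplification, the ideal ranking is built from the retrieved list only, and the score lists at the bottom are made-up numbers for illustration, not results from this project.

```python
import math

def ndcg_at_k(relevances, k):
    """nDCG@k for graded relevances listed in ranked order.

    Simplification: the ideal ordering is taken over the retrieved list only.
    """
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def paired_t_statistic(a, b):
    """t statistic for paired samples a and b (same queries, two models).

    Assumes at least two pairs and that the differences are not all identical.
    Degrees of freedom are len(a) - 1.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Made-up per-query scores for two hypothetical models.
model_a = [0.5, 0.6, 0.7, 0.8]
model_b = [0.4, 0.4, 0.5, 0.7]
t = paired_t_statistic(model_a, model_b)
```

The test is paired because both models are scored on the same queries, so it is the per-query differences that are compared against zero rather than the two sets of scores as independent samples.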

Report

A detailed introduction to the problem, along with the methodology, results and conclusions, can be found in the report (report.pdf).

devanshjain7/information-retrieval
