CAPP30122-Project-

CAPP30122 2018 Group Project By WXD

Data downloaded from https://www.kaggle.com/hugodarwood/epirecipes/data

Scripts in our project:

In the cookbook folder:

Vector_space.py, a python file containing data preprocessing and vector space model construction
BM25.py, a python file containing data preprocessing and probabilistic BM-25 model construction
Scrapeimage.py, a python file scraping image from google

Note: data preprocessing has already been added to Vector_space.py and BM25.py

In the evaluation folder:

project.py, a python file that evaluates vector space model and BM25 model based on normalized discounted cumulative gain score.

Files in our project:

In the cookbook folder

for Vector_space.py and BM25.py, the documents are originally generated by our save_func function. We delete the save_func steps and directly load the following documents to shorten running time.

For Vector_space.py, doc_length, documents, index_in_json, inverted_index, word_set are saved
For BM25.py, doc_length_BM25, documents_BM25, index_in_json_BM25, inverted_index_BM25, word_set_BM25 are saved

In the evaluation folder

The following documents are also generated by the save_func function.

doc_length_BM25_removed_stop_words, doc_length_tfdf_removed_stop_words, documents_BM25_removed_stop_words, documents_tfdf_removed_stop_words, index_in_json_BM25_removed_stop_words, index_in_json_tfidf_removed_stop_words, inverted_index_BM25_removed_stop_words, inverted_index_tfidf_removed_stop_words, word_set_BM25_removed_stop_words, word_set_tfidf_removed_stop_words

evaluation.xlsx includes our results for evaluation

To run the project:

Install pip in your terminal (if have not done so)
To run the project using Django user interface: $ pip install Django
Go to “cookbook” folder, run: $ python3 manage.py runserver
Go to 127.0.0.1:8000/search/
If you want to use a different model, go to views.py in cookbook/search/views.py, change "from BM25 import * " to "from Vector_space import * "

To run model evaluation:

Please change the directory into "evaluation" folder.
Run python project.py.
You can see the scoring process shown in the terminal and it will print the score for ten queries we selected for both TFIDF model and BM25 model.
Please open the evaluation.xlsx file to see the histogram for comparing NDCG value for two models.

Responsibilities:

Data Preprocessing: Wenxi Xiao
Build models and algorithms: Lerong Wang, Wenxi Xiao, Yangyang Dai
Django user interface: Lerong Wang, Yangyang Dai
Model Evaluation: Wenxi Xiao

Documentation of Code Ownership:

"Direct copy" ~ Learned from online sources, Generated by installed package (Django or other) and few edits made
"Modified" ~ Generated by installed package (Django or other) and meaningful edits made OR heavily utilized template(s) provided by tutorial sessions (TA- or Django-generated)
"Original" ~ Original code or heavily modified given structure

Name		Name	Last commit message	Last commit date
Latest commit History 55 Commits
cookbook		cookbook
data preprocessing		data preprocessing
evaluation		evaluation
README.md		README.md
project presentation.pptx		project presentation.pptx
proposal presentation.pptx		proposal presentation.pptx

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CAPP30122-Project-

Scripts in our project:

Files in our project:

In the cookbook folder

In the evaluation folder

To run the project:

To run model evaluation:

Responsibilities:

Documentation of Code Ownership:

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

CAPP30122-Project-

Scripts in our project:

Files in our project:

In the cookbook folder

In the evaluation folder

To run the project:

To run model evaluation:

Responsibilities:

Documentation of Code Ownership:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages