Contribution guidelines:

General guidelines:

Abstract as much as you can, and design the code so that as little as possible is hardcoded.
Pass IPs and any device-specific parameters as command-line arguments (sys.argv) when running the code.
Express all communication requests and responses as RPC calls. We use gRPC for this.
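
As an illustration of the second guideline, here is a minimal sketch of reading device-specific settings from the command line instead of hardcoding them. The function name and the flag names (--ip, --port, --master) are hypothetical and are not taken from the scripts in this repository.

```python
import argparse


def parse_args(argv=None):
    """Parse device-specific settings from the command line.

    The flag names here are hypothetical examples; they are not the
    actual arguments of master.py, masterbackup.py or replica.py.
    """
    parser = argparse.ArgumentParser(description="Example server launcher")
    parser.add_argument("--ip", required=True,
                        help="address this server binds to")
    parser.add_argument("--port", type=int, required=True,
                        help="port the gRPC server listens on")
    parser.add_argument("--master", default=None,
                        help="host:port of the master, if this server is not the master")
    # Passing argv=None makes argparse read sys.argv[1:].
    return parser.parse_args(argv)
```

A script built this way would be started with something like `python server.py --ip 10.0.0.5 --port 50051`, so no address needs to appear in the source.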

To run the code

Basic setup:

Clone the repository

git clone https://github.com/zorroblue/distributed-search-engine
Create a virtual environment named venv, if you have not done so already

cd distributed-search-engine
virtualenv venv
Install MongoDB 3.2.12
Install Robomongo if you want to visualize the data changes in a GUI client (optional)
Set up the database on the master and on the master's backup server

On the master:

mongoimport --jsonArray -d masterdb -c indices data/indices.json

On the backup of the master:

mongoimport --jsonArray -d backupdb -c indices data/indices.json
Set up the environment

. environment.sh
Set up the required libraries

pip install -r requirements.txt
List the accessible replica servers in replicas_list.txt. The setup described above must also be completed on each replica listed there.
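
A minimal sketch of how a script might read replicas_list.txt is shown here. The file format is an assumption (one address per line, with blank lines and lines starting with '#' skipped), and load_replicas is a hypothetical helper, not a function from this repository.

```python
def load_replicas(path="replicas_list.txt"):
    """Return the replica addresses listed in `path`.

    Assumed format (not confirmed by the repository): one address per
    line; blank lines and lines starting with '#' are ignored.
    """
    replicas = []
    with open(path) as handle:
        for line in handle:
            entry = line.strip()
            if entry and not entry.startswith("#"):
                replicas.append(entry)
    return replicas
```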

Running the code

To build the protobufs

python -m grpc_tools.protoc -I./protos --python_out=. --grpc_python_out=. protos/search.proto
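
The same build can also be driven from Python, which may be convenient in a setup script. This is a sketch under the assumption that grpcio-tools is installed; protoc_args and build_protos are hypothetical helper names.

```python
def protoc_args(proto="protos/search.proto", include="./protos", out="."):
    """Build the argument list equivalent to the command above."""
    return [
        "grpc_tools.protoc",         # stands in for the program name (argv[0])
        "-I" + include,              # directory searched for .proto imports
        "--python_out=" + out,       # where the message classes are written
        "--grpc_python_out=" + out,  # where the service stubs are written
        proto,
    ]


def build_protos():
    """Run the protobuf compiler bundled with grpcio-tools."""
    # Imported lazily so this module loads even without grpcio-tools installed.
    from grpc_tools import protoc
    exit_code = protoc.main(protoc_args())
    if exit_code != 0:
        raise RuntimeError("protoc failed with exit code %s" % exit_code)
```

With the default arguments, the compiler writes search_pb2.py and search_pb2_grpc.py into the current directory.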

Running the servers

Run master.py, masterbackup.py and replica.py with the appropriate command-line arguments. To run the crawler, use crawler.py. For demo purposes, during writes we append the input seed word to the URLs in the URL lists of 5 search terms.
