For this project, we will analyze millions of NYC Parking violations since January 2016.
Part 1: Python Scripting
In the first part, we want to develop a Python command-line interface that can connect to the OPCV API and demonstrate that the data is accessible via Python.
A) File Structure
B) Associated Files
1. Python Scripting Files:
main.py
It parses the arguments --page_size, --num_pages, and --output and passes them to the functions in api.py (see the sketch below).
The code can be found in this repository under nyc_parking_violations
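A minimal sketch of what main.py could look like is shown here; the import path and the function name get_results are assumptions for illustration, not necessarily the repository's actual names.

# main.py - hypothetical sketch of the command-line entry point
import argparse

from src.bigdata1.api import get_results  # assumed import path and function name


def main():
    parser = argparse.ArgumentParser(description="Fetch NYC parking violations from the API")
    parser.add_argument("--page_size", type=int, required=True,
                        help="number of records fetched per API call")
    parser.add_argument("--num_pages", type=int, required=True,
                        help="number of pages to fetch")
    parser.add_argument("--output", default=None,
                        help="file the JSON results are written to")
    args = parser.parse_args()
    get_results(args.page_size, args.num_pages, args.output)


if __name__ == "__main__":
    main()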
api.py
It has all the functions and error-handling code needed to implement the exercise.
The APP_TOKEN, domain, etc. are also defined here, alongside the necessary packages (a rough sketch follows this item).
The code can be found in this repository under nyc_parking_violations > src > bigdata 1
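A rough sketch of the paginated fetch in api.py, under a few assumptions: the Socrata domain and dataset id below are placeholders, the app token comes from the APP_TOKEN environment variable passed in through docker run, and the function name mirrors the sketch above.

# api.py - hypothetical sketch of the API call and error handling
import json
import os

import requests

APP_TOKEN = os.environ["APP_TOKEN"]        # injected via: docker run -e APP_TOKEN=<api_token>
DOMAIN = "https://data.cityofnewyork.us"   # assumed NYC Open Data domain
DATASET_ID = "<dataset_id>"                # placeholder for the OPCV dataset identifier


def get_results(page_size, num_pages, output=None):
    """Fetch num_pages pages of page_size records each and write them to the output file."""
    for page in range(num_pages):
        params = {
            "$limit": page_size,
            "$offset": page * page_size,
            "$$app_token": APP_TOKEN,
        }
        response = requests.get(f"{DOMAIN}/resource/{DATASET_ID}.json", params=params)
        response.raise_for_status()        # fail loudly on HTTP errors
        records = response.json()
        if output:
            with open(output, "a") as fh:
                for record in records:
                    fh.write(json.dumps(record) + "\n")
        else:
            print(json.dumps(records, indent=2))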
2. Supplementary Files:
Dockerfile
It is a text document that contains all commands a user could call on the command line to assemble an image.
It is located in the root directory of our project.
requirements.txt
This file specifies the Python packages required to run the project.
It is located in the root directory of our project.
3. Output:
NYC_PV_Sample.csv
It is a simple .csv file holding the first 1,000 records, used to see how the data comes back from the API.
This file is populated by the method invoked through the results_filter command; a hypothetical sketch follows.
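One way such a sample could be produced is sketched below; write_sample_csv is a hypothetical helper name, and the repository's actual results_filter logic may differ.

# Hypothetical helper that writes the first 1,000 API records to a CSV sample
import csv


def write_sample_csv(records, path="NYC_PV_Sample.csv", limit=1000):
    """Write up to `limit` records (a list of dicts returned by the API) to a CSV file."""
    sample = records[:limit]
    if not sample:
        return
    fieldnames = sorted(sample[0].keys())
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(sample)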
results.json
This is our main output file, containing the JSON records retrieved from the API.
It stores as many rows as we request through the --num_pages and --page_size arguments.
It is located in the root directory of our project.
C) Commands
1. Docker Build:
Docker build
docker build -t bigdata1:2.0 .
Docker run with /bin/bash
docker run -v "$(pwd):/app" -e APP_TOKEN=<api_token> -it bigdata1:2.0 /bin/bash
Docker run with a volume mount and script arguments
docker run -v "$(pwd):/app" -e APP_TOKEN=<api_token> -it bigdata1:2.0 python -m main --page_size=3 --num_pages=2 --output=results.json
2. Deploying via Docker Hub:
Connect to Docker Hub
docker login --username=tanaydocker
Password: <put your docker login password>
Part 2: Docker Compose with Elasticsearch
In this second part, we want to leverage docker-compose to bring up a service that encapsulates our bigdata1 container and an Elasticsearch container and ensures that they are able to interact.
A) File Structure
B) Associated Files
1. Python Scripting Files:
main.py
It parses the arguments --page_size, --num_pages, --output, and --elastic_search and passes them to the functions in api.py.
The code can be found in this repository under nyc_parking_violations
api.py
It has all the functions and error-handling code needed to implement the exercise.
The APP_TOKEN, domain, etc. are also defined here, alongside the necessary packages.
The code can be found in this repository under nyc_parking_violations > src > bigdata 1
elasticsearch.py
It has all the functions to create the necessary instance of the Elasticsearch module.
The data-formatting tasks are also accomplished here (a rough sketch follows this item).
The code can be found in this repository under nyc_parking_violations > src > bigdata 1
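A minimal sketch of what elasticsearch.py might contain, assuming the official elasticsearch Python client (a pre-8.x version) and the compose service name elasticsearch; the index name and the field conversions are illustrative.

# elasticsearch.py - hypothetical sketch of the Elasticsearch helpers
from datetime import datetime

from elasticsearch import Elasticsearch


def create_and_update_index(index_name="parking_violations"):
    """Connect to the compose service and make sure the target index exists."""
    es = Elasticsearch(hosts=["http://elasticsearch:9200"])  # service name from docker-compose.yml
    if not es.indices.exists(index=index_name):
        es.indices.create(index=index_name)
    return es


def format_record(record):
    """Cast selected string fields from the API into numeric/date types before indexing."""
    formatted = dict(record)
    if "amount_due" in formatted:
        formatted["amount_due"] = float(formatted["amount_due"])
    if "issue_date" in formatted:
        formatted["issue_date"] = datetime.strptime(formatted["issue_date"], "%m/%d/%Y")
    return formatted


def push_record(es, record, index_name="parking_violations"):
    """Index a single formatted record into Elasticsearch."""
    es.index(index=index_name, body=format_record(record))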
2. Supplementary Files:
Dockerfile
Same content as in Part 1.
requirements.txt
The only update from Part 1 is an additional entry to install the elasticsearch package.
docker-compose.yml
Docker-compose is a tool for defining and running multi-container Docker applications. With Compose, we use a YAML/YML file to configure our application's services.
NOTE: Remember, docker-compose.yml files are used for defining and running multi-container Docker applications, whereas Dockerfiles are simple text files that contain the commands to assemble an image that will be used to deploy containers.
3. Output:
results.json
This is our main output file, containing the JSON records retrieved from the API.
It stores as many rows as we request through the --num_pages and --page_size arguments.
It is located in the root directory of our project.
This is a sample of the 1 million records we will push into Elasticsearch later.
C) Commands
1. Setting up docker:
Clean all the previous images
docker system prune -a
Allocate enough memory to Docker for uploading a large volume of data into Kibana
Build the pyth, Elasticsearch, and Kibana services
docker-compose build pyth
Launch the above services in detached mode
docker-compose up -d
Check whether the services are up and running
docker ps -a
Check the logs to see whether Elasticsearch is ready via localhost/Docker's IP address
docker-compose logs elasticsearch
NOTE: It takes the system a good 2-4 minutes to get the services running. We can keep monitoring the log for errors; if all goes well, the logs will show the services being initiated and then report a running status.
Check the logs to see whether Kibana is ready via localhost/Docker's IP address
docker-compose logs kibana
NOTE: It takes the system a good 2-4 minutes to get the services running. We can keep monitoring the log for errors; if all goes well, the logs will show that Kibana is up and connected to the Elasticsearch instance.
To kill the services
docker-compose down
3. Verify the build and successful logins into Elasticsearch:
Verifying the build status
docker-compose run pyth bash
Verify Elasticsearch via curl
curl <docker's ip>:9200
Verify Elasticsearch via browser
Go to: http://<docker's ip>:9200/
Result from the Elasticsearch instance on a successful ping to the server
Part 3: Kibana
Kibana is an open-source data visualization dashboard for Elasticsearch. It provides visualization capabilities on top of the content indexed on an Elasticsearch cluster. Users can create bar, line, and scatter plots, or pie charts and maps, on top of large volumes of data. Finally, all the visualizations can be put together into a dashboard.
A) Commands
1. Verify the build and successful logins into Kibana:
Verify Kibana via curl
curl <docker's ip>:5601
Verify Kibana via browser
Go to: http://<docker's ip>:5601/
Result from the Kibana instance on a successful ping to the server
The user interface of the Kibana module. See "kibana homepage.png" inside the part 3 folder.
2. Push 1 million records into Kibana via Elasticsearch