Pandas demo on NYC 311 datasets

This demonstrates:

  • Creating an AWS instance with Terraform
  • Using a Jupyter notebook to run a simple analytics pipeline on the NYC 311 dataset

Step 1: Provision an AWS instance with Terraform (done on your local machine)

  • Create an access key in the AWS Console:
Users -> select your username -> Security credentials -> Create Access Key -> Record the access key ID and secret access key
  • Add yourself to the Admin group:
Users -> select your username -> Add permissions -> Add to AdministratorAccess group
  • Edit files to add required keys and secrets:
~/.aws/credentials
[default]
aws_access_key_id = YOUR_KEY
aws_secret_access_key = YOUR_SECRET

~/.aws/config
[default]
region=us-east-1
  • Clone the repository:
git clone https://github.com/dimon777/pandas_nyc_311
  • Install Terraform

  • Create AWS instance and security group with Terraform

cd terraform
terraform init
terraform apply
  • In the AWS Console, add the instance to the new security group (311_access) and create a key pair:
-> Instance -> Actions -> Networking -> Change Security Groups -> 311_access
-> Create Key Pair -> Name: aws_key.pem -> Note the location of the pem file: `aws_key.pem`
  • Verify SSH connection to your instance
ssh -i "aws_key.pem" ubuntu@<instance public IP>
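
Before running terraform apply, it can help to confirm that the credentials file has the layout shown above. The following is a minimal sketch, not part of this repository: the helper name missing_credential_keys is hypothetical, and it only checks that the [default] profile contains both keys, not that the keys are valid with AWS.

```python
import configparser

# Keys the [default] profile is expected to contain, as shown in the steps above.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")

# Same layout as the ~/.aws/credentials example above (placeholder values).
SAMPLE = """[default]
aws_access_key_id = YOUR_KEY
aws_secret_access_key = YOUR_SECRET
"""


def missing_credential_keys(text, profile="default"):
    """Return the required keys that are absent or empty in `profile`.

    Hypothetical helper for illustration; `text` is the content of a
    credentials file in INI format.
    """
    # Interpolation is disabled so a '%' inside a secret is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [
        key for key in REQUIRED_KEYS
        if not parser.get(profile, key, fallback="").strip()
    ]


if __name__ == "__main__":
    print(missing_credential_keys(SAMPLE))  # prints []
```

To check a real file, pass its contents instead of SAMPLE, for example the text read from ~/.aws/credentials with pathlib. An empty list means both keys are present.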

Step 2: Run a simple data pipeline (done on the AWS instance)

git clone https://github.com/dimon777/pandas_nyc_311

Requirements:

  • Install the required OS and Python packages
sudo apt install python3-pip
pip3 install -r requirements.txt --user
  • Start Jupyter, navigate to the notebook URL, and execute the notebook
jupyter notebook --ip=0.0.0.0
