
Accessing INRIX Data for AWS x INRIX Hack 2025

This repo demonstrates how to fetch data from an S3 bucket for training AI/ML models.

1) Setting up your Python environment

First, we need to create a virtual environment for Python, which installs packages for this project only instead of globally.

1a) Installing the required packages

  • Create a virtual environment by running the following in a terminal from your project directory
    • Windows: py -m venv venv
    • Mac/Linux: python3 -m venv venv
  • Then activate the environment by typing the following
    • Windows: venv\Scripts\Activate.ps1
    • Mac/Linux: source ./venv/bin/activate
  • Your terminal prompt should now start with (venv)
    • I would recommend setting your VSCode interpreter to the venv path in the bottom right when a Python file is open
  • I have made a requirements.txt that lists all of the packages you will need, so run the following to install them
    • pip install -r requirements.txt
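Once the environment is activated, a quick way to confirm your interpreter is actually the venv's (and not the global one) is:

```python
import sys

# Inside an active venv, sys.prefix points at the venv directory,
# while sys.base_prefix still points at the global installation.
# If they differ, a venv is active.
print("venv active:", sys.prefix != sys.base_prefix)
```

If this prints `venv active: False`, re-run the activation command for your OS before installing packages.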

1b) Putting the AWS credentials in the right place

To avoid pushing your keys to GitHub, we need to put them in a .env file

  • Open the .env file and replace YOUR_ACCESS_KEY and YOUR_SECRET_KEY with the ones you copied earlier
  • Go to the .gitignore file and uncomment (remove the #) from line 4 that says .env
    • A .gitignore file tells git not to push certain files and to keep them local to your computer
    • Remember, your AWS credentials are EXTREMELY CONFIDENTIAL, which is why we do this
  • Now you are ready to call AWS services!
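Under the hood, loading a .env file just means parsing KEY=VALUE lines into environment variables; the repo most likely does this with a package such as python-dotenv via requirements.txt (an assumption on my part). A minimal stdlib-only sketch of the idea, with a hypothetical helper name `load_env`:

```python
import os

def load_env(path=".env"):
    """Minimal stand-in for python-dotenv: read KEY=VALUE lines into os.environ.

    Hypothetical helper for illustration, not the repo's actual loader.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and anything that isn't KEY=VALUE
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: don't overwrite variables already set in the shell
            os.environ.setdefault(key.strip(), value.strip())
```

Once the keys are in the environment (under whatever variable names the repo's .env uses), boto3 can pick them up without you passing them around in code.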

2) Getting sample data to train an AI/ML Model

The INRIX data is in an S3 bucket hosted on AWS via an ACM AWS account, so it can easily be used as a dataset for SageMaker or Bedrock, all through AWS!

  • If you need the files locally, however, s3access.py and s3download.py will be useful to you
  • The bucket name should've been shared with you (WHICH IS SENSITIVE INFO)
    • Ask in the #help channel if you don't have it
  • Simply run the files that you need
    • s3access.py to list the files
    • s3download.py to download a specific file
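If you'd rather call S3 from your own training script, the two helper files boil down to something like the sketch below. The function names are illustrative, not the actual contents of s3access.py or s3download.py, and boto3 reads your credentials from the environment set up in step 1b:

```python
def list_bucket_files(bucket):
    """Return object keys in the bucket, roughly what s3access.py shows.

    Note: list_objects_v2 returns at most 1000 keys per call.
    """
    import boto3  # lazy import so the sketch loads even without boto3 installed
    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket=bucket)
    return [obj["Key"] for obj in response.get("Contents", [])]

def download_bucket_file(bucket, key, destination):
    """Save one object to a local path, roughly what s3download.py does."""
    import boto3
    s3 = boto3.client("s3")
    s3.download_file(bucket, key, destination)
```

You would call these with the bucket name from the #help channel, e.g. `download_bucket_file(bucket, some_key, "image.jpg")`.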

This dataset contains images of traffic captured every 5 minutes over 24 hours from 24 cameras in Seattle. This is great for training an AI model to detect many different aspects of traffic!
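For a sense of scale, the snapshot schedule above works out to:

```python
cameras = 24
hours = 24
snapshots_per_hour = 60 // 5  # one image every 5 minutes

total_images = cameras * hours * snapshots_per_hour
print(total_images)  # 6912 images in the full dataset
```

That's roughly 7,000 labeled-by-time frames, a reasonable starting size for fine-tuning a detection model.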