



Google Partners Capabilities Assessment - Demo 1

This repository showcases an end-to-end TensorFlow pipeline for training an ML model that predicts taxi trip duration. The Chicago Taxi Trips dataset (a BigQuery public dataset, also available on Kaggle) is used for training and evaluation. The pipeline runs on either Vertex AI or Kubeflow (deployed on GKE), with data pre-processing performed using Dataflow, BigQuery, or Dataproc.

Table of contents

  1. Getting started
  2. Executing the pipeline
  3. Evaluation and model analysis
  4. Making predictions
  5. Contributing

Getting started

To get your environment set up and start using this model, please follow the step-by-step instructions provided below.

Step 1: Clone the Repository

First, you'll need to clone this repository to your local machine or development environment. Open your terminal, navigate to the directory where you want to clone the repository, and run the following command:

git clone <repository-url>

Replace <repository-url> with the actual URL of this repository. Once cloned, navigate into the repository's directory with cd <repository-name>.

Step 2: Create a .env File

Within the root directory of the cloned repository, create a .env file to store your project configurations. This file should include the following environment variables tailored to your project:

PROJECT_ID=<project_id>
REGION=<region> # example: us-central1

Make sure to replace the placeholder values with your specific project details.
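
For reference, assuming the project reads this file with python-dotenv (an assumption; the pipeline may load its configuration differently), the values can be accessed in Python like so:

# Minimal sketch: load the .env values in Python. Assumes the python-dotenv
# package is installed; the repository may load configuration differently.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

PROJECT_ID = os.environ["PROJECT_ID"]
REGION = os.environ["REGION"]
print(f"Using project {PROJECT_ID} in region {REGION}")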

Step 3: Install the Google Cloud CLI

To interact with Google Cloud resources, install the Google Cloud Command Line Interface (CLI) on your system by following the official installation instructions at https://cloud.google.com/sdk/docs/install.

Step 4: Set Up Application Default Credentials

To authenticate to Google Cloud services from your development environment, configure Application Default Credentials (ADC) by following the guide at https://cloud.google.com/docs/authentication/provide-credentials-adc (typically by running gcloud auth application-default login).
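
As a quick sanity check (not part of the repository), you can confirm that ADC resolves from Python using the google-auth library:

# Minimal sketch: check that Application Default Credentials resolve.
# google-auth ships with the Google Cloud client libraries.
import google.auth

credentials, project = google.auth.default()
print(f"ADC resolved; default project: {project}")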

Step 5: Set up a credentials file to use BigQuery

Follow the "Create a service account key" section of the guide at https://cloud.google.com/iam/docs/keys-create-delete to create a JSON file that stores the service account credentials.
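
As an illustration (the file name and query below are placeholders, not part of this repository), the key file can be used with the google-cloud-bigquery client to read the public Chicago taxi trips table:

# Minimal sketch: query the public Chicago taxi trips table using a
# service account key. "service-account.json" is a placeholder path.
from google.cloud import bigquery

client = bigquery.Client.from_service_account_json("service-account.json")

query = """
    SELECT trip_start_timestamp, trip_miles, trip_seconds
    FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
    LIMIT 5
"""
for row in client.query(query).result():
    print(dict(row))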

Step 6: Install dependencies

  1. Install Python 3.9.5 using pyenv install 3.9.5
  2. Install the Python dependencies with poetry install

Executing the pipeline

Executing the pipeline takes care of ingesting and transforming the data, tuning hyperparameters, training the model, evaluating it and registering it in the model registry, creating a Vertex AI endpoint, and deploying the model to make it available for predictions. To run the pipeline on Vertex AI, use the following command:

python chicago_taxis/kubeflow_v2_runner.py
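
For context, kubeflow_v2_runner.py presumably compiles the pipeline into a JSON spec and submits it to Vertex AI Pipelines. If you ever need to submit an already-compiled spec manually, a sketch with the google-cloud-aiplatform SDK looks like this; the spec file name, display name, and bucket are placeholders, not taken from the repository:

# Minimal sketch: submit an already-compiled pipeline spec to Vertex AI
# Pipelines. "pipeline.json", the display name, and the GCS bucket are
# hypothetical placeholders; the repository's runner may handle this itself.
import os

from google.cloud import aiplatform

aiplatform.init(project=os.environ["PROJECT_ID"], location=os.environ["REGION"])

job = aiplatform.PipelineJob(
    display_name="chicago-taxi-trip-duration",
    template_path="pipeline.json",
    pipeline_root="gs://<your-bucket>/pipeline_root",
)
job.run(sync=True)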

Evaluation and model analysis

When the pipeline runs, the MAPE on the test set is logged to a file in GCP. If this MAPE beats the current best (the blessed model's), the trained model is registered in GCP along with its MAPE.
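
For reference, MAPE (mean absolute percentage error) is the mean of |y_true - y_pred| / |y_true|, expressed as a percentage. A minimal NumPy version (illustrative only, not the pipeline's own implementation) is:

# Minimal sketch of MAPE: mean(|y_true - y_pred| / |y_true|) * 100.
# Illustrative only; the pipeline computes the metric internally.
import numpy as np

def mape(y_true, y_pred) -> float:
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

print(mape([600, 1200, 900], [630, 1100, 950]))  # ~6.3 (% error on toy trip durations)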

Making predictions

Predictions are made with the predict.py script by running:

python chicago_taxis/predict.py \
 --endpoint_id <endpoint_id> \
 --input_file chicago_taxis/data/prediction_sample.json
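
For context, online prediction against a Vertex AI endpoint generally looks like the sketch below using the google-cloud-aiplatform SDK; predict.py itself may be implemented differently, and the sketch assumes prediction_sample.json holds a JSON list of instances:

# Minimal sketch: send an online prediction request to a Vertex AI endpoint.
# Assumes prediction_sample.json contains a JSON list of instances; the
# repository's predict.py may differ.
import json
import os

from google.cloud import aiplatform

aiplatform.init(project=os.environ["PROJECT_ID"], location=os.environ["REGION"])

endpoint = aiplatform.Endpoint("<endpoint_id>")  # numeric ID or full resource name

with open("chicago_taxis/data/prediction_sample.json") as f:
    instances = json.load(f)

response = endpoint.predict(instances=instances)
print(response.predictions)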

Contributing

Our project embraces a streamlined workflow that ensures high-quality software development and efficient collaboration among team members. To maintain this standard, we follow a specific branching strategy and commit convention outlined in our CONTRIBUTING.md file.

We highly encourage all contributors to familiarize themselves with these guidelines. Adhering to the outlined practices helps us keep our codebase organized, facilitates easier code reviews, and accelerates the development process. For detailed information on our branching strategy and how we commit changes, please refer to the CONTRIBUTING.md file.
