Develop Tooling for Dev VM's #5196

@zschira

Description

This issue will track the development of tooling to help us run dev VMs, so that we can avoid local resource constraints when running large workloads.

Motivation

Despite our work improving both the memory usage and compute time of the ETL, many of us still struggle to run the full ETL on our personal computers. This creates friction when testing changes that impact the entire ETL, or when trying to update row counts. This friction eats into developer time, which is far more expensive than compute time, so we feel it's worthwhile to pay for VMs in these circumstances.

By developing standard tooling for managing the lifecycle of a dev VM and connecting to it, we can make the experience seamless for developers and make our development and testing processes more reproducible.

Scope

  • Add a GCP compute instance template to Terraform
  • Add a new GHA workflow to build a Docker image on every commit using the existing Dockerfile
  • Create a dev CLI command to create / launch a VM, then run Dagster using the Docker image associated with a specific commit
  • Create a dev CLI command to stop / destroy the VM
  • Add a command to the dev CLI to connect to the VM, using SSH tunneling to forward the Dagster port to localhost
  • Add another command (or an option for the above command) to interface with the data outputs on the VM. Interface options could include:
    • Shell
    • Jupyter
    • DuckDB (I've been using the DuckDB UI locally, and it could be a really good fit for this)
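
To make the scope above a bit more concrete, here is a rough sketch of how the dev CLI might assemble the underlying `gcloud` commands. Nothing here is decided: the function names, the template name `pudl-dev-vm`, the image repository `example.invalid/pudl-dev`, and the zone are all placeholders. It also assumes that images are tagged with the commit SHA, that the image's default command starts Dagster, and that Dagster listens on its default port 3000. The functions only build argument lists and do not run anything.

```python
"""Sketch of command builders for a hypothetical dev-VM CLI (not an agreed design)."""
from __future__ import annotations

import shlex

# Placeholder defaults; none of these values come from the PUDL repository.
DEFAULT_ZONE = "us-central1-a"
DEFAULT_TEMPLATE = "pudl-dev-vm"  # hypothetical instance template name
DEFAULT_IMAGE_REPO = "example.invalid/pudl-dev"  # hypothetical image repository
DAGSTER_PORT = 3000  # Dagster's default webserver port


def image_for_commit(commit_sha: str, repo: str = DEFAULT_IMAGE_REPO) -> str:
    """Return the image reference for a commit, assuming images are tagged by SHA."""
    return f"{repo}:{commit_sha}"


def create_vm_cmd(
    name: str, template: str = DEFAULT_TEMPLATE, zone: str = DEFAULT_ZONE
) -> list[str]:
    """Build the command that creates a VM from an instance template."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--source-instance-template={template}",
        f"--zone={zone}",
    ]


def run_dagster_cmd(name: str, commit_sha: str, zone: str = DEFAULT_ZONE) -> list[str]:
    """Build the command that starts the commit's container on the VM.

    Assumes the image's default command launches Dagster on DAGSTER_PORT.
    """
    remote = shlex.join([
        "docker", "run", "--rm", "-d",
        "-p", f"{DAGSTER_PORT}:{DAGSTER_PORT}",
        image_for_commit(commit_sha),
    ])
    return ["gcloud", "compute", "ssh", name, f"--zone={zone}", f"--command={remote}"]


def tunnel_cmd(name: str, port: int = DAGSTER_PORT, zone: str = DEFAULT_ZONE) -> list[str]:
    """Build the command that forwards the Dagster port to localhost over SSH."""
    # Everything after "--" is passed through to ssh: -N opens no remote shell,
    # -L forwards local `port` to the same port on the VM.
    return [
        "gcloud", "compute", "ssh", name, f"--zone={zone}",
        "--", "-N", "-L", f"{port}:localhost:{port}",
    ]


def stop_vm_cmd(name: str, zone: str = DEFAULT_ZONE) -> list[str]:
    """Build the command that stops the VM without deleting it."""
    return ["gcloud", "compute", "instances", "stop", name, f"--zone={zone}"]


def destroy_vm_cmd(name: str, zone: str = DEFAULT_ZONE) -> list[str]:
    """Build the command that deletes the VM without an interactive prompt."""
    return ["gcloud", "compute", "instances", "delete", name, f"--zone={zone}", "--quiet"]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for cmd in (
        create_vm_cmd("my-dev-vm"),
        run_dagster_cmd("my-dev-vm", "abc123"),
        tunnel_cmd("my-dev-vm"),
        destroy_vm_cmd("my-dev-vm"),
    ):
        print(shlex.join(cmd))
```

Keeping the command construction separate from execution would let us unit test the CLI without touching GCP; the real commands could then be run with `subprocess.run(cmd, check=True)`.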

Metadata

Labels

  • cloud: Stuff that has to do with adapting PUDL to work in cloud computing context.
  • developer experience: Things that make the developers' lives easier, but don't necessarily directly improve the data.
  • epic: Any issue whose primary purpose is to organize other issues into a group.
  • performance: Make PUDL run faster!
