Add parameter to build-and-run-model res workflow to run ingest on Batch #375

@jeancochrane

These days, most of our model improvements involve some sort of change to features in the input data, but the structure of our build-and-run-model workflow does not allow us to easily test new input data when triggering a model run on AWS Batch. The main problem is that the workflow is configured to run the Batch job using dvc pull to pull input data, meaning any training data for a Batch model run needs to already be uploaded to DVC before you can run a model on Batch.

We can resolve this limitation by adding a new parameter to the build-and-run-model workflow that allows us to optionally run the ingest stage of the model to generate input data, instead of pulling the data from DVC.

This task requires some non-trivial changes to GitHub workflows, so let me know when you're ready to pick it up and we can walk through it together.

The steps here include:

  • Add a new command input variable to the build-and-run-batch-job workflow in the ccao-data/actions repo that is a non-required string
    • See the docs for a refresher on input variables
    • The variable should have a short description, should not be required, and should have the string type
  • Update the "Submit new Batch job" step to optionally add a "command" key to the container overrides if a command was passed to the workflow
  • Tweak the input variables for the build-and-run-model workflow to add a new variable repro_ingest
    • The variable should have a short description, should not be required, should have the boolean type, and should default to false
  • Add a new job to the jobs config in build-and-run-model called parse-command
    • This job should read the repro_ingest input variable and generate a command value depending on its value, and pass the command to the output:
      • If true, the value should be "dvc unfreeze ingest && dvc repro"
      • If false, the value should be empty
  • Update the build-and-run-model job in the build-and-run-model workflow to pass the command output generated by the parse-command job to a new command argument
    • You'll also need to update this job to depend on the new parse-command job, so that it can read the output of parse-command
  • Update the uses key in build-and-run-model to sub out the @master ref and point to your branch in ccao-data/actions that has the changes to the build-and-run-batch-job workflow
    • This may seem duplicative, but you also need to add a ref argument pointing to your branch name under the with key in order for the workflow to properly pull code from your branch when it uses the build-and-run-batch-job workflow
  • Run a test model run and confirm that you can get it to repro the ingest along with the other model stages
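The logic for the parse-command job could be sketched roughly as follows. This is a minimal shell sketch rather than the actual workflow code: the function name `parse_command` and the `REPRO_INGEST` environment variable are hypothetical, and it assumes the step exposes its result by writing to `$GITHUB_OUTPUT`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the repro_ingest input to a command string.
# Prints the DVC command when the input is "true", and an empty string otherwise.
parse_command() {
  local repro_ingest="$1"
  if [ "$repro_ingest" = "true" ]; then
    echo "dvc unfreeze ingest && dvc repro"
  else
    echo ""
  fi
}

# In a workflow step, REPRO_INGEST (an assumed variable name) would be set
# from the repro_ingest input. Outside of GitHub Actions, GITHUB_OUTPUT is
# unset, so the result is discarded.
echo "command=$(parse_command "${REPRO_INGEST:-false}")" >> "${GITHUB_OUTPUT:-/dev/null}"
```

The job would still need to map this step output to a job-level output so that the build-and-run-model job can read it.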

This approach to overriding the Batch job command _should_ Just Work, since the docs for the --container-overrides arg to the Batch submit-job function suggest that you can override a container command using the key command, but I haven't tested it before and I wouldn't be surprised if there are special tricks. If it doesn't work, let me know and we'll pair on it, since debugging Batch failures is confusing.
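One way the optional "command" key might be built is sketched below. This is untested against the real workflow and rests on assumptions: the helper name `build_container_overrides` is hypothetical, it ignores any other keys the existing overrides may already contain, and it wraps the command in `sh -c` on the assumption that Batch takes the command as a list of strings with no shell to interpret `&&`, and that the container image provides `sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a JSON value for --container-overrides.
# Emits a "command" key only when a non-empty command is supplied.
build_container_overrides() {
  local cmd="$1"
  if [ -z "$cmd" ]; then
    # No command passed: leave the container's default command alone.
    echo '{}'
    return
  fi
  # Minimal JSON string escaping (backslashes and double quotes only).
  cmd=${cmd//\\/\\\\}
  cmd=${cmd//\"/\\\"}
  # Assumption: run the command through `sh -c` so `&&` is handled by a shell.
  printf '{"command": ["sh", "-c", "%s"]}\n' "$cmd"
}

# Possible usage in the "Submit new Batch job" step (variable names are
# placeholders, not taken from the real workflow):
#   aws batch submit-job \
#     --job-name "$JOB_NAME" \
#     --job-queue "$JOB_QUEUE" \
#     --job-definition "$JOB_DEFINITION" \
#     --container-overrides "$(build_container_overrides "$COMMAND")"
```

If the step already builds its overrides with a JSON tool, merging the key there would likely be more robust than the string formatting used in this sketch.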
