Description
These days, most of our model improvements involve some sort of change to features in the input data, but the structure of our `build-and-run-model` workflow does not allow us to easily test new input data when triggering a model run on AWS Batch. The main problem is that the workflow is configured to run the Batch job using `dvc pull` to pull input data, meaning any training data for a Batch model run needs to already be uploaded to DVC before you can run a model on Batch.
We can resolve this limitation by adding a new parameter to the `build-and-run-model` workflow that allows us to optionally run the `ingest` stage of the model to generate input data, instead of pulling the data from DVC.
This task requires some non-trivial changes to GitHub workflows, so let me know when you're ready to pick it up and we can walk through it together.
The steps here include:
- Add a new `command` input variable to the `build-and-run-batch-job` workflow in the `ccao-data/actions` repo that is a non-required string
  - See the docs for a refresher on input variables
  - The variable should have a short `description`, should not be `required`, and should have the `string` type
- Update the "Submit new Batch job" step to optionally add a `command` key to the container overrides if a command was passed to the workflow
- Tweak the input variables for the `build-and-run-model` workflow to add a new variable `repro_ingest`
  - The variable should have a short `description`, should not be `required`, should have the `boolean` type, and should default to `false`
- Add a new job to the `jobs` config in `build-and-run-model` called `parse-command`
  - This job should read the `repro_ingest` input variable, generate a `command` value depending on its value, and pass the command to the output:
    - If `true`, the value should be `"dvc unfreeze ingest && dvc repro"`
    - If `false`, the value should be empty
- Update the `build-and-run-model` job in the `build-and-run-model` workflow to pass the `command` output generated by the `parse-command` job to a new `command` argument
  - You'll also need to update this job to depend on the new `parse-command` job, so that it can read the output of `parse-command`
- Update the `uses` key in `build-and-run-model` to sub out the `@master` ref and point to your branch in `ccao-data/actions` that has the changes to the `build-and-run-batch-job` workflow
  - This may seem duplicative, but you also need to add a `ref` argument pointing to your branch name under the `with` key in order for the workflow to properly pull code from your branch when it uses the `build-and-run-batch-job` workflow
- Run a test model run and confirm that you can get it to repro the ingest along with the other model stages
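For the `command` input and the override logic in `build-and-run-batch-job`, the changes might look something like the sketch below. This is not the actual workflow; step names other than "Submit new Batch job", the shell variable names, and the assumption that the step shells out to the AWS CLI (and that the container runs its command via `bash -c`) are all guesses:

```yaml
# Sketch for build-and-run-batch-job in ccao-data/actions
on:
  workflow_call:
    inputs:
      command:
        description: Optional command override for the Batch container
        required: false
        type: string

# ... later, in the job that submits to Batch ...

      - name: Submit new Batch job
        run: |
          # Only include a "command" key in the container overrides when a
          # command was passed; otherwise keep the container's default command
          OVERRIDES='{}'
          if [ -n "${{ inputs.command }}" ]; then
            OVERRIDES='{"command": ["bash", "-c", "${{ inputs.command }}"]}'
          fi
          aws batch submit-job \
            --job-name "$JOB_NAME" \
            --job-queue "$JOB_QUEUE" \
            --job-definition "$JOB_DEFINITION" \
            --container-overrides "$OVERRIDES"
```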
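The caller side in `build-and-run-model` could be wired up roughly like this. The workflow file path and the `your-branch` ref are placeholders, and the trigger/input layout is a sketch rather than the repo's actual config:

```yaml
# Sketch for build-and-run-model
on:
  workflow_dispatch:
    inputs:
      repro_ingest:
        description: Run the ingest stage instead of pulling input data from DVC
        required: false
        type: boolean
        default: false

jobs:
  parse-command:
    runs-on: ubuntu-latest
    outputs:
      command: ${{ steps.parse.outputs.command }}
    steps:
      - id: parse
        run: |
          # Map the boolean input to the command override (or empty string)
          if [ "${{ inputs.repro_ingest }}" = "true" ]; then
            echo 'command=dvc unfreeze ingest && dvc repro' >> "$GITHUB_OUTPUT"
          else
            echo 'command=' >> "$GITHUB_OUTPUT"
          fi

  build-and-run-model:
    needs: parse-command
    # Point uses at your branch instead of @master while testing
    uses: ccao-data/actions/.github/workflows/build-and-run-batch-job.yaml@your-branch
    with:
      ref: your-branch
      command: ${{ needs.parse-command.outputs.command }}
```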
This approach to overriding the Batch job command _should_ Just Work, since the docs for the `--container-overrides` argument to the Batch `submit-job` command suggest that you can override a container command using the key `command`, but I haven't tested it before and I wouldn't be surprised if there are special tricks. If it doesn't work, let me know and we'll pair on it, since debugging Batch failures is confusing.
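For reference, here's a hedged sketch of the JSON shape the override would need and the equivalent CLI call. I'm assuming the job definition runs its command via `bash -c`, and the job name, queue, and definition below are placeholders:

```shell
# Build the container-overrides JSON the way the Batch step would
CMD='dvc unfreeze ingest && dvc repro'
OVERRIDES=$(printf '{"command": ["bash", "-c", "%s"]}' "$CMD")
echo "$OVERRIDES"

# The submission itself would then look like:
#   aws batch submit-job \
#     --job-name test-model-run \
#     --job-queue <queue> \
#     --job-definition <definition> \
#     --container-overrides "$OVERRIDES"
```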