Switch to using S3 subfolders for DVC #407

@jeancochrane

Currently we use the exact same DVC S3 remote for the res and condo model input data, so the archived input data for both models is mixed together.

Per the docs, we should append a model-specific key to the S3 remote URL for each of the res and condo models, e.g. /model-res-avm or /model-condo-avm. That should keep the DVC bucket organized going forward and make it easier to figure out which DVC files relate to which model.
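
To make the proposed layout concrete, here is a minimal sketch of how the two remote URLs could be derived from the shared one. The bucket name and the helper `model_remote_url` are hypothetical and only for illustration; the actual remote URL is not given in this issue.

```python
# Hypothetical bucket name: the real DVC remote URL is not stated in this issue.
BASE_REMOTE = "s3://example-dvc-bucket"


def model_remote_url(base_url: str, model_key: str) -> str:
    """Append a model-specific key to a shared S3 remote URL.

    Leading and trailing slashes are normalized so that both
    "/model-res-avm" and "model-res-avm" give the same result.
    """
    return f"{base_url.rstrip('/')}/{model_key.strip('/')}"


# One remote per model, each under its own key in the same bucket.
res_remote = model_remote_url(BASE_REMOTE, "/model-res-avm")
condo_remote = model_remote_url(BASE_REMOTE, "/model-condo-avm")

print(res_remote)
print(condo_remote)
```

Each model repo would then point its DVC remote at its own URL instead of the shared bucket root.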

Note that it is technically possible for us to migrate all of our old input data to the new keys, but I think that would take a ton of manual work and I don't actually think it's worth the effort now that we have Athena tables for final model training data (ccao-data/data-architecture#804).

Two additional tasks here:

  • Create a new function in the ccao Python and R packages to load input data from a DVC hash and year, so that we can encapsulate the logic that switches on different years
  • Check that you can still check out older model years and run dvc pull
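
As a rough sketch of the first task, the year-switching logic might look something like the following. Everything here is an assumption for illustration: the function name `input_data_url`, the bucket name, and especially the cutoff year, since the issue does not say which model year will be the first to use subfolders. The `files/md5/<first two characters>/<rest>` object layout assumes data pushed with DVC 3.x; objects pushed by older DVC versions sit directly under `<first two characters>/<rest>`, which the `object_prefix` parameter lets callers account for.

```python
def input_data_url(
    md5_hash: str,
    year: int,
    model_key: str,
    base_url: str,
    first_subfolder_year: int,
    object_prefix: str = "files/md5",
) -> str:
    """Build the S3 URL of a DVC-tracked object from its hash and model year.

    Years before `first_subfolder_year` resolve against the shared remote
    root; later years resolve against the model-specific subfolder.
    All parameter names are hypothetical, not an existing ccao API.
    """
    if len(md5_hash) < 3:
        raise ValueError("md5_hash is too short to split into a DVC object path")

    root = base_url.rstrip("/")
    if year >= first_subfolder_year:
        # Newer years live under the model-specific key.
        root = f"{root}/{model_key.strip('/')}"

    parts = [root]
    if object_prefix:
        # Pass object_prefix="" for objects pushed with the older flat layout.
        parts.append(object_prefix.strip("/"))
    # DVC shards objects by the first two characters of the hash.
    parts.extend([md5_hash[:2], md5_hash[2:]])
    return "/".join(parts)


# Example with a made-up hash and a made-up cutoff year of 2025.
old_url = input_data_url(
    "abcdef0123", 2024, "model-res-avm", "s3://example-dvc-bucket", 2025
)
new_url = input_data_url(
    "abcdef0123", 2025, "model-res-avm", "s3://example-dvc-bucket", 2025
)
print(old_url)
print(new_url)
```

Keeping the cutoff year and object prefix as parameters means the same helper can serve both the res and condo models, and an equivalent R function could mirror the same arguments.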
