Description
Currently we use the exact same DVC S3 remote for the res and condo model input data, so the archived input data for both models is mixed together.
Per the docs, we should add a key to the end of the S3 remotes for both the res and condo model that is specific to each model, e.g. /model-res-avm or /model-condo-avm. That should help keep the DVC bucket organized going forward, making it easier to figure out which DVC files relate to which model.
Note that it is technically possible for us to migrate all of our old input data to the new keys, but I think that would take a ton of manual work and I don't actually think it's worth the effort now that we have Athena tables for final model training data (ccao-data/data-architecture#804).
Two additional tasks here:
- Create a new function in the ccao Python and R packages to load input data from a DVC hash and year, so that we can encapsulate the logic that switches on different years
- Check to make sure that you can still check out older model years and dvc pull
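As a starting point for the first task, here is a rough Python sketch of the year-switching logic. Everything named in it is an assumption for illustration: the bucket name, the cutover year, and the function names are hypothetical, and the `files/md5` object prefix should be checked against the DVC version each model year actually used. It only builds the S3 URI for a given hash and does not download anything.

```python
# Hypothetical values: the real bucket name and the first year that uses
# model-specific keys are not stated in this issue.
BASE_REMOTE = "s3://example-dvc-bucket"
FIRST_YEAR_WITH_MODEL_KEY = 2025

# Keys proposed in this issue, one per model.
MODEL_KEYS = {"res": "model-res-avm", "condo": "model-condo-avm"}


def dvc_remote_url(model: str, year: int) -> str:
    """Return the DVC remote URL to use for a model and year."""
    if model not in MODEL_KEYS:
        raise ValueError(f"Unknown model: {model!r}")
    # Older years keep pointing at the shared remote, since the old
    # input data is not being migrated to the new keys.
    if year < FIRST_YEAR_WITH_MODEL_KEY:
        return BASE_REMOTE
    return f"{BASE_REMOTE}/{MODEL_KEYS[model]}"


def dvc_object_uri(
    md5_hash: str, model: str, year: int, prefix: str = "files/md5"
) -> str:
    """Build the S3 URI of a DVC-tracked object from its hash.

    Assumes the object is stored under the first two characters of the
    hash followed by the remainder, below an optional prefix. Pass
    prefix="" if the remote for that year has no such prefix.
    """
    if len(md5_hash) < 3:
        raise ValueError("Hash is too short to split into a directory and file name")
    parts = [dvc_remote_url(model, year)]
    if prefix:
        parts.append(prefix)
    parts.extend([md5_hash[:2], md5_hash[2:]])
    return "/".join(parts)
```

Keeping the cutover year in one constant means that both packages can share the same rule, and that checking out an older model year still resolves to the shared remote.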