375 add parameter to build and run model res workflow to run ingest on batch (#380)
Conversation
```diff
  libudunits2-dev python3-dev python3-pip python3-venv libgdal-dev \
  libgeos-dev libproj-dev libfontconfig1-dev libharfbuzz-dev \
- libfribidi-dev pandoc curl gdebi-core && \
+ libfribidi-dev pandoc curl gdebi-core \
```
This is needed to install the igraph package via renv.
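As a sketch of why the Dockerfile change matters (hypothetical R session; `renv::install` is renv's standard package-install entry point):

```r
# In a fresh renv project, igraph is built from source, so the compile
# step needs the system -dev libraries installed by the apt-get line above.
renv::install("igraph")
```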
```r
# Establish Athena connection
AWS_ATHENA_CONN_NOCTUA <- dbConnect(
  noctua::athena(),
  s3_staging_dir = "s3://ccao-athena-results-us-east-1/",
```
It errors without this parameter when run through Actions, and doesn't seem to do anything when run locally.
[Praise] Yup, this is right! It doesn't error out locally in the absence of this param because we instruct team members to set the AWS_ATHENA_S3_STAGING_DIR env var in their root-level .Renviron file, which Noctua knows to use as a fallback. Setting this parameter directly in the dbConnect() function will override that root-level environment variable when we run this script locally, but I think it's fine because we rarely change this query location.
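A minimal sketch of the two configurations being compared (assuming the `DBI` and `noctua` packages and valid AWS credentials; the bucket path is the one from the diff above):

```r
library(DBI)

# Local runs: rely on the fallback env var set in the root-level .Renviron,
# e.g. AWS_ATHENA_S3_STAGING_DIR=s3://ccao-athena-results-us-east-1/
conn <- dbConnect(noctua::athena())

# Actions runs: pass the staging dir explicitly; this overrides the env var
conn <- dbConnect(
  noctua::athena(),
  s3_staging_dir = "s3://ccao-athena-results-us-east-1/"
)
```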
jeancochrane left a comment:
Nice work getting this over the line! I've tested workflows myself both with and without repro_ingest and it seems to work correctly both ways.
Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>
…es-workflow-to-run-ingest-on-batch
Quick fix to point back to the main branch (#380). Workflow run to test: https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/%2Faws%2Fbatch%2Fjob/log-events/z_ci_tidy-up-model-run-pr_ccao-data-model-res-avm%2Fdefault%2F451b4108c4a14d61822bee98d3a0ba37. Note that I did have to unfreeze ingest to run it.
This currently errors out on dev branches. Is it worth providing permissions for all dev branches, so that new features can be tested without their output going into our main data lake?
A test was run by removing prox_nearest_metra_route_dist_ft as an input variable in the ingest script. As desired, it returned null values in assessment_card.
Final run with up-to-date changes: https://github.com/ccao-data/model-res-avm/actions/runs/15330192847