Skip to content

375 add parameter to build and run model res workflow to run ingest on batch#380

Merged
Damonamajor merged 34 commits intomasterfrom
375-add-parameter-to-build-and-run-model-res-workflow-to-run-ingest-on-batch
May 30, 2025
Merged

375 add parameter to build and run model res workflow to run ingest on batch#380
Damonamajor merged 34 commits intomasterfrom
375-add-parameter-to-build-and-run-model-res-workflow-to-run-ingest-on-batch

Conversation

@Damonamajor
Copy link
Contributor

@Damonamajor Damonamajor commented May 28, 2025

This currently errors out on dev branches. Is it worth it to provide permissions for all dev branches to ensure new features can be added without them going into our main datalake?

A test was run by removing prox_nearest_metra_route_dist_ft as an input variable in the ingest script. As desired, it returned null values in assessment_card


SELECT
meta_pin, prox_nearest_metra_route_dist_ft
from assessment_card
where run_id = '2025-05-29-agitated-sam'
limit 10

Final run with up to date changes
https://github.com/ccao-data/model-res-avm/actions/runs/15330192847

@Damonamajor Damonamajor linked an issue May 28, 2025 that may be closed by this pull request
7 tasks
@Damonamajor Damonamajor self-assigned this May 29, 2025
libudunits2-dev python3-dev python3-pip python3-venv libgdal-dev \
libgeos-dev libproj-dev libfontconfig1-dev libharfbuzz-dev \
libfribidi-dev pandoc curl gdebi-core && \
libfribidi-dev pandoc curl gdebi-core \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed to install renv::igraph

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Praise] Nice catch!

# Establish Athena connection
AWS_ATHENA_CONN_NOCTUA <- dbConnect(
noctua::athena(),
s3_staging_dir = "s3://ccao-athena-results-us-east-1/",
Copy link
Contributor Author

@Damonamajor Damonamajor May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It errors without this when run through actions and doesn't seem to do anything when run locally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Praise] Yup, this is right! It doesn't error out locally in the absence of this param because we instruct team members to set the AWS_ATHENA_S3_STAGING_DIR env var in their root-level .Renviron file, which Noctua knows to use as a fallback. Setting this parameter directly in the dbConnect() function will override that root-level environment variable when we run this script locally, but I think it's fine because we rarely change this query location.

@Damonamajor Damonamajor marked this pull request as ready for review May 29, 2025 18:15
Copy link
Member

@jeancochrane jeancochrane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work getting this over the line! I've tested workflows myself both with and without repro_ingest and it seems to work correctly both ways.

libudunits2-dev python3-dev python3-pip python3-venv libgdal-dev \
libgeos-dev libproj-dev libfontconfig1-dev libharfbuzz-dev \
libfribidi-dev pandoc curl gdebi-core && \
libfribidi-dev pandoc curl gdebi-core \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Praise] Nice catch!

# Establish Athena connection
AWS_ATHENA_CONN_NOCTUA <- dbConnect(
noctua::athena(),
s3_staging_dir = "s3://ccao-athena-results-us-east-1/",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Praise] Yup, this is right! It doesn't error out locally in the absence of this param because we instruct team members to set the AWS_ATHENA_S3_STAGING_DIR env var in their root-level .Renviron file, which Noctua knows to use as a fallback. Setting this parameter directly in the dbConnect() function will override that root-level environment variable when we run this script locally, but I think it's fine because we rarely change this query location.

Damonamajor and others added 3 commits May 30, 2025 11:35
Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>
Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>
@Damonamajor Damonamajor merged commit a930a3a into master May 30, 2025
6 checks passed
@Damonamajor Damonamajor deleted the 375-add-parameter-to-build-and-run-model-res-workflow-to-run-ingest-on-batch branch May 30, 2025 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add parameter to build-and-run-model res workflow to run ingest on Batch

2 participants