
Commit a930a3a

Add parameter to build and run model res workflow to run ingest on batch (#380)
This currently errors out on dev branches. Is it worth providing permissions for all dev branches, so that new features can be tested without them going into our main data lake?

A test was run by removing prox_nearest_metra_route_dist_ft as an input variable in the ingest script. As desired, the column came back as null in assessment_card for that run:

```
SELECT meta_pin, prox_nearest_metra_route_dist_ft FROM assessment_card WHERE run_id = '2025-05-29-agitated-sam' LIMIT 10
```

Final run with up-to-date changes: https://github.com/ccao-data/model-res-avm/actions/runs/15330192847

---------

Co-authored-by: Jean Cochrane <[email protected]>
Co-authored-by: Jean Cochrane <[email protected]>
1 parent 9045e7c commit a930a3a


4 files changed (+30, -2 lines)


.dockerignore

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ README.md
 renv/profile
 docs/
 input/
+!input/.gitkeep
 renv/library/
 renv/sandbox/
 renv/staging/

.github/workflows/build-and-run-model.yaml

Lines changed: 23 additions & 1 deletion
@@ -51,11 +51,31 @@ on:
         description: Calculate SHAP values
         default: false
         required: true
+      repro_ingest:
+        type: boolean
+        description: Run ingest stage before running model
+        default: false
+        required: true
   push:
     branches: [master, "*-assessment-year"]

 jobs:
+  parse-command:
+    runs-on: ubuntu-latest
+    outputs:
+      command: ${{ steps.set.outputs.command }}
+    steps:
+      - name: Determine DVC command
+        id: set
+        shell: bash
+        run: |
+          if [[ "${{ inputs.repro_ingest }}" == "true" ]]; then
+            echo "command=dvc unfreeze ingest && dvc repro" >> $GITHUB_OUTPUT
+          else
+            echo "command=" >> $GITHUB_OUTPUT
+          fi
   build-and-run-model:
+    needs: parse-command
     permissions:
       # contents:read and id-token:write permissions are needed to interact
       # with GitHub's OIDC Token endpoint so that we can authenticate with AWS
@@ -65,8 +85,10 @@ jobs:
       # required in order to allow the reusable called workflow to push to
       # GitHub Container Registry
       packages: write
-    uses: ccao-data/actions/.github/workflows/build-and-run-batch-job.yaml@main
+    uses: ccao-data/actions/.github/workflows/build-and-run-batch-job.yaml@Add-parameter-to-build-and-run-model-res-workflow-to-run-ingest-on-Batch
     with:
+      ref: Add-parameter-to-build-and-run-model-res-workflow-to-run-ingest-on-Batch
+      command: ${{ needs.parse-command.outputs.command }}
       backend: "ec2"
       vcpu: "40"
       memory: "158000"
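The new parse-command job turns the boolean `repro_ingest` input into a command string that build-and-run-model forwards to the reusable workflow as `command`. As DVC documents, frozen stages are skipped by `dvc repro`, so running `dvc unfreeze ingest` first lets the ingest stage rerun. The Bash sketch below re-creates the step's branching locally. It assumes a temporary file can stand in for `$GITHUB_OUTPUT` and that a push-triggered run, which has no `workflow_dispatch` inputs, sees an empty string for `inputs.repro_ingest`. The function name `determine_dvc_command` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical local re-creation of the "Determine DVC command" step.
# determine_dvc_command and the temp file standing in for $GITHUB_OUTPUT
# are illustrative assumptions, not part of the workflow itself.
set -euo pipefail

determine_dvc_command() {
  # $1 plays the role of the interpolated ${{ inputs.repro_ingest }} value.
  # Push-triggered runs have no workflow_dispatch inputs, so we assume the
  # expression expands to an empty string in that case.
  local repro_ingest="${1:-}"
  if [[ "$repro_ingest" == "true" ]]; then
    echo "command=dvc unfreeze ingest && dvc repro" >> "$GITHUB_OUTPUT"
  else
    echo "command=" >> "$GITHUB_OUTPUT"
  fi
}

# Try the three cases the step can see: box checked, box unchecked, push event.
for input in true false ""; do
  GITHUB_OUTPUT="$(mktemp)"
  export GITHUB_OUTPUT
  determine_dvc_command "$input"
  echo "repro_ingest='${input}' -> $(cat "$GITHUB_OUTPUT")"
  rm -f "$GITHUB_OUTPUT"
done
```

Running it prints `command=dvc unfreeze ingest && dvc repro` only for the `true` case and an empty `command=` for the other two. How the reusable build-and-run-batch-job.yaml workflow treats an empty `command` is defined in ccao-data/actions and is not part of this commit.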

Dockerfile

Lines changed: 4 additions & 1 deletion
@@ -16,7 +16,8 @@ RUN apt-get update && \
     libcurl4-openssl-dev libssl-dev libxml2-dev libgit2-dev git \
     libudunits2-dev python3-dev python3-pip python3-venv libgdal-dev \
     libgeos-dev libproj-dev libfontconfig1-dev libharfbuzz-dev \
-    libfribidi-dev pandoc curl gdebi-core && \
+    libfribidi-dev pandoc curl gdebi-core \
+    libglpk-dev libglpk40 && \
     rm -rf /var/lib/apt/lists/*

 # Install Quarto
@@ -30,12 +31,14 @@ RUN pip install --no-cache-dir dvc[s3]
 # Copy R bootstrap files into the image
 COPY renv.lock .Rprofile DESCRIPTION requirements.txt ./
 COPY renv/profiles/reporting/renv.lock reporting-renv.lock
+COPY renv/profiles/dev/renv.lock dev-renv.lock
 COPY renv/ renv/

 # Install R dependencies. Restoring renv first ensures that it's
 # using the same version as recorded in the lockfile
 RUN Rscript -e 'renv::restore(packages = "renv"); renv::restore()'
 RUN Rscript -e 'renv::restore(lockfile = "reporting-renv.lock")'
+RUN Rscript -e 'renv::restore(lockfile = "dev-renv.lock")'

 # Set the working directory to the model directory
 WORKDIR /model-res-avm/

pipeline/00-ingest.R

Lines changed: 2 additions & 0 deletions
@@ -25,6 +25,8 @@ noctua_options(unload = TRUE)
 # Establish Athena connection
 AWS_ATHENA_CONN_NOCTUA <- dbConnect(
   noctua::athena(),
+  s3_staging_dir = "s3://ccao-athena-results-us-east-1/",
+  region_name = "us-east-1",
   rstudio_conn_tab = FALSE
 )
