Update CI vulnerability workflow to reduce how often the docker image is built by lisac · Pull Request #196 · navapbc/platform-test

lisac · 2025-04-15T15:14:09Z

Ticket

n/a. Implements a proposed improvement documented in workflow vulnerability-scans.yml, so that we avoid multiple jobs within the workflow building the app docker image.

Changes

Updates the vulnerability-scans.yml workflow so that a docker image of the app is built once and cached for use by the jobs for each of the vulnerability scans configured in that workflow, rather than having each of those jobs build the app docker image.

This build and cache logic is in a new composite action: actions/build-release-candidate.

This composite action is also used by the build-and-publish.yml workflow. Where that workflow previously ran make release-build, it now calls the new composite action.

This PR is inspired by an implementation authored by @daphnegold. navapbc/template-infra#936 describes another element of their implementation that might be of interest for template-infra.

Context for reviewers

I am very regretful about not starting with a tech spec for this work!
Lessons were learned.

Testing

[TODO]

Preview environment for app

♻️ Environment destroyed ♻️

Preview environment for app-rails

♻️ Environment destroyed ♻️

lorenyu

Thanks! This will be a great improvement. I have some design feedback. I think we should explore making this a composite action instead of a job, and clean up the ordering of the steps to be more streamlined (check for the image first, then restore the buildx layers). I also found a bug where the buildx layers aren't caching properly (you need to modify the release-build make command to support OPTS).

.github/workflows/vulnerability-scans.yml

lorenyu · 2025-04-16T22:42:19Z

.github/workflows/vulnerability-scans.yml

+          OPTIONAL_BUILD_FLAGS=" \
+          --cache-from=type=local,src=/tmp/.buildx-cache \
+          --cache-to=type=local,dest=/tmp/.buildx-cache"


This doesn't do anything: OPTIONAL_BUILD_FLAGS is not an option in make release-build, which is why the layers aren't being cached properly.

Once this is fixed, you should test by triggering multiple builds with different commit hashes (for example, by modifying the example app or adding a command to the Dockerfile) and showing that subsequent builds are much faster since most of the layers are already cached.
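A minimal sketch of how the layer cache could be wired up once make release-build forwards extra flags. It assumes the Makefile appends an OPTS variable to its docker buildx build command (the change requested in this review; the variable name, the APP_NAME argument, and the cache key scheme are assumptions, not the repository's actual code). It also assumes a buildx builder is set up first, since exporting a local cache generally needs one.

```yaml
steps:
  - uses: actions/checkout@v4

  # The local cache exporter generally needs a buildx builder (docker-container driver).
  - name: Set up Docker Buildx
    uses: docker/setup-buildx-action@v3

  # Restore layers from earlier builds. restore-keys allows a partial hit
  # from a build of a different commit, which is the point of layer caching.
  - name: Cache buildx layers
    uses: actions/cache@v4
    with:
      path: /tmp/.buildx-cache
      key: buildx-${{ inputs.app_name }}-${{ github.sha }}
      restore-keys: |
        buildx-${{ inputs.app_name }}-

  # Assumption: the Makefile passes $(OPTS) through to `docker buildx build`.
  # --load copies the result into the local image store so later steps can use it.
  - name: Build release
    run: |
      make release-build APP_NAME=${{ inputs.app_name }} \
        OPTS="--load --cache-from=type=local,src=/tmp/.buildx-cache --cache-to=type=local,dest=/tmp/.buildx-cache"
```

With this in place, a second build on a new commit that only touches late Dockerfile layers should reuse the earlier layers, which is what the suggested test would demonstrate.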

## Ticket

n/a

## Changes

Updates a documentation link that no longer resolves. https://www.rubydoc.info/github/hassox/warden gives a 404. The underlying https://github.com/hassox/warden redirects to https://github.com/wardencommunity/warden. Updated the documentation to use the latter's wiki (https://github.com/wardencommunity/warden/wiki).

## Testing

navapbc/platform-test#196 (shows the CI check that flagged the broken link)

lisac · 2025-04-17T13:30:56Z

@lorenyu: your notes here make sense to me. Thanks for taking the time to walk through this.
I'm not confident about having revisions reviewable this week - next week more likely - will re-request a review when I have those changes in. Thanks again!

…he docker image" technically, it is an option to delay checking out the repo, however we'd need to duplicate the logic from the Makefile around the expected image name. i think not worth it. This reverts commit 95a8a18.

lisac · 2025-05-02T23:25:15Z

status: making progress, ran into what i think is a fixable bug and will revisit next week.

good: moved the build logic into a composite action used by build-and-publish and vulnerability-scans; observed desired concurrency behavior, to avoid multiple workflows executing the build action concurrently
bug: the cache lookup is not behaving as i expected. observed: the two workflows (build-and-publish and vulnerability-scans) apparently used different SHAs in their cache keys, whereas i expected the cache keys to be identical. the keys they used:
- platform-test-app:1a6cc698c726109c799bfca714a2a48496e76cd3
- platform-test-app:ba457e9ca8cc127a1361b1fc72e4cf56f9c6105f

next steps:

fix the bug and remove debugging
update the Issue's description
re-request PR reviews
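One possible line of investigation for the mismatched keys, offered as a sketch rather than a diagnosis. On pull_request events, github.sha refers to the merge commit for the pull request rather than the head of the branch, so a workflow that keys on github.sha and one that keys on the branch head would disagree. The second key above begins with ba457e9, which matches a commit listed on this PR; the first does not appear in the commit list, which would be consistent with a merge commit, though nothing in this thread confirms it. A debugging step along these lines (step name is illustrative) makes the difference visible:

```yaml
# Must run after actions/checkout so that `git rev-parse` has a repository to inspect.
- name: Show which commit each context refers to
  shell: bash
  run: |
    echo "github.ref:            ${{ github.ref }}"
    echo "github.sha:            ${{ github.sha }}"
    # Empty for events that are not pull requests.
    echo "pull_request head sha: ${{ github.event.pull_request.head.sha }}"
    echo "checked-out HEAD:      $(git rev-parse HEAD)"
```

If the two workflows print different values for the checked-out HEAD, the cache keys derived from it will differ too.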

lorenyu

I realize you're still working on this, but i saw you left a comment so i decided to poke and left some comments. feel free to ignore if they are a distraction but hopefully they help

lorenyu · 2025-05-06T23:07:35Z

.github/workflows/build-and-publish.yml

          echo "Is image published: $is_image_published"
          echo "is_image_published=$is_image_published" >> "$GITHUB_OUTPUT"

-      - name: Build release


what's the reason for making this a separate job?

lorenyu · 2025-05-06T23:10:09Z

.github/workflows/build-and-publish.yml

+    name: Check whether the image is already published
    runs-on: ubuntu-latest
    needs: get-commit-hash
-    concurrency: build-and-publish-${{ inputs.app_name }}-${{ needs.get-commit-hash.outputs.commit_hash }}


this concurrency is important to keep. the concurrency statement later on that uses github.ref won't work, since different github.refs can refer to the same commit hash (e.g. main, origin/main, and HEAD can all be valid refs that point to the same commit hash). so there'd be a race condition where multiple jobs try to build the same commit hash without realizing it, because they reference the commit via different refs. that's why we have the separate job beforehand that gets the commit hash.
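The pattern described above can be sketched as follows. The concurrency expression is the one from the diff; the job layout, the ref input, and the step contents are assumptions for illustration, not the repository's actual workflow.

```yaml
on:
  workflow_call:
    inputs:
      app_name:
        type: string
        required: true
      ref:
        type: string
        required: true

jobs:
  # Resolve whatever ref was passed in (branch, tag, HEAD) to a single commit hash.
  get-commit-hash:
    runs-on: ubuntu-latest
    outputs:
      commit_hash: ${{ steps.get-commit-hash.outputs.commit_hash }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}
      - id: get-commit-hash
        run: echo "commit_hash=$(git rev-parse HEAD)" >> "$GITHUB_OUTPUT"

  # Hypothetical job name. Runs that resolve to the same commit share one
  # concurrency group, so they queue instead of building the same image at once.
  build:
    needs: get-commit-hash
    runs-on: ubuntu-latest
    concurrency: build-and-publish-${{ inputs.app_name }}-${{ needs.get-commit-hash.outputs.commit_hash }}
    steps:
      - run: echo "building ${{ needs.get-commit-hash.outputs.commit_hash }}"
```

A group keyed on github.ref would put a run triggered for main and a run triggered for a tag on the same commit into different groups, which is the race described in the comment.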

lorenyu · 2025-05-06T23:10:57Z

.github/workflows/build-and-publish.yml

+      - name: Restore cached Docker image
+        uses: actions/cache/restore@v4
+        with:
+          path: /tmp/docker-image.tar
+          key: ${{ steps.build-release-candidate.outputs.image_cache_key }}
+          fail-on-cache-miss: true
+
+      - name: Load cached Docker image
+        run: |
+          docker load < /tmp/docker-image.tar


can we put these steps into actions/build-release-candidate
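A sketch of what a calling job might look like once the restore and load steps live inside the composite action. The app_name input is a hypothetical name; the action path is the one named in this PR.

```yaml
steps:
  # A local composite action can only be resolved after the repository is checked out.
  - uses: actions/checkout@v4

  # Hypothetical input name. The action restores or builds the image, then runs `docker load`.
  - name: Build or restore release candidate
    id: build-release-candidate
    uses: ./.github/actions/build-release-candidate
    with:
      app_name: ${{ inputs.app_name }}

  # Steps in a composite action run on the same runner as the calling job,
  # so the loaded image is available to later steps in this job.
  - name: List local images
    run: docker image ls
```

Each job that needs the image would still call the action itself, because separate jobs run on separate runners and do not share a local Docker image store.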

lorenyu · 2025-05-06T23:11:39Z

.github/workflows/ci-app-rails-vulnerability-scans.yml

    uses: ./.github/workflows/vulnerability-scans.yml
    with:
      app_name: "app-rails"
+      ref: ${{ github.ref }}


is this needed?

lorenyu · 2025-05-06T23:14:32Z

.github/actions/build-release-candidate/action.yml

+      with:
+        path: /tmp/docker-image.tar
+        key: ${{ steps.create-image-identifier.outputs.image }}
+        lookup-only: true


what's the reason for only doing a lookup? i think we'd want to download the image if it's already built

lorenyu · 2025-05-06T23:15:33Z

.github/actions/build-release-candidate/action.yml

+    - name: Cache Docker image
+      if: steps.check-image-already-exists.outputs.cache-hit != 'true'
+      uses: actions/cache/save@v4
+      with:
+        path: /tmp/docker-image.tar
+        key: ${{ steps.create-image-identifier.outputs.image }}


this step shouldn't be needed. cache should save automatically when the job completes
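A minimal sketch of the composite action with a single actions/cache step in place of the separate lookup and save steps. The app_name input, the way the image identifier is derived, and the assumption that make release-build tags the image with that identifier are all illustrative; the real action may derive the name from the Makefile instead.

```yaml
# .github/actions/build-release-candidate/action.yml (sketch)
name: Build release candidate
description: Restore the app image from the cache, or build and cache it.
inputs:
  app_name:
    description: Name of the app to build (hypothetical input)
    required: true
runs:
  using: composite
  steps:
    # Hypothetical: use "<app name>:<commit hash>" as both image tag and cache key.
    - name: Create image identifier
      id: create-image-identifier
      shell: bash
      run: echo "image=${{ inputs.app_name }}:$(git rev-parse HEAD)" >> "$GITHUB_OUTPUT"

    # Restores the tarball on a hit. On a miss, the action's post step
    # saves whatever is at `path` when the job completes.
    - name: Restore Docker image from cache
      id: image-cache
      uses: actions/cache@v4
      with:
        path: /tmp/docker-image.tar
        key: ${{ steps.create-image-identifier.outputs.image }}

    # Assumption: release-build tags the image with the identifier above.
    - name: Build and export Docker image
      if: steps.image-cache.outputs.cache-hit != 'true'
      shell: bash
      run: |
        make release-build APP_NAME=${{ inputs.app_name }}
        docker save "${{ steps.create-image-identifier.outputs.image }}" --output /tmp/docker-image.tar

    - name: Load cached Docker image
      if: steps.image-cache.outputs.cache-hit == 'true'
      shell: bash
      run: docker load --input /tmp/docker-image.tar
```

The trade-off is timing: the automatic save happens at the end of the job, whereas an explicit actions/cache/save step writes the cache as soon as that step runs.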

lorenyu · 2025-05-06T23:18:46Z

.github/actions/build-release-candidate/action.yml

+        key: ${{ steps.create-image-identifier.outputs.image }}
+        lookup-only: true
+
+    - name: Build and tag Docker image for scanning


Not necessarily in scope for this PR if it is getting complex, but in an older version of this PR you had code that also cached the intermediate docker layers in /tmp/.buildx-cache which could dramatically speed up builds even when it's a cache miss since some of the intermediate layers will be cached

Yes, you're right. I removed that from my branch because I was overwhelmed by all the other issues I was running into. At this point - I want to get the PR closed! - I think I won’t attempt to get that feature working. Should I propose it as a new issue? or add it as a comment to navapbc/template-infra#206 ?

Sure go ahead and create a new issue and link it here

navapbc/template-infra#936

lisac · 2025-05-07T21:05:32Z

@lorenyu : I appreciate these notes - thank you. For what you’ve flagged, most of those were knowledge gaps on my part, or me thinking too narrowly around the vulnerability scans workflow. I’ll be following your notes to get this PR cleaned up.

lorenyu · 2025-05-07T22:46:07Z

@lisac sounds good! happy to help if you run into any issues

lisac · 2025-05-14T13:14:41Z

Closing without merging. While I like this PR's change to the vulnerability workflow stylistically, it doesn't meaningfully improve the developer experience. The jobs in the vulnerability workflow run in parallel, so the runtime is roughly the same either way: whether each job builds the docker image before running its scan (the version in main), or a single job builds the image and the other jobs wait for it (the version in this PR).

Let's first implement navapbc/template-infra#936, which is the more impactful aspect from Daphne's implementation. That will benefit the vulnerability workflow, and as a follow-on we could re-consider having the vulnerability workflow have a single job specific to building the image.
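The runtime argument can be written out as a rough estimate. The symbols are introduced here for illustration: B is the time to build the image, S_i is the duration of scan i, and c is the overhead of saving the image to the cache and restoring it in each scan job. This assumes runners are available for all jobs at once and that the build takes about the same time in every job.

```latex
\begin{align*}
T_{\text{main}} &= \max_i \,(B + S_i) = B + \max_i S_i \\
T_{\text{PR}}   &\approx B + c + \max_i S_i \\
T_{\text{PR}} - T_{\text{main}} &\approx c \ge 0
\end{align*}
```

Under these assumptions the wall-clock time a developer waits does not shrink, which matches the observation that the change does not improve the developer experience.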

lisac added 6 commits April 15, 2025 10:46

add job build-and-cache

32071f7

update trivy job to use the previously-built image

352feea

update anchore scan to use the previously-built image

30f2a9c

update dockle job to use the previously-built image

8bebaee

remove check on disk space

47b8168

while informative, we aren't taking any action based on the results

remove debugging around disk space

cdc863f

lisac self-assigned this Apr 15, 2025

fix outdated documentation link

b8e1fb4

the underlying repo redirects to 'wardencommunity/warden' and is not on rubydoc.info

lisac changed the title ~~[draft] Update CI vulnerability workflow to reduce how often the docker image is built~~ Update CI vulnerability workflow to reduce how often the docker image is built Apr 16, 2025

This was referenced Apr 16, 2025

Streamline vulnerability scan workflow to build docker image once navapbc/template-infra#921

Closed

update documentation link to warden navapbc/template-application-rails#88

Merged

lorenyu reviewed Apr 16, 2025

Update .github/workflows/vulnerability-scans.yml

409afe0

Co-authored-by: Loren Yu <loren@navapbc.com>

lisac added 11 commits April 17, 2025 12:02

rename job: build-and-cache -> build

ec4acfb

fix order: check that the cache exists before we try to use it

f93ad60

move the cache-hit check to earlier in the job

e615687

and adjust the inputs for the various cache actions: 1) i don't think we want to rely on restore-keys, and 2) we re-use the docker image name as the cache key (maybe?)

(lint fix) specify the path associated with the key we want to look up

0f0985e

oops. need to check out the repo in order to use the Makefile.

8f1f396

skip the trivy and anchore scans (temporarily)

b5d6823

why: the dockle scan should suffice for testing the build job that's the subject of this branch

modify 'release-build' make command to allow for additional build flags

a622e54

simplify. remove logic for caching the buildx layers. reduce the task…

1430fd4

… to building and caching a docker image of the app. (running into issues with the buildx caching)

experiment: don't checkout the repo unless we have to build the docke…

95a8a18

…r image

specify 'fail-on-cache-miss: true' for the subsequent jobs that expec…

057ebb1

…t to find the docker image in the cache

lisac marked this pull request as draft April 21, 2025 19:18

lisac added 3 commits April 28, 2025 18:53

refactor the steps for building and caching the image into a new comp…

e7e8072

…osite action

update the build-and-publish to use the new composite action

0c6fcc6

limit concurrency for the build action

d6367d6

lisac added 2 commits May 2, 2025 19:09

re-try fixing syntax

1d970fc

set id-token to have write permission

ba457e9

needed in order to configure aws credentials

lisac added 4 commits May 4, 2025 16:25

specify the github ref to checkout

2e5552c

in the step before calling the build-release-candidate action

update description text of the composite action

541e272

revert 2e5552c. ineffective.

73e97a0

the build triggered through the vulnerability workflow is still coming up with a different SHA from the build triggered through build-and-publish.

fix format - EOF newline

32333f3

lorenyu reviewed May 6, 2025

lisac added 17 commits May 8, 2025 11:44

move 'docker load' to the composite action

9fde625

instead of having each of the caller workflows follow up with those steps

retrieve the image from cache rather than just checking for a cache hit

c58be8b

why: we're not gaining efficiency by running the lookup-only mode as its own step

fix syntax. was missing required property 'shell'

ec69330

debug: if we call 'docker load' from the composite action, will the c…

d05c648

…alling workflow see the image?

debug: observe the environment variables describing the version of th…

ca3da45

…e code and workflow

debugging - check that we get a cache hit after saving the cache, and…

3124e7f

… remove the sleep

create composite action for get-commit-hash, copying logic from build…

cf38b58

…-and-publish.yml

update build-and-publish.yml to call the get-commit-hash composite ac…

82b68ff

…tion

fix concurrency group - github.ref will not be reliable, need to use …

db0f928

…the commit hash see #196 (comment) multiple github.refs may evaluate to the same commit hash

adjust a condition for readability and to be more exact

f4c816e

correct concurrency group to use hash instead of github.ref. in vulne…

ee98361

…rability-scans.yml

debug: the subsequent jobs may need to explicitly run docker load

1cc95b8

fix syntax

800fe0b

fix (id for steps need to be unique, within a workflow)

772ddd4

use existing Makefile command for getting the commit sha

e56d5fa

remove debugging, update some descriptive properties (name, description)

dc57850

restore original code for getting the commit hash

c211561

lisac closed this May 14, 2025

lisac deleted the lisac/ci-vulnerability-scans-build-image-once branch October 8, 2025 11:02
