Skip to content

Update CI vulnerability workflow to reduce how often the docker image is built#196

Closed
lisac wants to merge 52 commits intomainfrom
lisac/ci-vulnerability-scans-build-image-once
Closed

Update CI vulnerability workflow to reduce how often the docker image is built#196
lisac wants to merge 52 commits intomainfrom
lisac/ci-vulnerability-scans-build-image-once

Conversation

@lisac
Copy link
Copy Markdown

@lisac lisac commented Apr 15, 2025

Ticket

n/a. Implements a proposed improvement documented in workflow vulnerability-scans.yml, so that we avoid multiple jobs within the workflow building the app docker image.

Changes

Updates the vulnerability-scans.yml workflow so that the a docker image of the app is built once and cached for use as needed by the jobs for each of the vulnerability scans configured in that workflow, rather than having each of those jobs build the app docker image.

This build and cache logic is in a new composite action: actions/build-release-candidate.

This composite workflow is additionally applied to the build-and-publish.yml workflow. Whereas before, this workflow would run make release-build, it now calls the new composite action.

This PR is inspired by an implementation authored by @daphnegold . navapbc/template-infra#936 describes another element of their implementation that might be of interest for template-infra.

Context for reviewers

I am very regretful about not starting with a tech spec for this work!
Lessons were learned.

Testing

[TODO]

Preview environment for app

♻️ Environment destroyed ♻️

Preview environment for app-rails

♻️ Environment destroyed ♻️

@lisac lisac self-assigned this Apr 15, 2025
the underlying repo redirects to 'wardencommunity/warden' and is not on rubydoc.info
@lisac lisac changed the title [draft] Update CI vulnerability workflow to reduce how often the docker image is built Update CI vulnerability workflow to reduce how often the docker image is built Apr 16, 2025
Copy link
Copy Markdown
Collaborator

@lorenyu lorenyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This will be a great improvement. I have some design feedback. I think we should explore making this a composite action instead of a job, and clean up the ordering of the steps to be more streamlined (check for the image first, then restore the buildx layers). Also found a bug where the buildx layers aren't caching properly, (you need to modify the release-build make command to support OPTS)

Comment on lines +84 to +86
OPTIONAL_BUILD_FLAGS=" \
--cache-from=type=local,src=/tmp/.buildx-cache \
--cache-to=type=local,dest=/tmp/.buildx-cache"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't do anything, OPTIONAL_BUILD_FLAGS is not an option in make release-build which is causing the layers not to be caching properly

image

Once this is fixed you should test by triggering multiple builds (with different commit hashes) (for example by modifying the example app or adding a command to the Dockerfile) and showing that subsequent builds are much faster since most of the layers are already cached.

lisac added a commit to navapbc/template-application-rails that referenced this pull request Apr 17, 2025
## Ticket

n/a

## Changes

Updates a documentation link that no longer resolves.
https://www.rubydoc.info/github/hassox/warden gives a 404. The
underlying https://github.com/hassox/warden redirects to
https://github.com/wardencommunity/warden. Updated the documentation to
use the latter's wiki (https://github.com/wardencommunity/warden/wiki)


## Testing

navapbc/platform-test#196 (shows the CI check
that flagged the broken link)
Co-authored-by: Loren Yu <loren@navapbc.com>
@lisac
Copy link
Copy Markdown
Author

lisac commented Apr 17, 2025

@lorenyu: your notes here make sense to me. Thanks for taking the time to walk through this.
I'm not confident about having revisions reviewable this week - next week more likely - will re-request a review when I have those changes in. Thanks again!

lisac added 11 commits April 17, 2025 12:02
and adjust the inputs for the various cache actions: 1) i don't think we want to rely on restore-keys, and 2) we re-use the docker image name as the cache key (maybe?)
why: the dockle scan should suffice for testing the build job that's the subject of this branch
… to building and caching a docker image of the app.

(running into issues with the buildx caching)
…he docker image"

technically, it is an option to delay checking out the repo, however we'd need to duplicate the logic from the Makefile around the expected image name.
i think not worth it.

This reverts commit 95a8a18.
@lisac lisac marked this pull request as draft April 21, 2025 19:18
lisac added 2 commits May 2, 2025 19:09
needed in order to configure aws credentials
@lisac
Copy link
Copy Markdown
Author

lisac commented May 2, 2025

status: making progress, ran into what i think is a fixable bug and will revisit next week.

  • good: moved the build logic into a composite action used by build-and-publish and vulnerability-scans; observed desired concurrency behavior, to avoid multiple workflows executing the build action concurrently
  • bug: the cache lookup is not behaving as i expected. observed: the two workflows (build-and-publish and vulnerability-scans) apparently used different SHAs in their cache keys, whereas i expected the cache keys to be identical. they keys they used:
    • platform-test-app:1a6cc698c726109c799bfca714a2a48496e76cd3
    • platform-test-app:ba457e9ca8cc127a1361b1fc72e4cf56f9c6105f

next steps:

  • fix the bug and remove debugging
  • update the Issue's description
  • re-request PR reviews

lisac added 4 commits May 4, 2025 16:25
in the step before calling the build-release-candidate action
the build triggered through the vulnerability workflow is still coming up with a different SHA from the build triggered through build-and-publish.
Copy link
Copy Markdown
Collaborator

@lorenyu lorenyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize you're still working on this, but i saw you left a comment so i decided to poke and left some comments. feel free to ignore if they are a distraction but hopefully they help

echo "Is image published: $is_image_published"
echo "is_image_published=$is_image_published" >> "$GITHUB_OUTPUT"

- name: Build release
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the reason for making this a separate job?

name: Check whether the image is already published
runs-on: ubuntu-latest
needs: get-commit-hash
concurrency: build-and-publish-${{ inputs.app_name }}-${{ needs.get-commit-hash.outputs.commit_hash }}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this concurrency is important to keep. the concurrency statement later on that uses github.ref won't work since different github.refs can refer to the same commit hash (e.g. main, origin/main, HEAD, , ) can all be valid refs that point to the same commit hash, so there'd be a race condition where multiple jobs are trying to build the same commit hash but don't realize it since they are referencing the commit via different refs. that's why we have the separate job beforehand that gets the commit hash.

Comment on lines +107 to +116
- name: Restore cached Docker image
uses: actions/cache/restore@v4
with:
path: /tmp/docker-image.tar
key: ${{ steps.build-release-candidate.outputs.image_cache_key }}
fail-on-cache-miss: true

- name: Load cached Docker image
run: |
docker load < /tmp/docker-image.tar
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we put these steps into actions/build-release-candidate

uses: ./.github/workflows/vulnerability-scans.yml
with:
app_name: "app-rails"
ref: ${{ github.ref }}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this needed?

with:
path: /tmp/docker-image.tar
key: ${{ steps.create-image-identifier.outputs.image }}
lookup-only: true
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the reason for only doing a lookup? i think we'd want to download the image if it's already built

Comment on lines +43 to +48
- name: Cache Docker image
if: steps.check-image-already-exists.outputs.cache-hit != 'true'
uses: actions/cache/save@v4
with:
path: /tmp/docker-image.tar
key: ${{ steps.create-image-identifier.outputs.image }}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this step shouldn't be needed. cache should save automatically when the job completes

key: ${{ steps.create-image-identifier.outputs.image }}
lookup-only: true

- name: Build and tag Docker image for scanning
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily in scope for this PR if it is getting complex, but in an older version of this PR you had code that also cached the intermediate docker layers in /tmp/.buildx-cache which could dramatically speed up builds even when it's a cache miss since some of the intermediate layers will be cached

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. I removed that from my branch because I was overwhelmed by all the other issues I was running into. At this point - I want to get the PR closed! - I think I won’t attempt to get that feature working. Should I propose it as a new issue? or add it as a comment to navapbc/template-infra#206 ?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure go ahead and create a new issue and link it here

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lisac
Copy link
Copy Markdown
Author

lisac commented May 7, 2025

@lorenyu : I appreciate these notes - thank you. For what you’ve flagged, most of those were knowledge gaps on my part, or me thinking too narrowly around the vulnerability scans workflow. I’ll be following your notes to get this PR cleaned up.

@lorenyu
Copy link
Copy Markdown
Collaborator

lorenyu commented May 7, 2025

@lisac sounds good! happy to help if you run into any issues

lisac added 17 commits May 8, 2025 11:44
instead of having each of the caller workflows follow up with those steps
why: we're not gaining efficiency by running the lookup-only mode as its own step
…the commit hash

see #196 (comment)
multiple github.refs may evaluate to the same commit hash
@lisac
Copy link
Copy Markdown
Author

lisac commented May 14, 2025

Closing without merging. While I like this PR's change to the vulnerability workflow stylistically, it doesn't meaningfully improve the developer experience. Consider that the jobs in the vulnerability workflow run in parallel, thus whether each job builds the docker image before executing a particular vulnerability scan (version in main) versus introducing a job specific to building the image and having the other jobs wait for this single job (version in this PR), the runtime is roughly the same.

Let's first implement navapbc/template-infra#936, which is the more impactful aspect from Daphne's implementation. That will benefit the vulnerability workflow, and as a follow-on we could re-consider having the vulnerability workflow have a single job specific to building the image.

@lisac lisac closed this May 14, 2025
@lisac lisac deleted the lisac/ci-vulnerability-scans-build-image-once branch October 8, 2025 11:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants