
build.yml: fix eve job cache handling #220

Open
europaul wants to merge 2 commits into master from rene/fix-ci-build-cache

@europaul (Collaborator)

The eve job was rebuilding arm64 packages from scratch instead of using the ones already built by the packages job. Investigating the root cause revealed several interrelated issues.

  1. Redundant 'pkgs' target in the eve build command

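    The eve job ran 'make pkgs eve', but the packages job already builds and caches all packages. Because the eve job restores that cache first, the 'pkgs' target should be a no-op, and in practice it only hid the cache misses described in item 2. It has been removed, so the job now runs 'make eve'.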

  2. arm64 packages were never restored from cache

    The cache restore logic had a conditional: if the runner arch matched the matrix arch, it skipped both clearing the linuxkit cache and restoring the target arch cache. The assumption was that the first cache restore (for tool images) already had the right packages. But that first restore always fetched the amd64 generic cache, even on arm64 runners. So arm64 jobs were left with amd64 packages in the cache, and 'make pkgs' (item 1) was silently rebuilding everything for arm64.

  3. Tool images were hardcoded to amd64

    The cache key for loading tool images (mkconf, mkimage-raw-efi, mkrootfs-squash, etc.) into docker was hardcoded to amd64. On arm64 runners this is wrong: they need arm64 tool images. For native builds the target cache already contains these tools, so they are now loaded directly from it. The two-cache dance (load tools from one arch, then restore packages from another) is only needed for riscv64 cross-builds on amd64.

  4. The 'rt' platform maps to generic packages

    No build-rt.yml files exist anywhere in pkg/, so PLATFORM=rt produces the same packages as PLATFORM=generic. Rather than adding a redundant amd64/rt entry to the packages matrix, the cache key maps 'rt' to 'generic'.
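
To make items 2 to 4 concrete, here is a small model of which package cache the eve job ends up with before and after the change. It is only a sketch under assumptions: the key format `pkgs-<arch>-<platform>` and the function names are hypothetical and are not taken from build.yml.

```python
# Hypothetical model of which package cache the eve job ends up with.
# Assumption: cache keys look like "pkgs-<arch>-<platform>"; the real
# keys in build.yml may be spelled differently.


def old_effective_cache(runner_arch: str, target_arch: str, platform: str) -> str:
    # Before the fix, the first restore (meant for tool images) was
    # hardcoded to the amd64 generic cache on every runner.
    key = "pkgs-amd64-generic"
    if runner_arch != target_arch:
        # Only when runner and matrix arch differed was the linuxkit
        # cache cleared and the target-arch cache restored.
        key = f"pkgs-{target_arch}-{platform}"
    # A native arm64 job therefore keeps the amd64 packages.
    return key


def normalize_platform(platform: str) -> str:
    # No build-rt.yml exists under pkg/, so "rt" builds the same packages as "generic".
    return "generic" if platform == "rt" else platform


def new_effective_cache(target_arch: str, platform: str) -> str:
    # After the fix, the job always ends up with the target-arch cache.
    return f"pkgs-{target_arch}-{normalize_platform(platform)}"


if __name__ == "__main__":
    for runner, target, platform in [
        ("amd64", "amd64", "generic"),
        ("amd64", "amd64", "rt"),
        ("arm64", "arm64", "generic"),
        ("amd64", "riscv64", "generic"),
    ]:
        print(f"{target}/{platform}: "
              f"old={old_effective_cache(runner, target, platform)} "
              f"new={new_effective_cache(target, platform)}")
```

In this model the native arm64 row resolves to `pkgs-amd64-generic` under the old logic and to `pkgs-arm64-generic` under the new one, which mirrors why 'make pkgs' used to rebuild everything on arm64.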

The fix simplifies the eve job's cache handling:

  • Native builds (amd64, arm64): restore target cache, load tools, build
  • Cross-builds (riscv64 on amd64): restore amd64 cache, load tools, clear the linuxkit cache, restore riscv64 cache, build
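
The two restore sequences listed above can be sketched as a single function. This is a hypothetical illustration: `restore_plan`, the step strings and the `pkgs-<arch>-<platform>` key format are assumptions, not the actual step names or keys in build.yml.

```python
from typing import List

# Hypothetical model of the simplified eve-job cache handling.
# Assumptions: the "pkgs-<arch>-<platform>" key format and the step
# strings are illustrative only; they are not build.yml step names.


def normalize_platform(platform: str) -> str:
    # "rt" shares the generic packages (no build-rt.yml under pkg/).
    return "generic" if platform == "rt" else platform


def restore_plan(runner_arch: str, target_arch: str, platform: str) -> List[str]:
    target_key = f"pkgs-{target_arch}-{normalize_platform(platform)}"
    if runner_arch == target_arch:
        # Native build (amd64, arm64): the target cache already contains
        # the tool images, so one restore is enough.
        return [f"restore {target_key}", "load tools", "build"]
    if runner_arch != "amd64":
        # Per the description, the only cross-build is riscv64 on amd64.
        raise ValueError(f"unexpected cross-build on {runner_arch} runner")
    # Cross-build: the tool images must match the amd64 runner, so load
    # them from the amd64 cache, then swap in the target-arch packages.
    return [
        "restore pkgs-amd64-generic",
        "load tools",
        "clear linuxkit cache",
        f"restore {target_key}",
        "build",
    ]


if __name__ == "__main__":
    for runner, target in [("amd64", "amd64"), ("arm64", "arm64"), ("amd64", "riscv64")]:
        print(f"{runner} -> {target}: {restore_plan(runner, target, 'generic')}")
```

Keeping the native path to a single restore is what makes a separate "is the runner arch the matrix arch" step unnecessary in this model: the only branch left is the cross-build case.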

The "Arch Runner is Matrix" step is removed as it is no longer used.



Signed-off-by: Paul Gaiduk <paulg@zededa.com>

just to trigger build.yml

Signed-off-by: Paul Gaiduk <paulg@zededa.com>