
Not failing over to other mirrors after zchunk checksum cache is poisoned #372

@dxdxdt


When a repo is configured with metalink=, and the files listed in repomd.xml are fetched from a mirror but fail the checksum test against the hashes in repomd.xml, dnf5 does not fail over to other mirrors. A possible explanation is that the bad checksums calculated from the first mirror's files are cached, so the checksums of the files from another mirror are never recalculated.
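To make the suspected failure mode concrete, here is a minimal Python sketch of a checksum cache keyed only by file name. It is an illustration of the hypothesis, not dnf5 or librepo code: the class `NameKeyedChecksumCache`, the payloads, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib


class NameKeyedChecksumCache:
    """Hypothetical cache keyed only by file name (the suspected flaw)."""

    def __init__(self):
        self._digests = {}

    def digest(self, file_name: str, data: bytes) -> str:
        # The first digest computed for a name is reused for every later
        # copy of that file, regardless of which mirror it came from.
        if file_name not in self._digests:
            self._digests[file_name] = hashlib.sha256(data).hexdigest()
        return self._digests[file_name]


# Made-up payloads standing in for the same file served by two mirrors.
good = b"intact primary.xml.zck payload"
bad = b"corrupted primary.xml.zck payload"
expected = hashlib.sha256(good).hexdigest()  # stands in for the repomd.xml hash

cache = NameKeyedChecksumCache()
first_ok = cache.digest("primary.xml.zck", bad) == expected    # bad mirror
second_ok = cache.digest("primary.xml.zck", good) == expected  # healthy mirror, stale digest
fresh_ok = hashlib.sha256(good).hexdigest() == expected        # digest recomputed

print(first_ok, second_ok, fresh_ok)  # False False True
```

If something like this happens, the healthy mirror is rejected for the same reason as the bad one, which would match the two identical "zchunk checksums doesn't match" lines in the log below.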

 > [10/10] RUN make test-failover-checksum-ok:
0.210 dnf clean all
0.242 Removed 6 files, 3 directories (total of 50 MiB). 0 errors occurred.
0.244 dnf makecache --repo=test-bad-meta-checksum
0.267 Updating and loading repositories:
0.545  test-bad-meta-checksum                 100% |  96.4 GiB/s |  26.5 GiB |  00m00s
0.545 >>> At least one of the zchunk checksums doesn't match in file:///opt/dnf-bad-meta-checksum/checksum/repodata/e1a77f25f5b991a4527dee93a73015795bdd74e024703cee79e6f789dc9da9fc-primary.xml.zck - file:///opt/dnf-bad-meta-checksum/checksum/repodata/e1a77f25f5b991a4527dee93a73015795bdd74e024703cee79e6f789dc9da9fc-primary.xml.zck
0.545 >>> At least one of the zchunk checksums doesn't match in file:///opt/dnf-bad-meta-checksum/ok/repodata/e1a77f25f5b991a4527dee93a73015795bdd74e024703cee79e6f789dc9da9fc-primary.xml.zck - file:///opt/dnf-bad-meta-checksum/ok/repodata/e1a77f25f5b991a4527dee93a73015795bdd74e024703cee79e6f789dc9da9fc-primary.xml.zck
0.545 >>> No more mirrors to try - All mirrors were already tried without success
0.546 Failed to download metadata (metalink: "file:///opt/dnf-bad-meta-checksum/test-metalink.checksum-ok.xml") for repository "test-bad-meta-checksum": Cannot download, all mirrors were already tried without success
0.552 make: *** [Makefile:32: test-failover-checksum-ok] Error 1
------
Dockerfile:12
--------------------
  10 |     RUN make test-ok
  11 |     RUN make test-failover-404-ok
  12 | >>> RUN make test-failover-checksum-ok

The test case I wrote tests whether dnf successfully fails over to a healthy mirror after first attempting to cache metadata from a bad mirror that serves corrupted repodata files. The prior sanity checks include:

  1. trying one healthy mirror
  2. trying a mirror that 404's, then trying a healthy one
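The expected outcome of the sanity checks and the failing test can be modelled as a small table of scenarios. The following Python sketch is only a model under stated assumptions: `try_mirrors` is a hypothetical helper, a mirror is represented by the bytes it serves (`None` stands for a 404), and the scenario names are borrowed from the Makefile targets visible in the log, with the mapping to each scenario inferred from the target names.

```python
import hashlib


def try_mirrors(mirrors, expected_sha256):
    """Return the index of the first mirror whose payload verifies, else None."""
    for index, payload in enumerate(mirrors):
        if payload is None:  # models a mirror that answers 404
            continue
        # The digest is recomputed for every mirror; nothing is cached.
        if hashlib.sha256(payload).hexdigest() == expected_sha256:
            return index
    return None


GOOD = b"intact repodata"    # made-up payload from a healthy mirror
BAD = b"corrupted repodata"  # made-up payload from a bad mirror
EXPECTED = hashlib.sha256(GOOD).hexdigest()

# Mirror order per scenario; the names follow the Makefile targets in the log.
scenarios = {
    "test-ok": [GOOD],
    "test-failover-404-ok": [None, GOOD],
    "test-failover-checksum-ok": [BAD, GOOD],
}

results = {name: try_mirrors(mirrors, EXPECTED) for name, mirrors in scenarios.items()}
print(results)
```

In this model all three scenarios end on a healthy mirror, which is the behaviour the third test expects from dnf5 and does not get.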

I'm still investigating to identify the culprit myself.

Found whilst investigating the following incident: https://discussion.fedoraproject.org/t/quality-control-on-bad-public-mirror-operators/185687

Labels: Priority: MEDIUM, Triaged (someone on the DNF 5 team has read the issue and determined the next steps to take)
