Skip to content

Don't use delta for tiny files#70

Merged
djach7 merged 1 commit intocontainers:mainfrom
alexlarsson:dont-diff-tiny-files
Apr 9, 2026
Merged

Don't use delta for tiny files#70
djach7 merged 1 commit intocontainers:mainfrom
alexlarsson:dont-diff-tiny-files

Conversation

@alexlarsson
Copy link
Copy Markdown
Collaborator

Files that are smaller than the pathname of the delta source (and some extra header bytes) now are copied directly to the delta rather than being delta:ed, because just the delta header would already be larger than the file content.

This happens a lot in e.g. bootc images, because there are lots of tiny ostree metadata objects that have long pathnames, like the *.file-xattrs-link files that are 46 bytes, but have 110 char long pathnames.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.59%. Comparing base (13704c8) to head (00093ac).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main      #70       +/-   ##
===========================================
- Coverage   59.51%   40.59%   -18.92%     
===========================================
  Files          10       10               
  Lines        1035     1037        +2     
===========================================
- Hits          616      421      -195     
- Misses        305      545      +240     
+ Partials      114       71       -43     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a check to skip delta generation for files smaller than the source path plus a small overhead, preventing inefficient delta sizes. A suggestion was made to replace the magic number used for the overhead calculation with a named constant to improve code clarity.

Copy link
Copy Markdown
Collaborator

@djach7 djach7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Collaborator

@rmiki-dev rmiki-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! codecov/project is failing, but once the branch gets rebased and pushed, I should be able to merge

@alexlarsson alexlarsson force-pushed the dont-diff-tiny-files branch 2 times, most recently from 403962b to 14dd19b Compare April 7, 2026 06:52
@alexlarsson
Copy link
Copy Markdown
Collaborator Author

I rebased this.

@alexlarsson
Copy link
Copy Markdown
Collaborator Author

It seems the code coverage test things the code coverage decreased by 19%, but i'm only adding three lines. Something must be going wrong with it.

@kgiusti
Copy link
Copy Markdown
Collaborator

kgiusti commented Apr 7, 2026

I think I can explain the drop in codecov - the proposed patch contains an optimization that's really very good. Turns out our test coverage now hits the optimization so we're not executing the old codepaths. Good for the tool, but makes codecov sad.

There's nothing wrong with this patch. But we do need to update the test cases to use data that doesn't trigger the optimization.

Here's a quick compare of main branch coverage vs the PR:

on main:

go tool covdata percent -i=/home/kgiusti/work/containers/tar-diff/test/coverage -o=/home/kgiusti/work/containers/tar-diff/test/coverage/integration.out
	github.com/containers/tar-diff/cmd/tar-diff		coverage: 62.7% of statements
	github.com/containers/tar-diff/cmd/tar-patch		coverage: 60.7% of statements
	github.com/containers/tar-diff/pkg/protocol		coverage: 75.0% of statements
	github.com/containers/tar-diff/pkg/tar-diff		coverage: 67.0% of statements
	github.com/containers/tar-diff/pkg/tar-patch		coverage: 67.1% of statements

from pr:

go tool covdata percent -i=/home/kgiusti/work/containers/tar-diff/test/coverage -o=/home/kgiusti/work/containers/tar-diff/test/coverage/integration.out
	github.com/containers/tar-diff/cmd/tar-diff		coverage: 62.7% of statements
	github.com/containers/tar-diff/cmd/tar-patch		coverage: 60.7% of statements
	github.com/containers/tar-diff/pkg/protocol		coverage: 75.0% of statements
	github.com/containers/tar-diff/pkg/tar-diff		coverage: 43.1% of statements
	github.com/containers/tar-diff/pkg/tar-patch		coverage: 24.4% of statements

@djach7
Copy link
Copy Markdown
Collaborator

djach7 commented Apr 7, 2026

I just created an issue to update the tests (#76). This PR is good to merge imo, I think the commits just need to be re-signed.

Files that are smaller than the pathname of the delta source (and some
extra header bytes) now are copied directly to the delta rather than
being delta:ed, because just the delta header would already be larger
than the file content.

This happens a lot in e.g. bootc images, because there are lots of
tiny ostree metadata objects that have long pathnames, like the
*.file-xattrs-link files that are 46 bytes, but have 110 char long
pathnames.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
@alexlarsson alexlarsson force-pushed the dont-diff-tiny-files branch from 14dd19b to 00093ac Compare April 8, 2026 12:00
@alexlarsson
Copy link
Copy Markdown
Collaborator Author

I signed the commit

@djach7 djach7 merged commit f472e63 into containers:main Apr 9, 2026
12 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants