
Add hardlink support #1954


Open: wants to merge 1 commit into main


ChengyuZhu6
Contributor

This PR proposes adding hardlink support to the cache to reduce memory usage, improve performance, and save disk space.

Fixes: #1953

Depends-on: #1948

@ChengyuZhu6 ChengyuZhu6 force-pushed the hardlink branch 3 times, most recently from ff5ecf3 to 3e921b6 on January 24, 2025 02:44
cache/cache.go Outdated
Comment on lines 313 to 316
if linkPath, exists := dc.hlManager.GetLink(key); exists {
if _, err := os.Stat(linkPath); err == nil {
Member

Why not just use wipfile?

Contributor Author

I'm not sure I fully understand your point. Are you asking why we need to check whether a file with the same content has already been hardlinked? If so: when a file with the same content already exists, there is no need to create a new cache file. We can hardlink to the existing one and use it directly, which avoids writing the data again and improves performance.
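A minimal sketch of the reuse-by-hardlink idea described above. It assumes an in-memory map from chunk digest to an existing cache file; `contentIndex`, `storeChunk` and the digest strings are hypothetical names for illustration, not the PR's actual API. If a recorded file with the same digest still exists, the new cache entry becomes a hardlink to it; otherwise the data is written normally and recorded.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// contentIndex maps a chunk digest to a cache file that already holds that
// content. Hypothetical: stands in for the PR's HardlinkManager.
type contentIndex map[string]string

// storeChunk places data for digest at target. If a live file with the same
// digest is already recorded, it hardlinks to that file instead of writing a
// second copy. It reports whether a hardlink was used.
func storeChunk(idx contentIndex, digest, target string, data []byte) (bool, error) {
	if src, ok := idx[digest]; ok {
		// Reuse the recorded file only if it still exists on disk.
		if _, err := os.Stat(src); err == nil {
			if err := os.Link(src, target); err != nil {
				return false, err
			}
			return true, nil
		}
		delete(idx, digest) // stale entry: fall through and write a fresh copy
	}
	if err := os.WriteFile(target, data, 0o600); err != nil {
		return false, err
	}
	idx[digest] = target
	return false, nil
}

// demo stores the same chunk under two paths and reports whether the second
// store used a hardlink and whether both paths name the same file.
func demo() (linked, same bool, err error) {
	dir, err := os.MkdirTemp("", "hl-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	idx := contentIndex{}
	a, b := filepath.Join(dir, "a"), filepath.Join(dir, "b")
	if _, err = storeChunk(idx, "sha256:abc", a, []byte("hello")); err != nil {
		return false, false, err
	}
	if linked, err = storeChunk(idx, "sha256:abc", b, []byte("hello")); err != nil {
		return false, false, err
	}
	fa, err := os.Stat(a)
	if err != nil {
		return false, false, err
	}
	fb, err := os.Stat(b)
	if err != nil {
		return false, false, err
	}
	return linked, os.SameFile(fa, fb), nil
}

func main() {
	linked, same, err := demo()
	fmt.Println(linked, same, err) // true true <nil>
}
```

Because both names refer to one inode, the kernel's page cache holds a single copy of the data, which is one way a memory saving can arise in addition to the disk saving.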

Contributor Author

It might help to refactor the CreateLink function so that it uses the wipfile function.

Contributor Author

If I have misunderstood anything, please feel free to correct me.

Contributor Author

Oh! Got your point! I'll refactor my code.
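For illustration only: if the wipfile path discussed above works by staging a file in a separate work-in-progress directory and then renaming it into place (an assumption, since that code is not shown here), a hardlink could be committed the same way. `commitLink` below is a hypothetical helper that links into a unique temporary name and then renames it over the final path, so the final entry appears atomically.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// commitLink makes dst a hardlink to src by staging the link under a unique
// name in wipDir and then renaming it into place. rename(2) replaces dst
// atomically, so readers never observe a partially created entry.
// wipDir, src and dst must all live on the same filesystem.
func commitLink(src, dst, wipDir string) error {
	// Reserve a unique name, then free it so os.Link can create it.
	tmp, err := os.CreateTemp(wipDir, filepath.Base(dst)+"-*")
	if err != nil {
		return err
	}
	name := tmp.Name()
	tmp.Close()
	if err := os.Remove(name); err != nil {
		return err
	}
	if err := os.Link(src, name); err != nil {
		return err
	}
	err = os.Rename(name, dst)
	// If dst was already a link to the same inode, rename(2) is a no-op and
	// leaves name behind; removing it here is harmless in every other case.
	os.Remove(name)
	return err
}

// demo commits a link and returns dst's content, whether src and dst share
// an inode, and how many entries remain in the wip directory.
func demo() (string, bool, int, error) {
	dir, err := os.MkdirTemp("", "wip-demo")
	if err != nil {
		return "", false, 0, err
	}
	defer os.RemoveAll(dir)
	wip := filepath.Join(dir, "wip")
	src, dst := filepath.Join(dir, "src"), filepath.Join(dir, "dst")
	if err := os.Mkdir(wip, 0o700); err != nil {
		return "", false, 0, err
	}
	if err := os.WriteFile(src, []byte("chunk"), 0o600); err != nil {
		return "", false, 0, err
	}
	if err := commitLink(src, dst, wip); err != nil {
		return "", false, 0, err
	}
	data, err := os.ReadFile(dst)
	if err != nil {
		return "", false, 0, err
	}
	si, err := os.Stat(src)
	if err != nil {
		return "", false, 0, err
	}
	di, err := os.Stat(dst)
	if err != nil {
		return "", false, 0, err
	}
	left, err := os.ReadDir(wip)
	return string(data), os.SameFile(si, di), len(left), err
}

func main() {
	fmt.Println(demo()) // chunk true 0 <nil>
}
```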

cache/cache.go Outdated
Comment on lines 233 to 234
if linkPath, exists := dc.hlManager.GetLink(key); exists {
if r, err := os.Open(linkPath); err == nil {
Member

What is the performance benefit compared to just doing os.Open(dc.cachePath(key))?

Contributor Author

The benefit isn't related to os.Open itself. Still, it's a good point, and I can reuse the dc.cachePath(key) function instead.

Contributor Author

Got your point. I'll refactor my code.
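To illustrate the reviewer's point: a hardlink is just another directory entry for the same inode, so once the link exists at a key's cache path, a plain `os.Open` on that path returns the shared content and no separate link lookup is needed. The two-character sharding in the hypothetical `cachePath` below is an assumption for the sketch, not taken from the PR.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// cachePath is a hypothetical stand-in for dc.cachePath(key): it shards
// entries into subdirectories named after the first two characters of key
// (so key must be at least two characters long).
func cachePath(root, key string) string {
	return filepath.Join(root, key[:2], key)
}

// readKey opens the entry for key directly. Whether that entry was written
// fresh or hardlinked from another entry makes no difference to the reader.
func readKey(root, key string) (string, error) {
	f, err := os.Open(cachePath(root, key))
	if err != nil {
		return "", err
	}
	defer f.Close()
	b, err := io.ReadAll(f)
	return string(b), err
}

// demo writes one entry, hardlinks a second key to it, and reads the second
// key back through the ordinary cache path.
func demo() (string, error) {
	root, err := os.MkdirTemp("", "cache-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)
	orig, dup := cachePath(root, "aa11"), cachePath(root, "bb22")
	for _, p := range []string{orig, dup} {
		if err := os.MkdirAll(filepath.Dir(p), 0o700); err != nil {
			return "", err
		}
	}
	if err := os.WriteFile(orig, []byte("shared chunk"), 0o600); err != nil {
		return "", err
	}
	// Deduplicate: "bb22" becomes a second name for the same inode.
	if err := os.Link(orig, dup); err != nil {
		return "", err
	}
	return readKey(root, "bb22")
}

func main() {
	fmt.Println(demo()) // shared chunk <nil>
}
```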

docs/hardlink.md Outdated

## Conclusion

Enabling hardlinking in `stargz-snapshotter` can significantly optimize storage and improve performance. By following the steps outlined in this guide, you can leverage this feature to enhance your containerized environments.
Member

numbers?

Contributor Author

Are you referring to specific numbers regarding performance or storage savings?

@ChengyuZhu6 ChengyuZhu6 force-pushed the hardlink branch 2 times, most recently from b879c95 to 457b45b on February 5, 2025 11:17
@ChengyuZhu6 ChengyuZhu6 requested a review from ktock February 5, 2025 11:18
@ChengyuZhu6 ChengyuZhu6 force-pushed the hardlink branch 2 times, most recently from 8829e66 to 92702af on February 10, 2025 06:20
@AkihiroSuda
Member

Needs rebase

@ChengyuZhu6
Contributor Author

> Needs rebase

Done.

@ChengyuZhu6
Contributor Author

Just retriggering CI; no code changes.

This commit adds a hardlink system for the Stargz Snapshotter cache to optimize
storage and improve performance. The system creates hardlinks between cache files
that hold identical content chunks, significantly reducing disk space usage in
environments with many containers using the same base layers.

Key changes:
- Add new HardlinkManager that tracks files by chunk digest
- Enable hardlinking between chunk files with same content
- Add configuration option `EnableHardlink` to control the feature
- Preserve file digest mapping across snapshotter restarts
- Add documentation on hardlink usage and configuration

The implementation includes:
- Chunk-level digest tracking for optimizing cache lookups
- Background persistence of hardlink mappings to survive restarts
- Automatic cleanup of unused digest mappings
- Test suite for hardlink functionality

Signed-off-by: ChengyuZhu6 <[email protected]>
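A rough sketch of the bookkeeping the commit message describes: a digest-to-file mapping, persistence across restarts, and cleanup of stale mappings. It is a simplified stand-in, not the PR's HardlinkManager: the `digestIndex` type, the JSON state file, and the file-existence check used for cleanup are all assumptions, and the `EnableHardlink` switch is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// digestIndex is a hypothetical, simplified stand-in for the PR's
// HardlinkManager: a concurrency-safe map from chunk digest to the cache
// file that currently holds that content.
type digestIndex struct {
	mu    sync.RWMutex
	paths map[string]string
}

// Add records path as holding the content for digest.
func (d *digestIndex) Add(digest, path string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.paths[digest] = path
}

// Prune drops entries whose file no longer exists and returns how many.
func (d *digestIndex) Prune() (n int) {
	d.mu.Lock()
	defer d.mu.Unlock()
	for dg, p := range d.paths {
		if _, err := os.Stat(p); err != nil {
			delete(d.paths, dg) // deleting during range is safe in Go
			n++
		}
	}
	return n
}

// Save writes the mapping as JSON via a temp file + rename (no torn writes).
func (d *digestIndex) Save(file string) error {
	d.mu.RLock()
	b, err := json.Marshal(d.paths)
	d.mu.RUnlock()
	if err != nil {
		return err
	}
	if err := os.WriteFile(file+".tmp", b, 0o600); err != nil {
		return err
	}
	return os.Rename(file+".tmp", file)
}

// Load replaces the in-memory mapping with the one stored in file.
func (d *digestIndex) Load(file string) error {
	b, err := os.ReadFile(file)
	if err != nil {
		return err
	}
	m := map[string]string{}
	if err := json.Unmarshal(b, &m); err != nil {
		return err
	}
	d.mu.Lock()
	defer d.mu.Unlock()
	d.paths = m
	return nil
}

// demo simulates a restart: save, reload into a fresh index, prune.
func demo() (pruned int, liveKept bool, err error) {
	dir, err := os.MkdirTemp("", "idx-demo")
	if err != nil {
		return 0, false, err
	}
	defer os.RemoveAll(dir)
	live, state := filepath.Join(dir, "live"), filepath.Join(dir, "index.json")
	if err := os.WriteFile(live, []byte("x"), 0o600); err != nil {
		return 0, false, err
	}
	idx := &digestIndex{paths: map[string]string{}}
	idx.Add("sha256:live", live)
	idx.Add("sha256:gone", filepath.Join(dir, "missing"))
	if err := idx.Save(state); err != nil {
		return 0, false, err
	}
	restored := &digestIndex{}
	if err := restored.Load(state); err != nil {
		return 0, false, err
	}
	pruned = restored.Prune()
	_, liveKept = restored.paths["sha256:live"]
	return pruned, liveKept, nil
}

func main() {
	fmt.Println(demo()) // 1 true <nil>
}
```

Writing to a temporary file and renaming it means a crash during `Save` leaves either the previous index or the new one on disk, never a truncated file.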
@ChengyuZhu6
Contributor Author

@ktock I ran experiments with several basic images: I converted them to the eStargz format and ran them in containers with a simple 'echo "hello"' command. These tests relied only on stargz's background fetching threads to pull image contents to the local machine. Measuring overall memory and disk usage, I observed that enabling hardlinks reduced both memory consumption and disk space requirements by 20-30%.
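The disk-usage half of such a comparison can be approximated by summing file sizes twice: once per directory entry (apparent) and once per inode (unique). Below is a hedged sketch for Unix systems; `usage` and the demo layout are hypothetical, and it does not cover memory, which would need a separate measurement.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"syscall"
)

// usage walks root and returns two totals for regular files:
// apparent counts every directory entry (so each hardlink is counted again),
// unique counts each (device, inode) pair once, the way du(1) treats hardlinks.
// Unix-only: relies on *syscall.Stat_t from FileInfo.Sys().
func usage(root string) (apparent, unique int64, err error) {
	type inodeKey struct{ dev, ino uint64 }
	seen := map[inodeKey]bool{}
	err = filepath.WalkDir(root, func(p string, e fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if !e.Type().IsRegular() {
			return nil
		}
		info, err := e.Info()
		if err != nil {
			return err
		}
		st, ok := info.Sys().(*syscall.Stat_t)
		if !ok {
			return fmt.Errorf("no inode information for %s", p)
		}
		apparent += info.Size()
		k := inodeKey{uint64(st.Dev), uint64(st.Ino)}
		if !seen[k] {
			seen[k] = true
			unique += info.Size()
		}
		return nil
	})
	return apparent, unique, err
}

// demo builds a tiny tree: a 1000-byte file, a hardlink to it, and an
// unrelated 500-byte file.
func demo() (int64, int64, error) {
	dir, err := os.MkdirTemp("", "du-demo")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)
	a := filepath.Join(dir, "a")
	if err := os.WriteFile(a, make([]byte, 1000), 0o600); err != nil {
		return 0, 0, err
	}
	if err := os.Link(a, filepath.Join(dir, "b")); err != nil {
		return 0, 0, err
	}
	if err := os.WriteFile(filepath.Join(dir, "c"), make([]byte, 500), 0o600); err != nil {
		return 0, 0, err
	}
	return usage(dir)
}

func main() {
	apparent, unique, err := demo()
	if err != nil {
		fmt.Println(err)
		return
	}
	saved := 100 * float64(apparent-unique) / float64(apparent)
	fmt.Printf("apparent=%d unique=%d saved=%.0f%%\n", apparent, unique, saved)
	// apparent=2500 unique=1500 saved=40%
}
```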


@ChengyuZhu6
Contributor Author

Across different versions of the same image, hardlinks can deduplicate nearly 50% of memory and disk usage. I verified this by testing several development versions of the Golang image.


@ChengyuZhu6
Contributor Author

kindly ping @ktock

Development

Successfully merging this pull request may close these issues.

Implement Hardlink Feature for Cache Optimization and Data Deduplication
3 participants