Add hardlink support #1954
ff5ecf3 to 3e921b6
cache/cache.go
Outdated
if linkPath, exists := dc.hlManager.GetLink(key); exists {
if _, err := os.Stat(linkPath); err == nil {
Why not just use wipfile?
I’m not sure I fully understand your point. Are you asking, 'Why do we need to check if a file with the same content has been hardlinked?' If that’s the case, let me explain: If a file has the same content, we wouldn’t need to create a new cache file; instead, we could simply hardlink to the existing one and use it. This approach would be beneficial for performance.
I think it would be useful to try refactoring the CreateLink function to use the wipfile function.
If I have misunderstood anything, please feel free to correct me.
Oh! Got your point! I'll refactor my code.
cache/cache.go
Outdated
if linkPath, exists := dc.hlManager.GetLink(key); exists {
if r, err := os.Open(linkPath); err == nil {
What is the performance benefit compared to just doing os.Open(dc.cachePath(key))?
That's not related to the os.Open function. However, I think it's a good point, and I can reuse the dc.cachePath(key) function.
Got your point. I'll refactor my code.
docs/hardlink.md
Outdated
## Conclusion

Enabling hardlinking in `stargz-snapshotter` can significantly optimize storage and improve performance. By following the steps outlined in this guide, you can leverage this feature to enhance your containerized environments.
numbers?
Are you referring to specific numbers regarding performance or storage savings?
b879c95 to 457b45b
8829e66 to 92702af
Needs rebase
92702af to 1d5fd08
Done.
1d5fd08 to 8694c9a
Just retriggering CI; no code change.
This commit adds a hardlink system for the Stargz Snapshotter cache to optimize storage and improve performance. The system intelligently creates hardlinks between identical content chunks, significantly reducing disk space usage in environments with many containers using the same base layers.

Key changes:
- Add new HardlinkManager that tracks files by chunk digest
- Enable hardlinking between chunk files with same content
- Add configuration option `EnableHardlink` to control the feature
- Preserve file digest mapping across snapshotter restarts
- Add documentation on hardlink usage and configuration

The implementation includes:
- Chunk-level digest tracking for optimizing cache lookups
- Background persistence of hardlink mappings to survive restarts
- Automatic cleanup of unused digest mappings
- Test suite for hardlink functionality

Signed-off-by: ChengyuZhu6 <[email protected]>
@ktock I conducted experiments with several basic images, converting them to the estargz format and running them in containers with a simple 'echo "hello"' command. These tests used only background threads of stargz to pull images to the local machine. By measuring the overall memory and disk usage, I observed that implementing hardlinks resulted in a 20-30% reduction in both memory consumption and disk space requirements.
Kindly ping @ktock
Propose the implementation of a hardlink feature in the caching mechanism to optimize memory usage, improve performance and save disk space.
Fixes: #1953
Depends-on: #1948