Description
I propose adding hardlink support to the caching mechanism to reduce memory usage, improve performance, and save disk space.
Background
The current caching system stores files in memory, which can lead to high memory usage, especially when dealing with large datasets. By utilizing hardlinks, we can reduce memory consumption and storage redundancy by allowing multiple references to the same file on disk without duplicating the file content.
Design
Key Components
- HardlinkManager: Manages the creation, validation, and persistence of hardlinks.
  - CreateLink: Attempts to create a hardlink for a given cache key.
  - HasHardlink: Checks if a hardlink exists for a given key.
  - Persist and Restore: Manages the persistence of hardlink metadata to disk and restores it on startup.
- DirectoryCache: Implements the cache logic, including hardlink support.
  - CreateHardlink: Invokes the HardlinkManager to create a hardlink.
  - HasHardlink: Checks for the existence of a hardlink using the HardlinkManager.
- Configuration: The EnableHardlink flag in the configuration determines whether hardlinking is enabled.
Work Flow
    [Start]
       |
       v
    [Initialize Cache]
       |
       v
    [Check if Hardlinking is Enabled]
       |
       v
    [Access Cached File]
       |
       v
    [Check if Hardlink Exists] -- No --> [Create Hardlink]
       |                                         |
      Yes                                        v
       |                                 [Verify Hardlink]
       v                                         |
    [Use Hardlink]                               v
       |                            [Rename to Final Location]
       v                                         |
    [Persist Hardlink State] <-------------------+
       |
       v
    [Restore Hardlink State on Startup]
       |
       v
    [End]
- Cache Write:
  - When a file is added to the cache, the system checks if hardlinking is enabled.
  - If enabled, it attempts to create a hardlink for the cached file.
- Cache Read:
  - When accessing a cached file, the system checks if a hardlink exists.
  - If a hardlink exists, it uses the hardlink path to access the file.
- Persistence:
  - Hardlink metadata is periodically persisted to disk.
  - On startup, the system restores hardlink metadata from disk.
Benefits
- Reduced Memory Usage: Cached files are referenced on disk through hardlinks instead of being held in memory, which decreases the memory footprint of the caching system.
- Improved Performance: Creating a hardlink is a metadata-only operation, so sharing a cached file avoids the overhead of duplicating its data.
- Data Deduplication: Hardlinks inherently support data deduplication by allowing multiple cache entries to reference the same physical file, reducing storage redundancy.
- Scalability: This feature will enable the caching system to handle larger datasets more efficiently.