mount: fix write_offset, #9245 #9246
Open: ThomasWaldmann wants to merge 3 commits into borgbackup:1.4-maint from ThomasWaldmann:fuse-fixes-1.4
+54 −19
Conversation
Maybe this caused borgbackup#9245, not sure.

AI analysis:

### Analysis of Inode Collision and File Confusion in `borg mount`

Without the applied fixes, it was possible for a `borg mount` of a repository to contain different files that appeared identical (showing the same metadata and content) even though they were distinct in the underlying archives.

### The Mechanism of Confusion

The issue stemmed from how `ItemCache` managed `write_offset`, the pointer used to allocate unique inodes for archive items:

1. **Lazy Inode Allocation**: Inodes for archive items are generated while an archive's metadata is processed. Each item is assigned an inode based on the current `write_offset`.
2. **Deferred State Update**: Before the fix, `self.write_offset` was only updated at the very end of the `iter_archive_items` generator.
3. **Failure Scenario**: If processing an archive failed midway (e.g., due to a repository error or network issue), the items already processed were added to the FUSE directory structure (`self.contents`) with inodes based on the *old* `write_offset`. However, because the generator never completed, `self.cache.write_offset` remained unchanged.
4. **Collision**: When the next archive was accessed, `ItemCache` started allocating inodes from the same old `write_offset`, so the new archive's items received the same inode numbers as the partially loaded previous archive.
5. **Cache Overwrite**: Since the FUSE layer uses a central `_inode_cache` and `ItemCache` stores metadata in a flat `meta` array indexed by inode, the metadata for the new archive overwrote the entries for the colliding inodes.

### User-Visible Impact

When a user accessed a file from the first (partially loaded) archive that suffered an inode collision:

- **Metadata Confusion**: A `stat` call returned metadata (permissions, timestamps, size) belonging to the file from the *second* archive.
- **Content Confusion**: Because the inode now pointed to the metadata of the second file, reading the first file actually streamed the data chunks associated with the second file.

### Conclusion

The fixes ensure that `self.write_offset` is updated immediately after every inode allocation. This guarantees that even if an archive fails to load completely, the inode range it "consumed" remains reserved, and subsequent archives always receive fresh, unique inodes, preventing any metadata or content confusion.
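The failure scenario above can be sketched with a toy model. The class and method bodies here are hypothetical simplifications for illustration, not Borg's actual implementation; only the names `iter_archive_items` and `write_offset` come from the discussion:

```python
# Toy model of inode allocation from a shared write_offset.
class ItemCacheSketch:
    def __init__(self):
        self.write_offset = 0

    def iter_archive_items_buggy(self, items):
        offset = self.write_offset           # local copy of shared state
        for item in items:
            inode = offset
            offset += len(item)
            yield inode, item
        self.write_offset = offset           # BUG: only reached if the loop finishes

    def iter_archive_items_fixed(self, items):
        for item in items:
            inode = self.write_offset        # FIX: shared state advances per item,
            self.write_offset += len(item)   # so a partial load still reserves its range
            yield inode, item


cache = ItemCacheSketch()
gen = cache.iter_archive_items_buggy([b"a", b"b", b"c"])
next(gen)                                    # "archive 1" fails after the first item
assert cache.write_offset == 0               # offset was never advanced
gen2 = cache.iter_archive_items_buggy([b"x"])
inode, _ = next(gen2)
assert inode == 0                            # collides with archive 1's first inode
```

With the fixed variant, abandoning the first generator after one item still leaves `write_offset` advanced, so the next archive starts at a fresh inode.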
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

@@            Coverage Diff            @@
##           1.4-maint    #9246   +/-  ##
=========================================
  Coverage      80.59%   80.59%
=========================================
  Files             38       38
  Lines          11256    11256
  Branches        1771     1771
=========================================
  Hits            9072     9072
  Misses          1615     1615
  Partials         569      569

☔ View full report in Codecov by Sentry.
There is actual concurrency when using `pyfuse3` in Borg, which necessitates thread safety mechanisms like the recently added `threading.Lock`. Borg's `pyfuse3` implementation is built on top of the **`trio`** async framework. When Borg is started in `pyfuse3` mode, it calls `trio.run(llfuse.main)`.

1. **Async/Await Model**: Unlike the classic `llfuse` (which Borg runs with `workers=1` to remain single-threaded), `pyfuse3` uses an asynchronous event loop. While it often runs on a single OS thread, `trio` allows multiple tasks to be "in-flight" simultaneously.
2. **Context Switching**: When an operation (like reading metadata or data) hits an `await` point, such as during network I/O with a remote repository or disk I/O, `trio` can suspend that task and switch to another one.
3. **Parallel Archive Loading**: In `borg mount` (without a specific archive specified), archives are loaded lazily. If a user or a process (like `find` or a file manager) triggers a `lookup` or `opendir` on multiple archive directories nearly simultaneously, multiple `check_pending_archive` calls can be active at once.
4. **Race Conditions**: Because `iter_archive_items` is a generator that yields control back to the caller (which, in the `pyfuse3` case, happens within an async wrapper), and because it performs I/O via `get_many`, it creates windows where one archive's loading can be suspended while another begins.

Even when running on a single OS thread, the interleaved execution of async tasks can lead to the same data corruption issues as multi-threading:

- **Shared State**: Both tasks access the same `ItemCache` instance.
- **Inode Collisions**: If two tasks read `write_offset` before either has incremented it, they will assign the same inode numbers to different files and overwrite each other's metadata in the `meta` bytearray.
- **Global Interpreter Lock (GIL)**: While the GIL protects Python's internal memory integrity, it does not prevent logical race conditions at the application level when tasks are interleaved.

The use of `threading.Lock` in `check_pending_archive` and the direct instance variable access in `ItemCache` ensure that even when `trio` switches between concurrent FUSE requests, the internal state remains consistent and inode assignment remains unique across all archives.
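The lost-update hazard described above can be sketched with the stdlib `asyncio` event loop (Borg itself uses `trio` and a `threading.Lock`; the cooperative-scheduling hazard is the same). All class and function names below are illustrative, not Borg's code:

```python
import asyncio

class CacheSketch:
    """Toy stand-in for shared ItemCache state."""
    def __init__(self):
        self.write_offset = 0
        self.lock = asyncio.Lock()

async def alloc_racy(cache, n):
    inodes = []
    for _ in range(n):
        inode = cache.write_offset          # read shared state...
        await asyncio.sleep(0)              # ...suspend at an "I/O" await point...
        cache.write_offset = inode + 1      # ...write it back: lost update
        inodes.append(inode)
    return inodes

async def alloc_locked(cache, n):
    inodes = []
    for _ in range(n):
        async with cache.lock:              # read-modify-write now atomic w.r.t. other tasks
            inode = cache.write_offset
            await asyncio.sleep(0)
            cache.write_offset = inode + 1
        inodes.append(inode)
    return inodes

async def main():
    racy = CacheSketch()
    a, b = await asyncio.gather(alloc_racy(racy, 3), alloc_racy(racy, 3))
    safe = CacheSketch()
    c, d = await asyncio.gather(alloc_locked(safe, 3), alloc_locked(safe, 3))
    return set(a) & set(b), set(c) & set(d)

racy_dups, locked_dups = asyncio.run(main())
```

Without the lock, both tasks read the same `write_offset` before either writes it back, so `racy_dups` is non-empty (duplicate inodes); with the lock, `locked_dups` is empty.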
### Performance Comparison: `bytearray.extend()` vs. Concatenation

The efficiency difference between `bytearray.extend(bytes(N))` and `self.meta = self.meta + bytes(N)` stems from how Python manages memory and objects during these operations.

#### 1. In-Place Modification vs. Object Creation

- **`bytearray.extend()`**: This is an **in-place** operation. If the memory block currently allocated for the `bytearray` has enough spare capacity (pre-allocated space), Python simply writes the new bytes into that space and updates the length. If it needs more space, it uses `realloc()`, which can often expand the existing memory block without moving the entire data set to a new location.
- **Concatenation (`+`)**: This creates a **completely new** `bytearray` object. It allocates a new memory block large enough to hold both parts, copies the contents of `self.meta`, copies the contents of `bytes(N)`, and then rebinds `self.meta` to the new object.

#### 2. Computational Complexity

- **`bytearray.extend()`**: In the best case (when capacity exists), it is **O(K)**, where K is the number of bytes being added. In the worst case (reallocation), it is **O(N + K)**, but Python's over-allocation strategy (growth factor) amortizes this cost, making it significantly faster on average.
- **Concatenation (`+`)**: It is always **O(N + K)** because it must copy the existing N bytes every time. As the `bytearray` grows (e.g., millions of items in a backup), this leads to **O(N²)** total time across repeated additions, since an ever-growing buffer is copied again and again.

#### 3. Memory Pressure and Garbage Collection

- Concatenation briefly requires memory for **both** the old buffer and the new buffer simultaneously before the old one is garbage collected, increasing the peak memory usage of the process.
- `extend()` is more memory-efficient, as it minimizes large allocations and relies on the underlying memory manager's ability to resize buffers efficiently.

In the context of `borg mount`, where `self.meta` can grow to many megabytes or even gigabytes for very large repositories, concatenation causes a noticeable slowdown as the number of archives or files increases, whereas `extend()` remains performant.
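The complexity difference can be demonstrated with a small, self-contained timing sketch (step counts and chunk sizes are chosen purely for illustration):

```python
import time

def grow_extend(steps, chunk):
    buf = bytearray()
    for _ in range(steps):
        buf.extend(bytes(chunk))   # in-place, amortized via over-allocation
    return buf

def grow_concat(steps, chunk):
    buf = bytearray()
    for _ in range(steps):
        buf = buf + bytes(chunk)   # new object: copies the whole buffer each step
    return buf

t0 = time.perf_counter()
a = grow_extend(3000, 100)
t_extend = time.perf_counter() - t0

t0 = time.perf_counter()
b = grow_concat(3000, 100)
t_concat = time.perf_counter() - t0

assert a == b                      # identical contents either way
```

On a typical machine the concatenation loop is orders of magnitude slower, since it copies roughly `steps²/2 · chunk` bytes in total versus about `steps · chunk` for `extend()`.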