
Conversation

@ThomasWaldmann
Member


Maybe this caused borgbackup#9245, not sure.

AI analysis:

### Analysis of Inode Collision and File Confusion in `borg mount`

Without the applied fixes, it was indeed possible for a `borg mount` of a repository to contain different files that appeared identical (showing the same metadata and content) even though they were distinct in the underlying archives.

### The Mechanism of Confusion

The issue stemmed from how `ItemCache` managed `write_offset`, the pointer used to allocate unique inodes for archive items:

1.  **Lazy Inode Allocation**: Inodes for archive items are generated during the processing of an archive's metadata. Each item is assigned an inode based on the current `write_offset`.
2.  **Deferred State Update**: Before the fix, `self.write_offset` was only updated at the very end of the `iter_archive_items` generator (see the sketch after this list).
3.  **Failure Scenario**: If processing an archive failed midway (e.g., due to a repository error or network issue), the items already processed were added to the FUSE directory structure (`self.contents`) with inodes based on the *old* `write_offset`. However, because the generator didn't complete, `self.cache.write_offset` remained unchanged.
4.  **Collision**: When the next archive was accessed, `ItemCache` would start allocating inodes from the same old `write_offset`. This resulted in the new archive's items receiving the same inode numbers as the partially loaded previous archive.
5.  **Cache Overwrite**: Since the FUSE layer uses a central `_inode_cache` and `ItemCache` stores metadata in a flat `meta` array indexed by the inode, the metadata for the new archive would overwrite the entries for the colliding inodes.
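
To make the failure mode concrete, here is a minimal sketch of the pre-fix pattern. The names (`ItemCache`, `iter_archive_items`, `write_offset`, `meta`) mirror the discussion above, but the bodies are simplified stand-ins, not Borg's actual implementation:

```python
class ItemCache:
    """Simplified stand-in for the real ItemCache (pre-fix pattern)."""

    def __init__(self):
        self.write_offset = 0
        self.meta = bytearray()

    def iter_archive_items(self, serialized_items):
        offset = self.write_offset
        for data in serialized_items:
            inode = offset            # inode derived from the running offset
            self.meta.extend(data)    # item metadata stored at that offset
            offset += len(data)
            yield inode
        # Pre-fix bug: this line is only reached if the loop completes.
        # A mid-iteration exception (e.g. a repository or network error)
        # leaves self.write_offset stale, so the next archive is handed
        # the same inode numbers again.
        self.write_offset = offset
```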

### User-Visible Impact

When a user attempted to access a file from the first (partially loaded) archive that suffered an inode collision:
- **Metadata Confusion**: The `stat` call would return metadata (permissions, timestamps, size) belonging to the file from the *second* archive.
- **Content Confusion**: Because the inode now pointed to the metadata of the second file, reading the first file would actually stream the data chunks associated with the second file.

### Conclusion

The fixes ensure that `self.write_offset` is updated immediately after every inode allocation. This guarantees that even if an archive fails to load completely, the inode range it "consumed" remains reserved, and subsequent archives will always receive fresh, unique inodes, preventing any metadata or content confusion.
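
Under the same simplified, hypothetical names, the fixed pattern looks roughly like this: the shared offset is advanced the moment an inode is handed out, so an archive that fails to load partway still keeps its inode range reserved:

```python
class ItemCache:
    """Simplified stand-in for the real ItemCache (fixed pattern)."""

    def __init__(self):
        self.write_offset = 0
        self.meta = bytearray()

    def iter_archive_items(self, serialized_items):
        for data in serialized_items:
            inode = self.write_offset
            # Advance the shared offset immediately, before yielding or doing
            # any further I/O. Even if the caller aborts after this item, the
            # range this archive consumed stays reserved.
            self.write_offset += len(data)
            self.meta.extend(data)
            yield inode
```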
@codecov

codecov bot commented Dec 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.59%. Comparing base (af232a5) to head (7321daa).
⚠️ Report is 1 commit behind head on 1.4-maint.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           1.4-maint    #9246   +/-   ##
==========================================
  Coverage      80.59%   80.59%
==========================================
  Files             38       38
  Lines          11256    11256
  Branches        1771     1771
==========================================
  Hits            9072     9072
  Misses          1615     1615
  Partials         569      569
```


There is actual concurrency when using `pyfuse3` in Borg, which necessitates thread safety mechanisms like the recently added `threading.Lock`.

Borg's `pyfuse3` implementation is built on top of the **`trio`** async framework. When Borg is started in `pyfuse3` mode, it calls `trio.run(llfuse.main)`.

1.  **Async/Await Model**: Unlike the classic `llfuse` (which Borg runs with `workers=1` to remain single-threaded), `pyfuse3` uses an asynchronous event loop. While it often runs on a single OS thread, `trio` allows multiple tasks to be "in-flight" simultaneously.
2.  **Context Switching**: When an operation (like reading metadata or data) hits an `await` point—such as during network I/O with a remote repository or disk I/O—`trio` can suspend that task and switch to another one.
3.  **Parallel Archive Loading**: In `borg mount` (when no specific archive is given), archives are loaded lazily. If a user or a process (like `find` or a file manager) triggers a `lookup` or `opendir` on multiple archive directories nearly simultaneously, multiple `check_pending_archive` calls can be active at once.
4.  **Race Conditions**: Because `iter_archive_items` is a generator that yields control back to the caller (which, in the `pyfuse3` case, happens within an async wrapper), and because it performs I/O via `get_many`, it creates windows where one archive's loading process can be suspended while another begins.

Even if running on a single OS thread, the interleaved execution of async tasks can lead to the same data corruption issues as multi-threading:
-   **Shared State**: Both tasks access the same `ItemCache` instance.
-   **Inode Collisions**: If two tasks read the `write_offset` before either has incremented it, they will assign the same inode numbers to different files and overwrite each other's metadata in the `meta` bytearray (demonstrated in the sketch after this list).
-   **Global Interpreter Lock (GIL)**: While the GIL protects Python's internal memory integrity, it does not prevent logical race conditions at the application level when tasks are interleaved.
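
This logical race can be reproduced with a few lines of `trio`. The names (`ItemCache`, `add_item`) are simplified stand-ins for illustration, and `trio.sleep(0)` merely simulates an await point such as repository I/O:

```python
import trio

class ItemCache:
    def __init__(self):
        self.write_offset = 0

    async def add_item(self, name):
        offset = self.write_offset      # read shared state
        await trio.sleep(0)             # await point: trio may switch tasks here
        self.write_offset = offset + 1  # write back, possibly clobbering another task
        return offset                   # the "inode" for this item

async def main():
    cache = ItemCache()
    inodes = {}

    async def load(name):
        inodes[name] = await cache.add_item(name)

    async with trio.open_nursery() as nursery:
        nursery.start_soon(load, "archive-1/file")
        nursery.start_soon(load, "archive-2/file")

    # Both tasks read write_offset == 0 before either wrote it back,
    # so two different files end up with the same "inode".
    print(inodes)  # {'archive-1/file': 0, 'archive-2/file': 0}

trio.run(main)
```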

The use of `threading.Lock` in `check_pending_archive` and the direct instance variable access in `ItemCache` ensure that even when `trio` switches between concurrent FUSE requests, the internal state remains consistent and inode assignment remains unique across all archives.
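
A minimal sketch of the locking idea, again with simplified names rather than Borg's actual code: the read-and-increment of `write_offset` happens atomically, with no await point inside the critical section, so neither trio task switches nor other threads can interleave with it:

```python
import threading

class ItemCache:
    def __init__(self):
        self.write_offset = 0
        self._lock = threading.Lock()

    def allocate_inode(self, size):
        # No await inside the critical section: trio cannot suspend the task
        # here, and the lock also excludes any real OS threads.
        with self._lock:
            inode = self.write_offset
            self.write_offset += size
        return inode
```
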
### Performance Comparison: `bytearray.extend()` vs. Concatenation

The efficiency difference between `bytearray.extend(bytes(N))` and `self.meta = self.meta + bytes(N)` stems from how Python manages memory and objects during these operations.

#### 1. In-Place Modification vs. Object Creation
- **`bytearray.extend()`**: This is an **in-place** operation. If the current memory block allocated for the `bytearray` has enough extra capacity (pre-allocated space), Python simply writes the new bytes into that space and updates the length. If it needs more space, it uses `realloc()`, which can often expand the existing memory block without moving the entire data set to a new location.
- **Concatenation (`+`)**: This creates a **completely new** `bytearray` object. It allocates a new memory block large enough to hold the sum of both parts, copies the contents of `self.meta`, copies the contents of `bytes(N)`, and then reassigns the variable `self.meta` to this new object (see the short demonstration after this list).
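
The difference in object identity is easy to observe directly: `extend()` grows the same object, while `+` builds and rebinds a new one:

```python
buf = bytearray(b"abc")
original_id = id(buf)

buf.extend(b"def")        # in-place: the same object grows
assert id(buf) == original_id

buf = buf + bytes(3)      # concatenation: a brand-new object is built and rebound
assert id(buf) != original_id
```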

#### 2. Computational Complexity
- **`bytearray.extend()`**: In the best case (when capacity exists), it is **O(K)**, where K is the number of bytes being added. In the worst case (reallocation), it is **O(N + K)**, but Python uses an over-allocation strategy (growth factor) that amortizes this cost, making it significantly faster on average.
- **Concatenation (`+`)**: It is always **O(N + K)** because it must copy the existing `N` bytes every single time. As the `bytearray` grows larger (e.g., millions of items in a backup), this leads to **O(N²)** total time complexity across multiple additions, as you are repeatedly copying an ever-growing buffer.

#### 3. Memory Pressure and Garbage Collection
- Concatenation briefly requires memory for **both** the old buffer and the new buffer simultaneously before the old one is garbage collected. This increases the peak memory usage of the process.
- `extend()` is more memory-efficient as it minimizes the need for multiple large allocations and relies on the underlying memory manager's ability to resize buffers efficiently.

In the context of `borg mount`, where `self.meta` can grow to be many megabytes or even gigabytes for very large repositories, using concatenation causes a noticeable slowdown as the number of archives or files increases, whereas `extend()` remains performant.
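
A rough micro-benchmark illustrates the asymptotic gap; the sizes below are arbitrary and the absolute timings machine-dependent:

```python
import time

N_APPENDS = 5_000
CHUNK = bytes(256)          # 256 bytes appended per "item"

def grow_with_extend():
    buf = bytearray()
    for _ in range(N_APPENDS):
        buf.extend(CHUNK)   # amortized O(len(CHUNK)) per append
    return buf

def grow_with_concat():
    buf = bytearray()
    for _ in range(N_APPENDS):
        buf = buf + CHUNK   # copies the whole buffer every time: O(len(buf))
    return buf

for fn in (grow_with_extend, grow_with_concat):
    start = time.perf_counter()
    fn()
    print(f"{fn.__name__}: {time.perf_counter() - start:.3f}s")
```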