Extended network outage causes FUSE operations to block indefinitely — no fail-fast or abort #472

@arnobeutch

Environment

Description

When graph.microsoft.com becomes unreachable for an extended period (TCP-level timeout, not just DNS), onedriver enters a retry loop, marking the filesystem offline. During this period, any application that has pending FUSE operations (stat, opendir, open, read, write) on the mountpoint is completely blocked at the kernel level: the syscall does not return until onedriver responds.

This means applications freeze with no indication of what is happening, for as long as the outage lasts. In my case, Dolphin (KDE file manager) and other applications (VSCode, Obsidian) hung for ~15 minutes until connectivity was restored.

The Linux kernel provides a mechanism to abort a hung FUSE connection (echo 1 > /sys/fs/fuse/connections/<id>/abort, where <id> is the connection's number), which returns ENOTCONN to all waiting requests. onedriver does not use this mechanism, so there is currently no way to fail fast when the backend is unreachable.

Reproduction

  1. Mount a OneDrive volume via onedriver
  2. Open an application that accesses the mount (e.g. a file manager)
  3. Block outbound HTTPS to graph.microsoft.com (e.g. via firewall rule or simply during a real network outage)
  4. Attempt to open a directory or file on the mount

Expected: Operations fail quickly with a network error (e.g. EIO, ENOTCONN)
Actual: Application thread hangs indefinitely in the kernel until network is restored

Logs

During the outage the journal shows a new error approximately every 62 seconds, each with context deadline exceeded:

Apr 21 17:30:29 onedriver[...]: ERR Error during delta fetch, marking fs as offline. error="Get \"https://graph.microsoft.com/v1.0/me/drive/root/delta?token=...\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Apr 21 17:31:31 onedriver[...]: ERR Error during delta fetch, marking fs as offline. error="...context deadline exceeded..."
[... repeated every ~62s until ...]
Apr 21 17:45:43 onedriver[...]: ERR Error during delta fetch, marking fs as offline. error="...read tcp 192.168.1.180:48768->40.126.49.88:443: read: connection timed out"
Apr 21 17:45:46 onedriver[...]: INF Delta fetch success, marking fs as online.

Total outage duration: ~15 minutes. Applications were blocked for the entire duration.

Shorter outages (seconds) produce the same symptom but are less noticeable:

Apr 20 11:02:19 onedriver[...]: ERR Error during delta fetch, marking fs as offline. error="...dial tcp: lookup graph.microsoft.com: Temporary failure in name resolution"
Apr 20 11:02:29 onedriver[...]: INF Delta fetch success, marking fs as online.

Suggested improvement

When the filesystem has been marked offline for longer than a configurable threshold (e.g. 10–30 seconds), onedriver could abort the FUSE connection or return errors on pending requests rather than continuing to hold them in the kernel queue. This would allow applications to handle the error gracefully rather than freezing indefinitely.
