fix(filesystem): resolve provider activation on registration by cdamus · Pull Request #17625 · eclipse-theia/theia

cdamus · 2026-06-04T20:11:44Z

What it does

Hardens FileService.activateProvider against a dangling onWillActivateFileSystemProvider listener, which can wedge frontend startup indefinitely.

Fixes #17506

activateProvider tied a scheme's activation to the settlement of WaitUntilEvent.fire(...), i.e. Promise.all of every listener's waitUntil promise. If any one of those promises never settles (the user-storage emitter carries several listeners), the activation, and everything awaiting it, hangs forever. In #17506 this stalls all four User-scope preference providers → UserConfigsPreferenceProvider.ready → PreferenceServiceImpl.initializeProviders → FrontendApplication.start, so the workbench never comes up. As the reporter found, even registering the provider by hand against the live container did not help: the activations deferred stayed pending.
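The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not Theia code: `healthyListener` and `danglingListener` are hypothetical stand-ins for two listeners' waitUntil promises, and the 50 ms timer exists only to observe the hang from outside.

```typescript
// Hypothetical stand-ins for two listeners' waitUntil promises (not Theia code).
const healthyListener: Promise<void> = Promise.resolve();
const danglingListener: Promise<void> = new Promise<void>(() => { /* never settles */ });

// Awaiting every listener at once: one dangling promise blocks the combined result.
const activation: Promise<string> = Promise.all([healthyListener, danglingListener])
    .then(() => 'activated');

// Race against a short timer purely to observe the hang from outside.
const observed: Promise<string> = Promise.race([
    activation,
    new Promise<string>(resolve => setTimeout(() => resolve('still pending'), 50))
]);

observed.then(result => console.log(result)); // prints "still pending"
```

The healthy listener settles immediately, yet `activation` can never complete, which is why a single unrelated listener is enough to stall everything awaiting the scheme.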

This PR is split into two independently reviewable commits:

resolve provider activation on registration (core fix, low risk): resolve the pending activation as soon as a provider is registered for the scheme. This decouples activation completion from unrelated listeners and restores registration as a recovery path.
time out provider activation as a backstop (deferrable): if no provider is registered within a timeout, reject the activation so callers fail fast (e.g. readPreferencesFromFile treats the file as absent and startup proceeds degraded) instead of hanging, and remove the scheme from activations so a later attempt can retry once the connection recovers. The default timeout (DEFAULT_PROVIDER_ACTIVATION_TIMEOUT, 90s, overridable via the protected getActivationTimeout()) is not less than the websocket heartbeat detection window (checkAliveTimeout 30s + pingTimeout 60s), so a genuinely dropped connection is detected and rejects in-flight RPCs before this fires. Activation only ever awaits constant-time backend calls (capability handshake, config-directory lookup), so the timeout cannot abort legitimate long-running work.
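The combined behaviour of the two commits can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual FileService implementation: the `Deferred` and `ActivationSketch` classes and their members are hypothetical, and the listener event is omitted entirely because the point is that registration alone settles the activation.

```typescript
// Hypothetical minimal Deferred; a stand-in, not Theia's own utility.
class Deferred<T> {
    readonly promise: Promise<T>;
    resolve!: (value: T) => void;
    reject!: (reason: Error) => void;
    constructor() {
        this.promise = new Promise<T>((resolve, reject) => {
            this.resolve = resolve;
            this.reject = reject;
        });
    }
}

// Hypothetical model of the behaviour described in the two commits.
class ActivationSketch {
    private readonly providers = new Set<string>();
    private readonly activations = new Map<string, Deferred<void>>();
    private readonly timeoutMs: number;

    constructor(timeoutMs: number) {
        this.timeoutMs = timeoutMs;
    }

    activateProvider(scheme: string): Promise<void> {
        if (this.providers.has(scheme)) {
            return Promise.resolve();
        }
        let activation = this.activations.get(scheme);
        if (!activation) {
            const pending = new Deferred<void>();
            activation = pending;
            this.activations.set(scheme, pending);
            const timer = setTimeout(() => {
                // Commit 2: forget the entry so a later call can retry, then fail fast.
                if (this.activations.get(scheme) === pending) {
                    this.activations.delete(scheme);
                }
                pending.reject(new Error(`Timed out activating provider for '${scheme}'`));
            }, this.timeoutMs);
            // Whichever way the activation settles, the backstop timer is cancelled.
            pending.promise.then(() => clearTimeout(timer), () => clearTimeout(timer));
        }
        return activation.promise;
    }

    registerProvider(scheme: string): void {
        this.providers.add(scheme);
        // Commit 1: registration itself settles the activation; no listener is consulted.
        this.activations.get(scheme)?.resolve();
    }
}

const service = new ActivationSketch(50); // 50 ms here; the PR's default is 90 s
const recovered = service.activateProvider('user-storage');
service.registerProvider('user-storage');
recovered.then(() => console.log('user-storage activated'));

const abandoned = service.activateProvider('never-registered');
abandoned.catch((error: Error) => console.log(error.message));
```

Deleting the map entry before rejecting is what keeps a retry possible: a later `activateProvider` call for the same scheme creates a fresh deferred rather than returning the rejected one.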

Caution

The second commit is a containment/blast-radius measure; the underlying dangling-promise cause (a lost RPC reply on a connection that still appears alive) belongs to the family addressed by #17334 and is left as a follow-up. Reviewers may choose to take only the first commit.

How to test

The root-cause hang is timing-dependent and not reliably reproducible in the example apps, so verification is via the new automated tests, which fail without the fix (TDD).

To gut-check the first test against the bug, revert commit 1 and confirm it goes from passing to a 'PENDING' assertion failure.
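The PR's tests are not reproduced on this page. As an illustration only, one common way to assert that a promise is still pending is to race it against an already-resolved sentinel. The `stateOf` helper below is a hypothetical sketch of that technique and is not taken from the PR.

```typescript
type PromiseState = 'PENDING' | 'FULFILLED' | 'REJECTED';

// Hypothetical test helper (not from the PR): reports a promise's state without waiting on it.
async function stateOf(promise: Promise<unknown>): Promise<PromiseState> {
    const sentinel = Symbol('pending');
    try {
        // An already-settled promise wins the race because its reaction is queued first.
        const winner = await Promise.race([promise, Promise.resolve(sentinel)]);
        return winner === sentinel ? 'PENDING' : 'FULFILLED';
    } catch {
        return 'REJECTED';
    }
}

const neverSettles = new Promise<void>(() => { /* stands in for the wedged activation */ });
const alreadyDone = Promise.resolve();

stateOf(neverSettles).then(state => console.log(state)); // prints "PENDING"
stateOf(alreadyDone).then(state => console.log(state));  // prints "FULFILLED"
```

A test written this way fails immediately with a readable state name instead of running into the test runner's own timeout.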

Follow-ups

The true root cause is a waitUntil/RPC promise that never settles on a still-open connection (lost reply, detached continuation), the same class of failure that #17334 (fix(core): guarantee promise rejection on failure to send RPC call) began addressing. It is worth a dedicated issue to audit the remaining RPC loss points (reply decode failure, a reply routed to an already-closed multiplexer sub-channel) and reject the pending request at the point of loss.
A general per-request RPC timeout was considered and rejected: it can be made efficient (a single self-cancelling sweep per active protocol, no per-request timers) but not correct, since a purely age-based timeout cannot distinguish a lost reply from a legitimately long-running backend call and would break such APIs. The scoped activation timeout here is safe precisely because its dependencies are constant-time.

Breaking changes

This PR introduces breaking changes and requires careful review. If yes, the breaking changes section in the changelog has been updated.

Attribution

Review checklist

As an author, I have thoroughly tested my changes and carefully followed the review guidelines
User-facing text is internationalized using the nls service (for details, please see the Internationalization/Localization section in the Coding Guidelines)

Reminder for reviewers

As a reviewer, I agree to behave in accordance with the review guidelines

FileService::activateProvider coupled the resolution of a scheme's activation to the settlement of every onWillActivateFileSystemProvider listener (via WaitUntilEvent.fire / Promise.all(waitables)). A single listener whose waitUntil promise never settles therefore wedged the activation, and everything awaiting it, forever. With the user-storage emitter carrying several listeners, an unrelated dangling listener could hang preference loading and the whole frontend startup. Registering the provider directly did not help: the activations deferred stayed pending.

Resolve the pending activation as soon as a provider is registered for the scheme, decoupling activation completion from unrelated listeners and restoring registration as a recovery path.

Fixes #17506

Signed-off-by: Christian W. Damus <cdamus@eclipsesource.com>

Even with activation resolving on provider registration, an activation whose provider never registers (its own onWillActivateFileSystemProvider listener dangles, e.g. a lost RPC reply on a connection that still seems to be alive) would hang forever, blocking preference loading and frontend startup with no recovery.

Add a timeout backstop to FileService.activateProvider: if no provider is registered for the scheme within the timeout, reject the activation so that callers fail fast (e.g. readPreferencesFromFile treats the file as absent and startup proceeds degraded) rather than hanging. On rejection the scheme is removed from activations so a later attempt can retry once the connection recovers.

The default timeout (DEFAULT_PROVIDER_ACTIVATION_TIMEOUT, 90s, overridable via getActivationTimeout) is not less than the websocket heartbeat detection window (checkAliveTimeout 30s + pingTimeout 60s), so a real disconnect is detected and rejects in-flight RPCs before this fires. Activation only awaits constant-time backend calls, so the timeout cannot abort legitimate long-running work.

For #17506

Signed-off-by: Christian W. Damus <cdamus@eclipsesource.com>

colin-grant-work · 2026-06-04T21:41:52Z

+ * The default time, in milliseconds, after which {@link FileService.activateProvider} gives up waiting
+ * for a provider to be registered for a scheme and rejects the activation instead of hanging forever.
+ */
+export const DEFAULT_PROVIDER_ACTIVATION_TIMEOUT = 90_000;


This seems like longer than any activation should plausibly take. Perhaps, given the suggested cause, it should be tied to how long the application waits before timing out a connection or declaring disconnection?

That's what this effectively does: 90s is the expected maximum time it will take for Theia to abandon a stuck RPC socket. I didn't want to add API for this service to get the internal timeout parameter from the RPC service, but maybe that would be better.

I'm still hoping that this commit isn't needed at all by the OP.

cdamus added 2 commits June 4, 2026 16:03

github-project-automation Bot added this to PR Backlog Jun 4, 2026

github-project-automation Bot moved this to Waiting on reviewers in PR Backlog Jun 4, 2026

cdamus mentioned this pull request Jun 4, 2026

FileService activateProvider('user-storage') hangs during frontend startup in 1.71 #17506

Open

colin-grant-work reviewed Jun 4, 2026

cdamus wants to merge 2 commits into master from issue/17506-activate-provider-resilience
