
Fix hang on abrupt client death#580

Merged
aviatesk merged 6 commits into master from abap34/shutdown-carefully
Mar 13, 2026
Conversation


abap34 (Collaborator) commented Mar 8, 2026

Summary

Fix a bug where the JETLS process remains running indefinitely when a client such as Neovim terminates without sending the exit notification.

Two hang points in Endpoint are addressed.

Background

In some setups, the client process appears to terminate before the language server returns the ShutdownResponse. This behavior has been observed with Neovim under its default configuration.

In such cases, JETLS is expected to shut down gracefully by monitoring the client's process ID.

However, two issues in Endpoint prevented this from happening and caused the server process to remain running indefinitely.

Where the server hangs

sequenceDiagram
    participant N as Neovim
    participant R as read_task
    participant S as Server loop<br/>(calls iterate)
    participant W as write_task
    participant M as processId monitor

    N->>S: ShutdownRequest
    S->>W: ShutdownResponse (via out_msg_queue)
    N-xN: Process exits (no exit notification)
    Note over N: stdin/stdout/stderr pipes break

    R-->>R: readlsp returns nothing
    Note over R: Loop exits

    W-->>W: writelsp hits EPIPE
    W-->>W: err_handler writes to stderr
    Note over W: stderr also broken<br/>-> double EPIPE<br/>-> task dies

    S->>S: take!(in_msg_queue)
    Note over S: ❌ Hang 1<br/>Queue still open, blocks forever

    M->>M: Polls every 60s
    M->>S: SelfShutdownNotification

    Note over S: Loop exits -> finally

    S->>S: close(endpoint) -> flush()
    Note over S: ❌ Hang 2<br/>isready stays true (write_task dead)<br/>yield() loops forever

(Both Mermaid charts were initially generated by Claude; I have manually reviewed them for correctness.)

Hang 1: take!(in_msg_queue) blocks

When stdin closes, read_task exits its loop, but in_msg_queue remains open. As a result, take!(in_msg_queue) in iterate keeps waiting for a message and blocks indefinitely.

The processId monitor (polling every 60 seconds) eventually resolves this by putting SelfShutdownNotification into the queue, but the server still fails to terminate due to Hang 2.
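The blocking behavior is easy to reproduce in isolation. The sketch below is illustrative, not JETLS source; `in_msg_queue` and `read_task` are stand-in names mirroring the PR:

```julia
# Minimal reproduction of Hang 1: take! on an open but empty Channel
# blocks even after the producer task has finished.
in_msg_queue = Channel{Any}(Inf)

read_task = @async nothing        # stands in for read_task exiting its loop
wait(read_task)                   # producer is gone, but the queue is still open

consumer = @async take!(in_msg_queue)
sleep(0.5)
println(istaskdone(consumer))     # false: take! is still blocked
```

The channel being open (not closed) is what keeps `take!` waiting; a dead producer alone does not wake the consumer.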

Hang 2: flush(endpoint) loops indefinitely

write_task encounters EPIPE when writing to stdout. The code attempts to recover (catch -> err_handler -> continue), but the default err_handler writes to stderr using @error and display_error.

After the client exits, stderr is also a broken pipe. As a result, err_handler itself throws EPIPE, which is not caught and terminates write_task.

After the server loop exits, finally calls close(endpoint) -> flush(endpoint). The flush implementation waits for the queue to drain:

while isready(out_msg_queue)
    yield()
end

However, isready only checks whether the queue contains data; it does not consider whether a consumer is still alive. Since write_task has already terminated, the queue never drains and the loop runs forever. This appears to be one of the main reasons the server hangs.
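The gap can be seen on a plain Channel (illustrative sketch, not JETLS source): `isready` reports pending data regardless of whether anything will ever consume it.

```julia
out_msg_queue = Channel{Any}(Inf)
write_task = @async nothing        # consumer that has already terminated
wait(write_task)

put!(out_msg_queue, :pending)      # message the dead consumer will never drain
println(isready(out_msg_queue))    # true  — so the original flush loop spins forever
println(istaskdone(write_task))    # true  — the liveness signal the fix checks instead
```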

Fixes

Commit 1: Send a sentinel to in_msg_queue

When read_task exits, it now puts the sentinel value nothing into in_msg_queue.

iterate detects this sentinel and returns nothing, which ends the server loop and proceeds to the finally block.
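A stripped-down sketch of the sentinel pattern (illustrative; the real read loop and readlsp are omitted):

```julia
in_msg_queue = Channel{Any}(Inf)

read_task = @async begin
    # ... the real loop reads LSP messages until the input stream closes ...
    put!(in_msg_queue, nothing)   # sentinel on loop exit
end

msg = take!(in_msg_queue)         # wakes up instead of blocking forever
println(msg === nothing)          # true: the server loop returns and reaches `finally`
```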

Commit 2: Detect write_task termination in flush

The flush loop now checks

istaskdone(endpoint.write_task)

and exits when the consumer task has already terminated. This allows close(endpoint) in the finally block to complete and the server process to exit normally.
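With the check in place, the drain loop behaves roughly like this (sketch under assumed names; the actual JETLS code may differ):

```julia
out_msg_queue = Channel{Any}(Inf)
put!(out_msg_queue, :unsent)             # message left over after write_task died
write_task = @async nothing
wait(write_task)

# Fixed flush: also stop when no consumer is alive to drain the queue.
while isready(out_msg_queue) && !istaskdone(write_task)
    yield()
end
println("flush returned")                # reached immediately, no infinite loop
```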

The underlying cause of Hang 2 is the unintended termination of write_task due to the double EPIPE in err_handler. That issue could be addressed separately by fixing err_handler; this change instead robustly ensures that flush does not hang regardless of why write_task terminates.

After the fix

Before the fix, the process remained running every time; with the fix, it exits immediately once the editor is closed.

sequenceDiagram
    participant N as Neovim
    participant R as read_task
    participant S as Server loop<br/>(calls iterate)
    participant W as write_task

    N->>S: ShutdownRequest
    S->>W: ShutdownResponse (via out_msg_queue)
    N-xN: Process exits (no exit notification)
    Note over N: stdin/stdout/stderr pipes break

    R-->>R: readlsp returns nothing
    Note over R: Loop exits

    R->>S: put!(in_msg_queue, nothing)
    Note over R: ✅ Fix 1: sentinel

    W-->>W: double EPIPE -> task dies

    S->>S: take! returns nothing
    Note over S: Loop exits -> finally

    S->>W: istaskdone? -> true
    Note over S: ✅ Fix 2: flush breaks

    Note over S: Process exits normally

Related issue

This may resolve #561, but there could be other contributing factors, so it may be better to wait before closing it. For example, I noticed that the process can remain when Neovim is closed immediately after startup (this does not seem to occur with other editors).

Notes

Testing every possible scenario is quite challenging, so this has not been verified comprehensively. However, after implementing the fix and using it for a while with Neovim, Helix, and VSCode, I confirmed that processes start, communicate, and terminate correctly.

Feedback from Neovim & other editors would be appreciated.

abap34 requested a review from aviatesk on March 8, 2026 17:26

codecov bot commented Mar 8, 2026

Codecov Report

❌ Patch coverage is 0% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.14%. Comparing base (9ac7d6c) to head (8ea1de9).
⚠️ Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
LSP/src/communication.jl 0.00% 14 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #580      +/-   ##
==========================================
- Coverage   67.88%   66.14%   -1.74%     
==========================================
  Files          51       63      +12     
  Lines        8295     8658     +363     
==========================================
+ Hits         5631     5727      +96     
- Misses       2664     2931     +267     
Flag Coverage Δ
JETLS.jl 67.88% <ø> (ø)
LSP.jl 26.44% <0.00%> (?)

Flags with carried forward coverage won't be shown.


abap34 force-pushed the abap34/shutdown-carefully branch from e7c84fb to b173bca on March 12, 2026 02:37

abap34 commented Mar 12, 2026

@aviatesk Does this PR work correctly in Zed? I tried it in Helix for a while, and it does not seem to cause any major issues. It also appears that the original problem has been fixed, so it might be fine to merge it.

abap34 and others added 2 commits March 12, 2026 17:04
When the client (e.g. Neovim) terminates without sending an `exit`
notification, `read_task` exits its loop but `in_msg_queue` remains
open. This causes `take!` in `iterate` to block indefinitely.

Apply the same sentinel pattern already used for `out_msg_queue`:
`read_task` puts `nothing` into `in_msg_queue` on loop exit, and
`iterate` checks for it to terminate the server loop gracefully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a client like Neovim terminates without waiting for the
shutdown response, `write_task` dies from EPIPE while messages
remain in `out_msg_queue`. Since `isready` only checks whether
the queue has data and does not consider consumer liveness,
`flush` becomes an infinite busy loop (`yield()` forever).

Check `istaskdone(endpoint.write_task)` to break out of the loop
when there is no consumer alive to drain the queue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
aviatesk (Owner) commented

Thanks for the thorough analysis and the fix — I strongly agree with the overall approach. Let's move forward with this.

I've confirmed that the normal server lifecycle (Zed, VSCode) is not affected by these changes.

Since this code is fairly subtle, I'll push a few additional comments and safeguards for future reference. Once I push those, could you take a look and confirm? Then we can merge.

aviatesk force-pushed the abap34/shutdown-carefully branch from b173bca to 25bc059 on March 12, 2026 08:34
# the server loop hangs forever when the input stream closes.
# Guard with `isopen` since `close(endpoint)` may have already
# closed the channel during normal shutdown.
isopen(in_msg_queue) && put!(in_msg_queue, nothing)

When read_task exits during normal shutdown, close(endpoint) may have already closed in_msg_queue. Without this guard, put! might throw InvalidStateException on the closed channel. The exception is silently swallowed (since read_task is never waited on), but the guard makes the intent explicit and avoids the unnecessary exception.
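The guarded behavior is easy to verify on a plain Channel (sketch, not JETLS source):

```julia
q = Channel{Any}(1)
close(q)

threw = try
    put!(q, nothing)              # unguarded: throws on a closed channel
    false
catch e
    e isa InvalidStateException
end
println(threw)                    # true

isopen(q) && put!(q, nothing)     # guarded: silently skipped on a closed channel
println("no exception")
```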

aviatesk (Owner) commented

Also, please update CHANGELOG.md. This is an important enhancement.

aviatesk (Owner) left a comment

Thanks so much for working on this, @abap34!

@aviatesk aviatesk merged commit e65429a into master Mar 13, 2026
15 of 18 checks passed
@aviatesk aviatesk deleted the abap34/shutdown-carefully branch March 13, 2026 11:19


Development

Successfully merging this pull request may close these issues.

JETLS keeps running forever at 100% CPU (1 core) after closing Neovim
