
Fix hang on abrupt client death#580

Merged
aviatesk merged 6 commits into master from abap34/shutdown-carefully
Mar 13, 2026
Conversation


abap34 (Collaborator) commented Mar 8, 2026

Summary

Fix a bug where the JETLS process remains running indefinitely when a client such as Neovim terminates without sending the exit notification.

Two hang points in Endpoint are addressed.

Background

In some setups, the client process appears to terminate before the language server returns the ShutdownResponse. This behavior has been observed with Neovim under its default configuration.

In such cases, JETLS is expected to shut down gracefully by monitoring the client's process ID.

However, two issues in Endpoint prevented this from happening and caused the server process to remain running indefinitely.

Where the server hangs

sequenceDiagram
    participant N as Neovim
    participant R as read_task
    participant S as Server loop<br/>(calls iterate)
    participant W as write_task
    participant M as processId monitor

    N->>S: ShutdownRequest
    S->>W: ShutdownResponse (via out_msg_queue)
    N-xN: Process exits (no exit notification)
    Note over N: stdin/stdout/stderr pipes break

    R-->>R: readlsp returns nothing
    Note over R: Loop exits

    W-->>W: writelsp hits EPIPE
    W-->>W: err_handler writes to stderr
    Note over W: stderr also broken<br/>-> double EPIPE<br/>-> task dies

    S->>S: take!(in_msg_queue)
    Note over S: ❌ Hang 1<br/>Queue still open, blocks forever

    M->>M: Polls every 60s
    M->>S: SelfShutdownNotification

    Note over S: Loop exits -> finally

    S->>S: close(endpoint) -> flush()
    Note over S: ❌ Hang 2<br/>isready stays true (write_task dead)<br/>yield() loops forever

(Both Mermaid charts were initially generated by Claude; I have manually reviewed them for correctness.)

Hang 1: take!(in_msg_queue) blocks

When stdin closes, read_task exits its loop, but in_msg_queue remains open. As a result, take!(in_msg_queue) in iterate keeps waiting for a message and blocks indefinitely.

The processId monitor (polling every 60 seconds) eventually resolves this by putting SelfShutdownNotification into the queue, but the server still fails to terminate due to Hang 2.
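The blocking behavior is easy to reproduce in isolation. The sketch below is illustrative, not JETLS source; `in_msg_queue` and `read_task` are stand-in names mirroring the PR:

```julia
# Minimal reproduction of Hang 1: take! on an open but empty Channel
# blocks even after the producer task has finished.
in_msg_queue = Channel{Any}(Inf)

read_task = @async nothing        # stands in for read_task exiting its loop
wait(read_task)                   # producer is gone, but the queue is still open

consumer = @async take!(in_msg_queue)
sleep(0.5)
println(istaskdone(consumer))     # false: take! is still blocked
```

The channel being open (not closed) is what keeps `take!` waiting; a dead producer alone does not wake the consumer.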

Hang 2: flush(endpoint) loops indefinitely

write_task encounters EPIPE when writing to stdout. The code attempts to recover (catch -> err_handler -> continue), but the default err_handler writes to stderr using @error and display_error.

After the client exits, stderr is also a broken pipe. As a result, err_handler itself throws EPIPE, which is not caught and terminates write_task.

After the server loop exits, finally calls close(endpoint) -> flush(endpoint). The flush implementation waits for the queue to drain:

while isready(out_msg_queue)
    yield()
end

However, isready only checks whether the queue contains data; it does not consider whether a consumer is still alive. Since write_task has already terminated, the queue never drains and the loop runs forever. This appears to be one of the main reasons the server hangs.
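The gap can be seen on a plain Channel (illustrative sketch, not JETLS source): `isready` reports pending data regardless of whether anything will ever consume it.

```julia
out_msg_queue = Channel{Any}(Inf)
write_task = @async nothing        # consumer that has already terminated
wait(write_task)

put!(out_msg_queue, :pending)      # message the dead consumer will never drain
println(isready(out_msg_queue))    # true  — so the original flush loop spins forever
println(istaskdone(write_task))    # true  — the liveness signal the fix checks instead
```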

Fixes

Commit 1: Send a sentinel to in_msg_queue

When read_task exits, it now puts the sentinel value nothing into in_msg_queue.

iterate detects this sentinel and returns nothing, which ends the server loop and proceeds to the finally block.
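A stripped-down sketch of the sentinel pattern (illustrative; the real read loop and readlsp are omitted):

```julia
in_msg_queue = Channel{Any}(Inf)

read_task = @async begin
    # ... the real loop reads LSP messages until the input stream closes ...
    put!(in_msg_queue, nothing)   # sentinel on loop exit
end

msg = take!(in_msg_queue)         # wakes up instead of blocking forever
println(msg === nothing)          # true: the server loop returns and reaches `finally`
```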

Commit 2: Detect write_task termination in flush

The flush loop now checks

istaskdone(endpoint.write_task)

and exits when the consumer task has already terminated. This allows close(endpoint) in the finally block to complete and the server process to exit normally.
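With the check in place, the drain loop behaves roughly like this (sketch under assumed names; the actual JETLS code may differ):

```julia
out_msg_queue = Channel{Any}(Inf)
put!(out_msg_queue, :unsent)             # message left over after write_task died
write_task = @async nothing
wait(write_task)

# Fixed flush: also stop when no consumer is alive to drain the queue.
while isready(out_msg_queue) && !istaskdone(write_task)
    yield()
end
println("flush returned")                # reached immediately, no infinite loop
```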

The underlying cause of Hang 2 is the unintended termination of write_task due to the double EPIPE in err_handler. That issue could be addressed separately by fixing err_handler; this change instead robustly ensures that flush does not hang regardless of why write_task terminates.

After the fix

Before the fix, the process remained running every time; with the fix, it exits immediately once the editor is closed.

sequenceDiagram
    participant N as Neovim
    participant R as read_task
    participant S as Server loop<br/>(calls iterate)
    participant W as write_task

    N->>S: ShutdownRequest
    S->>W: ShutdownResponse (via out_msg_queue)
    N-xN: Process exits (no exit notification)
    Note over N: stdin/stdout/stderr pipes break

    R-->>R: readlsp returns nothing
    Note over R: Loop exits

    R->>S: put!(in_msg_queue, nothing)
    Note over R: ✅ Fix 1: sentinel

    W-->>W: double EPIPE -> task dies

    S->>S: take! returns nothing
    Note over S: Loop exits -> finally

    S->>W: istaskdone? -> true
    Note over S: ✅ Fix 2: flush breaks

    Note over S: Process exits normally

Related issue

This may resolve #561, but there could be other contributing factors, so it may be better to wait before closing it. For example, I noticed that the process can remain when Neovim is closed immediately after startup (this does not seem to occur with other editors).

Notes

Testing every possible scenario is quite challenging, so this has not been verified comprehensively. However, after implementing the fix and using it for a while with Neovim, Helix, and VSCode, I confirmed that processes start, communicate, and terminate correctly.

Feedback from Neovim & other editors would be appreciated.

abap34 requested a review from aviatesk on March 8, 2026 17:26

codecov bot commented Mar 8, 2026

Codecov Report

❌ Patch coverage is 0% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.14%. Comparing base (9ac7d6c) to head (8ea1de9).
⚠️ Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
LSP/src/communication.jl 0.00% 14 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #580      +/-   ##
==========================================
- Coverage   67.88%   66.14%   -1.74%     
==========================================
  Files          51       63      +12     
  Lines        8295     8658     +363     
==========================================
+ Hits         5631     5727      +96     
- Misses       2664     2931     +267     
Flag Coverage Δ
JETLS.jl 67.88% <ø> (ø)
LSP.jl 26.44% <0.00%> (?)

Flags with carried forward coverage won't be shown.


abap34 force-pushed the abap34/shutdown-carefully branch from e7c84fb to b173bca on March 12, 2026 02:37

abap34 commented Mar 12, 2026

@aviatesk Does this PR work correctly in Zed? I tried it in Helix for a while, and it does not seem to cause any major issues. It also appears that the original problem has been fixed, so it might be fine to merge it.

abap34 and others added 2 commits March 12, 2026 17:04
When the client (e.g. Neovim) terminates without sending an `exit`
notification, `read_task` exits its loop but `in_msg_queue` remains
open. This causes `take!` in `iterate` to block indefinitely.

Apply the same sentinel pattern already used for `out_msg_queue`:
`read_task` puts `nothing` into `in_msg_queue` on loop exit, and
`iterate` checks for it to terminate the server loop gracefully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a client like Neovim terminates without waiting for the
shutdown response, `write_task` dies from EPIPE while messages
remain in `out_msg_queue`. Since `isready` only checks whether
the queue has data and does not consider consumer liveness,
`flush` becomes an infinite busy loop (`yield()` forever).

Check `istaskdone(endpoint.write_task)` to break out of the loop
when there is no consumer alive to drain the queue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
aviatesk (Owner) commented

Thanks for the thorough analysis and the fix — I strongly agree with the overall approach. Let's move forward with this.

I've confirmed that the normal server lifecycle (Zed, VSCode) is not affected by these changes.

Since this code is fairly subtle, I'll push a few additional comments and safeguards for future reference. Once I push those, could you take a look and confirm? Then we can merge.

aviatesk force-pushed the abap34/shutdown-carefully branch from b173bca to 25bc059 on March 12, 2026 08:34
# the server loop hangs forever when the input stream closes.
# Guard with `isopen` since `close(endpoint)` may have already
# closed the channel during normal shutdown.
isopen(in_msg_queue) && put!(in_msg_queue, nothing)

When read_task exits during normal shutdown, close(endpoint) may have already closed in_msg_queue. Without this guard, put! might throw InvalidStateException on the closed channel. The exception is silently swallowed (since read_task is never waited on), but the guard makes the intent explicit and avoids the unnecessary exception.
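The guarded behavior is easy to verify on a plain Channel (sketch, not JETLS source):

```julia
q = Channel{Any}(1)
close(q)

threw = try
    put!(q, nothing)              # unguarded: throws on a closed channel
    false
catch e
    e isa InvalidStateException
end
println(threw)                    # true

isopen(q) && put!(q, nothing)     # guarded: silently skipped on a closed channel
println("no exception")
```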

aviatesk (Owner) commented

Also, please update CHANGELOG.md. This is an important enhancement.

aviatesk (Owner) left a comment

Thanks so much for working on this, @abap34!

@aviatesk aviatesk merged commit e65429a into master Mar 13, 2026
15 of 18 checks passed
@aviatesk aviatesk deleted the abap34/shutdown-carefully branch March 13, 2026 11:19


Development

Successfully merging this pull request may close these issues.

JETLS keeps running forever at 100% CPU (1 core) after closing Neovim
