
Bug: llama-server won't stop generation when client disconnects during prompt processing #1020

@hksdpc255

Description

What happened?

When a client disconnects while llama-server is still processing the prompt (before any token has been streamed), the server continues the generation to completion. This wastes compute and keeps the model busy even though no client is connected to receive the output.
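For what it's worth, here is a guess at why the disconnect goes unnoticed: if generation runs inside a chunked content provider (llama-server uses cpp-httplib for HTTP), the connection is only exercised when a chunk is written, and prompt processing writes nothing. The sketch below is illustrative only; `Slot` and all of its methods are hypothetical stand-ins, not llama-server's actual code.

```cpp
// Sketch of the suspected shape of the problem, NOT llama-server's code.
// Assumes the streaming path is a cpp-httplib chunked content provider;
// Slot and its methods are hypothetical.
#include "httplib.h"

#include <string>

struct Slot {
    bool prompt_done();             // whole prompt decoded?
    void process_prompt_chunk();    // decode the next batch of prompt tokens
    bool has_next_token();          // more completion tokens to stream?
    std::string next_token_sse();   // next token as an SSE "data:" chunk
};

void handle_completion(Slot & slot, httplib::Response & res) {
    res.set_chunked_content_provider(
        "text/event-stream",
        [&slot](size_t /*offset*/, httplib::DataSink & sink) {
            // Prompt processing never touches the socket, so a client
            // that disconnects here cannot be noticed by the server.
            while (!slot.prompt_done()) {
                slot.process_prompt_chunk();
            }
            // Only once tokens are written does the broken connection
            // surface, via is_writable()/write() failing.
            while (slot.has_next_token()) {
                const std::string chunk = slot.next_token_sse();
                if (!sink.is_writable() ||
                    !sink.write(chunk.data(), chunk.size())) {
                    return false;   // peer gone: cancel the response
                }
            }
            sink.done();            // normal end of stream
            return false;           // provider finished
        });
}
```

Under that assumption, a fix would need to poll the connection (e.g., `sink.is_writable()`) between prompt batches as well, not just between streamed tokens.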

Steps to reproduce:

  1. Start llama-server with any model.
  2. Send a /v1/chat/completions request with a moderately large prompt.
  3. Disconnect the client immediately after the request is sent (e.g., terminate curl, close the browser tab, or cancel the HTTP request in the client); a minimal client that does this is sketched after this list.
  4. Observe that llama-server keeps generating tokens to completion even though no client is connected.
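To make step 3 deterministic, a throwaway client like the one below can force the disconnect: it sends the request and closes the socket immediately, before the first token can arrive. This is only a sketch; the host, port, and request body are assumptions, not taken from the report.

```cpp
// Hypothetical repro client (Linux): send a streaming chat completion
// request and close the socket right away, while the server is still
// processing the prompt. Assumes llama-server on 127.0.0.1:8080.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <string>

int main() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return 1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(8080);                   // assumed port
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    if (connect(fd, (sockaddr *) &addr, sizeof(addr)) != 0) return 1;

    const std::string body =
        R"({"stream":true,"messages":[{"role":"user","content":")"
        + std::string(8000, 'a')                     // moderately large prompt
        + R"("}]})";
    const std::string req =
        "POST /v1/chat/completions HTTP/1.1\r\n"
        "Host: 127.0.0.1:8080\r\n"
        "Content-Type: application/json\r\n"
        "Content-Length: " + std::to_string(body.size()) + "\r\n"
        "Connection: close\r\n\r\n" + body;
    send(fd, req.data(), req.size(), 0);

    close(fd);  // disconnect immediately; generation should stop, but does not
    return 0;
}
```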

Name and Version

Reproducible at least as of commit 5f3485c.

What operating system are you seeing the problem on?

Linux

Relevant log output
