Infinite loop in ttrpc on OOM

### Description

I encountered an out-of-memory error:
```
time="2025-10-03T07:34:52Z" level=error msg="error receiving message" error="failed to discard after receiving oversized message: cannot allocate memory"
```

After this error occurs, the process appears to hang, as if stuck in an infinite loop, and never recovers.

From what I can tell, the error message is printed [here](https://github.com/containerd/containerd/blob/v1.7.24/vendor/github.com/containerd/ttrpc/server.go#L555). Based on the code, the underlying error seems to originate from `ch.recv` ([code](https://github.com/containerd/containerd/blob/v1.7.24/vendor/github.com/containerd/ttrpc/server.go#L380)).

The goroutine ([code](https://github.com/containerd/containerd/blob/v1.7.24/vendor/github.com/containerd/ttrpc/server.go#L369)) responsible for receiving messages exits when `ch.recv` returns an error. However, the `run` function itself does not exit when it receives an unexpected error from `ch.recv` ([code](https://github.com/containerd/containerd/blob/v1.7.24/vendor/github.com/containerd/ttrpc/server.go#L556)). Instead, it proceeds to the next iteration of the loop.

At this point, the goroutine that normally sends events into either of the channels ([here](https://github.com/containerd/containerd/blob/v1.7.24/vendor/github.com/containerd/ttrpc/server.go#L509) and [here](https://github.com/containerd/containerd/blob/v1.7.24/vendor/github.com/containerd/ttrpc/server.go#L545)) has already exited. As a result, the run loop becomes stuck in the select statement ([link](https://github.com/containerd/containerd/blob/v1.7.24/vendor/github.com/containerd/ttrpc/server.go#L508)), since none of the channels will ever receive another value.

### Describe the results you received and expected

**Received**:
After the OOM error occurs, the process hangs and never makes forward progress. From the outside this looks like an infinite loop.

**Expected**:
The process should handle the error gracefully, for example by leaving the run loop once the receiving goroutine has exited, instead of hanging.

### What version of containerd are you using?

containerd github.com/containerd/containerd 1.7.24

### Any other relevant information

A possible solution might be to return after the error is logged ([here](https://github.com/containerd/containerd/blob/v1.7.24/vendor/github.com/containerd/ttrpc/server.go#L555)), so that the run loop does not continue after the receiving goroutine has exited.

Alternatively, the OOM error could be added to the list of known/handled errors [here](https://github.com/containerd/containerd/blob/v1.7.24/vendor/github.com/containerd/ttrpc/server.go#L550).


Infinite loop in ttrpc on OOM #12669
