Infinite loop in ttrpc on OOM #12669

@Merkyrio

Description

I encountered an out-of-memory error:

time="2025-10-03T07:34:52Z" level=error msg="error receiving message" error="failed to discard after receiving oversized message: cannot allocate memory"

After this error occurs, the process appears to get stuck in an infinite loop.

From what I can tell, the error message is printed here. Based on the code, the underlying error seems to originate from ch.recv (code).

The goroutine (code) responsible for receiving messages exits when ch.recv returns an error. However, the run function itself does not exit when it receives an unexpected error from ch.recv (code). Instead, it proceeds to the next iteration of the loop.

At this point, the goroutine that normally sends events into either of the channels (here and here) has already exited. As a result, the run loop becomes stuck in the select statement (link), since none of the channels will ever receive another value.

Describe the results you received and expected

Received:
After the OOM error occurs, the process gets stuck in an infinite loop and never makes forward progress.

Expected:
The process should handle the error gracefully, for example by exiting the run loop instead of hanging.

What version of containerd are you using?

containerd github.com/containerd/containerd 1.7.24

Any other relevant information

A possible solution might be to return after the error is logged [here](https://github.com/containerd/containerd/blob/v1.7.24/vendor/github.com/containerd/ttrpc/server.go#L555), so that the run loop does not continue after the receiving goroutine has exited.

Alternatively, the OOM error could be added to the list of known/handled errors here.
