Description
I encountered an out-of-memory error:
time="2025-10-03T07:34:52Z" level=error msg="error receiving message" error="failed to discard after receiving oversized message: cannot allocate memory"
After this error occurs, the process appears to hang: it stays blocked in its run loop and never recovers.
From what I can tell, the error message is logged in the run function of the vendored ttrpc server (vendor/github.com/containerd/ttrpc/server.go, around line 555 in v1.7.24). Based on the code, the underlying error seems to originate from ch.recv.
The goroutine responsible for receiving messages exits when ch.recv returns an error. However, the run function itself does not return when it receives an unexpected error from ch.recv. Instead, it logs the error and proceeds to the next iteration of the loop.
At this point, the goroutine that normally sends events into the channels the loop reads from has already exited. As a result, the run loop blocks in its select statement, since none of those channels will ever receive another value.
Describe the results you received and expected
Received:
After the OOM error occurs, the process hangs and never makes forward progress.
Expected:
The process should handle the error gracefully.
What version of containerd are you using?
containerd github.com/containerd/containerd 1.7.24
Any other relevant information
A possible solution might be to return after the error is logged (https://github.com/containerd/containerd/blob/v1.7.24/vendor/github.com/containerd/ttrpc/server.go#L555), so that the run loop does not continue after the receiving goroutine has exited.
Alternatively, the OOM error could be added to the list of known/handled errors.