Flux 2.2, image-automation-controller stops reconciling; needed restart #689

Open
@0xStarcat

Description

Making a new issue after seeing this comment in the last issue on this topic: #282 (comment)

Specs

Flux v2.2, with the image-update-automation pod running the image-automation-controller:v0.37.0 image and automation resources on image.toolkit.fluxcd.io/v1beta1.

Description

The image-automation-controller stopped reconciling all ImageUpdateAutomation resources in the cluster for about 5 hours. When it resumed, it then stopped reconciling just 1 ImageUpdateAutomation resource out of the many in the cluster.

To resolve it, we had to delete the image-automation-controller pod; after the restart everything worked fine.

The following telemetry screenshots show CPU usage and log volume both dropping off entirely, with no errors in the logs. There were no pod restarts or crash loops, and no k8s events for the pods.

Screenshot 2024-05-29 at 2 53 47 PM
Screenshot 2024-05-29 at 2 54 30 PM

I've also attached the output of the debug/pprof/goroutine?debug=2 profiler endpoint to this issue.

heap.txt
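
In case it helps with triage, here is a minimal sketch of how a dump like this could be summarized to spot goroutines that have been blocked for hours. It assumes the attachment is the plain-text output of the goroutine?debug=2 endpoint, where each goroutine starts with a header such as `goroutine 7 [select, 301 minutes]:`. The names `summarize` and `stateSummary` are made up for this example and are not part of Flux or the controller.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// header matches goroutine header lines from a debug=2 dump, e.g.
// "goroutine 7 [select, 301 minutes]:" or "goroutine 1 [running]:".
var header = regexp.MustCompile(`^goroutine \d+ \[([^\],]+)(?:, (\d+) minutes)?[^\]]*\]:`)

// stateSummary is a hypothetical helper type for this sketch.
type stateSummary struct {
	Count          int // number of goroutines in this state
	MaxWaitMinutes int // longest reported wait; 0 if none was reported
}

// summarize groups goroutines by wait state and tracks the longest wait.
func summarize(dump string) map[string]stateSummary {
	out := make(map[string]stateSummary)
	for _, line := range strings.Split(dump, "\n") {
		m := header.FindStringSubmatch(line)
		if m == nil {
			continue // stack frame or blank line, not a goroutine header
		}
		s := out[m[1]]
		s.Count++
		if m[2] != "" {
			if mins, err := strconv.Atoi(m[2]); err == nil && mins > s.MaxWaitMinutes {
				s.MaxWaitMinutes = mins
			}
		}
		out[m[1]] = s
	}
	return out
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read:", err)
		os.Exit(1)
	}
	summary := summarize(string(data))

	// List the states with the longest waits first.
	states := make([]string, 0, len(summary))
	for s := range summary {
		states = append(states, s)
	}
	sort.Slice(states, func(i, j int) bool {
		return summary[states[i]].MaxWaitMinutes > summary[states[j]].MaxWaitMinutes
	})
	for _, s := range states {
		fmt.Printf("%-30s count=%-5d longest_wait=%dm\n", s, summary[s].Count, summary[s].MaxWaitMinutes)
	}
}
```

Run as `go run main.go < heap.txt`. States whose longest wait is close to the ~5 hour stall would be the first place to look, though a long wait alone does not prove a goroutine is stuck.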

Please let me know if there are any further details I can provide. Thank you!