What you would like to be added?
The TrainJobStatus server introduced in #3227 runs in the same process as the controller manager but is not registered with its /readyz and /healthz probes.
The status server should expose healthz.Checker functions and register them via mgr.AddHealthzCheck and mgr.AddReadyzCheck in pkg/statusserver/setup.go.
Discussed in #3227 and flagged as a follow-up by @andreyvelich.
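A rough sketch of what the two checks could look like, not a proposal for the final API. It uses only the standard library, so `Checker` here is a local stand-in with the same signature as controller-runtime's `healthz.Checker` (`func(*http.Request) error`). `HealthState`, `MarkReady`, and `MarkStopped` are hypothetical names, and the package clause is `main` only to keep the snippet standalone; the real code would presumably live in pkg/statusserver.

```go
package main

import (
	"errors"
	"net/http"
	"sync/atomic"
)

// Checker is a local stand-in mirroring the signature of
// controller-runtime's healthz.Checker, so this sketch needs no
// third-party imports.
type Checker func(req *http.Request) error

// HealthState is a hypothetical holder for the status server's
// lifecycle flags. The status server would own one instance.
type HealthState struct {
	ready   atomic.Bool // set once TLS and the OIDC provider are initialized
	stopped atomic.Bool // set if the serve loop exits
}

// MarkReady records that initialization has finished.
func (s *HealthState) MarkReady() { s.ready.Store(true) }

// MarkStopped records that the server is no longer serving.
func (s *HealthState) MarkStopped() { s.stopped.Store(true) }

// Readyz fails until initialization has finished, and again once the
// server has stopped.
func (s *HealthState) Readyz(_ *http.Request) error {
	if s.stopped.Load() {
		return errors.New("status server has stopped")
	}
	if !s.ready.Load() {
		return errors.New("status server is still initializing")
	}
	return nil
}

// Healthz fails only once the server has stopped, so a slow startup
// does not by itself fail the liveness probe.
func (s *HealthState) Healthz(_ *http.Request) error {
	if s.stopped.Load() {
		return errors.New("status server has stopped")
	}
	return nil
}
```

Because both methods already have the `healthz.Checker` signature, setup.go could pass them directly, e.g. `mgr.AddHealthzCheck("statusserver", state.Healthz)` and `mgr.AddReadyzCheck("statusserver", state.Readyz)`, where the check name "statusserver" is only a placeholder.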
/kind feature
/area controller
Why is this needed?
Training pods have no way to verify that the status server is ready before sending their first update, so updates fail silently if the server is still initializing TLS or the OIDC provider. If the server crashes mid-job, the controller pod stays Running because the existing liveness probe does not cover the status server.
Wiring into readyz/healthz is a low-risk pattern already used by the webhook server in the same process.
Love this feature?
Give it a 👍. We prioritize the features with the most 👍.