CatalogSource "redhat-operators" stuck in "TRANSIENT_FAILURE" when using OCS dev registry image #14873

@yitzhak12

Description

The redhat-operators CatalogSource in openshift-marketplace remained in TRANSIENT_FAILURE instead of becoming
READY.

For OCS/ODF installs, ocs-ci disables the default OperatorHub redhat-operators source and creates a CatalogSource
with the same name that points to the OCS registry image (for example, quay.io/rhceph-dev/ocs-registry:…).
Because that index image is large, the catalog pod takes a long time to start listening on gRPC port 50051. OpenShift’s
Operator Lifecycle Manager (OLM) gives the catalog container a startup probe on that port; if the server is not ready
within the probe window, the kubelet keeps restarting the container. Each restart begins loading the index from scratch,
so a server that is always slower than the window never becomes ready, and the CatalogSource never reports a healthy
connection and stays in TRANSIENT_FAILURE. The root cause is therefore not mainly a bad image pull or a wrong
kubeconfig; it is the startup time of a very heavy catalog index versus how long OLM waits for it.
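
The failure mode above reduces to simple arithmetic: a startupProbe gives a container roughly
initialDelaySeconds + failureThreshold × periodSeconds to come up before the kubelet restarts it. The Python sketch
below only illustrates that comparison. The probe numbers are hypothetical placeholders, not OLM’s actual settings
(read the real ones from the catalog pod spec), and the startup times are made up for a small versus a heavy index.

```python
# Sketch: does a catalog server that needs `startup_seconds` to open gRPC :50051
# fit inside a Kubernetes startupProbe window?
# The default probe values below are HYPOTHETICAL placeholders, not OLM's actual
# settings; read the real ones from the catalog pod (`oc get pod <pod> -o yaml`).
from dataclasses import dataclass


@dataclass(frozen=True)
class StartupProbe:
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    failure_threshold: int = 10

    def window_seconds(self) -> int:
        # Rule of thumb from the Kubernetes docs: the container gets about
        # failureThreshold * periodSeconds (after the initial delay) to start
        # before the kubelet kills and restarts it.
        return self.initial_delay_seconds + self.failure_threshold * self.period_seconds


def survives_startup(startup_seconds: float, probe: StartupProbe) -> bool:
    """True if the server should pass the startup probe before being restarted.

    Every restart reloads the index from scratch, so a server slower than the
    window fails on every attempt and ends up in a restart loop.
    """
    return startup_seconds <= probe.window_seconds()


if __name__ == "__main__":
    probe = StartupProbe()
    for startup in (30, 150):  # hypothetical startup times: small vs. heavy index
        verdict = "passes probe" if survives_startup(startup, probe) else "restart loop"
        print(f"startup={startup}s window={probe.window_seconds()}s -> {verdict}")
```

With these placeholder values the window is 100 seconds, so the 30-second case passes and the 150-second case loops;
raising the probe tolerance (an OLM change) or shortening startup (an image change) are the only two levers.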

What helped on the cluster (workaround)

• Re-enabling the default OperatorHub redhat-operators source (the standard Red Hat operator index) made the
CatalogSource become READY quickly.
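
ocs-ci disables the default source through the cluster-scoped OperatorHub object
(operatorhub.config.openshift.io/cluster, field spec.sources). A minimal Python sketch of the reverse step, assuming
the workaround is applied with a JSON merge patch via `oc patch`, could build the patch like this. The helper names are
hypothetical, and the custom redhat-operators CatalogSource may also need to be deleted first; that step is not shown.

```python
# Sketch: build the JSON merge patch that re-enables the default
# "redhat-operators" source on operatorhub.config.openshift.io/cluster.
# Helper names are hypothetical; the printed command is illustrative and
# should be reviewed before running it against a real cluster.
import json
import shlex


def operatorhub_sources_patch(sources: dict) -> dict:
    """Map of default-source name -> `disabled` flag, as an OperatorHub merge patch."""
    return {
        "spec": {
            "sources": [
                {"name": name, "disabled": disabled}
                for name, disabled in sorted(sources.items())
            ]
        }
    }


def oc_patch_command(patch: dict) -> str:
    """Render the patch as a shell-quoted `oc patch --type=merge` command."""
    body = json.dumps(patch, separators=(",", ":"))
    return "oc patch operatorhub.config.openshift.io/cluster --type=merge -p " + shlex.quote(body)


if __name__ == "__main__":
    # ocs-ci normally sets disabled=true for redhat-operators; flip it back.
    print(oc_patch_command(operatorhub_sources_patch({"redhat-operators": False})))
```

Because a JSON merge patch replaces arrays wholesale, the patch should list every default source whose state matters;
entries that are left out are dropped from spec.sources.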

Ideas for a proper fix

  1. In ocs-ci / config: set grpcPodConfig.memoryTarget on the CatalogSource. OLM uses that value for GOMEMLIMIT and
    the memory request of the catalog server container, so a big index gets enough memory (this helps in many cases).
  2. Design: avoid replacing the real redhat-operators source with the OCS image. Instead, use a separate
    CatalogSource name (e.g. ocs-catalog) for the OCS registry and point only the ODF/OCS subscriptions at it, so the
    normal Red Hat catalog keeps working.
  3. Upstream: if the OCS index still cannot start within OLM’s startup window, this may require changes in OLM (a
    longer startup tolerance) or in the image (faster startup).
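
Ideas 1 and 2 could be combined roughly as follows. This is a minimal Python sketch that emits JSON manifests using
the OLM operators.coreos.com/v1alpha1 CatalogSource and Subscription fields. The name ocs-catalog comes from this
issue; the memoryTarget value 1Gi, the odf-operator package in openshift-storage, and the image tag and channel
placeholders are assumptions to adjust for the real install.

```python
# Sketch of fix ideas 1 + 2: a dedicated CatalogSource for the OCS registry
# image (leaving redhat-operators untouched) and a Subscription that points
# only ODF at it. Values marked as assumptions below are hypothetical.
import json
from typing import Optional

MARKETPLACE_NS = "openshift-marketplace"


def ocs_catalog_source(image: str, name: str = "ocs-catalog",
                       memory_target: Optional[str] = "1Gi") -> dict:
    spec = {
        "sourceType": "grpc",
        "image": image,
        "displayName": "OCS dev registry",
        "publisher": "Red Hat",
    }
    if memory_target is not None:
        # Sets GOMEMLIMIT and the memory request of the catalog server container;
        # only honored by OLM versions that support grpcPodConfig.memoryTarget.
        # The 1Gi default is an assumption, not a tested value.
        spec["grpcPodConfig"] = {"memoryTarget": memory_target}
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "CatalogSource",
        "metadata": {"name": name, "namespace": MARKETPLACE_NS},
        "spec": spec,
    }


def odf_subscription(channel: str, source: str = "ocs-catalog",
                     package: str = "odf-operator",
                     namespace: str = "openshift-storage") -> dict:
    # Package and namespace defaults are assumptions for an ODF install.
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": package, "namespace": namespace},
        "spec": {
            "name": package,                # package name inside the catalog
            "channel": channel,
            "source": source,               # the dedicated catalog, not redhat-operators
            "sourceNamespace": MARKETPLACE_NS,
            "installPlanApproval": "Automatic",
        },
    }


if __name__ == "__main__":
    catalog = ocs_catalog_source("quay.io/rhceph-dev/ocs-registry:<tag>")  # placeholder tag
    sub = odf_subscription(channel="<channel>")                            # placeholder channel
    # A v1 List wrapper lets `oc apply -f manifests.json` create both objects.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [catalog, sub]}, indent=2))
```

Keeping the OCS image under its own name means a slow OCS catalog only affects the ODF subscription, while other
operators keep resolving from redhat-operators. The target namespace also needs an OperatorGroup, which is not shown.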

Metadata

Assignees: No one assigned
Labels: RFE (Request For Enhancement), enhancement (New feature or request), team/ecosystem (Ecosystem team related issues/PRs)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests