Docker Hub coupling makes the UI unusable for Fusion-only / air-gapped deployments #380

@antoniocali

Summary

The UI server hard-couples several code paths to hub.docker.com (the metadata API, not the registry). This is fragile in any deployment where outbound access to Docker Hub is rate-limited, slow, or restricted, and it is dead weight in Fusion-only deployments that never run CDC sync jobs.

I'm running OLake on Kubernetes purely as the Iceberg maintenance layer (Fusion + Spark optimizer) against an existing Polaris catalog. No olakeWorker CDC, no source / destination connectors. I observed three classes of issue:

1. Continuous background polling generates log noise

Every ~30s the UI polls Docker Hub for tags on every CDC source connector image, even when CDC is unused:

2026-05-21T15:41:22Z WRN failed to fetch image tags online for olakego/source-mysql:
  docker hub api request failed with status code: 504.
  Cached fallback unavailable on Kubernetes (no Docker daemon)
2026-05-21T15:41:52Z WRN failed to fetch image tags online for olakego/source-postgres: …
2026-05-21T15:42:22Z WRN failed to fetch image tags online for olakego/source-oracle: …
2026-05-21T15:42:52Z WRN failed to fetch image tags online for olakego/source-mongodb: …
… (also kafka, s3, db2, mssql)

Eight connectors, each polled every ~30s, works out to ~16 calls/min hitting Docker Hub. Given the (very strict) anonymous rate limits on the metadata API, most of these calls return 504 and just generate continuous warnings.

The "Cached fallback unavailable on Kubernetes (no Docker daemon)" text is also confusing: it implies a feature is missing on K8s rather than that the deployment doesn't need it.

2. User-facing endpoints become slow / fail when Docker Hub throttles

When the rate limit is in effect, user-facing endpoints that also call Docker Hub become very slow or return 500:

[GIN] 2026/05/21 - 15:36:48 | 500 |   4m0s    | GET  /api/v1/project/123/destinations/versions?type=iceberg
[GIN] 2026/05/21 - 15:38:55 | 500 |   30.1s   | GET  /api/v1/project/123/sources/versions?type=mongodb
[GIN] 2026/05/21 - 15:40:49 | 500 |   4m0s    | GET  /api/v1/project/123/destinations/versions?type=iceberg
[GIN] 2026/05/21 - 20:58:23 | 200 |  15.58s   | POST /api/v1/project/123/destinations/spec

POST /destinations/spec (server/internal/services/etl/destination.go:264) calls utils.GetDriverImageTags(...) before doing the actual work, even though the existing comment on line 263 (// TODO: cache spec in db for each version) already acknowledges that the spec should be cached.

The four-minute hangs are particularly bad — the user clicks a tab and the UI just spins until they assume something's broken.
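
To illustrate what the caching TODO could look like, here is a minimal Go sketch of a per-version spec cache. Everything in it is an assumption for illustration: the names specKey, specCache, newSpecCache, and get are hypothetical and not taken from the OLake codebase, and an in-memory map stands in for the database table the TODO refers to.

```go
package main

import "sync"

// specKey identifies one connector spec by driver type and version.
// Hypothetical type, not taken from the OLake codebase.
type specKey struct {
	driver  string
	version string
}

// specCache is an in-memory stand-in for a DB-backed spec cache.
type specCache struct {
	mu    sync.Mutex
	specs map[specKey]string
}

func newSpecCache() *specCache {
	return &specCache{specs: make(map[specKey]string)}
}

// get returns the cached spec for (driver, version). On a miss it calls
// fetch once and stores the result. The lock is held during fetch, so
// concurrent callers are serialized rather than fetching in parallel.
func (c *specCache) get(driver, version string, fetch func() (string, error)) (string, error) {
	key := specKey{driver: driver, version: version}

	c.mu.Lock()
	defer c.mu.Unlock()

	if spec, ok := c.specs[key]; ok {
		return spec, nil
	}
	spec, err := fetch()
	if err != nil {
		// Failures are not cached, so a later call can retry.
		return "", err
	}
	c.specs[key] = spec
	return spec, nil
}
```

With something like this in front of the spec lookup, only the first request for a given version would pay the slow path; later requests would not depend on Docker Hub being reachable.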

3. No way to disable connector discovery

There's no config flag to disable Docker Hub polling, and useStandardResources: true / K8s mode doesn't gate it. Deployments that only use Fusion (or pin specific connector versions, or run air-gapped against a private registry mirror) get no benefit from the polling but pay the full cost.

Suggested fixes (in order of value)

  1. Add a connectors.discoveryEnabled (or similar) chart/server config flag that disables the background polling and the eager Docker Hub calls in /spec, /versions, and /releases. Default true for backwards compat; set to false for Fusion-only deployments. Probably worth defaulting false when running on K8s where the "Docker daemon cached fallback" path doesn't apply anyway.

  2. Back off on 504s instead of polling every 30s. Exponential backoff after a few consecutive failures would reduce log noise dramatically without any feature change.

  3. Cache driver specs in postgres (the existing TODO at destination.go:263). One-shot fetch on first use; subsequent calls served from DB. This makes the user-facing slowness a one-time cost per version and survives Docker Hub being down.

  4. Allow pointing connector discovery at a custom registry/mirror. Useful for air-gapped or rate-limit-managed deployments — same shape as image.repository overrides but for the discovery API. (Helm chart side; would need server changes too.)

  5. Reword the K8s warning. "Cached fallback unavailable on Kubernetes (no Docker daemon)" is technically accurate but reads like a deficiency. Something like "Docker Hub query failed; no local cache available in K8s mode (connector discovery unavailable; this is expected if connectors are unused)" would be less alarming.
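
For fix (2), the backoff schedule could be as simple as the following Go sketch. The function name backoffDelay and its parameters are hypothetical; the 30s base interval matches the polling observed above, while the failure threshold and the cap are assumptions to be tuned.

```go
package main

import "time"

// backoffDelay returns how long to wait before the next poll.
// Up to `threshold` consecutive failures the normal interval `base` is
// used; each failure beyond that doubles the delay, capped at `max`.
// Hypothetical helper, not part of the OLake codebase.
func backoffDelay(base, max time.Duration, failures, threshold int) time.Duration {
	if failures <= threshold {
		return base
	}
	d := base
	for i := threshold; i < failures; i++ {
		d *= 2
		if d >= max {
			// Returning early also avoids overflow for large failure counts.
			return max
		}
	}
	return d
}
```

With base = 30s, max = 30m, and threshold = 3, the poller would keep its current cadence for the first three failures, then wait 60s, 120s, 240s, and so on, settling at one attempt every 30 minutes while Docker Hub stays unreachable. The failure counter would be reset on the first successful response.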

I'm happy to send a PR for (1) — gate the polling and the eager /spec and /versions Docker-Hub calls behind a config flag — if there's interest.
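
A rough Go sketch of what the gate in (1) might look like, to make the proposal concrete. The environment variable name OLAKE_CONNECTOR_DISCOVERY_ENABLED and the helpers discoveryEnabled and imageTags are hypothetical; the real wiring would depend on how the server reads its config.

```go
package main

import "strconv"

// Hypothetical environment variable that the chart value
// connectors.discoveryEnabled could be mapped to.
const discoveryEnvVar = "OLAKE_CONNECTOR_DISCOVERY_ENABLED"

// discoveryEnabled reports whether Docker Hub discovery should run.
// It defaults to true when the variable is unset or unparsable, which
// preserves current behaviour. lookup has the same signature as
// os.LookupEnv so it can be swapped out in tests.
func discoveryEnabled(lookup func(string) (string, bool)) bool {
	raw, ok := lookup(discoveryEnvVar)
	if !ok {
		return true
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return true
	}
	return enabled
}

// imageTags calls fetch only when discovery is enabled. When disabled it
// returns no tags and no error, without any network call.
func imageTags(enabled bool, fetch func() ([]string, error)) ([]string, error) {
	if !enabled {
		return nil, nil
	}
	return fetch()
}
```

In the server, discoveryEnabled(os.LookupEnv) would be evaluated once at startup, and both the background poller and the /spec and /versions handlers would consult the result before touching Docker Hub.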

Environment

  • olake/olake Helm chart 0.0.18 on EKS
  • Fusion enabled (fusion.enabled: true), no CDC source/destination configured
  • Polaris (Snowflake-hosted) as the Iceberg REST catalog
  • Outbound to registry-1.docker.io (image pulls) works; outbound to hub.docker.com (metadata API) intermittently 504s
