Summary
The UI server hard-couples several code paths to hub.docker.com (the metadata API, not the registry), which is fragile in any deployment where outbound to Docker Hub is rate-limited, slow, or restricted — and is dead weight in Fusion-only deployments that never run CDC sync jobs.
I'm running OLake on Kubernetes purely as the Iceberg maintenance layer (Fusion + Spark optimizer) against an existing Polaris catalog. No olakeWorker CDC, no source / destination connectors. Observed three classes of issue:
1. Continuous background polling generates log noise
Every ~30s the UI polls Docker Hub for tags on every CDC source connector image, even when CDC is unused:
2026-05-21T15:41:22Z WRN failed to fetch image tags online for olakego/source-mysql:
docker hub api request failed with status code: 504.
Cached fallback unavailable on Kubernetes (no Docker daemon)
2026-05-21T15:41:52Z WRN failed to fetch image tags online for olakego/source-postgres: …
2026-05-21T15:42:22Z WRN failed to fetch image tags online for olakego/source-oracle: …
2026-05-21T15:42:52Z WRN failed to fetch image tags online for olakego/source-mongodb: …
… (also kafka, s3, db2, mssql)
Eight connectors × ~30s polling = ~16 calls/min hitting Docker Hub. With the (very strict) anonymous metadata-API rate limits, these mostly 504 and just generate continuous warnings.
The "Cached fallback unavailable on Kubernetes (no Docker daemon)" text is also confusing: it implies a feature is missing on K8s, when in fact the deployment simply doesn't need it.
2. User-facing endpoints become slow / fail when Docker Hub throttles
When the rate limit is in effect, user-facing endpoints that also call Docker Hub become very slow or 500:
[GIN] 2026/05/21 - 15:36:48 | 500 | 4m0s | GET /api/v1/project/123/destinations/versions?type=iceberg
[GIN] 2026/05/21 - 15:38:55 | 500 | 30.1s | GET /api/v1/project/123/sources/versions?type=mongodb
[GIN] 2026/05/21 - 15:40:49 | 500 | 4m0s | GET /api/v1/project/123/destinations/versions?type=iceberg
[GIN] 2026/05/21 - 20:58:23 | 200 | 15.58s | POST /api/v1/project/123/destinations/spec
POST /destinations/spec (server/internal/services/etl/destination.go:264) calls utils.GetDriverImageTags(...) before doing the actual work — even though the existing // TODO: cache spec in db for each version (line 263) acknowledges this.
The four-minute hangs are particularly bad — the user clicks a tab and the UI just spins until they assume something's broken.
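To make the TODO concrete, here is a minimal sketch of a per-version spec cache that only takes the slow path on a miss. Everything in it is an assumption for illustration: SpecFetcher, SpecCache, and NewSpecCache are hypothetical names, the in-memory map stands in for a Postgres table, and the real spec-fetching code in destination.go may look quite different.

```go
package main

import "sync"

// SpecFetcher is a hypothetical stand-in for the slow path that resolves a
// connector spec (the path that currently ends up talking to Docker Hub).
type SpecFetcher func(connectorType, version string) (string, error)

// SpecCache keeps one spec per (connector type, version) pair.
// The map is a placeholder for a database table in this sketch.
type SpecCache struct {
	mu    sync.Mutex
	specs map[string]string
	fetch SpecFetcher
}

// NewSpecCache wraps fetch with a cache.
func NewSpecCache(fetch SpecFetcher) *SpecCache {
	return &SpecCache{specs: make(map[string]string), fetch: fetch}
}

// Get returns the cached spec if present and calls fetch only on a miss.
// Failed fetches are not cached, so a later call can retry.
func (c *SpecCache) Get(connectorType, version string) (string, error) {
	key := connectorType + "@" + version

	c.mu.Lock()
	spec, ok := c.specs[key]
	c.mu.Unlock()
	if ok {
		return spec, nil
	}

	// The lock is released during the fetch so other keys are not blocked.
	spec, err := c.fetch(connectorType, version)
	if err != nil {
		return "", err
	}

	c.mu.Lock()
	c.specs[key] = spec
	c.mu.Unlock()
	return spec, nil
}
```

With something along these lines, the first request for a given version still pays the full latency, but repeat requests would no longer depend on Docker Hub being reachable.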
3. No way to disable connector discovery
There's no config flag to disable Docker Hub polling, and useStandardResources: true / K8s mode doesn't gate it. Deployments that only use Fusion (or pin specific connector versions, or run air-gapped against a private registry mirror) get no benefit from the polling but pay the full cost.
Suggested fixes (in order of value)
1. Add a connectors.discoveryEnabled (or similar) chart/server config flag that disables the background polling and the eager Docker Hub calls in /spec, /versions, and /releases. Default true for backwards compat; set to false for Fusion-only deployments. Probably worth defaulting to false when running on K8s, where the "Docker daemon cached fallback" path doesn't apply anyway.
2. Back off on 504s instead of polling every 30s. Exponential backoff after a few consecutive failures would reduce log noise dramatically without any feature change.
3. Cache driver specs in postgres (the existing TODO at destination.go:263). One-shot fetch on first use; subsequent calls served from DB. This makes the user-facing slowness a one-time cost per version and survives Docker Hub being down.
4. Allow pointing connector discovery at a custom registry/mirror. Useful for air-gapped or rate-limit-managed deployments; same shape as image.repository overrides but for the discovery API. (Helm chart side; would need server changes too.)
5. Reword the K8s warning. "Cached fallback unavailable on Kubernetes (no Docker daemon)" is technically accurate but reads like a deficiency. Something like "Docker Hub query failed; no local cache available in K8s mode; connector discovery unavailable (expected if connectors are unused)" would be less alarming.
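For the backoff suggestion, a rough sketch of the delay calculation could look like the following. The function name, the grace threshold of three failures, and the doubling factor are all assumptions picked for illustration, not anything taken from the OLake code.

```go
package main

import "time"

// graceFailures is a hypothetical threshold: this many consecutive failures
// are tolerated at the normal interval before backoff starts.
const graceFailures = 3

// nextPollDelay returns how long to wait before polling Docker Hub again.
// base is the normal polling interval, maxDelay caps the backoff, and
// failures is the number of consecutive failed polls (0 after a success).
func nextPollDelay(base, maxDelay time.Duration, failures int) time.Duration {
	if failures <= graceFailures {
		return base
	}
	delay := base
	// Double the delay once for every failure beyond the grace threshold,
	// stopping as soon as the cap is reached to avoid overflow.
	for i := graceFailures; i < failures; i++ {
		delay *= 2
		if delay >= maxDelay {
			return maxDelay
		}
	}
	return delay
}
```

With base = 30s and maxDelay = 30m, the fourth consecutive failure would wait 60s, the fifth 2m, and so on up to the cap; a single success resets the counter and the interval returns to 30s.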
I'm happy to send a PR for (1) — gate the polling and the eager /spec and /versions Docker-Hub calls behind a config flag — if there's interest.
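Roughly what I have in mind for that PR, as a sketch only: the env var name OLAKE_CONNECTOR_DISCOVERY_ENABLED, the TagFetcher type, and the gate helper are all hypothetical, and TagFetcher is just a stand-in for a function like utils.GetDriverImageTags, whose real signature I haven't reproduced here.

```go
package main

import (
	"errors"
	"strconv"
)

// ErrDiscoveryDisabled is returned instead of calling Docker Hub when
// connector discovery has been switched off.
var ErrDiscoveryDisabled = errors.New("connector discovery disabled")

// discoveryEnabled reads a hypothetical OLAKE_CONNECTOR_DISCOVERY_ENABLED
// setting through getenv (pass os.Getenv in real code). Unset or unparsable
// values fall back to true, which preserves the current behaviour.
func discoveryEnabled(getenv func(string) string) bool {
	v := getenv("OLAKE_CONNECTOR_DISCOVERY_ENABLED")
	if v == "" {
		return true
	}
	enabled, err := strconv.ParseBool(v)
	if err != nil {
		return true
	}
	return enabled
}

// TagFetcher is a hypothetical stand-in for the Docker Hub tag lookup.
type TagFetcher func(image string) ([]string, error)

// gate wraps fetch so that it fails fast, without any network call,
// when discovery is disabled.
func gate(enabled bool, fetch TagFetcher) TagFetcher {
	return func(image string) ([]string, error) {
		if !enabled {
			return nil, ErrDiscoveryDisabled
		}
		return fetch(image)
	}
}
```

The background poller would skip its loop entirely when the flag is false, and the /spec and /versions handlers would treat ErrDiscoveryDisabled as "no remote versions available" rather than a 500.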
Environment
- olake/olake Helm chart 0.0.18 on EKS
- Fusion enabled (fusion.enabled: true), no CDC source/destination configured
- Polaris (Snowflake-hosted) as the Iceberg REST catalog
- Outbound to registry-1.docker.io (image pulls) works; outbound to hub.docker.com (metadata API) intermittently 504s