fix(8.9): orchestration startup blocks on unreachable secondary-storage schema init (intermittent nosec failures) #6347

## Summary

While fixing the consistent `8.9 - nosec - install - gke` (`noSecondaryStorage`) CI failure (see #6346), a deeper, latent robustness bug surfaced that is worth investigating on its own: **when the orchestration app is configured with a secondary-storage exporter/schema target that is unreachable, the Spring bootstrap blocks on `SchemaManager` "init schema" retries and never binds the `:9600` management port.** As a result, the kubelet startup probe gets `connection refused`, the container is killed and restart-loops, and Helm `--wait` times out.

The failure is **intermittent**, which is why it went unnoticed for a while:
- Run [26953963891](https://github.com/camunda/camunda-platform-helm/actions/runs/26953963891/job/79526281793) (2026-06-04): **passed** in 4m43s with the misconfigured values.
- Runs [26999716632](https://github.com/camunda/camunda-platform-helm/actions/runs/26999716632) and [26998647996](https://github.com/camunda/camunda-platform-helm/actions/runs/26998647996) (2026-06-05): **failed** with `context deadline exceeded`.

## Controlled reproduction (GKE, back-to-back, same cluster)

| Observation | Misconfigured (exporter → absent ES) | Corrected (no exporter) |
|---|---|---|
| helm install | timeout, FAIL in 15m30s | success in 3m18s |
| broker pods | `0/1 Running`, restarts | `1/1 Running`, 0 restarts |
| startup probe | `:9600 connection refused` ×55 over 14m | healthy |
| `init schema` | retried to attempt 29 | n/a |

Failure signature (broker log lines followed by kubelet events):
```
io.camunda.search.schema.SchemaManager - Schema creation is enabled. Start Schema management.
RetryDecorator - Retrying operation for 'init schema': attempt 29. Message: Failed to check existence of index ...
io.camunda.zeebe.broker.exporter - Failed to open exporter 'camundaexporter'. Retrying...
Startup probe failed: dial tcp :9600: connect: connection refused
Container orchestration failed startup probe, will be restarted
```

#6346 fixes the immediate CI scenario by not pointing the exporter at a non-existent backend. But the underlying behavior is a bug regardless of that scenario.
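
For contrast with the signature above, here is a minimal Python sketch of a more forgiving startup: the management endpoint binds first and answers `503` while schema init keeps retrying in the background. `SchemaInit`, `check`, and `probe` are hypothetical names used only for illustration. This is not the `SchemaManager` or Spring Boot code, and it models the pattern rather than proposing a concrete fix.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class SchemaInit:
    """Hypothetical stand-in for schema init; retries until `check` succeeds."""

    def __init__(self, check, retry_delay_s=0.05):
        self._check = check
        self._retry_delay_s = retry_delay_s
        self.ready = threading.Event()
        self.attempts = 0

    def run(self):
        while not self.ready.is_set():
            self.attempts += 1
            try:
                self._check()  # raises ConnectionError while storage is down
                self.ready.set()
            except ConnectionError:
                time.sleep(self._retry_delay_s)


def make_handler(init):
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Report "not ready" instead of refusing the connection.
            self.send_response(200 if init.ready.is_set() else 503)
            self.end_headers()

        def log_message(self, *args):
            pass  # keep the example output quiet

    return Handler


def probe(port):
    """Return the HTTP status a startup probe would see on this port."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=2)
    try:
        conn.request("GET", "/")
        return conn.getresponse().status
    finally:
        conn.close()


backend_up = threading.Event()  # simulates secondary-storage reachability


def check():
    if not backend_up.is_set():
        raise ConnectionError("secondary storage unreachable")


init = SchemaInit(check)
# Bind the management port before init starts (port 0 picks a free port).
server = HTTPServer(("127.0.0.1", 0), make_handler(init))
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()
threading.Thread(target=init.run, daemon=True).start()

while_unreachable = probe(port)  # storage down: endpoint answers 503
backend_up.set()                 # storage becomes reachable
init.ready.wait(timeout=5)
after_recovery = probe(port)     # init finished: endpoint answers 200

server.shutdown()
server.server_close()
print(while_unreachable, after_recovery)
```

In this model the probe never sees `connection refused`, so an operator can tell "process is up but waiting on storage" apart from "process never started". Whether the kubelet should still restart the container in that state is a separate decision.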

## Questions to investigate

1. Should `SchemaManager` init run on the bootstrap/startup path synchronously at all, or should it be async / retried in the background so the management server can bind `:9600` and report an `unhealthy` (503) startup state instead of refusing connections?
2. Should an unreachable/misconfigured secondary storage **fail fast with a clear error** rather than retry indefinitely behind a blocked startup?
3. Is the startup probe budget (`failureThreshold: 30`, `periodSeconds: 10`, `initialDelaySeconds: 30`, so roughly 30 + 30 × 10 = 330s) appropriate, and should `connection refused` be distinguished from `503`?
4. Why is the failure intermittent? Determine the timing/race that let the 2026-06-04 run pass with the same config (e.g., schema-check timeout vs. probe budget, node CPU at startup).
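
The probe budget in question 3 can be written out as a small Python sketch. The formula is the approximation used in this issue (initial delay plus threshold times period), not an exact model of kubelet timing, and `StartupProbe` and `classify` are illustrative names. The `classify` helper only labels outcomes for diagnostics: for an HTTP probe the kubelet treats any status from 200 to 399 as success and everything else, including a refused connection, as a failure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StartupProbe:
    initial_delay_seconds: int
    period_seconds: int
    failure_threshold: int

    def budget_seconds(self) -> int:
        """Approximate time before the container is restarted."""
        return self.initial_delay_seconds + self.failure_threshold * self.period_seconds


def classify(status: Optional[int]) -> str:
    """Label one probe attempt; None models a refused TCP connection."""
    if status is None:
        return "refused"    # nothing is listening on the port yet
    if 200 <= status < 400:
        return "healthy"    # counted as probe success
    return "unhealthy"      # listening, but reporting not ready (e.g. 503)


probe_config = StartupProbe(
    initial_delay_seconds=30, period_seconds=10, failure_threshold=30
)
print(probe_config.budget_seconds())                 # 330
print(classify(None), classify(503), classify(200))  # refused unhealthy healthy
```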

## Notes
- Likely spans the Helm chart (probe config) and the orchestration app (`io.camunda.search.schema.SchemaManager`, exporter open lifecycle) in camunda-monorepo, so it may need a cross-repo issue/transfer.
- Immediate CI fix: #6346. This issue tracks the robustness/flakiness root cause behind it.
