
Option to deploy Trino workers as StatefulSet (needed for unique per‑pod FQDN under Istio STRICT mTLS) #355

@YueZhaoDreams

Background

  • I'm working from a customized build of 1.39.1.
  • The namespace‑level PeerAuthentication must be set to STRICT.
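
For reference, the STRICT requirement corresponds to a namespace‑wide policy roughly like the one below. This is a minimal sketch: the namespace name is a placeholder, and newer Istio releases also accept `security.istio.io/v1`.

```yaml
# Namespace-wide mTLS policy (sketch; "trino" is a hypothetical namespace)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: trino
spec:
  mtls:
    mode: STRICT   # sidecars reject plaintext traffic for every workload in the namespace
```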

Problem statement

Under STRICT mTLS each Trino worker must be addressed by a DNS name that Envoy can map to a SPIFFE workload identity.
With the current Deployment‑based worker template this is not possible:

  1. spec.hostname is identical across all replicas, so CoreDNS returns several A records (one IP per replica, served round‑robin) for the same name.
  2. Trino caches whichever IP it resolves first, which may point to the wrong replica or change after a restart.
  3. In some cases the worker advertises its Pod IP; Envoy cannot map an IP to a mesh identity → mTLS handshake fails → coordinator logs
    503 UC: upstream connect error or disconnect before headers.

Attempts to solve this via DestinationRule / PeerAuthentication exceptions either break the STRICT requirement (PERMISSIVE) or introduce plaintext hops.

Why StatefulSet helps

A StatefulSet gives every replica a stable ordinal hostname
(<setName>-0, <setName>-1, …), and CoreDNS publishes one A record per pod:

trino-worker-0.trino-worker-headless.<ns>.svc A 172.20.9.117
trino-worker-1.trino-worker-headless.<ns>.svc A 172.20.9.182

That guarantees a deterministic node.id ⇆ host ⇆ pod‑IP mapping and Envoy
can complete the mTLS handshake without custom traffic policies.
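
A minimal sketch of the two objects that would produce the records above. The Service and StatefulSet names are taken from the DNS names quoted earlier; the `app: trino-worker` label, the image reference, the replica count and port 8080 are assumptions for illustration, not values from the chart.

```yaml
# Headless governing Service: no virtual IP, DNS answers with per-pod records
apiVersion: v1
kind: Service
metadata:
  name: trino-worker-headless
spec:
  clusterIP: None
  selector:
    app: trino-worker          # hypothetical label
  ports:
    - name: http
      port: 8080               # assumed Trino HTTP port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: trino-worker
spec:
  serviceName: trino-worker-headless   # ties pod hostnames to the headless Service
  replicas: 2
  selector:
    matchLabels:
      app: trino-worker
  template:
    metadata:
      labels:
        app: trino-worker
    spec:
      containers:
        - name: trino-worker
          image: trinodb/trino         # tag omitted here; pin one in practice
          ports:
            - name: http
              containerPort: 8080
```

With `serviceName` set, each pod becomes resolvable as `<pod-name>.trino-worker-headless.<ns>.svc`, which is the per‑pod name the mesh needs.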

Feature request

  • Add a chart value, e.g.

    worker:
      controller: statefulset   # or "deployment" (default)

    which:

      • Creates a headless Service (clusterIP: None) named <fullname>-worker-governing (or similar).
      • Generates a StatefulSet for workers when the flag is set, preserving all existing env/volume/lifecycle options from the current Deployment template.
      • Leaves coordinator and other components unchanged.

  • Document how to set

-Dnode.id=$(HOSTNAME)
-Dnode.internal-address=$(HOSTNAME).<governing-svc>.$(POD_NAMESPACE).svc.cluster.local

so the chart works in both mesh and non‑mesh clusters.
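
One possible way to make these values available inside the worker container is sketched below. It rests on an assumption worth verifying: as far as I understand, Kubernetes only expands `$(VAR)` references for variables declared in the container's own `env` list, so the pod name is injected explicitly through the downward API instead of relying on the runtime‑provided HOSTNAME. `POD_NAME`, `GOVERNING_SVC` and `NODE_INTERNAL_ADDRESS` are hypothetical variable names, and how they are finally passed to the JVM is left to the chart.

```yaml
# Fragment of a worker pod template (sketch; not taken from the chart)
containers:
  - name: trino-worker
    env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name        # e.g. trino-worker-0 under a StatefulSet
      - name: POD_NAMESPACE
        valueFrom:
          fieldRef:
            fieldPath: metadata.namespace
      - name: GOVERNING_SVC                 # hypothetical; would come from the chart's fullname helper
        value: trino-worker-headless
      # Dependent variable: references only resolve against entries declared earlier in this list
      - name: NODE_INTERNAL_ADDRESS
        value: "$(POD_NAME).$(GOVERNING_SVC).$(POD_NAMESPACE).svc.cluster.local"
```

Under a Deployment, `POD_NAME` would carry the generated pod name, which has no matching DNS record, so this wiring only yields a resolvable address in the StatefulSet case.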

Questions for the maintainers

  • Is supporting StatefulSet for workers acceptable in the upstream chart?
  • Any objections to the worker.controller switch, or is a separate chart preferred?
  • Are there edge‑cases (autoscaling, KEDA, gracefulShutdown hooks) we should account for when converting the template?

Thanks for your time and for maintaining the Trino Helm charts!
