Add StatefulSet support for Trino coordinator and workers#392

Open
hpopuri2 wants to merge 1 commit into trinodb:main from hpopuri2:statefulset

hpopuri2 commented Jan 13, 2026

StatefulSet Support for Trino Helm Chart

Summary

This PR adds StatefulSet support for Trino coordinators and workers while maintaining Deployment as the default behavior.

Changes

Core Changes

  1. New Templates:

    • charts/trino/templates/statefulset-coordinator.yaml - StatefulSet template for coordinator
    • charts/trino/templates/statefulset-worker.yaml - StatefulSet template for workers
  2. Modified Templates:

    • charts/trino/templates/deployment-coordinator.yaml - Added conditional rendering (rendered only when StatefulSet is disabled)
    • charts/trino/templates/deployment-worker.yaml - Added conditional rendering (rendered only when StatefulSet is disabled)
  3. Configuration (charts/trino/values.yaml):

    • Added coordinator.statefulset configuration block
    • Added worker.statefulset configuration block
    • Default: statefulset.enabled: false (Deployments are the default)
    • Supports volumeClaimTemplates for persistent storage
    • Supports pod management policies (OrderedReady/Parallel)
  4. Documentation:

    • charts/trino/README.md - Updated with StatefulSet documentation
    • charts/trino/README.md.gotmpl - Added StatefulSet section
    • charts/trino/STATEFULSET_TESTING.md - Comprehensive testing guide
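The mutually exclusive rendering described in the Changes list can be sketched with Go's text/template package, which Helm templates are built on. The two template strings are hypothetical, heavily trimmed stand-ins for deployment-worker.yaml and statefulset-worker.yaml (the real templates also use Helm/Sprig functions such as merge and omit, which plain text/template lacks), and the release name my-trino is taken from the testing commands. Because both templates test worker.statefulset.enabled with opposite conditions, each setting renders exactly one workload kind.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// Hypothetical, heavily trimmed stand-ins for the two worker templates.
// Both read the same flag with opposite conditions.
const deploymentWorker = `{{- if not .Values.worker.statefulset.enabled }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-worker
{{- end }}`

const statefulsetWorker = `{{- if .Values.worker.statefulset.enabled }}
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: {{ .Release.Name }}-worker
{{- end }}`

// render executes both templates against the same values, roughly the way
// Helm renders every file under templates/ and concatenates the results.
func render(statefulsetEnabled bool) (string, error) {
	data := map[string]any{
		"Release": map[string]any{"Name": "my-trino"},
		"Values": map[string]any{
			"worker": map[string]any{
				"statefulset": map[string]any{"enabled": statefulsetEnabled},
			},
		},
	}
	var out strings.Builder
	for _, src := range []string{deploymentWorker, statefulsetWorker} {
		tmpl, err := template.New("worker").Parse(src)
		if err != nil {
			return "", err
		}
		if err := tmpl.Execute(&out, data); err != nil {
			return "", err
		}
	}
	return out.String(), nil
}

// hasKind reports whether the rendered output contains a manifest of the given kind.
func hasKind(manifest, kind string) bool {
	return strings.Contains(manifest, "kind: "+kind+"\n")
}

func main() {
	for _, enabled := range []bool{false, true} {
		manifest, err := render(enabled)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("statefulset.enabled=%t -> Deployment=%t StatefulSet=%t\n",
			enabled, hasKind(manifest, "Deployment"), hasKind(manifest, "StatefulSet"))
	}
}
```

Running it prints Deployment=true StatefulSet=false for the default and the reverse when the flag is set, which is the behavior Test 1 and Test 2 check by hand.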

Test Files (Optional)

  • statefulset-values.yaml - Example values file for StatefulSet mode
  • tests/gateway/test-query-history-values.yaml - Gateway test values

Features

StatefulSet Benefits

  • Persistent storage per pod using volumeClaimTemplates
  • Unique per-pod FQDN (required for Istio STRICT mTLS)
  • Stable pod identities with predictable naming (e.g., trino-worker-0)
  • Ordered or parallel pod management for controlled rollouts
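The stable identities listed above follow Kubernetes' documented StatefulSet naming rules: pods are named <statefulset>-<ordinal>, each pod gets a DNS name under the StatefulSet's governing headless Service, and each volumeClaimTemplate yields a PVC named <template>-<pod>. A minimal Go sketch, assuming default ordinals starting at 0 and the default cluster.local domain. The StatefulSet name my-trino-worker, the Service name my-trino-worker-headless, and the namespace default are hypothetical, since the PR text does not say which headless Service the chart creates; the claim name data comes from the usage examples.

```go
package main

import "fmt"

// podIdentity is the stable identity Kubernetes gives one StatefulSet pod.
type podIdentity struct {
	PodName string
	FQDN    string
	PVCName string
}

// identities derives names for ordinals 0..replicas-1. Assumes a headless
// governing Service and default (zero-based) ordinals.
func identities(stsName, headlessSvc, namespace, clusterDomain, claimName string, replicas int) []podIdentity {
	ids := make([]podIdentity, 0, replicas)
	for ordinal := 0; ordinal < replicas; ordinal++ {
		pod := fmt.Sprintf("%s-%d", stsName, ordinal)
		ids = append(ids, podIdentity{
			PodName: pod,
			// <pod>.<service>.<namespace>.svc.<cluster domain>
			FQDN: fmt.Sprintf("%s.%s.%s.svc.%s", pod, headlessSvc, namespace, clusterDomain),
			// <volumeClaimTemplate name>-<pod name>
			PVCName: fmt.Sprintf("%s-%s", claimName, pod),
		})
	}
	return ids
}

func main() {
	// Hypothetical names for a release called my-trino.
	for _, id := range identities("my-trino-worker", "my-trino-worker-headless", "default", "cluster.local", "data", 2) {
		fmt.Printf("%s  %s  %s\n", id.PodName, id.FQDN, id.PVCName)
	}
}
```

Because the PVC name is derived from the pod name, a rescheduled my-trino-worker-1 binds to the same data-my-trino-worker-1 claim, which is what makes the storage "per pod".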

Deployment Mode (Default)

  • ✅ Maintains existing behavior
  • ✅ No breaking changes
  • ✅ Deployments are created by default when StatefulSet is not enabled

Testing

Test 1: Default Deployment Mode ✅

helm install my-trino charts/trino

Result: Creates Deployments (not StatefulSets), pods have random suffixes, no PVCs

Test 2: StatefulSet Mode ✅

helm install my-trino charts/trino -f statefulset-values.yaml

Result: Creates StatefulSets, pods have ordinal suffixes (-0, -1), PVCs created and bound
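Test 1 and Test 2 tell the two modes apart by pod-name suffix. A small Go helper can make that check explicit. It relies on a heuristic assumption about naming conventions: StatefulSet pods end in -<ordinal>, while Deployment pods end in -<pod-template-hash>-<5-character random suffix>. The pod names below are hypothetical examples for a release called my-trino; in practice they would come from kubectl get pods.

```go
package main

import (
	"fmt"
	"regexp"
)

// Heuristic patterns (an assumption, not an exact Kubernetes contract).
var (
	statefulSetPod = regexp.MustCompile(`^(.+)-(\d+)$`)
	deploymentPod  = regexp.MustCompile(`^(.+)-([a-z0-9]+)-([a-z0-9]{5})$`)
)

// classify reports which controller most likely created a pod, given the
// expected workload name (e.g. "my-trino-worker").
func classify(podName, workload string) string {
	// StatefulSet: <workload>-<ordinal>
	if m := statefulSetPod.FindStringSubmatch(podName); m != nil && m[1] == workload {
		return "StatefulSet"
	}
	// Deployment: <workload>-<pod-template-hash>-<random suffix>
	if m := deploymentPod.FindStringSubmatch(podName); m != nil && m[1] == workload {
		return "Deployment"
	}
	return "unknown"
}

func main() {
	for _, name := range []string{"my-trino-worker-0", "my-trino-worker-7d9f8c6b5-x2k9p"} {
		fmt.Printf("%s: %s\n", name, classify(name, "my-trino-worker"))
	}
}
```

Matching the workload name as well as the suffix keeps a Deployment pod whose random suffix happens to be all digits from being misread as a StatefulSet ordinal.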

Usage Examples

Enable StatefulSet for Workers Only

worker:
  statefulset:
    enabled: true
    volumeClaimTemplates:
      - metadata:
          name: data
        mountPath: /data/trino
        spec:
          accessModes:
            - ReadWriteOnce
          storageClassName: gp3
          resources:
            requests:
              storage: 10Gi

Enable StatefulSet for Both Coordinator and Workers

coordinator:
  statefulset:
    enabled: true
    volumeClaimTemplates:
      - metadata:
          name: data
        mountPath: /data/trino
        spec:
          accessModes:
            - ReadWriteOnce
          storageClassName: gp3
          resources:
            requests:
              storage: 5Gi

worker:
  statefulset:
    enabled: true
    podManagementPolicy: Parallel
    volumeClaimTemplates:
      - metadata:
          name: data
        mountPath: /data/trino
        spec:
          accessModes:
            - ReadWriteOnce
          storageClassName: gp3
          resources:
            requests:
              storage: 10Gi

Backward Compatibility

  • No breaking changes: Existing deployments continue to work
  • Default behavior unchanged: Deployments are still the default
  • Opt-in feature: StatefulSet must be explicitly enabled

Migration Path

Users can migrate from Deployment to StatefulSet by:

  1. Uninstalling the existing release
  2. Re-installing with statefulset.enabled: true

Note: volumeClaimTemplates cannot be modified on an existing StatefulSet, so changing them later requires recreating the StatefulSet.

Files to Review

  • charts/trino/values.yaml - Configuration defaults
  • charts/trino/templates/statefulset-coordinator.yaml - New template
  • charts/trino/templates/statefulset-worker.yaml - New template
  • charts/trino/templates/deployment-coordinator.yaml - Modified with conditionals
  • charts/trino/templates/deployment-worker.yaml - Modified with conditionals
  • charts/trino/README.md - Updated documentation
  • charts/trino/STATEFULSET_TESTING.md - Testing guide

Tested locally: [screenshot]

Notes

  • The statefulset-values.yaml in the root directory is an example file for testing purposes and can be excluded from the PR if desired
  • Please let me know whether README.md should be autogenerated from README.md.gotmpl.

This commit adds support for deploying Trino coordinators and workers
as StatefulSets instead of Deployments, enabling persistent storage and
stable pod identities required for advanced use cases like Istio STRICT mTLS.

Key Features:
- Persistent storage per pod using volumeClaimTemplates
- Unique per-pod FQDN for service mesh compatibility
- Stable pod identities with predictable naming (e.g., trino-worker-0)
- Configurable pod management policies (OrderedReady/Parallel)
- Backward compatible - Deployments remain the default

Implementation:
- Add statefulset-coordinator.yaml template
- Add statefulset-worker.yaml template
- Add conditional rendering to existing deployment templates
- Add statefulset configuration blocks to values.yaml (disabled by default)
- Update README with StatefulSet documentation and examples
- Add STATEFULSET_TESTING.md with comprehensive testing guide

Breaking Changes:
None - StatefulSet support is opt-in via configuration

cla-bot (bot) commented Jan 13, 2026

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

nineinchnick (Member) left a comment

What are the use cases for stateful sets in Trino? How are the pods supposed to recover if they enter into an unknown state, for example, when a Kubernetes node crashes?

@@ -0,0 +1,425 @@
# StatefulSet Testing Guide
Member:

This is very verbose, I don't really want to read all of this. Testing should be done in tests/trino/test.sh in the first place. Any manual testing should be done as an exception, with a very good reason.

@@ -0,0 +1,319 @@
{{- if .Values.coordinator.statefulset.enabled -}}
{{- $coordinatorJmx := merge .Values.jmx.coordinator (omit .Values.jmx "coordinator" "worker") -}}
Member:

If this largely duplicates the deployment, I don't want to maintain both. Can you figure out how to deduplicate them?

Author:

You mean having a common template that both the Deployment and the StatefulSet use? I think that would deduplicate them.

hpopuri2 (Author) commented Jan 15, 2026

> What are the use cases for stateful sets in Trino? How are the pods supposed to recover if they enter into an unknown state, for example, when a Kubernetes node crashes?

This is one of the threads where I saw people trying to achieve this: https://trinodb.slack.com/archives/C0305TQ05KL/p1767619191984709

Primary Use Cases:

Stable Caching: Useful for Hive/Iceberg metadata caching or disk-based exchange spooling (fault-tolerant execution), where data persists across pod restarts.
Benefits of persistent caching in Trino:

  1. Spill-to-disk performance - Large queries can spill intermediate results to local disk
  2. Exchange data caching - Fault-tolerant execution with task retries
  3. Faster restarts - Cache survives pod restarts
  4. Better resource utilization - Reduces memory pressure

StatefulSet is the right choice when:

  • Workers need persistent storage for cache/spill directories
  • You want cache to survive pod restarts/rescheduling
  • Performance matters for large analytical queries

As you know, this is a configuration option, so it is the user's choice whether to enable it based on their use case.

Predictable Identity: Simplifies monitoring and debugging when tracking specific worker behavior over time.

On Node Failures & Recovery: You’re right to be skeptical about recovery. If using local storage, data is lost on node failure regardless. With network storage (EBS/Ceph), the PVC reattaches to a new node, but you trade performance for persistence. Given Trino’s stateless nature, the coordinator simply drops failed workers and, with fault-tolerant execution enabled, retries queries; persistence is a "nice to have" for performance, not a requirement for stability.

As for Istio, it's not a strong use case.

nineinchnick (Member):

Thanks, but you have not responded to:

> How are the pods supposed to recover if they enter into an unknown state, for example, when a Kubernetes node crashes?

If there's a pod that cannot recover without manual intervention, the Trino cluster will operate at reduced capacity, with fewer worker nodes.

The use cases you described are valid, but I think using stateful sets requires some kind of operator. I haven't used it myself, but it looks like https://github.com/stackabletech/trino-operator uses stateful sets.

hpopuri2 (Author) commented Jan 15, 2026

> Thanks, but you have not responded to:
>
> > How are the pods supposed to recover if they enter into an unknown state, for example, when a Kubernetes node crashes?
>
> If there's a pod that cannot recover without manual intervention, the Trino cluster will operate at reduced capacity, with fewer worker nodes.
>
> The use cases you described are valid, but I think using stateful sets requires some kind of operator. I haven't used it myself, but it looks like https://github.com/stackabletech/trino-operator uses stateful sets.

You're absolutely right. I agree that StatefulSets require some kind of operator for proper management, especially for handling node failures and recovery scenarios.

When a Kubernetes node crashes, StatefulSet pods can get stuck in unknown states and require manual intervention to recover. Without an operator to handle these scenarios automatically, the operational burden is significant.
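The recovery concern can be illustrated with a deliberately simplified model (not Kubernetes controller code). It assumes a worker pod on a crashed node has already been marked for deletion, for example by taint-based eviction, but its object cannot be removed because the kubelet never confirms termination. Per the Kubernetes documentation, a ReplicaSet stops counting such a pod and starts a replacement under a new name, while a StatefulSet guarantees at most one pod per identity and does not recreate my-trino-worker-1 until the old object is gone, for example after kubectl delete pods my-trino-worker-1 --grace-period=0 --force. Pod names are hypothetical, and the model assumes healthy nodes have spare capacity.

```go
package main

import "fmt"

// pod is a simplified view of one Trino worker pod.
type pod struct {
	Name          string
	NodeReachable bool
	MarkedDeleted bool // deletionTimestamp set, e.g. by taint-based eviction
}

// readyAfterRecovery estimates how many workers serve once the controllers
// have reacted to a node crash.
func readyAfterRecovery(kind string, pods []pod) int {
	ready := 0
	for _, p := range pods {
		if p.NodeReachable {
			ready++
			continue
		}
		// Pod on the crashed node. A ReplicaSet ignores pods marked for
		// deletion and schedules a differently named replacement.
		if kind == "Deployment" && p.MarkedDeleted {
			ready++
		}
		// A StatefulSet replacement needs the same name, which cannot exist
		// while the stuck object remains, so that ordinal stays down until the
		// pod is force-deleted or the node comes back.
	}
	return ready
}

func main() {
	pods := []pod{
		{Name: "my-trino-worker-0", NodeReachable: true},
		{Name: "my-trino-worker-1", NodeReachable: false, MarkedDeleted: true},
		{Name: "my-trino-worker-2", NodeReachable: true},
	}
	for _, kind := range []string{"Deployment", "StatefulSet"} {
		fmt.Printf("%s: %d of %d workers ready\n", kind, readyAfterRecovery(kind, pods), len(pods))
	}
}
```

Running it reports 3 of 3 workers for the Deployment and 2 of 3 for the StatefulSet, the reduced-capacity state that persists until someone intervenes.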

Given this limitation and the fact that solutions like the Stackable Trino operator already handle StatefulSets properly with automated recovery, I understand your concern about adding this to the base Helm chart.

Would you be open to keeping it as an experimental/opt-in feature for users, with clear documentation about the limitations?

I appreciate the reference; I will check out the stackabletech trino-operator.

nineinchnick (Member):

> Would you be open to keeping it as an experimental/opt-in feature for users, with clear documentation about the limitations?

No, I don't want to maintain extra tests for this if we know about the limitations that can only be solved with an operator.

hpopuri2 (Author):

> > Would you be open to keeping it as an experimental/opt-in feature for users, with clear documentation about the limitations?
>
> No, I don't want to maintain extra tests for this if we know about the limitations that can only be solved with an operator.

You mean that to run Trino as a StatefulSet without any manual intervention, one should use the trino-operator you referred to?

nineinchnick (Member):

Sorry, I didn't understand that

hpopuri2 (Author):

> Sorry, I didn't understand that

How can we implement the transition to a StatefulSet in the Trino chart, given the edge case of pods in an 'Unknown' state? Do you believe there is a viable way to merge this change? If we want to support both Deployment and StatefulSet options, is it possible to achieve this using the Trino Operator, as you suggested?

nineinchnick (Member):

I already answered this. I don't want to support a feature that's inherently broken. As for the existing Trino operators, you have to try them out yourself.

Labels

None yet

2 participants
