Add StatefulSet support for Trino coordinator and workers#392
Add StatefulSet support for Trino coordinator and workers#392hpopuri2 wants to merge 1 commit intotrinodb:mainfrom
Conversation
This commit adds support for deploying Trino coordinators and workers as StatefulSets instead of Deployments, enabling persistent storage and stable pod identities required for advanced use cases like Istio STRICT mTLS. Key Features: - Persistent storage per pod using volumeClaimTemplates - Unique per-pod FQDN for service mesh compatibility - Stable pod identities with predictable naming (e.g., trino-worker-0) - Configurable pod management policies (OrderedReady/Parallel) - Backward compatible - Deployments remain the default Implementation: - Add statefulset-coordinator.yaml template - Add statefulset-worker.yaml template - Add conditional rendering to existing deployment templates - Add statefulset configuration blocks to values.yaml (disabled by default) - Update README with StatefulSet documentation and examples - Add STATEFULSET_TESTING.md with comprehensive testing guide Breaking Changes: None - StatefulSet support is opt-in via configuration
|
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
nineinchnick
left a comment
There was a problem hiding this comment.
What are the use cases for stateful sets in Trino? How are the pods supposed to recover if they enter into an unknown state, for example, when a Kubernetes node crashes?
| @@ -0,0 +1,425 @@ | |||
| # StatefulSet Testing Guide | |||
There was a problem hiding this comment.
This is very verbose, I don't really want to read all of this. Testing should be done in tests/trino/test.sh in the first place. Any manual testing should be done as an exception, with a very good reason.
| @@ -0,0 +1,319 @@ | |||
| {{- if .Values.coordinator.statefulset.enabled -}} | |||
| {{- $coordinatorJmx := merge .Values.jmx.coordinator (omit .Values.jmx "coordinator" "worker") -}} | |||
There was a problem hiding this comment.
If this largely duplicates the deployment, I don't want to maintain both. Can you figure out how to deduplicate them?
There was a problem hiding this comment.
you mean to have a common template and use them for both deployment and statefulset_deployment .. i think that would deduplicate them ..
https://trinodb.slack.com/archives/C0305TQ05KL/p1767619191984709 this is onr of the thread where i saw people are trying to achieve it Primary Use Cases: Stable Caching: Benefit for Hive/Iceberg metadata caching or disk-based exchange spooling (Fault-Tolerant Execution) where data persists across pod restarts.
StatefulSet is the right choice when:
As you know its a config its user choice to enable it or not right based on their use case Predictable Identity: Simplifies monitoring and debugging when tracking specific worker behavior over time. On Node Failures & Recovery: You’re right to be skeptical about recovery. If using Local Storage, data is lost on node failure regardless. With Network Storage (EBS/Ceph), the PVC reattaches to a new node, but you trade performance for persistence. Given Trino’s stateless nature, the coordinator simply drops failed workers and retries queries—persistence is a "nice to have" for performance, not a requirement for stability. And about istio its not that good use case .... |
|
Thanks, but you have not responded to:
If there's a pod that cannot recover without manual intervention, the Trino cluster will operate at reduced capacity, with fewer worker nodes. The use cases you described are valid, but I think using stateful sets requires some kind of operator. I haven't used it myself, but it looks like https://github.com/stackabletech/trino-operator uses stateful sets. |
You're absolutely right. I agree that StatefulSets require some kind of operator for proper management, especially for handling node failures and recovery scenarios. When a Kubernetes node crashes, StatefulSet pods can get stuck in unknown states and require manual intervention to recover. Without an operator to handle these scenarios automatically, the operational burden is significant. Given this limitation and the fact that solutions like the Stackable Trino operator already handle StatefulSets properly with automated recovery, I understand your concern about adding this to the base Helm chart. would you be open to keeping it as an experimental/opt-in feature for the users with clear documentation about the limitations ? Appreciate the reference i would check on that stackabletech trino-operator |
No, I don't want to maintain extra tests for this if we know about the limitations that can only be solved with an operator. |
you mean for making trino to statefulset_deployment without any manual intervention you should use above trino-operator which you referred.. |
|
Sorry, I didn't understand that |
How can we implement the transition to a StatefulSet in the Trino chart, given the edge case of pods in an 'Unknown' state? Do you believe there is a viable way to merge this change? If we want to support both Deployment and StatefulSet options, is it possible to achieve this using the Trino Operator as you suggested?. |
|
I already answered this. I don't want to support a feature that's inherently broken. As for the existing Trino operators, you have to try them out yourself. |
StatefulSet Support for Trino Helm Chart
Summary
This PR adds StatefulSet support for Trino coordinators and workers while maintaining Deployment as the default behavior.
Changes
Core Changes
New Templates:
charts/trino/templates/statefulset-coordinator.yaml- StatefulSet template for coordinatorcharts/trino/templates/statefulset-worker.yaml- StatefulSet template for workersModified Templates:
charts/trino/templates/deployment-coordinator.yaml- Added conditional rendering (only when StatefulSet disabled)charts/trino/templates/deployment-worker.yaml- Added conditional rendering (only when StatefulSet disabled)Configuration (
charts/trino/values.yaml):coordinator.statefulsetconfiguration blockworker.statefulsetconfiguration blockstatefulset.enabled: false(Deployments are default)volumeClaimTemplatesfor persistent storageDocumentation:
charts/trino/README.md- Updated with StatefulSet documentationcharts/trino/README.md.gotmpl- Added StatefulSet sectioncharts/trino/STATEFULSET_TESTING.md- Comprehensive testing guideTest Files (Optional)
statefulset-values.yaml- Example values file for StatefulSet modetests/gateway/test-query-history-values.yaml- Gateway test valuesFeatures
StatefulSet Benefits
trino-worker-0)Deployment Mode (Default)
Testing
Test 1: Default Deployment Mode ✅
Result: Creates Deployments (not StatefulSets), pods have random suffixes, no PVCs
Test 2: StatefulSet Mode ✅
Result: Creates StatefulSets, pods have ordinal suffixes (-0, -1), PVCs created and bound
Usage Examples
Enable StatefulSet for Workers Only
Enable StatefulSet for Both Coordinator and Workers
Backward Compatibility
Migration Path
Users can migrate from Deployment to StatefulSet by:
statefulset.enabled: truevolumeClaimTemplatescannot be modified on existing StatefulSetsFiles to Review
charts/trino/values.yaml- Configuration defaultscharts/trino/templates/statefulset-coordinator.yaml- New templatecharts/trino/templates/statefulset-worker.yaml- New templatecharts/trino/templates/deployment-coordinator.yaml- Modified with conditionalscharts/trino/templates/deployment-worker.yaml- Modified with conditionalscharts/trino/README.md- Updated documentationcharts/trino/STATEFULSET_TESTING.md- Testing guideTested in local :

Notes
statefulset-values.yamlin the root directory is an example file for testing purposes and can be excluded from the PR if desired