docs: document PrivateLoadZone tolerations limitation for runner pods #2187
bonnywelsford-source wants to merge 3 commits into main
Conversation
- Add 'Tolerations not supported for PrivateLoadZone' to troubleshooting common scenarios (shared content and main troubleshooting page note).
- Clarify that Helm tolerations apply only to controller manager; PLZ CRD has no tolerations field for runner pods, so no automated way when using PLZ.
- Add lead-in linking PLZ, tainted nodes, and the need for tolerations.

Co-authored-by: Cursor <cursoragent@cursor.com>
heitortsergent
left a comment
I just left two small comments, I'll leave the technical review to @yorugac. :)
#### Current limitation

The PrivateLoadZone CRD does not support tolerations for runner pods, so there is no automated way to add them when using PLZ.

Suggested change:
- The PrivateLoadZone CRD does not support tolerations for runner pods, so there is no automated way to add them when using PLZ.
+ The PrivateLoadZone CRD doesn't support tolerations for runner pods, so there is no automated way to add them when using PLZ.
This topic includes instructions to help you troubleshoot common issues with the k6 Operator.

- If you’re using Private Load Zones in Grafana Cloud k6, refer to [Troubleshoot Private Load Zones](https://grafana.com/docs/grafana-cloud/testing/k6/author-run/private-load-zone/troubleshoot/).
+ If you’re using Private Load Zones in Grafana Cloud k6, refer to [Troubleshoot Private Load Zones](https://grafana.com/docs/grafana-cloud/testing/k6/author-run/private-load-zone/troubleshoot/). **Note:** The PrivateLoadZone CRD does not support tolerations for runner pods; tolerations in Helm values apply only to the controller manager. See [Tolerations not supported for PrivateLoadZone](#tolerations-not-supported-for-privateloadzone) in Common errors if runner pods fail to schedule.

Suggested change:
- If you’re using Private Load Zones in Grafana Cloud k6, refer to [Troubleshoot Private Load Zones](https://grafana.com/docs/grafana-cloud/testing/k6/author-run/private-load-zone/troubleshoot/). **Note:** The PrivateLoadZone CRD does not support tolerations for runner pods; tolerations in Helm values apply only to the controller manager. See [Tolerations not supported for PrivateLoadZone](#tolerations-not-supported-for-privateloadzone) in Common errors if runner pods fail to schedule.
+ If you’re using Private Load Zones in Grafana Cloud k6, refer to [Troubleshoot Private Load Zones](https://grafana.com/docs/grafana-cloud/testing/k6/author-run/private-load-zone/troubleshoot/).
+ {{< admonition type="note" >}}
+ The PrivateLoadZone CRD does not support tolerations for runner pods; tolerations in Helm values apply only to the controller manager. See [Tolerations not supported for PrivateLoadZone](#tolerations-not-supported-for-privateloadzone) in Common errors if runner pods fail to schedule.
+ {{< /admonition >}}
Just a suggestion, but feel free to ignore. I'm wondering if we should use an admonition here instead of **Note:**.
Hello 👋 Thanks for looking at this, @bonnywelsford-source, and for tagging me, @heitortsergent!

I've checked the escalation and support tickets, and it seems to me that there's a bit of a misunderstanding about what needs to be documented 😅 I'm not sure who requested documentation specifically (it doesn't seem to come from the user?), but here's how it looks from my perspective.

Firstly, support for tolerations in PLZ is an open issue / FR. We don't usually document what is not yet supported, unless we know it's unlikely to be supported or requires long-term work. Neither is the case for tolerations. In fact, by my current plans, tolerations will likely be added in the next release of k6-operator, in March. If these changes are merged, I'll need to remove them after the release.

Moreover, tolerations are just one thing among many that can be configured via Kubernetes, and most of those things are not supported in PLZ. I don't think it's a good idea to document each one of them separately. I.e. if tolerations are put into troubleshooting as unsupported, then why not schedulers or volumes or DNS configs, etc.? For PLZ specifically, we maintain a whitelist of supported features here, not on the troubleshooting page. Maintaining two lists, for supported and unsupported features, seems like overkill to me (given the amount of configuration in Kubernetes). If people don't find the page with the whitelist of supported features for some reason, perhaps we should figure out why? Troubleshooting, IMO, is about helping people make supported features work in their setup.

Finally, looking at the original message from the user in the support ticket, the main problem was thinking that they could configure PLZ via the Helm chart. It is actually a common misconception, among both OSS and Cloud users: as you wrote in the PR, @bonnywelsford-source, the Helm chart configures only the k6-operator app and never the PLZ itself. That is true for any field there, not only tolerations.

So perhaps we could rework the warning to address that instead? I.e. remove tolerations from the picture completely and focus on the main misconception: configuration of the app via Helm values vs. configuration of PLZ via the CRD definition. WDYT?

PS sorry for the essay! It took me some time to wade through what is happening here fully and I wanted to be as clear as possible with the reasoning 🙂
Thank you for the review and the explanation. I found it very helpful. I'll take a look at your suggestion and revise the docs.
…dd note in install and remove tolerations from trbs.
Removed all tolerations-specific content from the troubleshooting docs. Troubleshooting is unchanged from main; no “unsupported” or non-pattern content added there.

Added a short clarification on the Install k6 Operator page (in the “Deploy with Helm” section): the Helm chart configures only the k6-operator application (e.g. the controller manager), not PrivateLoadZone or TestRun resources; those are configured via their CRD definitions (or, for Grafana Cloud k6, the PLZ spec). I’ve linked to PLZ configuration options as the whitelist of what can be configured on PLZ.
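
To make the distinction concrete, here is a minimal sketch of the two configuration surfaces, shown as two separate files in one block. It is illustrative only: the names, namespace, Secret, and taint key/value are hypothetical, and the exact placement of `tolerations` in the chart should be checked against the chart's own `values.yaml`.

```yaml
# File 1: values.yaml passed to the k6-operator Helm chart.
# Assumption: scheduling keys here affect only the operator's
# controller manager pod, never PLZ or TestRun runner pods.
tolerations:
  - key: "dedicated"          # hypothetical taint key
    operator: "Equal"
    value: "load-testing"     # hypothetical taint value
    effect: "NoSchedule"
---
# File 2: a PrivateLoadZone resource applied to the cluster.
# Runner pod settings come from this spec, not from Helm values.
apiVersion: k6.io/v1alpha1
kind: PrivateLoadZone
metadata:
  name: example-plz               # hypothetical name
  namespace: k6-operator-system   # hypothetical namespace
spec:
  token: example-token-secret     # hypothetical Secret name
  resources:
    limits:
      cpu: 256m
      memory: 1024Mi
```

Only fields listed in the PLZ configuration options take effect in the second file; adding a key to the first file does not propagate it to runner pods.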
yorugac
left a comment
Apologies for the delay! I didn't notice the update in the torrent of GH notifications.
Thank you @bonnywelsford-source! I think this is great and clear. I added a small suggestion, but otherwise LGTM 🙌
docs/sources/k6/next/set-up/set-up-distributed-k6/install-k6-operator.md
…perator.md Co-authored-by: Olha Yevtushenko <yorugac@users.noreply.github.com>
What?
Document that tolerations are not supported for PrivateLoadZone (PLZ) runner pods and where tolerations do apply:
- New subsection “Tolerations not supported for PrivateLoadZone” in the shared troubleshooting scenarios (troubleshooting-common-scenarios.md), placed with other scheduling topics (after Non-existent nodeSelector). It explains that when running tests via PLZ, runner pods must schedule on the cluster; if nodes are tainted, those pods need tolerations; Helm values.yaml tolerations apply only to the k6-operator controller manager, not to runner pods; and the PLZ CRD has no tolerations field for runner pods, so there is no automated way to add them when using PLZ.
- Note on the main troubleshooting page (set-up-distributed-k6/troubleshooting.md) that the PrivateLoadZone CRD does not support tolerations for runner pods and that Helm tolerations apply only to the controller manager, with a link to the new subsection.
- Updates in both docs/sources/k6/next and docs/sources/k6/v1.5.x.

Addresses customer confusion when runner pods fail to schedule (e.g. FailedScheduling on tainted nodes) despite setting tolerations in Helm values.
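
For context, a generic Kubernetes sketch of the taint and toleration pairing behind a FailedScheduling event. This is plain Kubernetes, not a PLZ runner pod spec; the node name, pod name, image, and taint key/value are hypothetical.

```yaml
# Fragment of a Node object: the taint repels any pod that lacks
# a matching toleration.
apiVersion: v1
kind: Node
metadata:
  name: example-node        # hypothetical
spec:
  taints:
    - key: dedicated
      value: load-testing
      effect: NoSchedule
---
# A Pod that tolerates the taint above and can therefore be
# scheduled onto that node.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod         # hypothetical
spec:
  tolerations:
    - key: dedicated
      operator: Equal
      value: load-testing
      effect: NoSchedule
  containers:
    - name: main
      image: busybox        # placeholder image
      command: ["sleep", "3600"]
```

A pod without the `tolerations` entry stays Pending if every candidate node carries this taint.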
Checklist
- … `npm start` command locally and verified that the changes look good.
- … `docs/sources/k6/next` folder of the documentation.
- … `docs/sources/k6/v{most_recent_release}` folder of the documentation.

Related PR(s)/Issue(s)
Customer report: PLZ runner pods not receiving tolerations from Helm values; FailedScheduling on tainted nodes: https://github.com/grafana/support-escalations/issues/20296