
docs: document PrivateLoadZone tolerations limitation for runner pods #2187

Open
bonnywelsford-source wants to merge 3 commits into main from docs/plz-tolerations-limitation

Conversation

@bonnywelsford-source
Contributor

@bonnywelsford-source commented Feb 6, 2026

What?

Document that tolerations are not supported for PrivateLoadZone (PLZ) runner pods and where tolerations do apply:

  • New subsection “Tolerations not supported for PrivateLoadZone” in the shared troubleshooting scenarios (troubleshooting-common-scenarios.md), placed with the other scheduling topics (after “Non-existent nodeSelector”). It explains that when tests run via PLZ, the runner pods must schedule on the cluster, so if nodes are tainted those pods need tolerations; that tolerations in the Helm values.yaml apply only to the k6-operator controller manager, not to runner pods; and that the PLZ CRD has no tolerations field for runner pods, so there is no automated way to add them when using PLZ.

  • Note on the main troubleshooting page (set-up-distributed-k6/troubleshooting.md) that the PrivateLoadZone CRD does not support tolerations for runner pods and that Helm tolerations apply only to the controller manager, with a link to the new subsection.

Updates in both docs/sources/k6/next and docs/sources/k6/v1.5.x.
Addresses customer confusion when runner pods fail to schedule (e.g. FailedScheduling on tainted nodes) despite setting tolerations in Helm values.
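
For context, here's a minimal sketch of the setup that triggers this confusion. The taint key, the values, and the exact key path of `tolerations` in the chart's values.yaml are illustrative, not taken from the customer report:

```yaml
# A node tainted like this rejects pods that don't tolerate the taint:
#   kubectl taint nodes load-node dedicated=loadtest:NoSchedule

# k6-operator Helm values. These tolerations are rendered only into the
# controller-manager Deployment, so the operator itself can schedule on
# the tainted node...
tolerations:
  - key: dedicated
    operator: Equal
    value: loadtest
    effect: NoSchedule

# ...but the runner pods created for a PrivateLoadZone test receive no
# tolerations from these values, so they stay Pending with
# FailedScheduling events.
```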

Checklist

  • I have used a meaningful title for the PR.
  • I have described the changes I've made in the "What?" section above.
  • I have performed a self-review of my changes.
  • I have run the npm start command locally and verified that the changes look good.
  • I have made my changes in the docs/sources/k6/next folder of the documentation.
  • I have reflected my changes in the docs/sources/k6/v{most_recent_release} folder of the documentation.
  • I have reflected my changes in the relevant folders of the two previous k6 versions of the documentation (if still applicable to previous versions).

Related PR(s)/Issue(s)

Customer report: PLZ runner pods not receiving tolerations from Helm values; FailedScheduling on tainted nodes: https://github.com/grafana/support-escalations/issues/20296

- Add 'Tolerations not supported for PrivateLoadZone' to troubleshooting
  common scenarios (shared content and main troubleshooting page note).
- Clarify that Helm tolerations apply only to controller manager; PLZ CRD
  has no tolerations field for runner pods, so no automated way when using PLZ.
- Add lead-in linking PLZ, tainted nodes, and the need for tolerations.

Co-authored-by: Cursor <cursoragent@cursor.com>
@CLAassistant

CLAassistant commented Feb 6, 2026

CLA assistant check
All committers have signed the CLA.


@github-actions
Contributor

github-actions bot commented Feb 6, 2026

💻 Deploy preview available (docs: document PrivateLoadZone tolerations limitation for runner pods):

Collaborator

@heitortsergent left a comment


I just left two small comments, I'll leave the technical review to @yorugac. :)


#### Current limitation

The PrivateLoadZone CRD does not support tolerations for runner pods, so there is no automated way to add them when using PLZ.

Suggested change
The PrivateLoadZone CRD does not support tolerations for runner pods, so there is no automated way to add them when using PLZ.
The PrivateLoadZone CRD doesn't support tolerations for runner pods, so there is no automated way to add them when using PLZ.

This topic includes instructions to help you troubleshoot common issues with the k6 Operator.

If you’re using Private Load Zones in Grafana Cloud k6, refer to [Troubleshoot Private Load Zones](https://grafana.com/docs/grafana-cloud/testing/k6/author-run/private-load-zone/troubleshoot/).
If you’re using Private Load Zones in Grafana Cloud k6, refer to [Troubleshoot Private Load Zones](https://grafana.com/docs/grafana-cloud/testing/k6/author-run/private-load-zone/troubleshoot/). **Note:** The PrivateLoadZone CRD does not support tolerations for runner pods; tolerations in Helm values apply only to the controller manager. See [Tolerations not supported for PrivateLoadZone](#tolerations-not-supported-for-privateloadzone) in Common errors if runner pods fail to schedule.

Suggested change
If you’re using Private Load Zones in Grafana Cloud k6, refer to [Troubleshoot Private Load Zones](https://grafana.com/docs/grafana-cloud/testing/k6/author-run/private-load-zone/troubleshoot/). **Note:** The PrivateLoadZone CRD does not support tolerations for runner pods; tolerations in Helm values apply only to the controller manager. See [Tolerations not supported for PrivateLoadZone](#tolerations-not-supported-for-privateloadzone) in Common errors if runner pods fail to schedule.
If you’re using Private Load Zones in Grafana Cloud k6, refer to [Troubleshoot Private Load Zones](https://grafana.com/docs/grafana-cloud/testing/k6/author-run/private-load-zone/troubleshoot/).
{{< admonition type="note" >}}
The PrivateLoadZone CRD does not support tolerations for runner pods; tolerations in Helm values apply only to the controller manager. See [Tolerations not supported for PrivateLoadZone](#tolerations-not-supported-for-privateloadzone) in Common errors if runner pods fail to schedule.
{{< /admonition >}}

Just a suggestion, but feel free to ignore. I'm wondering if we should use an admonition here instead of **Note:**.

@heitortsergent added the Area:operator k6-operator label Feb 6, 2026
@yorugac
Contributor

yorugac commented Feb 9, 2026

Hello 👋 Thanks for looking at this, @bonnywelsford-source, and for tagging me, @heitortsergent!

I've checked the escalation and support tickets, and it seems to me like there's a bit of misunderstanding of what needs to be documented 😅 I'm not sure who requested documentation specifically (it doesn't seem to come from the user?), but here's how it looks from my perspective.

Firstly, support for tolerations in PLZ is an open issue / FR. We don't usually document what is not yet supported, unless we know it's unlikely to be supported or requires long-term work. Neither is the case for tolerations. In fact, by my current plans, tolerations will likely be added by the next release of k6-operator, in March. If these changes are merged, I'll need to remove them after release.

Moreover, tolerations are just one of many things that can be configured via Kubernetes, and most of those things are not supported in PLZ. I don't think it's a good idea to document each one of them separately. That is, if tolerations are put into troubleshooting as unsupported, then why not schedulers, volumes, DNS configs, and so on?

For PLZ specifically, we maintain a whitelist of supported features here, not on the troubleshooting page. Maintaining two lists, one for supported and one for unsupported features, seems like overkill to me, given the amount of configuration in Kubernetes. If people don't find the page with the whitelist of supported features for some reason, perhaps we should figure out why? Troubleshooting, IMO, is about helping people make supported features work in their setup.

Finally, looking at the original message from the user in the support ticket, the main problem was thinking that they could configure PLZ via the Helm chart. It is actually a common misconception, among both OSS and Cloud users: as you wrote in the PR, @bonnywelsford-source, the Helm chart configures only the k6-operator app and never the PLZ itself. That is true for any field there, not only tolerations. So perhaps we could rework the warning around that instead? That is, remove tolerations from the picture completely and focus on the main misconception: configuration of the app via Helm values vs. configuration of PLZ via its CRD definition. WDYT?

PS Sorry for the essay! It took me some time to fully wade through what is happening here, and I wanted to be as clear as possible with the reasoning 🙂
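
To make the two configuration surfaces concrete, here's a rough sketch of the distinction. The field names and API version are shown for illustration only and should be checked against the PLZ configuration reference:

```yaml
# 1) Helm values.yaml: configures only the k6-operator application
#    (the controller-manager Deployment), never the load zone itself.
#    Exact key paths depend on the chart version.
tolerations: []
nodeSelector: {}
---
# 2) PrivateLoadZone CRD: the load zone itself is configured here, and
#    only the fields whitelisted in the PLZ configuration docs apply.
apiVersion: k6.io/v1alpha1
kind: PrivateLoadZone
metadata:
  name: my-plz
  namespace: k6-operator-system
spec:
  token: grafana-k6-token   # Secret holding the Grafana Cloud k6 token
  resources:
    limits:
      cpu: "1"
      memory: 1Gi
```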

@bonnywelsford-source
Contributor Author

Thank you for the review and the explanation. I found it very helpful. I'll take a look at your suggestion and revise the docs.

…dd note in install and remove tolerations from trbs.
@bonnywelsford-source
Contributor Author

Removed all tolerations-specific content from the troubleshooting docs. Troubleshooting is unchanged from main; no “unsupported” or non-pattern content added there.

Added a short clarification on the Install k6 Operator page (in the “Deploy with Helm” section): the Helm chart configures only the k6-operator application (e.g. the controller manager), not PrivateLoadZone or TestRun resources—those are configured via their CRD definitions (or, for Grafana Cloud k6, the PLZ spec). I’ve linked to PLZ configuration options as the whitelist of what can be configured on PLZ.
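
For illustration, here's a sketch of what configuring runner pods "via their CRD definitions" can look like for a TestRun, where scheduling fields for runner pods are exposed under spec.runner. The ConfigMap name and selector values are hypothetical, and the field names should be verified against the TestRun reference:

```yaml
apiVersion: k6.io/v1alpha1
kind: TestRun
metadata:
  name: k6-sample
spec:
  parallelism: 2
  script:
    configMap:
      name: k6-test        # hypothetical ConfigMap that holds test.js
      file: test.js
  runner:
    # Per-pod scheduling for the runners is set on the TestRun itself,
    # not in the Helm values that deploy the operator.
    nodeSelector:
      dedicated: loadtest
    tolerations:
      - key: dedicated
        operator: Equal
        value: loadtest
        effect: NoSchedule
```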

Contributor

@yorugac left a comment


Apologies for the delay! I didn't notice the update in the torrent of GH notifications.

Thank you @bonnywelsford-source! I think this is great and clear. I added a small suggestion, but otherwise LGTM 🙌

…perator.md

Co-authored-by: Olha Yevtushenko <yorugac@users.noreply.github.com>

Labels

Area:operator k6-operator

4 participants