cloud_broker: avoid script name collisions during injection by ballard26 · Pull Request #29511 · redpanda-data/redpanda

ballard26 · 2026-02-03T22:33:52Z

When multiple CloudBroker instances inject scripts with the same name to the same agent node, they can overwrite each other. Use a UUID prefix for unique script names and clean up from the agent node after copying to the pod.

Backports Required

Release Notes

none

When multiple CloudBroker instances inject scripts with the same name to the same agent node, they can overwrite each other. Use a UUID prefix for unique script names and clean up from the agent node after copying to the pod.

Copilot

Pull request overview

This PR prevents CloudBroker instances from overwriting each other's scripts when multiple instances inject scripts with the same name to the same Kubernetes agent node.

Changes:

Add UUID prefix to script names during injection to ensure uniqueness
Clean up temporary script files from the agent node after copying to the pod

Copilot · 2026-02-03T22:34:22Z

tests/rptest/services/cloud_broker.py

-        return
+
+        # Remove script from agent node
+        self._kubeclient._ssh_cmd(["rm", unique_script_name])


The SSH command to remove the script is constructed but never executed. This should be subprocess.check_output(self._kubeclient._ssh_cmd(["rm", unique_script_name])) or similar to actually run the cleanup command.

Suggested change

self._kubeclient._ssh_cmd(["rm", unique_script_name])

_rm_cmd = self._kubeclient._ssh_cmd(["rm", unique_script_name])

subprocess.check_output(_rm_cmd)

self._kubeclient._ssh_cmd(...) does execute the command AFAIK.

StephanDollberg

Why is this copying scripts in the first place? Can't it just run the command via ssh directly?

ballard26 · 2026-02-03T22:54:44Z

Why is this copying scripts in the first place? Can't it just run the command via ssh directly?

Probably to avoid gotchas with quotes and env var expansions when running complex commands across ssh. Didn't really look into it too much. Just saw a potential issue here while checking if it was safe to run some test setup code in parallel.

vbotbuildovich · 2026-02-03T23:43:55Z

CI test results

test results on build#80054

test_class	test_method	test_arguments	test_kind	job_url	test_status	passed	reason	test_history
WriteCachingFailureInjectionE2ETest	test_crash_all	{"use_transactions": false}	integration	https://buildkite.com/redpanda/redpanda/builds/80054#019c25af-fc0a-48ed-a6d0-0a93938be70f	FLAKY	8/11	Test PASSES after retries.No significant increase in flaky rate(baseline=0.1002, p0=0.2645, reject_threshold=0.0100. adj_baseline=0.2714, p1=0.4625, trust_threshold=0.5000)	https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all

travisdowns · 2026-02-06T18:46:55Z

Why is this copying scripts in the first place? Can't it just run the command via ssh directly?

@StephanDollberg it was for efficiency as running each command has a long fixed overhead of ~5 seconds.

StephanDollberg · 2026-02-06T18:57:29Z

@StephanDollberg it was for efficiency as running each command has a long fixed overhead of ~5 seconds.

I don't follow this sorry. What does script vs tsh -- foo have to do with how long the command runs?

travisdowns · 2026-02-06T19:34:08Z

I don't follow this sorry. What does script vs tsh -- foo have to do with how long the command runs?

Sorry I was answering the question of why there is a script in the first place: to group many commands together. When you said "why can't it run the command" I thought you mean like running individual commands from the script over tsh.

I guess you are asking why the whole script body isn't sent over tsh to execute immediately in the shell? Not sure. I think the semantics are a bit different doing that (e.g., treatment of failures, comments, etc) but it seems feasible. It's also probably possible to send the script as direct input to bash, e..g, as a heredoc which would probably have the same script execution semantics.

cloud_broker: avoid script name collisions during injection

55ac858

When multiple CloudBroker instances inject scripts with the same name to the same agent node, they can overwrite each other. Use a UUID prefix for unique script names and clean up from the agent node after copying to the pod.

ballard26 requested review from StephanDollberg, Copilot and travisdowns February 3, 2026 22:33

Copilot AI reviewed Feb 3, 2026

View reviewed changes

StephanDollberg approved these changes Feb 3, 2026

View reviewed changes

ballard26 enabled auto-merge February 3, 2026 23:35

ballard26 merged commit 3f46e87 into redpanda-data:dev Feb 3, 2026
20 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

cloud_broker: avoid script name collisions during injection#29511

cloud_broker: avoid script name collisions during injection#29511
ballard26 merged 1 commit intoredpanda-data:devfrom
ballard26:coll-avoid-cb

ballard26 commented Feb 3, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 3, 2026

Uh oh!

ballard26 Feb 3, 2026

Uh oh!

StephanDollberg left a comment

Uh oh!

ballard26 commented Feb 3, 2026 •

edited

Loading

Uh oh!

vbotbuildovich commented Feb 3, 2026

Uh oh!

Uh oh!

travisdowns commented Feb 6, 2026

Uh oh!

StephanDollberg commented Feb 6, 2026

Uh oh!

travisdowns commented Feb 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	self._kubeclient._ssh_cmd(["rm", unique_script_name])
	_rm_cmd = self._kubeclient._ssh_cmd(["rm", unique_script_name])
	subprocess.check_output(_rm_cmd)

Comments

Conversation

ballard26 commented Feb 3, 2026

Backports Required

Release Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

ballard26 Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

StephanDollberg left a comment

Choose a reason for hiding this comment

Uh oh!

ballard26 commented Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vbotbuildovich commented Feb 3, 2026

CI test results

Uh oh!

Uh oh!

travisdowns commented Feb 6, 2026

Uh oh!

StephanDollberg commented Feb 6, 2026

Uh oh!

travisdowns commented Feb 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ballard26 commented Feb 3, 2026 •

edited

Loading