cloud_broker: avoid script name collisions during injection #29511
ballard26 merged 1 commit into redpanda-data:dev
When multiple CloudBroker instances inject scripts with the same name to the same agent node, they can overwrite each other. Use a UUID prefix for unique script names and clean up from the agent node after copying to the pod.
Pull request overview
This PR prevents CloudBroker instances from overwriting each other's scripts when multiple instances inject scripts with the same name to the same Kubernetes agent node.
Changes:
- Add UUID prefix to script names during injection to ensure uniqueness
- Clean up temporary script files from the agent node after copying to the pod
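The two changes listed above can be sketched roughly as follows. This is a minimal local illustration, not the code from the PR: `unique_script_name`, `inject_script`, and `staging_dir` are hypothetical names, a local directory stands in for the agent node, and the copy to the pod is simulated by reading the file back.

```python
import os
import uuid


def unique_script_name(script_name: str) -> str:
    """Prefix the script name with a random UUID so concurrent injectors do not collide."""
    return f"{uuid.uuid4()}-{script_name}"


def inject_script(script_body: str, script_name: str, staging_dir: str) -> str:
    """Stage a script under a unique name, pass it on, then delete the staged copy.

    staging_dir is a stand-in for the agent node's filesystem (assumption).
    """
    staged_path = os.path.join(staging_dir, unique_script_name(script_name))
    with open(staged_path, "w") as f:
        f.write(script_body)
    try:
        with open(staged_path) as f:
            delivered = f.read()  # placeholder for the copy from agent node to pod
    finally:
        os.remove(staged_path)  # remove the intermediate copy even if the copy step fails
    return delivered
```

Because each call picks a fresh UUID, two injectors that both use the name `setup.sh` write to different paths, and the `finally` block keeps the staging location from accumulating leftover files.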
return

# Remove script from agent node
self._kubeclient._ssh_cmd(["rm", unique_script_name])
The SSH command to remove the script is constructed but never executed. This should be subprocess.check_output(self._kubeclient._ssh_cmd(["rm", unique_script_name])) or similar to actually run the cleanup command.
Suggested change:
- self._kubeclient._ssh_cmd(["rm", unique_script_name])
+ _rm_cmd = self._kubeclient._ssh_cmd(["rm", unique_script_name])
+ subprocess.check_output(_rm_cmd)
self._kubeclient._ssh_cmd(...) does execute the command AFAIK.
StephanDollberg left a comment:
Why is this copying scripts in the first place? Can't it just run the command via ssh directly?
Probably to avoid gotchas with quotes and env var expansions when running complex commands across ssh. Didn't really look into it too much. Just saw a potential issue here while checking if it was safe to run some test setup code in parallel.
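For anyone wondering what the quoting gotcha looks like, here is a rough sketch. It assumes the transport behaves like OpenSSH, which joins its command arguments with spaces and hands the resulting string to the remote shell for re-parsing; `as_seen_by_remote_shell` is a hypothetical helper that only approximates the word-splitting part (it does not perform variable expansion).

```python
import shlex


def as_seen_by_remote_shell(argv):
    """Approximate remote parsing: join the arguments with spaces, then re-split them.

    shlex.split models quoting and word splitting only; a real shell would also
    expand unquoted variables such as $HOME.
    """
    return shlex.split(" ".join(argv))


argv = ["echo", "hello world", "$HOME"]

# Passed through unquoted, the argument containing a space is split in two.
naive = as_seen_by_remote_shell(argv)

# Quoting each argument first preserves the original boundaries.
quoted = as_seen_by_remote_shell([shlex.quote(arg) for arg in argv])

print(naive)   # ['echo', 'hello', 'world', '$HOME']
print(quoted)  # ['echo', 'hello world', '$HOME']
```

In the naive case a real remote shell would also expand `$HOME`, whereas the quoted form passes it through literally. Copying a script file sidesteps both problems because the script text never goes through this extra round of parsing.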
CI test results: test results on build #80054
@StephanDollberg it was for efficiency, as running each command has a long fixed overhead of ~5 seconds.
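A back-of-the-envelope sketch of that overhead argument, using the ~5 second figure from the comment above. The command count is a made-up example, and `total_overhead` is a hypothetical helper.

```python
FIXED_OVERHEAD_S = 5.0  # approximate per-invocation overhead quoted in the thread


def total_overhead(invocations: int, overhead_s: float = FIXED_OVERHEAD_S) -> float:
    """Fixed overhead paid in total for a given number of remote invocations."""
    return invocations * overhead_s


commands = 12  # hypothetical number of commands in a setup script

one_by_one = total_overhead(commands)  # every command pays the fixed cost
batched = total_overhead(1)            # one script invocation pays it once

print(f"one by one: {one_by_one:.0f}s, batched: {batched:.0f}s")
```

With these assumed numbers the fixed cost drops from 60 seconds to 5 seconds, which is the motivation given for grouping commands into a script.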
I don't follow this sorry. What does script vs |
Sorry, I was answering the question of why there is a script in the first place: to group many commands together. When you said "why can't it run the command", I thought you meant running individual commands from the script over tsh. I guess you are asking why the whole script body isn't sent over tsh to execute immediately in the shell? Not sure. I think the semantics are a bit different doing that (e.g., treatment of failures, comments, etc.), but it seems feasible. It's also probably possible to send the script as direct input to |