Tentacle can detect Powershell scripts that don't start, and can abandon them.#1200
Tentacle can detect Powershell scripts that don't start, and can abandon them.#1200LukeButters wants to merge 28 commits intomainfrom
Conversation
source/Octopus.Tentacle.Core/Services/Scripts/WorkSpace/BashScriptWorkspace.cs
Outdated
Show resolved
Hide resolved
source/Octopus.Tentacle.Core/Services/Scripts/PowerShellStartupDetection.cs
Outdated
Show resolved
Hide resolved
…ssue so no need to have it for *nix
…ent comment exist
This reverts commit 4300b37.
…nt in the script body
5a599f2 to
5bb0dd6
Compare
| /// but silently stall before executing any script content. This was seen happening because | ||
| /// CrowdStrike prevented the script body from running. | ||
| /// | ||
| /// When this happens, we get no output from the script AND the script is un-killable. |
There was a problem hiding this comment.
Is this because it deviates from what we expect or can we literally not kill the powershell process?
There was a problem hiding this comment.
Tentacle is unable to kill the script, with whatever the standard kill command dotnet uses.
We have taken dumps and seen crowdstrike is in the dump, I never saw the dump myself though.
I suspect the issue is something like powershell calls something that enters the kernel which hangs. Since it is in the kernel it can never be killed.
source/Octopus.Tentacle.Core/Services/Scripts/PowerShellStartup/PowerShellStartupDetection.cs
Show resolved
Hide resolved
source/Octopus.Tentacle.Core/Services/Scripts/PowerShellStartup/PowerShellStartupMonitor.cs
Show resolved
Hide resolved
rhysparry
left a comment
There was a problem hiding this comment.
I'm a little unclear on the cancellation behaviour.
source/Octopus.Tentacle.Contracts/PowerShellStartupDetectionTemplateValues.cs
Show resolved
Hide resolved
source/Octopus.Tentacle.Core/Services/Scripts/WorkSpace/ScriptWorkspace.cs
Show resolved
Hide resolved
source/Octopus.Tentacle.Core/Services/Scripts/WorkSpace/ScriptWorkspace.cs
Show resolved
Hide resolved
| readonly IOctopusFileSystem fileSystem; | ||
| readonly IHomeDirectoryProvider home; | ||
| readonly SensitiveValueMasker sensitiveValueMasker; | ||
| readonly bool useBashWorkspace; |
There was a problem hiding this comment.
Should this be an enum to select a specific workspace type?
| [IntegrationTestTimeout] | ||
| public class PowerShellStartupDetectionTests : IntegrationTest | ||
| { | ||
| static (ScriptServiceV2 service, ScriptWorkspaceFactory workspaceFactory, ScriptStateStoreFactory stateStoreFactory, TemporaryDirectory tempDir) CreateScriptService( |
There was a problem hiding this comment.
Can we replace this with a nested class? It should implement IDisposable for the TemporaryDirectory.
| # Sleep for a long time to simulate PowerShell hanging before reaching detection code | ||
| Start-Sleep -Seconds 3600 | ||
| # TENTACLE-POWERSHELL-STARTUP-DETECTION |
There was a problem hiding this comment.
Does this highlight a risk if we don't inject the startup detection at the top of the script?
There was a problem hiding this comment.
Yes if you don't do this first it is a concern, perhaps the template variable name could reflect that to be self documenting?
|
|
||
| directOutputText.Should().Contain("PowerShell startup detection", "The detection code should have run and reported why it exited"); | ||
|
|
||
| // On Mac/Linux exit codes are unsigned 8-bit, so -47 wraps to 209 |
There was a problem hiding this comment.
Should we be checking for that value then?
|
|
||
| directOutputText.Should().Contain("PowerShell startup detection", "The detection code should have run and reported why it exited"); | ||
|
|
||
| // On Mac/Linux exit codes are unsigned 8-bit, so -47 wraps to 209 |
There was a problem hiding this comment.
As above, should we be checking that value?
Background
When
powershell.exeis invoked to run a script, it can sometimes start the process but never actually begin executing the script body. Additionally the script could never be terminated. This results in hung deployments where we may wait forever for a script that will never complete.Refs: EFT-365
Refs: EFT-3145
Refs: HPY-1295, HPY-1296
Results
This PR introduces a startup detection mechanism that lets Tentacle detect and report when PowerShell never executes the script content.
Scripts that opt in include a special marker comment:
# OCTOPUS-POWERSHELL-STARTUP-DETECTIONWhen Tentacle bootstraps the script, it replaces this marker with generated detection code that:
.octopus_powershell_startedsentinel file.octopus_powershell_should_runfile (written by Tentacle before execution)Concurrently,
RunningScriptruns a monitoring task alongside the script execution task. If the started file isn't created within the timeout window (default: 5 minutes), the monitor concludes PowerShell never executed the script and returns exit code-47(PowerShellNeverStartedExitCode). Additionally tentacle will ensure the power shell body is never executed.Notes
RunningScriptconstructor (tests use a shorter value)powershell.exe);pwshsupport on Linux/Mac is possible as a follow-upFixes https://github.com/OctopusDeploy/OctopusTentacle/issues/... (optional public issue)
When Octopus feature toggle is enabled
Message written to server task log when PowerShell startup detection is enabled
Warning written to Tentacle logs when it takes longer than the configured timeout to start powershell:
How to review this PR
PowerShellStartupDetection.cs(new) — Generates and injects the detection code; manages sentinel file pathsPowerShellStartupStatus.cs(new) — Enum for monitoring outcomes:NotMonitored,Started,NeverStartedRunningScript.cs— MadeExecuteasync; added concurrent startup monitoring taskScriptWorkspace.cs— Injects detection code during bootstrap when the marker comment is present; creates theshould_runfileScriptExitCodes.cs— AddedPowerShellNeverStartedExitCode=-47StartScriptCommandV2.cs/ExecuteShellScriptCommand.cs— AddedDurationToWaitForPowerShellToStartupparameter to allow configurable timeoutPowerShellStartupDetectionTests.cs(new) — Integration tests covering successful detection, failure detection, monitoring timeout, and scripts without the markerQuality ✔️
Pre-requisites