Add --pause-after-job flag to pause agent between jobs instead of disconnecting #3753
Draft
gregmagolan wants to merge 1 commit into buildkite:main
Summary
Adds a new `--pause-after-job` configuration option that pauses the agent after completing a job instead of disconnecting. The agent remains connected and keeps pinging Buildkite, preserving its identity in the UI (job history affinity), and can be resumed via the API when ready for the next job.

This is useful for orchestrators that need to run health checks or maintenance between jobs without losing the agent's connection and history. With `--disconnect-after-job`, each reconnection creates a new agent instance in the Buildkite UI, losing the association between jobs run on the same machine.

Motivation
When using `--disconnect-after-job`, the agent disconnects and reconnects after every job. Each reconnection registers as a new agent in the Buildkite UI, so the association between jobs run on the same machine is lost.

With `--pause-after-job`, the agent stays connected and simply pauses itself via `POST /pause` after finishing a job. An external orchestrator (or the Buildkite API) can then resume the agent when ready, and it will accept the next job with `ranJob` reset, all under the same agent identity.

Changes
- `agent/agent_configuration.go`: Added `PauseAfterJob bool` field to `AgentConfiguration`
- `clicommand/agent_start.go`: Added `--pause-after-job` CLI flag with `BUILDKITE_AGENT_PAUSE_AFTER_JOB` env var, feature reporting, mutual exclusion validation with `--disconnect-after-job`, and config transfer
- `agent/agent_worker_action.go`:
  - `ignoreAgentInDispatches` when `PauseAfterJob` is enabled (same as `DisconnectAfterJob`)
  - Calls the `POST /pause` API instead of disconnecting
  - Resets `ranJob` only for `PauseAfterJob` agents so they can accept the next job
- `agent/agent_worker_test.go`: Added `TestAgentWorker_PauseAfterJob` integration test
- `agent/fake_api_server_test.go`: Added `POST /pause` handler and `PauseCalls` counter to `FakeAgent`

Test plan
- `TestAgentWorker_PauseAfterJob` test passes: verifies job runs, agent self-pauses via API, resumes correctly, and stays connected
- `TestAgentWorker_DisconnectAfterJob_Start_Pause_Unpause` still passes: `ranJob` reset is gated on `PauseAfterJob` only
- `TestAgentWorker_Streaming_DisconnectAfterJob_Start_Pause_Unpause` still passes
- `TestAgentWorker_DisconnectAfterUptime` still passes
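
For reviewers who want the intended behaviour in one place, the flag validation and post-job handling described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Config`, `Worker`, `Validate`, `AfterJob`, `OnResume`, the `PostJobAction` values, and the error message are hypothetical names. Only `PauseAfterJob`, `DisconnectAfterJob`, and `ranJob` come from the description, and where the real agent would call `POST /pause` or disconnect, the sketch just returns an action.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical stand-in for the two relevant
// AgentConfiguration fields.
type Config struct {
	DisconnectAfterJob bool
	PauseAfterJob      bool
}

// Validate rejects configurations that enable both flags, mirroring the
// mutual exclusion check described in the PR. The message is illustrative.
func (c Config) Validate() error {
	if c.DisconnectAfterJob && c.PauseAfterJob {
		return errors.New("--pause-after-job and --disconnect-after-job are mutually exclusive")
	}
	return nil
}

// PostJobAction names what the worker should do once a job has finished.
type PostJobAction string

const (
	ActionContinue   PostJobAction = "continue"
	ActionDisconnect PostJobAction = "disconnect"
	ActionPause      PostJobAction = "pause" // the real agent would call POST /pause here
)

// Worker holds hypothetical per-agent state.
type Worker struct {
	cfg    Config
	ranJob bool
}

// AfterJob records that a job ran (an assumption about how ranJob is used)
// and picks the follow-up action from the configuration.
func (w *Worker) AfterJob() PostJobAction {
	w.ranJob = true
	switch {
	case w.cfg.PauseAfterJob:
		return ActionPause
	case w.cfg.DisconnectAfterJob:
		return ActionDisconnect
	default:
		return ActionContinue
	}
}

// OnResume runs when the agent is unpaused. ranJob is reset only for
// PauseAfterJob agents, so DisconnectAfterJob behaviour is unchanged.
func (w *Worker) OnResume() {
	if w.cfg.PauseAfterJob {
		w.ranJob = false
	}
}

func main() {
	both := Config{DisconnectAfterJob: true, PauseAfterJob: true}
	fmt.Println("validation error:", both.Validate())

	w := &Worker{cfg: Config{PauseAfterJob: true}}
	action := w.AfterJob()
	fmt.Println("action after job:", action)
	w.OnResume()
	fmt.Println("ranJob after resume:", w.ranJob)
}
```

Gating the reset on `PauseAfterJob` alone is what lets the existing `DisconnectAfterJob` pause/unpause tests keep passing: a disconnect-after-job agent that is paused and resumed still remembers that it has already run its job.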