SUP-5805: Add shared plugin path, caching, and file locking for plugins when used in agent-stack-k8s#3652
Open
added 2 commits
December 23, 2025 13:42
…ed in agent-stack-k8s
Tested on an EKS cluster using an EFS-backed PVC. Job 1: checks that the shared plugin path is empty. The fix mainly concerns Jobs 2 and 3: there was no race condition between them, and only one of them downloaded the binary, because the download is guarded by a lock.
Description
Problem: In Kubernetes environments with ephemeral pods, multiple jobs starting simultaneously would redundantly download the same plugin, wasting time and bandwidth.
Solution: Implemented file locking for shared plugin storage in Kubernetes agents to prevent race conditions during plugin downloads. Added enhanced logging to show:
Context
Linear: SUP-5805
Changes
Core Implementation (internal/job/plugin.go):
- openCachedPlugin() helper function to DRY up duplicated cached plugin handling code
- acquirePluginLock() with logging for lock acquisition wait states
- checkoutPlugin() to support shared plugin storage with file locking when BUILDKITE_PLUGINS_PATH_INCLUDES_AGENT_NAME=false
Testing
- Tests have run locally (with go test ./...). Buildkite employees may check this if the pipeline has run automatically.
- Code is formatted (with go tool gofumpt -extra -w .)
Disclosures / Credits
I consulted Claude for potential approaches, then wrote the implementation myself.