Description
Occasionally g10k hangs while iterating over the branches of control_repo. Four child processes remain, all of them sleeping, which looks like a deadlock (checked with ps, pstree, lsof, and iotop).
Platform: Linux amd64
Version: GitHub release 0.9.9
$ g10k --version
g10k v0.9.9-1-gfc83c96 Build time: 2024-02-08_11:56:54 UTC
Config file g10k.yaml:
```yaml
---
:cachedir: "./g10k-cache"
use_cache_fallback: true
retry_git_commands: true
ignore_unreachable_modules: false
git:
  default_ref: main
deploy:
  purge_levels: ["deployment", "environment", "puppetfile"]
  purge_allowlist: [ '/.gitkeep', '.latest_revision', '.resource_types', 'resource_types/*.pp', '/.pp' ]
  deployment_purge_allowlist: [ 'production_s' ]
  generate_types: false
sources:
  control_repo: &control_repo
    remote: "https://private_gitlab/control_repo.git"
    basedir: "./environments"
    filter_regex: "^(production|staging)$"
    force_forge_versions: false
    prefix: false
    exit_if_unreachable: true
    invalid_branches: "correct"
    warn_if_branch_is_missing: true
```
Executed as:
g10k -verbose -debug -branch production -config ./g10k.yaml
Logs of problematic execution:
...
2025/01/27 14:02:03 Executing git --git-dir g10k-cache/environments/control_repo.git remote -v took 0.00079s
...
2025/01/27 14:02:04 Executing git --git-dir g10k-cache/environments/control_repo.git remote update --prune took 0.37934s
2025/01/27 14:02:04 DEBUG executeCommand(): Executing git --git-dir g10k-cache/environments/control_repo.git branch
2025/01/27 14:02:04 Executing git --git-dir g10k-cache/environments/control_repo.git branch took 0.00108s
2025/01/27 14:02:04 DEBUG Skipping branch main of source control_repo, because of filter_regex setting
2025/01/27 14:02:04 DEBUG Environment staging of source control_repo does not match branch name filter 'production', skipping
2025/01/27 14:02:04 DEBUG Skipping branch test-branch1 of source control_repo, because of filter_regex setting
2025/01/27 14:02:04 DEBUG Skipping branch test-branch2 of source control_repo, because of filter_regex setting
2025/01/27 14:02:04 DEBUG Skipping branch testing of source control_repo, because of filter_regex setting
2025/01/27 14:02:04 DEBUG 1(): Resolving environment production of source control_repo
2025/01/27 14:02:04 DEBUG Skipping branch testing-branch3 of source control_repo, because of filter_regex setting
2025/01/27 14:02:04 DEBUG Skipping branch testing-branch4 of source control_repo, because of filter_regex setting
2025/01/27 14:02:04 DEBUG Skipping branch testing-branch5 of source control_repo, because of filter_regex setting
2025/01/27 14:02:04 DEBUG Skipping branch testing-branch6 of source control_repo, because of filter_regex setting
2025/01/27 14:02:04 DEBUG Skipping branch testing_branch7 of source control_repo, because of filter_regex setting
[ hang indefinitely ]
Logs of correct execution:
...
2025/01/29 08:56:52 Executing git --git-dir g10k-cache/environments/control_repo.git remote -v took 0.00100s
...
2025/01/29 08:56:52 Executing git --git-dir g10k-cache/environments/control_repo.git remote update --prune took 0.26773s
2025/01/29 08:56:52 DEBUG executeCommand(): Executing git --git-dir g10k-cache/environments/control_repo.git branch
2025/01/29 08:56:52 Executing git --git-dir g10k-cache/environments/control_repo.git branch took 0.00146s
2025/01/29 08:56:52 DEBUG Skipping branch main of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG Environment staging of source control_repo does not match branch name filter 'production', skipping
2025/01/29 08:56:52 DEBUG Skipping branch test-branch1 of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG Skipping branch test-branch2 of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG Skipping branch testing of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG Skipping branch testing-branch3 of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG Skipping branch testing-branch4 of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG Skipping branch testing-branch5 of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG 1(): Resolving environment production of source control_repo
2025/01/29 08:56:52 DEBUG Skipping branch testing-branch6 of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG Skipping branch testing_branch7 of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG executeCommand(): Executing git --git-dir g10k-cache/environments/control_repo.git rev-parse --verify 'production^{object}'
2025/01/29 08:56:52 Executing git --git-dir g10k-cache/environments/control_repo.git rev-parse --verify 'production^{object}' took 0.00103s
...
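So in the deadlock case the last messages printed are the filtering of most branches ( https://github.com/xorpaul/g10k/blob/v0.9.9/puppetfile.go#L105 ) and, for branch "production", the start of the sync ( https://github.com/xorpaul/g10k/blob/v0.9.9/puppetfile.go#L122 ). The "Resolving environment production" message appears at a different position in the two runs: before the testing-branch3 line in the hanging run, and after the testing-branch5 line in the correct one. At first glance, it seems that the random order of branch iteration triggers the deadlock. The issue was reproducible for one system user, but not for another.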
I tried a custom build with additional debug messages around the sizedwaitgroup calls and the mutex locks/unlocks, built with the latest Go version (1.23), but I have not been able to reproduce the hang with it yet.