Fix credential refresh race during worker activation #567

Merged
EDsCODE merged 1 commit into main from eric/skip-refresh-pending-workers on May 18, 2026
@EDsCODE commented May 18, 2026

Summary

  • Skip reserved/activating workers in the credential refresh due-worker query
  • Prevent the refresh scheduler from bumping owner_epoch while the first ActivateTenant RPC is still in flight
  • Add regression coverage for reserved/activating rows with NULL or past-due credential expiry
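
As a rough illustration of the due-worker filter described above, here is a minimal in-memory sketch. The `Worker` struct, the state names, the `lead` window, and the `dueForRefresh` helper are all hypothetical stand-ins for illustration; the real query lives in the config store and may be shaped differently. The sketch also assumes that a NULL expiry on an already active worker still counts as due.

```go
package main

import (
	"fmt"
	"time"
)

// WorkerState is a hypothetical lifecycle enum; the real schema may differ.
type WorkerState string

const (
	StateReserved   WorkerState = "reserved"
	StateActivating WorkerState = "activating"
	StateActive     WorkerState = "active"
)

// Worker is a hypothetical in-memory stand-in for a worker row.
type Worker struct {
	ID                   string
	State                WorkerState
	CredentialsExpiresAt *time.Time // nil models a NULL s3_credentials_expires_at
}

// dueForRefresh reports whether the scheduler should refresh w at time now.
// Reserved and activating workers are never due, regardless of expiry.
func dueForRefresh(w Worker, now time.Time, lead time.Duration) bool {
	if w.State == StateReserved || w.State == StateActivating {
		return false // first activation may still be in flight
	}
	if w.CredentialsExpiresAt == nil {
		return true // assumption: NULL expiry on an active worker is due
	}
	// Due when the expiry falls at or before now+lead.
	return !w.CredentialsExpiresAt.After(now.Add(lead))
}

// demoDueIDs runs the filter over a small fixed set of workers.
func demoDueIDs() []string {
	now := time.Date(2026, 5, 18, 12, 0, 0, 0, time.UTC)
	past := now.Add(-time.Minute)
	future := now.Add(time.Hour)
	workers := []Worker{
		{ID: "reserved-null", State: StateReserved},
		{ID: "activating-past", State: StateActivating, CredentialsExpiresAt: &past},
		{ID: "active-past", State: StateActive, CredentialsExpiresAt: &past},
		{ID: "active-future", State: StateActive, CredentialsExpiresAt: &future},
		{ID: "active-null", State: StateActive},
	}
	var due []string
	for _, w := range workers {
		if dueForRefresh(w, now, 5*time.Minute) {
			due = append(due, w.ID)
		}
	}
	return due
}

func main() {
	fmt.Println(demoDueIDs()) // prints [active-past active-null]
}
```

The reserved and activating rows are skipped even though their expiry is NULL or past due, which mirrors the cases the regression tests cover.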

Context

In the 24-client same-org burst QA, several clients failed with:

same-tenant takeover requires newer owner epoch 1 (current N)

During burst activation, worker rows become org-bound before the first ActivateTenant RPC completes and before s3_credentials_expires_at is stamped. The refresh scheduler treated those NULL-expiry reserved/activating rows as immediately due, bumped owner_epoch, and could race the original activation.
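
The race can be sketched with a toy epoch check. This is an illustration only: `workerOwner` and `takeover` are hypothetical names, and the rule that a request must carry a strictly newer epoch than the current one is an assumption inferred from the error message above, not taken from the actual worker code.

```go
package main

import "fmt"

// workerOwner is a hypothetical model of the owner epoch a worker has accepted.
type workerOwner struct {
	currentEpoch int64
}

// takeover accepts a same-tenant request only if its epoch is strictly newer
// than the current one (assumed rule, inferred from the error text).
func (o *workerOwner) takeover(epoch int64) error {
	if epoch <= o.currentEpoch {
		return fmt.Errorf("same-tenant takeover requires newer owner epoch %d (current %d)",
			epoch, o.currentEpoch)
	}
	o.currentEpoch = epoch
	return nil
}

// raceDemo models the bug: the scheduler bumps the epoch to 2 and its request
// lands before the original activation, which still carries epoch 1.
func raceDemo() error {
	o := &workerOwner{}
	_ = o.takeover(2)    // refresh scheduler, bumped epoch
	return o.takeover(1) // original ActivateTenant, now stale
}

// orderedDemo models the fixed ordering: activation completes first, and a
// later refresh with a bumped epoch is accepted.
func orderedDemo() error {
	o := &workerOwner{}
	if err := o.takeover(1); err != nil {
		return err
	}
	return o.takeover(2)
}

func main() {
	fmt.Println(raceDemo())
	fmt.Println(orderedDemo())
}
```

Under these assumptions, `raceDemo` returns the stale-epoch error while `orderedDemo` returns nil, which is why keeping the scheduler away from reserved/activating rows removes the failure.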

Tests

  • go test ./tests/configstore -run 'TestListWorkersDueForCredentialRefresh|TestMarkCredentialsRefreshed'
  • go test -tags kubernetes ./controlplane -run 'TestCredentialRefreshScheduler|TestK8sPoolActivateReservedWorker|TestSharedWorkerActivator'

@EDsCODE merged commit 31c36e9 into main May 18, 2026
22 checks passed
@EDsCODE deleted the eric/skip-refresh-pending-workers branch May 18, 2026 21:25