
api: set Stop field on ExponentialBackOff in LifetimeWatcher #31896

Open
raman1236 wants to merge 2 commits into hashicorp:main from raman1236:fix/lifetime-watcher-backoff-stop


@raman1236

Description

Fixes a regression introduced in #26868 where the backoff package upgrade from v3 to v4 broke error propagation in the LifetimeWatcher.

Root Cause

After the backoff package was upgraded from v3 to v4, lifetime_watcher.go initializes ExponentialBackOff with a struct literal instead of calling backoff.NewExponentialBackOff(). In v3, NextBackOff() returned the package-level backoff.Stop constant once MaxElapsedTime was exceeded; in v4 it returns the struct's own Stop field instead. That field is a time.Duration whose zero value is 0, while the sentinel backoff.Stop is -1 (printed as -1ns). The constructor NewExponentialBackOff() sets Stop: backoff.Stop, but the struct literal in lifetime_watcher.go omits this field.

This means that when MaxElapsedTime is exceeded during persistent renewal failures, NextBackOff() returns 0 (the struct's Stop field value) instead of backoff.Stop (-1ns). The subsequent check:

```go
if sleepDuration == backoff.Stop {
    return err
}
```

never matches, so doneCh never receives the renewal error. The watcher eventually exits via the grace period check returning nil, silently losing the error.

Fix

Add Stop: backoff.Stop to the ExponentialBackOff struct literal so that NextBackOff() correctly returns backoff.Stop when the maximum elapsed time is exceeded. This is a one-line change.

Testing

  • Added the TestLifetimeWatcherErrorBackoffStops regression test, which creates a LifetimeWatcher with a short lease and persistent renewal failures and verifies that the watcher terminates promptly instead of getting stuck.
  • All existing TestLifetimeWatcher, TestRenewer_NewRenewer, and TestCalcSleepPeriod tests continue to pass:
```
$ go test -run "TestLifetimeWatcher|TestCalcSleepPeriod|TestRenewer" -v ./api/
--- PASS: TestRenewer_NewRenewer (0.00s)
--- PASS: TestLifetimeWatcherErrorBackoffStops (0.96s)
--- PASS: TestLifetimeWatcher (71.34s)
    --- PASS: TestLifetimeWatcher/no_error (0.00s)
    --- PASS: TestLifetimeWatcher/short_increment_duration (0.00s)
    --- PASS: TestLifetimeWatcher/one_error (0.61s)
    --- PASS: TestLifetimeWatcher/many_errors (8.06s)
    --- PASS: TestLifetimeWatcher/only_errors (8.58s)
    --- PASS: TestLifetimeWatcher/negative_lease_duration (0.00s)
    --- PASS: TestLifetimeWatcher/permission_denied_error (54.10s)
PASS
```

Fixes #28611

raman1236 requested a review from a team as a code owner on April 5, 2026 18:25
@vercel

vercel Bot commented Apr 5, 2026

@ramanvasi is attempting to deploy a commit to the HashiCorp Team on Vercel.

A member of the Team first needs to authorize it.

raman1236 requested a review from raskchanky on April 5, 2026 18:25
dosubot added the bug, clientapi, and regression labels on Apr 5, 2026
@hashicorp-cla-app

hashicorp-cla-app Bot commented Apr 5, 2026

CLA assistant check
All committers have signed the CLA.

raskchanky removed their request for review on April 24, 2026 16:01
@raman1236 (Author)

Friendly ping: this PR fixes a subtle regression where the LifetimeWatcher's ExponentialBackOff never returns backoff.Stop because its Stop field is never set. The CLA is signed and tests pass. Would appreciate a review when you have a moment. Thanks!

When the backoff package was upgraded from v3 to v4 in PR hashicorp#26868, the
ExponentialBackOff struct in lifetime_watcher.go was directly initialized
instead of using the backoff.NewExponentialBackOff() constructor. In v4,
the Stop field defaults to the zero value (0) instead of backoff.Stop
(-1, i.e. -1ns). This meant that when MaxElapsedTime was exceeded during persistent
renewal failures, NextBackOff() returned 0 instead of backoff.Stop, so
the check "sleepDuration == backoff.Stop" never matched and doneCh
never received an error.

Fix: Add "Stop: backoff.Stop" to the struct literal so that the
LifetimeWatcher properly detects when the backoff has been exhausted
and propagates the renewal error through doneCh.

Fixes hashicorp#28611
raman1236 force-pushed the fix/lifetime-watcher-backoff-stop branch from e3cd10f to b51efd5 on May 13, 2026 00:43
dosubot added the size:XS label (changes 0-9 lines, ignoring generated files) on May 13, 2026

Labels

bug, clientapi, regression, size:XS


Development

Successfully merging this pull request may close these issues.

Exponential backoff in the LifetimeWatcher does not send an error to doneCh if we reach the lease expiration while failing to renew
