Fix circuit breaker probe lockup, cache key collision, evaluators leak, and HOCON list corruption #107

Merged: EtaCassiopeia merged 4 commits into main from fix/codebase-review-fixes on May 12, 2026

@EtaCassiopeia EtaCassiopeia commented Mar 31, 2026

Summary

CircuitBreakerProvider: half-open probe slot permanently stuck on application errors

In the half-open state, application errors (FlagNotFound, TypeMismatch) triggered neither recordSuccess nor recordFailure, so the probe slot stayed locked and every subsequent request was rejected indefinitely. Application errors now count as success for circuit purposes, because they show that the provider is reachable.
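A minimal, dependency-free sketch of the half-open handling described above. It is not the CircuitBreakerProvider code: the `Breaker` class, the `ProviderUnavailable` and `CircuitOpen` errors, and the "one failure opens the circuit" rule are assumptions made for illustration. Only the idea that FlagNotFound and TypeMismatch release the probe slot as a success comes from this PR.

```scala
object CircuitBreakerSketch {
  sealed trait EvalError
  case object FlagNotFound extends EvalError        // application error: the provider answered
  case object TypeMismatch extends EvalError        // application error: the provider answered
  case object ProviderUnavailable extends EvalError // hypothetical infrastructure error
  case object CircuitOpen extends EvalError         // hypothetical rejection by the breaker

  sealed trait State
  case object Closed extends State
  case object Open extends State
  case object HalfOpen extends State

  // Simplified breaker: no failure threshold and no reset timer.
  final class Breaker(initial: State) {
    private var state: State = initial
    private var probeInFlight = false

    def currentState: State = synchronized(state)

    // In HalfOpen only one caller may hold the probe slot at a time.
    private def tryAcquire(): Boolean = synchronized {
      state match {
        case Closed   => true
        case Open     => false
        case HalfOpen => if (probeInFlight) false else { probeInFlight = true; true }
      }
    }

    private def recordSuccess(): Unit = synchronized { probeInFlight = false; state = Closed }
    private def recordFailure(): Unit = synchronized { probeInFlight = false; state = Open }

    def call[A](eval: => Either[EvalError, A]): Either[EvalError, A] =
      if (!tryAcquire()) Left(CircuitOpen)
      else {
        val result = eval
        result match {
          case Right(_)                                => recordSuccess()
          // The provider responded, so the probe slot must be released as a success.
          case Left(FlagNotFound) | Left(TypeMismatch) => recordSuccess()
          case Left(_)                                 => recordFailure()
        }
        result
      }
  }
}
```

Every branch of the match releases the probe slot, so no outcome of a half-open probe can leave the breaker rejecting requests indefinitely.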

CachingProvider: cache key hash collision serves wrong flag values across users

CacheKey used a 32-bit contextHash: Int, so two different evaluation contexts could collide and user A could receive user B's cached flag value. Replaced it with contextFingerprint: String, built from a stable, sorted serialization of the context attributes.
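One way a stable, sorted fingerprint could be built, as a sketch only. The `fingerprint` and `cacheKey` helpers, the length-prefixed encoding, and the simplification of attributes to `Map[String, String]` are assumptions for illustration; the actual serialization in CachingProvider may differ.

```scala
object CacheKeySketch {
  final case class CacheKey(flagKey: String, contextFingerprint: String)

  // Length-prefix each part so delimiters inside keys or values cannot make
  // two different contexts serialize to the same string.
  private def part(s: String): String = s"${s.length}:$s"

  // Assumption: attributes are flattened to strings; real contexts hold richer values.
  def fingerprint(targetingKey: Option[String], attributes: Map[String, String]): String = {
    val target = targetingKey.fold("-")(k => "+" + part(k))
    // Sort by attribute name so map iteration order does not affect the result.
    val attrs = attributes.toSeq.sortBy(_._1).map { case (k, v) => part(k) + part(v) }.mkString
    target + attrs
  }

  def cacheKey(flagKey: String, targetingKey: Option[String], attributes: Map[String, String]): CacheKey =
    CacheKey(flagKey, fingerprint(targetingKey, attributes))
}
```

For example, the strings "Aa" and "BB" share the same 32-bit `hashCode` on the JVM, so hash-based keys for two users with those targeting keys could collide, while their fingerprints stay distinct. The cost is a longer key, which trades some memory for correctness.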

CachingProvider: unbounded evaluators map leaks memory

The evaluators: ConcurrentHashMap side-channel had no eviction, so an entry accumulated for every unique cache key over the provider's lifetime. The Lookup callback now calls evaluators.remove() instead of evaluators.put(), so entries are cleaned up after cache population.
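A rough sketch of the cleanup pattern, assuming the side-channel is used to hand an evaluator to the cache's lookup function. `SideChannelCache`, `getOrEvaluate`, and the plain `ConcurrentHashMap` standing in for the real cache are hypothetical; the `finally` cleanup for the cache-hit path is this sketch's own safeguard and is not something the PR states.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.function.{Function => JFunction}

object EvaluatorsSketch {
  final class SideChannelCache[K, V] {
    private val cache = new ConcurrentHashMap[K, V]()
    // Side-channel: carries the evaluator from the caller to the lookup function.
    private val evaluators = new ConcurrentHashMap[K, () => V]()

    def pendingEvaluators: Int = evaluators.size()

    // Lookup callback: remove() consumes the evaluator so it does not outlive the lookup.
    private val lookup: JFunction[K, V] = new JFunction[K, V] {
      def apply(key: K): V = {
        val evaluate = evaluators.remove(key)
        evaluate()
      }
    }

    def getOrEvaluate(key: K, evaluate: () => V): V = {
      evaluators.put(key, evaluate)
      try cache.computeIfAbsent(key, lookup)
      finally evaluators.remove(key) // cache hit: lookup never ran, so clean up here
    }
  }
}
```

With this shape the side-channel only holds entries for lookups that are in flight, so its size no longer grows with the number of unique cache keys.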

HoconProvider: LIST config values become list of nulls

configValueToSdkValue called .asObject() on each list element, which returns null for scalar values. A config like allowed-regions = ["us", "eu"] produced [null, null]. Removed the .asObject() call.

Event bridge: getOrThrowFiberFailure can crash Java SDK event thread

All event bridge handlers used getOrThrowFiberFailure() which could throw on defects (e.g., Hub shutdown), killing the Java SDK's internal event dispatch thread. Replaced with getOrElse to absorb unexpected failures.

Test assertions: overly broad result.isLeft checks

  • BehaviorControlsSpec: setErrorMode(ProviderNotReady) now asserts the specific error type
  • EvaluationTimeoutSpec: per-call timeout tests now assert ProviderError type

@EtaCassiopeia EtaCassiopeia changed the title from "Fix critical issues from codebase review" to "Fix circuit breaker probe lockup, cache key collision, evaluators leak, and HOCON list corruption" Mar 31, 2026
@EtaCassiopeia EtaCassiopeia merged commit a16c808 into main May 12, 2026
1 check passed
@EtaCassiopeia EtaCassiopeia deleted the fix/codebase-review-fixes branch May 12, 2026 23:33