set_state operation may silently fail or create inconsistent state during failover #543

@bolshakov

Description

When set_state is called during a Redis outage, the state is written to the in-memory failover store. The impact depends on the caller:

Admin panel: Operation silently has no effect

The admin panel's FailSafe instance writes to the admin process's local Memory store. Application instances never see that write: they read from Redis (or from their own isolated Memory stores):

  1. Admin calls light.lock(RED) during Redis outage
  2. set_state(LOCKED_RED) -> written to Admin's Memory
  3. Application instances read from Redis -> UNLOCKED
  4. Admin sees success, but the lock had no effect on any application instance
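
The admin-panel sequence above can be reproduced with a small toy model. This is only a sketch under stated assumptions: `ToyRedis`, `ToyFailSafe`, and `RedisDown` are hypothetical stand-ins written for this illustration, not the library's actual classes, and the fallback logic is the simplest one consistent with the description (try Redis, fall back to a per-process in-memory value on error).

```python
class RedisDown(Exception):
    """Raised by the toy Redis store while the simulated outage lasts."""


class ToyRedis:
    """Hypothetical stand-in for the shared Redis store (not a real client)."""

    def __init__(self):
        self.available = True
        self.state = "UNLOCKED"

    def get_state(self):
        if not self.available:
            raise RedisDown()
        return self.state

    def set_state(self, state):
        if not self.available:
            raise RedisDown()
        self.state = state


class ToyFailSafe:
    """Hypothetical wrapper: tries Redis, falls back to per-process memory."""

    def __init__(self, redis):
        self.redis = redis
        self.memory_state = "UNLOCKED"  # private to this process

    def get_state(self):
        try:
            return self.redis.get_state()
        except RedisDown:
            return self.memory_state

    def set_state(self, state):
        try:
            self.redis.set_state(state)
        except RedisDown:
            # The write lands in local memory and no error reaches the caller.
            self.memory_state = state
        return state


redis = ToyRedis()
admin = ToyFailSafe(redis)  # admin panel process
app = ToyFailSafe(redis)    # application process, separate memory

redis.available = False
admin.set_state("LOCKED_RED")     # looks successful to the admin
during_outage = app.get_state()   # app reads its own memory
redis.available = True
after_recovery = app.get_state()  # app reads Redis
```

In this model the lock exists only in `admin.memory_state`; both `during_outage` and `after_recovery` remain `"UNLOCKED"`, which matches step 4.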

Automated process (same application instance): State lost or leaked

When automation runs in-process (deployment hooks, health checks, feature flags), it shares the FailSafe instance with request handling:

State lost after recovery:

  1. Automation calls lock(RED) during Redis outage -> Memory
  2. Redis recovers
  3. Requests read from Redis -> UNLOCKED

Stale state resurfaces on future outages:

  1. Automation locks during outage -> Memory
  2. Redis recovers, automation unlocks -> Redis
  3. Future Redis outage
  4. Requests read from Memory -> stale LOCKED_RED
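
Both in-process scenarios can be sketched with one shared toy store. The names here (`ToyFailSafe`, the `redis` dict with `up` and `state` keys, and the two scenario functions) are assumptions made for illustration and do not mirror the library's real API; the model only assumes that fallback writes go to memory and are never replayed to Redis or cleared on recovery.

```python
class ToyFailSafe:
    """Hypothetical model of one process: shared Redis dict plus private memory."""

    def __init__(self, redis):
        self.redis = redis        # {"up": bool, "state": str}
        self.memory = "UNLOCKED"  # fallback state, local to this process

    def set_state(self, state):
        if self.redis["up"]:
            self.redis["state"] = state
        else:
            # Fallback write: never replayed to Redis, never cleared.
            self.memory = state

    def get_state(self):
        return self.redis["state"] if self.redis["up"] else self.memory


def state_lost_after_recovery():
    redis = {"up": True, "state": "UNLOCKED"}
    store = ToyFailSafe(redis)
    redis["up"] = False
    store.set_state("LOCKED_RED")  # 1. automation locks during outage -> memory
    redis["up"] = True             # 2. Redis recovers
    return store.get_state()       # 3. requests read from Redis


def stale_state_resurfaces():
    redis = {"up": True, "state": "UNLOCKED"}
    store = ToyFailSafe(redis)
    redis["up"] = False
    store.set_state("LOCKED_RED")  # 1. automation locks during outage -> memory
    redis["up"] = True
    store.set_state("UNLOCKED")    # 2. unlock after recovery -> Redis only
    redis["up"] = False            # 3. future Redis outage
    return store.get_state()       # 4. requests read from memory
```

Under these assumptions `state_lost_after_recovery()` returns `"UNLOCKED"` and `stale_state_resurfaces()` returns `"LOCKED_RED"`: the two stores are never reconciled, so whichever one is being read wins.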
