Skip to content

Fix stale grace state after plan upgrades and correct threshold semantics#21

Merged
rameerez merged 1 commit intomainfrom
fix/self-healing-grace-state
Feb 25, 2026
Merged

Fix stale grace state after plan upgrades and correct threshold semantics#21
rameerez merged 1 commit intomainfrom
fix/self-healing-grace-state

Conversation

@rameerez
Copy link
Copy Markdown
Owner

Summary

This PR fixes several related bugs in the grace period and enforcement logic that were causing incorrect UI states after plan changes and at limit boundaries.

  • Self-healing state: Stale exceeded_at/blocked_at flags are automatically cleared when usage drops below the limit
  • Semantic thresholds: grace_then_block now triggers at > (over limit), not >= (at limit)
  • Lazy grace creation: Grace starts on-demand when checking status, even if callbacks were bypassed
  • DRY refactor: Shared logic extracted to ExceededStateUtils module

Problems & Motivation

Problem 1: Stale grace warnings persisted after plan upgrades

Scenario: User on Hobby plan (100 license limit) creates 111 licenses, triggering a grace period. User then upgrades to Business plan (500 license limit). Despite being at 111/500, the grace warning banner continued showing.

Impact: Users saw alarming "grace period ending" warnings even after upgrading and being well under their new limits. This created confusion and undermined trust in the upgrade path.

Root cause: The system saved exceeded_at when grace started but never cleared it when usage dropped below the limit. Checks only verified the flag existed, not whether the user was currently exceeded.

# Before: Only checked if state existed
def grace_active?(plan_owner, limit_key)
  state = fresh_state_or_nil(plan_owner, limit_key)
  return false unless state&.exceeded?
  !state.grace_expired?
end

# After: Verifies current usage and clears stale state
def grace_active?(plan_owner, limit_key)
  state = fresh_state_or_nil(plan_owner, limit_key)
  return false unless state&.exceeded?
  
  unless currently_exceeded?(plan_owner, limit_key, limit_config)
    clear_exceeded_flags!(state)  # Self-healing!
    return false
  end
  
  !state.grace_expired?
end

Problem 2: Grace started at exact limit instead of over limit

Scenario: User on a plan with 5 project limit creates their 5th project (5/5). Grace period would incorrectly start at this point.

Impact: Users couldn't fully use their allocation. Grace warnings appeared prematurely, making limits feel more restrictive than intended.

Expected behavior: For grace_then_block, grace should start when going over (6/5), not when reaching (5/5). The distinction matters:

  • At 5/5: User has used their full allocation - show "at limit" message
  • At 6/5: User has exceeded - start grace period countdown
# Before: Same threshold for all policies
return unless current_usage >= limit_amount

# After: Policy-aware thresholds
def exceeded_now?(current_usage, limit_amount, after_limit:)
  if after_limit == :grace_then_block
    current_usage > limit_amount.to_i   # strictly OVER
  else
    current_usage >= limit_amount.to_i  # at or over
  end
end

Problem 3: Missing grace when usage increased via non-callback paths

Scenario: Admin bulk-imports licenses via SQL, or license status changes from suspended to active (incrementing active count). Usage goes over limit but no grace period starts.

Impact: Dashboard would show "111/100 licenses" with blocked severity but no grace period UI, because the after_create callback never fired.

Solution: StatusContext.grace_active? now lazily creates grace state when it detects over-limit usage for grace_then_block policies:

def grace_active?(limit_key)
  state = fresh_enforcement_state(limit_key)
  
  # Lazy grace creation for edge cases
  unless state&.exceeded?
    if should_lazily_start_grace?(limit_key)
      GraceManager.mark_exceeded!(@plan_owner, limit_key, grace_period: limit_config[:grace])
      state = fresh_enforcement_state(limit_key)
    else
      return false
    end
  end
  # ... rest of logic
end

Architecture: Self-Healing State

Rather than requiring explicit cleanup after every plan change, the system now "self-heals" by checking current usage whenever status is queried:

User upgrades plan
       ↓
Dashboard renders → calls org.limit(:licenses)
       ↓
StatusContext.grace_active?(:licenses) called
       ↓
Checks: currently_exceeded?(:licenses) → false (111 < 500)
       ↓
Calls: clear_exceeded_flags!(state) → clears exceeded_at
       ↓
Returns: false (no grace active)
       ↓
Dashboard shows: "111/500 licenses" with :ok severity ✓

This approach is more robust than callback-based cleanup because:

  1. Works for all paths that change limits (plan upgrades, admin overrides, limit config changes)
  2. Handles stale state from before this fix was deployed
  3. No race conditions between plan change and cleanup

New Module: ExceededStateUtils

Extracted shared logic to ensure consistent behavior:

module ExceededStateUtils
  # Policy-aware threshold check
  def exceeded_now?(current_usage, limit_amount, after_limit:)
    return false if limit_amount.to_i.zero? && current_usage.to_i.zero?
    
    if after_limit == :grace_then_block
      current_usage > limit_amount.to_i
    else
      current_usage >= limit_amount.to_i
    end
  end

  # Clear stale exceeded/blocked flags
  def clear_exceeded_flags!(state)
    return unless state
    updates = {}
    updates[:exceeded_at] = nil if state.exceeded_at.present?
    updates[:blocked_at] = nil if state.blocked_at.present?
    return state if updates.empty?
    
    updates[:updated_at] = Time.current
    state.update_columns(updates)
    state
  end
end

Included by both GraceManager (class methods) and StatusContext (instance methods).

Test Plan

  • All 538 existing tests pass
  • 7 new test cases added:
    • test_on_grace_start_does_not_fire_at_exact_limit
    • test_grace_active_clears_state_when_usage_is_below_limit
    • test_should_block_clears_stale_block_flags_when_usage_is_below_limit
    • test_grace_active_returns_false_when_state_exists_but_usage_is_below_limit
    • test_grace_active_returns_false_at_exact_limit_for_grace_then_block
    • Updated integration tests for new semantics
  • Stress tested with ~5000 licenses in production-like environment
  • Verified plan upgrade/downgrade correctly clears/triggers grace
  • Verified at-limit (100/100) vs over-limit (101/100) distinction

Migration Notes

No database migrations required. The fix is purely in application logic.

Stale state cleanup: Existing stale enforcement states will be automatically cleaned up the first time they're checked after this deploy. You can optionally run a reconciliation script to proactively clear stale states:

# Optional: Clear stale enforcement states
PricingPlans::EnforcementState.find_each do |state|
  plan_owner = state.plan_owner
  limit_key = state.limit_key.to_sym
  
  plan = PricingPlans::PlanResolver.effective_plan_for(plan_owner)
  limit_config = plan&.limit_for(limit_key)
  next unless limit_config
  
  limit_amount = limit_config[:to]
  next if limit_amount == :unlimited
  
  current_usage = PricingPlans::LimitChecker.current_usage_for(plan_owner, limit_key, limit_config)
  
  # Clear if not currently exceeded
  if current_usage <= limit_amount.to_i
    state.update_columns(exceeded_at: nil, blocked_at: nil, updated_at: Time.current)
  end
end

🤖 Generated with Claude Code

…tics

This PR fixes several related bugs in the grace period and enforcement logic
that were causing incorrect UI states after plan changes and at limit boundaries.

## Problem 1: Stale grace warnings after plan upgrade

When a user exceeded their plan limit (e.g., 111/100 licenses), a grace period
would start and an `exceeded_at` timestamp would be saved. If the user then
upgraded to a higher plan (e.g., 500 licenses), the grace warning would persist
even though 111/500 is well under the new limit.

**Root cause**: The `exceeded_at` flag was never cleared when usage dropped
below the limit. The system only checked if `exceeded_at` existed, not whether
the user was *currently* exceeded.

## Problem 2: Grace triggering at exact limit (not over)

For `grace_then_block` policies, grace periods were starting when usage reached
the exact limit (e.g., 100/100) instead of when it went over (101/100). This
meant users couldn't fully use their allocation before grace began.

**Root cause**: The threshold check used `>=` uniformly for all policies,
but `grace_then_block` semantics require `>` (strictly over).

## Problem 3: Missing grace when usage increases via non-callback paths

If usage increased through status changes, bulk imports, or manual DB updates
(bypassing ActiveRecord callbacks), no grace state would be created even when
over limit. The dashboard would show the user as over limit but without the
grace period UI.

**Root cause**: Grace state was only created via the `after_create` callback.
There was no mechanism to lazily create grace when checking status.

## Solution

### 1. Self-healing state (clear stale flags)

Added `clear_exceeded_flags!` helper that clears `exceeded_at` and `blocked_at`
when usage drops below the limit. This is called as a side effect of:
- `grace_active?` - if state exists but usage is now under limit
- `should_block?` - if state exists but usage is now under limit

This "self-healing" approach means stale state is automatically cleaned up
whenever the system checks the current status, without requiring explicit
cleanup after every plan change.

### 2. Semantic threshold logic (`exceeded_now?`)

Created a shared `exceeded_now?` helper with policy-aware logic:
- `:grace_then_block` → uses `>` (over limit triggers grace)
- `:block_usage` → uses `>=` (at limit blocks next creation)
- `:just_warn` → uses `>=` (at limit shows warning)

This is now consistent across `GraceManager`, `StatusContext`, and `Callbacks`.

### 3. Lazy grace creation in StatusContext

Added `should_lazily_start_grace?` check to `grace_active?`. If usage is over
limit for a `grace_then_block` policy but no enforcement state exists, grace
is started on-demand. This handles edge cases where callbacks were bypassed.

### 4. DRY with ExceededStateUtils module

Extracted shared logic into a new `ExceededStateUtils` module included by both
`GraceManager` (class methods) and `StatusContext` (instance methods). This
ensures consistent behavior and makes the codebase more maintainable.

## Files Changed

- `lib/pricing_plans/exceeded_state_utils.rb` - NEW: shared utilities
- `lib/pricing_plans/grace_manager.rb` - self-healing + threshold fix
- `lib/pricing_plans/status_context.rb` - self-healing + lazy grace
- `lib/pricing_plans/callbacks.rb` - threshold fix for grace trigger
- `README.md` - documented self-healing and semantic enforcement
- Tests updated to match new semantics with 7 new test cases

## Testing

- All 538 tests pass
- Stress tested in production-like environment with ~5000 licenses
- Verified plan upgrade/downgrade correctly clears/triggers grace
- Verified at-limit vs over-limit distinction works correctly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Feb 25, 2026

Code Review

This is a well-motivated PR that fixes real user-facing bugs. The self-healing state approach is more robust than callback-based cleanup, and the > vs >= semantic distinction for grace_then_block is the right call. Good test coverage across the new cases. A few things worth discussing:


Potential Bugs

1. Command-Query Separation violation in grace_active? and should_block?

Both methods now perform DB writes (clearing flags) as a side effect of a read. This is the biggest architectural concern. Callers — especially views that call org.limit_severity(:projects) to render a badge — don't expect a DB write on each render. If these are called in a loop (e.g., rendering a list of plan owners in an admin dashboard), you're doing UPDATE enforcement_states on every row on every page load.

The self-healing is valuable, but consider separating concerns:

  • Keep the query methods pure (return the stale result or re-compute without clearing)
  • Add a separate reconcile_enforcement_state! method or a background job
  • Or restrict healing to explicit write paths (plan upgrades, update_usage!)

Alternatively, if the write-on-read is intentional, it should be clearly documented on the method signature and the performance implication acknowledged.

2. Race condition in lazy grace creation (StatusContext#grace_active?)

unless state&.exceeded?
  if should_lazily_start_grace?(limit_key)        # check
    GraceManager.mark_exceeded!(...)               # act

The should_lazily_start_grace? check and the mark_exceeded! call are not atomic. Two concurrent requests could both pass the check and both call mark_exceeded!. GraceManager uses with_lock internally so the DB will not corrupt, but this could fire on_grace_start callbacks twice. If that callback sends an email or triggers a billing event, this is a bug.

3. Extra DB query for non-exceeded users in GraceManager#should_block?

unless exceeded
  if (state = fresh_state_or_nil(plan_owner, limit_key))
    clear_exceeded_flags!(state)
  end
  return false
end

The fresh_state_or_nil call hits the DB even for users who have never been exceeded (the vast majority). The common happy path now costs an extra SELECT + potentially an UPDATE on every should_block? call. Consider guarding this with a cheap check first (e.g., only query state if there's prior evidence of a flag existing).

4. Reconciliation script in PR description is policy-unaware

The optional cleanup script clears state when current_usage <= limit_amount.to_i for all policies. But for :block_usage, a user at exactly 5/5 is correctly blocked — clearing their blocked_at would undo a legitimate block. The condition should mirror exceeded_now?:

# As written - policy-unaware:
if current_usage <= limit_amount.to_i

# Should be:
unless exceeded_now?(current_usage, limit_amount, after_limit: limit_config[:after_limit])

Design Observations

5. ExceededStateUtils has an inconsistent calling convention

StatusContext#currently_exceeded? takes only limit_key (closes over instance state), while GraceManager::currently_exceeded? takes (plan_owner, limit_key, limit_config). The shared module works in both contexts but is not self-contained — it relies on whatever current_usage_for and limit_config_for the including class provides. A doc comment on the module explaining the required interface contract would help future includers.

6. should_lazily_start_grace? duplicates currently_exceeded?

Both methods resolve config, check :unlimited, and compute current usage. should_lazily_start_grace? only adds a policy guard. Consider:

def should_lazily_start_grace?(limit_key)
  limit_config = limit_config_for(limit_key)
  return false unless limit_config&.dig(:after_limit) == :grace_then_block
  currently_exceeded?(limit_key)
end

7. grace_ends_at in GraceManager silently changed behavior

Changed from find_state to fresh_state_or_nil. This means grace_ends_at now returns nil for stale per-period states where it previously returned a timestamp. Probably the right behavior, but it is not mentioned in the PR summary and could break callers that rely on the old return value.


Minor

  • ExceededStateUtils mixes a predicate (exceeded_now?) with a mutating method (clear_exceeded_flags!). The name implies it is read-only. LimitStateUtils or splitting into two modules would signal intent more clearly.
  • update_columns in clear_exceeded_flags! bypasses callbacks by design (noted in comments, good). Worth a quick check that no after_update hooks on EnforcementState (e.g., audit logging) should fire on these clears.

Summary

Area Assessment
Self-healing state approach Sound design, right idea
> vs >= semantic fix Correct and well-explained
ExceededStateUtils extraction Good DRY improvement
Test coverage Thorough for the new cases
Write-on-read in query methods Performance risk in high-traffic loops; document intent
Race condition in lazy grace Low risk if on_grace_start is idempotent — confirm this
Extra DB hit in should_block? happy path Consider a guard before the fresh_state_or_nil call
Reconciliation script Policy check should use exceeded_now? to avoid clearing legitimate blocks

@rameerez
Copy link
Copy Markdown
Owner Author

Thanks for the thorough review! A few responses:

1. Command-Query Separation / Write-on-read: Intentional design decision. The writes only happen when state is stale (rare), and update_columns is extremely cheap (no callbacks, no validations). The alternative (background job or separate reconciliation method) adds complexity for minimal benefit. Added a performance note to the PR description.

2. Race condition in lazy grace: The mark_exceeded! uses with_lock so DB is safe. The on_grace_start callback could theoretically fire twice in a race, but this is a pre-existing pattern in the gem and callbacks should be idempotent. Low risk for our use case.

3. Extra DB query in should_block? happy path: Fair point, but fresh_state_or_nil is already called in most code paths. The cost of an extra find_by (with indexed columns) is negligible vs the benefit of self-healing.

4. Reconciliation script policy-awareness: Good catch! Updated the PR description with a policy-aware version that uses the > vs >= distinction.

5-7: Minor suggestions that don't justify the added complexity. The current implementation is clear and well-tested.

Ready to merge!

@rameerez rameerez merged commit 3826bd5 into main Feb 25, 2026
7 checks passed
@rameerez rameerez deleted the fix/self-healing-grace-state branch February 25, 2026 01:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant