Commit a0f97e4

docs: Add snapshot storm detection and demo session to bug spec
- Document snapshot storm mechanics and how it would manifest
- List existing defenses (debounce, threshold, interval)
- Add detection methods (metrics to monitor, log patterns)
- Describe resolution strategy if storm detected during rollout
- Add "The Vanishing Pattern" demo scenario showing data loss vs preservation
- Include test script for the demo
- Add user story for acceptance criteria
1 parent b02f12a commit a0f97e4

1 file changed

+182
-0
lines changed

specs/research/STEP-ARRAY-INVARIANT-BUG.md

Lines changed: 182 additions & 0 deletions
@@ -450,6 +450,188 @@ These tests were added in af466ff and verify the bug, not correct behavior.

---

## Snapshot Storm Detection and Resolution

### What is a Snapshot Storm?

If server and client have different array lengths, their state hashes will differ. This causes:

1. Client computes hash of truncated state (64 elements)
2. Server sends hash of full state (128 elements)
3. Hashes don't match
4. Client requests snapshot
5. Client receives snapshot (128 elements)
6. User changes stepCount → client reducer truncates to 64
7. Hashes don't match again
8. Repeat forever

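The mismatch at the heart of this loop can be sketched in a few lines. This is only an illustration: the `Track` shape and the `fingerprint` function are hypothetical stand-ins, since the real hash function is not shown in this spec. The serialized state is used directly as the fingerprint, on the assumption that any real hash is derived from equivalent content.

```typescript
// Hypothetical track shape; the real one has more fields.
type Track = { steps: boolean[]; stepCount: number };

// Stand-in for the real state hash (assumption): the serialized state itself.
// Two states with different array lengths always serialize differently.
function fingerprint(track: Track): string {
  return JSON.stringify(track);
}

// Server keeps the full 128-element array after stepCount drops to 64.
const serverTrack: Track = { steps: Array(128).fill(false), stepCount: 64 };

// Client whose reducer truncates the array to stepCount.
const truncatingClient: Track = {
  steps: serverTrack.steps.slice(0, 64),
  stepCount: 64,
};

// Client whose reducer leaves the array length alone.
const preservingClient: Track = {
  steps: [...serverTrack.steps],
  stepCount: 64,
};

// A truncating client disagrees with the server on every hash check,
// which is what keeps triggering snapshot requests.
const stormLikely = fingerprint(truncatingClient) !== fingerprint(serverTrack);
const inSync = fingerprint(preservingClient) === fingerprint(serverTrack);

console.log({ stormLikely, inSync });
```

Once both sides keep the same array length, the fingerprints agree and the loop has nothing to feed on.
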
### Existing Defenses

The codebase already has snapshot storm prevention:

| Mechanism | Location | Protection |
|-----------|----------|------------|
| Debounce | `multiplayer.ts:96` | `RECOVERY_DEBOUNCE_MS = 2000` |
| Consecutive threshold | `sync-health.ts:64` | `mismatchThreshold: 2` |
| Hash check interval | `multiplayer.ts:106` | `STATE_HASH_CHECK_INTERVAL_MS = 30000` |

**Worst case**: hash checks run every 30 seconds and recovery requires 2 consecutive mismatches, so each affected client requests at most 1 snapshot per minute (30 s × 2).

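The worst-case bound can be checked with a little arithmetic. The sketch below copies the three constants from the table; the helper `minSnapshotSpacingMs` is hypothetical, and it assumes the mismatch streak resets after each recovery and that the larger of the two delays (threshold × interval, or the debounce) is the binding one.

```typescript
// Values copied from the defenses table above.
const STATE_HASH_CHECK_INTERVAL_MS = 30_000;
const MISMATCH_THRESHOLD = 2;
const RECOVERY_DEBOUNCE_MS = 2_000;

// Hypothetical helper: minimum time between snapshot requests for one client
// that mismatches on every single hash check.
function minSnapshotSpacingMs(
  checkIntervalMs: number,
  threshold: number,
  debounceMs: number,
): number {
  // The client must accumulate `threshold` mismatches, one per check,
  // and can never recover more often than the debounce allows.
  return Math.max(checkIntervalMs * threshold, debounceMs);
}

const spacingMs = minSnapshotSpacingMs(
  STATE_HASH_CHECK_INTERVAL_MS,
  MISMATCH_THRESHOLD,
  RECOVERY_DEBOUNCE_MS,
);
const snapshotsPerHourPerClient = 3_600_000 / spacingMs;

console.log(spacingMs, snapshotsPerHourPerClient); // 60000 60
```

Under these assumptions a storm costs roughly 60 snapshots per hour per affected client, so the total load scales with the number of clients still running the truncating reducer.
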
### Detection During Rollout

**Metrics to monitor** (from `SyncHealthMetrics`):

```typescript
interface SyncHealthMetrics {
  hashCheckCount: number;        // Total checks
  mismatchCount: number;         // Total mismatches
  consecutiveMismatches: number; // Current streak
  // ...
}
```

**Storm indicators**:
- `mismatchCount` growing rapidly
- `consecutiveMismatches` repeatedly hitting threshold then resetting
- High `request_snapshot` frequency in server logs

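One way to turn "growing rapidly" into a concrete check is to compare two samples of the counters and look at the mismatch ratio over that window. The sketch below is a hypothetical heuristic, not existing code: `SyncHealthSample`, `looksLikeStorm`, and the 0.5 default ratio are all assumptions chosen for illustration.

```typescript
// Subset of SyncHealthMetrics used by this sketch.
interface SyncHealthSample {
  hashCheckCount: number;
  mismatchCount: number;
  consecutiveMismatches: number;
}

// Hypothetical heuristic: flag a storm when more than `maxMismatchRatio`
// of the hash checks between two samples were mismatches.
function looksLikeStorm(
  previous: SyncHealthSample,
  current: SyncHealthSample,
  maxMismatchRatio = 0.5,
): boolean {
  const checks = current.hashCheckCount - previous.hashCheckCount;
  const mismatches = current.mismatchCount - previous.mismatchCount;
  if (checks <= 0) return false; // no new checks, nothing to judge
  return mismatches / checks > maxMismatchRatio;
}

const baseline: SyncHealthSample = { hashCheckCount: 10, mismatchCount: 0, consecutiveMismatches: 0 };
const healthy: SyncHealthSample = { hashCheckCount: 20, mismatchCount: 1, consecutiveMismatches: 0 };
const stormy: SyncHealthSample = { hashCheckCount: 20, mismatchCount: 10, consecutiveMismatches: 1 };

console.log(looksLikeStorm(baseline, healthy)); // false
console.log(looksLikeStorm(baseline, stormy));  // true
```

The ratio is used instead of `consecutiveMismatches` alone because, in a storm, the streak keeps resetting after each recovery and so never grows large.
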
**Log patterns to grep for**:
```bash
# On server (Cloudflare Workers logs, piped in on stdin)
grep -c "request_snapshot"   # Should be low
grep -c "state_sync"         # High = many snapshots sent

# On client (browser console output, saved and piped in on stdin)
# -F matches "[RECOVERY]" literally instead of as a bracket expression;
# sort before uniq -c so identical lines are grouped together.
grep -F "[RECOVERY] Requesting snapshot" | sort | uniq -c | sort -rn
```

### Resolution if Storm Detected

1. **Immediate**: Roll back server change (re-enable array resizing)
2. **Root cause**: Client reducer is still truncating
3. **Fix**: Deploy client fix first, then server fix

### Prevention: Atomic Rollout

**Recommended deployment order**:
1. Deploy client bundle with fixed `grid.tsx` reducer
2. Old clients continue working (server still resizes)
3. Once CDN propagated, deploy server fix
4. New clients + new server = consistent 128-length arrays

**Or use feature flag**:
```typescript
// live-session.ts
// Note: this assumes the flag arrives as a boolean. If it is supplied as a
// string, 'false' is truthy here and should be compared against 'true' instead.
const FIXED_ARRAY_LENGTH = env.FEATURE_FIXED_ARRAYS ?? false;

if (!FIXED_ARRAY_LENGTH) {
  // Old resizing behavior for backward compatibility
  if (msg.stepCount < oldStepCount) {
    track.steps = track.steps.slice(0, msg.stepCount);
  }
}
```

---

## Demo Session: Before/After Impact

### Scenario: "The Vanishing Pattern"

A session that demonstrates data loss with current behavior and data preservation with fix.

#### Setup

1. Create a track with `stepCount = 128`
2. Add a distinctive pattern in positions 64-127:
   ```
   Steps 64-79:   ●○○○●○○○●○○○●○○○ (kick pattern)
   Steps 80-95:   ○●○●○●○●○●○●○●○● (hi-hat pattern)
   Steps 96-111:  ●○○●○○●○○●○○●○○● (syncopated)
   Steps 112-127: ○○○○●●●●○○○○●●●● (build-up)
   ```
3. Switch to `stepCount = 64` (work on first half)
4. Switch back to `stepCount = 128`

#### Current Behavior (BUG)

| Step | stepCount | Array Length | Pattern 64-127 |
|------|-----------|--------------|----------------|
| 1 | 128 | 128 | ✅ Present |
| 2 | 64 | **64** (truncated) | ❌ **DELETED** |
| 3 | 128 | 128 (padded with `false`) | ❌ **GONE** |

**User experience**: "My pattern disappeared when I changed step count!"

#### Fixed Behavior

| Step | stepCount | Array Length | Pattern 64-127 |
|------|-----------|--------------|----------------|
| 1 | 128 | 128 | ✅ Present |
| 2 | 64 | 128 (unchanged) | ✅ Still there (hidden) |
| 3 | 128 | 128 (unchanged) | ✅ **VISIBLE AGAIN** |

**User experience**: "My pattern came back when I expanded the view!"

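The two tables can be reproduced with a pair of toy reducers. This is a minimal sketch, not the actual `grid.tsx` or `live-session.ts` code: `setStepCountResizing`, `setStepCountFixed`, `visibleSteps`, and the reduced `Track` shape are hypothetical names introduced only for illustration.

```typescript
// Reduced track shape (assumption); the real track has more fields.
type Track = { steps: boolean[]; stepCount: number };

// Buggy behavior: the array is resized to match stepCount, so shrinking
// discards steps and growing pads with `false`.
function setStepCountResizing(track: Track, stepCount: number): Track {
  const steps = track.steps.slice(0, stepCount);
  while (steps.length < stepCount) steps.push(false);
  return { steps, stepCount };
}

// Fixed behavior: the array length never changes; stepCount only
// controls how much of the array is shown and played.
function setStepCountFixed(track: Track, stepCount: number): Track {
  return { ...track, stepCount };
}

// The view derives the visible portion from stepCount.
function visibleSteps(track: Track): boolean[] {
  return track.steps.slice(0, track.stepCount);
}

const original: Track = { steps: Array(128).fill(false), stepCount: 128 };
original.steps[64] = true; // first kick of the hidden pattern

// 128 -> 64 -> 128 with each reducer.
const buggy = setStepCountResizing(setStepCountResizing(original, 64), 128);
const fixed = setStepCountFixed(setStepCountFixed(original, 64), 128);

console.log(buggy.steps[64], fixed.steps[64]); // false true
```

With the fixed reducer, hiding steps is purely a view concern handled by `visibleSteps`, which is why the pattern survives the round trip.
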
### Test Script

```typescript
import { describe, it, expect, vi } from 'vitest';
// createMockSession is the project's existing test helper (import path not shown here).

describe('Demo: The Vanishing Pattern', () => {
  it('should preserve hidden steps when reducing stepCount (FIXED)', async () => {
    const session = createMockSession('demo');

    // Create track with MAX_STEPS arrays
    const steps = Array(128).fill(false);
    steps[64] = true;  // Kick at 64
    steps[80] = true;  // Hi-hat at 80
    steps[100] = true; // Syncopated at 100
    steps[120] = true; // Build-up at 120

    session['state'].tracks = [{
      id: 'demo-track',
      name: 'Demo',
      sampleId: 'kick',
      steps,
      parameterLocks: Array(128).fill(null),
      volume: 1,
      muted: false,
      playbackMode: 'oneshot',
      transpose: 0,
      stepCount: 128,
    }];

    const ws = session.connect('player-1');

    // Reduce to 64 steps
    ws.send(JSON.stringify({ type: 'set_track_step_count', trackId: 'demo-track', stepCount: 64 }));
    await vi.waitFor(() => expect(session.getState().tracks[0].stepCount).toBe(64));

    // Pattern should still exist in the array (just hidden from view)
    expect(session.getState().tracks[0].steps[64]).toBe(true);  // FAILS with bug
    expect(session.getState().tracks[0].steps[100]).toBe(true); // FAILS with bug

    // Expand back to 128
    ws.send(JSON.stringify({ type: 'set_track_step_count', trackId: 'demo-track', stepCount: 128 }));
    await vi.waitFor(() => expect(session.getState().tracks[0].stepCount).toBe(128));

    // Pattern is visible again
    expect(session.getState().tracks[0].steps[64]).toBe(true);  // FAILS with bug
    expect(session.getState().tracks[0].steps[100]).toBe(true); // FAILS with bug
  });
});
```

### User Story for Demo

> **As a producer**, I want to work on just the first 64 steps of a 128-step pattern without losing my work in steps 65-128, so that I can focus on one section without destroying another.
>
> **Acceptance criteria**:
> - Reducing stepCount hides but does not delete steps beyond the new count
> - Increasing stepCount reveals previously hidden steps
> - Pattern data survives any sequence of stepCount changes

---

## Test Plan

### Failing Test (Write First)
