[AAP-73135] Fix Segment event loss by enabling sync_mode#383
cshiels-ie merged 9 commits into ansible:devel from
Conversation
Summary by CodeRabbit

Walkthrough: The pull request modifies Segment integration in the
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as StorageSegment.put()
    participant SDK as Segment SDK<br/>(sync_mode=True)
    participant HTTP as HTTP Layer
    rect rgba(100, 200, 100, 0.5)
        Note over Client,HTTP: New Behavior (Sync Mode)
    end
    loop For each chunk
        Client->>SDK: analytics.track(event, properties)
        activate SDK
        SDK->>HTTP: POST (blocking request)
        HTTP-->>SDK: Response
        deactivate SDK
        SDK-->>Client: Return (waits for completion)
    end
    Client->>SDK: analytics.flush()
    activate SDK
    SDK->>HTTP: Final flush
    HTTP-->>SDK: Response
    deactivate SDK
```
Replace time.sleep workaround with analytics.sync_mode = True so each track() call sends synchronously rather than queuing to a background thread, eliminating the race condition where the process exits before the background thread finishes flushing. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
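The race condition this commit removes can be sketched with a toy model (no Segment SDK involved; `AsyncClient` and `SyncClient` are illustrative stand-ins for the SDK's two delivery modes, not the real client):

```python
import queue
import threading
import time

class AsyncClient:
    """Toy model of an SDK that queues events to a daemon flusher thread."""
    def __init__(self):
        self.delivered = []
        self._q = queue.Queue()
        # Daemon thread: killed when the process exits, so queued events
        # may never be sent -- the race that time.sleep tried to paper over.
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def _flush_loop(self):
        while True:
            event = self._q.get()
            time.sleep(0.05)               # simulate network latency
            self.delivered.append(event)

    def track(self, event):
        self._q.put(event)                 # returns immediately

class SyncClient:
    """Toy model of sync_mode=True: track() blocks until delivery."""
    def __init__(self):
        self.delivered = []

    def track(self, event):
        time.sleep(0.01)                   # simulate a blocking HTTP POST
        self.delivered.append(event)

async_client, sync_client = AsyncClient(), SyncClient()
for i in range(5):
    async_client.track(f"chunk-{i}")
    sync_client.track(f"chunk-{i}")

# Snapshot taken the moment the send loop finishes, i.e. where the real
# process would exit: the async queue is still draining.
dropped = 5 - len(async_client.delivered)
```

With synchronous delivery every event is on the wire before `track()` returns, so nothing is pending when the process exits.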
Force-pushed a7393bc to 3ecfaf2
Segment silently drops events from batch POSTs that exceed 500KB and returns HTTP 200, making the loss invisible to on_error callbacks. Fix by tracking accumulated batch size and calling flush() before adding a chunk that would push the batch over 450KB (leaving headroom for the per-event metadata the SDK appends). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
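A minimal sketch of the heuristic this commit describes, with `track`/`flush` as plain callables standing in for the SDK calls (the function name and serialization details are assumptions, not the actual implementation):

```python
import json

BATCH_SIZE_LIMIT = 450 * 1024  # leave headroom under Segment's 500 KB cap

def send_chunks(chunks, track, flush):
    """Flush the pending batch before a chunk would push it past the limit.

    Sizes are estimated from the serialized chunk alone -- the flaw found
    later: per-event SDK metadata (~2-3 KB) is not counted, so real batch
    bodies can still exceed 500 KB.
    """
    batch_size = 0
    for chunk in chunks:
        chunk_size = len(json.dumps(chunk).encode())
        if batch_size + chunk_size > BATCH_SIZE_LIMIT:
            flush()               # send what is queued so far
            batch_size = 0
        track(chunk)
        batch_size += chunk_size
    flush()                       # final flush for the remainder

# Example run: 26 chunks of ~25 KB each trigger one mid-loop flush
# plus the final flush.
tracked, flushes = [], []
send_chunks([{"data": "x" * 25000} for _ in range(26)],
            tracked.append, lambda: flushes.append(1))
```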
Cover two new cases:

- all chunks fit in one batch (flush called once, final only)
- chunks exceed BATCH_SIZE_LIMIT, triggering mid-loop flush (flush called more than once, all chunks still tracked)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ufficient

Testing showed the batch-limit heuristic still dropped events:

- 17 chunks in one batch (~425 KB tracked): only 12 arrived
- 26 chunks split into two batches: only 17 arrived

Root cause: the SDK adds ~2-3 KB of per-event metadata (context, timestamps, messageId, integrations) that our data-size estimate did not account for, pushing actual batch bodies over Segment's 500 KB limit despite our 450 KB threshold. sync_mode=True sends each track() as a separate blocking HTTP request (~25 KB each) instead of batching, which eliminates the batch-size problem entirely. Local end-to-end testing confirmed all 15 chunks arrive reliably with sync_mode=True.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Assert both sync_mode=True and flush.call_count==1, confirming that sync_mode handles per-track delivery and no mid-loop batch flushing is running alongside it. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
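The assertions described in this commit can be sketched with `unittest.mock`; the `put()` body below is a hypothetical reduction of `StorageSegment.put()` for illustration, not the real code:

```python
from unittest.mock import MagicMock

def put(analytics, chunks):
    """Hypothetical reduction of StorageSegment.put() after this change."""
    analytics.sync_mode = True            # each track() is a blocking POST
    for chunk in chunks:
        analytics.track("report_chunk", chunk)
    analytics.flush()                     # single final flush only

analytics = MagicMock()
put(analytics, [{"n": i} for i in range(15)])
```

A MagicMock records attribute assignments and call counts, so the test can confirm both that `sync_mode` handles per-track delivery and that no mid-loop batch flushing runs alongside it.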
- Remove stale top comment
- Add sync_mode assertion to test_put_sends_data_to_segment so every put() test verifies the mode is set
- Rename test_put_sync_mode_enabled -> test_put_sync_mode_no_batch_drops and document the confirmed end-to-end result (15/15 chunks received vs 11-14 without sync_mode)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Each chunk is ~25 KB of JSON but compresses to ~3 KB (87% reduction) due to repeated keys across items. With sync_mode sending one HTTP request per chunk, gzip significantly reduces per-request transfer time and overall upload duration. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
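The compression claim is easy to reproduce with the standard library (the item shape below is invented for illustration; the exact ratio depends on the real payload):

```python
import gzip
import json

# Repeated keys across list items make the JSON highly compressible.
items = [{"job_type": "run", "status": "successful", "count": i}
         for i in range(500)]
body = json.dumps(items).encode()
compressed = gzip.compress(body)

ratio = 1 - len(compressed) / len(body)   # fraction of bytes saved
```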
Segment's tracking API returns HTTP 200 but discards events when the request body is gzip-encoded, resulting in 0 events received despite the SDK reporting success. gzip=True is not a viable optimisation. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>



Root cause
Segment silently drops events from batch POSTs that exceed 500 KB and returns HTTP 200, making the loss invisible: no error callback fires. The SDK queues all `track()` calls and `flush()` sends them as a single batch POST. With 15 chunks at ~25 KB data each, the actual HTTP body (including ~2-3 KB of per-event SDK metadata: `anonymousId`, `timestamp`, `context`, `messageId`, `integrations`) pushes the batch well over 500 KB.

Fix
Set `analytics.sync_mode = True` on the client before sending. This makes every `track()` call a separate blocking HTTP request (~25 KB each) rather than queuing to a background thread for batching. Each request is well within Segment's 32 KB per-request limit.

What was tried first
A batch-size tracking approach was implemented that called `flush()` before the accumulated data size exceeded 450 KB. End-to-end testing showed it still dropped events: the estimated 450 KB threshold did not account for the full SDK metadata overhead (~2-3 KB per event), so actual batch bodies still exceeded 500 KB.
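A back-of-envelope calculation shows the gap, assuming the upper end of the observed ~2-3 KB per-event overhead (exact chunk sizes varied in testing):

```python
CHUNK_DATA_KB = 25        # measured chunk payload size
METADATA_KB = 3           # per-event SDK envelope (context, timestamps, ...)
SEGMENT_LIMIT_KB = 500
THRESHOLD_KB = 450

# How many chunks the heuristic admits, counting data size alone.
chunks_under_threshold = THRESHOLD_KB // CHUNK_DATA_KB   # 18 chunks "fit"

# What the SDK actually puts on the wire for that batch.
actual_body_kb = chunks_under_threshold * (CHUNK_DATA_KB + METADATA_KB)
```

The batch looks safe by data size alone, yet the real body exceeds the 500 KB cap once the envelope is counted.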
`sync_mode=True` eliminates batching entirely and was the only approach that delivered all chunks reliably in testing.

End-to-end validation
Tested against a live Segment source using the exact payload shape produced by `flatten_json_report`/`anonymize_rollups` (102 job-type rows, 81 installed-collection rows, ~112 KB total JSON):

| Configuration | Chunks received |
| --- | --- |
| `sync_mode=False` (async batch) | 11-14 of 15 |
| `sync_mode=True` | 15 of 15 |

Large payload stress test (26 chunks, ~650 KB data): `sync_mode=True` delivered all chunks.

Tests
- `test_put_sync_mode_enabled`: verifies `analytics.sync_mode` is `True`, all chunks are tracked, and `flush()` is called exactly once (final flush only; `sync_mode` handles per-track delivery)
- `test_put_sends_multiple_chunks_for_large_data`: confirms all chunks are tracked and a single `flush()` fires at the end

References

- `BULK_MESSAGE_LIMIT` constant

Note
Medium Risk
Changes the delivery semantics of analytics emission from async/batched to synchronous per-event HTTP requests, which may impact performance and request timing while improving reliability.
Overview
Ensures Segment analytics uploads no longer rely on the SDK's async batching by enabling `analytics.sync_mode = True` before emitting chunked `track()` events, preventing silent drops when batched payloads exceed Segment's 500 KB limit.

Updates `StorageSegment` tests to assert `sync_mode` is enabled, that all chunks are tracked, and that `flush()` is still called exactly once; adds a new regression test covering the large-payload batching drop scenario.

Reviewed by Cursor Bugbot for commit 9f3eac0.