[AAP-73135] Backport: Fix Segment event loss by enabling sync_mode (stable-0.7) #391
cshiels-ie wants to merge 1 commit into stable-0.7 from
Conversation
* [AAP-73135] Use sync_mode to fix flaky Segment event delivery

  Replace the time.sleep workaround with analytics.sync_mode = True so each track() call sends synchronously rather than queuing to a background thread, eliminating the race condition where the process exits before the background thread finishes flushing.

* [AAP-73135] Flush batches before exceeding Segment's 500 KB batch limit

  Segment silently drops events from batch POSTs that exceed 500 KB and returns HTTP 200, making the loss invisible to on_error callbacks. Fix by tracking accumulated batch size and calling flush() before adding a chunk that would push the batch over 450 KB (leaving headroom for the per-event metadata the SDK appends).

* [AAP-73135] Add tests for Segment batch size limit flushing

  Cover two new cases:
  - all chunks fit in one batch (flush called once, at the end only)
  - chunks exceed BATCH_SIZE_LIMIT, triggering a mid-loop flush (flush called more than once, all chunks still tracked)

* [AAP-73135] Revert to sync_mode after batch-limit approach proven insufficient

  Testing showed the batch-limit heuristic still dropped events:
  - 17 chunks in one batch (~425 KB tracked): only 12 arrived
  - 26 chunks split into two batches: only 17 arrived

  Root cause: the SDK adds ~2-3 KB of per-event metadata (context, timestamps, messageId, integrations) that our data-size estimate did not account for, pushing actual batch bodies over Segment's 500 KB limit despite our 450 KB threshold. sync_mode=True sends each track() as a separate blocking HTTP request (~25 KB each) instead of batching, which eliminates the batch-size problem entirely. Local end-to-end testing confirmed all 15 chunks arrive reliably with sync_mode=True.

* [AAP-73135] Strengthen sync_mode test to also verify flush count

  Assert both sync_mode=True and flush.call_count == 1, confirming that sync_mode handles per-track delivery and no mid-loop batch flushing is running alongside it.

* [AAP-73135] Update tests to reflect validated sync_mode fix

  - Remove stale top comment
  - Add a sync_mode assertion to test_put_sends_data_to_segment so every put() test verifies the mode is set
  - Rename test_put_sync_mode_enabled -> test_put_sync_mode_no_batch_drops and document the confirmed end-to-end result (15/15 chunks received vs 11-14 without sync_mode)

* [AAP-73135] Enable gzip compression alongside sync_mode

  Each chunk is ~25 KB of JSON but compresses to ~3 KB (87% reduction) due to repeated keys across items. With sync_mode sending one HTTP request per chunk, gzip significantly reduces per-request transfer time and overall upload duration.

* [AAP-73135] Revert gzip — Segment silently rejects compressed bodies

  Segment's tracking API returns HTTP 200 but discards events when the request body is gzip-encoded, resulting in 0 events received despite the SDK reporting success. gzip=True is not a viable optimisation.

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
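The batch-limit heuristic described (and later reverted) in the middle commits amounts to something like the sketch below. This is a hedged reconstruction, not the repository's actual code: `BATCH_SIZE_LIMIT`, `track_chunks()`, the event name, and the chunk format are illustrative assumptions.

```python
import json

import analytics  # Segment's analytics-python SDK


# Hypothetical reconstruction of the reverted batch-limit heuristic.
# BATCH_SIZE_LIMIT and track_chunks() are illustrative names only.
BATCH_SIZE_LIMIT = 450 * 1024  # headroom below Segment's 500 KB batch cap


def track_chunks(user_id, event_name, chunks):
    batch_size = 0
    for chunk in chunks:
        chunk_size = len(json.dumps(chunk).encode("utf-8"))
        # Flush the queued batch before this chunk would push it past the limit.
        if batch_size and batch_size + chunk_size > BATCH_SIZE_LIMIT:
            analytics.flush()
            batch_size = 0
        analytics.track(user_id, event_name, properties=chunk)
        batch_size += chunk_size
    analytics.flush()
```

The revert commit explains why an estimate like this under-counts: the SDK wraps each event with context, timestamps, messageId, and integrations metadata (~2-3 KB per event) that `len(json.dumps(chunk))` never sees, so the real POST body can still exceed 500 KB even below the 450 KB threshold.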



Backport of #383 to stable-0.7.

Summary
- Set `analytics.sync_mode = True` on the Segment client before emitting chunked `track()` events, so each request is sent as a separate blocking HTTP call (~25 KB each) rather than batched
- Update the `StorageSegment` tests to assert `sync_mode` is enabled and `flush()` is called exactly once

References
Note
Medium Risk
Changes Segment delivery mode from async batching to synchronous requests, which can affect runtime performance/latency and delivery semantics while fixing silent event drops for large uploads.
Overview
For `StorageSegment.put()`, forces Segment's Python SDK into `analytics.sync_mode = True` so each chunked `track()` call is sent as its own blocking HTTP request rather than being batched into a potentially oversized payload that Segment can silently drop.

Updates tests to assert `sync_mode` is enabled, `track()` is called once per chunk, and `flush()` is invoked exactly once, adding a regression test covering large multi-chunk uploads.

Reviewed by Cursor Bugbot for commit dc57991.
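As a rough illustration of the behaviour the Overview describes, a `put()` along the following lines would enable sync mode before tracking each chunk. This is a minimal sketch, not the repository's implementation: the `StorageSegment` constructor arguments, event name, and chunk format are assumptions.

```python
import analytics  # Segment's analytics-python SDK


class StorageSegment:
    """Sketch only: field names, event name, and chunking are assumptions."""

    def __init__(self, write_key, user_id):
        self.user_id = user_id
        analytics.write_key = write_key

    def put(self, chunks):
        # Send each track() as its own blocking HTTP request instead of
        # queuing events onto the SDK's background batching thread.
        analytics.sync_mode = True
        for chunk in chunks:
            analytics.track(self.user_id, "data_chunk", properties=chunk)
        # One final flush; with sync_mode nothing is queued, so this is
        # effectively a no-op safeguard.
        analytics.flush()
```

A test mirroring the assertions listed above could patch the `analytics` module and check the delivery mode, per-chunk tracking, and flush count; the test name and setup below are likewise illustrative, not the repository's actual tests.

```python
from unittest import mock

import analytics


def test_put_tracks_each_chunk_synchronously():
    chunks = [{"items": [i]} for i in range(15)]
    with mock.patch.object(analytics, "track") as track, \
            mock.patch.object(analytics, "flush") as flush:
        storage = StorageSegment(write_key="test-key", user_id="user-1")
        storage.put(chunks)

    assert analytics.sync_mode is True      # per-request delivery enabled
    assert track.call_count == len(chunks)  # one track() per chunk
    assert flush.call_count == 1            # exactly one final flush
```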