Dirty fix for flaky send to segment#382

Draft
MilanPospisil wants to merge 8 commits into ansible:devel from MilanPospisil:milan_fix_flaky_send_to_segment

Conversation

@MilanPospisil
Contributor

@MilanPospisil MilanPospisil commented Apr 28, 2026

[AAP-73135]

Fixes flaky sending to segment.com. Previously, Segment did not receive all the messages; now it receives most of them (though delivery is still not fully guaranteed).

This is a temporary workaround for unreliable delivery of chunked Segment events.
Without pacing, we observe missing chunks in roughly 10–20% of uploads.
Adding a delay between chunk sends significantly reduces missing chunks before release.
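The pacing described above can be sketched as follows. This is a minimal illustration, not the actual `segment.py` code: `send_chunks` is a hypothetical helper, and the `analytics` object is assumed to expose `track()` and `flush()` in the style of the Segment Python SDK. The per-chunk `anonymous_id` (base UUID plus chunk index), the per-chunk `flush()`, and the sleep between chunks mirror the changes this PR makes.

```python
import time
import uuid


def send_chunks(analytics, event_name, chunks):
    """Send payload chunks to Segment with pacing between sends.

    Hypothetical sketch of the workaround: each chunk gets a distinct
    anonymous_id (base UUID + chunk index) and is flushed immediately,
    with a short sleep between chunks to reduce dropped messages.
    """
    base_id = str(uuid.uuid4())
    total_chunks = len(chunks)
    for i, chunk in enumerate(chunks, start=1):
        analytics.track(
            anonymous_id=f"{base_id}-{i}",  # unique per chunk
            event=event_name,
            properties={"chunk": i, "total_chunks": total_chunks, "data": chunk},
        )
        analytics.flush()  # flush each chunk instead of once at the end
        if i < total_chunks:
            time.sleep(3)  # pace sends; no sleep after the final chunk
```

Note the `if i < total_chunks` guard: it keeps the pacing between sends but avoids paying the 3-second delay after the last chunk, which is the point raised in the review below.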

@coderabbitai
Contributor

coderabbitai Bot commented Apr 28, 2026

📝 Walkthrough

Modifies chunk dispatch in StorageSegment.put so each analytics.track call uses a unique anonymous_id (base UUID + chunk index) and calls analytics.flush() after each chunk. A commented-out time.sleep(3) throttle line was added but not enabled. No public API changes.

Changes

Cohort / File(s): Segment chunk dispatch (metrics_utility/library/storage/segment.py)
Summary: Each chunked analytics.track now gets a distinct anonymous_id derived from the base UUID plus chunk index; analytics.flush() moved inside the per-chunk loop to run after each chunk. A commented time.sleep(3) line was added (optional throttle).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Description check (⚠️ Warning): PR description is significantly incomplete; missing the mandatory Testing, Required Actions, and Self-Review Checklist sections specified in the template. Resolution: add a Testing section with prerequisites and steps to reproduce the flaky sending issue, check Required Actions if documentation/coordination is needed, and complete the Self-Review Checklist items.

✅ Passed checks (4 passed)

  • Title check (✅ Passed): the title 'Dirty fix for flaky send to segment' is directly related to the main objective of fixing unreliable delivery to segment.com, though it uses informal language ('dirty fix').
  • Docstring Coverage (✅ Passed): docstring coverage is 100.00%, above the required threshold of 80.00%.
  • Linked Issues check (✅ Passed): check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): check skipped because no linked issues were found for this pull request.


Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms



@MilanPospisil MilanPospisil force-pushed the milan_fix_flaky_send_to_segment branch from 744e737 to af52c62 Compare April 28, 2026 09:16
@MilanPospisil MilanPospisil marked this pull request as ready for review April 28, 2026 09:27
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@metrics_utility/library/storage/segment.py`:
- Around line 178-179: The loop always calls time.sleep(3), causing an
unnecessary delay after the final chunk; update the loop in segment.py so the
sleep is conditional (only when i < total_chunks) — after calling track(...)
check the loop index/remaining chunks and call time.sleep(3) only between
chunks, leaving the final iteration free of the sleep to avoid holding upstream
locks in workers/0-gather-controller-crc.py and
workers/1-gather-controller-full.py; specifically modify the location where
time.sleep(3) is used alongside variables i and total_chunks and the track(...)
call.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a22ea98d-527e-4790-a225-8a455fee52d3

📥 Commits

Reviewing files that changed from the base of the PR and between 828c8c1 and af52c62.

📒 Files selected for processing (1)
  • metrics_utility/library/storage/segment.py

Comment on lines +178 to +179
# sleep for 3 seconds
time.sleep(3)
Contributor

@coderabbitai coderabbitai Bot Apr 28, 2026


⚠️ Potential issue | 🟠 Major

Avoid sleeping after the final chunk.

time.sleep(3) runs on every iteration, so single-chunk uploads still pay a 3-second delay and multi-chunk uploads keep the upstream lock held longer in workers/0-gather-controller-crc.py:24-29 and workers/1-gather-controller-full.py:41-47. Make the pause conditional on i < total_chunks so it only applies between track calls.

Suggested fix
-            time.sleep(3)
+            if i < total_chunks:
+                time.sleep(3)

Contributor Author


We do not have a better solution right now. Without the sleep, Segment does not receive all the messages; with this fix it receives most of them, which is much better.

The release is at the door and we need a solution quickly.

Contributor


@MilanPospisil, understood — the pacing is needed and I'm not suggesting removing it. The original suggestion just avoids sleeping after the last chunk, since at that point all track() calls are already queued and the next operation is analytics.flush(). The pacing between chunks is fully preserved.

-            # sleep for 3 seconds
-            time.sleep(3)
+            # sleep for 3 seconds between chunks (not after the last one)
+            if i < total_chunks:
+                time.sleep(3)

This is a one-line change that saves 3 seconds on every upload (including single-chunk ones) without touching the core workaround. Given release pressure, that makes it the lowest-risk improvement available.

pmflanagan
pmflanagan previously approved these changes Apr 28, 2026
Contributor

@pmflanagan pmflanagan left a comment


LGTM

@pmflanagan pmflanagan dismissed their stale review April 28, 2026 10:05

For further review

@MilanPospisil MilanPospisil marked this pull request as draft April 28, 2026 13:29
