Replace weak signal device error notifications with daily fail ratio emails #1444
Walkthrough
Per-device immediate failure notifications were removed; a daily aggregation pipeline now computes per-app fail ratios and enqueues daily_fail_ratio emails via a new DB cron function. An HTTP handler was added to accept and process the emitted daily_fail_ratio payloads while respecting user email preferences.
Sequence Diagram(s)
sequenceDiagram
participant Device as Device / Runtime
participant Stats as Stats Plugin
participant Cron as Cron Scheduler
participant DB as Postgres (daily_version, apps)
participant Proc as process_daily_fail_ratio_email()
participant Queue as Email Queue (pgmq)
participant CronHandler as cron_email HTTP handler
participant User as Org Member
Note over Device,Stats: OLD (immediate notification)
Device->>Stats: record 'fail' stat
Stats->>User: send immediate per-device notification (removed)
Note over Cron,User: NEW (aggregated daily flow)
Cron->>DB: trigger process_daily_fail_ratio_email() at 08:00 UTC
DB->>Proc: aggregate installs & fails per app (yesterday)
Proc->>Proc: filter by min installs & fail% threshold
Proc->>Queue: enqueue `daily_fail_ratio` jobs (payload with totals, failPercentage, reportDate)
Queue->>CronHandler: POST job payload to `cron_email` endpoint
CronHandler->>CronHandler: check `daily_fail_ratio` email preference
CronHandler->>User: deliver email (if pref enabled) / track event `app:daily_fail_ratio`
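The last two steps of the diagram (preference check, then deliver and track) can be sketched as follows. This is a minimal illustration only, not the code in cron_email.ts: the payload field names other than `failPercentage` and `reportDate`, the `EmailPreferences` shape, and the default-to-enabled behaviour for a missing preference are all assumptions.

```typescript
// Hypothetical payload shape. Only failPercentage and reportDate are named in
// the walkthrough; the other fields are assumptions made for this sketch.
interface DailyFailRatioPayload {
  appId: string
  totalInstalls: number
  totalFails: number
  failPercentage: number
  reportDate: string
}

// Hypothetical preference map: preference key -> enabled flag.
type EmailPreferences = Partial<Record<string, boolean>>

interface HandlerResult {
  action: 'send' | 'skip'
  event?: string
}

// Decide whether to deliver the daily_fail_ratio email for one recipient.
// Assumption: a missing preference counts as enabled.
function handleDailyFailRatio(
  payload: DailyFailRatioPayload,
  prefs: EmailPreferences,
): HandlerResult {
  const enabled = prefs.daily_fail_ratio ?? true
  if (!enabled) {
    return { action: 'skip' }
  }
  // The tracked event name comes from the sequence diagram.
  return { action: 'send', event: 'app:daily_fail_ratio' }
}
```

Returning a plain decision object keeps the preference logic testable without a mail provider or database connection.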
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8de866743a
-- Every minute (at :00 seconds): Per-minute tasks
IF current_second = 0 THEN
BEGIN
Avoid exact-second gating in cron runner
The new process_all_cron_tasks only triggers the per-minute/hour/daily branches when current_second = 0 (see the IF current_second = 0 THEN guard). In this repo the cron runner is invoked every 10 seconds and prior migration 20260103030451_add_advisory_lock_to_cron.sql explicitly warns that pg_cron intervals are not clock‑aligned, so if the job starts at an offset (e.g., :05) this condition never becomes true and all scheduled tasks (including the new daily fail ratio email) will never run. Please restore the offset-safe check (e.g., current_second < 10) or the table-driven scheduler that handled this.
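The gating problem described above can be reproduced with a small simulation. This sketch is not the repo's PL/pgSQL; it only models a runner that wakes every 10 seconds from an arbitrary start offset and compares the exact-second gate with the offset-safe window. The helper names are hypothetical.

```typescript
// Seconds-of-minute at which a runner firing every `intervalSec` seconds,
// first started at `offsetSec`, wakes up during one minute.
function wakeSeconds(offsetSec: number, intervalSec = 10): number[] {
  const out: number[] = []
  for (let s = offsetSec % intervalSec; s < 60; s += intervalSec) {
    out.push(s)
  }
  return out
}

// Gate from the reviewed migration: fires only on second 0.
const exactGate = (second: number): boolean => second === 0

// Offset-safe gate suggested in the review: fires on the first wake-up of the minute.
const windowGate = (second: number): boolean => second < 10

// Number of times per minute a gate lets the per-minute branch run.
function firesPerMinute(gate: (s: number) => boolean, offsetSec: number): number {
  return wakeSeconds(offsetSec).filter(gate).length
}
```

With an offset of 5 the runner wakes at seconds 5, 15, 25, 35, 45 and 55, so the exact gate never matches while the window gate matches exactly once per minute.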
-- Every 10 seconds: High-frequency queues (at :00, :10, :20, :30, :40, :50)
IF current_second % 10 = 0 THEN
-- Process high-frequency queues with default batch size (950)
BEGIN
PERFORM public.process_function_queue(ARRAY['on_channel_update', 'on_user_create', 'on_user_update', 'on_version_create', 'on_version_delete', 'on_version_update', 'on_app_delete', 'on_organization_create', 'on_user_delete', 'on_app_create', 'credit_usage_alerts']);
EXCEPTION WHEN OTHERS THEN
Keep cron_tasks table execution
This migration replaces the table-driven cron runner with a hard-coded list (process_function_queue(ARRAY['on_channel_update', ...])), so anything registered in cron_tasks will no longer execute. I checked supabase/migrations/20260112140000_cleanup_old_channel_devices.sql and 20260113160650_delete_old_deleted_versions.sql, both of which register maintenance jobs in cron_tasks with the comment that this table is the canonical scheduler. With this change those jobs stop running entirely, so stale data cleanup will silently stall. Please keep the cron_tasks loop or explicitly add those tasks to the hard-coded schedule.
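To illustrate why a table-driven runner keeps picking up newly registered jobs while a hard-coded list does not, here is a minimal in-memory sketch. It is an analogy rather than the repo's scheduler: `CronTask`, `registerTask`, and `runRegisteredTasks` are hypothetical names, and the array stands in for the cron_tasks table.

```typescript
// Hypothetical in-memory stand-in for one row of the cron_tasks table.
interface CronTask {
  name: string
  enabled: boolean
  run: () => void
}

// Stand-in for the table itself.
const cronTasks: CronTask[] = []

// Registering a task is the analogue of a migration inserting a row.
function registerTask(name: string, run: () => void): void {
  cronTasks.push({ name, enabled: true, run })
}

// Table-driven runner: executes whatever is registered and isolates failures,
// loosely mirroring a per-task EXCEPTION WHEN OTHERS block.
function runRegisteredTasks(): string[] {
  const ran: string[] = []
  for (const task of cronTasks) {
    if (!task.enabled) continue
    try {
      task.run()
      ran.push(task.name)
    } catch {
      // A failing task is skipped so later tasks still run.
    }
  }
  return ran
}
```

A runner built this way needs no change when a later migration registers another job, which is the property the review asks to preserve.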
…ratio emails
- Removed per-device failure notifications that sent once then waited a week
- Added daily cron job (08:00 UTC) to check install fail ratios by app
- Sends email only when fail rate exceeds 30% with minimum 10 installs
- Added daily_fail_ratio email preference for user control
- One email per app per day maximum, based on aggregate daily stats
Co-Authored-By: Claude Haiku 4.5 <[email protected]>
Addresses PR review comments:
- P1: No longer modifies process_all_cron_tasks, so offset-safe checks preserved
- P2: Registers task in cron_tasks table (canonical scheduler), not hardcoded
Co-Authored-By: Claude Haiku 4.5 <[email protected]>
580ae30 to 2f672dc
Summary (AI generated)
Replaced the weak per-device failure notification system with a smarter daily aggregate fail ratio check. Now sends one email per app per day when install failures exceed 30%, instead of one email per device followed by a week-long silence.
Motivation (AI generated)
The old system sent one email on the first device failure, then ignored all subsequent failures for a week, which made for a poor signal. The new system uses daily aggregate statistics from the daily_version table to calculate real fail ratios and alert only when the situation is actually critical (30%+ failure rate).
Business Impact (AI generated)
Improves app reliability monitoring by catching genuine issues faster. App owners now get daily insights into failure trends instead of sporadic one-off alerts. Minimum 10 installs prevents noise from small-volume apps.
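The alert rule described above (at least 10 installs and a fail rate above 30%) can be sketched as a small predicate. This is an illustration under assumptions, not the SQL in the migration: the `DailyAppStats` shape is hypothetical, the fail rate is assumed to be fails divided by installs, and the comparison is assumed to be strict ("exceeds 30%"), which the PR text does not pin down.

```typescript
// Hypothetical aggregated row: one per app for the report date.
interface DailyAppStats {
  appId: string
  installs: number
  fails: number
}

// Thresholds taken from the PR description.
const MIN_INSTALLS = 10
const FAIL_PERCENT_THRESHOLD = 30

// Fail rate as a percentage; assumption: denominator is the install count.
function failPercentage(stats: DailyAppStats): number {
  if (stats.installs <= 0) return 0
  return (stats.fails / stats.installs) * 100
}

// True when the app qualifies for a daily_fail_ratio email.
// The comparison uses integer arithmetic to avoid floating-point edge cases
// at the exact threshold; strict ">" is an assumption.
function shouldSendDailyFailRatioEmail(stats: DailyAppStats): boolean {
  if (stats.installs < MIN_INSTALLS) return false
  return stats.fails * 100 > stats.installs * FAIL_PERCENT_THRESHOLD
}
```

Under these assumptions an app with 9 installs never alerts regardless of its fail count, which is how the minimum-install rule suppresses noise from low-volume apps.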
Test Plan (AI generated)
Generated with AI