fix(api/history): gate approval-expiry sweep on Lambda runtime #1232
cristim wants to merge 2 commits into
The GET /api/history handler expired stale pending/notified approvals in a best-effort background goroutine spawned right before the response returns. On Lambda the execution environment freezes as soon as the response is out, so the goroutine can be suspended mid-sweep: rows can stay "pending" indefinitely, and a thawed goroutine doing DB writes during a later request skews that request's latency and muddles log attribution.

Mirror the SWR cache's isLambda gate (ri_utilization_cache.go) using the same runtime.IsLambda detection helper: on Lambda the sweep (a handful of cheap UPDATEs) runs synchronously before the handler returns; on long-running servers it stays asynchronous so the read response is never blocked on the transitions.

Regression tests cover both branches by flipping AWS_LAMBDA_RUNTIME_API via t.Setenv: the Lambda sub-test asserts the transition fired before getHistory returned (fails pre-fix), and the non-Lambda sub-test blocks the transition and asserts getHistory still returns (sweep remains asynchronous).

Closes #1170
Trailing reference to "the async sweep" was left over from the expireStaleExecutionsAsync -> expireStaleExecutions rename in the prior commit. Now the sweep is sync on Lambda and async on servers, so the comment names both modes explicitly. No behavioural change.
Adversarial review (PR #1232)
Reviewed against the five risk surfaces from the brief (runtime detection, gate placement, concurrency safety, ctx-cancel handling, test coverage). The fix is sound; one rebase hygiene issue addressed in this push, one docstring nit cleaned up, nothing blocking.
Verified
Fixes pushed
UNSTABLE state
All five most recent CI runs on
Not blocking.
Verdict
LGTM. CR re-ping posted (
Rebased onto current main, which had already promoted audit_actor_stamps to 000077 (5894580, the renumber landed in PR #1232 review). The COR-02 nested-rollup migration was added at 000077 in this PR, which now collides with main's 000077_audit_actor_stamps and trips the pre-commit "Check for conflicting migration numbers" hook (CI's pre-commit run was failing on this branch with `Duplicate migration number(s) found: 000077`).

Rename to 000078:
- internal/database/postgres/migrations/000078_monthly_summary_nested_rollup.up.sql
- internal/database/postgres/migrations/000078_monthly_summary_nested_rollup.down.sql
- internal/database/postgres/migrations/000078_monthly_summary_nested_rollup_test.go

Body unchanged except for the leading `-- 000077:` / `-- 000077 down:` header comments updated to 000078; the body-comment references to `000067/000074` (the historical flat-AVG definitions) stay as-is because they cite past migrations, not the file being renamed.

Verified:
- ls internal/database/postgres/migrations/*.up.sql | cut -c1-6 | sort | uniq -d prints nothing.
- go build ./... succeeds.
- go test ./internal/analytics/... passes 186/186.

Refs #1151 (COR-02).
Problem (COR-06, #1170)
GET /api/history expired stale pending/notified approvals in a best-effort background goroutine spawned just before the response returns. The "context.Background() ensures the transitions are not cancelled" guarantee does not hold on Lambda: the execution environment freezes as soon as the response is out, so the goroutine can be suspended mid-sweep. Rows can stay "pending" indefinitely, and a thawed goroutine doing DB writes during a later request skews that request's latency and muddles log attribution. The same package already handles this correctly for the SWR cache (riUtilizationCache carries isLambda and skips its background refresh).

Fix
- internal/api/handler_history.go: expireStaleExecutions (renamed from expireStaleExecutionsAsync, which is no longer always async) now gates on runtime.IsLambda(), the exact detection helper the SWR cache gate uses. On Lambda the sweep (a handful of cheap UPDATEs) runs synchronously before the handler returns; on long-running servers it stays asynchronous so the read response is never blocked on the transitions.
- expireStaleExecutionsSweep, shared by both branches; it keeps context.Background() so neither request cancellation nor a Lambda request deadline aborts the best-effort transitions.

Test evidence
New TestHandler_getHistory_ExpireIfStale_LambdaGuard in internal/api/handler_history_test.go covers both branches by flipping AWS_LAMBDA_RUNTIME_API with t.Setenv:
- Lambda sub-test: TransitionExecutionStatus has already fired when getHistory returns (no channel wait). Confirmed it FAILS pre-fix (git stash of the handler change, -count=5: 5/5 failures with expected: 1, actual: 0) and passes post-fix.
- Non-Lambda sub-test: getHistory still returns while the transition is blocked, proving the sweep remains asynchronous off Lambda; no time.Sleep, channel-synchronized per project convention.

Verification:
- go build ./... clean
- go test ./internal/api/ -race -count=1: 1630 passed
- go vet ./internal/api/ clean

Closes #1170
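The channel-synchronized approach used by the two sub-tests can be illustrated outside the test framework. The sketch below is an assumption-laden analogue, not the project's test: blockingStore, handle, firedBeforeReturn and returnsWhileBlocked are hypothetical names, and it uses os.Setenv where the real test uses t.Setenv (which also restores the previous value when the test ends).

```go
package main

import (
	"context"
	"fmt"
	"os"
)

// blockingStore is a hypothetical fake whose transition blocks until
// release is closed, so the check can control timing without time.Sleep.
type blockingStore struct {
	started chan struct{} // closed when the transition begins
	release chan struct{} // closed by the check to let it finish
}

func newStore() *blockingStore {
	return &blockingStore{started: make(chan struct{}), release: make(chan struct{})}
}

func (s *blockingStore) Transition(ctx context.Context) {
	close(s.started)
	<-s.release
}

// handle mimics the gated handler: inline on Lambda, goroutine otherwise.
// The returned channel is closed when the transition has finished.
func handle(s *blockingStore) <-chan struct{} {
	done := make(chan struct{})
	run := func() { defer close(done); s.Transition(context.Background()) }
	if os.Getenv("AWS_LAMBDA_RUNTIME_API") != "" {
		run()
	} else {
		go run()
	}
	return done
}

// firedBeforeReturn checks the Lambda branch: no channel wait is needed,
// the transition must already be complete when handle returns.
func firedBeforeReturn() bool {
	os.Setenv("AWS_LAMBDA_RUNTIME_API", "127.0.0.1:9001")
	defer os.Unsetenv("AWS_LAMBDA_RUNTIME_API")
	s := newStore()
	close(s.release) // let the transition complete immediately
	select {
	case <-handle(s):
		return true
	default:
		return false
	}
}

// returnsWhileBlocked checks the non-Lambda branch: handle must return
// even though the transition is still blocked.
func returnsWhileBlocked() bool {
	os.Unsetenv("AWS_LAMBDA_RUNTIME_API")
	s := newStore()
	done := handle(s) // would deadlock here if the sweep were synchronous
	<-s.started       // the background transition has begun
	select {
	case <-done:
		return false // finished without being released
	default:
	}
	close(s.release)
	<-done
	return true
}

func main() {
	fmt.Println(firedBeforeReturn(), returnsWhileBlocked())
}
```

Because the fake only unblocks when release is closed, the non-Lambda check is deterministic: the done channel cannot close before the check releases the transition.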