fix: prevent duplicate preview deployments on concurrent pull_request webhooks#4290
fix: prevent duplicate preview deployments on concurrent pull_request webhooks#4290colocated wants to merge 2 commits intoDokploy:canaryfrom
Conversation
… webhooks Opening a PR with a preview label pre-attached fires opened and labeled in parallel; both raced past the "exists?" check and created two previews. - Unique index on (applicationId, pullRequestId); migration dedupes first. - createPreviewDeployment inserts before commenting; loser reuses winner row. - BullMQ jobId coalesces opened+labeled for the same SHA. - Per-app try/catch so one failure doesn't drop the batch.
| if (IS_CLOUD && app.serverId) { | ||
| jobData.serverId = app.serverId; | ||
| deploy(jobData).catch((error) => { | ||
| console.error("Background deployment failed:", error); | ||
| }); | ||
| continue; | ||
| } |
There was a problem hiding this comment.
IS_CLOUD path bypasses BullMQ dedup
The jobId-based deduplication is set up correctly for the queue path, but the IS_CLOUD && app.serverId branch calls deploy(jobData) directly and continues before myQueue.add is reached. For cloud instances with a serverId, two concurrent webhooks can still trigger two simultaneous deploy() calls for the same previewDeploymentId. The DB unique constraint correctly prevents two separate preview-app rows, but nothing coalesces the two raw deploy() invocations, so the same preview environment may receive concurrent deploys — the exact timing issue the queue dedup is meant to prevent.
- Migration DELETE tiebreaks createdAt ties on previewDeploymentId so same-ms duplicates don't both survive and fail the unique index build. - Cloud deploy() path goes through the same in-process dedup gate as BullMQ, closing the gap where two concurrent webhooks on one replica fire two simultaneous cloud deploys for one preview+SHA. - Clarify the race-loser's generated appName/domain are intentionally dropped.
|
@greptileai re-review |
|
@Siumauricio need some direction here on greptiles comment for the cloud version, not really my area / neither can i test it really |
What is this PR about?
Opening a PR with a configured preview label already attached causes GitHub to fire
pull_request.openedandpull_request.labeledin parallel. Both handlers inpages/api/deploy/github.tspassed the same check-then-insert ("does a preview exist for this PR?"), both calledcreatePreviewDeployment, and the result was two separate preview deployments: two rows, two bot comments on the PR, two deploys.Root cause
createPreviewDeploymentwas a classic TOCTOU:findPreviewDeploymentByApplicationId(...)returnsundefinedINSERTa new rowNo unique constraint on
(applicationId, pullRequestId), no advisory lock, no queue-level dedup.Fix
(applicationId, pullRequestId)onpreview_deployments. The migration first deletes existing duplicate rows (keeps the oldest per key) so the index can be built on instances that already have race artifacts.createPreviewDeploymentnow inserts the row first (with a""pullRequestCommentIdplaceholder, now defaulted). On the resulting23505unique violation the loser re-reads the winner's row, logs, and returns it. The GitHub comment is posted after the successful insert so the loser never orphans a second bot comment. Comment-post failures are caught and logged without rolling back the row —application.tsalready recreates missing comments on the first deploy viaissueCommentExists.jobId = preview:<previewDeploymentId>:<sha>. BullMQ ignores duplicate adds while a job with that id is waiting/active, so the opened+labeled burst (same SHA, different action) coalesces to one queued deploy. A latersynchronizeon a new SHA produces a new id and enqueues normally.try/catchso one app's failure is logged and the remaining apps still process.Benefit
No more duplicate preview apps, bot comments, or deploys from a single PR event. All error paths still log; logging is not suppressed anywhere.
Checklist
canarybranch.Issues related (if applicable)
Replaces #4289 (auto-closed by anti-slop on description length).
Greptile Summary
This PR adds three interlocking deduplication layers to prevent duplicate preview deployments triggered by concurrent
pull_request.opened+pull_request.labeledwebhooks: a DB unique index on(applicationId, pullRequestId), an insert-first / unique-violation-catch pattern increatePreviewDeployment, and BullMQjobId-based queue dedup. The migration correctly uses a tuple comparison(createdAt, previewDeploymentId)as the deletion predicate, so rows that share the same millisecond timestamp are still ordered deterministically by the primary key — the prior concern about same-millisecond ties is resolved.Confidence Score: 4/5
Safe to merge for self-hosted instances; the IS_CLOUD direct-deploy race (flagged in prior review) remains open for multi-replica cloud deployments.
The core TOCTOU race is well-addressed: the unique index enforces the invariant at the DB layer, the insert-first pattern makes the loser gracefully reuse the winner's row, and BullMQ jobId dedup coalesces queue adds across replicas. The migration tie-breaking is correctly handled via the compound (createdAt, previewDeploymentId) predicate. One prior-thread concern (IS_CLOUD cross-replica concurrent deploy() calls) is not addressed in this PR and remains a real issue for cloud multi-replica deployments, preventing a full 5.
apps/dokploy/pages/api/deploy/github.ts — the IS_CLOUD direct-deploy branch at line 524 bypasses all queue-level dedup.
Reviews (2): Last reviewed commit: "fix: address greptile review on preview-..." | Re-trigger Greptile