fix: add duplicate URL detection in useAddWebPages#2740
fix: add duplicate URL detection in useAddWebPages#2740apoorvdarshan wants to merge 3 commits intogiselles-ai:mainfrom
Conversation
|
@apoorvdarshan is attempting to deploy a commit to the Giselle Team on Vercel. A member of the Team first needs to authorize it. |
|
Finished running flow.
|
||||||||||||||||||
|
Review Summary by QodoAdd duplicate URL detection in useAddWebPages hook
WalkthroughsDescription• Adds duplicate URL detection to useAddWebPages hook • Prevents same URL from being added multiple times • Checks both current batch and existing webpages • Follows established dedup pattern from useUploadFile Diagramflowchart LR
A["URL Input"] --> B["Normalize URLs"]
B --> C["Create batchSeen Set"]
C --> D["Check Duplicates"]
D --> E{Duplicate Found?}
E -->|Yes| F["Show Error Message"]
E -->|No| G["Add to batchSeen"]
G --> H["Create WebPage"]
F --> I["Skip URL"]
H --> J["Add to Node"]
File Changes1. internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts
|
Add a `batchSeen` Set and check against existing webpages to prevent the same URL from being added multiple times, consistent with the dedup pattern already used in `useUploadFile`. Closes giselles-ai#2490
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds batch-level duplicate filtering to the web page addition hook: validates the target node, builds an Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
796d9d8 to
9483a74
Compare
Code Review by Qodo
1. Dedup misses canonical URLs
|
internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts
Outdated
Show resolved
Hide resolved
🔍 QA Testing Assistant by Giselle📋 Manual QA ChecklistBased on the changes in this PR, here are the key areas to test manually:
✨ Prompt for AI AgentsUse the following prompts with Cursor or Claude Code to automate E2E testing: 📝 E2E Test Generation Prompt4. MCP Integration Guidelines
5. CI-Ready Code Requirements
|
There was a problem hiding this comment.
Pull request overview
This PR adds duplicate URL detection to the useAddWebPages hook to prevent the same URL from being added multiple times to a web page node. The implementation follows the established pattern from useUploadFile, using a batchSeen Set to track URLs within the current batch and checking against existing URLs in the node's content.
Changes:
- Added
batchSeenSet to track URLs processed in the current batch - Added duplicate check that compares against both URLs in the current batch and existing URLs in the node
- Added error reporting for duplicate URLs using the
onErrorcallback
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts
Outdated
Show resolved
Hide resolved
- Normalize existing webpage URLs before comparison to catch equivalent URLs with different string representations (e.g. trailing slash) - Lowercase "duplicate" in error message to match useUploadFile pattern
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts (1)
61-63:existingUrlsrebuild on every iteration is redundant withbatchSeen.Because
updateNodeDataContentsynchronously commits to the Zustand store before theawaitcall, the node re-fetched at line 54 on the next iteration already contains the URL just added. This meansexistingUrlsalready covers the within-batch case — makingbatchSeen(line 64) partially redundant, and vice-versa.Both the rebuild cost (O(M) per iteration → O(N·M) total) and the redundancy can be avoided by constructing
existingUrlsonce before the loop from the initial node snapshot, and relying solely onbatchSeenfor the within-batch case:♻️ Proposed refactor: build `existingUrls` once before the loop
+ const initialNode = store.getState().nodes.find((n) => n.id === args.nodeId) as + | WebPageNode + | undefined; + if (!initialNode || initialNode.content.type !== "webPage") { + return; + } + const existingUrls = new Set( + initialNode.content.webpages.map((w) => normalizeHttpsUrl(w.url) ?? w.url), + ); + const batchSeen = new Set<string>(); for (const url of normalizedUrls) { const node = store.getState().nodes.find((n) => n.id === args.nodeId) as | WebPageNode | undefined; if (!node || node.content.type !== "webPage") { return; } - const existingUrls = new Set( - node.content.webpages.map((w) => normalizeHttpsUrl(w.url) ?? w.url), - ); if (batchSeen.has(url) || existingUrls.has(url)) { args.onError?.(`duplicate URL: ${url}`); continue; } batchSeen.add(url);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts` around lines 61 - 63, The code rebuilds existingUrls inside the loop causing O(N·M) cost and redundancy with batchSeen; modify useAddWebPages so that existingUrls is constructed once from the initial node snapshot (using node.content.webpages and normalizeHttpsUrl) before entering the loop, then remove the per-iteration rebuild and use batchSeen exclusively to track URLs added within the current batch; keep updateNodeDataContent as-is (it will still commit to Zustand) but ensure the loop only checks existingUrls and batchSeen to decide skips.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts`:
- Line 65: The error message passed to args.onError in the duplicate-URL branch
should match sentence case like the other messages; update the string in the
args.onError call inside the add-web-pages logic to "Duplicate URL: ${url}"
(capital D) so it matches the other messages ("Please enter..." and "Invalid URL
format...") — locate the args.onError?.(`duplicate URL: ${url}`) invocation in
use-add-web-pages.ts and change the message to "Duplicate URL: ${url}".
---
Nitpick comments:
In
`@internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts`:
- Around line 61-63: The code rebuilds existingUrls inside the loop causing
O(N·M) cost and redundancy with batchSeen; modify useAddWebPages so that
existingUrls is constructed once from the initial node snapshot (using
node.content.webpages and normalizeHttpsUrl) before entering the loop, then
remove the per-iteration rebuild and use batchSeen exclusively to track URLs
added within the current batch; keep updateNodeDataContent as-is (it will still
commit to Zustand) but ensure the loop only checks existingUrls and batchSeen to
decide skips.
internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts
Outdated
Show resolved
Hide resolved
shige
left a comment
There was a problem hiding this comment.
Thanks for the contribution! The duplicate detection logic itself is correct and follows the existing pattern well.
However, there's a performance concern that should be addressed before merging:
Performance Issue: O(N²) Complexity in Duplicate URL Check
The existingUrls Set is reconstructed inside the for (const url of normalizedUrls) loop (lines 61-63). This leads to O(N × M) complexity, where N is the number of new URLs and M is the number of existing URLs in the node.
When a user attempts to add a large batch of URLs (e.g., 100+ URLs to a node that already has many entries), this will block the browser's main thread, making the UI unresponsive.
Suggested fix: Move the initialization of existingUrls outside of the loop.
const batchSeen = new Set<string>();
// Build existingUrls once before the loop
const node = store.getState().nodes.find((n) => n.id === args.nodeId) as
| WebPageNode
| undefined;
if (!node || node.content.type !== "webPage") {
return;
}
const existingUrls = new Set(
node.content.webpages.map((w) => normalizeHttpsUrl(w.url) ?? w.url),
);
for (const url of normalizedUrls) {
if (batchSeen.has(url) || existingUrls.has(url)) {
args.onError?.(`duplicate URL: ${url}`);
continue;
}
batchSeen.add(url);
// ...
}Please address this and I'll be happy to approve!
Build the existingUrls Set once before the loop instead of reconstructing it on every iteration. Addresses reviewer feedback on PR giselles-ai#2740.
|
Thanks for the review! I've addressed the performance concern:
Please take another look when you get a chance! |
Summary
useAddWebPageshook, preventing the same URL from being added multiple timesbatchSeenSet to catch duplicates within the current batch, and checks against existingnode.content.webpagesURLsuseUploadFileCloses #2490
Test plan
Summary by CodeRabbit