fix: add duplicate URL detection in useAddWebPages #2740

Open
apoorvdarshan wants to merge 3 commits into giselles-ai:main from apoorvdarshan:fix/duplicate-url-detection-web-pages

@apoorvdarshan
Contributor

@apoorvdarshan apoorvdarshan commented Feb 17, 2026

Summary

  • Adds duplicate URL detection to the useAddWebPages hook, preventing the same URL from being added multiple times
  • Uses a batchSeen Set to catch duplicates within the current batch, and checks against existing node.content.webpages URLs
  • Follows the same dedup pattern already established in useUploadFile
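
For reviewers who want the pattern in isolation, here is a minimal sketch of the two-Set check described above. It is not the hook's actual code: `partitionDuplicates` and its return shape are hypothetical, and it assumes the URLs have already been normalized to comparable strings.

```typescript
// Hypothetical helper (not part of the PR): splits candidate URLs into those
// that would be added and those that would be rejected as duplicates.
function partitionDuplicates(
  candidates: readonly string[],
  existing: readonly string[],
): { accepted: string[]; duplicates: string[] } {
  // URLs already stored on the node.
  const existingUrls = new Set(existing);
  // URLs already accepted earlier in this same batch.
  const batchSeen = new Set<string>();
  const accepted: string[] = [];
  const duplicates: string[] = [];

  for (const url of candidates) {
    if (batchSeen.has(url) || existingUrls.has(url)) {
      duplicates.push(url); // the hook would call onError here and skip
      continue;
    }
    batchSeen.add(url);
    accepted.push(url);
  }
  return { accepted, duplicates };
}

// Usage: one pre-existing URL, plus a batch that repeats itself.
const result = partitionDuplicates(
  ["https://www.bing.com/", "https://www.google.com/", "https://www.bing.com/"],
  ["https://www.google.com/"],
);
console.log(result.accepted, result.duplicates);
```

Keeping the two Sets separate makes it possible to tell "already on the node" apart from "repeated in this paste" if the error messages ever need to differ.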

Closes #2490

Test plan

  • Add a web page URL to a node, then try adding the same URL again — should show "Duplicate URL" error and skip
  • Paste multiple identical URLs at once — only the first should be added
  • Adding distinct URLs should still work normally

Summary by CodeRabbit

  • Bug Fixes
    • Improved duplicate handling when adding web pages: URLs added in the same operation or already present are now detected and skipped, preventing redundant submissions and avoiding unnecessary network requests. Duplicate attempts trigger immediate error feedback to the user.

Copilot AI review requested due to automatic review settings February 17, 2026 16:45
@apoorvdarshan apoorvdarshan requested a review from shige as a code owner February 17, 2026 16:45
@vercel

vercel bot commented Feb 17, 2026

@apoorvdarshan is attempting to deploy a commit to the Giselle Team on Vercel.

A member of the Team first needs to authorize it.

@giselles-ai

giselles-ai bot commented Feb 17, 2026

Finished running flow.

Step 1
🟢 On Pull Request Opened | Status: Success | Updated: Feb 17, 2026 4:45pm
Step 2
🟢 Manual QA | Status: Success | Updated: Feb 17, 2026 4:47pm
🟢 Prompt for AI Agents | Status: Success | Updated: Feb 17, 2026 4:47pm
Step 3
🟢 Create a Comment for PR | Status: Success | Updated: Feb 17, 2026 4:50pm
Step 4
🟢 Create Pull Request Comment | Status: Success | Updated: Feb 17, 2026 4:50pm

@changeset-bot

changeset-bot bot commented Feb 17, 2026

⚠️ No Changeset found

Latest commit: d4432e9

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

💥 An error occurred when fetching the changed packages and changesets in this PR
Some errors occurred when validating the changesets config:
The package or glob expression "giselles-ai" is specified in the `ignore` option but it is not found in the project. You may have misspelled the package name or provided an invalid glob expression. Note that glob expressions must be defined according to https://www.npmjs.com/package/micromatch.

@qodo-free-for-open-source-projects

Review Summary by Qodo

Add duplicate URL detection in useAddWebPages hook

🐞 Bug fix

Walkthroughs

Description
• Adds duplicate URL detection to useAddWebPages hook
• Prevents same URL from being added multiple times
• Checks both current batch and existing webpages
• Follows established dedup pattern from useUploadFile
Diagram
flowchart LR
  A["URL Input"] --> B["Normalize URLs"]
  B --> C["Create batchSeen Set"]
  C --> D["Check Duplicates"]
  D --> E{Duplicate Found?}
  E -->|Yes| F["Show Error Message"]
  E -->|No| G["Add to batchSeen"]
  G --> H["Create WebPage"]
  F --> I["Skip URL"]
  H --> J["Add to Node"]

File Changes

1. internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts 🐞 Bug fix +11/-0

Implement duplicate URL detection logic

• Introduces batchSeen Set to track URLs within current batch
• Creates existingUrls Set from node's existing webpages
• Checks if URL exists in either batch or existing webpages
• Shows "Duplicate URL" error and skips duplicate URLs


Add a `batchSeen` Set and check against existing webpages to prevent
the same URL from being added multiple times, consistent with the
dedup pattern already used in `useUploadFile`.

Closes giselles-ai#2490
@coderabbitai
Contributor

coderabbitai bot commented Feb 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c394f96 and d4432e9.

📒 Files selected for processing (1)
  • internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts

📝 Walkthrough

Walkthrough

Adds batch-level duplicate filtering to the web page addition hook: validates the target node, builds an existingUrls set from the node’s webpages, tracks batchSeen for the current operation, and skips/emits errors for URLs already seen or present before creating/submitting new web pages.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Duplicate URL Detection in Web Page Addition: internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts | Validate target node type; derive existingUrls from node.content.webpages; add batchSeen to track current-batch URLs; skip and call onError for duplicates (batch or existing) before creating/submitting new WebPage entries, avoiding redundant network requests. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 I hopped through lists with careful eyes,
I checked each URL before it flies,
Batch and existing, both in view,
Duplicates stopped — just one per queue,
Data tidy, hopping proud and wise. 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding duplicate URL detection to useAddWebPages hook. |
| Description check | ✅ Passed | The description follows the template with all key sections covered: Summary, Related Issue, Changes (implied through Summary), and Test plan with specific test cases. |
| Linked Issues check | ✅ Passed | The PR fully implements the requirements from issue #2490: adds duplicate URL detection via batchSeen Set, checks against existing node.content.webpages, prevents redundant WebPage creation, and matches useUploadFile behavior. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the stated objective of adding duplicate URL detection in useAddWebPages; no unrelated modifications are present. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects bot commented Feb 17, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (1) 📎 Requirement gaps (0)


Action required

1. Dedup misses canonical URLs 🐞 Bug ✓ Correctness
Description
The duplicate check compares newly-normalized URLs (URL.href) against stored webpage.url strings
without normalizing the stored values. If existing data contains equivalent-but-not-identical URL
strings (e.g., https://example.com vs https://example.com/), duplicates can still be added,
undermining the feature.
Code

internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts[R61-66]

+				const existingUrls = new Set(
+					node.content.webpages.map((w) => w.url),
+				);
+				if (batchSeen.has(url) || existingUrls.has(url)) {
+					args.onError?.(`Duplicate URL: ${url}`);
+					continue;
Evidence
New inputs are normalized to url.href, but the newly-added dedup Set uses raw stored strings
(w.url). The protocol only validates url as a URL (no canonicalization), and the backend returns
...args.webpage verbatim, so stored URLs can legitimately differ in canonical form and evade the
equality check.

internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts[13-20]
internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts[61-66]
packages/protocol/src/node/variables/web-page.ts[8-12]
packages/giselle/src/sources/add-web-page.ts[42-45]
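
To make the mismatch concrete, the sketch below uses the standard `URL` API to show why a raw stored string can evade the check. `canonicalize` is a hypothetical stand-in for the hook's normalization step, and the stored value is an assumed example rather than real data.

```typescript
// Hypothetical helper: canonicalize via the WHATWG URL parser, or return null
// when the input does not parse as a URL.
function canonicalize(raw: string): string | null {
  try {
    return new URL(raw).href;
  } catch {
    return null;
  }
}

// Assumed example: a value persisted without a trailing slash.
const stored = "https://example.com";
// The same input after normalization gains a trailing slash.
const incoming = canonicalize("https://example.com") ?? "";

// Comparing against the raw stored string misses the duplicate.
const rawMatch = new Set([stored]).has(incoming);
// Canonicalizing the stored value first catches it.
const canonicalMatch = new Set([canonicalize(stored) ?? stored]).has(incoming);

console.log({ incoming, rawMatch, canonicalMatch });
```

Falling back to the raw string when parsing fails keeps unparseable legacy entries in the Set instead of silently dropping them.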

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The dedup logic compares canonicalized input URLs (`URL.href`) to raw stored URL strings. If stored URLs are equivalent but not canonicalized the same way, duplicates will slip through.
## Issue Context
- New URLs are normalized via `new URL(raw).href`.
- Existing URLs are used as-is.
- Protocol validates URLs but does not canonicalize them, and backend returns the URL verbatim.
## Fix Focus Areas
- internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts[13-20]
- internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts[61-74]


Remediation recommended

2. Duplicate error exposes URL 📘 Rule violation ⛨ Security
Description
The duplicate-URL path returns a user-facing error that includes the full URL, which can leak
sensitive data (e.g., query tokens) to end users. Per secure error-handling guidance, user-facing
errors should be generic while details are kept to internal diagnostics.
Code

internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts[R64-66]

+				if (batchSeen.has(url) || existingUrls.has(url)) {
+					args.onError?.(`Duplicate URL: ${url}`);
+					continue;
Evidence
PR Compliance ID 4 requires user-facing errors not to expose potentially sensitive internal details.
The new code calls args.onError with Duplicate URL: ${url}, which includes the full URL string.

Rule 4: Generic: Secure Error Handling
internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts[64-66]
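
One possible mitigation, sketched under assumptions, is to strip the query string, fragment, and credentials before the URL reaches a user-facing message. `redactUrlForMessage` is a hypothetical helper, not something in the repository, and paths can still carry sensitive segments, so this narrows the exposure rather than removing it.

```typescript
// Hypothetical helper: keep only scheme, host, and path for display.
// URL.origin omits userinfo; query and fragment are dropped by not appending them.
function redactUrlForMessage(raw: string): string {
  try {
    const parsed = new URL(raw);
    return `${parsed.origin}${parsed.pathname}`;
  } catch {
    // Never echo unparseable input back to the user.
    return "[invalid URL]";
  }
}

// Usage: build the duplicate message from the redacted form.
const message = `Duplicate URL: ${redactUrlForMessage(
  "https://example.com/doc?token=secret#section",
)}`;
console.log(message);
```

A fully generic message such as "Duplicate URL" would be the stricter option; the redacted form trades some strictness for telling the user which entry was skipped.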

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The duplicate URL handler currently emits a user-facing error that includes the full URL (`Duplicate URL: ${url}`), which can leak sensitive info (e.g., tokens in query params).
## Issue Context
This appears to be a UI-facing `onError` callback; per secure error-handling guidance, end-user messages should be generic and not include potentially sensitive details.
## Fix Focus Areas
- internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts[64-66]


3. Rebuilds URL Set per item 🐞 Bug ➹ Performance
Description
existingUrls is rebuilt inside the per-URL loop, re-walking all existing webpages for every URL in
the batch. This is O(batchSize × existingWebpages) and can become slow with large lists.
Code

internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts[R53-55]

  	for (const url of normalizedUrls) {
  		const node = store.getState().nodes.find((n) => n.id === args.nodeId) as
  			| WebPageNode
Evidence
The Set construction iterates over node.content.webpages and is executed once per url in
normalizedUrls, which can multiply work for large batches/nodes.

internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts[53-63]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`existingUrls` is rebuilt for every URL in the batch, repeatedly iterating over the full `node.content.webpages` list.
## Issue Context
This can cause unnecessary CPU work when pasting many URLs or when a node already contains many webpages.
## Fix Focus Areas
- internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts[51-69]


@giselles-ai

giselles-ai bot commented Feb 17, 2026

🔍 QA Testing Assistant by Giselle

📋 Manual QA Checklist

Based on the changes in this PR, here are the key areas to test manually:

  • Scenario: Adding a URL that already exists

    1. Navigate to the workflow designer and select a node that accepts web pages (e.g., Knowledge).
    2. Add a single, valid web page URL (e.g., https://www.google.com).
    3. Confirm that the URL is successfully added to the node's list.
    4. Attempt to add the exact same URL (https://www.google.com) again.
    5. Expected Result: An error message like "Duplicate URL: https://www.google.com" should appear, and the URL should not be added a second time. The list should still show only one entry for this URL.
  • Scenario: Pasting multiple identical URLs in one batch

    1. Select a node that accepts web pages.
    2. In the input field for adding URLs, paste a list containing duplicates (e.g., https://www.claude.ai, https://www.anthropic.com, https://www.claude.ai).
    3. Submit the list.
    4. Expected Result:
      • https://www.claude.ai is added to the list only once.
      • https://www.anthropic.com is added to the list.
      • An error message for the duplicate https://www.claude.ai is displayed.
      • The final list should contain just one entry for each unique URL.
  • Scenario: Mix of new, duplicate, and pre-existing URLs

    1. Select a node that already contains the URL https://www.google.com.
    2. Attempt to add the following list of URLs at once: https://www.bing.com, https://www.google.com, https://www.bing.com.
    3. Expected Result:
      • https://www.bing.com is added once.
      • Error messages are shown for the pre-existing https://www.google.com and the duplicate https://www.bing.com.
      • The final list in the node should contain https://www.google.com (from before) and https://www.bing.com (added now), with no new duplicates.
  • Scenario: Adding multiple unique URLs

    1. Select a node that accepts web pages.
    2. Add several different URLs at the same time (e.g., https://example.com, https://test.com, https://another-site.org).
    3. Expected Result: All unique URLs should be added to the node's list successfully and without any errors.

✨ Prompt for AI Agents

Use the following prompts with Cursor or Claude Code to automate E2E testing:

📝 E2E Test Generation Prompt
You are an expert QA engineer. Your task is to write a new E2E test file using Playwright and TypeScript for a web-based workflow designer application. The test will validate the functionality introduced in a recent Pull Request.

**The PR added duplicate URL detection when adding web pages to a node in the workflow designer.**

Please generate the complete test file (`add-web-pages.spec.ts`) based on the detailed instructions below.

---

### **1. Context Summary**

*   **PR Functionality:** The change prevents users from adding the same web page URL to a "Web Page Node" more than once. The detection logic checks for two conditions:
    1.  **Batch Duplicates:** If a user pastes multiple identical URLs at the same time, only one is added.
    2.  **Existing Duplicates:** If a user tries to add a URL that already exists in the node's list of web pages, an error is shown, and the duplicate is not added.
*   **Key User Flow:** The user interacts with a "Web Page Node" on the canvas, opens an interface to add URLs, enters one or more URLs, and submits them.
*   **Critical Paths to Test:**
    *   Successfully adding unique URLs.
    *   Attempting to add a single duplicate URL and verifying the error message.
    *   Attempting to add a batch of URLs containing duplicates and verifying only unique ones are added.

---

### **2. Test Scenarios**

Create a test suite named `Feature: Add Web Pages - Duplicate Detection` that includes the following scenarios:

*   **Setup:** Each test should begin with a clean state. Use a `test.beforeEach` block to:
    1.  Navigate to the workflow designer.
    2.  Create a new workflow.
    3.  Add a "Web Page" node to the canvas. This node will be the target for all tests.

*   **Scenario 1: Happy Path - Adding Distinct URLs**
    *   **Description:** This is a regression test to ensure existing functionality is not broken.
    *   **Steps:**
        1.  Open the "Add Web Page" input for the node.
        2.  Enter two *different*, valid URLs (e.g., `https://google.com` and `https://bing.com`).
        3.  Submit the URLs.
    *   **Expected Result:** Both URLs should be added successfully as two distinct items within the node's content area.

*   **Scenario 2: Edge Case - Sequential Duplicate**
    *   **Description:** Tests adding a URL that already exists.
    *   **Steps:**
        1.  Add a single URL (e.g., `https://example.com`).
        2.  Verify it was added successfully.
        3.  Attempt to add the *exact same URL* again (`https://example.com`).
    *   **Expected Result:** An error message (e.g., a toast notification) containing the text "Duplicate URL" should appear. The node should still only contain the first instance of the URL.

*   **Scenario 3: Edge Case - Batch Duplicates**
    *   **Description:** Tests the `batchSeen` logic by pasting multiple identical URLs at once.
    *   **Steps:**
        1.  Open the "Add Web Page" input.
        2.  Enter a list of URLs containing duplicates, separated by newlines (e.g., `https://github.com\nhttps://github.com\nhttps://gitlab.com`).
        3.  Submit the URLs.
    *   **Expected Result:** Only the unique URLs (`https://github.com` and `https://gitlab.com`) should be added. The node should contain exactly two web page items.

*   **Scenario 4: Edge Case - Mixed Duplicates (Existing + Batch)**
    *   **Description:** A comprehensive test covering both duplicate detection mechanisms.
    *   **Steps:**
        1.  Add a single URL (`https://first.com`) and verify it was added.
        2.  In a second action, attempt to add a batch of URLs: `https://first.com\nhttps://second.com`.
    *   **Expected Result:**
        *   An error message for the duplicate (`https://first.com`) should be displayed.
        *   The new, unique URL (`https://second.com`) should be added successfully.
        *   The node should now contain two items in total: `https://first.com` and `https://second.com`.

---

### **3. Playwright Implementation Instructions**

*   **Test File:** `tests/e2e/add-web-pages.spec.ts`
*   **Selectors:** Use `data-testid` attributes for reliable selectors. Assume the following exist:
    *   Web Page Node on canvas: `[data-testid="node-web-page"]`
    *   Button to open URL input: `[data-testid="add-web-page-button"]` (or similar trigger on the node)
    *   URL input field (likely a `textarea`): `[data-testid="web-page-url-input"]`
    *   Submit button for URLs: `[data-testid="submit-web-pages-button"]`
    *   A container for each added web page item: `[data-testid="web-page-item"]`
    *   The URL text within a web page item: `[data-testid="web-page-item-url"]`
    *   Error toast notification: `[data-testid="toast-error-message"]`

*   **User Interactions & Assertions:**
    *   Use `page.locator(...).click()` for clicks.
    *   Use `page.locator(...).fill()` to enter URLs into the textarea. Use `\n` for newlines to simulate pasting multiple URLs.
    *   Use `expect(page.locator(...)).toHaveCount(N)` to verify the number of web page items.
    *   Use `expect(page.locator(...)).toContainText('...')` to check for error messages and URL content.
    *   Use `expect(page.locator(...)).toBeVisible()` to confirm elements appear on the page.

*   **Example Code Snippet (for Scenario 2):**

```typescript
import { test, expect } from '@playwright/test';

// Assume a beforeEach block sets up the page with a Web Page Node.

test('should show an error when adding a duplicate URL sequentially', async ({ page }) => {
  const urlInput = page.locator('[data-testid="web-page-url-input"]');
  const submitButton = page.locator('[data-testid="submit-web-pages-button"]');
  const webPageItems = page.locator('[data-testid="web-page-item"]');
  const errorToast = page.locator('[data-testid="toast-error-message"]');

  const duplicateUrl = 'https://example.com';

  // 1. Add the URL for the first time
  await page.locator('[data-testid="add-web-page-button"]').click();
  await urlInput.fill(duplicateUrl);
  await submitButton.click();

  // 2. Verify it was added
  await expect(webPageItems).toHaveCount(1);
  await expect(webPageItems.first()).toContainText(duplicateUrl);

  // 3. Attempt to add the same URL again
  await page.locator('[data-testid="add-web-page-button"]').click();
  await urlInput.fill(duplicateUrl);
  await submitButton.click();

  // 4. Assert error is shown and no new item is added
  await expect(errorToast).toBeVisible();
  await expect(errorToast).toContainText(`Duplicate URL: ${duplicateUrl}`);
  await expect(webPageItems).toHaveCount(1); // Should still be 1
});
```

4. MCP Integration Guidelines

  • Command Structure: The tests should be runnable via Playwright MCP. The standard command will be used.
    mcp test -- "tests/e2e/add-web-pages.spec.ts"
  • Environment: The tests should not rely on any specific environment variables beyond the standard BASE_URL which is typically configured in playwright.config.ts.

5. CI-Ready Code Requirements

  • Organization: Structure the tests within a test.describe('Feature: Add Web Pages - Duplicate Detection', () => { ... }); block.
  • Naming Conventions: Use descriptive test names that clearly state the intent and expected outcome (e.g., test('should add unique URLs from a batch containing duplicates', ...))
  • Atomicity: Ensure each test is self-contained and does not depend on the state of other tests. The test.beforeEach hook is critical for this.
  • Clarity: Write clean, readable code with comments explaining complex steps or assertions if necessary. Use constants for repeated values like URLs or selector strings.
  • Error Handling: Rely on Playwright's built-in auto-waiting and assertion retries. No custom try/catch blocks are needed for standard element interactions.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds duplicate URL detection to the useAddWebPages hook to prevent the same URL from being added multiple times to a web page node. The implementation follows the established pattern from useUploadFile, using a batchSeen Set to track URLs within the current batch and checking against existing URLs in the node's content.

Changes:

  • Added batchSeen Set to track URLs processed in the current batch
  • Added duplicate check that compares against both URLs in the current batch and existing URLs in the node
  • Added error reporting for duplicate URLs using the onError callback

- Normalize existing webpage URLs before comparison to catch equivalent
  URLs with different string representations (e.g. trailing slash)
- Lowercase "duplicate" in error message to match useUploadFile pattern
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts (1)

61-63: existingUrls rebuild on every iteration is redundant with batchSeen.

Because updateNodeDataContent synchronously commits to the Zustand store before the await call, the node re-fetched at line 54 on the next iteration already contains the URL just added. This means existingUrls already covers the within-batch case — making batchSeen (line 64) partially redundant, and vice-versa.

Both the rebuild cost (O(M) per iteration → O(N·M) total) and the redundancy can be avoided by constructing existingUrls once before the loop from the initial node snapshot, and relying solely on batchSeen for the within-batch case:

♻️ Proposed refactor: build `existingUrls` once before the loop
+		const initialNode = store.getState().nodes.find((n) => n.id === args.nodeId) as
+			| WebPageNode
+			| undefined;
+		if (!initialNode || initialNode.content.type !== "webPage") {
+			return;
+		}
+		const existingUrls = new Set(
+			initialNode.content.webpages.map((w) => normalizeHttpsUrl(w.url) ?? w.url),
+		);
+
 		const batchSeen = new Set<string>();

 		for (const url of normalizedUrls) {
 			const node = store.getState().nodes.find((n) => n.id === args.nodeId) as
 				| WebPageNode
 				| undefined;
 			if (!node || node.content.type !== "webPage") {
 				return;
 			}

-			const existingUrls = new Set(
-				node.content.webpages.map((w) => normalizeHttpsUrl(w.url) ?? w.url),
-			);
 			if (batchSeen.has(url) || existingUrls.has(url)) {
 				args.onError?.(`duplicate URL: ${url}`);
 				continue;
 			}
 			batchSeen.add(url);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts`
around lines 61 - 63, The code rebuilds existingUrls inside the loop causing
O(N·M) cost and redundancy with batchSeen; modify useAddWebPages so that
existingUrls is constructed once from the initial node snapshot (using
node.content.webpages and normalizeHttpsUrl) before entering the loop, then
remove the per-iteration rebuild and use batchSeen exclusively to track URLs
added within the current batch; keep updateNodeDataContent as-is (it will still
commit to Zustand) but ensure the loop only checks existingUrls and batchSeen to
decide skips.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts`:
- Line 65: The error message passed to args.onError in the duplicate-URL branch
should match sentence case like the other messages; update the string in the
args.onError call inside the add-web-pages logic to "Duplicate URL: ${url}"
(capital D) so it matches the other messages ("Please enter..." and "Invalid URL
format...") — locate the args.onError?.(`duplicate URL: ${url}`) invocation in
use-add-web-pages.ts and change the message to "Duplicate URL: ${url}".

---

Nitpick comments:
In
`@internal-packages/workflow-designer-ui/src/app-designer/store/usecases/use-add-web-pages.ts`:
- Around line 61-63: The code rebuilds existingUrls inside the loop causing
O(N·M) cost and redundancy with batchSeen; modify useAddWebPages so that
existingUrls is constructed once from the initial node snapshot (using
node.content.webpages and normalizeHttpsUrl) before entering the loop, then
remove the per-iteration rebuild and use batchSeen exclusively to track URLs
added within the current batch; keep updateNodeDataContent as-is (it will still
commit to Zustand) but ensure the loop only checks existingUrls and batchSeen to
decide skips.

Member

@shige shige left a comment

Thanks for the contribution! The duplicate detection logic itself is correct and follows the existing pattern well.

However, there's a performance concern that should be addressed before merging:

Performance Issue: O(N × M) Complexity in Duplicate URL Check

The existingUrls Set is reconstructed inside the for (const url of normalizedUrls) loop (lines 61-63). This leads to O(N × M) complexity, where N is the number of new URLs and M is the number of existing URLs in the node.

When a user attempts to add a large batch of URLs (e.g., 100+ URLs to a node that already has many entries), the repeated Set construction adds avoidable work on the browser's main thread and can make the UI feel sluggish.

Suggested fix: Move the initialization of existingUrls outside of the loop.

const batchSeen = new Set<string>();

// Build existingUrls once before the loop
const node = store.getState().nodes.find((n) => n.id === args.nodeId) as
    | WebPageNode
    | undefined;
if (!node || node.content.type !== "webPage") {
    return;
}
const existingUrls = new Set(
    node.content.webpages.map((w) => normalizeHttpsUrl(w.url) ?? w.url),
);

for (const url of normalizedUrls) {
    if (batchSeen.has(url) || existingUrls.has(url)) {
        args.onError?.(`duplicate URL: ${url}`);
        continue;
    }
    batchSeen.add(url);
    // ...
}

Please address this and I'll be happy to approve!

Build the existingUrls Set once before the loop instead of
reconstructing it on every iteration. Addresses reviewer feedback
on PR giselles-ai#2740.
@apoorvdarshan
Contributor Author

Thanks for the review! I've addressed the performance concern:

  • Moved existingUrls Set initialization outside the loop so it's built once (O(N + M)) instead of reconstructed every iteration (O(N × M))
  • The node is still re-fetched inside the loop for mutations since state changes between iterations, but the dedup check is now O(1) per URL
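
As a rough illustration of the difference, the sketch below counts Set operations for both strategies. The function names and the example sizes (100 new URLs, 1000 existing entries) are assumptions chosen for the demonstration, not measurements from the hook.

```typescript
// Counts Set insertions plus lookups when existingUrls is rebuilt for every new URL.
function opsWhenRebuiltPerUrl(newCount: number, existing: readonly string[]): number {
  let ops = 0;
  for (let i = 0; i < newCount; i++) {
    const existingUrls = new Set<string>();
    for (const url of existing) {
      existingUrls.add(url);
      ops++; // one insertion per existing entry, repeated each iteration
    }
    existingUrls.has(`https://new.example/${i}`);
    ops++; // one O(1) lookup per new URL
  }
  return ops;
}

// Counts the same operations when existingUrls is built once before the loop.
function opsWhenBuiltOnce(newCount: number, existing: readonly string[]): number {
  let ops = 0;
  const existingUrls = new Set<string>();
  for (const url of existing) {
    existingUrls.add(url);
    ops++; // one insertion per existing entry, done once
  }
  for (let i = 0; i < newCount; i++) {
    existingUrls.has(`https://new.example/${i}`);
    ops++; // one O(1) lookup per new URL
  }
  return ops;
}

// Assumed sizes: M = 1000 existing entries, N = 100 new URLs.
const existingEntries = Array.from({ length: 1000 }, (_, i) => `https://example.com/${i}`);
console.log(opsWhenRebuiltPerUrl(100, existingEntries)); // 100100
console.log(opsWhenBuiltOnce(100, existingEntries)); // 1100
```

With these sizes the rebuilt-per-URL version performs N × (M + 1) operations and the build-once version performs M + N.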

Please take another look when you get a chance!

@apoorvdarshan apoorvdarshan requested a review from shige February 27, 2026 07:56
Development

Successfully merging this pull request may close these issues.

Bug: Missing duplicate URL detection when adding web pages

3 participants