Skip to content
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
Show all changes
18 commits
Select commit Hold shift + click to select a range
8e22daa
fix: update guidelines for document parsing to prevent duplicate memo…
pratham-kpatil Mar 14, 2026
4e12a42
fix: enhance document parsing by adding deduplication logic and impro…
pratham-kpatil Mar 14, 2026
e6e73a0
fix: improve document import process with enhanced status messaging a…
pratham-kpatil Mar 14, 2026
aad22b8
fix: add observability to document parser with telemetry integration
pratham-kpatil Mar 14, 2026
3ce150c
fix: enhance document parsing with abort handling and debounce logic …
pratham-kpatil Mar 14, 2026
a47c8d0
fix: refactor document import logic for improved error handling and d…
pratham-kpatil Mar 14, 2026
5acf4bb
fix: enhance document import dialog with duplicate memory preview and…
pratham-kpatil Mar 14, 2026
a532bee
chore: clean up comments in document import dialog
mikr13 Mar 14, 2026
2141fbe
chore: Clean up import-dialog-shared.tsx by removing comments
mikr13 Mar 14, 2026
dfe6680
chore: remove comment on pdfjs module caching
mikr13 Mar 14, 2026
f4295ad
refactor: PDF text extraction and normalization functions
mikr13 Mar 14, 2026
543cec2
chore: remove telemetry debug code from document-parser.ts
mikr13 Mar 14, 2026
304395e
chore: Remove comment on secondary dedup key
mikr13 Mar 14, 2026
21ae305
chore: Remove documentation for findDuplicates function
mikr13 Mar 14, 2026
b6c30de
Remove comment from BaseImport type
mikr13 Mar 14, 2026
3f61c12
fix: typo
mikr13 Mar 14, 2026
36a27b1
Fix typo in doc parser prompt update message
mikr13 Mar 14, 2026
e0bdbf7
fix: typo
mikr13 Mar 14, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions .changeset/fifty-words-lick.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
"superfill.ai": patch
---

add obersvability in the document parser
Comment thread
coderabbitai[bot] marked this conversation as resolved.
Outdated
5 changes: 5 additions & 0 deletions .changeset/huge-suns-prove.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
"superfill.ai": minor
---

Updated doc parser prompt to stop redeundant memory creation and added logic for deduplication of memory
Comment thread
coderabbitai[bot] marked this conversation as resolved.
Outdated
5 changes: 5 additions & 0 deletions .changeset/tender-months-wish.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
"superfill.ai": major
---

Updated the deduplication logic to give the preview of the new and exisitng duplicate memory in the UI
Comment thread
mikr13 marked this conversation as resolved.
Outdated
158 changes: 110 additions & 48 deletions src/components/features/document/document-import-dialog.tsx
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import { FileTextIcon, UploadIcon } from "lucide-react";
import { useRef, useState } from "react";
import { useCallback, useRef, useState } from "react";
import { toast } from "sonner";
import {
ImportDialogFooter,
ImportDialogShell,
Expand All @@ -10,13 +11,15 @@ import {
} from "@/components/features/import/import-dialog-shared";
import { Button } from "@/components/ui/button";
import { useImportDialog } from "@/hooks/use-import-dialog";
import { useMemories } from "@/hooks/use-memories";
import {
convertToImportItems,
type DocumentImportItem,
type DocumentParserStatus,
parseDocument,
} from "@/lib/document/document-parser";
import { createLogger } from "@/lib/logger";
import { findDuplicates } from "@/lib/storage/memories";

const logger = createLogger("component:document-import-dialog");

Expand All @@ -43,6 +46,7 @@ function getDescription(status: DocumentParserStatus): string {
case "success":
return "Select the information you want to import.";
case "reading":
return "Reading your document...";
case "parsing":
return "AI is extracting your information...";
case "error":
Expand All @@ -63,6 +67,13 @@ export function DocumentImportDialog({
}: DocumentImportDialogProps) {
const fileInputRef = useRef<HTMLInputElement>(null);
const [fileName, setFileName] = useState<string | null>(null);
// Tracks the last successfully initiated import to debounce same-file re-imports
const lastImportKeyRef = useRef<string | null>(null);
const lastImportTimeRef = useRef<number>(0);
// AbortController for cancelling an in-flight parse when the dialog closes
const abortControllerRef = useRef<AbortController | null>(null);

const { entries: existingMemories } = useMemories();

const {
status,
Expand Down Expand Up @@ -91,65 +102,116 @@ export function DocumentImportDialog({

const progress = PROGRESS_BY_STATUS[status];

const handleFileSelect = async (
event: React.ChangeEvent<HTMLInputElement>,
) => {
const file = event.target.files?.[0];
if (!file) return;

const isPdf =
file.type === "application/pdf" ||
file.name.toLowerCase().endsWith(".pdf");
const isTxt =
file.type === "text/plain" || file.name.toLowerCase().endsWith(".txt");

if (!isPdf && !isTxt) {
setError("Please select a PDF or text file");
setStatus("error");
event.target.value = "";
return;
}

setFileName(file.name);
setStatus("parsing");
setError(null);
setImportItems([]);
const handleFileSelect = useCallback(
async (event: React.ChangeEvent<HTMLInputElement>) => {
const file = event.target.files?.[0];
if (!file) return;

const currentRequestId = ++requestIdRef.current;
const isPdf =
file.type === "application/pdf" ||
file.name.toLowerCase().endsWith(".pdf");
const isTxt =
file.type === "text/plain" || file.name.toLowerCase().endsWith(".txt");

try {
const result = await parseDocument(file);

if (requestIdRef.current !== currentRequestId) return;

if (!result.success || !result.items) {
if (!isPdf && !isTxt) {
setError("Please select a PDF or text file");
setStatus("error");
setError(result.error || "Failed to extract data from document");
event.target.value = "";
return;
}

const items = convertToImportItems(result.items);
setImportItems(items);
setStatus("success");
// Debounce: skip if the same file (name+size) was already kicked off within 5 seconds
const importKey = `${file.name}:${file.size}`;
const now = Date.now();
if (
importKey === lastImportKeyRef.current &&
now - lastImportTimeRef.current < 5_000
) {
logger.debug("Skipping duplicate import for:", file.name);
event.target.value = "";
return;
}
lastImportKeyRef.current = importKey;
lastImportTimeRef.current = now;

logger.debug("Successfully extracted document data:", items.length);
} catch (err) {
if (requestIdRef.current !== currentRequestId) return;
// Cancel any previously in-flight parse
abortControllerRef.current?.abort();
const controller = new AbortController();
abortControllerRef.current = controller;
Comment on lines +133 to +137

logger.error("Import error:", err);
setStatus("error");
setError(
err instanceof Error ? err.message : "An unexpected error occurred",
);
}
setFileName(file.name);
setStatus("reading");
setError(null);
setImportItems([]);

if (fileInputRef.current) {
fileInputRef.current.value = "";
}
};
const currentRequestId = ++requestIdRef.current;
const requestId = String(currentRequestId);

try {
const result = await parseDocument(file, {
Comment on lines +147 to +148
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Ensure file input is reset on all exit paths.

There are early returns in the try/catch flow (e.g., cancellation / AbortError) before Line [203], so the file input may not be cleared. That can prevent selecting the same file again.

🔧 Suggested fix
-try {
+try {
   // existing logic
 } catch (err) {
   // existing logic
+} finally {
+  if (fileInputRef.current) {
+    fileInputRef.current.value = "";
+  }
 }
-
-if (fileInputRef.current) {
-  fileInputRef.current.value = "";
-}

Also applies to: 163-170, 190-205

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/components/features/document/document-import-dialog.tsx` around lines 150
- 151, The file input value is not cleared on all early-exit paths in the
try/catch around the parseDocument(file, ...) call in DocumentImportDialog,
which prevents re-selecting the same file; update the handler so that every exit
path (success, AbortError/cancellation, other errors, and any early returns)
clears the file input and resets any selectedFile state. Concretely, locate the
function containing parseDocument(file, ...) and ensure you either call a shared
cleanup helper (e.g., resetFileInput()) or directly set
fileInputRef.current.value = '' and setSelectedFile(null) in a finally block or
immediately before each return/throw (including AbortError handling), so the
input is always reset.

requestId,
signal: controller.signal,
onStageChange: (stage) => {
if (requestIdRef.current !== currentRequestId) return;
setStatus(stage);
},
});

if (requestIdRef.current !== currentRequestId) return;

if (!result.success || !result.items) {
// Silently ignore user-driven cancellation
if (result.error === "cancelled") return;
const errorMsg =
result.error || "Failed to extract data from document";
setStatus("error");
setError(errorMsg);
toast.error(errorMsg);
return;
}

const items = convertToImportItems(result.items);

// Enrich each item with its matching existing memory (if any)
const duplicatesMap = await findDuplicates(items, existingMemories);
const enrichedItems = items.map((item, i) => {
const duplicate = duplicatesMap.get(i);
return duplicate ? { ...item, existingDuplicate: duplicate } : item;
});

setImportItems(enrichedItems);
setStatus("success");
Comment on lines +171 to +178
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add stale-request guard after duplicate lookup.

findDuplicates(...) is awaited, but there’s no request-id recheck afterward. A newer import can start during that await, and this older request can still overwrite UI state.

🔧 Suggested fix
 const duplicatesMap = await findDuplicates(items, existingMemories);
+if (requestIdRef.current !== currentRequestId) return;

 const enrichedItems = items.map((item, i) => {
   const duplicate = duplicatesMap.get(i);
   return duplicate ? { ...item, existingDuplicate: duplicate } : item;
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/components/features/document/document-import-dialog.tsx` around lines 176
- 183, After awaiting findDuplicates, the handler can overwrite UI if a newer
import started; capture the current request identifier before the await (e.g.,
const currentRequestId = importRequestIdRef.current or a local requestId
variable used by the component) and immediately after the await check that the
global/latest request id still equals currentRequestId; if it does not match,
return early and do not call setImportItems or setStatus. Apply this guard
around the block that computes duplicatesMap, enrichedItems and calls
setImportItems/setStatus (referencing findDuplicates, duplicatesMap,
enrichedItems, setImportItems, setStatus).


logger.debug(
`[req:${requestId}] Successfully extracted document data:`,
items.length,
"items",
);
Comment on lines +148 to +184
} catch (err) {
if (requestIdRef.current !== currentRequestId) return;
// Swallow AbortError — dialog was closed intentionally
if (err instanceof Error && err.name === "AbortError") return;

const errMsg =
err instanceof Error ? err.message : "An unexpected error occurred";
logger.error(`[req:${requestId}] Import error:`, err);
setStatus("error");
setError(errMsg);
toast.error(errMsg);
}

if (fileInputRef.current) {
fileInputRef.current.value = "";
}
},
[requestIdRef, setStatus, setError, setImportItems, existingMemories],
);

const handleCloseWrapper = (open: boolean) => {
if (!open) {
// Cancel any in-flight parse
abortControllerRef.current?.abort();
abortControllerRef.current = null;
setFileName(null);
}
handleClose(open);
Expand Down
106 changes: 99 additions & 7 deletions src/components/features/import/import-dialog-shared.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ import {
} from "@/components/ui/dialog";
import { Progress } from "@/components/ui/progress";
import { ScrollArea } from "@/components/ui/scroll-area";
import { cn } from "@/lib/cn";
import type { BaseImportItem } from "@/types/import";
import type { AllowedCategory } from "@/types/memory";

Expand Down Expand Up @@ -79,6 +80,7 @@ export function ImportItemsList<T extends BaseImportItem>({
);

const allSelected = items.every((item) => item.selected);
const duplicateCount = items.filter((item) => item.existingDuplicate).length;

return (
<div className="space-y-4">
Expand All @@ -88,6 +90,15 @@ export function ImportItemsList<T extends BaseImportItem>({
<span className="text-sm font-medium">
Found {items.length} items
</span>
{duplicateCount > 0 && (
<Badge
variant="outline"
size="sm"
className="border-amber-400 text-amber-600 bg-amber-50 dark:bg-amber-950/30"
>
{duplicateCount} duplicate{duplicateCount > 1 ? "s" : ""}
</Badge>
)}
{headerExtra}
</div>
<Button
Expand Down Expand Up @@ -122,7 +133,11 @@ export function ImportItemsList<T extends BaseImportItem>({
{categoryItems?.map((item) => (
<div
key={item.id}
className="flex items-start gap-3 p-2 rounded-lg hover:bg-muted/50 transition-colors w-full"
className={cn(
"flex items-start gap-3 p-2 rounded-lg hover:bg-muted/50 transition-colors w-full",
item.existingDuplicate &&
"border border-amber-200 dark:border-amber-800/50 bg-amber-50/50 dark:bg-amber-950/10",
)}
>
<Checkbox
id={`${itemIdPrefix}-${item.id}`}
Expand All @@ -134,12 +149,89 @@ export function ImportItemsList<T extends BaseImportItem>({
htmlFor={`${itemIdPrefix}-${item.id}`}
className="flex-1 min-w-0 cursor-pointer"
>
<p className="text-sm font-medium truncate">
{item.label}
</p>
<p className="text-xs text-muted-foreground line-clamp-2">
{item.answer}
</p>
<div className="flex items-center gap-1.5 flex-wrap">
<p className="text-sm font-medium truncate">
{item.label}
</p>
{item.existingDuplicate && (
<Badge
variant="outline"
size="sm"
className="shrink-0 border-amber-400 text-amber-600 bg-amber-50 dark:bg-amber-950/30"
>
duplicate
</Badge>
)}
</div>
{item.question &&
item.question.toLowerCase() !==
item.label.toLowerCase() && (
<p className="text-xs text-muted-foreground/70 italic truncate">
{item.question}
</p>
)}
{/* New value from document */}
<div
Comment thread
coderabbitai[bot] marked this conversation as resolved.
Outdated
className={cn(
"mt-0.5",
item.existingDuplicate && "space-y-1.5",
)}
>
{item.existingDuplicate && (
<p className="text-[11px] font-medium text-muted-foreground/60 uppercase tracking-wide">
New
</p>
)}
<p className="text-xs text-muted-foreground line-clamp-2">
{item.answer}
</p>
{/* Existing saved value */}
{item.existingDuplicate && (
<>
<p className="text-[11px] font-medium text-muted-foreground/60 uppercase tracking-wide mt-1">
Currently saved
</p>
<p className="text-xs text-muted-foreground/70 line-clamp-2 italic">
{item.existingDuplicate.answer}
</p>
{/* Use new / Keep existing toggle buttons */}
<div className="flex gap-1.5 mt-2">
<button
type="button"
onClick={(e) => {
e.preventDefault();
e.stopPropagation();
if (!item.selected) onToggleItem(item.id);
}}
className={cn(
"text-[11px] px-2 py-0.5 rounded border font-medium transition-colors",
item.selected
? "bg-primary text-primary-foreground border-primary"
: "bg-background text-muted-foreground border-border hover:border-primary hover:text-primary",
)}
>
Use new
</button>
<button
type="button"
onClick={(e) => {
e.preventDefault();
e.stopPropagation();
if (item.selected) onToggleItem(item.id);
}}
className={cn(
"text-[11px] px-2 py-0.5 rounded border font-medium transition-colors",
!item.selected
? "bg-primary text-primary-foreground border-primary"
: "bg-background text-muted-foreground border-border hover:border-primary hover:text-primary",
)}
>
Keep existing
</button>
Comment on lines +196 to +227
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Expose toggle state for assistive technologies.

“Use new” / “Keep existing” behave like toggle buttons, but they don’t expose pressed state (aria-pressed), which makes the current choice unclear for screen-reader users.

♿ Suggested fix
 <button
   type="button"
+  aria-pressed={item.selected}
+  aria-label={`Use new value for ${item.label}`}
   onClick={(e) => {
     e.preventDefault();
     e.stopPropagation();
     if (!item.selected) onToggleItem(item.id);
   }}
@@
 <button
   type="button"
+  aria-pressed={!item.selected}
+  aria-label={`Keep existing value for ${item.label}`}
   onClick={(e) => {
     e.preventDefault();
     e.stopPropagation();
     if (item.selected) onToggleItem(item.id);
   }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
<button
type="button"
onClick={(e) => {
e.preventDefault();
e.stopPropagation();
if (!item.selected) onToggleItem(item.id);
}}
className={cn(
"text-[11px] px-2 py-0.5 rounded border font-medium transition-colors",
item.selected
? "bg-primary text-primary-foreground border-primary"
: "bg-background text-muted-foreground border-border hover:border-primary hover:text-primary",
)}
>
Use new
</button>
<button
type="button"
onClick={(e) => {
e.preventDefault();
e.stopPropagation();
if (item.selected) onToggleItem(item.id);
}}
className={cn(
"text-[11px] px-2 py-0.5 rounded border font-medium transition-colors",
!item.selected
? "bg-primary text-primary-foreground border-primary"
: "bg-background text-muted-foreground border-border hover:border-primary hover:text-primary",
)}
>
Keep existing
</button>
<button
type="button"
aria-pressed={item.selected}
aria-label={`Use new value for ${item.label}`}
onClick={(e) => {
e.preventDefault();
e.stopPropagation();
if (!item.selected) onToggleItem(item.id);
}}
className={cn(
"text-[11px] px-2 py-0.5 rounded border font-medium transition-colors",
item.selected
? "bg-primary text-primary-foreground border-primary"
: "bg-background text-muted-foreground border-border hover:border-primary hover:text-primary",
)}
>
Use new
</button>
<button
type="button"
aria-pressed={!item.selected}
aria-label={`Keep existing value for ${item.label}`}
onClick={(e) => {
e.preventDefault();
e.stopPropagation();
if (item.selected) onToggleItem(item.id);
}}
className={cn(
"text-[11px] px-2 py-0.5 rounded border font-medium transition-colors",
!item.selected
? "bg-primary text-primary-foreground border-primary"
: "bg-background text-muted-foreground border-border hover:border-primary hover:text-primary",
)}
>
Keep existing
</button>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/components/features/import/import-dialog-shared.tsx` around lines 199 -
230, The two toggle-like buttons rendered near the "Use new" / "Keep existing"
labels don't expose their pressed state for assistive tech; update the button
elements in this component (the buttons that call onToggleItem(item.id) and use
the item.selected condition) to include aria-pressed set to a boolean reflecting
the current choice (e.g., aria-pressed={item.selected} for "Use new" and
aria-pressed={!item.selected} for "Keep existing"), leaving existing onClick
logic intact so screen readers can see which option is pressed.

</div>
Comment on lines +196 to +228
</>
)}
Comment on lines +199 to +230
</div>
</label>
</div>
))}
Expand Down
9 changes: 7 additions & 2 deletions src/hooks/use-import-dialog.ts
Original file line number Diff line number Diff line change
Expand Up @@ -108,13 +108,18 @@ export function useImportDialog<
description: successDescription,
});

logger.debug("Successfully imported memories:", entries.length);
logger.debug(
`Import success — saved: ${entries.length}, total selected: ${selectedItems.length}`,
);

resetState();
onOpenChange(false);
onSuccess?.();
} catch (err) {
logger.error("Failed to save memories:", err);
logger.error(
`Import error — attempted: ${importItems.filter((i) => i.selected).length} items —`,
err,
);
Comment on lines +119 to +122
toast.error(
err instanceof Error ? err.message : "Failed to save memories",
);
Expand Down
Loading
Loading