Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
18 commits
Select commit Hold shift + click to select a range
8e22daa
fix: update guidelines for document parsing to prevent duplicate memo…
pratham-kpatil Mar 14, 2026
4e12a42
fix: enhance document parsing by adding deduplication logic and impro…
pratham-kpatil Mar 14, 2026
e6e73a0
fix: improve document import process with enhanced status messaging a…
pratham-kpatil Mar 14, 2026
aad22b8
fix: add observability to document parser with telemetry integration
pratham-kpatil Mar 14, 2026
3ce150c
fix: enhance document parsing with abort handling and debounce logic …
pratham-kpatil Mar 14, 2026
a47c8d0
fix: refactor document import logic for improved error handling and d…
pratham-kpatil Mar 14, 2026
5acf4bb
fix: enhance document import dialog with duplicate memory preview and…
pratham-kpatil Mar 14, 2026
a532bee
chore: clean up comments in document import dialog
mikr13 Mar 14, 2026
2141fbe
chore: Clean up import-dialog-shared.tsx by removing comments
mikr13 Mar 14, 2026
dfe6680
chore: remove comment on pdfjs module caching
mikr13 Mar 14, 2026
f4295ad
refactor: PDF text extraction and normalization functions
mikr13 Mar 14, 2026
543cec2
chore: remove telemetry debug code from document-parser.ts
mikr13 Mar 14, 2026
304395e
chore: Remove comment on secondary dedup key
mikr13 Mar 14, 2026
21ae305
chore: Remove documentation for findDuplicates function
mikr13 Mar 14, 2026
b6c30de
Remove comment from BaseImport type
mikr13 Mar 14, 2026
3f61c12
fix: typo
mikr13 Mar 14, 2026
36a27b1
Fix typo in doc parser prompt update message
mikr13 Mar 14, 2026
e0bdbf7
fix: typo
mikr13 Mar 14, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions .changeset/fifty-words-lick.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
"superfill.ai": patch
---

add observability in the document parser
5 changes: 5 additions & 0 deletions .changeset/huge-suns-prove.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
"superfill.ai": minor
---

Updated doc parser prompt to stop redundant memory creation and added logic for deduplication of memory
5 changes: 5 additions & 0 deletions .changeset/tender-months-wish.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
"superfill.ai": major
---

Updated the deduplication logic to give the preview of the new and existing duplicate memory in the UI
151 changes: 103 additions & 48 deletions src/components/features/document/document-import-dialog.tsx
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import { FileTextIcon, UploadIcon } from "lucide-react";
import { useRef, useState } from "react";
import { useCallback, useRef, useState } from "react";
import { toast } from "sonner";
import {
ImportDialogFooter,
ImportDialogShell,
Expand All @@ -10,13 +11,15 @@ import {
} from "@/components/features/import/import-dialog-shared";
import { Button } from "@/components/ui/button";
import { useImportDialog } from "@/hooks/use-import-dialog";
import { useMemories } from "@/hooks/use-memories";
import {
convertToImportItems,
type DocumentImportItem,
type DocumentParserStatus,
parseDocument,
} from "@/lib/document/document-parser";
import { createLogger } from "@/lib/logger";
import { findDuplicates } from "@/lib/storage/memories";

const logger = createLogger("component:document-import-dialog");

Expand All @@ -43,6 +46,7 @@ function getDescription(status: DocumentParserStatus): string {
case "success":
return "Select the information you want to import.";
case "reading":
return "Reading your document...";
case "parsing":
return "AI is extracting your information...";
case "error":
Expand All @@ -63,6 +67,11 @@ export function DocumentImportDialog({
}: DocumentImportDialogProps) {
const fileInputRef = useRef<HTMLInputElement>(null);
const [fileName, setFileName] = useState<string | null>(null);
const lastImportKeyRef = useRef<string | null>(null);
const lastImportTimeRef = useRef<number>(0);
const abortControllerRef = useRef<AbortController | null>(null);

const { entries: existingMemories } = useMemories();

const {
status,
Expand Down Expand Up @@ -91,65 +100,111 @@ export function DocumentImportDialog({

const progress = PROGRESS_BY_STATUS[status];

const handleFileSelect = async (
event: React.ChangeEvent<HTMLInputElement>,
) => {
const file = event.target.files?.[0];
if (!file) return;

const isPdf =
file.type === "application/pdf" ||
file.name.toLowerCase().endsWith(".pdf");
const isTxt =
file.type === "text/plain" || file.name.toLowerCase().endsWith(".txt");

if (!isPdf && !isTxt) {
setError("Please select a PDF or text file");
setStatus("error");
event.target.value = "";
return;
}

setFileName(file.name);
setStatus("parsing");
setError(null);
setImportItems([]);

const currentRequestId = ++requestIdRef.current;

try {
const result = await parseDocument(file);
const handleFileSelect = useCallback(
async (event: React.ChangeEvent<HTMLInputElement>) => {
const file = event.target.files?.[0];
if (!file) return;

if (requestIdRef.current !== currentRequestId) return;
const isPdf =
file.type === "application/pdf" ||
file.name.toLowerCase().endsWith(".pdf");
const isTxt =
file.type === "text/plain" || file.name.toLowerCase().endsWith(".txt");

if (!result.success || !result.items) {
if (!isPdf && !isTxt) {
setError("Please select a PDF or text file");
setStatus("error");
setError(result.error || "Failed to extract data from document");
event.target.value = "";
return;
}

const items = convertToImportItems(result.items);
setImportItems(items);
setStatus("success");
const importKey = `${file.name}:${file.size}`;
const now = Date.now();

if (
importKey === lastImportKeyRef.current &&
now - lastImportTimeRef.current < 5_000
) {
logger.debug("Skipping duplicate import for:", file.name);
event.target.value = "";
return;
}

lastImportKeyRef.current = importKey;
lastImportTimeRef.current = now;
abortControllerRef.current?.abort();
const controller = new AbortController();
abortControllerRef.current = controller;
Comment on lines +133 to +137

logger.debug("Successfully extracted document data:", items.length);
} catch (err) {
if (requestIdRef.current !== currentRequestId) return;
setFileName(file.name);
setStatus("reading");
setError(null);
setImportItems([]);

logger.error("Import error:", err);
setStatus("error");
setError(
err instanceof Error ? err.message : "An unexpected error occurred",
);
}
const currentRequestId = ++requestIdRef.current;
const requestId = String(currentRequestId);

if (fileInputRef.current) {
fileInputRef.current.value = "";
}
};
try {
const result = await parseDocument(file, {
Comment on lines +147 to +148
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Ensure file input is reset on all exit paths.

There are early returns in the try/catch flow (e.g., cancellation / AbortError) before Line [203], so the file input may not be cleared. That can prevent selecting the same file again.

🔧 Suggested fix
-try {
+try {
   // existing logic
 } catch (err) {
   // existing logic
+} finally {
+  if (fileInputRef.current) {
+    fileInputRef.current.value = "";
+  }
 }
-
-if (fileInputRef.current) {
-  fileInputRef.current.value = "";
-}

Also applies to: 163-170, 190-205

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/components/features/document/document-import-dialog.tsx` around lines 150
- 151, The file input value is not cleared on all early-exit paths in the
try/catch around the parseDocument(file, ...) call in DocumentImportDialog,
which prevents re-selecting the same file; update the handler so that every exit
path (success, AbortError/cancellation, other errors, and any early returns)
clears the file input and resets any selectedFile state. Concretely, locate the
function containing parseDocument(file, ...) and ensure you either call a shared
cleanup helper (e.g., resetFileInput()) or directly set
fileInputRef.current.value = '' and setSelectedFile(null) in a finally block or
immediately before each return/throw (including AbortError handling), so the
input is always reset.

requestId,
signal: controller.signal,
onStageChange: (stage) => {
if (requestIdRef.current !== currentRequestId) return;
setStatus(stage);
},
});

if (requestIdRef.current !== currentRequestId) return;

if (!result.success || !result.items) {
if (result.error === "cancelled") return;

const errorMsg =
result.error || "Failed to extract data from document";
setStatus("error");
setError(errorMsg);
toast.error(errorMsg);
return;
}

const items = convertToImportItems(result.items);
const duplicatesMap = await findDuplicates(items, existingMemories);
const enrichedItems = items.map((item, i) => {
const duplicate = duplicatesMap.get(i);
return duplicate ? { ...item, existingDuplicate: duplicate } : item;
});

setImportItems(enrichedItems);
setStatus("success");
Comment on lines +171 to +178
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add stale-request guard after duplicate lookup.

findDuplicates(...) is awaited, but there’s no request-id recheck afterward. A newer import can start during that await, and this older request can still overwrite UI state.

🔧 Suggested fix
 const duplicatesMap = await findDuplicates(items, existingMemories);
+if (requestIdRef.current !== currentRequestId) return;

 const enrichedItems = items.map((item, i) => {
   const duplicate = duplicatesMap.get(i);
   return duplicate ? { ...item, existingDuplicate: duplicate } : item;
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/components/features/document/document-import-dialog.tsx` around lines 176
- 183, After awaiting findDuplicates, the handler can overwrite UI if a newer
import started; capture the current request identifier before the await (e.g.,
const currentRequestId = importRequestIdRef.current or a local requestId
variable used by the component) and immediately after the await check that the
global/latest request id still equals currentRequestId; if it does not match,
return early and do not call setImportItems or setStatus. Apply this guard
around the block that computes duplicatesMap, enrichedItems and calls
setImportItems/setStatus (referencing findDuplicates, duplicatesMap,
enrichedItems, setImportItems, setStatus).


logger.debug(
`[req:${requestId}] Successfully extracted document data:`,
items.length,
"items",
);
Comment on lines +148 to +184
} catch (err) {
if (requestIdRef.current !== currentRequestId) return;
if (err instanceof Error && err.name === "AbortError") return;

const errMsg =
err instanceof Error ? err.message : "An unexpected error occurred";
logger.error(`[req:${requestId}] Import error:`, err);
setStatus("error");
setError(errMsg);
toast.error(errMsg);
}

if (fileInputRef.current) {
fileInputRef.current.value = "";
}
},
[requestIdRef, setStatus, setError, setImportItems, existingMemories],
);

const handleCloseWrapper = (open: boolean) => {
if (!open) {
abortControllerRef.current?.abort();
abortControllerRef.current = null;
setFileName(null);
}
handleClose(open);
Expand Down
103 changes: 96 additions & 7 deletions src/components/features/import/import-dialog-shared.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ import {
} from "@/components/ui/dialog";
import { Progress } from "@/components/ui/progress";
import { ScrollArea } from "@/components/ui/scroll-area";
import { cn } from "@/lib/cn";
import type { BaseImportItem } from "@/types/import";
import type { AllowedCategory } from "@/types/memory";

Expand Down Expand Up @@ -79,6 +80,7 @@ export function ImportItemsList<T extends BaseImportItem>({
);

const allSelected = items.every((item) => item.selected);
const duplicateCount = items.filter((item) => item.existingDuplicate).length;

return (
<div className="space-y-4">
Expand All @@ -88,6 +90,15 @@ export function ImportItemsList<T extends BaseImportItem>({
<span className="text-sm font-medium">
Found {items.length} items
</span>
{duplicateCount > 0 && (
<Badge
variant="outline"
size="sm"
className="border-amber-400 text-amber-600 bg-amber-50 dark:bg-amber-950/30"
>
{duplicateCount} duplicate{duplicateCount > 1 ? "s" : ""}
</Badge>
)}
{headerExtra}
</div>
<Button
Expand Down Expand Up @@ -122,7 +133,11 @@ export function ImportItemsList<T extends BaseImportItem>({
{categoryItems?.map((item) => (
<div
key={item.id}
className="flex items-start gap-3 p-2 rounded-lg hover:bg-muted/50 transition-colors w-full"
className={cn(
"flex items-start gap-3 p-2 rounded-lg hover:bg-muted/50 transition-colors w-full",
item.existingDuplicate &&
"border border-amber-200 dark:border-amber-800/50 bg-amber-50/50 dark:bg-amber-950/10",
)}
>
<Checkbox
id={`${itemIdPrefix}-${item.id}`}
Expand All @@ -134,12 +149,86 @@ export function ImportItemsList<T extends BaseImportItem>({
htmlFor={`${itemIdPrefix}-${item.id}`}
className="flex-1 min-w-0 cursor-pointer"
>
<p className="text-sm font-medium truncate">
{item.label}
</p>
<p className="text-xs text-muted-foreground line-clamp-2">
{item.answer}
</p>
<div className="flex items-center gap-1.5 flex-wrap">
<p className="text-sm font-medium truncate">
{item.label}
</p>
{item.existingDuplicate && (
<Badge
variant="outline"
size="sm"
className="shrink-0 border-amber-400 text-amber-600 bg-amber-50 dark:bg-amber-950/30"
>
duplicate
</Badge>
)}
</div>
{item.question &&
item.question.toLowerCase() !==
item.label.toLowerCase() && (
<p className="text-xs text-muted-foreground/70 italic truncate">
{item.question}
</p>
)}
<div
className={cn(
"mt-0.5",
item.existingDuplicate && "space-y-1.5",
)}
>
{item.existingDuplicate && (
<p className="text-[11px] font-medium text-muted-foreground/60 uppercase tracking-wide">
New
</p>
)}
<p className="text-xs text-muted-foreground line-clamp-2">
{item.answer}
</p>
{item.existingDuplicate && (
<>
<p className="text-[11px] font-medium text-muted-foreground/60 uppercase tracking-wide mt-1">
Currently saved
</p>
<p className="text-xs text-muted-foreground/70 line-clamp-2 italic">
{item.existingDuplicate.answer}
</p>
<div className="flex gap-1.5 mt-2">
<button
type="button"
onClick={(e) => {
e.preventDefault();
e.stopPropagation();
if (!item.selected) onToggleItem(item.id);
}}
className={cn(
"text-[11px] px-2 py-0.5 rounded border font-medium transition-colors",
item.selected
? "bg-primary text-primary-foreground border-primary"
: "bg-background text-muted-foreground border-border hover:border-primary hover:text-primary",
)}
>
Use new
</button>
<button
type="button"
onClick={(e) => {
e.preventDefault();
e.stopPropagation();
if (item.selected) onToggleItem(item.id);
}}
className={cn(
"text-[11px] px-2 py-0.5 rounded border font-medium transition-colors",
!item.selected
? "bg-primary text-primary-foreground border-primary"
: "bg-background text-muted-foreground border-border hover:border-primary hover:text-primary",
)}
>
Keep existing
</button>
Comment on lines +196 to +227
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Expose toggle state for assistive technologies.

“Use new” / “Keep existing” behave like toggle buttons, but they don’t expose pressed state (aria-pressed), which makes the current choice unclear for screen-reader users.

♿ Suggested fix
 <button
   type="button"
+  aria-pressed={item.selected}
+  aria-label={`Use new value for ${item.label}`}
   onClick={(e) => {
     e.preventDefault();
     e.stopPropagation();
     if (!item.selected) onToggleItem(item.id);
   }}
@@
 <button
   type="button"
+  aria-pressed={!item.selected}
+  aria-label={`Keep existing value for ${item.label}`}
   onClick={(e) => {
     e.preventDefault();
     e.stopPropagation();
     if (item.selected) onToggleItem(item.id);
   }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
<button
type="button"
onClick={(e) => {
e.preventDefault();
e.stopPropagation();
if (!item.selected) onToggleItem(item.id);
}}
className={cn(
"text-[11px] px-2 py-0.5 rounded border font-medium transition-colors",
item.selected
? "bg-primary text-primary-foreground border-primary"
: "bg-background text-muted-foreground border-border hover:border-primary hover:text-primary",
)}
>
Use new
</button>
<button
type="button"
onClick={(e) => {
e.preventDefault();
e.stopPropagation();
if (item.selected) onToggleItem(item.id);
}}
className={cn(
"text-[11px] px-2 py-0.5 rounded border font-medium transition-colors",
!item.selected
? "bg-primary text-primary-foreground border-primary"
: "bg-background text-muted-foreground border-border hover:border-primary hover:text-primary",
)}
>
Keep existing
</button>
<button
type="button"
aria-pressed={item.selected}
aria-label={`Use new value for ${item.label}`}
onClick={(e) => {
e.preventDefault();
e.stopPropagation();
if (!item.selected) onToggleItem(item.id);
}}
className={cn(
"text-[11px] px-2 py-0.5 rounded border font-medium transition-colors",
item.selected
? "bg-primary text-primary-foreground border-primary"
: "bg-background text-muted-foreground border-border hover:border-primary hover:text-primary",
)}
>
Use new
</button>
<button
type="button"
aria-pressed={!item.selected}
aria-label={`Keep existing value for ${item.label}`}
onClick={(e) => {
e.preventDefault();
e.stopPropagation();
if (item.selected) onToggleItem(item.id);
}}
className={cn(
"text-[11px] px-2 py-0.5 rounded border font-medium transition-colors",
!item.selected
? "bg-primary text-primary-foreground border-primary"
: "bg-background text-muted-foreground border-border hover:border-primary hover:text-primary",
)}
>
Keep existing
</button>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/components/features/import/import-dialog-shared.tsx` around lines 199 -
230, The two toggle-like buttons rendered near the "Use new" / "Keep existing"
labels don't expose their pressed state for assistive tech; update the button
elements in this component (the buttons that call onToggleItem(item.id) and use
the item.selected condition) to include aria-pressed set to a boolean reflecting
the current choice (e.g., aria-pressed={item.selected} for "Use new" and
aria-pressed={!item.selected} for "Keep existing"), leaving existing onClick
logic intact so screen readers can see which option is pressed.

</div>
Comment on lines +196 to +228
</>
)}
Comment on lines +199 to +230
</div>
</label>
</div>
))}
Expand Down
9 changes: 7 additions & 2 deletions src/hooks/use-import-dialog.ts
Original file line number Diff line number Diff line change
Expand Up @@ -108,13 +108,18 @@ export function useImportDialog<
description: successDescription,
});

logger.debug("Successfully imported memories:", entries.length);
logger.debug(
`Import success — saved: ${entries.length}, total selected: ${selectedItems.length}`,
);

resetState();
onOpenChange(false);
onSuccess?.();
} catch (err) {
logger.error("Failed to save memories:", err);
logger.error(
`Import error — attempted: ${importItems.filter((i) => i.selected).length} items —`,
err,
);
Comment on lines +119 to +122
toast.error(
err instanceof Error ? err.message : "Failed to save memories",
);
Expand Down
Loading
Loading