feat: product manual PDF upload for enhanced FAQ generation by antoniomtz · Pull Request #53 · NVIDIA-AI-Blueprints/Retail-Catalog-Enrichment

antoniomtz · 2026-04-17T17:02:24Z

Summary

- Add a stateless, targeted RAG pipeline (POST /vlm/manual/extract) that extracts knowledge from product manual PDFs to generate richer FAQs (up to 10) with specs, care instructions, safety warnings, and warranty details
- The LLM dynamically generates product-type-specific queries from title + categories (not the description) to avoid duplicating the description in the FAQs
- Fully stateless architecture: all embeddings are computed in memory and freed after the response, which supports concurrent batch processing
- "Product manual for FAQs" upload section in the Advanced Options UI with a staged progress indicator
- New documentation: docs/POLICY_COMPLIANCE.md and docs/PRODUCT_MANUAL_FAQS.md; PRD.md moved to docs/
- 32 new unit tests (208 total, all passing)

Test plan

- Run `uv run pytest tests/`: all 208 tests pass
- Run the TypeScript check `pnpm exec tsc --noEmit`: no new errors (only the pre-existing Spinner aria-label error)
- Upload a product image and run analysis without a manual: FAQs should number 3-5, same as before
- Upload a product manual PDF in Advanced Options: verify extraction completes with the filename and chunk count shown
- Run analysis with a manual loaded: FAQs should number up to 10 and include manual-specific details (specs, care, safety)
- Test with no title/categories: should fall back to "general" product type queries
- Test with a scanned/image-based PDF: should return a clear error message
- Test with a PDF over 50 MB: should return a size limit error
- Verify that POST /vlm/manual/extract followed by POST /vlm/faqs works via curl without the UI (batch script scenario)
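The batch script scenario in the last test-plan item could look roughly like the sketch below. It is not taken from the PR: the `post` transport, the payload keys other than `manual_knowledge`, and the shape of the responses are all assumptions made for illustration. Only the two endpoint paths, the `manual_knowledge` field, and the 50 MB limit come from this PR.

```python
from pathlib import Path
from typing import Callable

MAX_PDF_BYTES = 50 * 1024 * 1024  # 50 MB limit stated in this PR

# Hypothetical transport: (endpoint path, payload) -> parsed JSON response.
Transport = Callable[[str, dict], dict]


def enrich_faqs(pdf_path: Path, title: str, categories: list[str], post: Transport) -> dict:
    """Extract manual knowledge, then request FAQs that use it."""
    # Client-side pre-check so oversized files are rejected before upload.
    size = pdf_path.stat().st_size
    if size > MAX_PDF_BYTES:
        raise ValueError(f"{pdf_path.name} is {size} bytes, over the 50 MB limit")

    # Step 1: stateless extraction. These payload keys are assumptions.
    knowledge = post(
        "/vlm/manual/extract",
        {"file": pdf_path, "title": title, "categories": categories},
    )

    # Step 2: FAQ generation. Only `manual_knowledge` is named in the PR;
    # the other keys are assumptions.
    return post(
        "/vlm/faqs",
        {"title": title, "categories": categories, "manual_knowledge": knowledge},
    )
```

Because the extract endpoint frees its server-side resources after responding, the client has to carry the returned knowledge JSON into the second call itself, which is why `enrich_faqs` passes it through explicitly. In a real script, `post` would wrap whatever HTTP client the batch job uses.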

🤖 Generated with Claude Code

Add stateless targeted RAG pipeline that extracts knowledge from product manual PDFs to generate richer FAQs (up to 10) with specific details like specs, care instructions, safety warnings, and warranty information.

Architecture:
- POST /vlm/manual/extract processes PDF, returns structured knowledge JSON, frees all server-side resources (fully stateless, scalable to concurrent use)
- LLM dynamically generates 5-8 product-type-specific queries from title + categories (not description) to avoid FAQ duplication with the description
- In-memory numpy cosine similarity for chunk retrieval (no Milvus needed)
- Embedding requests batched at 128 chunks for large manuals
- 50 MB PDF size limit with input validation

Backend: new product_manual.py module, modified /vlm/faqs endpoint to accept manual_knowledge, enhanced FAQ prompt with deduplication rules.

Frontend: "Product manual for FAQs" upload section in Advanced Options, unified StagedUploadProgress component, "Optional" pill on Policy Library.

Docs: moved PRD.md to docs/, created POLICY_COMPLIANCE.md and PRODUCT_MANUAL_FAQS.md feature guides, updated API.md, README, AGENTS.md.

Tests: 32 new unit tests (208 total, all passing).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
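For readers unfamiliar with the retrieval approach named in the commit message, here is a minimal sketch of batched embedding plus in-memory cosine similarity with numpy. It is not the code from product_manual.py: the function names, the `embed_batch` callable, and the default `k` are hypothetical, and only the batch size of 128 and the use of numpy cosine similarity come from the commit message.

```python
import numpy as np

BATCH_SIZE = 128  # batch size stated in the commit message


def batched(items: list[str], size: int = BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(chunks: list[str], embed_batch) -> np.ndarray:
    """Embed chunks batch by batch.

    `embed_batch` is a hypothetical callable that maps a list of strings
    to a 2-D array-like of shape (len(batch), dim).
    """
    if not chunks:
        return np.empty((0, 0))
    parts = [np.asarray(embed_batch(b), dtype=np.float64) for b in batched(chunks)]
    return np.vstack(parts)


def top_k(query_vec: np.ndarray, chunk_vecs: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k chunks most cosine-similar to the query."""
    # Normalize both sides so the dot product equals cosine similarity;
    # the small epsilon guards against division by zero.
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    c = chunk_vecs / (np.linalg.norm(chunk_vecs, axis=1, keepdims=True) + 1e-12)
    scores = c @ q
    return np.argsort(-scores)[:k].tolist()
```

Holding the chunk vectors in a plain array that goes out of scope once the response is built is one way to get the "no Milvus needed" property the commit describes, since nothing has to persist between requests.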

antoniomtz self-assigned this Apr 17, 2026

antoniomtz added the enhancement New feature or request label Apr 17, 2026

antoniomtz merged commit 13f73be into main Apr 17, 2026
4 of 5 checks passed

antoniomtz deleted the antoniomtz/product-manual-faq-enrichment branch April 17, 2026 17:07

antoniomtz mentioned this pull request Apr 17, 2026

fix: add missing description prop to Spinner in FieldsCard #55

Merged

2 tasks

antoniomtz merged 1 commit into main from antoniomtz/product-manual-faq-enrichment
