feat: product manual PDF upload for enhanced FAQ generation#53

Merged
antoniomtz merged 1 commit into main from antoniomtz/product-manual-faq-enrichment
Apr 17, 2026

Collaborator

@antoniomtz antoniomtz commented Apr 17, 2026

Summary

  • Add stateless targeted RAG pipeline (POST /vlm/manual/extract) that extracts knowledge from product manual PDFs to generate richer FAQs (up to 10) with specs, care instructions, safety warnings, and warranty details
  • LLM dynamically generates product-type-specific queries from title + categories (not description) to avoid FAQ duplication
  • Fully stateless architecture: all embeddings computed in-memory and freed after response, supporting concurrent batch processing
  • "Product manual for FAQs" upload section in Advanced Options UI with staged progress indicator
  • New documentation: docs/POLICY_COMPLIANCE.md, docs/PRODUCT_MANUAL_FAQS.md, moved PRD.md to docs/
  • 32 new unit tests (208 total, all passing)

Test plan

  • Run uv run pytest tests/ — all 208 tests pass
  • TypeScript check pnpm exec tsc --noEmit — no new errors (pre-existing Spinner aria-label only)
  • Upload a product image and run analysis without a manual — FAQs should be 3-5, same as before
  • Upload a product manual PDF in Advanced Options — verify extraction completes with filename and chunk count shown
  • Run analysis with manual loaded — FAQs should be up to 10, with manual-specific details (specs, care, safety)
  • Test with no title/categories — should fall back to "general" product type queries
  • Test with a scanned/image-based PDF — should return a clear error message
  • Test PDF over 50 MB — should return size limit error
  • Verify POST /vlm/manual/extract + POST /vlm/faqs works via curl without the UI (batch script scenario)
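
The validation cases in the test plan (size limit, scanned PDF, missing title/categories) could be exercised with helpers along these lines. This is only a sketch: `validate_pdf`, `ensure_extractable_text`, `product_type_hint`, and `ManualValidationError` are hypothetical names, not taken from product_manual.py, and reading "50 MB" as 50 × 1024 × 1024 bytes is an assumption.

```python
from __future__ import annotations

MAX_PDF_BYTES = 50 * 1024 * 1024  # "50 MB" from the PR; binary megabytes assumed


class ManualValidationError(ValueError):
    """Hypothetical error type for a rejected manual upload."""


def validate_pdf(data: bytes, max_bytes: int = MAX_PDF_BYTES) -> None:
    """Reject oversized uploads and files lacking the PDF magic header."""
    if len(data) > max_bytes:
        raise ManualValidationError("PDF exceeds the size limit")
    if not data.startswith(b"%PDF-"):
        raise ManualValidationError("File does not look like a PDF")


def ensure_extractable_text(page_texts: list[str]) -> None:
    """Scanned or image-only PDFs usually yield no text, so fail clearly."""
    if not any(text.strip() for text in page_texts):
        raise ManualValidationError(
            "No extractable text found; the PDF may be scanned or image-based"
        )


def product_type_hint(title: str | None, categories: list[str] | None) -> str:
    """Build the query-generation hint from title + categories, else 'general'."""
    candidates = [title or "", *(categories or [])]
    parts = [c.strip() for c in candidates if c and c.strip()]
    return ", ".join(parts) if parts else "general"
```

Passing `max_bytes` explicitly keeps the size-limit test cheap, since it avoids allocating a real 50 MB payload.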

🤖 Generated with Claude Code

Add stateless targeted RAG pipeline that extracts knowledge from product
manual PDFs to generate richer FAQs (up to 10) with specific details like
specs, care instructions, safety warnings, and warranty information.

Architecture:
- POST /vlm/manual/extract processes PDF, returns structured knowledge JSON,
  frees all server-side resources (fully stateless, scalable to concurrent use)
- LLM dynamically generates 5-8 product-type-specific queries from title +
  categories (not description) to avoid FAQ duplication with the description
- In-memory numpy cosine similarity for chunk retrieval (no Milvus needed)
- Embedding requests batched at 128 chunks for large manuals
- 50 MB PDF size limit with input validation
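
As a rough illustration of the retrieval step described above, the sketch below batches chunks in groups of 128 and ranks them by cosine similarity with numpy. It is not the code from product_manual.py: `batched`, `embed_chunks`, `top_k_chunks`, and the `embed_fn` callable are hypothetical, and the value of `k` is an assumption.

```python
import numpy as np

EMBED_BATCH_SIZE = 128  # batch size stated in the PR


def batched(items, size=EMBED_BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_chunks(chunks, embed_fn):
    """Embed chunks batch by batch.

    `embed_fn` is a hypothetical callable that maps a list of strings to a
    list of equal-length vectors (for example, a wrapper around an
    embedding service).
    """
    vectors = []
    for batch in batched(chunks):
        vectors.extend(embed_fn(batch))
    return np.asarray(vectors, dtype=np.float32)


def top_k_chunks(query_vec, chunk_vecs, k=3):
    """Return indices of the k chunks most cosine-similar to the query."""
    if chunk_vecs.size == 0:
        return []
    query = np.asarray(query_vec, dtype=np.float32)
    # Normalise both sides so the dot product equals cosine similarity.
    query = query / (np.linalg.norm(query) + 1e-12)
    norms = np.linalg.norm(chunk_vecs, axis=1, keepdims=True) + 1e-12
    scores = (chunk_vecs / norms) @ query
    # argsort is ascending, so reverse it to get best-first order.
    return np.argsort(scores)[::-1][:k].tolist()
```

Because the arrays are ordinary local objects, nothing persists once the request handler returns, which is consistent with the stateless design the PR describes.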

Backend: new product_manual.py module, modified /vlm/faqs endpoint to accept
manual_knowledge, enhanced FAQ prompt with deduplication rules.
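
A sketch of how the optional manual_knowledge input might shape a /vlm/faqs request. Only the `manual_knowledge` name comes from this PR; the `title` and `categories` fields, the JSON body format, and both helper functions are assumptions for illustration. The caps of 5 and 10 mirror the "3-5" and "up to 10" figures in the test plan.

```python
from __future__ import annotations

import json

DEFAULT_MAX_FAQS = 5   # "3-5" FAQs without a manual, per the test plan
MANUAL_MAX_FAQS = 10   # "up to 10" FAQs when manual knowledge is supplied


def max_faq_count(manual_knowledge: dict | None) -> int:
    """Pick the FAQ cap depending on whether manual knowledge is present."""
    return MANUAL_MAX_FAQS if manual_knowledge else DEFAULT_MAX_FAQS


def build_faq_payload(title: str, categories: list[str],
                      manual_knowledge: dict | None = None) -> str:
    """Serialise a hypothetical request body for POST /vlm/faqs."""
    payload: dict = {"title": title, "categories": categories}
    # Omit the key entirely when no manual was extracted, so the request
    # looks the same as it did before this feature existed.
    if manual_knowledge:
        payload["manual_knowledge"] = manual_knowledge
    return json.dumps(payload)
```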

Frontend: "Product manual for FAQs" upload section in Advanced Options,
unified StagedUploadProgress component, "Optional" pill on Policy Library.

Docs: moved PRD.md to docs/, created POLICY_COMPLIANCE.md and
PRODUCT_MANUAL_FAQS.md feature guides, updated API.md, README, AGENTS.md.

Tests: 32 new unit tests (208 total, all passing).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@antoniomtz antoniomtz self-assigned this Apr 17, 2026
@antoniomtz antoniomtz added the enhancement New feature or request label Apr 17, 2026
@antoniomtz antoniomtz merged commit 13f73be into main Apr 17, 2026
4 of 5 checks passed
@antoniomtz antoniomtz deleted the antoniomtz/product-manual-faq-enrichment branch April 17, 2026 17:07