Commit fca0c39

antoniomtz and claude committed
feat: product manual PDF upload for enhanced FAQ generation
Add stateless targeted RAG pipeline that extracts knowledge from product manual PDFs to generate richer FAQs (up to 10) with specific details like specs, care instructions, safety warnings, and warranty information.

Architecture:
- POST /vlm/manual/extract processes PDF, returns structured knowledge JSON, frees all server-side resources (fully stateless, scalable to concurrent use)
- LLM dynamically generates 5-8 product-type-specific queries from title + categories (not description) to avoid FAQ duplication with the description
- In-memory numpy cosine similarity for chunk retrieval (no Milvus needed)
- Embedding requests batched at 128 chunks for large manuals
- 50 MB PDF size limit with input validation

Backend: new product_manual.py module, modified /vlm/faqs endpoint to accept manual_knowledge, enhanced FAQ prompt with deduplication rules.

Frontend: "Product manual for FAQs" upload section in Advanced Options, unified StagedUploadProgress component, "Optional" pill on Policy Library.

Docs: moved PRD.md to docs/, created POLICY_COMPLIANCE.md and PRODUCT_MANUAL_FAQS.md feature guides, updated API.md, README, AGENTS.md.

Tests: 32 new unit tests (208 total, all passing).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
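The retrieval step described in the commit message (batched embedding requests followed by in-memory cosine similarity over chunk vectors) can be sketched as follows. This is a minimal illustration, not the code in `product_manual.py`: the commit uses numpy, whereas this sketch swaps in plain Python lists so that it runs with the standard library alone. The names `batched`, `embed_all`, `cosine`, `top_k`, and the `embed_fn` callback are hypothetical; only the batch size of 128 comes from the commit message.

```python
import math
from typing import Callable, Iterator, Sequence

BATCH_SIZE = 128  # batch size stated in the commit message


def batched(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(
    chunks: Sequence[str],
    embed_fn: Callable[[Sequence[str]], list[list[float]]],
) -> list[list[float]]:
    """Embed every chunk, calling the (hypothetical) embed_fn once per batch."""
    vectors: list[list[float]] = []
    for batch in batched(chunks):
        vectors.extend(embed_fn(batch))
    return vectors


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], chunk_vecs: Sequence[Sequence[float]], k: int = 3) -> list[int]:
    """Return the indices of the k chunks most similar to the query, best first."""
    scored = sorted(
        ((cosine(query_vec, vec), index) for index, vec in enumerate(chunk_vecs)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [index for _, index in scored[:k]]
```

Every vector here lives in local variables, so nothing outlives the call once the functions return, which is consistent with the stateless design the commit describes.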
1 parent f649166 commit fca0c39

18 files changed

Lines changed: 1881 additions & 37 deletions

AGENTS.md

Lines changed: 34 additions & 4 deletions
@@ -12,14 +12,18 @@ This document provides guidelines and instructions for AI assistants working on
 - **[README.md](README.md)** - Quick start guide and high-level overview
 - **[docs/API.md](docs/API.md)** - Complete API reference with examples
 - **[docs/DOCKER.md](docs/DOCKER.md)** - Docker and container deployment guide
-- **[PRD.md](PRD.md)** - Product requirements document
+- **[docs/PRD.md](docs/PRD.md)** - Product requirements document
+- **[docs/POLICY_COMPLIANCE.md](docs/POLICY_COMPLIANCE.md)** - Policy compliance feature guide
+- **[docs/PRODUCT_MANUAL_FAQS.md](docs/PRODUCT_MANUAL_FAQS.md)** - Product manual PDF for FAQ enrichment guide
 - **[AGENTS.md](AGENTS.md)** - This file (AI assistant guidelines)
 
 ### Current Status
 -**Multi-Language Support** - Locale-based product descriptions (FR-6 completed)
 -**VLM Content Augmentation** - Enhances existing product data with visual insights (FR-2 completed)
 -**2D Image Variation Generation** - Working with prompt planning and quality evaluation (FR-3 completed)
 -**Automated Quality Assessment** - VLM-based reflection for generated images (FR-9 completed)
+-**Product FAQ Generation** - FAQs from enriched data with optional product manual PDF enhancement (FR-10, FR-12 completed)
+-**Policy Compliance** - PDF policy library with Milvus RAG and compliance classification (FR-11 completed)
 - ⚠️ **In Development** - 3D Asset Generation (backend complete) and Video Generation in progress
 
 ### Key Goals
@@ -44,7 +48,7 @@ cd catalog-enrichment
 ### Backend (current)
 
 - Stack: FastAPI + Uvicorn (ASGI), OpenAI client (NVIDIA endpoint), Starlette under the hood
-- Dependencies: `fastapi`, `uvicorn[standard]`, `openai`, `python-multipart`, `python-dotenv`, `httpx`, `pillow`, `pyyaml`
+- Dependencies: `fastapi`, `uvicorn[standard]`, `openai`, `python-multipart`, `python-dotenv`, `httpx`, `pillow`, `pyyaml`, `pymilvus`, `pypdf`, `numpy`
 - Python: 3.11+
 - **Error Handling**: Comprehensive connection error detection with user-friendly messages when NIM endpoints are unreachable
 
@@ -102,6 +106,32 @@ uvicorn --app-dir src backend.main:app --host 0.0.0.0 --port 8000 --reload
 - `quality_score`: float (0-100 quality score from VLM reflection, or null if evaluation failed)
 - `quality_issues`: array (list of detected quality issues from reflection analysis)
 
+**FAQ Generation:**
+- POST `/vlm/faqs`
+- Request: `multipart/form-data` with fields:
+  - `title` (string, optional): Product title from VLM analysis
+  - `description` (string, optional): Product description from VLM analysis
+  - `categories` (JSON string, optional): Categories array
+  - `tags` (JSON string, optional): Tags array
+  - `colors` (JSON string, optional): Colors array
+  - `locale` (string, optional): Regional locale code (default: "en-US")
+  - `manual_knowledge` (JSON string, optional): Extracted knowledge from `/vlm/manual/extract`
+- Response: `{ "faqs": [{ "question": "string", "answer": "string" }] }`
+- Without manual: 3-5 FAQs from product data
+- With manual knowledge: up to 10 FAQs drawing from both product data and manual
+
+**Product Manual Knowledge Extraction:**
+- POST `/vlm/manual/extract`
+- Request: `multipart/form-data` with fields:
+  - `file` (file): Product manual PDF (max 50 MB)
+  - `title` (string, optional): Product title for query generation
+  - `categories` (JSON string, optional): Product categories for query generation
+  - `locale` (string, optional): Regional locale code (default: "en-US")
+- Response: `{ "filename": "string", "chunk_count": 42, "knowledge": { "topic": "extracted text..." } }`
+- Stateless: all vectors freed after response, no server-side storage
+- LLM generates 5-8 product-type-specific queries from title + categories (not description)
+- Retrieves relevant chunks per query via in-memory cosine similarity
+
 **3D Asset Generation:**
 - POST `/generate/3d`
 - Request: `multipart/form-data` with fields:
@@ -344,7 +374,7 @@ Given the catalog enrichment focus, pay special attention to:
 
 ---
 
-**Last Updated:** $(date)
-**Version:** 1.0
+**Last Updated:** 16-Apr-2026
+**Version:** 1.3
 
 *This document should be updated as the project evolves and new practices are established.*

README.md

Lines changed: 6 additions & 2 deletions
@@ -25,15 +25,17 @@ A GenAI-powered catalog enrichment system that transforms basic product images i
 - **Cultural Image Generation**: Create culturally-appropriate product backgrounds (Spanish courtyards, Mexican family spaces, British formal settings)
 - **Quality Evaluation**: Automated VLM-based quality assessment of generated images with detailed scoring
 - **3D Asset Generation**: Transform 2D product images into interactive 3D GLB models using Microsoft TRELLIS
-- **Product FAQ Generation**: Automatically generate 3-5 product FAQs from enriched catalog data
+- **Product FAQ Generation**: Automatically generate product FAQs from enriched catalog data, with optional product manual PDF upload for richer FAQs (up to 10) via stateless targeted RAG
 - **Policy Compliance**: Upload policy PDFs and automatically check product listings against them using RAG + Milvus
 - **Modular API**: Separate endpoints for VLM analysis, FAQ generation, image generation, and 3D asset generation
 
 ## Documentation
 
 - **[API Documentation](docs/API.md)** - Detailed API endpoints, parameters, and examples
 - **[Docker Deployment Guide](docs/DOCKER.md)** - Docker and Docker Compose setup instructions
-- **[Product Requirements (PRD)](PRD.md)** - Product requirements and feature specifications
+- **[Product Requirements (PRD)](docs/PRD.md)** - Product requirements and feature specifications
+- **[Policy Compliance](docs/POLICY_COMPLIANCE.md)** - How policy compliance checking works
+- **[Product Manual for FAQs](docs/PRODUCT_MANUAL_FAQS.md)** - How product manual PDFs enrich FAQ generation
 - **[AI Agent Guidelines](AGENTS.md)** - Instructions for AI assistants working on this project
 
 ## Tech Stack
@@ -221,6 +223,8 @@ For complete Docker deployment instructions, see the **[Docker Deployment Guide]
 The system provides three main endpoints:
 
 - `POST /vlm/analyze` - Fast VLM/LLM analysis
+- `POST /vlm/faqs` - Product FAQ generation (supports optional manual knowledge)
+- `POST /vlm/manual/extract` - Extract knowledge from a product manual PDF for FAQ enrichment
 - `POST /generate/variation` - Image generation with FLUX
 - `POST /generate/3d` - 3D asset generation with TRELLIS
 
docs/API.md

Lines changed: 99 additions & 2 deletions
@@ -37,6 +37,7 @@ The API provides a modular approach for optimal performance and flexibility:
 
 **1) Fast VLM Analysis (POST `/vlm/analyze`)** - Get product fields quickly
 **2) FAQ Generation (POST `/vlm/faqs`)** - Generate product FAQs from enriched data
+**2.5) Manual Knowledge Extraction (POST `/vlm/manual/extract`)** - Extract knowledge from a product manual PDF to enrich FAQs
 **3) Image Generation (POST `/generate/variation`)** - Generate 2D variations on demand
 **4) 3D Asset Generation (POST `/generate/3d`)** - Generate 3D models on demand
 
@@ -276,7 +277,10 @@ curl -X POST \
 
 ## 3️⃣ FAQ Generation: `/vlm/faqs`
 
-Generate 3-5 frequently asked questions and answers for a product based on its enriched catalog data. Designed to be called after `/vlm/analyze` completes, using the enriched result as input.
+Generate frequently asked questions and answers for a product based on its enriched catalog data. Designed to be called after `/vlm/analyze` completes, using the enriched result as input.
+
+Without a product manual: generates 3-5 basic FAQs from the product data.
+With manual knowledge (from `/vlm/manual/extract`): generates up to 10 richer FAQs that draw from both the product data and the manual, surfacing details that go beyond the description.
 
 **Endpoint**: `POST /vlm/faqs`
 **Content-Type**: `multipart/form-data`
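Because the endpoint takes `multipart/form-data`, list-valued fields and the optional manual knowledge have to be sent as JSON strings. The sketch below shows one way a client might assemble those text fields before handing them to an HTTP library of its choice. The helper `build_faq_form` is hypothetical and not part of the project; the field names are the ones documented for `/vlm/faqs`, and omitting unset fields (rather than sending empty values) is an assumption.

```python
import json
from typing import Mapping, Optional, Sequence


def build_faq_form(
    title: Optional[str] = None,
    description: Optional[str] = None,
    categories: Optional[Sequence[str]] = None,
    tags: Optional[Sequence[str]] = None,
    colors: Optional[Sequence[str]] = None,
    locale: str = "en-US",
    manual_knowledge: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Build the text fields for POST /vlm/faqs (hypothetical client helper)."""
    form: dict[str, str] = {"locale": locale}
    if title is not None:
        form["title"] = title
    if description is not None:
        form["description"] = description
    # Array fields are documented as JSON strings, so encode them explicitly.
    for name, values in (("categories", categories), ("tags", tags), ("colors", colors)):
        if values is not None:
            form[name] = json.dumps(list(values))
    # Only attach manual knowledge when there is something to send.
    if manual_knowledge:
        form["manual_knowledge"] = json.dumps(dict(manual_knowledge))
    return form
```

Leaving `manual_knowledge` out yields the basic request; passing the `knowledge` object returned by `/vlm/manual/extract` yields the enriched one.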
@@ -291,6 +295,7 @@ Generate 3-5 frequently asked questions and answers for a product based on its e
 | `tags` | JSON string | No | Tags array (default: `[]`) |
 | `colors` | JSON string | No | Colors array (default: `[]`) |
 | `locale` | string | No | Regional locale code (default: `en-US`) |
+| `manual_knowledge` | JSON string | No | Extracted manual knowledge from `/vlm/manual/extract` |
 
 ### Response Schema
 
@@ -305,7 +310,7 @@ Generate 3-5 frequently asked questions and answers for a product based on its e
 }
 ```
 
-### Usage Example
+### Usage Example (Basic)
 
 ```bash
 # Call after /vlm/analyze to generate FAQs from enriched data
@@ -319,6 +324,27 @@ curl -X POST \
   http://localhost:8000/vlm/faqs
 ```
 
+### Usage Example (With Product Manual)
+
+```bash
+# First extract knowledge from the manual, then pass it to FAQ generation
+KNOWLEDGE=$(curl -s -X POST \
+  -F "file=@mower-manual.pdf" \
+  -F "title=Craftsman 20V Cordless Lawn Mower" \
+  -F 'categories=["electronics"]' \
+  http://localhost:8000/vlm/manual/extract | jq -c '.knowledge')
+
+curl -X POST \
+  -F "title=Craftsman 20V Cordless Lawn Mower" \
+  -F "description=A cordless lawn mower featuring a black and red design..." \
+  -F 'categories=["electronics"]' \
+  -F 'tags=["cordless","lawn mower","Craftsman"]' \
+  -F 'colors=["black","red"]' \
+  -F "locale=en-US" \
+  -F "manual_knowledge=$KNOWLEDGE" \
+  http://localhost:8000/vlm/faqs
+```
+
 ### Example Response
 
 ```json
@@ -342,6 +368,77 @@ curl -X POST \
 
 ---
 
+## 3.5️⃣ Product Manual Knowledge Extraction: `/vlm/manual/extract`
+
+Extract structured knowledge from a product manual PDF using targeted RAG. The endpoint processes the PDF, generates product-type-specific queries via the LLM (using title + categories, not description, to avoid duplicating what the description already covers), and retrieves relevant chunks from the manual for each topic.
+
+This endpoint is **stateless** — all embeddings are computed in-memory and freed after the response. It can handle concurrent requests for different products.
+
+**Endpoint**: `POST /vlm/manual/extract`
+**Content-Type**: `multipart/form-data`
+
+### Request Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `file` | file | Yes | Product manual PDF (max 50 MB) |
+| `title` | string | No | Product title (used to generate relevant queries) |
+| `categories` | JSON string | No | Product categories array (used to generate relevant queries) |
+| `locale` | string | No | Regional locale code (default: `en-US`) |
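The `file` parameter carries a 50 MB limit, and the commit message mentions input validation without showing it. A client can run similar checks before uploading to avoid a wasted round trip. The sketch below is hypothetical: `validate_manual_pdf` is not a project function, treating "50 MB" as 50 * 1024 * 1024 bytes is an assumption, and the `%PDF-` header check is only a cheap sanity test, not a guarantee that the server applies the same rule.

```python
MAX_PDF_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" read as 50 MiB


def validate_manual_pdf(data: bytes, max_bytes: int = MAX_PDF_BYTES) -> None:
    """Raise ValueError if the payload is empty, too large, or not PDF-like."""
    if not data:
        raise ValueError("manual PDF is empty")
    if len(data) > max_bytes:
        raise ValueError(f"manual PDF exceeds {max_bytes} bytes")
    # PDF files begin with a "%PDF-<version>" header line.
    if not data.startswith(b"%PDF-"):
        raise ValueError("file does not start with a PDF header")
```

Calling it on the raw bytes read from disk either returns `None` or raises with a message that can be shown to the user.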
+
+### Response Schema
+
+```json
+{
+  "filename": "string",
+  "chunk_count": 42,
+  "knowledge": {
+    "battery_life": "The speaker provides up to 12 hours of continuous playback...",
+    "waterproof_rating": "IPX7 rated, can be submerged up to 1 meter for 30 minutes...",
+    "care_instructions": "Clean with a damp cloth. Do not use abrasive cleaners..."
+  }
+}
+```
+
+The `knowledge` object contains topic keys (dynamically generated by the LLM based on product type) mapped to the relevant text extracted from the manual. Topics with no relevant content are empty strings.
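Since topics without relevant content come back as empty strings, a client may want to drop them before forwarding the object as `manual_knowledge`. Whether the backend already ignores empty topics is not stated, so treat this as an optional client-side step. The helper name `compact_knowledge` is hypothetical; the response shape is the one documented for this endpoint.

```python
import json


def compact_knowledge(response_body: str) -> str:
    """Return the `knowledge` object as a JSON string with empty topics removed."""
    payload = json.loads(response_body)
    knowledge = payload.get("knowledge") or {}
    # Keep only topics whose extracted text is a non-blank string.
    kept = {
        topic: text
        for topic, text in knowledge.items()
        if isinstance(text, str) and text.strip()
    }
    return json.dumps(kept, ensure_ascii=False)
```

The returned string can be used directly as the value of the `manual_knowledge` form field on `/vlm/faqs`.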
+
+### Usage Example
+
+```bash
+curl -X POST \
+  -F "file=@speaker-manual.pdf;type=application/pdf" \
+  -F "title=JBL Flip 6 Portable Speaker" \
+  -F 'categories=["electronics"]' \
+  -F "locale=en-US" \
+  http://localhost:8000/vlm/manual/extract
+```
+
+### Batch Script Example
+
+```bash
+# Process multiple products concurrently (each request is independent)
+for product in products/*.json; do
+  TITLE=$(jq -r '.title' "$product")
+  CATS=$(jq -c '.categories' "$product")
+  PDF=$(jq -r '.manual_pdf' "$product")
+
+  KNOWLEDGE=$(curl -s -X POST \
+    -F "file=@$PDF" \
+    -F "title=$TITLE" \
+    -F "categories=$CATS" \
+    http://localhost:8000/vlm/manual/extract | jq -c '.knowledge')
+
+  curl -s -X POST \
+    -F "title=$TITLE" \
+    -F "description=$(jq -r '.description' "$product")" \
+    -F "categories=$CATS" \
+    -F "manual_knowledge=$KNOWLEDGE" \
+    http://localhost:8000/vlm/faqs
+done
+```
+
+---
+
 ## 4️⃣ Image Generation: `/generate/variation`
 
 Generate culturally-appropriate product variations using FLUX models based on VLM analysis results.
