Merged
11 changes: 8 additions & 3 deletions AGENTS.md
@@ -322,17 +322,22 @@ Given the catalog enrichment focus, pay special attention to:
- Ensure new code follows established patterns
- Include appropriate error handling and logging

3. **Documentation**
3. **LLM Prompt Rules**
- **NEVER hardcode specific product examples in prompts.** Rules must be generic and work across all products. For example, do NOT write rules like `"when the user says 'synthetic leather' and the camera sees 'leather', use the user's term"` — instead write `"when there is a conflict, prefer the user's terms for materials and specs"`.
- Prompts run against millions of products — every rule must generalize.
- If a specific scenario fails, fix the underlying rule, not just the example.

4. **Documentation**
- Update relevant documentation when making changes
- Include examples in API documentation
- Keep this AGENTS.md file current as the project evolves

4. **Communication**
5. **Communication**
- Ask for clarification when requirements are ambiguous
- Suggest improvements to architecture and processes
- Flag potential security or performance concerns

5. **Incremental Development**
6. **Incremental Development**
- Start with simple, working solutions
- Iterate and improve based on feedback
- Consider backwards compatibility when making changes
1 change: 1 addition & 0 deletions CLAUDE.md
@@ -0,0 +1 @@
Refer to [AGENTS.md](AGENTS.md) for all project guidelines, coding standards, and AI assistant instructions.
31 changes: 31 additions & 0 deletions PRD.md
@@ -129,6 +129,25 @@ A GenAI-powered catalog enrichment system that transforms basic product images i
- Support automated filtering or flagging of low-quality generated images
- Ensure background differences from original are not penalized (backgrounds should differ)

### FR-10: Product FAQ Generation
- Generate 3-5 frequently asked questions and answers for each product
- FAQs are derived from the final enriched catalog data (after VLM analysis, user data merge, and branding)
- Questions cover practical shopper topics: materials, care instructions, sizing, use cases, compatibility, durability
- Answers are concise (1-3 sentences), factual, and grounded in the enriched product data
- Support locale-aware FAQ generation across all 10 supported regional locales
- Separate `/vlm/faqs` endpoint allows asynchronous generation — details display immediately while FAQs load in the background
- UI displays FAQs in a dedicated tab with collapsible accordion items
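The FR-10 constraints (3-5 FAQs, answers of 1-3 sentences) lend themselves to a simple validation pass. A minimal sketch, assuming a hypothetical `validate_faqs` helper that is not part of the shipped code:

```python
import re

def validate_faqs(faqs: list[dict]) -> list[str]:
    """Check generated FAQs against FR-10: 3-5 items, each answer
    1-3 sentences. Returns a list of violation messages (empty = valid).
    Hypothetical helper for illustration, not the actual implementation."""
    problems = []
    if not 3 <= len(faqs) <= 5:
        problems.append(f"expected 3-5 FAQs, got {len(faqs)}")
    for i, faq in enumerate(faqs):
        answer = faq.get("answer", "").strip()
        # Split on sentence-ending punctuation followed by whitespace
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer) if s]
        if not 1 <= len(sentences) <= 3:
            problems.append(f"FAQ {i}: answer has {len(sentences)} sentences")
    return problems
```

A check like this could run after generation to flag responses that need regeneration rather than silently shipping malformed FAQs.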

### FR-11: Policy Compliance Checking
- Accept PDF policy documents through a persistent policy library (`/policies` endpoint)
- Parse and normalize uploaded PDFs into structured policy summaries
- Embed normalized policy records using NVIDIA embeddings and store in Milvus vector database
- During product analysis, perform semantic retrieval of relevant policy records
- Run compliance classification against enriched product data and retrieved policy records
- Return pass/fail status with matched policies, rule details, reasons, evidence, and warnings
- Support deduplication of repeated policy uploads by content hash
- Display compliance results in the UI with visual pass/fail indicators
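The deduplication-by-content-hash requirement above can be sketched as follows. Function and registry names are illustrative assumptions, not the actual implementation:

```python
import hashlib

# Hypothetical in-memory registry: content hash -> stored policy id.
# The real system would persist this alongside the Milvus records.
_policy_registry: dict[str, str] = {}

def register_policy_pdf(pdf_bytes: bytes, policy_id: str) -> tuple[str, bool]:
    """Return (policy_id, is_new). Re-uploads of byte-identical PDFs
    resolve to the existing record instead of creating a duplicate
    embedding in the vector store."""
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    if digest in _policy_registry:
        return _policy_registry[digest], False
    _policy_registry[digest] = policy_id
    return policy_id, True
```

Hashing the raw bytes catches exact re-uploads; catching re-uploads of re-exported PDFs would require hashing the normalized policy text instead.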

## Technical Requirements

### TR-1: Model Integration
@@ -230,6 +249,16 @@ A GenAI-powered catalog enrichment system that transforms basic product images i
**I want to** receive automated quality assessments with detailed scoring and issue detection for generated product images
**So that** I can quickly identify and filter out low-quality variations without manual review, ensuring only high-quality assets enter my catalog

### US-8: Product FAQ Generation
**As an** e-commerce content manager
**I want to** automatically generate frequently asked questions and answers for each product based on its enriched catalog data
**So that** I can populate product FAQ sections without manual copywriting, improving the customer shopping experience

### US-9: Policy Compliance Checking
**As a** catalog compliance officer
**I want to** upload policy PDFs and have the system automatically check enriched product listings against those policies
**So that** I can ensure all catalog entries comply with marketplace regulations and internal guidelines before publishing

## Success Criteria

- **Processing Time**: <1 minute per product for complete enrichment (including quality assessment)
@@ -256,6 +285,8 @@ A GenAI-powered catalog enrichment system that transforms basic product images i
- [ ] FR-7: Social Media Content Integration
- [x] ~~FR-8: Brand Voice & Taxonomy Customization~~ *(Complete with brand_instructions parameter support)*
- [x] ~~FR-9: Automated Quality Assessment for Generated Images~~ *(VLM-based reflection module integrated into image generation pipeline)*
- [x] ~~FR-10: Product FAQ Generation~~ *(Separate /vlm/faqs endpoint with async loading, Kaizen Tabs + Accordion UI)*
- [x] ~~FR-11: Policy Compliance Checking~~ *(PDF policy library with Milvus embeddings, semantic retrieval, compliance classification)*

- [ ] TR-1: Model Integration
- [x] ~~NVIDIA Nemotron VLM API integration~~
4 changes: 3 additions & 1 deletion README.md
@@ -25,7 +25,9 @@ A GenAI-powered catalog enrichment system that transforms basic product images i
- **Cultural Image Generation**: Create culturally-appropriate product backgrounds (Spanish courtyards, Mexican family spaces, British formal settings)
- **Quality Evaluation**: Automated VLM-based quality assessment of generated images with detailed scoring
- **3D Asset Generation**: Transform 2D product images into interactive 3D GLB models using Microsoft TRELLIS
- **Modular API**: Separate endpoints for VLM analysis, image generation, and 3D asset generation
- **Product FAQ Generation**: Automatically generate 3-5 product FAQs from enriched catalog data
- **Policy Compliance**: Upload policy PDFs and automatically check product listings against them using RAG + Milvus
- **Modular API**: Separate endpoints for VLM analysis, FAQ generation, image generation, and 3D asset generation

## Documentation

77 changes: 73 additions & 4 deletions docs/API.md
@@ -36,8 +36,9 @@ Health check endpoint for monitoring service status.
The API provides a modular approach for optimal performance and flexibility:

**1) Fast VLM Analysis (POST `/vlm/analyze`)** - Get product fields quickly
**2) Image Generation (POST `/generate/variation`)** - Generate 2D variations on demand
**3) 3D Asset Generation (POST `/generate/3d`)** - Generate 3D models on demand
**2) FAQ Generation (POST `/vlm/faqs`)** - Generate product FAQs from enriched data
**3) Image Generation (POST `/generate/variation`)** - Generate 2D variations on demand
**4) 3D Asset Generation (POST `/generate/3d`)** - Generate 3D models on demand

**Benefits of this approach:**
- Display product information immediately to users
@@ -273,7 +274,75 @@ curl -X POST \

---

## 3️⃣ Image Generation: `/generate/variation`
## 3️⃣ FAQ Generation: `/vlm/faqs`

Generate 3-5 frequently asked questions and answers for a product based on its enriched catalog data. Designed to be called after `/vlm/analyze` completes, using the enriched result as input.

**Endpoint**: `POST /vlm/faqs`
**Content-Type**: `multipart/form-data`

### Request Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `title` | string | No | Product title from VLM analysis |
| `description` | string | No | Product description from VLM analysis |
| `categories` | JSON string | No | Categories array (default: `[]`) |
| `tags` | JSON string | No | Tags array (default: `[]`) |
| `colors` | JSON string | No | Colors array (default: `[]`) |
| `locale` | string | No | Regional locale code (default: `en-US`) |
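Note that the array parameters are sent as JSON-encoded strings within the multipart form, not as repeated form keys. A minimal client-side sketch of assembling the form fields per the table above (helper name is an illustration, not part of the API):

```python
import json

def build_faq_form(title=None, description=None, categories=None,
                   tags=None, colors=None, locale="en-US"):
    """Assemble multipart form fields for POST /vlm/faqs.

    Array fields are JSON-encoded strings, matching the parameter
    table; optional string fields are omitted when not provided.
    """
    form = {
        "categories": json.dumps(categories or []),
        "tags": json.dumps(tags or []),
        "colors": json.dumps(colors or []),
        "locale": locale,
    }
    if title:
        form["title"] = title
    if description:
        form["description"] = description
    return form
```

The resulting dict can be passed as the form payload of any multipart-capable HTTP client.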

### Response Schema

```json
{
"faqs": [
{
"question": "string",
"answer": "string"
}
]
}
```

### Usage Example

```bash
# Call after /vlm/analyze to generate FAQs from enriched data
curl -X POST \
-F "title=Craftsman 20V Cordless Lawn Mower" \
-F "description=A cordless lawn mower featuring a black and red design..." \
-F 'categories=["electronics"]' \
-F 'tags=["cordless","lawn mower","Craftsman"]' \
-F 'colors=["black","red"]' \
-F "locale=en-US" \
http://localhost:8000/vlm/faqs
```

### Example Response

```json
{
"faqs": [
{
"question": "What type of battery does this mower use?",
"answer": "This mower operates on a 20V cordless battery system, providing the flexibility to mow without a power cord."
},
{
"question": "Does this mower come with a grass collection bag?",
"answer": "Yes, it includes a rear-mounted grass collection bag for convenient clippings management."
},
{
"question": "What are the main colors of this mower?",
"answer": "The mower features a black and red color scheme with prominent Craftsman branding."
}
]
}
```

---

## 4️⃣ Image Generation: `/generate/variation`

Generate culturally-appropriate product variations using FLUX models based on VLM analysis results.

@@ -334,7 +403,7 @@ curl -X POST \

---

## 4️⃣ 3D Asset Generation: `/generate/3d`
## 5️⃣ 3D Asset Generation: `/generate/3d`

Generate interactive 3D GLB models from 2D product images using Microsoft's TRELLIS model.

170 changes: 170 additions & 0 deletions docs/hallucination-report.md
@@ -0,0 +1,170 @@
# LLM Enhancement Hallucination Report

**Date:** 2026-04-15
**Reported by:** Antonio Martinez
**Status:** Open — Separate task pending
**Affected component:** `src/backend/vlm.py` — `_call_nemotron_enhance_vlm()` (Step 1 enhancement)

---

## Summary

The VLM model (`nemotron-nano-12b-v2-vl`, 12B parameters) introduces hallucinations at the source — misreading visible text, fabricating materials and features, and drawing from training data rather than strictly describing the image. The LLM enhancement step (`_call_nemotron_enhance_vlm`) then compounds these errors by rewriting them into confident marketing copy. Both layers contribute, but the root cause is the VLM.

---

## Root Cause Analysis

### Pipeline Flow

```
Image Upload
|
v
[VLM] _call_vlm() <-- Accurate visual analysis
| Model: nemotron-nano-12b-v2-vl
| Output: title, description, categories, tags, colors
v
[LLM] _call_nemotron_enhance_vlm() <-- Hallucinations introduced HERE
| Model: nemotron-3-nano
| Task: "Write rich, persuasive product description"
v
[LLM] _call_nemotron_apply_branding() <-- Inherits errors from Step 1
| (only runs if brand_instructions provided)
v
[LLM] _call_nemotron_generate_faqs() <-- Consumes VLM output directly,
(runs in parallel with Step 1) but FAQs still affected if
VLM has minor OCR issues
```

### Where the Problem Lives

**Layer 1 — VLM** (`src/backend/vlm.py`, `_call_vlm()`):
The 12B VLM model misreads text, fabricates materials/features, and fills in details from training data. This happens regardless of prompt complexity — even "describe this product" triggers hallucinations. Longer prompts produce *more* hallucinations, not fewer. This is confirmed by the NVIDIA research team: longer system prompts degrade VLM output quality for this model class.

**Layer 2 — LLM Enhancement** (`src/backend/vlm.py`, `_call_nemotron_enhance_vlm()`):
The LLM takes the already-hallucinated VLM output and rewrites it into confident marketing copy, compounding errors and adding its own fabrications. Skipping this step when no user data is provided eliminates the second layer.

---

## Evidence: Craftsman 2XV20 Lawn Mower

### Test Image

`mower.jpeg` — Craftsman battery-powered lawn mower with "2XV20" printed on the deck (indicating dual V20 battery platform).

### VLM Direct Testing (2026-04-15)

Three prompts were tested against the same VLM endpoint (`nemotron-nano-12b-v2-vl`) with `mower.jpeg`:

**Prompt 1 — Minimal: "describe this product"**

> "This product is a Craftsman 20-inch 20V MAX Lithium Ion Cordless Lawn Mower. It's a compact, electric lawn mower designed for residential use. The mower features a 20-inch cutting deck [...] The 20V MAX Lithium Ion battery provides cordless convenience [...] includes a grass collection bag [...] equipped with a safety key to prevent accidental startups."

- Gets closest to reality: correctly identifies it as cordless/battery-powered ("20V MAX Lithium Ion")
- Still fabricates: "20-inch cutting deck", "safety key"
- Clearly pulling from Craftsman training data rather than reading "2XV20" text

**Prompt 2 — Detailed descriptive: "In detail, give a description of this image, include everything you see including texts. Be extremely descriptive."**

> "The cutting deck itself is marked with the text '20' indicating the width of the cutting blade in inches [...] a clear plastic cover over the cutting deck, allowing a view of the blades inside."

- Misreads "2XV20" as "20" and reinterprets it as cutting width
- Fabricates "clear plastic cover over the cutting deck"
- More hallucinations than the minimal prompt

**Prompt 3 — Catalog enrichment structured prompt (our production prompt)**

> `"title": "Craftsman 20-Inch Electric Lawn Mower"` ... `"clear plastic front cover"` ... `"control panel on the handlebar"` ... `"model number '20' is visible on the front"`

- Same hallucinations as prompt 2, now in JSON format
- Fabricates: "clear plastic front cover", "control panel on the handlebar"
- Misreads "2XV20" as "20" and calls it a model number

### Key Finding: Hallucinations Originate in the VLM

Initial analysis attributed hallucinations to the LLM enhancement step. **Direct VLM testing disproved this.** The VLM itself:
1. Misreads "2XV20" as "20" across all prompt styles
2. Fabricates materials ("clear plastic") and features ("control panel", "safety key") not visible in the image
3. Draws from training data about Craftsman products rather than strictly describing the image
4. Performs *worse* with longer, more detailed prompts — the minimal prompt produced the fewest hallucinations

### Hallucination Inventory (VLM output, all prompts combined)

| Claim | Reality | Type | Source |
|-------|---------|------|--------|
| "20-Inch" cutting width | "2XV20" is Craftsman's dual V20 battery platform | Text misread | VLM |
| "clear plastic cutting deck/cover" | Deck is opaque black | Fabricated material | VLM |
| "control panel on the handlebar" | Only a safety lever is visible | Fabricated feature | VLM |
| "safety key" | No safety key visible | Fabricated feature | VLM |
| "Electric Lawn Mower" (prompt 2/3) | Battery-powered (cordless) | Training data inference | VLM |
| "silver accents" on wheels | Wheels are entirely black | Fabricated detail | LLM enhancement |
| "red power button" | Not visible | Fabricated feature | LLM enhancement |

The LLM enhancement step compounded the VLM's errors (adding "silver accents", "red power button"), but the root cause is the 12B VLM model's vision limitations.

---

## Proposed Solution

### Fix 1 (Implemented): Skip LLM Enhancement When Unnecessary

**Status: Done** — merged in this branch.

The LLM enhancement step is now skipped when no user product data is provided. This eliminates the second layer of hallucinations.

| Scenario | Current Behavior | New Behavior |
|----------|-----------------|--------------|
| Image only (no user data, no brand instructions) | VLM -> LLM enhance -> output | VLM -> output directly (skip LLM) |
| Image + user product data | VLM -> LLM enhance (merge) -> output | VLM -> LLM enhance (merge) -> output (keep) |
| Image + brand instructions | VLM -> LLM enhance -> LLM brand -> output | VLM -> LLM brand -> output |
| Image + user data + brand instructions | VLM -> LLM enhance -> LLM brand -> output | VLM -> LLM enhance -> LLM brand -> output (keep) |
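The routing table above can be expressed as a small planner. Step names follow the functions in `src/backend/vlm.py`, but the planner itself is a sketch of the intended control flow, not the merged code:

```python
def plan_enrichment_steps(has_user_data: bool,
                          has_brand_instructions: bool) -> list[str]:
    """Return the ordered pipeline steps per the new behavior.

    LLM enhancement runs only when user product data must be merged;
    branding runs only when brand_instructions are provided.
    """
    steps = ["vlm_analyze"]
    if has_user_data:
        steps.append("llm_enhance")   # merge user data with VLM output
    if has_brand_instructions:
        steps.append("llm_brand")     # apply brand voice
    return steps
```

In the image-only case this reduces the pipeline to the VLM alone, which is exactly what eliminates the second hallucination layer.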

### Fix 2 (Future): Shorten the VLM Prompt

The current VLM prompt in `_call_vlm()` is ~30 lines with detailed rules, category lists, formatting instructions, and output constraints. Testing showed that a minimal prompt ("describe this product") produced the fewest hallucinations — the VLM correctly identified the mower as "20V MAX Lithium Ion Cordless" with that prompt, while the long structured prompt caused it to misread "2XV20" as "20" and fabricate features.

This is confirmed by the NVIDIA research team: longer system prompts degrade output quality for this VLM model class. The model spends capacity following formatting rules rather than focusing on accurate visual analysis.

**Proposed approach:**
- Strip the VLM prompt down to a short, focused instruction — prioritize visual accuracy over output formatting
- Move structural concerns (JSON format, category validation, tag count) to a lightweight post-processing step or a separate LLM call
- Test iteratively: compare hallucination rates across prompt lengths using a set of test images (mower, shoes, skincare, etc.)

**Trade-off:** A shorter VLM prompt may return unstructured text instead of clean JSON. This would require parsing the free-text output into structured fields, either with regex/heuristics or a fast LLM call. The benefit is more accurate visual descriptions at the source.
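The parsing fallback could look like the following sketch: try strict JSON first, then salvage fields heuristically. All names and the heuristic itself are assumptions for illustration:

```python
import json
import re

def parse_vlm_output(text: str) -> dict:
    """Parse VLM output into structured fields.

    Tries strict JSON first; if a shorter prompt returned free text,
    falls back to a crude heuristic (first sentence as title). A real
    implementation might hand the free text to a fast LLM call instead.
    """
    try:
        data = json.loads(text)
        if isinstance(data, dict):
            return data
    except ValueError:
        pass
    # Heuristic fallback: first sentence -> title, full text -> description
    first_sentence = re.split(r"(?<=[.!?])\s", text.strip(), maxsplit=1)[0]
    return {"title": first_sentence.rstrip("."), "description": text.strip()}
```

Hallucination-rate comparisons across prompt lengths would then measure the combined VLM + parser output, since the parser can only restructure, not correct, what the VLM saw.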

### Fix 3 (Future): Upgrade VLM Model

The `nemotron-nano-12b-v2-vl` (12B parameters) has fundamental vision limitations with stylized text and detail accuracy. A larger VLM (72B+) would likely improve OCR accuracy and reduce training-data hallucinations. This is an infrastructure/cost trade-off rather than a code change.

---

## Impact on FAQ Feature

The FAQ generation feature (`_call_nemotron_generate_faqs`) consumes the raw VLM observation directly (not the enhanced output), which reduces but does not eliminate the risk:

- FAQs generated from accurate VLM output will be factually grounded
- Minor VLM OCR errors (e.g., "2x20" vs "2XV20") can still propagate into FAQ answers
- If the proposed fix (skip enhancement) is implemented, the Details tab and FAQ tab will both be grounded in the same factual VLM observation, creating consistency

---

## Reproduction Steps

1. Start the backend and frontend services
2. Upload `mower.jpeg` (Craftsman 2XV20 lawn mower)
3. Click Generate with default settings (no product data, no brand instructions)
4. Observe the enriched description in the Details tab
5. Compare against the VLM's raw output (visible in backend logs at `[VLM]` level)

---

## Files Referenced

| File | Relevance |
|------|-----------|
| `src/backend/vlm.py:128-205` | `_call_nemotron_enhance_vlm()` — where hallucinations are introduced |
| `src/backend/vlm.py:167-186` | Enhancement prompt with insufficient anti-hallucination rules |
| `src/backend/vlm.py:175` | Current anti-hallucination rule (too narrow — numbers only) |
| `src/backend/vlm.py:397-439` | `_call_nemotron_enhance()` — orchestrator where the skip logic would go |
| `src/backend/vlm.py:441-510` | `_call_vlm()` — VLM analysis (produces accurate output) |