38 changes: 34 additions & 4 deletions AGENTS.md
@@ -12,14 +12,18 @@ This document provides guidelines and instructions for AI assistants working on
- **[README.md](README.md)** - Quick start guide and high-level overview
- **[docs/API.md](docs/API.md)** - Complete API reference with examples
- **[docs/DOCKER.md](docs/DOCKER.md)** - Docker and container deployment guide
- **[PRD.md](PRD.md)** - Product requirements document
- **[docs/PRD.md](docs/PRD.md)** - Product requirements document
- **[docs/POLICY_COMPLIANCE.md](docs/POLICY_COMPLIANCE.md)** - Policy compliance feature guide
- **[docs/PRODUCT_MANUAL_FAQS.md](docs/PRODUCT_MANUAL_FAQS.md)** - Guide to enriching FAQs with product manual PDFs
- **[AGENTS.md](AGENTS.md)** - This file (AI assistant guidelines)

### Current Status
- ✅ **Multi-Language Support** - Locale-based product descriptions (FR-6 completed)
- ✅ **VLM Content Augmentation** - Enhances existing product data with visual insights (FR-2 completed)
- ✅ **2D Image Variation Generation** - Working with prompt planning and quality evaluation (FR-3 completed)
- ✅ **Automated Quality Assessment** - VLM-based reflection for generated images (FR-9 completed)
- ✅ **Product FAQ Generation** - FAQs from enriched data with optional product manual PDF enhancement (FR-10, FR-12 completed)
- ✅ **Policy Compliance** - PDF policy library with Milvus RAG and compliance classification (FR-11 completed)
- ⚠️ **In Development** - 3D Asset Generation (backend complete) and Video Generation in progress

### Key Goals
@@ -44,7 +48,7 @@ cd catalog-enrichment
### Backend (current)

- Stack: FastAPI + Uvicorn (ASGI), OpenAI client (NVIDIA endpoint), Starlette under the hood
- Dependencies: `fastapi`, `uvicorn[standard]`, `openai`, `python-multipart`, `python-dotenv`, `httpx`, `pillow`, `pyyaml`
- Dependencies: `fastapi`, `uvicorn[standard]`, `openai`, `python-multipart`, `python-dotenv`, `httpx`, `pillow`, `pyyaml`, `pymilvus`, `pypdf`, `numpy`
- Python: 3.11+
- **Error Handling**: Comprehensive connection error detection with user-friendly messages when NIM endpoints are unreachable

@@ -102,6 +106,32 @@ uvicorn --app-dir src backend.main:app --host 0.0.0.0 --port 8000 --reload
- `quality_score`: float (0-100 quality score from VLM reflection, or null if evaluation failed)
- `quality_issues`: array (list of detected quality issues from reflection analysis)
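Because `quality_score` can be null, a client should treat a failed evaluation differently from a low score. A minimal sketch of that handling follows; the function name, the return labels, and the threshold of 70 are hypothetical and not part of the API.

```python
def review_variation(result, min_score=70.0):
    """Classify a variation result as 'accept', 'reject', or 'unscored'.

    `min_score` is an assumed cutoff chosen for illustration only.
    """
    score = result.get("quality_score")
    if score is None:
        # Evaluation failed: this is not the same as a score of 0.
        return "unscored"
    if not 0 <= score <= 100:
        raise ValueError(f"quality_score out of range: {score}")
    return "accept" if score >= min_score else "reject"


print(review_variation({"quality_score": 85.0, "quality_issues": []}))
print(review_variation({"quality_score": None, "quality_issues": []}))
```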

**FAQ Generation:**
- POST `/vlm/faqs`
- Request: `multipart/form-data` with fields:
- `title` (string, optional): Product title from VLM analysis
- `description` (string, optional): Product description from VLM analysis
- `categories` (JSON string, optional): Categories array
- `tags` (JSON string, optional): Tags array
- `colors` (JSON string, optional): Colors array
- `locale` (string, optional): Regional locale code (default: "en-US")
- `manual_knowledge` (JSON string, optional): Extracted knowledge from `/vlm/manual/extract`
- Response: `{ "faqs": [{ "question": "string", "answer": "string" }] }`
- Without manual: 3-5 FAQs from product data
- With manual knowledge: up to 10 FAQs drawing from both product data and manual
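The array fields above are sent as JSON strings inside the form, which is easy to get wrong on the client side. The sketch below shows one way to assemble the field values; `build_faq_form` is a hypothetical helper, and how the resulting dict is sent as `multipart/form-data` depends on the HTTP client in use.

```python
import json


def build_faq_form(title=None, description=None, categories=None, tags=None,
                   colors=None, locale="en-US", manual_knowledge=None):
    """Return the form fields for POST /vlm/faqs as a dict of strings."""
    fields = {"locale": locale}
    if title is not None:
        fields["title"] = title
    if description is not None:
        fields["description"] = description
    # Arrays travel as JSON strings, not as repeated form fields.
    for name, value in (("categories", categories), ("tags", tags), ("colors", colors)):
        if value is not None:
            fields[name] = json.dumps(list(value))
    # The knowledge object returned by /vlm/manual/extract is also JSON-encoded.
    if manual_knowledge is not None:
        fields["manual_knowledge"] = json.dumps(manual_knowledge)
    return fields


form = build_faq_form(
    title="Craftsman 20V Cordless Lawn Mower",
    categories=["electronics"],
    tags=["cordless", "lawn mower"],
)
print(form["categories"])
```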

**Product Manual Knowledge Extraction:**
- POST `/vlm/manual/extract`
- Request: `multipart/form-data` with fields:
- `file` (file): Product manual PDF (max 50 MB)
- `title` (string, optional): Product title for query generation
- `categories` (JSON string, optional): Product categories for query generation
- `locale` (string, optional): Regional locale code (default: "en-US")
- Response: `{ "filename": "string", "chunk_count": 42, "knowledge": { "topic": "extracted text..." } }`
- Stateless: all vectors freed after response, no server-side storage
- LLM generates 5-8 product-type-specific queries from title + categories (not description)
- Retrieves relevant chunks per query via in-memory cosine similarity
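The retrieval step can be pictured as ranking chunk embeddings by cosine similarity against each query embedding. The following is an illustrative sketch only, written with the standard library; the function names and the toy three-dimensional vectors are assumptions, and the actual backend may well vectorize this with `numpy`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, chunks, k=3):
    """Return the k chunk texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text)
              for vec, text in zip(chunk_vecs, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy data: real embeddings would come from an embedding model.
chunks = ["Charge the battery for 4 hours.",
          "Clean the deck after each use.",
          "Warranty lasts 2 years."]
chunk_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query_vec = [0.9, 0.1, 0.0]

best = top_k_chunks(query_vec, chunk_vecs, chunks, k=2)
print(best)
```

Since all vectors live in local variables, nothing persists once the function returns, which matches the stateless behaviour described above.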

**3D Asset Generation:**
- POST `/generate/3d`
- Request: `multipart/form-data` with fields:
@@ -344,7 +374,7 @@ Given the catalog enrichment focus, pay special attention to:

---

**Last Updated:** $(date)
**Version:** 1.0
**Last Updated:** 16-Apr-2026
**Version:** 1.3

*This document should be updated as the project evolves and new practices are established.*
8 changes: 6 additions & 2 deletions README.md
@@ -25,15 +25,17 @@ A GenAI-powered catalog enrichment system that transforms basic product images i
- **Cultural Image Generation**: Create culturally-appropriate product backgrounds (Spanish courtyards, Mexican family spaces, British formal settings)
- **Quality Evaluation**: Automated VLM-based quality assessment of generated images with detailed scoring
- **3D Asset Generation**: Transform 2D product images into interactive 3D GLB models using Microsoft TRELLIS
- **Product FAQ Generation**: Automatically generate 3-5 product FAQs from enriched catalog data
- **Product FAQ Generation**: Automatically generate product FAQs from enriched catalog data, with optional product manual PDF upload for richer FAQs (up to 10) via stateless targeted RAG
- **Policy Compliance**: Upload policy PDFs and automatically check product listings against them using RAG + Milvus
- **Modular API**: Separate endpoints for VLM analysis, FAQ generation, image generation, and 3D asset generation

## Documentation

- **[API Documentation](docs/API.md)** - Detailed API endpoints, parameters, and examples
- **[Docker Deployment Guide](docs/DOCKER.md)** - Docker and Docker Compose setup instructions
- **[Product Requirements (PRD)](PRD.md)** - Product requirements and feature specifications
- **[Product Requirements (PRD)](docs/PRD.md)** - Product requirements and feature specifications
- **[Policy Compliance](docs/POLICY_COMPLIANCE.md)** - How policy compliance checking works
- **[Product Manual for FAQs](docs/PRODUCT_MANUAL_FAQS.md)** - How product manual PDFs enrich FAQ generation
- **[AI Agent Guidelines](AGENTS.md)** - Instructions for AI assistants working on this project

## Tech Stack
@@ -221,6 +223,8 @@ For complete Docker deployment instructions, see the **[Docker Deployment Guide]
The system provides the following main endpoints:

- `POST /vlm/analyze` - Fast VLM/LLM analysis
- `POST /vlm/faqs` - Product FAQ generation (supports optional manual knowledge)
- `POST /vlm/manual/extract` - Extract knowledge from a product manual PDF for FAQ enrichment
- `POST /generate/variation` - Image generation with FLUX
- `POST /generate/3d` - 3D asset generation with TRELLIS

101 changes: 99 additions & 2 deletions docs/API.md
@@ -37,6 +37,7 @@ The API provides a modular approach for optimal performance and flexibility:

**1) Fast VLM Analysis (POST `/vlm/analyze`)** - Get product fields quickly
**2) FAQ Generation (POST `/vlm/faqs`)** - Generate product FAQs from enriched data
**2.5) Manual Knowledge Extraction (POST `/vlm/manual/extract`)** - Extract knowledge from a product manual PDF to enrich FAQs
**3) Image Generation (POST `/generate/variation`)** - Generate 2D variations on demand
**4) 3D Asset Generation (POST `/generate/3d`)** - Generate 3D models on demand

@@ -276,7 +277,10 @@ curl -X POST \

## 3️⃣ FAQ Generation: `/vlm/faqs`

Generate 3-5 frequently asked questions and answers for a product based on its enriched catalog data. Designed to be called after `/vlm/analyze` completes, using the enriched result as input.
Generate frequently asked questions and answers for a product based on its enriched catalog data. Designed to be called after `/vlm/analyze` completes, using the enriched result as input.

Without a product manual: generates 3-5 basic FAQs from the product data.
With manual knowledge (from `/vlm/manual/extract`): generates up to 10 richer FAQs that draw from both the product data and the manual, surfacing details that go beyond the description.

**Endpoint**: `POST /vlm/faqs`
**Content-Type**: `multipart/form-data`
@@ -291,6 +295,7 @@ Generate 3-5 frequently asked questions and answers for a product based on its e
| `tags` | JSON string | No | Tags array (default: `[]`) |
| `colors` | JSON string | No | Colors array (default: `[]`) |
| `locale` | string | No | Regional locale code (default: `en-US`) |
| `manual_knowledge` | JSON string | No | Extracted manual knowledge from `/vlm/manual/extract` |

### Response Schema

@@ -305,7 +310,7 @@ Generate 3-5 frequently asked questions and answers for a product based on its e
}
```

### Usage Example
### Usage Example (Basic)

```bash
# Call after /vlm/analyze to generate FAQs from enriched data
@@ -319,6 +324,27 @@ curl -X POST \
http://localhost:8000/vlm/faqs
```

### Usage Example (With Product Manual)

```bash
# First extract knowledge from the manual, then pass it to FAQ generation
KNOWLEDGE=$(curl -s -X POST \
-F "file=@mower-manual.pdf" \
-F "title=Craftsman 20V Cordless Lawn Mower" \
-F 'categories=["electronics"]' \
http://localhost:8000/vlm/manual/extract | jq -c '.knowledge')

curl -X POST \
-F "title=Craftsman 20V Cordless Lawn Mower" \
-F "description=A cordless lawn mower featuring a black and red design..." \
-F 'categories=["electronics"]' \
-F 'tags=["cordless","lawn mower","Craftsman"]' \
-F 'colors=["black","red"]' \
-F "locale=en-US" \
--form-string "manual_knowledge=$KNOWLEDGE" \
http://localhost:8000/vlm/faqs
```
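On the client side, it can help to sanity-check the `/vlm/faqs` response against the documented shape and FAQ counts before using it. The sketch below is one possible check; `validate_faq_response` is a hypothetical helper, and because the docs state only an upper bound ("up to 10") for the manual case, no lower bound is enforced there.

```python
def validate_faq_response(payload, with_manual=False):
    """Return a list of problems found in a /vlm/faqs response (empty if none)."""
    faqs = payload.get("faqs") if isinstance(payload, dict) else None
    if not isinstance(faqs, list):
        return ["'faqs' must be a list"]

    problems = []
    for i, item in enumerate(faqs):
        for key in ("question", "answer"):
            value = item.get(key) if isinstance(item, dict) else None
            if not isinstance(value, str) or not value.strip():
                problems.append(f"faqs[{i}].{key} must be a non-empty string")

    # Documented counts: 3-5 without a manual, up to 10 with manual knowledge.
    if with_manual:
        if len(faqs) > 10:
            problems.append(f"expected at most 10 FAQs, got {len(faqs)}")
    elif not 3 <= len(faqs) <= 5:
        problems.append(f"expected 3-5 FAQs, got {len(faqs)}")
    return problems


sample = {"faqs": [{"question": f"Q{i}?", "answer": f"A{i}."} for i in range(4)]}
print(validate_faq_response(sample))
```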

### Example Response

```json
@@ -342,6 +368,77 @@ curl -X POST \

---

## 3.5️⃣ Product Manual Knowledge Extraction: `/vlm/manual/extract`

Extract structured knowledge from a product manual PDF using targeted RAG. The endpoint processes the PDF, generates product-type-specific queries via the LLM (using title + categories, not description, to avoid duplicating what the description already covers), and retrieves relevant chunks from the manual for each topic.

This endpoint is **stateless** — all embeddings are computed in-memory and freed after the response. It can handle concurrent requests for different products.

**Endpoint**: `POST /vlm/manual/extract`
**Content-Type**: `multipart/form-data`

### Request Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `file` | file | Yes | Product manual PDF (max 50 MB) |
| `title` | string | No | Product title (used to generate relevant queries) |
| `categories` | JSON string | No | Product categories array (used to generate relevant queries) |
| `locale` | string | No | Regional locale code (default: `en-US`) |

### Response Schema

```json
{
"filename": "string",
"chunk_count": 42,
"knowledge": {
"battery_life": "The speaker provides up to 12 hours of continuous playback...",
"waterproof_rating": "IPX7 rated, can be submerged up to 1 meter for 30 minutes...",
"care_instructions": "Clean with a damp cloth. Do not use abrasive cleaners..."
}
}
```

The `knowledge` object contains topic keys (dynamically generated by the LLM based on product type) mapped to the relevant text extracted from the manual. Topics with no relevant content are empty strings.
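Since topics without relevant content come back as empty strings, a client might drop them before forwarding the object to `/vlm/faqs`. A minimal sketch, assuming this pruning is desirable; the docs do not say whether the server requires it, and `prepare_manual_knowledge` is a hypothetical helper name.

```python
import json


def prepare_manual_knowledge(extract_response):
    """Build the `manual_knowledge` form value from a /vlm/manual/extract response.

    Returns a JSON string with empty topics removed, or None if nothing is left.
    """
    knowledge = extract_response.get("knowledge") or {}
    kept = {topic: text for topic, text in knowledge.items()
            if isinstance(text, str) and text.strip()}
    return json.dumps(kept, ensure_ascii=False) if kept else None


response = {
    "filename": "speaker-manual.pdf",
    "chunk_count": 42,
    "knowledge": {"battery_life": "Up to 12 hours of playback.", "warranty": ""},
}
field = prepare_manual_knowledge(response)
print(field)
```

Returning `None` when every topic is empty lets the caller omit the optional `manual_knowledge` field entirely.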

### Usage Example

```bash
curl -X POST \
-F "file=@speaker-manual.pdf;type=application/pdf" \
-F "title=JBL Flip 6 Portable Speaker" \
-F 'categories=["electronics"]' \
-F "locale=en-US" \
http://localhost:8000/vlm/manual/extract
```

### Batch Script Example

```bash
# Process multiple products one after another (each request is independent, so the loop could also be parallelized)
for product in products/*.json; do
TITLE=$(jq -r '.title' "$product")
CATS=$(jq -c '.categories' "$product")
PDF=$(jq -r '.manual_pdf' "$product")

KNOWLEDGE=$(curl -s -X POST \
-F "file=@$PDF" \
-F "title=$TITLE" \
-F "categories=$CATS" \
http://localhost:8000/vlm/manual/extract | jq -c '.knowledge')

curl -s -X POST \
-F "title=$TITLE" \
-F "description=$(jq -r '.description' "$product")" \
-F "categories=$CATS" \
--form-string "manual_knowledge=$KNOWLEDGE" \
http://localhost:8000/vlm/faqs
done
```
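Because each request is independent, the per-product work can also be fanned out in parallel. The sketch below shows the pattern with a thread pool; `run_batch` is a hypothetical helper, and `fake_process` is a stand-in for the two HTTP calls (`/vlm/manual/extract` followed by `/vlm/faqs`) so that the example runs without a server.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(products, process_one, max_workers=4):
    """Run process_one(product) in parallel; map each title to a result or exception."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_one, p): p["title"] for p in products}
        for future in as_completed(futures):
            title = futures[future]
            try:
                results[title] = future.result()
            except Exception as exc:
                # Record the failure so one bad product does not abort the batch.
                results[title] = exc
    return results


def fake_process(product):
    # Stand-in for: POST /vlm/manual/extract, then POST /vlm/faqs.
    if not product.get("manual_pdf"):
        raise ValueError("missing manual_pdf")
    return {"faqs": [{"question": f"What is {product['title']}?", "answer": "..."}]}


products = [
    {"title": "Mower", "manual_pdf": "mower.pdf"},
    {"title": "Speaker", "manual_pdf": ""},
]
results = run_batch(products, fake_process)
print(sorted(results))
```

Threads suit this workload because the time is spent waiting on network responses; `max_workers` should stay within whatever concurrency the backend can sustain.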

---

## 4️⃣ Image Generation: `/generate/variation`

Generate culturally-appropriate product variations using FLUX models based on VLM analysis results.