fix: guarantee user-provided title words are always preserved

The nano LLM non-deterministically drops user title words during the
enhance step. Add a programmatic post-check in the orchestrator that
verifies all user-provided title words appear in the final output. If
any are missing, the original user title is prepended.

The post-check uses the original product_data (not the filtered version)
to ensure the user's intent is always honored regardless of filter
behavior.

Also simplify the pre-filter to a binary decision (keep all or clear
all) instead of word-by-word filtering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
-prompt=f"""You are a product data quality filter. Your ONLY job is to remove irrelevant text from user-provided product data by comparing it against a visual analysis of the actual product.
+prompt=f"""You are a product data validator. Decide if the user-provided text is about the SAME type of product shown in the visual analysis, or about a COMPLETELY DIFFERENT product.
 
-VISUAL ANALYSIS (ground truth — what the camera sees):
+VISUAL ANALYSIS (what the camera shows):
 {vlm_json}
 
-PRODUCT CATEGORY (detected from visual analysis): {vlm_categories}
+PRODUCT CATEGORY: {vlm_categories}
 
-USER-PROVIDED PRODUCT DATA (may contain errors or irrelevant text):
+USER-PROVIDED PRODUCT DATA:
 {product_json}
 
-TASK: Return a FILTERED copy of the user-provided product data. The detected product category above is your primary anchor for judging relevance.
-- KEEP: brand names, model names, alphanumeric codes (likely SKUs or model numbers), product specifications (dosages, measurements, quantities, sizes), ingredients, materials, and any term that describes or is relevant to the detected product category.
-- REMOVE: words or phrases that clearly belong to a DIFFERENT product category. For example, food-related terms on an electronics product, or clothing terms on a skincare product.
-- When in doubt, KEEP the term — only remove when you are confident it belongs to a completely different product category.
-- For non-text fields (price, SKU, numeric values, etc.): Keep them unchanged.
-- If ALL words in a field are irrelevant to the detected category, return an empty string for that field.
+TASK: For each text field in the user-provided data, answer ONE question: "Is this text about a completely different type of product than what the image shows?"
+- If YES (completely different product type, e.g. "laptop" on a shoe image, or "yoga mat" on a blender image) → set that field to an empty string.
+- If NO (same product, related, or even partially relevant) → keep the ENTIRE field exactly as the user provided it. Do NOT modify, rephrase, or remove individual words.
+
+This is a binary decision per field — keep it all or clear it all. Never partially edit the user's text.
+For non-text fields (price, SKU, numeric values): always keep unchanged.
 
 Return ONLY valid JSON with the same structure as the user-provided data. No markdown, no comments."""
 
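The prompt asks the model for a keep-all-or-clear-all decision per field, but a prompt alone cannot guarantee that the model obeys it. A minimal sketch of how the caller could enforce the binary rule on the model's JSON response is shown here. The function name `enforce_binary_filter`, the `text_fields` parameter, and the field names in the usage note are hypothetical and do not appear in this commit.

```python
from typing import Any


def enforce_binary_filter(
    original: dict[str, Any],
    filtered: dict[str, Any],
    text_fields: set[str],
) -> dict[str, Any]:
    """Coerce a filter response into keep-all-or-clear-all per text field.

    Hypothetical helper: `text_fields` names the fields the filter is
    allowed to clear. Every other field is copied from `original` untouched.
    """
    result = dict(original)  # start from the user's exact data
    for key in text_fields:
        value = original.get(key)
        if not isinstance(value, str):
            continue  # only string fields can be cleared
        candidate = filtered.get(key)
        # Clear the field only when the model returned an empty string.
        # Any other response (missing key, partial edit, rephrasing)
        # falls back to the user's original text.
        if isinstance(candidate, str) and not candidate.strip():
            result[key] = ""
    return result
```

With `original = {"title": "Red running shoe", "description": "15-inch laptop", "sku": "AB-12"}` and a model response that shortens the title to `"Red shoe"` and empties both `description` and `sku`, calling the helper with `text_fields={"title", "description"}` restores the full title, clears the description, and leaves the SKU unchanged.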
@@ -342,7 +342,17 @@ def _call_nemotron_enhance(
 # Step 1: Enhance VLM output and localize to target language (single call for efficiency)