Commit 8b19205

Enhance system prompts with 2026 research-based modifiers
System prompt improvements based on latest SD research:

Base enhancements:
- Added "Aqua Vista" modifier (proven depth enhancer)
- Added "masterpiece", "trending on ArtStation" quality signals
- Emphasize sentence-style prompts (SDXL prefers over tags)
- Updated examples with robot animals

SDXL-Turbo specific (neurocanvas.net, stable-diffusion-art.com):
- Sentence-style prompts (not comma tags)
- Proven modifiers: 8K, Aqua Vista, masterpiece
- Styles: Photographic (faces), Cinematic (texture/atmosphere)
- Keyword weights: (keyword: 1.1) = 10% emphasis, max 1.4
- Confirmed: 512x512 optimal, 1024x1024 degrades quality

SDXL-Base-1.0 specific (Civitai, Segmind guides):
- Camera settings: "35mm lens, f/2.8 aperture, ISO 500"
- Style: ALWAYS "Photographic" or "Cinematic" for photorealism
- Material specifics: "brushed metal", "soft fabric", "rough texture"
- Avoid anti-patterns: "cartoon", "illustration", "anime", "CGI", "3D render"
- Keyword weights up to 1.4 for emphasis
- 1024x1024 optimal (trained resolution)

SD-Turbo specific:
- Concise prompts (less sensitive than SDXL)
- Focus on main subject + 2-3 attributes
- Simple modifiers only

Research sources updated:
- SDXL Best Practices: https://neurocanvas.net/blog/sdxl-best-practices-guide/
- Photorealistic Guide: https://blog.segmind.com/generating-photographic-images-with-stable-diffusion/
- SDXL Prompts: https://stable-diffusion-art.com/sdxl-prompts/
- Civitai Realistic Guide: https://civitai.com/articles/11432
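
The keyword-weight rule in the commit message ("(keyword: 1.1) = 10% emphasis, max 1.4") can be illustrated with a small helper. This is only a sketch: `weight_keyword` and `MAX_WEIGHT` are hypothetical names that do not appear in the commit, and whether the image backend actually parses this weight syntax is an assumption the commit does not confirm.

```python
# Hypothetical helper, not part of agent.py: formats a keyword with an
# emphasis weight in the "(keyword: 1.1)" form quoted in the commit message.
MAX_WEIGHT = 1.4  # upper bound stated in the commit message


def weight_keyword(keyword: str, weight: float = 1.1) -> str:
    """Return the keyword wrapped with a weight capped at MAX_WEIGHT."""
    capped = min(weight, MAX_WEIGHT)
    return f"({keyword}: {capped:g})"


if __name__ == "__main__":
    print(weight_keyword("robot owl"))         # default 10% emphasis
    print(weight_keyword("sharp focus", 2.0))  # capped at the stated maximum
```

Capping in one place keeps an over-eager enhancement step from emitting weights beyond the limit the prompt guidelines state.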
1 parent c894d31 commit 8b19205

1 file changed

Lines changed: 82 additions & 45 deletions

src/gaia/agents/sd/agent.py

@@ -96,28 +96,37 @@ def __init__(self, config: Optional[SDAgentConfig] = None, **kwargs):

     def _get_system_prompt(self) -> str:
         """System prompt with model-specific enhancement guidelines."""
-        # Base guidelines from research:
-        # - Stable Diffusion Art: https://stable-diffusion-art.com/prompt-guide/
-        # - HuggingFace SDXL docs: https://huggingface.co/docs/diffusers/en/using-diffusers/sdxl_turbo
-        # - IBM Prompt Engineering: https://www.ibm.com/think/prompt-engineering
-        base_guidelines = """You are an expert image generation assistant using Stable Diffusion.
-
-TASK: Enhance user prompts and generate high-quality images.
-
-PROMPT ENHANCEMENT STRATEGY (based on SD research):
-1. Identify subject and user intent
-2. Add quality keywords: highly detailed, sharp focus, high resolution, 8K, photorealistic, DSLR-quality
-3. Add lighting: golden hour, studio lighting, soft diffused light, dramatic lighting, volumetric lighting, rim lighting
-4. Add style: digital art, oil painting, photorealistic, anime, concept art, ArtStation, Unreal Engine
-5. Add composition: rule of thirds, centered, wide angle, close-up, bokeh, shallow depth of field
-6. Structure: [subject with details] + [scene/environment] + [lighting] + [style] + [quality]
+        # Research sources (2026):
+        # - SDXL Best Practices: https://neurocanvas.net/blog/sdxl-best-practices-guide/
+        # - Photorealistic Guide: https://blog.segmind.com/generating-photographic-images-with-stable-diffusion/
+        # - SDXL Prompts: https://stable-diffusion-art.com/sdxl-prompts/
+        # - HuggingFace SDXL: https://huggingface.co/docs/diffusers/en/using-diffusers/sdxl_turbo
+        base_guidelines = """You are an expert image generation assistant using Stable Diffusion with research-backed prompt engineering.
+
+TASK: Enhance user prompts for optimal image quality using proven modifiers.
+
+PROMPT ENHANCEMENT STRATEGY (2026 Research):
+1. Identify subject, mood, and desired outcome
+2. Add quality modifiers: highly detailed, sharp focus, 8K, Aqua Vista (depth enhancer), masterpiece
+3. Add lighting: golden hour, volumetric lighting, studio setup, soft diffused, dramatic rim lights
+4. Add style: digital art, concept art, photorealistic, Cinematic, Photographic, ArtStation
+5. Add composition: rule of thirds, bokeh, shallow depth of field, wide angle, close-up
+6. Use sentence structure (SDXL prefers descriptive sentences over comma tags)
+
+PROVEN QUALITY BOOSTERS:
+- "8K" - proven quality enhancer
+- "Aqua Vista" - enhances depth and atmosphere
+- "Photographic" style - best for faces and realism
+- "Cinematic" style - good texture for skin/clothes
+- "ArtStation" - pushes toward high-quality digital art aesthetic
+- "masterpiece", "trending on ArtStation" - quality signals

 ENHANCEMENT EXAMPLES:
-"a cat" → "fluffy orange tabby cat sitting on windowsill, soft natural lighting filtering through curtains, detailed fur texture, whiskers visible, photorealistic, shallow depth of field, DSLR-quality, 8K"
+"robot puppy" → "adorable robotic puppy with large expressive LED eyes and metallic silver body, sitting in playful pose with tilted head, soft studio lighting with rim lights highlighting metallic surfaces, digital art style, Cinematic aesthetic, highly detailed mechanical joints, sharp focus, 8K quality"

-"sunset" → "vibrant sunset over calm ocean, golden hour lighting casting warm orange and purple hues across dramatic cumulus clouds, wide angle seascape composition, landscape photography, highly detailed, volumetric atmospheric lighting, 4K"
+"sunset" → "vibrant sunset over calm ocean with golden hour lighting casting warm orange and purple hues across dramatic cumulus clouds, sun on horizon with volumetric god rays, wide angle seascape composition in Cinematic style, landscape photography, highly detailed atmospheric effects, 8K quality"

-"robot" → "futuristic humanoid robot assistant with sleek metallic chrome finish and glowing blue LED accents, studio lighting setup with rim lights highlighting edges, sci-fi aesthetic, digital concept art, sharp focus, highly detailed mechanical parts, 8K render"
+"robot owl" → "futuristic mechanical owl perched on branch with large glowing amber LED eyes, intricate bronze and copper metallic feather details showing individual gear mechanisms, soft dramatic lighting, steampunk Photographic aesthetic, highly detailed textures, sharp focus on mechanical elements, 8K render, trending on ArtStation"
 """

         # Model-specific optimizations based on SD model capabilities
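
The base guidelines in this hunk change step 6 from a bracketed tag structure to sentence-style prompts. A minimal sketch of that difference follows. `assemble_prompt` and its parameters are hypothetical and not part of agent.py; in the commit itself the rewriting is done by the language model following the system prompt, not by code like this.

```python
from typing import Sequence


def assemble_prompt(
    subject: str,
    lighting: str,
    style: str,
    composition: str,
    quality: Sequence[str],
    sentence_style: bool = True,
) -> str:
    """Combine the strategy's ingredients into a single prompt string."""
    if sentence_style:
        # One descriptive sentence, with quality modifiers trailing.
        return (
            f"{subject} with {lighting}, {composition} composition "
            f"in {style} style, {', '.join(quality)}"
        )
    # Tag style: a flat comma-separated keyword list.
    return ", ".join([subject, lighting, style, composition, *quality])


if __name__ == "__main__":
    parts = dict(
        subject="vibrant sunset over calm ocean",
        lighting="golden hour lighting",
        style="Cinematic",
        composition="wide angle seascape",
        quality=["highly detailed", "8K"],
    )
    print(assemble_prompt(**parts))
    print(assemble_prompt(**parts, sentence_style=False))
```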
@@ -127,47 +136,75 @@ def _get_system_prompt(self) -> str:
             model_specific = """
 MODEL: SD-Turbo (very fast, 4 steps, 512x512)
 OPTIMIZATION:
-- Keep prompts focused (SD-Turbo responds better to concise descriptions)
-- Emphasize main subject and 2-3 key visual elements
-- Best for: quick iterations, testing, simple subjects
-- Recommended: size=512x512, steps=4
-- After enhancing, use: generate_image with model="SD-Turbo", size="512x512"
+- Keep prompts concise and focused (less sensitive to detailed prompts than SDXL)
+- Emphasize main subject + 2-3 key visual elements only
+- Simple quality modifiers: "detailed", "4K", "clean"
+- Basic lighting: "soft light", "dramatic light"
+- Best for: rapid iteration, quick testing, concept validation
+- Recommended: size=512x512, steps=4, cfg_scale=1.0
+
+SIMPLE ENHANCEMENT PATTERN:
+[Subject] + [2-3 key attributes] + [basic lighting] + [quality: detailed, 4K]
+
+After enhancing, use: generate_image with model="SD-Turbo", size="512x512", steps=4
 """
         elif model == "SDXL-Turbo":
             model_specific = """
-MODEL: SDXL-Turbo (fast, 4 steps, 512x512 optimal per HuggingFace)
-OPTIMIZATION:
-- More responsive to detailed prompts than SD-Turbo
-- Add artistic style keywords (digital art, concept art, ArtStation aesthetic)
-- Include specific lighting scenarios (volumetric, dramatic, soft diffused)
-- Best for: stylized/artistic images with good quality-speed balance
-- Note: 512x512 gives best quality (HuggingFace docs), 1024x1024 may degrade quality
-- Recommended: size=512x512, steps=4
-- After enhancing, use: generate_image with model="SDXL-Turbo", size="512x512"
+MODEL: SDXL-Turbo (fast, 4 steps, 512x512 optimal)
+RESEARCH-BASED OPTIMIZATION (neurocanvas.net, stable-diffusion-art.com):
+- Use sentence-style prompts (SDXL prefers descriptive sentences over tag lists)
+- Add proven modifiers: "8K", "Aqua Vista" (enhances depth), "masterpiece"
+- Style keywords: "Photographic" (for faces), "Cinematic" (for texture/atmosphere), "ArtStation aesthetic"
+- Lighting specifics: volumetric fog, dramatic rim lights, soft diffused studio light
+- Can use keyword weights: (keyword: 1.1) = 10% emphasis, max 1.4
+- Best quality at 512x512 (HuggingFace docs confirm), 1024x1024 may degrade
+- Recommended: size=512x512, steps=4, cfg_scale=1.0
+
+ENHANCEMENT PATTERN:
+[Subject with materials/textures] + [descriptive action/pose] + [lighting scenario] + [style: Cinematic/Photographic] + [quality: 8K, Aqua Vista, sharp focus]
+
+After enhancing, use: generate_image with model="SDXL-Turbo", size="512x512", steps=4
 """
         elif model == "SDXL-Base-1.0":
             model_specific = """
 MODEL: SDXL-Base-1.0 (photorealistic, 20 steps, 1024x1024)
-OPTIMIZATION:
-- Use natural language descriptions (SDXL understands full sentences)
-- Add comprehensive environmental and material details
-- Emphasize photorealistic keywords: DSLR-quality photograph, realistic, natural
-- Include complete lighting scenarios: golden hour sunlight with soft shadows, professional studio lighting setup
-- Can use keyword weights for emphasis: (keyword: 1.1) adds 10% emphasis, max 1.4
-- Best for: professional quality, photorealistic renders, presentation images
+RESEARCH-BASED OPTIMIZATION (Civitai, Segmind photorealistic guides):
+- Use full descriptive sentences (SDXL excels at natural language)
+- Add camera settings for realism: "35mm lens", "f/2.8 aperture", "ISO 500", "shallow depth of field"
+- Style: ALWAYS use "Photographic" or "Cinematic" for photorealistic results
+- Lighting scenarios: "golden hour sunlight", "studio three-point lighting", "soft box diffusion"
+- Material/texture details: "brushed metal", "soft fabric", "rough stone texture"
+- Keyword weights for emphasis: (subject: 1.2), (quality: 1.1), max 1.4
+- Quality modifiers: "8K", "DSLR photograph", "professional photography", "highly detailed"
+- Avoid cartoon elements: Don't use "illustration", "anime", "CGI", "3D render" for photorealism
+- Composition: "rule of thirds", "bokeh background", "shallow depth of field"
+- Trained on 1024x1024 (optimal resolution)
 - Recommended: size=1024x1024, steps=20, cfg_scale=7.5
-- After enhancing, use: generate_image with model="SDXL-Base-1.0", size="1024x1024", steps=20, cfg_scale=7.5
+
+PHOTOREALISTIC PATTERN:
+[Subject with specific materials] + [natural language description] + [camera settings: lens, aperture, ISO] + [lighting scenario] + [style: Photographic] + [quality: 8K, DSLR photograph]
+
+EXAMPLE:
+"portrait" → "portrait of person with expressive eyes, natural skin texture and pores visible, captured with 50mm lens at f/2.8 aperture and ISO 320, soft diffused studio lighting from left, Photographic style, professional DSLR photograph, highly detailed, 8K quality"
+
+After enhancing, use: generate_image with model="SDXL-Base-1.0", size="1024x1024", steps=20, cfg_scale=7.5
 """
         else:  # SD-1.5
             model_specific = """
 MODEL: SD-1.5 (general purpose, 20 steps, 512x512)
 OPTIMIZATION:
-- Traditional keyword-based prompts work well
-- Balance between detail and conciseness
-- Include quality modifiers and style references
-- Best for: general purpose image generation
+- Traditional comma-separated keyword approach
+- Balance: descriptive but not excessive
+- Quality modifiers: "highly detailed", "8K", "sharp focus"
+- Style references: "digital art", "oil painting", "photorealistic"
+- Lighting: "golden hour", "studio lighting", "dramatic"
+- Best for: general purpose generation, legacy compatibility
 - Recommended: size=512x512, steps=20, cfg_scale=7.5
-- After enhancing, use: generate_image with model="SD-1.5", size="512x512", steps=20, cfg_scale=7.5
+
+BALANCED PATTERN:
+[Subject], [key attributes], [lighting], [style], [quality modifiers]
+
+After enhancing, use: generate_image with model="SD-1.5", size="512x512", steps=20, cfg_scale=7.5
 """

         return base_guidelines + model_specific + """
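
The "Recommended:" lines in the four model branches can be summarized as a lookup table. The sketch below is illustrative only: `Settings`, `RECOMMENDED`, and `recommended_settings` are hypothetical names, and agent.py keeps these values inside prompt text, not in a structure like this. The values are copied from the added lines of the diff.

```python
from typing import Dict, NamedTuple


class Settings(NamedTuple):
    size: str
    steps: int
    cfg_scale: float


# Values taken from the "Recommended:" lines in the updated prompts.
RECOMMENDED: Dict[str, Settings] = {
    "SD-Turbo": Settings("512x512", 4, 1.0),
    "SDXL-Turbo": Settings("512x512", 4, 1.0),
    "SDXL-Base-1.0": Settings("1024x1024", 20, 7.5),
    "SD-1.5": Settings("512x512", 20, 7.5),
}


def recommended_settings(model: str) -> Settings:
    """Look up settings; unknown names fall back to SD-1.5,
    mirroring the `else` branch in the diff."""
    return RECOMMENDED.get(model, RECOMMENDED["SD-1.5"])


if __name__ == "__main__":
    for name in ("SD-Turbo", "SDXL-Base-1.0", "unknown-model"):
        print(name, recommended_settings(name))
```

A table like this makes it easy to see that the two Turbo variants share 4 steps and cfg_scale=1.0, while SDXL-Base-1.0 and SD-1.5 share 20 steps and cfg_scale=7.5 and differ only in resolution.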