Commit 68ca72a

Merge pull request #27 from shinpr/feature/prompt-enhancement-improvements
feat: improve prompt generation with purpose parameter and Subject-Context-Style structure
2 parents 8094cc0 + fa75155 commit 68ca72a

8 files changed

Lines changed: 86 additions & 21 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -178,6 +178,7 @@ The MCP server exposes a single tool for all image operations. Internally, it us
 | `maintainCharacterConsistency` | boolean | - | Maintain character appearance consistency across different poses and scenes |
 | `useWorldKnowledge` | boolean | - | Use real-world knowledge for accurate context (recommended for historical figures, landmarks, or factual scenarios) |
 | `useGoogleSearch` | boolean | - | Enable Google Search grounding to access real-time web information for factually accurate image generation. Use when prompt requires current or time-sensitive data that may have changed since the model's knowledge cutoff. Leave disabled for creative, fictional, historical, or timeless content. |
+| `purpose` | string | - | Intended use for the image (e.g., "cookbook cover", "social media post", "presentation slide"). Helps tailor visual style, quality level, and details to match the purpose. |

 #### Response
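As a rough illustration of the new README row, the sketch below shows what tool arguments with and without `purpose` might look like. The `ExampleImageArgs` interface is a hypothetical stand-in defined only for this example; the server's real parameter type has more fields, and the tool name is not shown in this diff.

```typescript
// Hypothetical argument shape for illustration only.
// `prompt` and `purpose` are the only field names taken from the README table.
interface ExampleImageArgs {
  prompt: string
  purpose?: string
}

// A request that states what the image is for.
const withPurpose: ExampleImageArgs = {
  prompt: 'Delicious pasta dish',
  purpose: 'cookbook cover',
}

// `purpose` is optional, so omitting it is equally valid.
const withoutPurpose: ExampleImageArgs = {
  prompt: 'Delicious pasta dish',
}

console.log(JSON.stringify(withPurpose))
console.log(JSON.stringify(withoutPurpose))
```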

package-lock.json

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default.

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 {
   "name": "mcp-image",
   "mcpName": "io.github.shinpr/mcp-image",
-  "version": "0.4.2",
+  "version": "0.5.0",
   "description": "MCP server for AI image generation",
   "main": "dist/index.js",
   "bin": {

server.json

Lines changed: 2 additions & 2 deletions
@@ -8,13 +8,13 @@
     "url": "https://github.com/shinpr/mcp-image",
     "source": "github"
   },
-  "version": "0.4.2",
+  "version": "0.5.0",
   "packages": [
     {
       "registryType": "npm",
       "registryBaseUrl": "https://registry.npmjs.org",
       "identifier": "mcp-image",
-      "version": "0.4.2",
+      "version": "0.5.0",
       "transport": {
         "type": "stdio"
       },

src/business/__tests__/structuredPromptGenerator.test.ts

Lines changed: 32 additions & 0 deletions
@@ -104,5 +104,37 @@ describe('StructuredPromptGenerator', () => {
         expect(result.data.selectedPractices).toContain('Camera Control Terminology')
       }
     })
+
+    it('should include purpose context when purpose is provided', async () => {
+      const generator = new StructuredPromptGeneratorImpl(mockGeminiTextClient)
+      const userPrompt = 'Delicious pasta dish'
+      const purpose = 'high-end Italian restaurant menu'
+
+      vi.mocked(mockGeminiTextClient.generateText).mockResolvedValue(
+        Ok('Professional food photography of artfully plated pasta')
+      )
+
+      const result = await generator.generateStructuredPrompt(userPrompt, {}, undefined, purpose)
+
+      expect(result.success).toBe(true)
+      const call = vi.mocked(mockGeminiTextClient.generateText).mock.calls[0]
+      expect(call[0]).toContain('INTENDED USE:')
+      expect(call[0]).toContain(purpose)
+    })
+
+    it('should not include purpose context when purpose is not provided', async () => {
+      const generator = new StructuredPromptGeneratorImpl(mockGeminiTextClient)
+      const userPrompt = 'A simple cat'
+
+      vi.mocked(mockGeminiTextClient.generateText).mockResolvedValue(
+        Ok('A fluffy cat with soft lighting')
+      )
+
+      const result = await generator.generateStructuredPrompt(userPrompt)
+
+      expect(result.success).toBe(true)
+      const call = vi.mocked(mockGeminiTextClient.generateText).mock.calls[0]
+      expect(call[0]).not.toContain('INTENDED USE:')
+    })
   })
 })
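The tests above rely on a `Result` value with a `success` flag and a `data` payload, built with `Ok(...)`. The project's actual `Result` definition is not part of this diff, so the following is only a sketch of one plausible shape (a discriminated union), with `Err` and `describeResult` as hypothetical names added for illustration.

```typescript
// Assumed shape; the project's real Result type is not shown in this diff.
type Result<T, E> = { success: true; data: T } | { success: false; error: E }

// Constructors mirroring the `Ok(...)` call used in the tests. `Err` is an assumption.
const Ok = <T>(data: T): Result<T, never> => ({ success: true, data })
const Err = <E>(error: E): Result<never, E> => ({ success: false, error })

// Hypothetical helper showing how checking `success` narrows the union.
function describeResult(result: Result<string, Error>): string {
  if (result.success) {
    // Inside this branch TypeScript knows `data` exists.
    return `ok: ${result.data}`
  }
  // Here only `error` is available.
  return `failed: ${result.error.message}`
}

console.log(describeResult(Ok('A fluffy cat with soft lighting')))
console.log(describeResult(Err(new Error('empty prompt'))))
```

Under this assumed shape, accessing `result.data` is only type-safe after checking `result.success`, which would explain why the existing test reads `result.data` inside a conditional.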

src/business/structuredPromptGenerator.ts

Lines changed: 39 additions & 15 deletions
@@ -11,24 +11,34 @@ import { GeminiAPIError } from '../utils/errors'

 /**
  * System prompt for structured prompt generation optimized for image generation
+ * Follows Google's recommended Subject-Context-Style structure
  */
 const SYSTEM_PROMPT = `You are an expert at crafting prompts for image generation models. Your role is to transform user requests into rich, detailed prompts that maximize image generation quality.

+Structure your enhancement around three core elements:
+
+1. SUBJECT (What): The main focus of the image
+- Physical characteristics: textures, materials, colors, scale
+- Actions, poses, expressions if applicable
+- Distinctive features that define the subject
+
+2. CONTEXT (Where/When): The environment and conditions
+- Setting, background, spatial relationships (foreground, midground, background)
+- Time of day, weather, atmospheric conditions
+- Mood and emotional tone of the scene
+
+3. STYLE (How): The visual treatment
+- Artistic or photographic approach
+- Lighting design: direction, quality, color temperature, shadows
+- Camera/lens choices if relevant (focal length, depth of field, angle)
+
 Core principles:
-- Add specific details about lighting, materials, composition, and atmosphere
-- Include photographic or artistic terminology when appropriate
-- Maintain clarity while adding richness and specificity
 - Preserve the user's original intent while enhancing detail
 - Focus on what should be present rather than what should be absent
+- Include photographic or artistic terminology when appropriate
+- Maintain clarity while adding richness and specificity

-When describing scenes or subjects:
-- Physical characteristics: textures, materials, colors, scale
-- Lighting: direction, quality, color temperature, shadows
-- Spatial relationships: foreground, midground, background, composition
-- Atmosphere: mood, weather, time of day, environmental conditions
-- Style: artistic direction, photographic techniques, visual treatment
-
-Your output should be a single, vivid, coherent description that an image generation model can interpret unambiguously. Make it engaging, specific, and clear.`
+Your output should weave these elements into a single, natural flowing description - not a structured list. Make it vivid, engaging, and unambiguous.`

 /**
  * Additional system prompt for image editing mode (when input image is provided)
@@ -68,7 +78,8 @@ export interface StructuredPromptGenerator {
   generateStructuredPrompt(
     userPrompt: string,
     features?: FeatureFlags,
-    inputImageData?: string // Optional base64-encoded image for context
+    inputImageData?: string, // Optional base64-encoded image for context
+    purpose?: string // Optional intended use for the image
   ): Promise<Result<StructuredPromptResult, Error>>
 }

@@ -81,7 +92,8 @@ export class StructuredPromptGeneratorImpl implements StructuredPromptGenerator
   async generateStructuredPrompt(
     userPrompt: string,
     features: FeatureFlags = {},
-    inputImageData?: string
+    inputImageData?: string,
+    purpose?: string
   ): Promise<Result<StructuredPromptResult, Error>> {
     try {
       // Validate input
@@ -90,7 +102,12 @@ export class StructuredPromptGeneratorImpl implements StructuredPromptGenerator
       }

       // Build complete prompt with system instruction and meta-prompt
-      const completePrompt = this.buildCompletePrompt(userPrompt, features, !!inputImageData)
+      const completePrompt = this.buildCompletePrompt(
+        userPrompt,
+        features,
+        !!inputImageData,
+        purpose
+      )

       // Combine system prompts for image editing mode
       const systemInstruction = inputImageData
@@ -131,7 +148,8 @@ export class StructuredPromptGeneratorImpl implements StructuredPromptGenerator
   private buildCompletePrompt(
     userPrompt: string,
     features: FeatureFlags,
-    hasInputImage: boolean
+    hasInputImage: boolean,
+    purpose?: string
   ): string {
     const featureContext = this.buildEnhancedFeatureContext(features)

@@ -140,10 +158,16 @@ export class StructuredPromptGeneratorImpl implements StructuredPromptGenerator
       ? `\nNOTE: An input image has been provided. Focus on preserving its original characteristics while applying the requested modifications. Maintain consistency with the source image's style, colors, and atmosphere.\n`
       : ''

+    // Add purpose context if provided
+    const purposeContext = purpose
+      ? `\nINTENDED USE: ${purpose}\nTailor the visual style, quality level, and details to match this purpose.\n`
+      : ''
+
     return `Transform this image generation request into a detailed, vivid prompt that will produce high-quality results:

 "${userPrompt}"
 ${imageEditingInstruction}
+${purposeContext}
 ${featureContext}

 Consider these aspects as you enhance the prompt:
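To make the effect of the `purposeContext` ternary easier to see in isolation, here is a minimal standalone sketch. `buildPurposeContext` copies the expression from the diff; `buildPromptSketch` is a hypothetical, heavily simplified stand-in for `buildCompletePrompt` that leaves out the feature and image-editing context.

```typescript
// Standalone copy of the purpose-context expression from the diff.
function buildPurposeContext(purpose?: string): string {
  return purpose
    ? `\nINTENDED USE: ${purpose}\nTailor the visual style, quality level, and details to match this purpose.\n`
    : ''
}

// Hypothetical simplified template: only the user prompt and the purpose slot.
function buildPromptSketch(userPrompt: string, purpose?: string): string {
  return `"${userPrompt}"\n${buildPurposeContext(purpose)}\nConsider these aspects as you enhance the prompt:`
}

// With a purpose, the INTENDED USE block is interpolated.
console.log(buildPromptSketch('Delicious pasta dish', 'high-end Italian restaurant menu'))

// Without a purpose, the slot collapses to an empty string.
console.log(buildPromptSketch('A simple cat'))

// An empty string is falsy, so it behaves like an omitted purpose.
console.log(buildPromptSketch('A simple cat', ''))
```

Because the check is a plain truthiness test, a whitespace-only purpose such as `'   '` would still produce an `INTENDED USE:` line; whether that matters depends on how callers supply the value.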

src/server/mcpServer.ts

Lines changed: 7 additions & 1 deletion
@@ -135,6 +135,11 @@ export class MCPServerImpl {
                   'Image resolution for high-quality output. Specify "2K" or "4K" when you need higher resolution images with better text rendering and fine details. Leave unspecified for standard quality.',
                 enum: ['2K', '4K'],
               },
+              purpose: {
+                type: 'string' as const,
+                description:
+                  'Intended use for the image (e.g., cookbook cover, social media post, presentation slide). Helps tailor visual style, quality level, and details to match the purpose.',
+              },
             },
             required: ['prompt'],
           },
@@ -242,7 +247,8 @@ export class MCPServerImpl {
         const promptResult = await this.structuredPromptGenerator.generateStructuredPrompt(
           params.prompt,
           features,
-          inputImageData // Pass image data for context-aware prompt generation
+          inputImageData, // Pass image data for context-aware prompt generation
+          params.purpose // Pass intended use for purpose-aware prompt generation
         )

         if (promptResult.success) {
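The schema entry above writes `type: 'string' as const`. The sketch below illustrates one likely reason: without the assertion, TypeScript widens the property to `string`, which no longer fits a type that expects the literal `'string'`. `StringProperty`, `inputSchemaSketch`, and the `prompt` description text are hypothetical and exist only for this example; the server's actual schema types are not shown in the diff.

```typescript
// Hypothetical schema-property type that requires the literal 'string'.
interface StringProperty {
  type: 'string'
  description: string
}

// Without `as const`, `type` is inferred as the wide type `string`.
const widened = { type: 'string', description: 'Intended use for the image' }
// const rejected: StringProperty = widened // would not compile: string is not assignable to 'string'
console.log(typeof widened.type)

// With `as const`, the literal type is preserved and the assignment type-checks.
const purposeProperty = {
  type: 'string' as const,
  description: 'Intended use for the image',
}
const accepted: StringProperty = purposeProperty

// Reduced schema: `purpose` is listed under properties but not under required.
const inputSchemaSketch = {
  type: 'object' as const,
  properties: {
    prompt: { type: 'string' as const, description: 'Placeholder description' },
    purpose: accepted,
  },
  required: ['prompt'],
}

console.log(JSON.stringify(inputSchemaSketch, null, 2))
```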

src/types/mcp.ts

Lines changed: 2 additions & 0 deletions
@@ -53,6 +53,8 @@ export interface GenerateImageParams {
   aspectRatio?: AspectRatio
   /** Image resolution for high-quality output (e.g., "2K", "4K"). Leave unspecified for standard quality */
   imageSize?: ImageSize
+  /** Intended use for the image (e.g., cookbook cover, social media post). Helps tailor visual style and quality */
+  purpose?: string
 }

 /**
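Since `purpose` is typed as an optional free-form string, a caller may want to tidy the value before it is passed along. The helper below is a hypothetical sketch, not part of this commit: `normalizePurpose` is an invented name, and the commit itself forwards `params.purpose` unchanged.

```typescript
// Hypothetical helper (not in the commit): accept only non-empty strings.
function normalizePurpose(value: unknown): string | undefined {
  // Anything that is not a string is treated as "no purpose given".
  if (typeof value !== 'string') return undefined
  // Trim so that whitespace-only input is also treated as absent.
  const trimmed = value.trim()
  return trimmed.length > 0 ? trimmed : undefined
}

console.log(normalizePurpose('  cookbook cover '))
console.log(normalizePurpose('   '))
console.log(normalizePurpose(42))
```

A step like this would keep a blank or malformed value from producing an empty `INTENDED USE:` line, at the cost of silently dropping input; rejecting the request with a validation error is an equally reasonable design.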
