Commit c88a73f

Merge pull request #9 from shinpr/feature/gemini-structured-prompt-orchestration
feat: implement structured prompt orchestration with Gemini API integration
2 parents aa17af3 + aca602c commit c88a73f

26 files changed

Lines changed: 2631 additions & 4663 deletions

.npmignore

Lines changed: 4 additions & 0 deletions
@@ -26,8 +26,12 @@ CLAUDE.md

 # Test files
 __tests__/
+src/__tests__/
+src/tests/
+**/__tests__/
 *.test.js
 *.test.ts
+*.test.d.ts
 coverage/
 .vitest-cache/

README.md

Lines changed: 31 additions & 7 deletions
@@ -5,7 +5,13 @@ A powerful MCP (Model Context Protocol) server that enables AI assistants to gen
 ## ✨ Features

 - **AI-Powered Image Generation**: Create images from text prompts using Gemini 2.5 Flash Image Preview
+- **Intelligent Prompt Enhancement**: Automatically optimizes your prompts using Gemini 2.0 Flash for superior image quality
+  - Adds photographic and artistic details
+  - Enriches lighting, composition, and atmosphere descriptions
+  - Preserves your intent while maximizing generation quality
 - **Image Editing**: Transform existing images with natural language instructions
+  - Context-aware editing that preserves original style
+  - Maintains visual consistency with source image
 - **Advanced Options**:
   - Multi-image blending for composite scenes
   - Character consistency across generations
@@ -62,6 +68,18 @@ Add to your Cursor settings:
 - Defaults to `./output` in the current working directory if not specified
 - Directory will be created automatically if it doesn't exist

+### Optional: Skip Prompt Enhancement
+
+Set `SKIP_PROMPT_ENHANCEMENT=true` to disable automatic prompt optimization and send your prompts directly to the image generator. Useful when you need full control over the exact prompt wording.
+
+**Claude Code:**
+```bash
+claude mcp add mcp-image --env GEMINI_API_KEY=your-api-key --env SKIP_PROMPT_ENHANCEMENT=true -- npx -y mcp-image
+```
+
+**Cursor:**
+Add `"SKIP_PROMPT_ENHANCEMENT": "true"` to the env section in your config.
+
 ## 📖 Usage Examples

 Once configured, your AI assistant can generate images using natural language:
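
Review note on the `SKIP_PROMPT_ENHANCEMENT` switch documented in the hunk above: the README only specifies the value `true`, so the parsing rules are worth pinning down. The sketch below is one possible way to read the flag. The helper name `shouldSkipPromptEnhancement` and the `Env` type are hypothetical, and the trimmed, case-insensitive comparison is an assumption, not something confirmed by this commit.

```typescript
// Hypothetical helper, not taken from the mcp-image source.
// Assumption: only the literal value "true" (trimmed, case-insensitive) enables the skip.
type Env = Record<string, string | undefined>;

function shouldSkipPromptEnhancement(env: Env): boolean {
  const raw = env.SKIP_PROMPT_ENHANCEMENT;
  // An unset variable or any other value keeps prompt enhancement on.
  return raw !== undefined && raw.trim().toLowerCase() === "true";
}
```

In a Node.js server this would presumably be called once at startup with `process.env`, so that values such as `1` or `yes` do not silently disable enhancement.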
@@ -72,14 +90,15 @@ Once configured, your AI assistant can generate images using natural language:
 "Generate a serene mountain landscape at sunset with a lake reflection"
 ```

+The system automatically enhances this to include rich details about lighting, materials, composition, and atmosphere for optimal results.
+
 ### Image Editing

 ```
 "Edit this image to make the person face right"
 (with inputImagePath: "/path/to/image.jpg")
 ```

-
 ### Advanced Features

 ```
@@ -91,7 +110,9 @@ Once configured, your AI assistant can generate images using natural language:

 ### `generate_image` Tool

-The MCP server exposes a single tool for all image operations:
+The MCP server exposes a single tool for all image operations. Internally, it uses a two-stage process:
+1. **Prompt Optimization**: Gemini 2.0 Flash analyzes and enriches your prompt
+2. **Image Generation**: Gemini 2.5 Flash Image Preview creates the final image

 #### Parameters
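
Review note on the two-stage process described in the hunk above: the flow can be sketched with both stages injected as plain async functions, so no real Gemini call is needed to follow or test it. Every name here (`runImagePipeline`, `PipelineDeps`, `skipEnhancement`) is hypothetical and chosen for illustration; the actual structure in this commit may differ.

```typescript
// Hypothetical sketch of the two-stage flow; names and types are assumptions,
// not the mcp-image implementation.
type OptimizePrompt = (prompt: string) => Promise<string>;
type GenerateImage = (prompt: string) => Promise<Uint8Array>;

interface PipelineDeps {
  optimizePrompt: OptimizePrompt; // stage 1: text model enriches the prompt
  generateImage: GenerateImage;   // stage 2: image model renders the prompt
  skipEnhancement?: boolean;      // mirrors the documented SKIP_PROMPT_ENHANCEMENT switch
}

interface PipelineResult {
  promptUsed: string; // the prompt that actually reached stage 2
  image: Uint8Array;
}

async function runImagePipeline(prompt: string, deps: PipelineDeps): Promise<PipelineResult> {
  // Stage 1 runs unless the caller opted out; the raw prompt is passed through otherwise.
  const promptUsed = deps.skipEnhancement ? prompt : await deps.optimizePrompt(prompt);
  // Stage 2 always runs, using whichever prompt stage 1 produced.
  const image = await deps.generateImage(promptUsed);
  return { promptUsed, image };
}
```

Injecting the stages keeps the orchestration testable with fakes. Returning `promptUsed` alongside the image makes it possible to see what the optimizer changed.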

@@ -142,17 +163,20 @@ The MCP server exposes a single tool for all image operations:

 ### Performance Tips

-- Image generation: 30-60 seconds typical
-- Image editing: 15-45 seconds typical
-- Use specific, descriptive prompts for better results
+- Image generation: 30-60 seconds typical (includes prompt optimization)
+- Image editing: 15-45 seconds typical (includes context analysis)
+- Simple prompts work great - the AI automatically adds professional details
+- Complex prompts are preserved and further enhanced
 - Consider enabling `useWorldKnowledge` for historical or factual subjects

-
 ## 💰 Usage Notes

-- This MCP server uses the paid Gemini API for image generation
+- This MCP server uses the paid Gemini API for both prompt optimization and image generation
+  - Gemini 2.0 Flash for intelligent prompt enhancement (minimal token usage)
+  - Gemini 2.5 Flash Image Preview for actual image generation
 - Check current pricing and rate limits at [Google AI Studio](https://aistudio.google.com/)
 - Monitor your API usage to avoid unexpected charges
+- The prompt optimization step adds minimal cost while significantly improving output quality

 ## 📄 License

package-lock.json

Lines changed: 6 additions & 6 deletions
Some generated files are not rendered by default.

package.json

Lines changed: 11 additions & 4 deletions
@@ -1,6 +1,6 @@
 {
   "name": "mcp-image",
-  "version": "0.1.1",
+  "version": "0.2.0",
   "description": "MCP server for AI image generation",
   "main": "dist/index.js",
   "bin": {
@@ -19,6 +19,13 @@
     "type": "git",
     "url": "https://github.com/shinpr/mcp-image.git"
   },
+  "files": [
+    "dist/**/*",
+    "!dist/**/*.test.js",
+    "!dist/**/*.test.d.ts",
+    "!dist/__tests__",
+    "!dist/tests"
+  ],
   "scripts": {
     "build": "tsc && tsc-alias",
     "link": "npm run build && npm link",
@@ -41,8 +48,8 @@
     "test:safe": "npm test && npm run cleanup:processes"
   },
   "dependencies": {
-    "@modelcontextprotocol/sdk": "^1.0.0",
-    "@google/genai": "^1.16.0"
+    "@google/genai": "^1.17.0",
+    "@modelcontextprotocol/sdk": "^1.0.0"
   },
   "devDependencies": {
     "@biomejs/biome": "^1.9.4",
@@ -70,4 +77,4 @@
       "biome format --write --no-errors-on-unmatched"
     ]
   }
-}
+}
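
Review note on the new `files` entries in the `package.json` diff above: the intent appears to be "ship everything under `dist/` except test artifacts". The predicate below restates that intent so it can be checked against sample paths. It is a rough approximation under that assumption, not npm's matching algorithm, and `matchesFilesWhitelist` is a hypothetical name.

```typescript
// Rough approximation of the intent behind the "files" entries; NOT npm's matcher.
// npm also always includes package.json, README and LICENSE regardless of "files".
function matchesFilesWhitelist(path: string): boolean {
  if (!path.startsWith("dist/")) return false;          // "dist/**/*"
  if (path.endsWith(".test.js")) return false;          // "!dist/**/*.test.js"
  if (path.endsWith(".test.d.ts")) return false;        // "!dist/**/*.test.d.ts"
  if (path.startsWith("dist/__tests__/")) return false; // "!dist/__tests__"
  if (path.startsWith("dist/tests/")) return false;     // "!dist/tests"
  return true;
}
```

The authoritative check is to inspect the real tarball contents with `npm pack --dry-run` before publishing.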
