
docs: add image generation architecture guideline #480

Open
liyin2015 wants to merge 1 commit into main from docs/image-gen-guideline

@liyin2015

Member

Image Generation Architecture Guideline

TL;DR: Adds a comprehensive design doc for the image generation pipeline, documenting the correct ModelClient → Generator → GeneratorOutput architecture, the GeneratorOutput contract for images, and provider-specific API references.

What's in the doc

  • Architecture overview: How image generation reuses the existing Generator flow (same as text generation)
  • GeneratorOutput contract: Standard image dict format ({"b64_json": ..., "mime_type": ...} or {"url": ...})
  • ModelClient requirements: What each client must implement for IMAGE_GENERATION (convert_inputs, acall, parse_response)
  • ImageModelAdapter analysis: Why it exists, why it's redundant, and migration path to eliminate it
  • OpenAI API reference: Complete gpt-image-2 parameter docs, flexible resolution constraints, size options, cost table, and behavioral gotchas
  • Common bugs catalog: Four documented bugs (base64 leaking into data, wrong dict format, base64 in raw_response, list unwrapping), each with before/after code
  • Testing checklist: 11-item checklist for adding new image providers
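
For reviewers who want a quick picture of the image dict format named in the GeneratorOutput contract bullet, here is a minimal sketch of a shape check. The helper name `is_valid_image_item` is hypothetical and not part of the library, and treating `mime_type` as required alongside `b64_json` is an assumption based only on the format shown above.

```python
from typing import Any


def is_valid_image_item(item: Any) -> bool:
    """Return True if item matches one of the two image dict shapes.

    Hypothetical helper for illustration only. Accepted shapes:
      {"url": "<non-empty str>"}
      {"b64_json": "<non-empty str>", "mime_type": "<str>"}
    Requiring mime_type with b64_json is an assumption.
    """
    if not isinstance(item, dict):
        return False
    if "url" in item:
        url = item["url"]
        return isinstance(url, str) and bool(url)
    if "b64_json" in item:
        payload = item["b64_json"]
        mime = item.get("mime_type")
        return isinstance(payload, str) and bool(payload) and isinstance(mime, str)
    return False


# A bare base64 string is rejected; only the dict shapes pass.
print(is_valid_image_item({"url": "https://example.com/cat.png"}))
print(is_valid_image_item({"b64_json": "aGVsbG8=", "mime_type": "image/png"}))
print(is_valid_image_item("aGVsbG8="))
```

A check along these lines could sit in a provider's test suite so that a client returning a raw string or an unwrapped list fails early.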

Why this matters

The image generation pipeline had undocumented contracts that led to bugs (e.g., the OpenAI client stored base64 in GeneratorOutput.data instead of images, so base64 leaked into the user-visible display). This doc captures the correct architecture so future provider integrations follow the right pattern.
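
The base64 leak described above can be illustrated with a small sketch. `SketchOutput` is a stand-in with only the two fields mentioned here, not the real GeneratorOutput class, and `parse_buggy`, `parse_fixed`, and `render_for_user` are hypothetical names chosen for illustration; the actual before/after code lives in the design doc.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SketchOutput:
    """Stand-in for GeneratorOutput (assumption: only data and images matter here)."""
    data: Optional[str] = None
    images: List[Dict[str, str]] = field(default_factory=list)


def parse_buggy(b64_items: List[str]) -> SketchOutput:
    # Bug pattern: the base64 payload lands in data, which display code shows as text.
    return SketchOutput(data=b64_items[0])


def parse_fixed(b64_items: List[str], mime_type: str = "image/png") -> SketchOutput:
    # Fixed pattern: payloads go into images using the standard dict format.
    images = [{"b64_json": b, "mime_type": mime_type} for b in b64_items]
    return SketchOutput(images=images)


def render_for_user(out: SketchOutput) -> str:
    # Hypothetical display step: prints data verbatim and only counts images.
    text = out.data or ""
    return f"{text}[{len(out.images)} image(s)]"


payload = "aGVsbG8="
print(render_for_user(parse_buggy([payload])))  # base64 is visible to the user
print(render_for_user(parse_fixed([payload])))  # only the image count is shown
```

The point of the sketch is the separation of concerns: text meant for display stays in `data`, while binary payloads travel in `images`, so a display layer never has to guess whether a string is prose or base64.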


🌸 Generated with AdaL

Comprehensive design doc covering:
- Correct ModelClient → Generator → GeneratorOutput architecture
- GeneratorOutput contract for images (b64_json/url dict format)
- What each ModelClient must implement for IMAGE_GENERATION
- Why ImageModelAdapter is redundant and migration path
- OpenAI gpt-image-2 API reference (params, sizes, costs, gotchas)
- Common bugs catalog (base64 leak, wrong field, format issues)
- Testing checklist for new image providers

Co-Authored-By: AdaL <adal@sylph.ai>