Book Illustration Engine

Manuscript in. Print-ready, character-consistent, style-locked illustrations out. An end-to-end pipeline that turns a .docx or .pdf and a single style reference into a finished set of 4K, background-removed PNGs — fully autonomous, fully resumable, fully judged for quality.

See it work

One style reference. Three different scenes. Same look, same world, same fox.

_{Style reference (input)}

_{Scene 1 — fox at the vine (transparent PNG)}

_{Scene 2 — mid-leap (transparent PNG)}

_{Scene 3 — walking away (transparent PNG)}

These outputs come from bin/gen-scene.sh running on the bundled examples/fox-and-grapes/ sample. Same engine you'll run.

Why this exists

Every children's book project hits the same wall. You can generate one beautiful AI illustration in fifteen seconds. You can't generate forty consistent ones without losing your mind.

Style drifts between scenes. Characters look like cousins, not the same person. The "same fox from page 4" turns up on page 12 with different fur, eyes, posture. You spend more time prompt-fixing than illustrating.

This engine solves the consistency problem programmatically:

Style is locked by feeding the same reference image into every generation via Nano Banana Pro's edit-mode style transfer.
Characters are locked by training a FAL FLUX LoRA per recurring character from auto-generated reference sheets.
Quality is gated by Claude Vision auto-judging every output against five criteria, with up to two retries before flagging.
Cost is bounded by a budget cap that halts the run before overrun.

The result: you upload a manuscript, watch the plan, hit "go," and come back to a folder of 4K transparent PNGs ready for layout.

What it does

Takes two inputs:

A manuscript (.docx or .pdf) with inline Illustration: … markers
A style reference image (any PNG/JPEG)

And does, in order:

Stage	What it does	Tool
Parse	Convert the manuscript to numbered plain text	`textutil` (mac) / `pdftotext`
Extract briefs	Find every `Illustration:` marker, classify each as hero scene / icon grid / cross-reference	bash + grep
Build character graph	Identify recurring characters across scenes, classify recurring vs one-off	bash + Claude reasoning
Train LoRAs	For each recurring character: auto-generate 5–8 reference images, submit to FAL for FLUX LoRA training	Nano Banana Pro + FAL
Smart route	Per scene: FAL+LoRA (recurring characters) or Nano Banana Pro (one-offs / anthology)	Pipeline router
Generate	Produce each scene with the chosen path; auto-judge against 5 criteria; retry up to 2x	FAL FLUX or NB Pro + Claude Vision
Upscale	All scenes to 4096px	FAL Real-ESRGAN
Remove backgrounds	Transparent PNGs with smart fallback chain	FAL birefnet → local `bgproof`
Report	Per-book workspace with final PNGs, proof composites, cost ledger, resumable manifest	bash

The pipeline is resumable. A crash, a timeout, a Ctrl-C — re-running the engine picks up exactly where it left off, only spending money on the work that wasn't already done.

How it works

flowchart TD
    A[Manuscript .docx/.pdf] --> B[parse-doc.sh]
    S[Style reference image] --> B
    B --> C[Extract Illustration: briefs]
    C --> D[Build character graph]
    D --> E{Has recurring<br/>characters?}
    E -- yes --> F[gen-char-refs.sh<br/>5-8 refs per character]
    F --> G[train-lora.sh<br/>FAL FLUX LoRA]
    E -- no --> H[Skip LoRA training]
    G --> I[Plan + cost estimate]
    H --> I
    I --> J{User says go?}
    J -- yes --> K[Per-scene smart router]
    K --> L{Scene has<br/>recurring char?}
    L -- yes --> M[gen-scene-fal.sh<br/>FLUX + LoRA]
    L -- no --> N[gen-scene.sh<br/>NB Pro edit-mode]
    M --> O[Claude Vision auto-judge<br/>5 criteria]
    N --> O
    O --> P{Pass?}
    P -- no, retries left --> K
    P -- yes --> Q[upscale.sh<br/>FAL ESRGAN to 4K]
    Q --> R[bg-remove-fal.sh<br/>or local bgproof]
    R --> T[Final transparent PNG<br/>+ proof composite]

The smart router

For each scene, the router asks:

Does this scene contain a recurring character with a trained LoRA? → FAL FLUX + LoRA path (locks character identity).
Otherwise → Nano Banana Pro edit-mode path (locks style via the reference image).

You don't pick. The character graph picks for you, per scene.

The auto-judge

Every generated scene is sent back to Claude Vision and scored on five criteria:

Style match — does it look like the reference?
Subject coverage — is every required character/object present?
Character identity — do recurring characters actually look like themselves?
Composition — sane framing, nothing cropped weirdly
Technical quality — no extra limbs, no garbled text, no frame artifacts

Failing scenes get re-rolled up to twice. Persistent failures are flagged in the manifest, not silently shipped.

Install

As a Claude Code plugin (recommended)

/plugin marketplace add ayushnagvanshi101098-ship-it/book-illustration-engine
/plugin install book-illustration@ayush-plugins

Then in any Claude Code session:

Illustrate ~/Downloads/my-book.docx, style ref ~/Desktop/my-style.png.

The skill bootstraps a workspace, parses the manuscript, builds the character graph, trains LoRAs if needed, and shows you the full plan with a cost estimate. Nothing spends money until you say "go."

Standalone bash

git clone https://github.com/ayushnagvanshi101098-ship-it/book-illustration-engine.git
cd book-illustration-engine
cp .env.example .env
# fill in FAL_KEY, GEMINI_API_KEY (and ANTHROPIC_API_KEY if using Claude orchestration)
chmod +x bin/*.sh

The bin/ scripts compose into the full pipeline. See examples/fox-and-grapes/README.md for a step-by-step walkthrough.

Quick start with the bundled sample

The repo ships with a ready-to-run public-domain Aesop fable so you can prove the engine works on your machine before pointing it at your own book.

# 1. Parse the bundled manuscript (no API call, no money)
bin/parse-doc.sh examples/fox-and-grapes/manuscript.docx -o /tmp/parsed.txt
grep -c "Illustration:" /tmp/parsed.txt   # → 3

# 2. Generate one scene from the bundled brief + style ref (~$0.13)
echo "A red fox stands on his hind legs in a sunlit orchard, head tilted up at a vine of fat purple grapes. Watercolour storybook style." > /tmp/brief-1.txt
bin/gen-scene.sh \
  --brief /tmp/brief-1.txt \
  --style-ref examples/fox-and-grapes/style-ref.png \
  --output /tmp/fox-scene-1.jpeg

# 3. Upscale to 4K (~$0.04)
bin/upscale.sh --input /tmp/fox-scene-1.jpeg --output /tmp/fox-scene-1-4k.png --scale 2

# 4. Remove the background (~$0.02)
bin/bg-remove-fal.sh --input /tmp/fox-scene-1-4k.png --output /tmp/fox-scene-1-final.png

If all four steps succeed, your install is good.

Cost

Per-operation pricing on default settings:

Operation	Tool	Approx cost
Parse manuscript	`textutil` / `pdftotext`	free
Generate scene (NB Pro 2K)	Nano Banana Pro	~$0.13
Generate scene (FAL FLUX + LoRA)	FAL FLUX	~$0.04
Train one character LoRA	FAL LoRA training	~$2.00
Generate 5–8 character ref images	Nano Banana Pro	~$0.65 – $1.04
Upscale to 4K	FAL Real-ESRGAN	~$0.04
Remove background	FAL birefnet	~$0.02
Auto-judge (per scene)	Claude Vision	~$0.01

Typical book totals (default budget cap is $12):

Book type	Scenes	Recurring characters	Approx total
Anthology (one-off scenes)	15	0	~$1.70
Short narrative	20	1	~$5.00
Full picture book	30	2	~$8.50

The skill's pre-flight plan shows the exact estimate for your book before spending anything. Tell the skill at invocation time if you want to raise the cap (e.g. "…with a $20 budget").

Requirements

macOS or Linux, bash 4+
nano-banana CLI (Nano Banana Pro / Gemini 3 Pro Image) — install per its own docs
ImageMagick 7 (magick on PATH) — brew install imagemagick
pdftotext from poppler — brew install poppler
textutil (built-in on macOS) for .docx parsing
jq for JSON manipulation — brew install jq
A FAL account with an API key — sign up at https://fal.ai
A Gemini API key (for Nano Banana Pro) — https://aistudio.google.com/apikey
Optional: an Anthropic API key for Claude orchestration / auto-judge — https://console.anthropic.com

What makes this different

	Raw Midjourney / DALL-E	This engine
Character consistency across scenes	drift every gen	per-character LoRA
Style consistency across scenes	partial (--sref)	edit-mode style transfer
Quality gating	manual eyeball	automated 5-criteria judge
Crash recovery	re-run from scratch	resumable manifest
Cost visibility	post-hoc usage page	per-op ledger + budget cap
Output for print	screenshot the web UI	4K transparent PNGs out of the box
Workflow for 30 scenes	30 prompt sessions	one command

Configuration

Environment variables (set in .env or your shell):

Variable	Required	Default	Purpose
`FAL_KEY`	yes	—	FAL auth (LoRA, FLUX, ESRGAN, birefnet)
`GEMINI_API_KEY`	yes	—	Nano Banana Pro auth
`ANTHROPIC_API_KEY`	optional	—	Claude orchestration + auto-judge
`BOOK_ENGINE_HOME`	optional	`~/book-illustration-engine`	Engine install location

Roadmap

Things on the list, not yet shipped:

Cross-book LoRA reuse — reuse a trained character across multiple books (currently per-book isolated)
Icon grids and mini-posters — currently skipped with reason
Cross-reference resolution — "Same style as IF1" prompts (currently skipped)
Layout-aware composition — generate scenes pre-sized for facing pages, full bleeds, spot illustrations
Composed book PDF — drop the final PNGs into an InDesign-ready layout

PRs welcome. See docs/architecture.md for the design rationale.

How it works under the hood

For the full design — character graph algorithm, smart router decision tree, manifest schema, fallback chains — see docs/architecture.md.

For the orchestration logic that runs under Claude Code — including the per-step prompts, gates, and recovery behavior — see skills/book-illustration/SKILL.md.

License

MIT — see LICENSE. Use it commercially, fork it, ship it.

Author

Built by Ayush Nagvanshi — AI creative practitioner working on books, video, and brand at scale. Reach out on LinkedIn if you ship something with it.

If this saves you a weekend, a star is the cheapest way to say thanks.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Book Illustration Engine

See it work

Why this exists

What it does

How it works

The smart router

The auto-judge

Install

As a Claude Code plugin (recommended)

Standalone bash

Quick start with the bundled sample

Cost

Requirements

What makes this different

Configuration

Roadmap

How it works under the hood

License

Author

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.claude-plugin		.claude-plugin
bin		bin
docs		docs
examples/fox-and-grapes		examples/fox-and-grapes
skills/book-illustration		skills/book-illustration
.env.example		.env.example
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

Book Illustration Engine

See it work

Why this exists

What it does

How it works

The smart router

The auto-judge

Install

As a Claude Code plugin (recommended)

Standalone bash

Quick start with the bundled sample

Cost

Requirements

What makes this different

Configuration

Roadmap

How it works under the hood

License

Author

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages