diff --git a/README.md b/README.md
index bf13b7b..d4faddc 100644
--- a/README.md
+++ b/README.md
@@ -65,6 +65,7 @@ Embed AMD-optimized AI into end-user applications.
 | Skill | What it does |
 | --- | --- |
 | `local-ai-app-integration` | Add private, on-device AI to apps that use OpenAI, Anthropic, or Ollama APIs by bundling Embeddable Lemonade as a subprocess. |
+| `local-ai-use` | Make agents default to local image generation, text-to-speech, and speech-to-text through Lemonade Server, cutting token and cloud cost; cloud is used only as an explicit fallback. |
 
 ### Cross-stack porting
 
@@ -226,7 +227,7 @@ See [AUTHORING.md](AUTHORING.md) for the full authoring guide, including when a
 
 ## Status
 
-This repository is in its early days. The first in-repo skill, `skills/local-ai-app-integration/`, is available now and seeds the **Application integration** focus area. The Hardware-native, Cross-stack porting, and Profiling and delivery focus areas are being built out incrementally alongside manifests and CI. Expect rapid iteration. File an issue if there is a workflow you want covered, or open a PR with a skill you have been wanting to share.
+This repository is in its early days. In-repo skills include `skills/local-ai-app-integration/` and `skills/local-ai-use/`, seeding the **Application integration** focus area. The Hardware-native, Cross-stack porting, and Profiling and delivery focus areas are being built out incrementally alongside manifests and CI. Expect rapid iteration. File an issue if there is a workflow you want covered, or open a PR with a skill you have been wanting to share.
 
 ## License
 
diff --git a/skills/local-ai-use/SKILL.md b/skills/local-ai-use/SKILL.md
new file mode 100644
index 0000000..62c6ae8
--- /dev/null
+++ b/skills/local-ai-use/SKILL.md
@@ -0,0 +1,256 @@
+---
+name: local-ai-use
+description: >-
+  Routes image generation, text-to-speech, and speech-to-text through a local
+  Lemonade Server so the agent stops paying for cloud calls on those modalities.
+  Use when the user wants to save tokens, save cost, or save money using local
+  AI; default to local, offline, on-device, or private image generation,
+  transcription, or text-to-speech in this workspace; stop using DALL-E,
+  Whisper-as-a-service, ElevenLabs, or other paid multimodal APIs; route the
+  agent's image, TTS, or STT tool calls to a local model; or mentions Lemonade
+  Server, OmniRouter, SD-Turbo, kokoro, Whisper, Ryzen AI, NPU/iGPU/dGPU
+  inference, or "use local for images but cloud for chat". Run once per
+  workspace; the rule it installs handles every later request.
+---
+
+# Local AI Use (route image, TTS, STT through Lemonade)
+
+This is a **meta-skill**. You run it once. After that, every later request that
+needs image generation, text-to-speech, or speech-to-text uses the local
+[Lemonade Server](https://lemonade-server.ai) instead of a cloud API. The
+agent's own LLM keeps handling text; only the expensive multimodal calls move
+on-device.
+
+The skill does two things:
+
+1. **Verifies that local Lemonade is reachable and has the right models.**
+2. **Drops a `Local AI Use` block into the workspace `AGENTS.md`** so the agent
+   reads the routing rule on every later turn. Agents that read `AGENTS.md`
+   (Cursor and Codex, for example) pick it up directly; Step 3 lists the files
+   to mirror it to for Claude Code and Gemini CLI.
+
+## When to use this skill
+
+Use this skill when **all** of the following are true:
+
+- The user has, or is willing to install, the system-wide Lemonade Server.
+- The user accepts the default Lemonade endpoint `http://localhost:13305`.
+- The user wants the change to be **persistent** across future turns and
+  agent restarts (the rule is written to disk).
+
+If the user is instead **embedding** Lemonade as a private subprocess inside
+an app installer, do not use this skill; use `local-ai-app-integration`
+instead.
+
+## Prerequisites
+
+- **OS:** Windows 11 x64, Ubuntu/Debian x64, or macOS (beta).
+- **Lemonade Server CLI on `PATH`:** verify with `lemonade --version`. If
+  missing, install from <https://lemonade-server.ai/install_options.html>
+  before continuing. Do not silently install on the user's machine; that is a
+  system-wide change and must be the user's call.
+- **Disk:** ~8 GB free for the three default models (SD-Turbo, Whisper-Tiny,
+  and kokoro-v1).
+- **Network:** required for the first `lemonade pull` of each model. After
+  that, every modality runs offline.
+
+## The opinionated path
+
+Run this checklist top to bottom. Track progress against it, and do not move
+on until the current step passes its check.
+
+```
+[ ] 1. Confirm Lemonade Server is installed and reachable
+[ ] 2. Pull the three default modality models
+[ ] 3. Install the routing rule into the workspace AGENTS.md
+[ ] 4. Smoke-test image, TTS, and STT against the local endpoint
+```
+
+The single command that does steps 1, 2, and 3 in one shot is:
+
+```bash
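+python scripts/setup_local_ai.py --workspace /path/to/your/workspace
+```
+
+Run it from this skill's folder and pass `--workspace` (the directory that
+should hold `AGENTS.md`). Without it, the script writes `AGENTS.md` into the
+current directory, which here would be the skill folder itself. The script is
+idempotent: re-running it on a fully configured workspace is a no-op apart
+from a healthcheck. Read the sections below for what to do when each step
+fails.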
+
+---
+
+## Step 1: confirm Lemonade Server is reachable
+
+Run:
+
+```bash
+lemonade status --json
+```
+
+Two expected outcomes:
+
+| `lemonade status` says | Action |
+|---|---|
+| `Server is running on port 13305` | Continue to Step 2. |
+| `Server is not running` | Start it. On Windows, launch the **Lemonade** Start Menu shortcut. On Linux, run `sudo systemctl start lemonade-server`. Re-check `lemonade status`. |
+
+If `lemonade` is not on `PATH` at all, the server is not installed. Stop and
+point the user at <https://lemonade-server.ai/install_options.html>. Do not
+attempt a silent install.
+
+The rest of this skill assumes the endpoint is `http://localhost:13305/api/v1`
+and no API key is required (the system-wide server defaults to no auth on
+loopback). If the user has set `LEMONADE_API_KEY`, the routing rule template
+in `templates/local-ai-rule.md` shows where to add the `Authorization` header.
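+
+If you would rather probe readiness from Python than parse CLI output, a
+minimal sketch follows. It assumes the default host and port above and polls
+the same `/api/v1/health` endpoint that `setup_local_ai.py` uses; the function
+name `is_server_up` is illustrative, not part of Lemonade.
+
+```python
+import urllib.error
+import urllib.request
+
+
+def is_server_up(host: str = "127.0.0.1", port: int = 13305,
+                 timeout_s: float = 3.0) -> bool:
+    """Return True if Lemonade's health endpoint answers HTTP 200 in time."""
+    url = f"http://{host}:{port}/api/v1/health"
+    try:
+        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
+            return resp.status == 200
+    except (urllib.error.URLError, OSError):
+        # Connection refused, timeout, DNS failure, or an HTTP error status
+        # (HTTPError subclasses URLError) all mean "not ready".
+        return False
+```
+
+Call it in a short retry loop right after starting the service, since the
+server may need a moment before it accepts connections.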
+
+## Step 2: pull the three default modality models
+
+Pull these three. They are the **Lite Collection** defaults from Lemonade
+OmniRouter, sized to keep token-and-cost savings real on commodity hardware:
+
+| Modality | Model | Size | Why this default |
+|---|---|---|---|
+| Image generation | `SD-Turbo` | ~5 GB | Few-step generation (the rule uses 4 steps); runs on CPU and AMD iGPU/dGPU |
+| Text-to-speech | `kokoro-v1` | ~0.3 GB | Only TTS model Lemonade currently supports; CPU-only, low latency |
+| Speech-to-text | `Whisper-Tiny` | ~0.1 GB | Smallest Whisper; fast on CPU. Upgrade to `Whisper-Large-v3-Turbo` if accuracy matters more than latency. |
+
+```bash
+lemonade pull SD-Turbo
+lemonade pull kokoro-v1
+lemonade pull Whisper-Tiny
+```
+
+Each `pull` is idempotent. To verify what is already downloaded:
+
+```bash
+lemonade list --downloaded
+```
+
+For larger or higher-quality alternatives (`SDXL-Turbo`, `Flux-2-Klein-4B`,
+`Whisper-Large-v3-Turbo`), see the
+[model picker in reference.md](reference.md#model-picker).
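+
+To decide which `lemonade pull` commands are still needed, you can compare
+the default trio against `GET /api/v1/models`. This sketch assumes the
+response is an OpenAI-style list whose entries carry an `id` and a boolean
+`downloaded` field, which is the shape the HTTP fallback in
+`setup_local_ai.py` expects; verify that against your server before relying
+on it. `missing_models` is a hypothetical helper.
+
+```python
+import json
+
+DEFAULT_MODELS = ("SD-Turbo", "kokoro-v1", "Whisper-Tiny")
+
+
+def missing_models(models_json: str,
+                   wanted: tuple[str, ...] = DEFAULT_MODELS) -> list[str]:
+    """Return the wanted model IDs the server does not report as downloaded."""
+    payload = json.loads(models_json)
+    downloaded = {
+        m.get("id") for m in payload.get("data", [])
+        if isinstance(m, dict) and m.get("downloaded")
+    }
+    # Preserve the order of `wanted` so pulls run smallest-risk-last as listed.
+    return [m for m in wanted if m not in downloaded]
+```
+
+Pass each returned ID to `lemonade pull`; an empty list means this step is
+already done.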
+
+## Step 3: install the routing rule into AGENTS.md
+
+The rule is a Markdown block stored in [`templates/local-ai-rule.md`](templates/local-ai-rule.md).
+Append it to the workspace's `AGENTS.md` (create the file if missing). Agents
+that read `AGENTS.md` load it on every turn, so they see the rule on their
+next message without any further setup. For agents that use a different
+file, see the mirroring list below.
+
+`scripts/setup_local_ai.py` does this for you. It wraps the block in stable
+markers so that re-running the script replaces the block in place rather
+than appending a second copy. The markers look like:
+
+```
+<!-- BEGIN amd-skills:local-ai-use -->
+...rule...
+<!-- END amd-skills:local-ai-use -->
+```
+
+If you write the file by hand, keep those exact markers. The script relies
+on them for idempotent updates.
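+
+Hand edits are the usual way those markers break. A hypothetical pre-flight
+check like the one below classifies an `AGENTS.md` before you re-run the
+script: `missing` means the script will append a fresh block, `ok` means it
+will replace the block in place, and `broken` means you should repair the
+markers by hand first.
+
+```python
+BEGIN = "<!-- BEGIN amd-skills:local-ai-use -->"
+END = "<!-- END amd-skills:local-ai-use -->"
+
+
+def rule_block_status(agents_md: str) -> str:
+    """Classify AGENTS.md text as 'missing', 'ok', or 'broken'."""
+    begins, ends = agents_md.count(BEGIN), agents_md.count(END)
+    if begins == 0 and ends == 0:
+        return "missing"
+    # Exactly one pair, in the right order, is the only safe state to splice.
+    if begins == 1 and ends == 1 and agents_md.index(BEGIN) < agents_md.index(END):
+        return "ok"
+    return "broken"
+```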
+
+If the user's agent only respects a different convention, mirror the same
+block to:
+
+- `CLAUDE.md` (Claude Code, project-scoped) or `~/.claude/CLAUDE.md` (global)
+- `.cursor/rules/local-ai-use.mdc` (Cursor project rules)
+- `GEMINI.md` (Gemini CLI)
+
+The rule's content is identical; only the file location changes.
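+
+Copying the block by hand works, but a small script keeps the mirrors
+consistent. The sketch below (hypothetical helper names, standard library
+only) extracts the managed block from `AGENTS.md` and appends it to each
+target file that does not already contain one; to refresh an existing
+mirror, delete its old block first.
+
+```python
+from pathlib import Path
+
+BEGIN = "<!-- BEGIN amd-skills:local-ai-use -->"
+END = "<!-- END amd-skills:local-ai-use -->"
+
+
+def extract_block(agents_md: str) -> str:
+    """Return the managed block, markers included, from AGENTS.md text."""
+    start = agents_md.index(BEGIN)  # raises ValueError if the block is absent
+    stop = agents_md.index(END, start) + len(END)
+    return agents_md[start:stop] + "\n"
+
+
+def add_block(current: str, block: str) -> str:
+    """Append `block` to `current` unless a managed block is already there."""
+    if BEGIN in current:
+        return current
+    return (current.rstrip() + "\n\n" if current.strip() else "") + block
+
+
+def mirror_rule(workspace: Path, targets=("CLAUDE.md", "GEMINI.md")) -> None:
+    """Copy the AGENTS.md block into each target file under `workspace`."""
+    block = extract_block((workspace / "AGENTS.md").read_text(encoding="utf-8"))
+    for name in targets:
+        path = workspace / name
+        path.parent.mkdir(parents=True, exist_ok=True)  # e.g. .cursor/rules/
+        current = path.read_text(encoding="utf-8") if path.exists() else ""
+        path.write_text(add_block(current, block), encoding="utf-8")
+```
+
+For Cursor, pass the rules path explicitly, for example
+`mirror_rule(Path("."), targets=(".cursor/rules/local-ai-use.mdc",))`.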
+
+## Step 4: smoke-test the three modalities
+
+Verify each modality against the live server before declaring success. These
+mirror the inline patterns in the installed rule, so a green pass here means
+the rule will work.
+
+**Image generation** (writes `out.png`):
+
+```bash
+curl -sX POST http://localhost:13305/api/v1/images/generations \
+  -H "Content-Type: application/json" \
+  -d '{"model":"SD-Turbo","prompt":"a single red apple on a white table","size":"512x512","steps":4,"response_format":"b64_json"}' \
+  | python -c "import sys,json,base64; open('out.png','wb').write(base64.b64decode(json.load(sys.stdin)['data'][0]['b64_json']))"
+```
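+
+The same call from Python, as a minimal standard-library sketch. Unlike the
+`curl` pipeline, `urlopen` raises `HTTPError` on a 4xx or 5xx response, so a
+failed generation surfaces as an exception instead of a confusing decode
+error. The helper names are illustrative; the request fields mirror the
+`curl` body above.
+
+```python
+import base64
+import json
+import urllib.request
+
+BASE_URL = "http://localhost:13305/api/v1"
+
+
+def image_request(prompt: str, model: str = "SD-Turbo", size: str = "512x512",
+                  steps: int = 4) -> urllib.request.Request:
+    """Build the same POST the curl example sends."""
+    body = {"model": model, "prompt": prompt, "size": size, "steps": steps,
+            "response_format": "b64_json"}
+    return urllib.request.Request(
+        f"{BASE_URL}/images/generations",
+        data=json.dumps(body).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+def decode_image(response_body: bytes) -> bytes:
+    """Extract the image bytes from an OpenAI-style b64_json response."""
+    return base64.b64decode(json.loads(response_body)["data"][0]["b64_json"])
+
+
+def generate_image(prompt: str, out_path: str = "out.png") -> None:
+    # Generous timeout: CPU-only generation can take several minutes.
+    with urllib.request.urlopen(image_request(prompt), timeout=600) as resp:
+        image = decode_image(resp.read())
+    with open(out_path, "wb") as f:
+        f.write(image)
+```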
+
+**Text-to-speech** (writes `out.mp3`):
+
+```bash
+curl -sX POST http://localhost:13305/api/v1/audio/speech \
+  -H "Content-Type: application/json" \
+  -d '{"model":"kokoro-v1","input":"Local AI is now active.","response_format":"mp3"}' \
+  -o out.mp3
+```
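+
+`curl -o` saves whatever the server returns, so an error can land in
+`out.mp3` as a small JSON document. A cheap sanity check, sketched below as a
+heuristic rather than a real MP3 parser, is to look for an ID3 tag or an MPEG
+frame-sync at the start of the file. Both function names are hypothetical.
+
+```python
+from pathlib import Path
+
+
+def looks_like_mp3(data: bytes) -> bool:
+    """Heuristic: ID3v2 tag, or MPEG audio frame sync (11 set bits) at byte 0."""
+    if data[:3] == b"ID3":
+        return True
+    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0
+
+
+def check_speech_output(path: str = "out.mp3") -> None:
+    data = Path(path).read_bytes()
+    if not looks_like_mp3(data):
+        # Show the start of the body; a JSON error message usually explains why.
+        raise RuntimeError(f"{path} is not MP3 audio: {data[:200]!r}")
+```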
+
+**Speech-to-text** (round-trips `out.mp3` → text via a WAV re-encode; requires `ffmpeg` on `PATH`):
+
+```bash
+ffmpeg -y -i out.mp3 -ar 16000 -ac 1 out.wav
+curl -sX POST http://localhost:13305/api/v1/audio/transcriptions \
+  -F "file=@out.wav" -F "model=Whisper-Tiny"
+```
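+
+Python's standard library has no multipart encoder, so a Python client for
+this endpoint has to build the `multipart/form-data` body itself (or use a
+third-party HTTP library). A minimal sketch follows. It assumes the response
+is the OpenAI-style JSON object with a `text` field; `encode_multipart` and
+`transcribe` are hypothetical helpers.
+
+```python
+import json
+import urllib.request
+import uuid
+from pathlib import Path
+
+BASE_URL = "http://localhost:13305/api/v1"
+
+
+def encode_multipart(fields: dict[str, str], file_field: str, filename: str,
+                     file_bytes: bytes) -> tuple[bytes, str]:
+    """Build a multipart/form-data body; return (body, Content-Type header)."""
+    # A random boundary makes a collision with the payload vanishingly unlikely.
+    boundary = uuid.uuid4().hex
+    parts = [
+        (f'--{boundary}\r\nContent-Disposition: form-data; '
+         f'name="{name}"\r\n\r\n{value}\r\n').encode()
+        for name, value in fields.items()
+    ]
+    parts.append(
+        (f'--{boundary}\r\nContent-Disposition: form-data; '
+         f'name="{file_field}"; filename="{filename}"\r\n'
+         'Content-Type: audio/wav\r\n\r\n').encode()
+        + file_bytes + b"\r\n"
+    )
+    parts.append(f"--{boundary}--\r\n".encode())
+    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
+
+
+def transcribe(wav_path: str, model: str = "Whisper-Tiny") -> str:
+    path = Path(wav_path)
+    body, content_type = encode_multipart({"model": model}, "file", path.name,
+                                          path.read_bytes())
+    req = urllib.request.Request(f"{BASE_URL}/audio/transcriptions", data=body,
+                                 headers={"Content-Type": content_type},
+                                 method="POST")
+    with urllib.request.urlopen(req, timeout=300) as resp:
+        return json.loads(resp.read())["text"]  # assumed OpenAI-style response
+```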
+
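+If any of the three fails, fix it now. With `curl -s`, an HTTP error does not
+stop the command, so check the results: the image decode must not raise,
+`out.mp3` must contain audio rather than a JSON error body, and the
+transcription must return text. The rule you just installed sends future
+requests to these same endpoints, so a broken endpoint becomes a broken user
+experience.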
+
+---
+
+## What changes after this skill runs
+
+From the next turn onward, the agent reads the rule in `AGENTS.md` on every
+message. The rule explicitly tells the agent:
+
+- **For image generation:** call `POST /api/v1/images/generations` on the
+  local server. Do **not** call any cloud image API and do **not** use the
+  built-in `GenerateImage` tool (that path bills tokens to the cloud
+  provider).
+- **For text-to-speech:** call `POST /api/v1/audio/speech`. Do **not** call
+  cloud TTS providers (OpenAI TTS, ElevenLabs, etc.).
+- **For speech-to-text:** call `POST /api/v1/audio/transcriptions`. Do
+  **not** call cloud transcription providers.
+- **Fallback:** only fall back to a cloud API after one local attempt has
+  failed *and* the user has been told the local call failed. Never
+  silently fall back; the whole point of this skill is to keep cost
+  predictable.
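+
+The fallback rule amounts to a small control-flow pattern: one local attempt,
+a visible notice on failure, and only then the cloud path. The sketch below
+captures that pattern with plain callables; `local_first` and its parameters
+are hypothetical, and how `notify` reaches the user depends on the agent.
+
+```python
+from typing import Callable, Optional, TypeVar
+
+T = TypeVar("T")
+
+
+def local_first(local: Callable[[], T], notify: Callable[[str], None],
+                cloud: Optional[Callable[[], T]] = None) -> T:
+    """Run one local attempt; on failure tell the user, then maybe use cloud."""
+    try:
+        return local()  # exactly one attempt, no silent retry loop
+    except Exception as exc:
+        notify(f"Local Lemonade call failed: {exc}")
+        if cloud is None:
+            raise  # no cloud path available: surface the local error
+        return cloud()
+```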
+
+The agent's own text reasoning continues to use whatever LLM Cursor / Claude
+Code / Codex is configured with. This skill does not redirect chat tokens;
+it only redirects the multimodal calls that would otherwise leave the
+machine.
+
+## Troubleshooting cheatsheet
+
+| Symptom | Cause | Recovery |
+|---|---|---|
+| `lemonade: command not found` | Server CLI not installed | Install from <https://lemonade-server.ai/install_options.html>; restart shell. |
+| `Server is not running` | Service stopped after install | Windows: launch the **Lemonade** Start Menu shortcut. Linux: `sudo systemctl start lemonade-server`. |
+| `POST /v1/images/generations` returns 404 model not found | Image model not downloaded | `lemonade pull SD-Turbo` and retry. |
+| Image generation is slow on CPU (~4–5 min) | sd-cpp on CPU backend | On supported AMD hardware, install the GPU backend with `lemonade backends install sd-cpp:rocm`, then select it with `lemonade config set sdcpp_backend rocm`. |
+| `POST /v1/audio/transcriptions` returns 400 unsupported format | Input is not 16 kHz mono WAV | Re-encode with `ffmpeg -i in.* -ar 16000 -ac 1 out.wav`. |
+| `POST /v1/audio/speech` returns 404 | TTS model not downloaded | `lemonade pull kokoro-v1`. |
+| 401 Unauthorized on every request | User has set `LEMONADE_API_KEY` | Add `Authorization: Bearer $LEMONADE_API_KEY` to every request and to the rule block. |
+
+## Verification checklist
+
+Mark this skill complete only when **all** of the following are true:
+
+- [ ] `lemonade status --json` reports the server running on port 13305.
+- [ ] `lemonade list --downloaded` shows `SD-Turbo`, `kokoro-v1`, and
+      `Whisper-Tiny`.
+- [ ] The workspace `AGENTS.md` contains the
+      `amd-skills:local-ai-use` block.
+- [ ] All three smoke tests in Step 4 succeed.
+- [ ] On a follow-up turn, asking the agent to "generate an image of X"
+      causes it to POST to `http://localhost:13305/api/v1/images/generations`
+      rather than calling a cloud tool.
+
+If any box is unchecked, the user is still paying cloud cost for at least
+one modality.
+
+---
+
+## Reference
+
+For the full model picker, alternate-quality options, the complete endpoint
+reference, the API-key flow, and the OmniRouter tool definitions you can
+hand to an agent's tool-calling loop, see [reference.md](reference.md).
diff --git a/skills/local-ai-use/reference.md b/skills/local-ai-use/reference.md
new file mode 100644
index 0000000..69e6775
--- /dev/null
+++ b/skills/local-ai-use/reference.md
@@ -0,0 +1,241 @@
+# Local AI Use: Reference
+
+Detailed reference for the `local-ai-use` skill. Read this only when the
+default path in `SKILL.md` doesn't cover a decision.
+
+## Contents
+
+- [Model picker](#model-picker)
+- [Endpoint reference](#endpoint-reference)
+- [API key handling](#api-key-handling)
+- [Hardware-accelerated backends](#hardware-accelerated-backends)
+- [OmniRouter tool definitions](#omnirouter-tool-definitions)
+- [Re-pointing the rule at a remote host](#re-pointing-the-rule-at-a-remote-host)
+- [Removing the rule](#removing-the-rule)
+
+---
+
+## Model picker
+
+The default trio (`SD-Turbo`, `kokoro-v1`, `Whisper-Tiny`) is sized so the
+cost savings hold on a typical laptop. Override it only if the user asks for
+higher quality or has hardware to spare.
+
+### Image generation (`recipe: sd-cpp`)
+
+| Model | Approx size | When to use | Trade-off |
+|---|---|---|---|
+| `SD-Turbo` | ~5 GB | **Default.** General-purpose, few-step generation (the rule uses 4 steps). | Lower fidelity than SDXL. |
+| `SDXL-Turbo` | ~6.9 GB | When the user notices quality issues with SD-Turbo. | Larger model, slower on CPU. |
+| `SD-1.5` | ~4 GB | When the user asks for "Stable Diffusion 1.5" by name. | Needs more steps (~20). |
+| `Flux-2-Klein-4B` | ~4 GB | Image **editing** (`/v1/images/edits`). | Editing-capable, slower than SD-Turbo for plain generation. |
+
+To upgrade: `lemonade pull <model>`, then change `"model"` in the rule
+block in `AGENTS.md` to the new model id.
+
+### Text-to-speech (`recipe: kokoro`)
+
+| Model | Approx size | When to use |
+|---|---|---|
+| `kokoro-v1` | ~0.3 GB | **Default and only supported model today.** CPU-only, low latency. |
+
+Voices: `shimmer` (default), the other OpenAI voice names (`alloy`, `ash`,
+`ballad`, `coral`, `echo`, `fable`, `nova`, `onyx`, `sage`, `verse`), and the
+kokoro-native voices (`af_sky`, `am_echo`, `bf_emma`, `bm_george`, ...).
+Pass `voice` in the request body to override.
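+
+A request body for a non-default voice might be assembled as below. This is
+a sketch: `speech_body` is a hypothetical helper, and it encodes the output
+formats and the streaming restriction listed in the endpoint quirks
+(`stream_format: "audio"` emits only `pcm`).
+
+```python
+import json
+
+SPEECH_FORMATS = {"mp3", "wav", "opus", "pcm"}
+
+
+def speech_body(text: str, voice: str = "shimmer", response_format: str = "mp3",
+                stream: bool = False) -> str:
+    """Return a JSON body for POST /api/v1/audio/speech."""
+    if response_format not in SPEECH_FORMATS:
+        raise ValueError(f"unsupported response_format: {response_format}")
+    body = {"model": "kokoro-v1", "input": text, "voice": voice,
+            "response_format": response_format}
+    if stream:
+        if response_format != "pcm":
+            raise ValueError("streaming emits pcm only")
+        body["stream_format"] = "audio"
+    return json.dumps(body)
+```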
+
+### Speech-to-text (`recipe: whispercpp`)
+
+| Model | Approx size | When to use |
+|---|---|---|
+| `Whisper-Tiny` | ~0.1 GB | **Default.** English-only fast path; great for short clips and meeting notes. |
+| `Whisper-Base` | ~0.3 GB | Slightly better accuracy, still tiny. |
+| `Whisper-Small` | ~1.0 GB | Multilingual, modest CPU cost. |
+| `Whisper-Large-v3-Turbo` | ~1.6 GB | Highest accuracy of these options, at higher latency; recommended when accuracy matters. |
+
+Whisper requires 16 kHz mono PCM WAV input. Convert anything else first:
+
+```bash
+ffmpeg -i input.mp3 -ar 16000 -ac 1 input.wav
+```
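+
+Before uploading, you can confirm a file already meets that requirement with
+Python's `wave` module. The sketch assumes "PCM" here means 16-bit samples
+(what `ffmpeg` writes for `.wav` by default); relax the `getsampwidth()`
+check if your server accepts other widths. `whisper_ready` is a hypothetical
+helper.
+
+```python
+import wave
+from typing import BinaryIO, Union
+
+
+def whisper_ready(source: Union[str, BinaryIO]) -> bool:
+    """True if `source` is 16 kHz, mono, 16-bit PCM WAV (assumed requirement)."""
+    try:
+        with wave.open(source, "rb") as w:
+            return (w.getframerate() == 16000
+                    and w.getnchannels() == 1
+                    and w.getsampwidth() == 2)
+    except (wave.Error, EOFError):
+        # Not RIFF/WAVE, or a compressed (non-PCM) WAV the module cannot read.
+        return False
+```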
+
+For full live coverage, run `lemonade list` after starting the server, or
+browse <https://lemonade-server.ai/models.html>.
+
+---
+
+## Endpoint reference
+
+The multimodal endpoints follow the OpenAI request and response shapes
+wherever OpenAI defines an equivalent, so OpenAI-compatible clients
+(`openai-python`, `openai-node`, `openai-dotnet`, `go-openai`, ...) work
+against them, within the limits listed in the per-endpoint quirks below.
+
+| Method | Path | Purpose | Backend |
+|---|---|---|---|
+| `POST` | `/api/v1/images/generations` | text → image (b64) | `sd-cpp` |
+| `POST` | `/api/v1/images/edits` | image + prompt → image | `sd-cpp` |
+| `POST` | `/api/v1/images/variations` | image → varied image | `sd-cpp` |
+| `POST` | `/api/v1/images/upscale` | image → upscaled image | ESRGAN |
+| `POST` | `/api/v1/audio/speech` | text → audio file | `kokoro` |
+| `POST` | `/api/v1/audio/transcriptions` | wav → text | `whispercpp` |
+| `WS`   | `/realtime` | streaming microphone → text | `whispercpp` |
+| `GET`  | `/api/v1/models` | list models (add `?show_all=true` for catalog) | n/a |
+| `GET`  | `/api/v1/health` | readiness probe | n/a |
+
+Notable per-endpoint quirks:
+
+- **`/v1/images/generations`**: only `n=1` and `response_format=b64_json`
+  are supported today. `size` defaults to `512x512`. The model's
+  `image_defaults` (steps / cfg_scale / width / height) returned by
+  `/v1/models/{id}` are the right values to use as your defaults.
+- **`/v1/images/edits`**: `multipart/form-data` (not JSON). `mask` is
+  optional; without one the entire image is the editable region.
+- **`/v1/audio/transcriptions`**: only `wav` input and `json` response are
+  supported today. Non-WAV input must be re-encoded with `ffmpeg`.
+- **`/v1/audio/speech`**: `mp3`, `wav`, `opus`, and `pcm` outputs supported.
+  Streaming requires `stream_format: "audio"`, which only emits `pcm`.
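+
+Using `image_defaults` could look like the sketch below. It assumes the
+`/v1/models/{id}` response carries the model `id` and an `image_defaults`
+object with the `steps`, `cfg_scale`, `width`, and `height` keys named above;
+`image_body` is a hypothetical helper that folds `width` and `height` into
+the OpenAI `size` string and lets explicit arguments win.
+
+```python
+def image_body(prompt: str, model_info: dict, **overrides) -> dict:
+    """Merge a model's image_defaults into an images/generations request."""
+    defaults = model_info.get("image_defaults", {})
+    body = {
+        "model": model_info["id"],
+        "prompt": prompt,
+        "n": 1,                         # only n=1 is supported today
+        "response_format": "b64_json",  # the only supported format
+        "size": "512x512",              # endpoint default if nothing else is set
+    }
+    for key in ("steps", "cfg_scale"):
+        if key in defaults:
+            body[key] = defaults[key]
+    if "width" in defaults and "height" in defaults:
+        body["size"] = f"{defaults['width']}x{defaults['height']}"
+    body.update(overrides)  # caller-supplied values take precedence
+    return body
+```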
+
+For the full parameter list of any endpoint, see `lemonade/docs/api/openai.md`
+upstream.
+
+---
+
+## API key handling
+
+The system-wide Lemonade Server defaults to **no auth** on `localhost`. If
+the user has set `LEMONADE_API_KEY` (rare for the system-wide flow,
+standard for the embeddable flow), every HTTP request must carry:
+
+```
+Authorization: Bearer ${LEMONADE_API_KEY}
+```
+
+Update both:
+
+1. The shell environment that the agent runs commands in
+   (`export LEMONADE_API_KEY=...`).
+2. The rule block in `AGENTS.md`. The template already mentions the header,
+   but add a sentence near the top of the rule that says "send
+   `Authorization: Bearer $LEMONADE_API_KEY` on every request" so the agent
+   cannot miss it.
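+
+For scripts that call the endpoints directly, a small helper keeps the
+header optional so the same code works with and without a key. This is a
+sketch; `lemonade_headers` is a hypothetical name.
+
+```python
+import os
+from typing import Optional
+
+
+def lemonade_headers(extra: Optional[dict] = None) -> dict:
+    """Copy `extra` and add a Bearer token if LEMONADE_API_KEY is set."""
+    headers = dict(extra or {})
+    key = os.environ.get("LEMONADE_API_KEY")
+    if key:
+        headers["Authorization"] = f"Bearer {key}"
+    return headers
+```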
+
+---
+
+## Hardware-accelerated backends
+
+Default install ships the broad-compatibility backends. To get GPU / NPU
+acceleration on supported AMD hardware:
+
+| Modality | Recipe | Faster backend | Install command |
+|---|---|---|---|
+| Image gen | `sd-cpp` | `rocm` (Strix Halo, RDNA3/4) | `lemonade backends install sd-cpp:rocm` |
+| LLM (separate skill) | `llamacpp` | `rocm` or `vulkan` | `lemonade backends install llamacpp:rocm` |
+| ASR | `whispercpp` | `npu` (XDNA2) | `lemonade backends install whispercpp:npu` |
+| TTS | `kokoro` | `cpu` only | n/a |
+
+After installing a backend, select it with `lemonade config set`:
+
+```bash
+lemonade config set sdcpp_backend rocm
+lemonade config set whispercpp_backend npu
+```
+
+These are persisted in `config.json` and apply on the next model load.
+
+---
+
+## OmniRouter tool definitions
+
+If the agent uses an OpenAI-style tool-calling loop (Continue, OpenHands,
+custom code) instead of plain HTTP, register the same endpoints as named
+tools so the LLM can pick them on its own. Lemonade publishes canonical
+tool schemas under `OmniRouter`; the minimum useful set for this skill is:
+
+```json
+[
+  {
+    "type": "function",
+    "function": {
+      "name": "generate_image",
+      "description": "Generate an image from a text prompt using local Lemonade Server.",
+      "parameters": {
+        "type": "object",
+        "properties": {
+          "prompt": {"type": "string"},
+          "size":   {"type": "string", "default": "512x512"},
+          "steps":  {"type": "integer", "default": 4}
+        },
+        "required": ["prompt"]
+      }
+    }
+  },
+  {
+    "type": "function",
+    "function": {
+      "name": "text_to_speech",
+      "description": "Speak the given text aloud using local Lemonade Server.",
+      "parameters": {
+        "type": "object",
+        "properties": {
+          "input": {"type": "string"},
+          "voice": {"type": "string", "default": "shimmer"}
+        },
+        "required": ["input"]
+      }
+    }
+  },
+  {
+    "type": "function",
+    "function": {
+      "name": "transcribe_audio",
+      "description": "Transcribe a WAV audio file using local Lemonade Server.",
+      "parameters": {
+        "type": "object",
+        "properties": {
+          "file_path": {"type": "string"},
+          "language":  {"type": "string"}
+        },
+        "required": ["file_path"]
+      }
+    }
+  }
+]
+```
+
+Map each `tool_call` to the corresponding endpoint:
+
+- `generate_image` → `POST /api/v1/images/generations`
+- `text_to_speech` → `POST /api/v1/audio/speech`
+- `transcribe_audio` → `POST /api/v1/audio/transcriptions`
+
+For the full canonical schema (including `edit_image` and `analyze_image`),
+read `examples/lemonade_tools.py` in the upstream lemonade-sdk repo.
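+
+The mapping can be a lookup table in the tool loop. In this sketch,
+`route_tool_call` is hypothetical, the `tool_call` argument follows the
+OpenAI chat-completions shape (`function.name` plus a JSON-encoded
+`function.arguments` string), and the default models are the trio from
+SKILL.md. `transcribe_audio` still needs its `file_path` read and sent as
+multipart form data, so the returned payload is only a starting point for
+that tool.
+
+```python
+import json
+
+BASE = "http://localhost:13305"
+ROUTES = {
+    "generate_image": ("/api/v1/images/generations", "SD-Turbo"),
+    "text_to_speech": ("/api/v1/audio/speech", "kokoro-v1"),
+    "transcribe_audio": ("/api/v1/audio/transcriptions", "Whisper-Tiny"),
+}
+
+
+def route_tool_call(tool_call: dict) -> tuple[str, dict]:
+    """Map an OpenAI-style tool_call to (endpoint URL, request payload)."""
+    name = tool_call["function"]["name"]
+    if name not in ROUTES:
+        raise ValueError(f"not a local multimodal tool: {name}")
+    path, model = ROUTES[name]
+    args = json.loads(tool_call["function"]["arguments"] or "{}")
+    payload = {"model": model, **args}
+    if name == "generate_image":
+        payload.setdefault("response_format", "b64_json")
+    return BASE + path, payload
+```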
+
+---
+
+## Re-pointing the rule at a remote host
+
+Lemonade can run on another machine (a workstation with a Ryzen AI NPU,
+say) while the agent runs on the laptop. To point this skill at it:
+
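+1. Set `LEMONADE_HOST` and `LEMONADE_PORT` (or pass `--host` / `--port` to
+   `setup_local_ai.py`) so the setup script health-checks the remote server.
+2. Re-run `python scripts/setup_local_ai.py`. The bundled rule template
+   hardcodes `http://localhost:13305`, so afterwards replace that URL inside
+   the `amd-skills:local-ai-use` block with the remote address. Re-running
+   the script later restores the localhost URL.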
+3. Make sure the remote server is bound to a non-loopback interface
+   (`lemonade config set host 0.0.0.0`) and that firewall rules allow
+   inbound 13305. Setting `host` to `0.0.0.0` exposes the server; pair it
+   with `LEMONADE_API_KEY` so it isn't open to the LAN.
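+
+If the block in `AGENTS.md` still points at `http://localhost:13305` (the
+bundled template hardcodes that URL), the swap can be scripted. A minimal
+sketch, assuming the marker strings from SKILL.md; `remote_base` and
+`repoint_rule` are hypothetical helpers, and `repoint_rule` edits only the
+text between the markers, leaving the rest of the file alone.
+
+```python
+import os
+
+BEGIN = "<!-- BEGIN amd-skills:local-ai-use -->"
+END = "<!-- END amd-skills:local-ai-use -->"
+LOCAL_BASE = "http://localhost:13305"
+
+
+def remote_base() -> str:
+    """Build the base URL from the same env vars the setup script reads."""
+    host = os.environ.get("LEMONADE_HOST", "localhost")
+    port = os.environ.get("LEMONADE_PORT", "13305")
+    return f"http://{host}:{port}"
+
+
+def repoint_rule(agents_md: str, base: str) -> str:
+    """Replace the loopback base URL inside the managed block only."""
+    start = agents_md.index(BEGIN)  # ValueError if the block is missing
+    stop = agents_md.index(END, start)
+    inner = agents_md[start:stop].replace(LOCAL_BASE, base)
+    return agents_md[:start] + inner + agents_md[stop:]
+```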
+
+---
+
+## Removing the rule
+
+To stop routing locally (e.g., the user wants cloud back), open the
+workspace `AGENTS.md` and delete everything between
+`<!-- BEGIN amd-skills:local-ai-use -->` and
+`<!-- END amd-skills:local-ai-use -->`. The agent picks up the change on
+its next turn.
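+
+To script the removal instead, a minimal sketch follows; `remove_rule` is a
+hypothetical helper that drops the block together with its markers. It
+refuses to guess when the END marker is missing, because cutting to the end
+of the file would delete unrelated instructions.
+
+```python
+BEGIN = "<!-- BEGIN amd-skills:local-ai-use -->"
+END = "<!-- END amd-skills:local-ai-use -->"
+
+
+def remove_rule(agents_md: str) -> str:
+    """Return AGENTS.md text with the managed block and its markers removed."""
+    before, found, rest = agents_md.partition(BEGIN)
+    if not found:
+        return agents_md  # nothing to remove
+    _, found_end, after = rest.partition(END)
+    if not found_end:
+        raise ValueError("BEGIN marker without END marker; fix AGENTS.md by hand")
+    head, tail = before.rstrip(), after.lstrip()
+    if head and tail:
+        return head + "\n\n" + tail  # keep one blank line between neighbors
+    return head + "\n" if head else tail
+```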
+
+The downloaded models stay on disk; remove them with `lemonade delete
+<model>` if you want the space back.
diff --git a/skills/local-ai-use/scripts/setup_local_ai.py b/skills/local-ai-use/scripts/setup_local_ai.py
new file mode 100644
index 0000000..552a3f6
--- /dev/null
+++ b/skills/local-ai-use/scripts/setup_local_ai.py
@@ -0,0 +1,266 @@
+#!/usr/bin/env -S uv run --quiet
+# /// script
+# requires-python = ">=3.10"
+# dependencies = []
+# ///
+"""One-shot setup for the `local-ai-use` skill.
+
+Performs setup steps 1-3 from SKILL.md (step 4, the smoke test, stays manual):
+
+  1. Confirms the system-wide Lemonade Server is installed and reachable on
+     http://localhost:13305 (override with --host / --port or LEMONADE_HOST /
+     LEMONADE_PORT).
+  2. Pulls the three default modality models if they are missing
+     (image: SD-Turbo, TTS: kokoro-v1, STT: Whisper-Tiny).
+  3. Writes the routing rule from `templates/local-ai-rule.md` into
+     <workspace>/AGENTS.md, between stable BEGIN/END markers so re-runs
+     replace the block in place rather than appending.
+
+The script is idempotent: a second run on a fully configured workspace only
+re-runs the healthcheck. It exits non-zero on any unrecoverable failure.
+
+Constants are documented inline; nothing is magical.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import shutil
+import subprocess
+import sys
+import urllib.error
+import urllib.request
+from pathlib import Path
+
+# Defaults match the system-wide Lemonade Server install. Both the CLI
+# (LEMONADE_HOST / LEMONADE_PORT) and the OpenAI-compatible HTTP endpoints
+# bind to these by default.
+DEFAULT_HOST = "127.0.0.1"
+DEFAULT_PORT = 13305
+
+# The Lite Collection from Lemonade OmniRouter. Picked because each is about
+# 5 GB or smaller and runs on commodity CPU hardware, so the savings vs. cloud
+# calls are real on a typical developer laptop. See the model picker in
+# reference.md for upgrade paths.
+DEFAULT_MODELS = ("SD-Turbo", "kokoro-v1", "Whisper-Tiny")
+
+# Stable markers around the rule block in AGENTS.md. The script rewrites the
+# region between these markers in place; do not change the marker strings or
+# every existing AGENTS.md will get a duplicate block on the next run.
+BEGIN_MARKER = "<!-- BEGIN amd-skills:local-ai-use -->"
+END_MARKER = "<!-- END amd-skills:local-ai-use -->"
+
+SKILL_DIR = Path(__file__).resolve().parent.parent
+RULE_TEMPLATE = SKILL_DIR / "templates" / "local-ai-rule.md"
+
+INSTALL_URL = "https://lemonade-server.ai/install_options.html"
+
+
+def _print(msg: str) -> None:
+    """Single-line, prefix-tagged status print so the agent's output stays parseable."""
+    print(f"[local-ai-use] {msg}", flush=True)
+
+
+def _http_get(url: str, timeout_s: float) -> tuple[int, bytes]:
+    req = urllib.request.Request(url)
+    with urllib.request.urlopen(req, timeout=timeout_s) as r:  # noqa: S310
+        return r.status, r.read()
+
+
+def check_cli_installed() -> bool:
+    """Return True if the `lemonade` CLI is on PATH."""
+    return shutil.which("lemonade") is not None
+
+
+def check_server_reachable(host: str, port: int) -> bool:
+    """Return True if /api/v1/health responds 200 within 3 seconds."""
+    url = f"http://{host}:{port}/api/v1/health"
+    try:
+        status, _ = _http_get(url, timeout_s=3.0)
+        return status == 200
+    except (urllib.error.URLError, OSError):
+        return False
+
+
+def list_downloaded_models() -> set[str]:
+    """Return the set of locally downloaded model IDs.
+
+    Uses `lemonade list --downloaded` (CLI) and falls back to
+    GET /api/v1/models when the CLI lacks the flag. Returning an empty set is
+    treated as "could not determine" by the caller, which still attempts the
+    pulls; `lemonade pull` is itself idempotent.
+    """
+    try:
+        out = subprocess.run(
+            ["lemonade", "list", "--downloaded", "--json"],
+            check=True, capture_output=True, text=True, timeout=10,
+        ).stdout
+        data = json.loads(out)
+        return {m.get("id", "") for m in data if isinstance(m, dict)}
+    except (subprocess.SubprocessError, json.JSONDecodeError, FileNotFoundError):
+        pass
+
+    try:
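+        # Honor the same LEMONADE_HOST / LEMONADE_PORT overrides the CLI uses
+        # instead of hardcoding the default endpoint.
+        host = os.environ.get("LEMONADE_HOST", DEFAULT_HOST)
+        port = os.environ.get("LEMONADE_PORT", str(DEFAULT_PORT))
+        status, body = _http_get(f"http://{host}:{port}/api/v1/models", timeout_s=5)
+        if status == 200:
+            data = json.loads(body)
+            return {
+                m.get("id", "") for m in data.get("data", [])
+                if isinstance(m, dict) and m.get("downloaded")
+            }
+    # AttributeError covers a response body that is a bare list, not an object.
+    except (urllib.error.URLError, OSError, json.JSONDecodeError, AttributeError):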
+        pass
+
+    return set()
+
+
+def pull_model(model: str) -> bool:
+    """Run `lemonade pull <model>`. Returns True on success."""
+    _print(f"pulling {model}...")
+    try:
+        subprocess.run(
+            ["lemonade", "pull", model],
+            check=True,
+            # Stream output so the user sees the download progress instead of
+            # staring at a frozen prompt; SD-Turbo is several GB.
+            stdout=None, stderr=None,
+            # SD-Turbo is the largest pull at ~5 GB. 30 minutes is generous
+            # for a slow connection; below that we'd false-positive on real
+            # downloads.
+            timeout=30 * 60,
+        )
+        return True
+    except subprocess.CalledProcessError as exc:
+        _print(f"pull failed for {model} (exit {exc.returncode})")
+        return False
+    except subprocess.TimeoutExpired:
+        _print(f"pull timed out for {model} after 30 minutes")
+        return False
+
+
+def render_rule_block() -> str:
+    """Read the rule template; pass through unchanged.
+
+    The template already includes BEGIN/END markers and matches the constants
+    at the top of this file. We re-validate that here so a future template
+    edit cannot silently drift away from the markers the writer relies on.
+    """
+    if not RULE_TEMPLATE.exists():
+        raise FileNotFoundError(
+            f"Rule template missing: {RULE_TEMPLATE}. "
+            "Did the skill folder get partially copied?"
+        )
+    text = RULE_TEMPLATE.read_text(encoding="utf-8")
+    if BEGIN_MARKER not in text or END_MARKER not in text:
+        raise ValueError(
+            "Rule template is missing the BEGIN/END markers; refuse to write "
+            "AGENTS.md because re-runs would append duplicate blocks."
+        )
+    return text.strip() + "\n"
+
+
+def upsert_agents_md(workspace: Path) -> Path:
+    """Write or replace the rule block inside <workspace>/AGENTS.md."""
+    target = workspace / "AGENTS.md"
+    block = render_rule_block()
+
+    if not target.exists():
+        target.write_text(
+            "# Agent instructions\n\n"
+            "Project-scoped rules for AGENTS.md-aware coding agents\n"
+            "(Cursor, Codex, and others).\n\n"
+            f"{block}",
+            encoding="utf-8",
+        )
+        _print(f"created {target}")
+        return target
+
+    existing = target.read_text(encoding="utf-8")
+    if BEGIN_MARKER in existing and END_MARKER in existing:
+        before, _, rest = existing.partition(BEGIN_MARKER)
+        _, _, after = rest.partition(END_MARKER)
+        # Normalize whitespace around the spliced region so re-runs neither
+        # accumulate blank lines nor glue the block to neighboring content.
+        head, tail = before.rstrip(), after.lstrip()
+        new = (head + "\n\n" if head else "") + block + ("\n" + tail if tail else "")
+        if new == existing:
+            _print(f"AGENTS.md rule already up to date at {target}")
+            return target
+        target.write_text(new, encoding="utf-8")
+        _print(f"updated rule block in {target}")
+        return target
+
+    # No existing block: append with a separating blank line.
+    if not existing.endswith("\n"):
+        existing += "\n"
+    target.write_text(existing + "\n" + block, encoding="utf-8")
+    _print(f"appended rule block to {target}")
+    return target
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "--workspace",
+        type=Path,
+        default=Path.cwd(),
+        help="Workspace root where AGENTS.md should be written (default: cwd).",
+    )
+    parser.add_argument(
+        "--host",
+        default=os.environ.get("LEMONADE_HOST", DEFAULT_HOST),
+        help="Lemonade Server host (default: 127.0.0.1 / $LEMONADE_HOST).",
+    )
+    parser.add_argument(
+        "--port",
+        type=int,
+        default=int(os.environ.get("LEMONADE_PORT", str(DEFAULT_PORT))),
+        help="Lemonade Server port (default: 13305 / $LEMONADE_PORT).",
+    )
+    parser.add_argument(
+        "--skip-pull",
+        action="store_true",
+        help="Do not pull missing models; just verify and write AGENTS.md.",
+    )
+    args = parser.parse_args(argv)
+
+    if not check_cli_installed():
+        _print("FAIL: `lemonade` is not on PATH.")
+        _print(f"Install Lemonade Server first: {INSTALL_URL}")
+        return 2
+
+    if not check_server_reachable(args.host, args.port):
+        _print(
+            f"FAIL: Lemonade Server is not responding at "
+            f"http://{args.host}:{args.port}/api/v1/health."
+        )
+        _print(
+            "Start it: on Windows launch the Lemonade Start Menu shortcut; "
+            "on Linux run `sudo systemctl start lemonade-server`."
+        )
+        return 3
+
+    _print(f"server reachable at http://{args.host}:{args.port}")
+
+    if not args.skip_pull:
+        downloaded = list_downloaded_models()
+        for model in DEFAULT_MODELS:
+            if model in downloaded:
+                _print(f"already downloaded: {model}")
+                continue
+            if not pull_model(model):
+                # Surface the failure but keep going so the user at least gets
+                # the rule installed for the modalities that did succeed.
+                _print(
+                    f"continuing without {model}; the rule will reference it "
+                    "but calls will 404 until you pull it."
+                )
+
+    upsert_agents_md(args.workspace.resolve())
+    _print("done. Future image, TTS, and STT requests now route to local Lemonade.")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/skills/local-ai-use/templates/local-ai-rule.md b/skills/local-ai-use/templates/local-ai-rule.md
new file mode 100644
index 0000000..75c75e6
--- /dev/null
+++ b/skills/local-ai-use/templates/local-ai-rule.md
@@ -0,0 +1,89 @@
+<!-- BEGIN amd-skills:local-ai-use -->
+## Local AI Use (managed by amd-skills/local-ai-use)
+
+This workspace runs a local [Lemonade Server](https://lemonade-server.ai) at
+`http://localhost:13305/api/v1` to handle image generation, text-to-speech,
+and speech-to-text. Cloud APIs cost real money and tokens for these
+modalities, so use the local endpoints below for **every** such request
+unless the user explicitly asks for a cloud provider.
+
+### Routing rules
+
+| If the user asks you to... | Do this | Do not do this |
+|---|---|---|
+| Generate, draw, or create an image | `POST http://localhost:13305/api/v1/images/generations` | Use the built-in `GenerateImage` tool, or call DALL-E, Midjourney, cloud-hosted Stable Diffusion, or any other paid image API. |
+| Edit or modify an existing image | `POST http://localhost:13305/api/v1/images/edits` (multipart) | Cloud image-edit APIs. |
+| Speak text aloud or generate audio narration | `POST http://localhost:13305/api/v1/audio/speech` | Call OpenAI TTS, ElevenLabs, PlayHT, or any other cloud TTS. |
+| Transcribe an audio file | `POST http://localhost:13305/api/v1/audio/transcriptions` (multipart) | Call OpenAI's hosted Whisper API, AssemblyAI, Deepgram, or any other cloud STT. |
+
+Plain text chat, code generation, and reasoning continue to use the agent's
+configured LLM. This rule only redirects the multimodal calls.
+
+### Defaults to use
+
+| Endpoint | Model | Notes |
+|---|---|---|
+| `/api/v1/images/generations` | `SD-Turbo` | 4 steps, `cfg_scale: 1.0`, `512x512`, `response_format: "b64_json"`. |
+| `/api/v1/audio/speech` | `kokoro-v1` | Default voice `shimmer`; `response_format: "mp3"`. |
+| `/api/v1/audio/transcriptions` | `Whisper-Tiny` | Input must be 16 kHz mono WAV; re-encode with `ffmpeg -i INPUT_AUDIO -ar 16000 -ac 1 out.wav`. |
+
+If `LEMONADE_API_KEY` is set in the environment, send
+`Authorization: Bearer $LEMONADE_API_KEY` on every request. Otherwise the
+loopback server accepts unauthenticated calls.
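+
+A minimal Python sketch of that header rule, for scripts that call the
+endpoints directly instead of through the OpenAI SDK. `lemonade_headers` is a
+hypothetical helper name; pass `content_type=None` for the multipart
+`/audio/transcriptions` and `/images/edits` uploads so the HTTP client can set
+its own boundary.
+
+```python
+import os
+
+
+def lemonade_headers(content_type="application/json"):
+    """Return request headers for the local Lemonade Server (hypothetical helper)."""
+    headers = {}
+    if content_type:
+        headers["Content-Type"] = content_type
+    # Only send a Bearer token when LEMONADE_API_KEY is set to a non-empty value.
+    api_key = os.environ.get("LEMONADE_API_KEY")
+    if api_key:
+        headers["Authorization"] = f"Bearer {api_key}"
+    return headers
+```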
+
+### Ready-to-use call patterns
+
+**Image generation** (saves to `out.png`):
+
+```bash
+curl -sX POST http://localhost:13305/api/v1/images/generations \
+  -H "Content-Type: application/json" \
+  -d '{"model":"SD-Turbo","prompt":"PROMPT_HERE","size":"512x512","steps":4,"cfg_scale":1.0,"response_format":"b64_json"}' \
+  | python -c "import sys,json,base64; open('out.png','wb').write(base64.b64decode(json.load(sys.stdin)['data'][0]['b64_json']))"
+```
+
+Equivalent Python via the OpenAI SDK:
+
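+```python
+import base64
+import os
+from openai import OpenAI
+# Sends LEMONADE_API_KEY as the Bearer token when set; "lemonade" is only a placeholder.
+client = OpenAI(base_url="http://localhost:13305/api/v1", api_key=os.environ.get("LEMONADE_API_KEY", "lemonade"))
+r = client.images.generate(model="SD-Turbo", prompt="PROMPT_HERE", size="512x512",
+                           response_format="b64_json", extra_body={"steps": 4, "cfg_scale": 1.0})
+with open("out.png", "wb") as f:
+    f.write(base64.b64decode(r.data[0].b64_json))
+```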
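+
+Without the `openai` package, the same request can be built with the Python
+standard library. This is a sketch that assumes the response has the same
+`data[0].b64_json` shape the curl example decodes; `image_request` and
+`save_first_image` are hypothetical helper names, not part of Lemonade.
+
+```python
+import base64
+import json
+import urllib.request
+
+
+def image_request(prompt, base_url="http://localhost:13305/api/v1"):
+    """Build (but do not send) a POST using the image defaults from the table above."""
+    payload = {
+        "model": "SD-Turbo",
+        "prompt": prompt,
+        "size": "512x512",
+        "steps": 4,
+        "cfg_scale": 1.0,
+        "response_format": "b64_json",
+    }
+    return urllib.request.Request(
+        f"{base_url}/images/generations",
+        data=json.dumps(payload).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+def save_first_image(response_body, path="out.png"):
+    """Decode data[0].b64_json from a JSON response body and write it to path."""
+    image_b64 = json.loads(response_body)["data"][0]["b64_json"]
+    with open(path, "wb") as f:
+        f.write(base64.b64decode(image_b64))
+    return path
+
+
+# Usage, against a running server:
+# with urllib.request.urlopen(image_request("PROMPT_HERE")) as resp:
+#     save_first_image(resp.read())
+```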
+
+**Text-to-speech** (saves to `out.mp3`):
+
+```bash
+curl -sX POST http://localhost:13305/api/v1/audio/speech \
+  -H "Content-Type: application/json" \
+  -d '{"model":"kokoro-v1","input":"TEXT_HERE","voice":"shimmer","response_format":"mp3"}' \
+  -o out.mp3
+```
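+
+The same call with only the Python standard library, as a sketch using the
+`kokoro-v1` defaults above. Unlike the image endpoint, the response body is the
+audio itself, so it is written to disk as-is rather than JSON-decoded.
+`speech_request` is a hypothetical helper name.
+
+```python
+import json
+import urllib.request
+
+
+def speech_request(text, voice="shimmer", base_url="http://localhost:13305/api/v1"):
+    """Build (but do not send) a text-to-speech POST that returns MP3 bytes."""
+    payload = {"model": "kokoro-v1", "input": text, "voice": voice, "response_format": "mp3"}
+    return urllib.request.Request(
+        f"{base_url}/audio/speech",
+        data=json.dumps(payload).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+# Usage, against a running server:
+# with urllib.request.urlopen(speech_request("TEXT_HERE")) as resp, open("out.mp3", "wb") as f:
+#     f.write(resp.read())
+```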
+
+**Speech-to-text** (returns JSON `{"text": "..."}`):
+
+```bash
+ffmpeg -y -i INPUT_AUDIO -ar 16000 -ac 1 _stt.wav
+curl -sX POST http://localhost:13305/api/v1/audio/transcriptions \
+  -F "file=@_stt.wav" -F "model=Whisper-Tiny"
+```
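+
+When the audio comes from Python code rather than a shell, the re-encode step
+can be wrapped as below. This sketch assumes `ffmpeg` is on `PATH`;
+`ffmpeg_16k_mono_args` and `to_16k_mono_wav` are hypothetical helper names. The
+resulting file can then be uploaded with the curl call above or, assuming the
+server accepts the SDK's multipart upload the same way, with the OpenAI SDK's
+`client.audio.transcriptions.create(model="Whisper-Tiny", file=...)`.
+
+```python
+import subprocess
+
+
+def ffmpeg_16k_mono_args(src, dst="_stt.wav"):
+    """Build the ffmpeg argv that re-encodes any audio input to 16 kHz mono WAV."""
+    return ["ffmpeg", "-y", "-i", str(src), "-ar", "16000", "-ac", "1", str(dst)]
+
+
+def to_16k_mono_wav(src, dst="_stt.wav"):
+    """Run ffmpeg and return dst; raises CalledProcessError on a non-zero exit."""
+    subprocess.run(ffmpeg_16k_mono_args(src, dst), check=True, capture_output=True)
+    return dst
+```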
+
+### Failure handling
+
+1. Try the local endpoint exactly once.
+2. If the server is unreachable, run `lemonade status` and surface the
+   result to the user before doing anything else.
+3. If the model is missing, run `lemonade pull <model>` and retry once.
+4. Only after that, ask the user before falling back to a cloud provider.
+   Never silently fall back; the whole point of this rule is predictable
+   cost.
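+
+The steps above amount to a small control-flow policy. This sketch assumes the
+caller's `call` raises one of two hypothetical exceptions, `ServerUnreachable`
+or `ModelMissing`, and that `run_cli` (runs a command and returns its output as
+text) and `tell_user` are callbacks supplied by the agent; none of these names
+come from Lemonade. It returns `None` whenever the user has to decide, so no
+cloud fallback can happen inside it.
+
+```python
+class ServerUnreachable(Exception):
+    """Hypothetical: the local server did not answer."""
+
+
+class ModelMissing(Exception):
+    """Hypothetical: the requested model has not been pulled."""
+
+
+def call_local_first(call, model, run_cli, tell_user):
+    try:
+        return call()  # Step 1: one attempt against the local endpoint.
+    except ServerUnreachable:
+        # Step 2: surface `lemonade status` before doing anything else.
+        tell_user("Lemonade is unreachable. `lemonade status` says:\n"
+                  + run_cli(["lemonade", "status"]))
+    except ModelMissing:
+        # Step 3: pull the model, then retry exactly once.
+        run_cli(["lemonade", "pull", model])
+        try:
+            return call()
+        except (ServerUnreachable, ModelMissing) as exc:
+            tell_user(f"Retry after `lemonade pull {model}` failed: {exc!r}")
+    # Step 4: ask explicitly; never fall back to a cloud provider silently.
+    tell_user("Local call failed. Use a cloud provider instead? (awaiting approval)")
+    return None
+```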
+
+### Re-pointing to a different host
+
+If the user runs Lemonade on a different host or port, replace the
+`http://localhost:13305` prefix everywhere above with their endpoint, and
+update `LEMONADE_HOST` / `LEMONADE_PORT` in the shell environment so the
+`lemonade` CLI matches.
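+
+Scripts can read the same variables instead of hard-coding the prefix. A
+sketch, assuming `LEMONADE_HOST` holds a bare hostname or IPv4 address and
+`LEMONADE_PORT` a number; `lemonade_base_url` is a hypothetical helper name.
+
+```python
+import os
+
+
+def lemonade_base_url(default_host="localhost", default_port=13305):
+    """Build the API base URL, preferring LEMONADE_HOST / LEMONADE_PORT when set."""
+    host = os.environ.get("LEMONADE_HOST", default_host)
+    port = int(os.environ.get("LEMONADE_PORT", default_port))
+    return f"http://{host}:{port}/api/v1"
+```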
+
+<!-- END amd-skills:local-ai-use -->