amd · danielholanda · May 4, 2026 · May 4, 2026
diff --git a/README.md b/README.md
@@ -65,6 +65,7 @@ Embed AMD-optimized AI into end-user applications.
 | Skill | What it does |
 | --- | --- |
 | `local-ai-app-integration` | Add private, on-device AI to apps that use OpenAI, Anthropic, or Ollama APIs by bundling Embeddable Lemonade as a subprocess. |
+| `local-ai-use` | Apply a Lemonade-first strategy so agents default to local image generation, text-to-speech, and speech-to-text to reduce token/cost usage before any cloud fallback. |
 
 ### Cross-stack porting
 
@@ -226,7 +227,7 @@ See [AUTHORING.md](AUTHORING.md) for the full authoring guide, including when a
 
 ## Status
 
-This repository is in its early days. The first in-repo skill, `skills/local-ai-app-integration/`, is available now and seeds the **Application integration** focus area. The Hardware-native, Cross-stack porting, and Profiling and delivery focus areas are being built out incrementally alongside manifests and CI. Expect rapid iteration. File an issue if there is a workflow you want covered, or open a PR with a skill you have been wanting to share.
+This repository is in its early days. In-repo skills include `skills/local-ai-app-integration/` and `skills/local-ai-use/`, seeding the **Application integration** focus area. The Hardware-native, Cross-stack porting, and Profiling and delivery focus areas are being built out incrementally alongside manifests and CI. Expect rapid iteration. File an issue if there is a workflow you want covered, or open a PR with a skill you have been wanting to share.
 
 ## License
 

diff --git a/skills/local-ai-use/SKILL.md b/skills/local-ai-use/SKILL.md
@@ -0,0 +1,256 @@
+---
+name: local-ai-use
+description: >-
+  Routes image generation, text-to-speech, and speech-to-text through a local
+  Lemonade Server so the agent stops paying for cloud calls on those modalities.
+  Use when the user wants to save tokens, save cost, or save money using local
+  AI; default to local, offline, on-device, or private image generation,
+  transcription, or text-to-speech in this workspace; stop using DALL-E,
+  Whisper-as-a-service, ElevenLabs, or other paid multimodal APIs; route the
+  agent's image, TTS, or STT tool calls to a local model; or mentions Lemonade
+  Server, OmniRouter, SD-Turbo, kokoro, Whisper, Ryzen AI, NPU/iGPU/dGPU
+  inference, or "use local for images but cloud for chat". Run once per
+  workspace; the rule it installs handles every later request.
+---
+
+# Local AI Use (route image, TTS, STT through Lemonade)
+
+This is a **meta-skill**. You run it once. After that, every later request that
+needs image generation, text-to-speech, or speech-to-text uses the local
+[Lemonade Server](https://lemonade-server.ai) instead of a cloud API. The
+agent's own LLM keeps handling text; only the expensive multimodal calls move
+on-device.
+
+The skill does two things:
+
+1. **Verifies that local Lemonade is reachable and has the right models.**
+2. **Drops a `Local AI Use` block into the workspace `AGENTS.md`** so the agent
+   reads the routing rule on every later turn, in Cursor, Claude Code, Codex,
+   Gemini CLI, and any other agent that respects `AGENTS.md`.
+
+## When to use this skill
+
+Use this skill when **all** of the following are true:
+
+- The user has, or is willing to install, the system-wide Lemonade Server.
+- The user accepts the default Lemonade endpoint `http://localhost:13305`.
+- The user wants the change to be **persistent** across future turns and
+  agent restarts (the rule is written to disk).
+
+If the user is instead **embedding** Lemonade as a private subprocess inside
+an app installer, do not use this skill; use `local-ai-app-integration`
+instead.
+
+## Prerequisites
+
+- **OS:** Windows 11 x64, Ubuntu/Debian x64, or macOS (beta).
+- **Lemonade Server CLI on `PATH`:** verify with `lemonade --version`. If
+  missing, install from <https://lemonade-server.ai/install_options.html>
+  before continuing. Do not silently install on the user's machine; that is a
+  system-wide change and must be the user's call.
+- **Disk:** ~8 GB free for the three default models (SD-Turbo + Whisper-Tiny
+  + kokoro-v1).
+- **Network:** required for the first `lemonade pull` of each model. After
+  that, every modality runs offline.
+
+## The opinionated path
+
+Run this checklist top to bottom. Track progress against it; do not move on
+until each step verifies.
+
+```
+[ ] 1. Confirm Lemonade Server is installed and reachable
+[ ] 2. Pull the three default modality models
+[ ] 3. Install the routing rule into the workspace AGENTS.md
+[ ] 4. Smoke-test image, TTS, and STT against the local endpoint
+```
+
+The single command that does steps 1, 2, and 3 in one shot is:
+
+```bash
+python scripts/setup_local_ai.py
+```
+
+(Run from this skill's folder.) The script is idempotent: re-running it on a
+fully configured workspace is a no-op apart from a healthcheck. Read the
+sections below for what to do when each step fails.
+
+---
+
+## Step 1: confirm Lemonade Server is reachable
+
+Run:
+
+```bash
+lemonade status --json
+```
+
+Two acceptable outcomes:
+
+| `lemonade status` says | Action |
+|---|---|
+| `Server is running on port 13305` | Continue to Step 2. |
+| `Server is not running` | Start it. On Windows, launch the **Lemonade** Start Menu shortcut. On Linux, run `sudo systemctl start lemonade-server`. Re-check `lemonade status`. |
+
+If `lemonade` is not on `PATH` at all, the server is not installed. Stop and
+point the user at <https://lemonade-server.ai/install_options.html>. Do not
+attempt a silent install.
+
+The rest of this skill assumes the endpoint is `http://localhost:13305/api/v1`
+and no API key is required (the system-wide server defaults to no auth on
+loopback). If the user has set `LEMONADE_API_KEY`, the routing rule template
+in `templates/local-ai-rule.md` shows where to add the `Authorization` header.
+
+## Step 2: pull the three default modality models
+
+Pull these three. They are the **Lite Collection** defaults from Lemonade
+OmniRouter, sized to keep token-and-cost savings real on commodity hardware:
+
+| Modality | Model | Size | Why this default |
+|---|---|---|---|
+| Image generation | `SD-Turbo` | ~5 GB | Single-step generation, runs on CPU and AMD iGPU/dGPU |
+| Text-to-speech | `kokoro-v1` | ~0.3 GB | Only TTS model Lemonade currently supports; CPU-only, low latency |
+| Speech-to-text | `Whisper-Tiny` | ~0.1 GB | Smallest Whisper; fast on CPU. Upgrade to `Whisper-Large-v3-Turbo` if accuracy matters more than latency. |
+
+```bash
+lemonade pull SD-Turbo
+lemonade pull kokoro-v1
+lemonade pull Whisper-Tiny
+```
+
+Each `pull` is idempotent. To verify what is already downloaded:
+
+```bash
+lemonade list --downloaded
+```
+
+For coverage of larger / higher-quality alternatives (`SDXL-Turbo`,
+`Flux-2-Klein-4B`, `Whisper-Large-v3-Turbo`), see the
+[model picker in reference.md](reference.md#model-picker).
+
+## Step 3: install the routing rule into AGENTS.md
+
+The rule is a Markdown block stored in [`templates/local-ai-rule.md`](templates/local-ai-rule.md).
+Append it to the workspace's `AGENTS.md` (create the file if missing). Both
+Cursor and Claude Code load `AGENTS.md` automatically on every turn, so the
+agent will see the rule on its next message without any further setup.
+
+`scripts/setup_local_ai.py` does this for you, surrounded by stable markers
+so re-running the script replaces the block in place rather than appending
+a second copy. The markers look like:
+
+```
+<!-- BEGIN amd-skills:local-ai-use -->
+...rule...
+<!-- END amd-skills:local-ai-use -->
+```
+
+If you write the file by hand, keep those exact markers. The script relies
+on them for idempotent updates.
+
+If the user's agent only respects a different convention, mirror the same
+block to:
+
+- `CLAUDE.md` (Claude Code, project-scoped) or `~/.claude/CLAUDE.md` (global)
+- `.cursor/rules/local-ai-use.mdc` (Cursor user/project rules)
+- `GEMINI.md` (Gemini CLI)
+
+The rule's content is identical; only the file location changes.
+
+## Step 4: smoke-test the three modalities
+
+Verify each modality against the live server before declaring success. These
+mirror the inline patterns in the installed rule, so a green pass here means
+the rule will work.
+
+**Image generation** (writes `out.png`):
+
+```bash
+curl -sX POST http://localhost:13305/api/v1/images/generations \
+  -H "Content-Type: application/json" \
+  -d '{"model":"SD-Turbo","prompt":"a single red apple on a white table","size":"512x512","steps":4,"response_format":"b64_json"}' \
+  | python -c "import sys,json,base64; open('out.png','wb').write(base64.b64decode(json.load(sys.stdin)['data'][0]['b64_json']))"
+```
+
+**Text-to-speech** (writes `out.mp3`):
+
+```bash
+curl -sX POST http://localhost:13305/api/v1/audio/speech \
+  -H "Content-Type: application/json" \
+  -d '{"model":"kokoro-v1","input":"Local AI is now active.","response_format":"mp3"}' \
+  -o out.mp3
+```
+
+**Speech-to-text** (round-trips `out.mp3` → text via a wav re-encode):
+
+```bash
+ffmpeg -y -i out.mp3 -ar 16000 -ac 1 out.wav
+curl -sX POST http://localhost:13305/api/v1/audio/transcriptions \
+  -F "file=@out.wav" -F "model=Whisper-Tiny"
+```
+
+If any of the three returns a non-2xx status, fix it now. The rule we just
+installed sends future requests to these same endpoints, so a broken endpoint
+becomes a broken user experience.
+
+---
+
+## What changes after this skill runs
+
+From the next turn onward, the agent reads the rule in `AGENTS.md` on every
+message. The rule explicitly tells the agent:
+
+- **For image generation:** call `POST /api/v1/images/generations` on the
+  local server. Do **not** call any cloud image API and do **not** use the
+  built-in `GenerateImage` tool (that path bills tokens to the cloud
+  provider).
+- **For text-to-speech:** call `POST /api/v1/audio/speech`. Do **not** call
+  cloud TTS providers (OpenAI TTS, ElevenLabs, etc.).
+- **For speech-to-text:** call `POST /api/v1/audio/transcriptions`. Do
+  **not** call cloud transcription providers.
+- **Fallback:** only fall back to a cloud API after one local attempt has
+  failed *and* the user has been told the local call failed. Never
+  silently fall back; the whole point of this skill is to keep cost
+  predictable.
+
+The agent's own text reasoning continues to use whatever LLM Cursor / Claude
+Code / Codex is configured with. This skill does not redirect chat tokens;
+it only redirects the multimodal calls that would otherwise leave the
+machine.
+
+## Troubleshooting cheatsheet
+
+| Symptom | Cause | Recovery |
+|---|---|---|
+| `lemonade: command not found` | Server CLI not installed | Install from <https://lemonade-server.ai/install_options.html>; restart shell. |
+| `Server is not running` | Service stopped after install | Windows: launch the **Lemonade** Start Menu shortcut. Linux: `sudo systemctl start lemonade-server`. |
+| `POST /v1/images/generations` returns 404 model not found | Image model not downloaded | `lemonade pull SD-Turbo` and retry. |
+| Image generation is slow on CPU (~4–5 min) | sd-cpp on CPU backend | Install the GPU backend on supported AMD hardware: `lemonade backends install sd-cpp:rocm`. |
+| `POST /v1/audio/transcriptions` returns 400 unsupported format | Input is not 16 kHz mono WAV | Re-encode with `ffmpeg -i in.* -ar 16000 -ac 1 out.wav`. |
+| `POST /v1/audio/speech` returns 404 | TTS model not downloaded | `lemonade pull kokoro-v1`. |
+| 401 Unauthorized on every request | User has set `LEMONADE_API_KEY` | Add `Authorization: Bearer $LEMONADE_API_KEY` to every request and to the rule block. |
+
+## Verification checklist
+
+Mark this skill complete only when **all** of the following are true:
+
+- [ ] `lemonade status --json` reports the server running on port 13305.
+- [ ] `lemonade list --downloaded` shows `SD-Turbo`, `kokoro-v1`, and
+      `Whisper-Tiny`.
+- [ ] The workspace `AGENTS.md` contains the
+      `amd-skills:local-ai-use` block.
+- [ ] All three smoke tests in Step 4 succeed.
+- [ ] On a follow-up turn, asking the agent to "generate an image of X"
+      causes it to POST to `http://localhost:13305/api/v1/images/generations`
+      rather than calling a cloud tool.
+
+If any box is unchecked, the user is still paying cloud cost for at least
+one modality.
+
+---
+
+## Reference
+
+For the full model picker, alternate-quality options, the complete endpoint
+reference, the API-key flow, and the OmniRouter tool definitions you can
+hand to an agent's tool-calling loop, see [reference.md](reference.md).