| 1 | +--- |
| 2 | +name: local-ai-use |
| 3 | +description: >- |
| 4 | + Routes image generation, text-to-speech, and speech-to-text through a local |
| 5 | + Lemonade Server so the agent stops paying for cloud calls on those modalities. |
| 6 | + Use when the user wants to save tokens, save cost, or save money using local |
| 7 | + AI; default to local, offline, on-device, or private image generation, |
| 8 | + transcription, or text-to-speech in this workspace; stop using DALL-E, |
| 9 | + Whisper-as-a-service, ElevenLabs, or other paid multimodal APIs; route the |
| 10 | + agent's image, TTS, or STT tool calls to a local model; or mentions Lemonade |
| 11 | + Server, OmniRouter, SD-Turbo, kokoro, Whisper, Ryzen AI, NPU/iGPU/dGPU |
| 12 | + inference, or "use local for images but cloud for chat". Run once per |
| 13 | + workspace; the rule it installs handles every later request. |
| 14 | +--- |
| 15 | + |
| 16 | +# Local AI Use (route image, TTS, STT through Lemonade) |
| 17 | + |
| 18 | +This is a **meta-skill**. You run it once. After that, every later request that |
| 19 | +needs image generation, text-to-speech, or speech-to-text uses the local |
| 20 | +[Lemonade Server](https://lemonade-server.ai) instead of a cloud API. The |
| 21 | +agent's own LLM keeps handling text; only the expensive multimodal calls move |
| 22 | +on-device. |
| 23 | + |
| 24 | +The skill does two things: |
| 25 | + |
| 26 | +1. **Verifies that local Lemonade is reachable and has the right models.** |
| 27 | +2. **Drops a `Local AI Use` block into the workspace `AGENTS.md`** so the agent |
| 28 | + reads the routing rule on every later turn, in Cursor, Claude Code, Codex, |
| 29 | + Gemini CLI, and any other agent that respects `AGENTS.md`. |
| 30 | + |
| 31 | +## When to use this skill |
| 32 | + |
| 33 | +Use this skill when **all** of the following are true: |
| 34 | + |
| 35 | +- The user has, or is willing to install, the system-wide Lemonade Server. |
| 36 | +- The user accepts the default Lemonade endpoint `http://localhost:13305`. |
| 37 | +- The user wants the change to be **persistent** across future turns and |
| 38 | + agent restarts (the rule is written to disk). |
| 39 | + |
| 40 | +If the user is instead **embedding** Lemonade as a private subprocess inside |
| 41 | +an app installer, do not use this skill; use `local-ai-app-integration` |
| 42 | +instead. |
| 43 | + |
| 44 | +## Prerequisites |
| 45 | + |
| 46 | +- **OS:** Windows 11 x64, Ubuntu/Debian x64, or macOS (beta). |
| 47 | +- **Lemonade Server CLI on `PATH`:** verify with `lemonade --version`. If |
| 48 | + missing, install from <https://lemonade-server.ai/install_options.html> |
| 49 | + before continuing. Do not silently install on the user's machine; that is a |
| 50 | + system-wide change and must be the user's call. |
| 51 | +- **Disk:** ~8 GB free for the three default models (SD-Turbo + Whisper-Tiny |
| 52 | + + kokoro-v1). |
| 53 | +- **Network:** required for the first `lemonade pull` of each model. After |
| 54 | + that, every modality runs offline. |
| 55 | + |
| 56 | +## The opinionated path |
| 57 | + |
| 58 | +Run this checklist top to bottom. Track progress against it; do not move on |
| 59 | +until each step verifies. |
| 60 | + |
| 61 | +``` |
| 62 | +[ ] 1. Confirm Lemonade Server is installed and reachable |
| 63 | +[ ] 2. Pull the three default modality models |
| 64 | +[ ] 3. Install the routing rule into the workspace AGENTS.md |
| 65 | +[ ] 4. Smoke-test image, TTS, and STT against the local endpoint |
| 66 | +``` |
| 67 | + |
| 68 | +The single command that does steps 1, 2, and 3 in one shot is: |
| 69 | + |
| 70 | +```bash |
| 71 | +python scripts/setup_local_ai.py |
| 72 | +``` |
| 73 | + |
| 74 | +(Run from this skill's folder.) The script is idempotent: re-running it on a |
| 75 | +fully configured workspace is a no-op apart from a healthcheck. Read the |
| 76 | +sections below for what to do when each step fails. |
| 77 | + |
| 78 | +--- |
| 79 | + |
| 80 | +## Step 1: confirm Lemonade Server is reachable |
| 81 | + |
| 82 | +Run: |
| 83 | + |
| 84 | +```bash |
| 85 | +lemonade status --json |
| 86 | +``` |
| 87 | + |
| 88 | +Two possible outcomes: |
| 89 | + |
| 90 | +| `lemonade status` says | Action | |
| 91 | +|---|---| |
| 92 | +| `Server is running on port 13305` | Continue to Step 2. | |
| 93 | +| `Server is not running` | Start it. On Windows, launch the **Lemonade** Start Menu shortcut. On Linux, run `sudo systemctl start lemonade-server`. Re-check `lemonade status`. | |
| 94 | + |
| 95 | +If `lemonade` is not on `PATH` at all, the server is not installed. Stop and |
| 96 | +point the user at <https://lemonade-server.ai/install_options.html>. Do not |
| 97 | +attempt a silent install. |
| 98 | + |
| 99 | +The rest of this skill assumes the endpoint is `http://localhost:13305/api/v1` |
| 100 | +and no API key is required (the system-wide server defaults to no auth on |
| 101 | +loopback). If the user has set `LEMONADE_API_KEY`, the routing rule template |
| 102 | +in `templates/local-ai-rule.md` shows where to add the `Authorization` header. |
| 103 | + |
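The two assumptions above (default loopback port, optional API key) can be sketched in Python. This is a minimal illustration, not the code in `scripts/setup_local_ai.py`: the helper names `is_port_open` and `build_headers` are hypothetical, and a successful TCP connect only shows that something is listening on the port, so `lemonade status` remains the authoritative check.

```python
import os
import socket

BASE_URL = "http://localhost:13305/api/v1"  # default endpoint assumed by this skill


def is_port_open(host: str = "localhost", port: int = 13305, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_headers(env=None) -> dict:
    """Build JSON request headers; add a Bearer token only if LEMONADE_API_KEY is set."""
    env = os.environ if env is None else env
    headers = {"Content-Type": "application/json"}
    api_key = env.get("LEMONADE_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


if __name__ == "__main__":
    print("port open:", is_port_open())
    # Print header names only, so a real key never lands in a log.
    print("headers:", sorted(build_headers()))
```

Passing the environment as a parameter keeps `build_headers` easy to exercise without touching the real process environment.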
| 104 | +## Step 2: pull the three default modality models |
| 105 | + |
| 106 | +Pull these three. They are the **Lite Collection** defaults from Lemonade |
| 107 | +OmniRouter, sized to run on commodity hardware so the cost savings are real: |
| 108 | + |
| 109 | +| Modality | Model | Size | Why this default | |
| 110 | +|---|---|---|---| |
| 111 | +| Image generation | `SD-Turbo` | ~5 GB | Single-step generation, runs on CPU and AMD iGPU/dGPU | |
| 112 | +| Text-to-speech | `kokoro-v1` | ~0.3 GB | Only TTS model Lemonade currently supports; CPU-only, low latency | |
| 113 | +| Speech-to-text | `Whisper-Tiny` | ~0.1 GB | Smallest Whisper; fast on CPU. Upgrade to `Whisper-Large-v3-Turbo` if accuracy matters more than latency. | |
| 114 | + |
| 115 | +```bash |
| 116 | +lemonade pull SD-Turbo |
| 117 | +lemonade pull kokoro-v1 |
| 118 | +lemonade pull Whisper-Tiny |
| 119 | +``` |
| 120 | + |
| 121 | +Each `pull` is idempotent. To verify what is already downloaded: |
| 122 | + |
| 123 | +```bash |
| 124 | +lemonade list --downloaded |
| 125 | +``` |
| 126 | + |
| 127 | +For coverage of larger / higher-quality alternatives (`SDXL-Turbo`, |
| 128 | +`Flux-2-Klein-4B`, `Whisper-Large-v3-Turbo`), see the |
| 129 | +[model picker in reference.md](reference.md#model-picker). |
| 130 | + |
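To decide which pulls are still needed, the output of `lemonade list --downloaded` can be scanned for the three default names. The sketch below is hypothetical and assumes only that each downloaded model's name appears somewhere in the listing as a whole token; the real output format is not specified here, so adjust the matching if it differs.

```python
import re

DEFAULT_MODELS = ("SD-Turbo", "kokoro-v1", "Whisper-Tiny")


def missing_models(listing: str, required=DEFAULT_MODELS) -> list:
    """Return the required model names that do not appear as whole tokens in listing."""
    missing = []
    for name in required:
        # Reject matches embedded in a longer identifier (for example a
        # made-up "Whisper-Tiny.en"), so a near-miss is not counted as present.
        pattern = r"(?<![\w.-])" + re.escape(name) + r"(?![\w.-])"
        if not re.search(pattern, listing):
            missing.append(name)
    return missing


if __name__ == "__main__":
    sample = "SD-Turbo\nWhisper-Large-v3-Turbo\n"  # made-up listing, for illustration only
    for name in missing_models(sample):
        print(f"lemonade pull {name}")
```

Because each `pull` is idempotent, a false "missing" result from this check costs only a redundant command.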
| 131 | +## Step 3: install the routing rule into AGENTS.md |
| 132 | + |
| 133 | +The rule is a Markdown block stored in [`templates/local-ai-rule.md`](templates/local-ai-rule.md). |
| 134 | +Append it to the workspace's `AGENTS.md` (create the file if missing). Both |
| 135 | +Cursor and Claude Code load `AGENTS.md` automatically on every turn, so the |
| 136 | +agent will see the rule on its next message without any further setup. |
| 137 | + |
| 138 | +`scripts/setup_local_ai.py` does this for you, surrounded by stable markers |
| 139 | +so re-running the script replaces the block in place rather than appending |
| 140 | +a second copy. The markers look like: |
| 141 | + |
| 142 | +``` |
| 143 | +<!-- BEGIN amd-skills:local-ai-use --> |
| 144 | +...rule... |
| 145 | +<!-- END amd-skills:local-ai-use --> |
| 146 | +``` |
| 147 | + |
| 148 | +If you write the file by hand, keep those exact markers. The script relies |
| 149 | +on them for idempotent updates. |
| 150 | + |
| 151 | +If the user's agent only respects a different convention, mirror the same |
| 152 | +block to: |
| 153 | + |
| 154 | +- `CLAUDE.md` (Claude Code, project-scoped) or `~/.claude/CLAUDE.md` (global) |
| 155 | +- `.cursor/rules/local-ai-use.mdc` (Cursor user/project rules) |
| 156 | +- `GEMINI.md` (Gemini CLI) |
| 157 | + |
| 158 | +The rule's content is identical; only the file location changes. |
| 159 | + |
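The marker-based update described in this step can be sketched as follows. This is one possible way to get replace-in-place behavior with the markers shown above, not necessarily how `scripts/setup_local_ai.py` implements it; the function name `upsert_block` is hypothetical.

```python
BEGIN = "<!-- BEGIN amd-skills:local-ai-use -->"
END = "<!-- END amd-skills:local-ai-use -->"


def upsert_block(document: str, rule: str) -> str:
    """Insert or replace the marked rule block, leaving the rest of document untouched."""
    block = f"{BEGIN}\n{rule.strip()}\n{END}"
    start = document.find(BEGIN)
    end = document.find(END, start + len(BEGIN)) if start != -1 else -1
    if start != -1 and end != -1:
        # Existing block: swap it in place so a rerun never duplicates the rule.
        return document[:start] + block + document[end + len(END):]
    # No complete block yet: append, separated from prior content by a blank line.
    prefix = document.rstrip("\n")
    return (prefix + "\n\n" if prefix else "") + block + "\n"


if __name__ == "__main__":
    rule = "Route image, TTS, and STT calls to Lemonade."
    once = upsert_block("# Project notes\n", rule)
    twice = upsert_block(once, rule)
    print(once == twice)
```

The same function works for `CLAUDE.md`, `GEMINI.md`, or a Cursor rule file, since only the file location changes and the block itself is identical.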
| 160 | +## Step 4: smoke-test the three modalities |
| 161 | + |
| 162 | +Verify each modality against the live server before declaring success. These |
| 163 | +mirror the inline patterns in the installed rule, so a green pass here means |
| 164 | +the rule will work. |
| 165 | + |
| 166 | +**Image generation** (writes `out.png`): |
| 167 | + |
| 168 | +```bash |
| 169 | +curl -sX POST http://localhost:13305/api/v1/images/generations \ |
| 170 | + -H "Content-Type: application/json" \ |
| 171 | + -d '{"model":"SD-Turbo","prompt":"a single red apple on a white table","size":"512x512","steps":4,"response_format":"b64_json"}' \ |
| 172 | + | python -c "import sys,json,base64; open('out.png','wb').write(base64.b64decode(json.load(sys.stdin)['data'][0]['b64_json']))" |
| 173 | +``` |
| 174 | + |
| 175 | +**Text-to-speech** (writes `out.mp3`): |
| 176 | + |
| 177 | +```bash |
| 178 | +curl -sSf -X POST http://localhost:13305/api/v1/audio/speech \ |
| 179 | + -H "Content-Type: application/json" \ |
| 180 | + -d '{"model":"kokoro-v1","input":"Local AI is now active.","response_format":"mp3"}' \ |
| 181 | + -o out.mp3 |
| 182 | +``` |
| 183 | + |
| 184 | +**Speech-to-text** (round-trips `out.mp3` → text via a wav re-encode): |
| 185 | + |
| 186 | +```bash |
| 187 | +ffmpeg -y -i out.mp3 -ar 16000 -ac 1 out.wav |
| 188 | +curl -sX POST http://localhost:13305/api/v1/audio/transcriptions \ |
| 189 | + -F "file=@out.wav" -F "model=Whisper-Tiny" |
| 190 | +``` |
| 191 | + |
| 192 | +If any of the three returns a non-2xx status, fix it now. The rule we just |
| 193 | +installed sends future requests to these same endpoints, so a broken endpoint |
| 194 | +becomes a broken user experience. |
| 195 | + |
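A file that exists is not proof of success: an error body saved under `-o out.mp3` still produces a file. A hedged sanity check is to sniff the leading bytes of each output. The sketch below uses the standard PNG signature, the RIFF/WAVE header, and the ID3 tag or MPEG frame sync for MP3; the helper names are hypothetical, and the MP3 test is a heuristic that other MPEG audio streams can also satisfy.

```python
from pathlib import Path


def sniff_format(data: bytes) -> str:
    """Guess a file type from its leading bytes; return "unknown" if nothing matches."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 data starts with an ID3v2 tag or an MPEG frame sync (11 set bits).
    if data.startswith(b"ID3") or (len(data) >= 2 and data[0] == 0xFF and data[1] & 0xE0 == 0xE0):
        return "mp3"
    return "unknown"


def check_outputs(expected: dict) -> dict:
    """Map each path to True when the file exists and matches its expected format."""
    results = {}
    for path, fmt in expected.items():
        p = Path(path)
        if p.is_file():
            with p.open("rb") as fh:
                results[path] = sniff_format(fh.read(16)) == fmt
        else:
            results[path] = False
    return results


if __name__ == "__main__":
    expected = {"out.png": "png", "out.mp3": "mp3", "out.wav": "wav"}
    for path, ok in check_outputs(expected).items():
        print(f"{path}: {'ok' if ok else 'missing or not the expected format'}")
```

A JSON error body begins with `{`, so it is reported as not matching any of the three formats.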
| 196 | +--- |
| 197 | + |
| 198 | +## What changes after this skill runs |
| 199 | + |
| 200 | +From the next turn onward, the agent reads the rule in `AGENTS.md` on every |
| 201 | +message. The rule explicitly tells the agent: |
| 202 | + |
| 203 | +- **For image generation:** call `POST /api/v1/images/generations` on the |
| 204 | + local server. Do **not** call any cloud image API and do **not** use the |
| 205 | + built-in `GenerateImage` tool (that path bills tokens to the cloud |
| 206 | + provider). |
| 207 | +- **For text-to-speech:** call `POST /api/v1/audio/speech`. Do **not** call |
| 208 | + cloud TTS providers (OpenAI TTS, ElevenLabs, etc.). |
| 209 | +- **For speech-to-text:** call `POST /api/v1/audio/transcriptions`. Do |
| 210 | + **not** call cloud transcription providers. |
| 211 | +- **Fallback:** only fall back to a cloud API after one local attempt has |
| 212 | + failed *and* the user has been told the local call failed. Never |
| 213 | + silently fall back; the whole point of this skill is to keep cost |
| 214 | + predictable. |
| 215 | + |
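The fallback rule in the list above can be expressed as a small wrapper: one local attempt, a user-visible notice on failure, and only then the cloud call. This is an illustrative sketch of the policy, not code from the installed rule; `call_local_first` and its `allow_cloud` switch are hypothetical names.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_local_first(
    local: Callable[[], T],
    cloud: Callable[[], T],
    notify: Callable[[str], None],
    allow_cloud: bool = True,
) -> T:
    """Try the local call once; tell the user about a failure before any cloud fallback."""
    try:
        return local()
    except Exception as exc:  # broad on purpose: any local failure counts
        # The notice always precedes the fallback, so it is never silent.
        notify(f"Local call failed: {exc}")
        if not allow_cloud:
            raise
        return cloud()


if __name__ == "__main__":
    def broken_local():
        raise ConnectionError("connection refused on localhost:13305")

    result = call_local_first(broken_local, lambda: "cloud result", print)
    print(result)
```

Setting `allow_cloud=False` turns the wrapper into a strict local-only mode, which suits users who chose this skill for privacy rather than cost.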
| 216 | +The agent's own text reasoning continues to use whatever LLM Cursor / Claude |
| 217 | +Code / Codex is configured with. This skill does not redirect chat tokens; |
| 218 | +it only redirects the multimodal calls that would otherwise leave the |
| 219 | +machine. |
| 220 | + |
| 221 | +## Troubleshooting cheatsheet |
| 222 | + |
| 223 | +| Symptom | Cause | Recovery | |
| 224 | +|---|---|---| |
| 225 | +| `lemonade: command not found` | Server CLI not installed | Install from <https://lemonade-server.ai/install_options.html>; restart shell. | |
| 226 | +| `Server is not running` | Service stopped after install | Windows: launch the **Lemonade** Start Menu shortcut. Linux: `sudo systemctl start lemonade-server`. | |
| 227 | +| `POST /api/v1/images/generations` returns 404 model not found | Image model not downloaded | `lemonade pull SD-Turbo` and retry. | |
| 228 | +| Image generation is slow on CPU (~4–5 min) | sd-cpp on CPU backend | Install the GPU backend on supported AMD hardware: `lemonade backends install sd-cpp:rocm`. | |
| 229 | +| `POST /api/v1/audio/transcriptions` returns 400 unsupported format | Input is not 16 kHz mono WAV | Re-encode with `ffmpeg -i in.* -ar 16000 -ac 1 out.wav`. | |
| 230 | +| `POST /api/v1/audio/speech` returns 404 | TTS model not downloaded | `lemonade pull kokoro-v1`. | |
| 231 | +| 401 Unauthorized on every request | User has set `LEMONADE_API_KEY` | Add `Authorization: Bearer $LEMONADE_API_KEY` to every request and to the rule block. | |
| 232 | + |
| 233 | +## Verification checklist |
| 234 | + |
| 235 | +Mark this skill complete only when **all** of the following are true: |
| 236 | + |
| 237 | +- [ ] `lemonade status --json` reports the server running on port 13305. |
| 238 | +- [ ] `lemonade list --downloaded` shows `SD-Turbo`, `kokoro-v1`, and |
| 239 | + `Whisper-Tiny`. |
| 240 | +- [ ] The workspace `AGENTS.md` contains the |
| 241 | + `amd-skills:local-ai-use` block. |
| 242 | +- [ ] All three smoke tests in Step 4 succeed. |
| 243 | +- [ ] On a follow-up turn, asking the agent to "generate an image of X" |
| 244 | + causes it to POST to `http://localhost:13305/api/v1/images/generations` |
| 245 | + rather than calling a cloud tool. |
| 246 | + |
| 247 | +If any box is unchecked, the user is still paying cloud cost for at least |
| 248 | +one modality. |
| 249 | + |
| 250 | +--- |
| 251 | + |
| 252 | +## Reference |
| 253 | + |
| 254 | +For the full model picker, alternate-quality options, the complete endpoint |
| 255 | +reference, the API-key flow, and the OmniRouter tool definitions you can |
| 256 | +hand to an agent's tool-calling loop, see [reference.md](reference.md). |