
Commit 8245088

Update skill to install Lemonade if not installed
1 parent 361f941 commit 8245088

4 files changed

Lines changed: 371 additions & 353 deletions

File tree

skills/local-ai-use/SKILL.md

Lines changed: 102 additions & 154 deletions
@@ -15,37 +15,34 @@ description: >-
1515

1616
# Local AI Use (route image, TTS, STT through Lemonade)
1717

18-
This is a **meta-skill**. After you run it, every later request that needs
19-
image generation, text-to-speech, or speech-to-text uses the local
18+
This is a **meta-skill**. You run it once. After that, every later request that
19+
needs image generation, text-to-speech, or speech-to-text uses the local
2020
[Lemonade Server](https://lemonade-server.ai) instead of a cloud API. The
2121
agent's own LLM keeps handling text; only the expensive multimodal calls move
2222
on-device.
2323

24-
The skill covers **two independent modality groups** — set up only the one(s)
25-
the user actually needs:
24+
The skill does three things:
2625

27-
| Group | Covers | Default model(s) |
28-
|---|---|---|
29-
| **image** | image generation + editing | `SD-Turbo` |
30-
| **speech** | text-to-speech **and** speech-to-text | `kokoro-v1`, `Whisper-Tiny` |
31-
32-
> **Do ONLY the group the user asked for.** If the user wants to generate an
33-
> image, set up `image` only — do **not** pull the speech models. The setup
34-
> command takes the group as an argument (`image`, `speech`, or `all`), and the
35-
> rule installed into `AGENTS.md` contains only the group(s) you set up.
36-
37-
For each group you set up, the skill does two things:
38-
39-
1. **Verifies that local Lemonade is reachable and has the right models.**
40-
2. **Drops a `Local AI Use` block into the workspace `AGENTS.md`** so the agent
26+
1. **Makes sure local Lemonade is installed and running.** If the `lemonade`
27+
CLI is missing, the setup script installs the **full version** of Lemonade
28+
(server + desktop app) on the user's behalf; if the server is installed but
29+
not running, it launches it.
30+
2. **Verifies that local Lemonade is reachable.**
31+
3. **Drops a `Local AI Use` block into the workspace `AGENTS.md`** so the agent
4132
reads the routing rule on every later turn, in Cursor, Claude Code, Codex,
4233
Gemini CLI, and any other agent that respects `AGENTS.md`.
4334

35+
Models are **not** downloaded during setup. Each default model is pulled
36+
lazily, on first use, by the routing rule (e.g. the first image request pulls
37+
the image model). This keeps setup fast and avoids gigabytes of downloads the
38+
user may never need.
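
One way to read "pulled lazily, on first use" is as a retry-once pattern: send the request, and if the server reports the model is missing, run `lemonade pull` and send again. The sketch below assumes that reading. `ModelNotFound`, `send_with_lazy_pull`, and the `send` callable are hypothetical names for illustration and do not come from the skill or from Lemonade.

```python
import subprocess


class ModelNotFound(Exception):
    """Hypothetical signal that the server answered 404 'model not found'."""


def pull_model(model: str) -> None:
    # `lemonade pull <model>` is the CLI command the skill itself uses.
    subprocess.run(["lemonade", "pull", model], check=True)


def send_with_lazy_pull(send, model: str, pull=pull_model):
    """Call send(model); on ModelNotFound, pull the model once and retry once."""
    try:
        return send(model)
    except ModelNotFound:
        pull(model)
        # A second failure propagates to the caller instead of looping.
        return send(model)
```

Passing `pull` as a parameter keeps the retry logic testable without downloading anything.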
39+
4440
## When to use this skill
4541

4642
Use this skill when **all** of the following are true:
4743

48-
- The user has, or is willing to install, the system-wide Lemonade Server.
44+
- The user wants local Lemonade. If it is not yet installed, the setup script
45+
installs the **full version** (server + desktop app) for them automatically.
4946
- The user accepts the default Lemonade endpoint `http://localhost:13305`.
5047
- The user wants the change to be **persistent** across future turns and
5148
agent restarts (the rule is written to disk).
@@ -57,109 +54,104 @@ instead.
5754
## Prerequisites
5855

5956
- **OS:** Windows 11 x64, Ubuntu/Debian x64, or macOS (beta).
60-
- **Lemonade Server CLI on `PATH`:** verify with `lemonade --version`. If
61-
missing, install from <https://lemonade-server.ai/install_options.html>
62-
before continuing. Do not silently install on the user's machine; that is a
63-
system-wide change and must be the user's call.
64-
- **Disk:** ~5 GB for `image` (SD-Turbo); ~0.4 GB for `speech`
65-
(kokoro-v1 + Whisper-Tiny). Only the group(s) you set up are downloaded.
66-
- **Network:** required for the first `lemonade pull` of each model. After
67-
that, every modality runs offline.
57+
- **Lemonade Server:** the setup script installs it if missing. It downloads
58+
and silently installs the **full version** (Windows `lemonade.msi`, the
59+
Ubuntu/Debian `ppa:lemonade-team/stable` PPA plus `lemonade-desktop`, or the
60+
macOS `.pkg`), then launches the server. On Linux/macOS this needs `sudo`.
61+
Pass `--no-install` if the user wants to install it themselves instead.
62+
- **Disk:** ~8 GB free for the three default models (SD-Turbo + Whisper-Tiny
63+
+ kokoro-v1), plus ~0.1 GB for the installer itself.
64+
- **Network:** required for the install download and the first `lemonade pull`
65+
of each model. After that, every modality runs offline.
6866

6967
## The opinionated path
7068

71-
Run this checklist top to bottom for the group(s) the user needs. Track progress
72-
against it; do not move on until each step verifies.
69+
Run this checklist top to bottom. Track progress against it; do not move on
70+
until each step verifies.
7371

7472
```
75-
[ ] 1. Confirm Lemonade Server is installed and reachable
76-
[ ] 2. Pull the selected group's default models
77-
[ ] 3. Install the routing rule into the workspace AGENTS.md
78-
[ ] 4. Smoke-test the selected group's endpoints
73+
[ ] 1. Ensure Lemonade Server is installed and running (auto-install if missing)
74+
[ ] 2. Install the routing rule into the workspace AGENTS.md
7975
```
8076

81-
The single command that does steps 1, 2, and 3 in one shot, scoped to a group:
77+
The single command that does both steps in one shot is:
8278

8379
```bash
84-
python scripts/setup_local_ai.py image # image only
85-
python scripts/setup_local_ai.py speech # TTS + STT only
86-
python scripts/setup_local_ai.py all # both (only if the user wants both)
80+
python scripts/setup_local_ai.py
8781
```
8882

89-
The script pulls only the selected group's
90-
models and writes only that group's rule section. It is idempotent: re-running
91-
with the same group is a no-op apart from a healthcheck. To add a group later,
92-
re-run with the full set you want (e.g. `all`). Read the sections below for what
93-
to do when each step fails.
83+
It auto-installs the full version of Lemonade if the `lemonade` CLI is
84+
missing, launches the server if it is not running, then writes the rule. The
85+
script is idempotent: re-running it on a fully configured workspace is a no-op
86+
apart from a healthcheck. Read the sections below for what to do when each
87+
step fails.
9488

9589
---
9690

97-
## Step 1: confirm Lemonade Server is reachable
91+
## Step 1: ensure Lemonade Server is installed and running
9892

99-
Run:
93+
`scripts/setup_local_ai.py` handles this end to end, but here is what it does
94+
so you can do it by hand or debug it:
10095

101-
```bash
102-
lemonade status --json
103-
```
96+
**1a. Is the CLI installed?** Check whether `lemonade` is on `PATH`
97+
(`lemonade --version`). If it is not, install the **full version** on the
98+
user's behalf:
99+
100+
| OS | Install the full version |
101+
|---|---|
102+
| Windows | Download `lemonade.msi` from the [latest release](https://github.com/lemonade-sdk/lemonade/releases/latest/download/lemonade.msi) and run `msiexec /i lemonade.msi /qn` (silent, per-user, no elevation). |
103+
| Ubuntu/Debian | `sudo add-apt-repository -y ppa:lemonade-team/stable && sudo apt-get update && sudo apt-get install -y lemonade-server lemonade-desktop` |
104+
| macOS (beta) | Download the `Lemonade-<ver>-Darwin.pkg` from the latest release and run `sudo installer -pkg Lemonade-<ver>-Darwin.pkg -target /`. |
104105

105-
Two acceptable outcomes:
106+
The full installer bundles the server **and** the desktop app; the
107+
server-only minimal MSI and the legacy `lemonade-server` CLI are deprecated
108+
upstream. After a Windows install the CLI lands in
109+
`%LOCALAPPDATA%\lemonade_server` and is added to the *user* PATH (new shells
110+
only); the setup script probes that directory so it works in the same run.
111+
112+
**1b. Is the server running?** Check `lemonade status --json`.
106113

107114
| `lemonade status` says | Action |
108115
|---|---|
109116
| `Server is running on port 13305` | Continue to Step 2. |
110-
| `Server is not running` | Start it. On Windows, launch the **Lemonade** Start Menu shortcut. On Linux, run `sudo systemctl start lemonade-server`. Re-check `lemonade status`. |
117+
| `Server is not running` | Launch it with `lemonade serve` (the script does this in the background and polls `/api/v1/health` until it answers). |
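
The "polls `/api/v1/health` until it answers" step in the table can be sketched as a bounded retry loop. This is a minimal illustration using only the standard library. The function names, attempt count, and interval are assumptions, not values taken from `setup_local_ai.py`.

```python
import time
import urllib.error
import urllib.request

# Default endpoint from this skill; the /health path is the one named above.
HEALTH_URL = "http://localhost:13305/api/v1/health"


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: not healthy yet.
        return False


def wait_for_health(url=HEALTH_URL, attempts=30, interval=1.0,
                    probe=http_probe, sleep=time.sleep) -> bool:
    """Probe up to `attempts` times, sleeping `interval` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

Bounding the loop matters: if the server never comes up, the caller gets `False` and can report the failure instead of hanging.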
111118

112-
If `lemonade` is not on `PATH` at all, the server is not installed. Stop and
113-
point the user at <https://lemonade-server.ai/install_options.html>. Do not
114-
attempt a silent install.
119+
Only if the automatic install genuinely fails (no `apt-get`, no `sudo`,
120+
download blocked) should you stop and point the user at
121+
<https://lemonade-server.ai/install_options.html>.
115122

116123
The rest of this skill assumes the endpoint is `http://localhost:13305/api/v1`
117124
and no API key is required (the system-wide server defaults to no auth on
118125
loopback). If the user has set `LEMONADE_API_KEY`, the routing rule template
119126
in `templates/local-ai-rule.md` shows where to add the `Authorization` header.
120127

121-
## Step 2: pull the selected group's default models
122-
123-
Pull only the models for the group(s) you are setting up. They are the
124-
**Lite Collection** defaults from Lemonade OmniRouter, sized to keep
125-
token-and-cost savings real on commodity hardware:
128+
### Default modality models (pulled on first use, not during setup)
126129

127-
| Group | Modality | Model | Size | Why this default |
128-
|---|---|---|---|---|
129-
| `image` | Image generation | `SD-Turbo` | ~5 GB | Single-step generation, runs on CPU and AMD iGPU/dGPU |
130-
| `speech` | Text-to-speech | `kokoro-v1` | ~0.3 GB | Only TTS model Lemonade currently supports; CPU-only, low latency |
131-
| `speech` | Speech-to-text | `Whisper-Tiny` | ~0.1 GB | Smallest Whisper; fast on CPU. Upgrade to `Whisper-Large-v3-Turbo` if accuracy matters more than latency. |
130+
Setup does **not** download these. The installed rule pulls each one the first
131+
time that modality is requested. They are the **Lite Collection** defaults from
132+
Lemonade OmniRouter, sized to keep token-and-cost savings real on commodity
133+
hardware:
132134

133-
```bash
134-
# image group
135-
lemonade pull SD-Turbo
136-
# speech group
137-
lemonade pull kokoro-v1
138-
lemonade pull Whisper-Tiny
139-
```
135+
| Modality | Model | Size | Why this default |
136+
|---|---|---|---|
137+
| Image generation | `SD-Turbo` | ~5 GB | Single-step generation, runs on CPU and AMD iGPU/dGPU |
138+
| Text-to-speech | `kokoro-v1` | ~0.3 GB | Only TTS model Lemonade currently supports; CPU-only, low latency |
139+
| Speech-to-text | `Whisper-Tiny` | ~0.1 GB | Smallest Whisper; fast on CPU. Upgrade to `Whisper-Large-v3-Turbo` if accuracy matters more than latency. |
140140

141-
To choose a different model while installing the rule, pass it to the setup
142-
script alongside the group. For example, to make future image requests use SDXL:
141+
To write a different model ID into the rule, pass it to the setup script. For
142+
example, to make future image requests use SDXL:
143143

144144
```bash
145-
python scripts/setup_local_ai.py image --image-model SDXL-Turbo
145+
python scripts/setup_local_ai.py --image-model SDXL-Turbo
146146
```
147147

148-
The script will pull the selected model and write that model ID into the
149-
installed `AGENTS.md` rule. The same pattern works for `--tts-model` and
150-
`--stt-model` with the `speech` group.
151-
152-
Each `pull` is idempotent. To verify what is already downloaded:
153-
154-
```bash
155-
lemonade list --downloaded
156-
```
157-
158-
For coverage of larger / higher-quality alternatives (`SDXL-Turbo`,
159-
`Flux-2-Klein-4B`, `Whisper-Large-v3-Turbo`), see the
148+
That model ID is written into the installed `AGENTS.md` rule and pulled on its
149+
first use. The same pattern works for `--tts-model` and `--stt-model`. For
150+
larger / higher-quality alternatives (`SDXL-Turbo`, `Flux-2-Klein-4B`,
151+
`Whisper-Large-v3-Turbo`), see the
160152
[model picker in reference.md](reference.md#model-picker).
161153

162-
## Step 3: install the routing rule into AGENTS.md
154+
## Step 2: install the routing rule into AGENTS.md
163155

164156
The rule is a Markdown block stored in [`templates/local-ai-rule.md`](templates/local-ai-rule.md).
165157
Append it to the workspace's `AGENTS.md` (create the file if missing). Both
@@ -189,66 +181,25 @@ block to:
189181

190182
The rule's content is identical; only the file location changes.
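
Because the setup script is described as idempotent, writing the rule presumably has to replace an existing block rather than append a second copy. A minimal sketch of that idea follows. The `BEGIN`/`END` comment markers and the `install_rule` helper are hypothetical, and the real template may delimit its `amd-skills:local-ai-use` block differently.

```python
from pathlib import Path

# Hypothetical markers; the real template may use a different delimiter.
BEGIN = "<!-- BEGIN amd-skills:local-ai-use -->"
END = "<!-- END amd-skills:local-ai-use -->"


def install_rule(agents_md: Path, rule_body: str) -> bool:
    """Insert or refresh the rule block; return True if the file changed."""
    block = f"{BEGIN}\n{rule_body.strip()}\n{END}\n"
    existing = agents_md.read_text(encoding="utf-8") if agents_md.exists() else ""

    start = existing.find(BEGIN)
    end = existing.find(END, start + len(BEGIN)) if start != -1 else -1

    if start != -1 and end != -1:
        # Replace the old block in place, including its trailing newline.
        end += len(END)
        if existing[end:end + 1] == "\n":
            end += 1
        updated = existing[:start] + block + existing[end:]
    else:
        # Append, separated from earlier content by one blank line.
        if not existing or existing.endswith("\n\n"):
            sep = ""
        elif existing.endswith("\n"):
            sep = "\n"
        else:
            sep = "\n\n"
        updated = existing + sep + block

    if updated == existing:
        return False  # already up to date: a no-op re-run
    agents_md.write_text(updated, encoding="utf-8")
    return True
```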
191183

192-
## Step 4: smoke-test the group(s) you set up
193-
194-
Verify each modality you set up against the live server before declaring
195-
success — run only the tests for the group(s) you installed. These mirror the
196-
inline patterns in the installed rule, so a green pass here means the rule will
197-
work. If you installed with a model override such as `--image-model SDXL-Turbo`,
198-
use that model ID in the smoke test and confirm the installed `AGENTS.md` rule
199-
contains it.
200-
201-
**Image generation**, `image` group (writes `out.png`):
202-
203-
```bash
204-
curl -sX POST http://localhost:13305/api/v1/images/generations \
205-
-H "Content-Type: application/json" \
206-
-d '{"model":"SD-Turbo","prompt":"a single red apple on a white table","size":"512x512","steps":4,"response_format":"b64_json"}' \
207-
| python -c "import sys,json,base64; open('out.png','wb').write(base64.b64decode(json.load(sys.stdin)['data'][0]['b64_json']))"
208-
```
209-
210-
**Text-to-speech**, `speech` group (writes `out.mp3`):
211-
212-
```bash
213-
curl -sX POST http://localhost:13305/api/v1/audio/speech \
214-
-H "Content-Type: application/json" \
215-
-d '{"model":"kokoro-v1","input":"Local AI is now active.","response_format":"mp3"}' \
216-
-o out.mp3
217-
```
218-
219-
**Speech-to-text**, `speech` group (round-trips `out.mp3` → text via a WAV re-encode):
220-
221-
```bash
222-
ffmpeg -y -i out.mp3 -ar 16000 -ac 1 out.wav
223-
curl -sX POST http://localhost:13305/api/v1/audio/transcriptions \
224-
-F "file=@out.wav" -F "model=Whisper-Tiny"
225-
```
226-
227-
If a test for a group you set up returns a non-2xx status, fix it now. The rule
228-
we just installed sends future requests to these same endpoints, so a broken
229-
endpoint becomes a broken user experience.
230-
231184
---
232185

233186
## What changes after this skill runs
234187

235188
From the next turn onward, the agent reads the rule in `AGENTS.md` on every
236-
message. For each group you set up, the rule explicitly tells the agent:
237-
238-
- **image group — image generation:** call `POST /api/v1/images/generations`
239-
(or `/images/edits`) on the local server. Do **not** call any cloud image API
240-
and do **not** use the built-in `GenerateImage` tool (that path bills tokens
241-
to the cloud provider).
242-
- **speech group — text-to-speech:** call `POST /api/v1/audio/speech`. Do
243-
**not** call cloud TTS providers (OpenAI TTS, ElevenLabs, etc.).
244-
- **speech group — speech-to-text:** call `POST /api/v1/audio/transcriptions`.
245-
Do **not** call cloud transcription providers.
246-
- **Fallback (any group):** only fall back to a cloud API after one local
247-
attempt has failed *and* the user has been told the local call failed. Never
248-
silently fall back; the whole point of this skill is to keep cost predictable.
249-
250-
A group you did **not** set up is untouched — the agent keeps using its
251-
configured providers for that modality.
189+
message. The rule explicitly tells the agent:
190+
191+
- **For image generation:** call `POST /api/v1/images/generations` on the
192+
local server. Do **not** call any cloud image API and do **not** use the
193+
built-in `GenerateImage` tool (that path bills tokens to the cloud
194+
provider).
195+
- **For text-to-speech:** call `POST /api/v1/audio/speech`. Do **not** call
196+
cloud TTS providers (OpenAI TTS, ElevenLabs, etc.).
197+
- **For speech-to-text:** call `POST /api/v1/audio/transcriptions`. Do
198+
**not** call cloud transcription providers.
199+
- **Fallback:** only fall back to a cloud API after one local attempt has
200+
failed *and* the user has been told the local call failed. Never
201+
silently fall back; the whole point of this skill is to keep cost
202+
predictable.
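
The fallback rule is addressed to the agent rather than to code, but its ordering can be illustrated as a small policy function: try the local call once, tell the user if it fails, and only then use the cloud. `call_with_fallback` and its three callables are hypothetical names used for this sketch.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(local: Callable[[], T],
                       cloud: Callable[[], T],
                       notify: Callable[[str], None]) -> T:
    """Try the local call once; fall back to cloud only after notifying the user."""
    try:
        return local()
    except Exception as exc:
        # Never fall back silently: the user hears about the failure first.
        notify(f"Local Lemonade call failed ({exc}); falling back to a cloud API.")
        return cloud()
```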
252203

253204
The agent's own text reasoning continues to use whatever LLM Cursor / Claude
254205
Code / Codex is configured with. This skill does not redirect chat tokens;
@@ -259,8 +210,8 @@ machine.
259210

260211
| Symptom | Cause | Recovery |
261212
|---|---|---|
262-
| `lemonade: command not found` | Server CLI not installed | Install from <https://lemonade-server.ai/install_options.html>; restart shell. |
263-
| `Server is not running` | Service stopped after install | Windows: launch the **Lemonade** Start Menu shortcut. Linux: `sudo systemctl start lemonade-server`. |
213+
| `lemonade: command not found` | CLI not installed | Re-run `python scripts/setup_local_ai.py` (auto-installs the full version). If it just installed on Windows, open a new shell so the user PATH refreshes, or the script will find it under `%LOCALAPPDATA%\lemonade_server`. |
214+
| `Server is not running` | Service stopped after install | Run `lemonade serve` (the setup script launches it for you). |
264215
| `POST /v1/images/generations` returns 404 model not found | Image model not downloaded | `lemonade pull SD-Turbo` and retry. |
265216
| Image generation is slow on CPU (~4–5 min) | sd-cpp on CPU backend | Install the GPU backend on supported AMD hardware: `lemonade backends install sd-cpp:rocm`. |
266217
| `POST /v1/audio/transcriptions` returns 400 unsupported format | Input is not 16 kHz mono WAV | Re-encode with `ffmpeg -i in.* -ar 16000 -ac 1 out.wav`. |
@@ -269,20 +220,17 @@ machine.
269220

270221
## Verification checklist
271222

272-
Mark a group complete only when **all** of the following are true for it:
223+
Mark this skill complete only when **all** of the following are true:
273224

274225
- [ ] `lemonade status --json` reports the server running on port 13305.
275-
- [ ] `lemonade list --downloaded` shows the group's model(s): `SD-Turbo` for
276-
`image`; `kokoro-v1` and `Whisper-Tiny` for `speech`.
277-
- [ ] The workspace `AGENTS.md` contains the `amd-skills:local-ai-use` block,
278-
and that block includes the group's section (`### Image` and/or
279-
`### Speech`).
280-
- [ ] The group's smoke test(s) in Step 4 succeed.
281-
- [ ] On a follow-up turn, a request for that modality causes the agent to POST
282-
to the local endpoint rather than calling a cloud tool.
283-
284-
You only need the rows for the group(s) you set up. A group you skipped is
285-
expected to still use cloud providers.
226+
- [ ] The workspace `AGENTS.md` contains the
227+
`amd-skills:local-ai-use` block.
228+
- [ ] On a follow-up turn, asking the agent to "generate an image of X"
229+
causes it to POST to `http://localhost:13305/api/v1/images/generations`
230+
(pulling the model on first use) rather than calling a cloud tool.
231+
232+
If any box is unchecked, the user is still paying cloud cost for at least
233+
one modality.
286234

287235
---
288236

skills/local-ai-use/reference.md

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ asks for higher quality or has explicit hardware to spare.
3333
To upgrade: re-run setup with the target model, for example:
3434

3535
```bash
36-
python scripts/setup_local_ai.py image --image-model SDXL-Turbo
36+
python scripts/setup_local_ai.py --image-model SDXL-Turbo
3737
```
3838

3939
The script pulls the model and rewrites the `AGENTS.md` rule in place.
