-> **Do ONLY the group the user asked for.** If the user wants to generate an
-> image, set up `image` only — do **not** pull the speech models. The setup
-> command takes the group as an argument (`image`, `speech`, or `all`), and the
-> rule installed into `AGENTS.md` contains only the group(s) you set up.
-
-For each group you set up, the skill does two things:
-
-1. **Verifies that local Lemonade is reachable and has the right models.**
-2. **Drops a `Local AI Use` block into the workspace `AGENTS.md`** so the agent
+1. **Makes sure local Lemonade is installed and running.** If the `lemonade`
+   CLI is missing, the setup script installs the **full version** of Lemonade
+   (server + desktop app) on the user's behalf; if the server is installed but
+   not running, it launches it.
+2. **Verifies that local Lemonade is reachable.**
+3. **Drops a `Local AI Use` block into the workspace `AGENTS.md`** so the agent
    reads the routing rule on every later turn, in Cursor, Claude Code, Codex,
    Gemini CLI, and any other agent that respects `AGENTS.md`.
 
+Models are **not** downloaded during setup. Each default model is pulled
+lazily, on first use, by the routing rule (e.g. the first image request pulls
+the image model). This keeps setup fast and avoids gigabytes of downloads the
+user may never need.
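
The lazy pull described above can be sketched roughly as follows. This is a minimal illustration, not the routing rule's actual logic: the helper names `run_cli` and `ensure_model` are hypothetical, and the check assumes the model ID appears verbatim in the output of `lemonade list --downloaded`.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def run_cli(args: Sequence[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    result = subprocess.run(list(args), check=True, capture_output=True, text=True)
    return result.stdout


def ensure_model(model: str, run: Runner = run_cli) -> bool:
    """Pull `model` only if it is not downloaded yet.

    Returns True when a pull was triggered, False when the model was already there.
    """
    downloaded = run(["lemonade", "list", "--downloaded"])
    # Assumption: the model ID shows up verbatim in the listing.
    if model in downloaded:
        return False
    run(["lemonade", "pull", model])
    return True
```

Calling `ensure_model("SD-Turbo")` just before the first image request would download the model once and do nothing on later calls.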
+
 ## When to use this skill
 
 Use this skill when **all** of the following are true:
 
-- The user has, or is willing to install, the system-wide Lemonade Server.
+- The user wants local Lemonade. If it is not yet installed, the setup script
+  installs the **full version** (server + desktop app) for them automatically.
 - The user accepts the default Lemonade endpoint `http://localhost:13305`.
 - The user wants the change to be **persistent** across future turns and
   agent restarts (the rule is written to disk).
@@ -57,109 +54,104 @@ instead.
 ## Prerequisites
 
 - **OS:** Windows 11 x64, Ubuntu/Debian x64, or macOS (beta).
-- **Lemonade Server CLI on `PATH`:** verify with `lemonade --version`. If
-  missing, install from <https://lemonade-server.ai/install_options.html>
-  before continuing. Do not silently install on the user's machine; that is a
-  system-wide change and must be the user's call.
-- **Disk:** ~5 GB for `image` (SD-Turbo); ~0.4 GB for `speech`
-  (kokoro-v1 + Whisper-Tiny). Only the group(s) you set up are downloaded.
-- **Network:** required for the first `lemonade pull` of each model. After
-  that, every modality runs offline.
+- **Lemonade Server:** the setup script installs it if missing. It downloads
+  and silently installs the **full version** (Windows `lemonade.msi`, the
+  Ubuntu/Debian `ppa:lemonade-team/stable` PPA plus `lemonade-desktop`, or the
+  macOS `.pkg`), then launches the server. On Linux/macOS this needs `sudo`.
+  Pass `--no-install` if the user wants to install it themselves instead.
+- **Disk:** ~8 GB free for the three default models (SD-Turbo + Whisper-Tiny
+  + kokoro-v1), plus ~0.1 GB for the installer itself.
+- **Network:** required for the install download and the first `lemonade pull`
+  of each model. After that, every modality runs offline.
 
 ## The opinionated path
 
-Run this checklist top to bottom for the group(s) the user needs. Track progress
-against it; do not move on until each step verifies.
+Run this checklist top to bottom. Track progress against it; do not move on
+until each step verifies.
 
 ```
-[ ] 1. Confirm Lemonade Server is installed and reachable
-[ ] 2. Pull the selected group's default models
-[ ] 3. Install the routing rule into the workspace AGENTS.md
-[ ] 4. Smoke-test the selected group's endpoints
+[ ] 1. Ensure Lemonade Server is installed and running (auto-install if missing)
+[ ] 2. Install the routing rule into the workspace AGENTS.md
 ```
 
-The single command that does steps 1, 2, and 3 in one shot, scoped to a group:
+The single command that does both steps in one shot is:
 
 ```bash
-python scripts/setup_local_ai.py image # image only
-python scripts/setup_local_ai.py speech # TTS + STT only
-python scripts/setup_local_ai.py all # both (only if the user wants both)
+python scripts/setup_local_ai.py
 ```
 
-The script pulls only the selected group's
-models and writes only that group's rule section. It is idempotent: re-running
-with the same group is a no-op apart from a healthcheck. To add a group later,
-re-run with the full set you want (e.g. `all`). Read the sections below for what
-to do when each step fails.
+It auto-installs the full version of Lemonade if the `lemonade` CLI is
+missing, launches the server if it is not running, then writes the rule. The
+script is idempotent: re-running it on a fully configured workspace is a no-op
+apart from a healthcheck. Read the sections below for what to do when each
+step fails.
 
 ---
 
-## Step 1: confirm Lemonade Server is reachable
+## Step 1: ensure Lemonade Server is installed and running
 
-Run:
+`scripts/setup_local_ai.py` handles this end to end, but here is what it does
+so you can do it by hand or debug it:
 
-```bash
-lemonade status --json
-```
+**1a. Is the CLI installed?** Check whether `lemonade` is on `PATH`
+(`lemonade --version`). If it is not, install the **full version** on the
+user's behalf:
+
+| OS | Install the full version |
+|---|---|
+| Windows | Download `lemonade.msi` from the [latest release](https://github.com/lemonade-sdk/lemonade/releases/latest/download/lemonade.msi) and run `msiexec /i lemonade.msi /qn` (silent, per-user, no elevation). |
+| macOS (beta) | Download the `Lemonade-<ver>-Darwin.pkg` from the latest release and run `sudo installer -pkg Lemonade-<ver>-Darwin.pkg -target /`. |
 
-Two acceptable outcomes:
+The full installer bundles the server **and** the desktop app; the
+server-only minimal MSI and the legacy `lemonade-server` CLI are deprecated
+upstream. After a Windows install the CLI lands in
+`%LOCALAPPDATA%\lemonade_server` and is added to the *user* PATH (new shells
+only); the setup script probes that directory so it works in the same run.
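
The probing step in 1a might look roughly like the sketch below. It is not the script's actual code: `find_lemonade` is a hypothetical helper, and the executable name `lemonade.exe` inside `%LOCALAPPDATA%\lemonade_server` is an assumption.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional


def find_lemonade(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Locate the lemonade CLI: PATH first, then the Windows per-user install dir."""
    found = shutil.which("lemonade", path=env.get("PATH"))
    if found:
        return found
    local_app_data = env.get("LOCALAPPDATA")
    if local_app_data:
        # Assumption: the installer places an executable named lemonade.exe here.
        candidate = Path(local_app_data) / "lemonade_server" / "lemonade.exe"
        if candidate.is_file():
            return str(candidate)
    return None
```

Checking the install directory directly is what lets a freshly installed CLI be used in the same run, before a new shell picks up the updated user PATH.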
+
+**1b. Is the server running?** Check `lemonade status --json`.
 
 | `lemonade status` says | Action |
 |---|---|
 | `Server is running on port 13305` | Continue to Step 2. |
-| `Server is not running` | Start it. On Windows, launch the **Lemonade** Start Menu shortcut. On Linux, run `sudo systemctl start lemonade-server`. Re-check `lemonade status`. |
+| `Server is not running` | Launch it with `lemonade serve` (the script does this in the background and polls `/api/v1/health` until it answers). |
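
The polling half of 1b can be sketched as below. This is an illustration under assumptions, not the script's implementation: `http_ok` and `wait_for_health` are hypothetical names, the timeout and interval values are arbitrary, and a plain HTTP 200 from `/api/v1/health` is assumed to mean the server is ready.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:13305/api/v1/health"


def http_ok(url: str) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_for_health(
    url: str = HEALTH_URL,
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health endpoint until it answers or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

A caller would start `lemonade serve` in the background first (for example with `subprocess.Popen`) and then call `wait_for_health()` before moving on.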
 
-If `lemonade` is not on `PATH` at all, the server is not installed. Stop and
-point the user at <https://lemonade-server.ai/install_options.html>. Do not
-attempt a silent install.
+Only if the automatic install genuinely fails (no `apt-get`, no `sudo`,
+download blocked) should you stop and point the user at
 The rest of this skill assumes the endpoint is `http://localhost:13305/api/v1`
 and no API key is required (the system-wide server defaults to no auth on
 loopback). If the user has set `LEMONADE_API_KEY`, the routing rule template
 in `templates/local-ai-rule.md` shows where to add the `Authorization` header.
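
As a rough illustration of the optional-key behaviour, request headers could be built as follows. The `Bearer` scheme is an assumption here (the rule template is the authority on the exact header format), and `build_headers` is a hypothetical helper.

```python
import os
from typing import Dict, Mapping

BASE_URL = "http://localhost:13305/api/v1"


def build_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build JSON request headers, adding Authorization only when a key is set."""
    headers = {"Content-Type": "application/json"}
    key = env.get("LEMONADE_API_KEY")
    if key:
        # Assumption: the server expects a Bearer token.
        headers["Authorization"] = f"Bearer {key}"
    return headers
```

With no `LEMONADE_API_KEY` in the environment, the result carries no `Authorization` header, which matches the no-auth loopback default.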
 
-## Step 2: pull the selected group's default models
-
-Pull only the models for the group(s) you are setting up. They are the
-**Lite Collection** defaults from Lemonade OmniRouter, sized to keep
-token-and-cost savings real on commodity hardware:
+### Default modality models (pulled on first use, not during setup)
 
-| Group | Modality | Model | Size | Why this default |
-|---|---|---|---|---|
-| `image` | Image generation | `SD-Turbo` | ~5 GB | Single-step generation, runs on CPU and AMD iGPU/dGPU |
-| `speech` | Text-to-speech | `kokoro-v1` | ~0.3 GB | Only TTS model Lemonade currently supports; CPU-only, low latency |
-| `speech` | Speech-to-text | `Whisper-Tiny` | ~0.1 GB | Smallest Whisper; fast on CPU. Upgrade to `Whisper-Large-v3-Turbo` if accuracy matters more than latency. |
+Setup does **not** download these. The installed rule pulls each one the first
+time that modality is requested. They are the **Lite Collection** defaults from
+Lemonade OmniRouter, sized to keep token-and-cost savings real on commodity
+hardware:
 
-```bash
-# image group
-lemonade pull SD-Turbo
-# speech group
-lemonade pull kokoro-v1
-lemonade pull Whisper-Tiny
-```
+| Modality | Model | Size | Why this default |
+|---|---|---|---|
+| Image generation | `SD-Turbo` | ~5 GB | Single-step generation, runs on CPU and AMD iGPU/dGPU |
+| Text-to-speech | `kokoro-v1` | ~0.3 GB | Only TTS model Lemonade currently supports; CPU-only, low latency |
+| Speech-to-text | `Whisper-Tiny` | ~0.1 GB | Smallest Whisper; fast on CPU. Upgrade to `Whisper-Large-v3-Turbo` if accuracy matters more than latency. |
 
-To choose a different model while installing the rule, pass it to the setup
-script alongside the group. For example, to make future image requests use SDXL:
+To write a different model ID into the rule, pass it to the setup script. For
+- **For speech-to-text:** call `POST /api/v1/audio/transcriptions`. Do
+  **not** call cloud transcription providers.
+- **Fallback:** only fall back to a cloud API after one local attempt has
+  failed *and* the user has been told the local call failed. Never
+  silently fall back; the whole point of this skill is to keep cost
+  predictable.
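
The fallback rule above amounts to "one local attempt, then tell the user, then cloud". A minimal sketch of that ordering, with hypothetical callables `local`, `cloud`, and `notify` standing in for whatever the agent actually uses:

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_fallback(
    local: Callable[[], T],
    cloud: Callable[[], T],
    notify: Callable[[str], None],
) -> T:
    """Try the local endpoint once; fall back to cloud only after telling the user."""
    try:
        return local()
    except Exception as exc:
        # The user must hear about the failure before any cloud call is made.
        notify(f"Local Lemonade call failed: {exc}. Falling back to the cloud provider.")
        return cloud()
```

Because `notify` runs before `cloud`, there is no code path in this sketch where a cloud call happens silently.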
 
 The agent's own text reasoning continues to use whatever LLM Cursor / Claude
 Code / Codex is configured with. This skill does not redirect chat tokens;
@@ -259,8 +210,8 @@ machine.
 
 | Symptom | Cause | Recovery |
 |---|---|---|
-| `lemonade: command not found` | Server CLI not installed | Install from <https://lemonade-server.ai/install_options.html>; restart shell. |
-| `Server is not running` | Service stopped after install | Windows: launch the **Lemonade** Start Menu shortcut. Linux: `sudo systemctl start lemonade-server`. |
+| `lemonade: command not found` | CLI not installed | Re-run `python scripts/setup_local_ai.py` (auto-installs the full version). If it just installed on Windows, open a new shell so the user PATH refreshes, or the script will find it under `%LOCALAPPDATA%\lemonade_server`. |
+| `Server is not running` | Service stopped after install | Run `lemonade serve` (the setup script launches it for you). |
 | `POST /v1/images/generations` returns 404 model not found | Image model not downloaded | `lemonade pull SD-Turbo` and retry. |
 | Image generation is slow on CPU (~4–5 min) | sd-cpp on CPU backend | Install the GPU backend on supported AMD hardware: `lemonade backends install sd-cpp:rocm`. |
 | `POST /v1/audio/transcriptions` returns 400 unsupported format | Input is not 16 kHz mono WAV | Re-encode with `ffmpeg -i in.* -ar 16000 -ac 1 out.wav`. |
@@ -269,20 +220,17 @@ machine.
 
 ## Verification checklist
 
-Mark a group complete only when **all** of the following are true for it:
+Mark this skill complete only when **all** of the following are true:
 
 - [ ] `lemonade status --json` reports the server running on port 13305.
-- [ ] `lemonade list --downloaded` shows the group's model(s): `SD-Turbo` for
-  `image`; `kokoro-v1` and `Whisper-Tiny` for `speech`.
-- [ ] The workspace `AGENTS.md` contains the `amd-skills:local-ai-use` block,
-  and that block includes the group's section (`### Image` and/or
-  `### Speech`).
-- [ ] The group's smoke test(s) in Step 4 succeed.
-- [ ] On a follow-up turn, a request for that modality causes the agent to POST
-  to the local endpoint rather than calling a cloud tool.
-
-You only need the rows for the group(s) you set up. A group you skipped is
-expected to still use cloud providers.
+- [ ] The workspace `AGENTS.md` contains the
+  `amd-skills:local-ai-use` block.
+- [ ] On a follow-up turn, asking the agent to "generate an image of X"
+  causes it to POST to `http://localhost:13305/api/v1/images/generations`
+  (pulling the model on first use) rather than calling a cloud tool.
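
The `AGENTS.md` item in the checklist can be checked mechanically. A small sketch, assuming the installed block can be recognised by the literal string `amd-skills:local-ai-use` somewhere in the file (`rule_installed` is a hypothetical helper):

```python
from pathlib import Path

MARKER = "amd-skills:local-ai-use"


def rule_installed(workspace: Path) -> bool:
    """Return True if the workspace AGENTS.md mentions the local-ai-use block."""
    agents = Path(workspace) / "AGENTS.md"
    # Assumption: the block is identifiable by this literal marker string.
    return agents.is_file() and MARKER in agents.read_text(encoding="utf-8")
```

This only confirms the block is present; the follow-up-turn item still has to be verified by watching where the agent actually sends the request.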
+
+If any box is unchecked, the user is still paying cloud cost for at least