Commit 3d85ee7

Add local ai use skill
1 parent 0427fdb commit 3d85ee7

5 files changed

Lines changed: 854 additions & 1 deletion

README.md

Lines changed: 2 additions & 1 deletion
@@ -65,6 +65,7 @@ Embed AMD-optimized AI into end-user applications.
 | Skill | What it does |
 | --- | --- |
 | `local-ai-app-integration` | Add private, on-device AI to apps that use OpenAI, Anthropic, or Ollama APIs by bundling Embeddable Lemonade as a subprocess. |
+| `local-ai-use` | Apply a Lemonade-first strategy so agents default to local image generation, text-to-speech, and speech-to-text to reduce token/cost usage before any cloud fallback. |

 ### Cross-stack porting

@@ -226,7 +227,7 @@ See [AUTHORING.md](AUTHORING.md) for the full authoring guide, including when a

 ## Status

-This repository is in its early days. The first in-repo skill, `skills/local-ai-app-integration/`, is available now and seeds the **Application integration** focus area. The Hardware-native, Cross-stack porting, and Profiling and delivery focus areas are being built out incrementally alongside manifests and CI. Expect rapid iteration. File an issue if there is a workflow you want covered, or open a PR with a skill you have been wanting to share.
+This repository is in its early days. In-repo skills include `skills/local-ai-app-integration/` and `skills/local-ai-use/`, seeding the **Application integration** focus area. The Hardware-native, Cross-stack porting, and Profiling and delivery focus areas are being built out incrementally alongside manifests and CI. Expect rapid iteration. File an issue if there is a workflow you want covered, or open a PR with a skill you have been wanting to share.

 ## License

skills/local-ai-use/SKILL.md

Lines changed: 256 additions & 0 deletions
@@ -0,0 +1,256 @@
---
name: local-ai-use
description: >-
  Routes image generation, text-to-speech, and speech-to-text through a local
  Lemonade Server so the agent stops paying for cloud calls on those modalities.
  Use when the user wants to save tokens, save cost, or save money using local
  AI; default to local, offline, on-device, or private image generation,
  transcription, or text-to-speech in this workspace; stop using DALL-E,
  Whisper-as-a-service, ElevenLabs, or other paid multimodal APIs; route the
  agent's image, TTS, or STT tool calls to a local model; or mentions Lemonade
  Server, OmniRouter, SD-Turbo, kokoro, Whisper, Ryzen AI, NPU/iGPU/dGPU
  inference, or "use local for images but cloud for chat". Run once per
  workspace; the rule it installs handles every later request.
---

# Local AI Use (route image, TTS, STT through Lemonade)

This is a **meta-skill**. You run it once. After that, every later request that
needs image generation, text-to-speech, or speech-to-text uses the local
[Lemonade Server](https://lemonade-server.ai) instead of a cloud API. The
agent's own LLM keeps handling text; only the expensive multimodal calls move
on-device.

The skill does two things:

1. **Verifies that local Lemonade is reachable and has the right models.**
2. **Drops a `Local AI Use` block into the workspace `AGENTS.md`** so the agent
   reads the routing rule on every later turn, in Cursor, Claude Code, Codex,
   Gemini CLI, and any other agent that respects `AGENTS.md`.

## When to use this skill

Use this skill when **all** of the following are true:

- The user has, or is willing to install, the system-wide Lemonade Server.
- The user accepts the default Lemonade endpoint `http://localhost:13305`.
- The user wants the change to be **persistent** across future turns and
  agent restarts (the rule is written to disk).

If the user is instead **embedding** Lemonade as a private subprocess inside
an app installer, do not use this skill; use `local-ai-app-integration`.

## Prerequisites

- **OS:** Windows 11 x64, Ubuntu/Debian x64, or macOS (beta).
- **Lemonade Server CLI on `PATH`:** verify with `lemonade --version`. If
  missing, install from <https://lemonade-server.ai/install_options.html>
  before continuing. Do not silently install on the user's machine; that is a
  system-wide change and must be the user's call.
- **Disk:** ~8 GB free for the three default models
  (SD-Turbo + Whisper-Tiny + kokoro-v1).
- **Network:** required for the first `lemonade pull` of each model. After
  that, every modality runs offline.

## The opinionated path

Run this checklist top to bottom. Track progress against it; do not move on
until each step verifies.

```
[ ] 1. Confirm Lemonade Server is installed and reachable
[ ] 2. Pull the three default modality models
[ ] 3. Install the routing rule into the workspace AGENTS.md
[ ] 4. Smoke-test image, TTS, and STT against the local endpoint
```

The single command that does steps 1, 2, and 3 in one shot is:

```bash
python scripts/setup_local_ai.py
```

(Run from this skill's folder.) The script is idempotent: re-running it on a
fully configured workspace is a no-op apart from a healthcheck. Read the
sections below for what to do when each step fails.

---

## Step 1: confirm Lemonade Server is reachable

Run:

```bash
lemonade status --json
```

Two acceptable outcomes:

| `lemonade status` says | Action |
|---|---|
| `Server is running on port 13305` | Continue to Step 2. |
| `Server is not running` | Start it. On Windows, launch the **Lemonade** Start Menu shortcut. On Linux, run `sudo systemctl start lemonade-server`. Re-check `lemonade status`. |

If `lemonade` is not on `PATH` at all, the server is not installed. Stop and
point the user at <https://lemonade-server.ai/install_options.html>. Do not
attempt a silent install.

The rest of this skill assumes the endpoint is `http://localhost:13305/api/v1`
and no API key is required (the system-wide server defaults to no auth on
loopback). If the user has set `LEMONADE_API_KEY`, the routing rule template
in `templates/local-ai-rule.md` shows where to add the `Authorization` header.
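
The optional API-key case can be handled in one place rather than in every request. The following is a minimal sketch, assuming the key is read from the `LEMONADE_API_KEY` environment variable and sent as a Bearer token; `auth_headers` and `BASE_URL` are illustrative names, not part of the skill's scripts.

```python
import os

# Default endpoint assumed by this skill.
BASE_URL = "http://localhost:13305/api/v1"


def auth_headers(env=None):
    """Return the Authorization header only when LEMONADE_API_KEY is set.

    `env` defaults to os.environ; pass a plain dict to test without
    touching the real environment.
    """
    env = os.environ if env is None else env
    api_key = env.get("LEMONADE_API_KEY")
    if not api_key:
        # System-wide server with no auth on loopback: send nothing extra.
        return {}
    return {"Authorization": f"Bearer {api_key}"}
```

Merge the returned dict into the headers of each request, so the no-auth default and the keyed setup share the same call sites.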

## Step 2: pull the three default modality models

Pull these three. They are the **Lite Collection** defaults from Lemonade
OmniRouter, sized to keep token-and-cost savings real on commodity hardware:

| Modality | Model | Size | Why this default |
|---|---|---|---|
| Image generation | `SD-Turbo` | ~5 GB | Single-step generation, runs on CPU and AMD iGPU/dGPU |
| Text-to-speech | `kokoro-v1` | ~0.3 GB | Only TTS model Lemonade currently supports; CPU-only, low latency |
| Speech-to-text | `Whisper-Tiny` | ~0.1 GB | Smallest Whisper; fast on CPU. Upgrade to `Whisper-Large-v3-Turbo` if accuracy matters more than latency. |

```bash
lemonade pull SD-Turbo
lemonade pull kokoro-v1
lemonade pull Whisper-Tiny
```

Each `pull` is idempotent. To verify what is already downloaded:

```bash
lemonade list --downloaded
```
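
To decide which pulls are still needed, the listing can be compared against the three defaults. This is a rough sketch only: the output format of `lemonade list --downloaded` is not described here, so the code assumes model names appear as whitespace-separated tokens somewhere in the text. `missing_models` is a hypothetical helper.

```python
DEFAULT_MODELS = ("SD-Turbo", "kokoro-v1", "Whisper-Tiny")


def missing_models(list_output, required=DEFAULT_MODELS):
    """Return the required models that do not appear in the listing text.

    Assumption: each downloaded model name shows up as its own
    whitespace-separated token. Whole-token matching keeps `SDXL-Turbo`
    from being mistaken for `SD-Turbo`.
    """
    tokens = set(list_output.split())
    return [name for name in required if name not in tokens]
```

For each name returned, run `lemonade pull <name>` and re-check.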

For coverage of larger / higher-quality alternatives (`SDXL-Turbo`,
`Flux-2-Klein-4B`, `Whisper-Large-v3-Turbo`), see the
[model picker in reference.md](reference.md#model-picker).

## Step 3: install the routing rule into AGENTS.md

The rule is a Markdown block stored in [`templates/local-ai-rule.md`](templates/local-ai-rule.md).
Append it to the workspace's `AGENTS.md` (create the file if missing). Both
Cursor and Claude Code load `AGENTS.md` automatically on every turn, so the
agent will see the rule on its next message without any further setup.

`scripts/setup_local_ai.py` does this for you, surrounded by stable markers
so re-running the script replaces the block in place rather than appending
a second copy. The markers look like:

```
<!-- BEGIN amd-skills:local-ai-use -->
...rule...
<!-- END amd-skills:local-ai-use -->
```

If you write the file by hand, keep those exact markers. The script relies
on them for idempotent updates.
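
The marker-based update can be sketched as follows. This illustrates the replace-in-place idea under simple assumptions (one block per file, markers matched as exact strings); it is not the actual contents of `scripts/setup_local_ai.py`, and `upsert_rule_block` is a hypothetical name.

```python
BEGIN = "<!-- BEGIN amd-skills:local-ai-use -->"
END = "<!-- END amd-skills:local-ai-use -->"


def upsert_rule_block(agents_md, rule):
    """Return AGENTS.md text with exactly one marked rule block.

    If a complete BEGIN/END pair exists, everything between the markers
    is replaced; otherwise the block is appended after a blank line.
    """
    block = f"{BEGIN}\n{rule.strip()}\n{END}"
    start = agents_md.find(BEGIN)
    end = agents_md.find(END, start + len(BEGIN)) if start != -1 else -1
    if start != -1 and end != -1:
        # Replace in place, keeping the text before and after the block.
        return agents_md[:start] + block + agents_md[end + len(END):]
    if not agents_md.strip():
        # Missing or empty file: the block becomes the whole file.
        return block + "\n"
    return agents_md.rstrip("\n") + "\n\n" + block + "\n"
```

Calling the function twice with the same rule returns the same text both times, which is the property that makes re-runs safe.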

If the user's agent only respects a different convention, mirror the same
block to:

- `CLAUDE.md` (Claude Code, project-scoped) or `~/.claude/CLAUDE.md` (global)
- `.cursor/rules/local-ai-use.mdc` (Cursor user/project rules)
- `GEMINI.md` (Gemini CLI)

The rule's content is identical; only the file location changes.

## Step 4: smoke-test the three modalities

Verify each modality against the live server before declaring success. These
mirror the inline patterns in the installed rule, so a green pass here means
the rule will work.

**Image generation** (writes `out.png`):

```bash
curl -sX POST http://localhost:13305/api/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{"model":"SD-Turbo","prompt":"a single red apple on a white table","size":"512x512","steps":4,"response_format":"b64_json"}' \
  | python -c "import sys,json,base64; open('out.png','wb').write(base64.b64decode(json.load(sys.stdin)['data'][0]['b64_json']))"
```
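
The Python one-liner in the pipe above can be unpacked into small helpers, which also makes it easier to check that the decoded bytes are really an image and not an error page. A minimal sketch: the request fields mirror the curl command, while the PNG check assumes the server returns PNG data (suggested by the `out.png` name, not confirmed here). All function names are illustrative.

```python
import base64
import json

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def image_request_body(prompt, model="SD-Turbo", size="512x512", steps=4):
    """Build the JSON body used by the image smoke test."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "size": size,
        "steps": steps,
        "response_format": "b64_json",
    })


def decode_first_image(response_body):
    """Decode data[0].b64_json from a JSON response body into raw bytes."""
    payload = json.loads(response_body)
    return base64.b64decode(payload["data"][0]["b64_json"])


def looks_like_png(image_bytes):
    """Assumption: the server returns PNG data, so check the file signature."""
    return image_bytes.startswith(PNG_SIGNATURE)
```

Write the bytes to `out.png` only after `looks_like_png` passes, so a failed generation does not leave a corrupt file behind.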

**Text-to-speech** (writes `out.mp3`):

```bash
curl -sX POST http://localhost:13305/api/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model":"kokoro-v1","input":"Local AI is now active.","response_format":"mp3"}' \
  -o out.mp3
```

**Speech-to-text** (round-trips `out.mp3` → text via a wav re-encode):

```bash
ffmpeg -y -i out.mp3 -ar 16000 -ac 1 out.wav
curl -sX POST http://localhost:13305/api/v1/audio/transcriptions \
  -F "file=@out.wav" -F "model=Whisper-Tiny"
```
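
Before uploading, it can help to confirm that the re-encode actually produced 16 kHz mono WAV, since that is the format the `ffmpeg` flags target. A small sketch using only Python's standard `wave` module; `is_16k_mono_wav` is a hypothetical helper name.

```python
import wave


def is_16k_mono_wav(path):
    """Return True when `path` is a readable WAV at 16 kHz with one channel."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getframerate() == 16000 and wav.getnchannels() == 1
    except (wave.Error, EOFError):
        # Not a PCM WAV file (for example an MP3) or a truncated file.
        return False
```

If the check fails, re-run the `ffmpeg` line above rather than sending the file as-is.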

If any of the three returns a non-2xx status, fix it now. The rule we just
installed sends future requests to these same endpoints, so a broken endpoint
becomes a broken user experience.

---

## What changes after this skill runs

From the next turn onward, the agent reads the rule in `AGENTS.md` on every
message. The rule explicitly tells the agent:

- **For image generation:** call `POST /api/v1/images/generations` on the
  local server. Do **not** call any cloud image API and do **not** use the
  built-in `GenerateImage` tool (that path bills tokens to the cloud
  provider).
- **For text-to-speech:** call `POST /api/v1/audio/speech`. Do **not** call
  cloud TTS providers (OpenAI TTS, ElevenLabs, etc.).
- **For speech-to-text:** call `POST /api/v1/audio/transcriptions`. Do
  **not** call cloud transcription providers.
- **Fallback:** only fall back to a cloud API after one local attempt has
  failed *and* the user has been told the local call failed. Never
  silently fall back; the whole point of this skill is to keep cost
  predictable.
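
The fallback rule in the last bullet amounts to a fixed order of operations: one local attempt, then a message to the user, and only then the cloud call. A minimal sketch of that ordering, with `local_call`, `cloud_call`, and `notify` as hypothetical callables supplied by the agent's tool loop:

```python
def run_with_fallback(local_call, cloud_call, notify):
    """Try the local endpoint once; never reach the cloud silently.

    `notify` is invoked with a human-readable message before the cloud
    call, so the user always learns that the local attempt failed.
    """
    try:
        return local_call()
    except Exception as exc:
        notify(f"Local call failed ({exc}); falling back to the cloud API.")
        return cloud_call()
```

Keeping the notification inside the helper means no call site can skip it by accident.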

The agent's own text reasoning continues to use whatever LLM Cursor / Claude
Code / Codex is configured with. This skill does not redirect chat tokens;
it only redirects the multimodal calls that would otherwise leave the
machine.

## Troubleshooting cheatsheet

| Symptom | Cause | Recovery |
|---|---|---|
| `lemonade: command not found` | Server CLI not installed | Install from <https://lemonade-server.ai/install_options.html>; restart shell. |
| `Server is not running` | Service stopped after install | Windows: launch the **Lemonade** Start Menu shortcut. Linux: `sudo systemctl start lemonade-server`. |
| `POST /api/v1/images/generations` returns 404 model not found | Image model not downloaded | `lemonade pull SD-Turbo` and retry. |
| Image generation is slow on CPU (~4–5 min) | sd-cpp on CPU backend | Install the GPU backend on supported AMD hardware: `lemonade backends install sd-cpp:rocm`. |
| `POST /api/v1/audio/transcriptions` returns 400 unsupported format | Input is not 16 kHz mono WAV | Re-encode with `ffmpeg -i in.* -ar 16000 -ac 1 out.wav`. |
| `POST /api/v1/audio/speech` returns 404 | TTS model not downloaded | `lemonade pull kokoro-v1`. |
| 401 Unauthorized on every request | User has set `LEMONADE_API_KEY` | Add `Authorization: Bearer $LEMONADE_API_KEY` to every request and to the rule block. |

## Verification checklist

Mark this skill complete only when **all** of the following are true:

- [ ] `lemonade status --json` reports the server running on port 13305.
- [ ] `lemonade list --downloaded` shows `SD-Turbo`, `kokoro-v1`, and
      `Whisper-Tiny`.
- [ ] The workspace `AGENTS.md` contains the
      `amd-skills:local-ai-use` block.
- [ ] All three smoke tests in Step 4 succeed.
- [ ] On a follow-up turn, asking the agent to "generate an image of X"
      causes it to POST to `http://localhost:13305/api/v1/images/generations`
      rather than calling a cloud tool.

If any box is unchecked, the user is still paying cloud cost for at least
one modality.

---

## Reference

For the full model picker, alternate-quality options, the complete endpoint
reference, the API-key flow, and the OmniRouter tool definitions you can
hand to an agent's tool-calling loop, see [reference.md](reference.md).
