Merged
12 changes: 12 additions & 0 deletions .github/skillspector-allow.yml
@@ -101,6 +101,18 @@ suppressions:
argparse defaults / explicit --image-model/--tts-model/--stt-model flags,
not from LLM or model output. Nothing here consumes unvalidated model
output, so there is no injection sink to sanitize.
- skill: local-ai-use
rule: TM2
file: SKILL.md
match: Chaining Abuse
reason: >-
False positive. Line 103 is the documented Ubuntu/Debian install
one-liner `sudo add-apt-repository -y ppa:lemonade-team/stable &&
sudo apt-get update && sudo apt-get install -y lemonade-server
lemonade-desktop`. The `&&` chaining is the standard apt install
sequence (add PPA, refresh index, install package), not tool/command
chaining of untrusted or model-derived steps. No LLM output feeds the
chain and each command is a fixed, reviewable install step.
- skill: local-ai-use
rule: P2
file: templates/local-ai-rule.md
7 changes: 2 additions & 5 deletions eval/behavioral/tests/test_local_ai_use.py
@@ -29,13 +29,10 @@ def test_generate_image_of_a_cat():
run.workspace_contains("out.png")

# Positive behavioral expectations
run.should("Install Lemonade Server if it is not already installed")
run.should("Download the SD-Turbo model if the model is not already downloaded")
run.should("Add a 'Local AI Use' block to AGENTS.md")

# Negative behavioral expectations
run.should_not("Use the GenerateImage tool")
run.should_not("Use a cloud image API")
run.should_not("Pull unrelated modalities for this image generation task")
run.should_not("Reach for a cloud image path instead of local Lemonade")

# Skipped behavioral expectations
#run.should_not("Pull unrelated modalities for an image-only task")
167 changes: 69 additions & 98 deletions skills/local-ai-use/SKILL.md
@@ -21,18 +21,28 @@ needs image generation, text-to-speech, or speech-to-text uses the local
agent's own LLM keeps handling text; only the expensive multimodal calls move
on-device.

The skill does two things:

1. **Verifies that local Lemonade is reachable and has the right models.**
2. **Drops a `Local AI Use` block into the workspace `AGENTS.md`** so the agent
The skill does three things:

1. **Makes sure local Lemonade is installed and running.** If the `lemonade`
CLI is missing, the setup script installs the **full version** of Lemonade
(server + desktop app) on the user's behalf; if the server is installed but
not running, it launches it.
2. **Verifies that local Lemonade is reachable.**
3. **Drops a `Local AI Use` block into the workspace `AGENTS.md`** so the agent
reads the routing rule on every later turn, in Cursor, Claude Code, Codex,
Gemini CLI, and any other agent that respects `AGENTS.md`.

Models are **not** downloaded during setup. Each default model is pulled
lazily, on first use, by the routing rule (e.g. the first image request pulls
the image model). This keeps setup fast and avoids gigabytes of downloads the
user may never need.

## When to use this skill

Use this skill when **all** of the following are true:

- The user has, or is willing to install, the system-wide Lemonade Server.
- The user wants local Lemonade. If it is not yet installed, the setup script
installs the **full version** (server + desktop app) for them automatically.
- The user accepts the default Lemonade endpoint `http://localhost:13305`.
- The user wants the change to be **persistent** across future turns and
agent restarts (the rule is written to disk).
@@ -44,102 +54,104 @@ instead.
## Prerequisites

- **OS:** Windows 11 x64, Ubuntu/Debian x64, or macOS (beta).
- **Lemonade Server CLI on `PATH`:** verify with `lemonade --version`. If
missing, install from <https://lemonade-server.ai/install_options.html>
before continuing. Do not silently install on the user's machine; that is a
system-wide change and must be the user's call.
- **Lemonade Server:** the setup script installs it if missing. It downloads
and silently installs the **full version** (Windows `lemonade.msi`, the
Ubuntu/Debian `ppa:lemonade-team/stable` PPA plus `lemonade-desktop`, or the
macOS `.pkg`), then launches the server. On Linux/macOS this needs `sudo`.
Pass `--no-install` if the user wants to install it themselves instead.
- **Disk:** ~8 GB free for the three default models (SD-Turbo + Whisper-Tiny
+ kokoro-v1).
- **Network:** required for the first `lemonade pull` of each model. After
that, every modality runs offline.
+ kokoro-v1), plus ~0.1 GB for the installer itself.
- **Network:** required for the install download and the first `lemonade pull`
of each model. After that, every modality runs offline.

## The opinionated path

Run this checklist top to bottom. Track progress against it; do not move on
until each step verifies.

```
[ ] 1. Confirm Lemonade Server is installed and reachable
[ ] 2. Pull the three default modality models
[ ] 3. Install the routing rule into the workspace AGENTS.md
[ ] 4. Smoke-test image, TTS, and STT against the local endpoint
[ ] 1. Ensure Lemonade Server is installed and running (auto-install if missing)
[ ] 2. Install the routing rule into the workspace AGENTS.md
```

The single command that does steps 1, 2, and 3 in one shot is:
The single command that does both steps in one shot is:

```bash
python scripts/setup_local_ai.py
```

The script is idempotent: re-running it on a
fully configured workspace is a no-op apart from a healthcheck. Read the
sections below for what to do when each step fails.
It auto-installs the full version of Lemonade if the `lemonade` CLI is
missing, launches the server if it is not running, then writes the rule. The
script is idempotent: re-running it on a fully configured workspace is a no-op
apart from a healthcheck. Read the sections below for what to do when each
step fails.

---

## Step 1: confirm Lemonade Server is reachable
## Step 1: ensure Lemonade Server is installed and running

Run:
`scripts/setup_local_ai.py` handles this end to end, but here is what it does
so you can do it by hand or debug it:

```bash
lemonade status --json
```
**1a. Is the CLI installed?** Check whether `lemonade` is on `PATH`
(`lemonade --version`). If it is not, install the **full version** on the
user's behalf:

Two acceptable outcomes:
| OS | Install the full version |
|---|---|
| Windows | Download `lemonade.msi` from the [latest release](https://github.com/lemonade-sdk/lemonade/releases/latest/download/lemonade.msi) and run `msiexec /i lemonade.msi /qn` (silent, per-user, no elevation). |
| Ubuntu/Debian | `sudo add-apt-repository -y ppa:lemonade-team/stable && sudo apt-get update && sudo apt-get install -y lemonade-server lemonade-desktop` |
| macOS (beta) | Download the `Lemonade-<ver>-Darwin.pkg` from the latest release and run `sudo installer -pkg Lemonade-<ver>-Darwin.pkg -target /`. |

The full installer bundles the server **and** the desktop app; the
server-only minimal MSI and the legacy `lemonade-server` CLI are deprecated
upstream. After a Windows install the CLI lands in
`%LOCALAPPDATA%\lemonade_server` and is added to the *user* PATH (new shells
only); the setup script probes that directory so it works in the same run.

**1b. Is the server running?** Check `lemonade status --json`.

| `lemonade status` says | Action |
|---|---|
| `Server is running on port 13305` | Continue to Step 2. |
| `Server is not running` | Start it. On Windows, launch the **Lemonade** Start Menu shortcut. On Linux, run `sudo systemctl start lemonade-server`. Re-check `lemonade status`. |
| `Server is not running` | Launch it with `lemonade serve` (the script does this in the background and polls `/api/v1/health` until it answers). |

If `lemonade` is not on `PATH` at all, the server is not installed. Stop and
point the user at <https://lemonade-server.ai/install_options.html>. Do not
attempt a silent install.
Only if the automatic install genuinely fails (no `apt-get`, no `sudo`,
download blocked) should you stop and point the user at
<https://lemonade-server.ai/install_options.html>.
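The "launch, then poll `/api/v1/health` until it answers" behaviour can be sketched roughly like this. The function names, the 60-second timeout, and the one-second interval are assumptions for illustration; the setup script's actual values are not shown in this change.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

HEALTH_URL = "http://localhost:13305/api/v1/health"


def _http_ok(url: str) -> bool:
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: not healthy yet.
        return False


def wait_for_health(
    url: str = HEALTH_URL,
    timeout_s: float = 60.0,   # assumed value, not taken from the script
    interval_s: float = 1.0,   # assumed value, not taken from the script
    probe: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Poll `url` until it is healthy or `timeout_s` elapses."""
    probe = probe or _http_ok
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

The `probe` parameter exists only so the loop can be exercised without a live server; in normal use, `wait_for_health()` with no arguments is enough.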

The rest of this skill assumes the endpoint is `http://localhost:13305/api/v1`
and no API key is required (the system-wide server defaults to no auth on
loopback). If the user has set `LEMONADE_API_KEY`, the routing rule template
in `templates/local-ai-rule.md` shows where to add the `Authorization` header.
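As a rough sketch of the optional-auth case, a client might build its request headers as below. The `Bearer` scheme and the helper name `build_headers` are assumptions; `templates/local-ai-rule.md` remains the authoritative place for the exact header format.

```python
import os
from typing import Dict, Mapping, Optional

BASE_URL = "http://localhost:13305/api/v1"


def build_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return JSON request headers, adding Authorization only if a key is set."""
    env = os.environ if env is None else env
    headers = {"Content-Type": "application/json"}
    api_key = env.get("LEMONADE_API_KEY")
    if api_key:
        # Assumption: the server accepts a standard Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```

With no key set, the default no-auth loopback setup gets plain JSON headers and nothing else.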

## Step 2: pull the three default modality models
### Default modality models (pulled on first use, not during setup)

Pull these three. They are the **Lite Collection** defaults from Lemonade
OmniRouter, sized to keep token-and-cost savings real on commodity hardware:
Setup does **not** download these. The installed rule pulls each one the first
time that modality is requested. They are the **Lite Collection** defaults from
Lemonade OmniRouter, sized to keep token-and-cost savings real on commodity
hardware:

| Modality | Model | Size | Why this default |
|---|---|---|---|
| Image generation | `SD-Turbo` | ~5 GB | Single-step generation, runs on CPU and AMD iGPU/dGPU |
| Text-to-speech | `kokoro-v1` | ~0.3 GB | Only TTS model Lemonade currently supports; CPU-only, low latency |
| Speech-to-text | `Whisper-Tiny` | ~0.1 GB | Smallest Whisper; fast on CPU. Upgrade to `Whisper-Large-v3-Turbo` if accuracy matters more than latency. |

```bash
lemonade pull SD-Turbo
lemonade pull kokoro-v1
lemonade pull Whisper-Tiny
```

To choose a different model while installing the rule, pass it to the setup
script. For example, to make future image requests use SDXL:
To write a different model ID into the rule, pass it to the setup script. For
example, to make future image requests use SDXL:

```bash
python scripts/setup_local_ai.py --image-model SDXL-Turbo
```

The script will pull the selected model and write that model ID into the
installed `AGENTS.md` rule. The same pattern works for `--tts-model` and
`--stt-model`.

Each `pull` is idempotent. To verify what is already downloaded:

```bash
lemonade list --downloaded
```

For coverage of larger / higher-quality alternatives (`SDXL-Turbo`,
`Flux-2-Klein-4B`, `Whisper-Large-v3-Turbo`), see the
That model ID is written into the installed `AGENTS.md` rule and pulled on its
first use. The same pattern works for `--tts-model` and `--stt-model`. For
larger / higher-quality alternatives (`SDXL-Turbo`, `Flux-2-Klein-4B`,
`Whisper-Large-v3-Turbo`), see the
[model picker in reference.md](reference.md#model-picker).
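A minimal sketch of how the three model flags could map onto defaults with `argparse` is shown below. The flag names and default model IDs come from this document; the function name `parse_models` and the returned dictionary shape are hypothetical and may not match `scripts/setup_local_ai.py`.

```python
import argparse
from typing import Dict, List

# Lite Collection defaults named in the table above.
DEFAULT_MODELS = {
    "image": "SD-Turbo",
    "tts": "kokoro-v1",
    "stt": "Whisper-Tiny",
}


def parse_models(argv: List[str]) -> Dict[str, str]:
    """Resolve the model ID per modality from CLI flags, falling back to defaults."""
    parser = argparse.ArgumentParser(description="Illustrative flag parsing only")
    parser.add_argument("--image-model", default=DEFAULT_MODELS["image"])
    parser.add_argument("--tts-model", default=DEFAULT_MODELS["tts"])
    parser.add_argument("--stt-model", default=DEFAULT_MODELS["stt"])
    args = parser.parse_args(argv)
    return {
        "image": args.image_model,
        "tts": args.tts_model,
        "stt": args.stt_model,
    }
```

For example, `parse_models(["--image-model", "SDXL-Turbo"])` overrides only the image model and leaves the TTS and STT defaults in place. Nothing is downloaded at this point; the IDs are only values to write into the rule.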

## Step 3: install the routing rule into AGENTS.md
## Step 2: install the routing rule into AGENTS.md

The rule is a Markdown block stored in [`templates/local-ai-rule.md`](templates/local-ai-rule.md).
Append it to the workspace's `AGENTS.md` (create the file if missing). Both
@@ -169,44 +181,6 @@ block to:

The rule's content is identical; only the file location changes.
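One way to make the append idempotent, whichever file it targets, is to wrap the block in marker comments and skip the write when the marker is already present. The sketch below assumes HTML-comment markers built from the `amd-skills:local-ai-use` block name; the real marker text and the function name `install_rule` are hypothetical.

```python
from pathlib import Path

# Hypothetical markers; the actual template may delimit its block differently.
BEGIN = "<!-- amd-skills:local-ai-use:begin -->"
END = "<!-- amd-skills:local-ai-use:end -->"


def install_rule(target: Path, rule_text: str) -> bool:
    """Append the rule block to `target` unless it is already there.

    Creates the file if missing. Returns True if the file was changed.
    """
    existing = target.read_text(encoding="utf-8") if target.exists() else ""
    if BEGIN in existing:
        return False  # Already installed: re-running is a no-op.

    # Separate the block from any prior content with one blank line.
    if existing and not existing.endswith("\n"):
        existing += "\n"
    if existing:
        existing += "\n"

    block = f"{BEGIN}\n{rule_text.strip()}\n{END}\n"
    target.write_text(existing + block, encoding="utf-8")
    return True
```

Returning a boolean lets a caller report "installed" versus "already present" without parsing the file again.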

## Step 4: smoke-test the three modalities

Verify each modality against the live server before declaring success. These
mirror the inline patterns in the installed rule, so a green pass here means
the rule will work. If you installed with a model override such as
`--image-model SDXL-Turbo`, use that model ID in the smoke test and confirm
the installed `AGENTS.md` rule contains it.

**Image generation** (writes `out.png`):

```bash
curl -sX POST http://localhost:13305/api/v1/images/generations \
-H "Content-Type: application/json" \
-d '{"model":"SD-Turbo","prompt":"a single red apple on a white table","size":"512x512","steps":4,"response_format":"b64_json"}' \
| python -c "import sys,json,base64; open('out.png','wb').write(base64.b64decode(json.load(sys.stdin)['data'][0]['b64_json']))"
```

**Text-to-speech** (writes `out.mp3`):

```bash
curl -sX POST http://localhost:13305/api/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"model":"kokoro-v1","input":"Local AI is now active.","response_format":"mp3"}' \
-o out.mp3
```

**Speech-to-text** (round-trips `out.mp3` → text via a wav re-encode):

```bash
ffmpeg -y -i out.mp3 -ar 16000 -ac 1 out.wav
curl -sX POST http://localhost:13305/api/v1/audio/transcriptions \
-F "file=@out.wav" -F "model=Whisper-Tiny"
```

If any of the three returns a non-2xx status, fix it now. The rule we just
installed sends future requests to these same endpoints, so a broken endpoint
becomes a broken user experience.

---

## What changes after this skill runs
@@ -236,8 +210,8 @@ machine.

| Symptom | Cause | Recovery |
|---|---|---|
| `lemonade: command not found` | Server CLI not installed | Install from <https://lemonade-server.ai/install_options.html>; restart shell. |
| `Server is not running` | Service stopped after install | Windows: launch the **Lemonade** Start Menu shortcut. Linux: `sudo systemctl start lemonade-server`. |
| `lemonade: command not found` | CLI not installed | Re-run `python scripts/setup_local_ai.py` (auto-installs the full version). If it just installed on Windows, open a new shell so the user PATH refreshes, or the script will find it under `%LOCALAPPDATA%\lemonade_server`. |
| `Server is not running` | Service stopped after install | Run `lemonade serve` (the setup script launches it for you). |
| `POST /v1/images/generations` returns 404 model not found | Image model not downloaded | `lemonade pull SD-Turbo` and retry. |
| Image generation is slow on CPU (~4–5 min) | sd-cpp on CPU backend | Install the GPU backend on supported AMD hardware: `lemonade backends install sd-cpp:rocm`. |
| `POST /v1/audio/transcriptions` returns 400 unsupported format | Input is not 16 kHz mono WAV | Re-encode with `ffmpeg -i in.* -ar 16000 -ac 1 out.wav`. |
@@ -249,14 +223,11 @@ machine.
Mark this skill complete only when **all** of the following are true:

- [ ] `lemonade status --json` reports the server running on port 13305.
- [ ] `lemonade list --downloaded` shows `SD-Turbo`, `kokoro-v1`, and
`Whisper-Tiny`.
- [ ] The workspace `AGENTS.md` contains the
`amd-skills:local-ai-use` block.
- [ ] All three smoke tests in Step 4 succeed.
- [ ] On a follow-up turn, asking the agent to "generate an image of X"
causes it to POST to `http://localhost:13305/api/v1/images/generations`
rather than calling a cloud tool.
(pulling the model on first use) rather than calling a cloud tool.

If any box is unchecked, the user is still paying cloud cost for at least
one modality.