Refactored local-ai-use skill to be modular (#48)

maxdokukin-amd · danielholanda · web-flow · commit 52c8e6486ed3 · 2026-06-09T16:36:41.000-07:00
Co-authored-by: Daniel Holanda <holand.daniel@gmail.com>
diff --git a/.github/skillspector-allow.yml b/.github/skillspector-allow.yml
@@ -101,6 +101,18 @@ suppressions:
       argparse defaults / explicit --image-model/--tts-model/--stt-model flags,
       not from LLM or model output. Nothing here consumes unvalidated model
       output, so there is no injection sink to sanitize.
+  - skill: local-ai-use
+    rule: TM2
+    file: SKILL.md
+    match: Chaining Abuse
+    reason: >-
+      False positive. Line 103 is the documented Ubuntu/Debian install
+      one-liner `sudo add-apt-repository -y ppa:lemonade-team/stable &&
+      sudo apt-get update && sudo apt-get install -y lemonade-server
+      lemonade-desktop`. The `&&` chaining is the standard apt install
+      sequence (add PPA, refresh index, install package), not tool/command
+      chaining of untrusted or model-derived steps. No LLM output feeds the
+      chain and each command is a fixed, reviewable install step.
   - skill: local-ai-use
     rule: P2
     file: templates/local-ai-rule.md
diff --git a/eval/behavioral/tests/test_local_ai_use.py b/eval/behavioral/tests/test_local_ai_use.py
@@ -29,13 +29,10 @@ def test_generate_image_of_a_cat():
             run.workspace_contains("out.png")
 
             # Positive behavioral expectations
+            run.should("Install Lemonade Server if it is not already installed")
             run.should("Download the SD-Turbo model if the model is not already downloaded")
             run.should("Add a 'Local AI Use' block to AGENTS.md")
 
             # Negative behavioral expectations
-            run.should_not("Use the GenerateImage tool")
-            run.should_not("Use a cloud image API")
+            run.should_not("Pull unrelated modalities for this image generation task")
             run.should_not("Reach for a cloud image path instead of local Lemonade")
-
-            # Skipped behavioral expectations
-            #run.should_not("Pull unrelated modalities for an image-only task")
diff --git a/skills/local-ai-use/SKILL.md b/skills/local-ai-use/SKILL.md
@@ -21,18 +21,28 @@ needs image generation, text-to-speech, or speech-to-text uses the local
 agent's own LLM keeps handling text; only the expensive multimodal calls move
 on-device.
 
-The skill does two things:
-
-1. **Verifies that local Lemonade is reachable and has the right models.**
-2. **Drops a `Local AI Use` block into the workspace `AGENTS.md`** so the agent
+The skill does three things:
+
+1. **Makes sure local Lemonade is installed and running.** If the `lemonade`
+   CLI is missing, the setup script installs the **full version** of Lemonade
+   (server + desktop app) on the user's behalf; if the server is installed but
+   not running, it launches it.
+2. **Verifies that local Lemonade is reachable.**
+3. **Drops a `Local AI Use` block into the workspace `AGENTS.md`** so the agent
    reads the routing rule on every later turn, in Cursor, Claude Code, Codex,
    Gemini CLI, and any other agent that respects `AGENTS.md`.
 
+Models are **not** downloaded during setup. Each default model is pulled
+lazily, on first use, by the routing rule (e.g. the first image request pulls
+the image model). This keeps setup fast and avoids gigabytes of downloads the
+user may never need.
+
 ## When to use this skill
 
 Use this skill when **all** of the following are true:
 
-- The user has, or is willing to install, the system-wide Lemonade Server.
+- The user wants local Lemonade. If it is not yet installed, the setup script
+  installs the **full version** (server + desktop app) for them automatically.
 - The user accepts the default Lemonade endpoint `http://localhost:13305`.
 - The user wants the change to be **persistent** across future turns and
   agent restarts (the rule is written to disk).
@@ -44,102 +54,104 @@ instead.
 ## Prerequisites
 
 - **OS:** Windows 11 x64, Ubuntu/Debian x64, or macOS (beta).
-- **Lemonade Server CLI on `PATH`:** verify with `lemonade --version`. If
-  missing, install from <https://lemonade-server.ai/install_options.html>
-  before continuing. Do not silently install on the user's machine; that is a
-  system-wide change and must be the user's call.
+- **Lemonade Server:** the setup script installs it if missing. It downloads
+  and silently installs the **full version** (Windows `lemonade.msi`, the
+  Ubuntu/Debian `ppa:lemonade-team/stable` PPA plus `lemonade-desktop`, or the
+  macOS `.pkg`), then launches the server. On Linux/macOS this needs `sudo`.
+  Pass `--no-install` if the user wants to install it themselves instead.
 - **Disk:** ~8 GB free for the three default models (SD-Turbo + Whisper-Tiny
-  + kokoro-v1).
-- **Network:** required for the first `lemonade pull` of each model. After
-  that, every modality runs offline.
+  + kokoro-v1), plus ~0.1 GB for the installer itself.
+- **Network:** required for the install download and the first `lemonade pull`
+  of each model. After that, every modality runs offline.
 
 ## The opinionated path
 
 Run this checklist top to bottom. Track progress against it; do not move on
 until each step verifies.
 
 ```
-[ ] 1. Confirm Lemonade Server is installed and reachable
-[ ] 2. Pull the three default modality models
-[ ] 3. Install the routing rule into the workspace AGENTS.md
-[ ] 4. Smoke-test image, TTS, and STT against the local endpoint
+[ ] 1. Ensure Lemonade Server is installed and running (auto-install if missing)
+[ ] 2. Install the routing rule into the workspace AGENTS.md
 ```
 
-The single command that does steps 1, 2, and 3 in one shot is:
+The single command that does both steps in one shot is:
 
 ```bash
 python scripts/setup_local_ai.py
 ```
 
-The script is idempotent: re-running it on a
-fully configured workspace is a no-op apart from a healthcheck. Read the
-sections below for what to do when each step fails.
+It auto-installs the full version of Lemonade if the `lemonade` CLI is
+missing, launches the server if it is not running, then writes the rule. The
+script is idempotent: re-running it on a fully configured workspace is a no-op
+apart from a healthcheck. Read the sections below for what to do when each
+step fails.
 
 ---
 
-## Step 1: confirm Lemonade Server is reachable
+## Step 1: ensure Lemonade Server is installed and running
 
-Run:
+`scripts/setup_local_ai.py` handles this end to end, but here is what it does
+so you can do it by hand or debug it:
 
-```bash
-lemonade status --json
-```
+**1a. Is the CLI installed?** Check whether `lemonade` is on `PATH`
+(`lemonade --version`). If it is not, install the **full version** on the
+user's behalf:
 
-Two acceptable outcomes:
+| OS | Install the full version |
+|---|---|
+| Windows | Download `lemonade.msi` from the [latest release](https://github.com/lemonade-sdk/lemonade/releases/latest/download/lemonade.msi) and run `msiexec /i lemonade.msi /qn` (silent, per-user, no elevation). |
+| Ubuntu/Debian | `sudo add-apt-repository -y ppa:lemonade-team/stable && sudo apt-get update && sudo apt-get install -y lemonade-server lemonade-desktop` |
+| macOS (beta) | Download the `Lemonade-<ver>-Darwin.pkg` from the latest release and run `sudo installer -pkg Lemonade-<ver>-Darwin.pkg -target /`. |
+
+The full installer bundles the server **and** the desktop app; the
+server-only minimal MSI and the legacy `lemonade-server` CLI are deprecated
+upstream. After a Windows install the CLI lands in
+`%LOCALAPPDATA%\lemonade_server` and is added to the *user* PATH (new shells
+only); the setup script probes that directory so it works in the same run.
+
+**1b. Is the server running?** Check `lemonade status --json`.
 
 | `lemonade status` says | Action |
 |---|---|
 | `Server is running on port 13305` | Continue to Step 2. |
-| `Server is not running` | Start it. On Windows, launch the **Lemonade** Start Menu shortcut. On Linux, run `sudo systemctl start lemonade-server`. Re-check `lemonade status`. |
+| `Server is not running` | Launch it with `lemonade serve` (the script does this in the background and polls `/api/v1/health` until it answers). |
 
-If `lemonade` is not on `PATH` at all, the server is not installed. Stop and
-point the user at <https://lemonade-server.ai/install_options.html>. Do not
-attempt a silent install.
+Only if the automatic install genuinely fails (no `apt-get`, no `sudo`,
+download blocked) should you stop and point the user at
+<https://lemonade-server.ai/install_options.html>.
 
 The rest of this skill assumes the endpoint is `http://localhost:13305/api/v1`
 and no API key is required (the system-wide server defaults to no auth on
 loopback). If the user has set `LEMONADE_API_KEY`, the routing rule template
 in `templates/local-ai-rule.md` shows where to add the `Authorization` header.
 
-## Step 2: pull the three default modality models
+### Default modality models (pulled on first use, not during setup)
 
-Pull these three. They are the **Lite Collection** defaults from Lemonade
-OmniRouter, sized to keep token-and-cost savings real on commodity hardware:
+Setup does **not** download these. The installed rule pulls each one the first
+time that modality is requested. They are the **Lite Collection** defaults from
+Lemonade OmniRouter, sized to keep token-and-cost savings real on commodity
+hardware:
 
 | Modality | Model | Size | Why this default |
 |---|---|---|---|
 | Image generation | `SD-Turbo` | ~5 GB | Single-step generation, runs on CPU and AMD iGPU/dGPU |
 | Text-to-speech | `kokoro-v1` | ~0.3 GB | Only TTS model Lemonade currently supports; CPU-only, low latency |
 | Speech-to-text | `Whisper-Tiny` | ~0.1 GB | Smallest Whisper; fast on CPU. Upgrade to `Whisper-Large-v3-Turbo` if accuracy matters more than latency. |
 
-```bash
-lemonade pull SD-Turbo
-lemonade pull kokoro-v1
-lemonade pull Whisper-Tiny
-```
-
-To choose a different model while installing the rule, pass it to the setup
-script. For example, to make future image requests use SDXL:
+To write a different model ID into the rule, pass it to the setup script. For
+example, to make future image requests use `SDXL-Turbo`:
 
 ```bash
 python scripts/setup_local_ai.py --image-model SDXL-Turbo
 ```
 
-The script will pull the selected model and write that model ID into the
-installed `AGENTS.md` rule. The same pattern works for `--tts-model` and
-`--stt-model`.
-
-Each `pull` is idempotent. To verify what is already downloaded:
-
-```bash
-lemonade list --downloaded
-```
-
-For coverage of larger / higher-quality alternatives (`SDXL-Turbo`,
-`Flux-2-Klein-4B`, `Whisper-Large-v3-Turbo`), see the
+That model ID is written into the installed `AGENTS.md` rule and pulled on its
+first use. The same pattern works for `--tts-model` and `--stt-model`. For
+larger / higher-quality alternatives (`SDXL-Turbo`, `Flux-2-Klein-4B`,
+`Whisper-Large-v3-Turbo`), see the
 [model picker in reference.md](reference.md#model-picker).
 
-## Step 3: install the routing rule into AGENTS.md
+## Step 2: install the routing rule into AGENTS.md
 
 The rule is a Markdown block stored in [`templates/local-ai-rule.md`](templates/local-ai-rule.md).
 Append it to the workspace's `AGENTS.md` (create the file if missing). Both
@@ -169,44 +181,6 @@ block to:
 
 The rule's content is identical; only the file location changes.
 
-## Step 4: smoke-test the three modalities
-
-Verify each modality against the live server before declaring success. These
-mirror the inline patterns in the installed rule, so a green pass here means
-the rule will work. If you installed with a model override such as
-`--image-model SDXL-Turbo`, use that model ID in the smoke test and confirm
-the installed `AGENTS.md` rule contains it.
-
-**Image generation** (writes `out.png`):
-
-```bash
-curl -sX POST http://localhost:13305/api/v1/images/generations \
-  -H "Content-Type: application/json" \
-  -d '{"model":"SD-Turbo","prompt":"a single red apple on a white table","size":"512x512","steps":4,"response_format":"b64_json"}' \
-  | python -c "import sys,json,base64; open('out.png','wb').write(base64.b64decode(json.load(sys.stdin)['data'][0]['b64_json']))"
-```
-
-**Text-to-speech** (writes `out.mp3`):
-
-```bash
-curl -sX POST http://localhost:13305/api/v1/audio/speech \
-  -H "Content-Type: application/json" \
-  -d '{"model":"kokoro-v1","input":"Local AI is now active.","response_format":"mp3"}' \
-  -o out.mp3
-```
-
-**Speech-to-text** (round-trips `out.mp3` → text via a wav re-encode):
-
-```bash
-ffmpeg -y -i out.mp3 -ar 16000 -ac 1 out.wav
-curl -sX POST http://localhost:13305/api/v1/audio/transcriptions \
-  -F "file=@out.wav" -F "model=Whisper-Tiny"
-```
-
-If any of the three returns a non-2xx status, fix it now. The rule we just
-installed sends future requests to these same endpoints, so a broken endpoint
-becomes a broken user experience.
-
 ---
 
 ## What changes after this skill runs
@@ -236,8 +210,8 @@ machine.
 
 | Symptom | Cause | Recovery |
 |---|---|---|
-| `lemonade: command not found` | Server CLI not installed | Install from <https://lemonade-server.ai/install_options.html>; restart shell. |
-| `Server is not running` | Service stopped after install | Windows: launch the **Lemonade** Start Menu shortcut. Linux: `sudo systemctl start lemonade-server`. |
+| `lemonade: command not found` | CLI not installed | Re-run `python scripts/setup_local_ai.py` (auto-installs the full version). After a fresh Windows install, open a new shell so the user PATH refreshes; the setup script itself also probes `%LOCALAPPDATA%\lemonade_server`, so it finds the CLI even before PATH refreshes. |
+| `Server is not running` | Service stopped after install | Run `lemonade serve` (the setup script launches it for you). |
 | `POST /v1/images/generations` returns 404 model not found | Image model not downloaded | `lemonade pull SD-Turbo` and retry. |
 | Image generation is slow on CPU (~4–5 min) | sd-cpp on CPU backend | Install the GPU backend on supported AMD hardware: `lemonade backends install sd-cpp:rocm`. |
 | `POST /v1/audio/transcriptions` returns 400 unsupported format | Input is not 16 kHz mono WAV | Re-encode with `ffmpeg -i in.* -ar 16000 -ac 1 out.wav`. |
@@ -249,14 +223,11 @@ machine.
 Mark this skill complete only when **all** of the following are true:
 
 - [ ] `lemonade status --json` reports the server running on port 13305.
-- [ ] `lemonade list --downloaded` shows `SD-Turbo`, `kokoro-v1`, and
-      `Whisper-Tiny`.
 - [ ] The workspace `AGENTS.md` contains the
       `amd-skills:local-ai-use` block.
-- [ ] All three smoke tests in Step 4 succeed.
 - [ ] On a follow-up turn, asking the agent to "generate an image of X"
       causes it to POST to `http://localhost:13305/api/v1/images/generations`
-      rather than calling a cloud tool.
+      (pulling the model on first use) rather than calling a cloud tool.
 
 If any box is unchecked, the user is still paying cloud cost for at least
 one modality.
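
The Step 1 flow described in the SKILL.md changes above (find the CLI, fall back to the Windows install directory, then poll the health endpoint) could be sketched roughly as follows. This is a minimal illustration, not the contents of `scripts/setup_local_ai.py`: the function names, the `lemonade.exe` file name, and the retry counts are assumptions, while the `%LOCALAPPDATA%\lemonade_server` directory and the `/api/v1/health` path come from the text above.

```python
import http.client
import os
import shutil
import time
import urllib.error
import urllib.request

# Default endpoint named in SKILL.md; the /health path is the one the doc says is polled.
HEALTH_URL = "http://localhost:13305/api/v1/health"


def find_lemonade_cli(which=shutil.which, env=None, is_file=os.path.isfile):
    """Return a path to the lemonade CLI, or None if it cannot be found."""
    env = os.environ if env is None else env
    found = which("lemonade")
    if found:
        return found
    # A fresh Windows install only updates PATH for new shells, so probe the
    # install directory directly. The executable name is an assumption.
    local_app_data = env.get("LOCALAPPDATA")
    if local_app_data:
        candidate = os.path.join(local_app_data, "lemonade_server", "lemonade.exe")
        if is_file(candidate):
            return candidate
    return None


def http_ok(url, timeout=2.0):
    """Return True if a GET on url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_for_health(url=HEALTH_URL, attempts=30, delay=1.0,
                    probe=http_ok, sleep=time.sleep):
    """Poll the health endpoint until it answers or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # back off before the next probe
    return False


if __name__ == "__main__":
    cli = find_lemonade_cli()
    print("lemonade CLI:", cli or "not found (install step would run here)")
    print("server healthy:", wait_for_health(attempts=3))
```

The `which`, `probe`, and `sleep` parameters are injectable so the logic can be exercised without a real Lemonade install; a real script would presumably run the installer and `lemonade serve` between the two checks.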
diff --git a/skills/local-ai-use/scripts/setup_local_ai.py b/skills/local-ai-use/scripts/setup_local_ai.py