2 changes: 2 additions & 0 deletions .gitignore
@@ -11,6 +11,8 @@ WebUI/external/service/
WebUI/external/workflows_intel/
WebUI/external/workflows_bak/

*.pyc

*.7z
*.whl
hijacks/
61 changes: 54 additions & 7 deletions AGENTS.md
@@ -6,7 +6,7 @@ Concise reference for AI coding agents working in this repository.

Electron + Vue.js desktop app for AI inference on Intel GPUs. Multi-process architecture:
Electron main process orchestrates Vue.js frontend and multiple Python/native backend services
(AI Backend, ComfyUI, LlamaCPP, OpenVINO). Frontend code lives in `WebUI/`.

## Mandatory Rules

@@ -182,7 +182,7 @@ There is **no Vue Router**. Navigation is state-driven:
- Once running, `promptStore.currentMode` controls which view renders: `chat` → `Chat.vue`, `imageGen`/`imageEdit`/`video` → `WorkflowResult.vue`
- `PromptArea.vue` is the shared prompt input bar across all modes

### Backend Services (4 services, dynamic ports)

Managed by `electron/subprocesses/apiServiceRegistry.ts`. Each service spawns a child process and exposes an OpenAI-compatible HTTP API:

@@ -192,14 +192,13 @@
| `llamacpp-backend` | 39000-39999 | `llama-server` (native) | `/health` | GGUF model inference (LLM + embedding sub-servers) |
| `openvino-backend` | 29000-29999 | `ovms` (native) | `/v2/health/ready` | OpenVINO inference (LLM + embedding + transcription sub-servers) |
| `comfyui-backend` | 49000-49999 | ComfyUI `main.py` (Python) | `/queue` | Image/video/3D generation via workflows |
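
The dynamic-port pattern above (pick a port from the service's fixed range, then gate all use behind the health endpoint) can be sketched in shell. This is a hypothetical illustration, not the repository's actual startup code; `pick_port` and `wait_healthy` are invented names, using the llamacpp-backend range and health path from the table:

```bash
# Sketch: choose a port in the llamacpp-backend range, then poll /health.
pick_port() {  # $1 = range start, $2 = range size
  echo $(( $1 + RANDOM % $2 ))
}
wait_healthy() {  # $1 = health URL, $2 = max attempts
  for _ in $(seq "$2"); do
    curl -sf --max-time 2 "$1" >/dev/null && return 0
    sleep 0.5
  done
  return 1
}
PORT=$(pick_port 39000 1000)   # 39000-39999, per the table
wait_healthy "http://127.0.0.1:$PORT/health" 3 || echo "llamacpp-backend not ready on port $PORT"
```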

### Three Communication Patterns

1. **Electron IPC** (renderer ↔ main): ALL service lifecycle — start, stop, setup, device selection, `ensureBackendReadiness`. Renderer calls `window.electronAPI.*`, main handles via `ipcMain.handle()`. Main pushes events via `win.webContents.send()`.

2. **Direct HTTP** (renderer → backend): For actual AI operations after service is ready:
- **Chat inference**: Vercel AI SDK `streamText()` → `{backendUrl}/v1/chat/completions` (LlamaCpp/OpenVINO)
- **Model management**: `fetch()` → Flask ai-backend `/api/*` (download, check, size)
- **Image generation**: `fetch()` → ComfyUI `/prompt`, `/upload/image`, `/interrupt`, `/free`
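
Once a text backend is up, the chat-completions path can be exercised directly with `curl`. A hedged example: the port is assigned dynamically at startup, so `$PORT` below is a placeholder, and the model name is the small test model registered in `models.json`:

```bash
REQ='{"model":"LFM2.5-350M-Q4_K_M.gguf","messages":[{"role":"user","content":"Hello"}],"stream":false}'
PORT="${PORT:-39001}"   # placeholder: read the real port from the running app
curl -s "http://127.0.0.1:$PORT/v1/chat/completions" \
  -H 'Content-Type: application/json' -d "$REQ" \
  || echo "no text backend reachable on port $PORT"
```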

Expand Down Expand Up @@ -254,12 +253,11 @@ User sends message → `textInference.ensureReadyForInference()` → IPC `ensure
- `demoMode` — Demo mode overlay + auto-reset timer. IPC: `getDemoModeSettings`. No deps.
- `speechToText` — STT enabled state, initialization. Deps: `backendServices`, `models`, `dialogs`, `globalSetup`
- `audioRecorder` — Browser MediaRecorder, transcription via AI SDK. Deps: `backendServices` (lazy)
- `developerSettings` — Dev console on startup toggle. No deps.

### Feature → File Map

**Chat/LLM**: `views/Chat.vue` → stores: `openAiCompatibleChat`, `textInference`, `conversations`, `presets` → electron: `ensureBackendReadiness` IPC → backend: `llamacpp`/`openvino` via Vercel AI SDK

**Image/Video Generation**: `views/WorkflowResult.vue` → stores: `imageGenerationPresets`, `comfyUiPresets`, `presets` → electron: service lifecycle IPC → backend: `comfyui-backend` via direct HTTP

@@ -285,7 +283,56 @@
| `electron/subprocesses/llamaCppBackendService.ts` | LlamaCPP native server (LLM + embedding sub-servers) |
| `electron/subprocesses/openVINOBackendService.ts` | OpenVINO OVMS (LLM + embedding + transcription sub-servers) |
| `electron/subprocesses/comfyUIBackendService.ts` | ComfyUI Python server |
| `electron/subprocesses/langchain.ts` | RAG utility process (document splitting, embedding, vector search) |
| `electron/subprocesses/deviceDetection.ts` | Intel GPU device detection and env var setup |
| `electron/logging/logger.ts` | Logging, sends `debugLog` events to renderer |

## Cursor Cloud specific instructions

### Running the dev server

```bash
cd /workspace/WebUI
npm run fetch-external-resources # first time only — downloads uv + 7zip binaries
DISPLAY=:1 npm run dev
```

The Vite dev server starts on `http://localhost:25413` and Electron opens automatically.
A virtual framebuffer (`Xvfb`) is already running on `:1`.
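
An optional pre-flight check can confirm both before launching. Hedged sketch: `display_ok` and `vite_ok` are hypothetical helper names, and the snippet assumes `pgrep` and `curl` are installed:

```bash
display_ok() { pgrep -x Xvfb >/dev/null; }   # virtual framebuffer present?
vite_ok() {    # $1 = port (defaults to the Vite dev port noted above)
  curl -sf --max-time 2 "http://localhost:${1:-25413}" >/dev/null
}
display_ok && echo "Xvfb: running" || echo "Xvfb: not found"
vite_ok && echo "Vite: reachable" || echo "Vite: not started yet"
```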

### Backend services on Linux

The `ai-backend` and `llamacpp-backend` services work on Linux (Ubuntu x64):

- Run `npm run fetch-external-resources` once to download `uv` and `7zip` binaries for
the current platform (placed in `build/resources/`).
- Start the Electron app with `DISPLAY=:1 npm run dev`. On the setup dialog, click
**Install** next to `AI Playground` (ai-backend) and `Llama.cpp - GGUF` (llamacpp-backend).
- `ai-backend` runs a Python Flask server on port 59000 (health: `GET /healthy`).
- `llamacpp-backend` downloads the `ubuntu-x64` CPU build from GitHub releases and
provides on-demand LLM inference (health: `GET /health`).
- ComfyUI and OpenVINO are not yet supported on Linux.
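
The two health endpoints above can be probed from a terminal. A hedged helper sketch (`probe` is an invented name; only ai-backend's port is fixed):

```bash
probe() {  # $1 = service name, $2 = health URL; prints "<name>: up" or "<name>: down"
  if curl -sf --max-time 2 "$2" >/dev/null; then echo "$1: up"; else echo "$1: down"; fi
}
probe ai-backend http://127.0.0.1:59000/healthy
# llamacpp-backend's port is chosen at startup (39000-39999); substitute the real one:
probe llamacpp-backend "http://127.0.0.1:${LLAMACPP_PORT:-39001}/health"
```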

### Testing inference end-to-end

A small test model (`LFM2.5-350M-Q4_K_M.gguf`, ~255 MB) is registered in `models.json`.
To test inference:

1. Start the app, install both backends via the setup dialog, then click **Continue**.
2. Open **Chat Settings**, select **LFM2.5-350M-Q4_K_M.gguf** from the Model dropdown.
3. Type a message and send — the app auto-downloads the model from HuggingFace on first use.
4. The llamacpp-backend will load the model and serve streaming responses.
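
The streaming in step 4 can also be observed outside the UI. A hedged sketch: responses arrive as server-sent events (lines prefixed `data:`), and `$PORT` again stands in for the dynamically assigned llamacpp port:

```bash
REQ='{"model":"LFM2.5-350M-Q4_K_M.gguf","messages":[{"role":"user","content":"Say hi"}],"stream":true}'
curl -sN "http://127.0.0.1:${PORT:-39001}/v1/chat/completions" \
  -H 'Content-Type: application/json' -d "$REQ" \
  || echo "llamacpp-backend not reachable; is the backend running?"
```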

**Network requirement**: Model downloads redirect through `cas-bridge.xethub.hf.co`
(HuggingFace Xet CDN). This domain must be in the egress allowlist. Allowlist changes
only take effect on new VM sessions — a running VM will not pick up changes.

### Known issues

- **`npm install` requires `--legacy-peer-deps`** due to a `zod@4` vs `zod@3` peer
conflict from `@browserbasehq/stagehand` (transitive dep of `@langchain/community`).
- **`electron/test/subprocesses/service.test.ts` fails** because the `electron` path alias
in `vitest.config.ts` shadows the `electron` package mock. This is a pre-existing issue
on the `dev` branch — 4 of 5 test files (24 tests) pass.
- **Prettier reports 2 pre-existing formatting issues** in `electron/subprocesses/openVINOBackendService.ts`
and `src/components/BackendOptions.vue` on the `dev` branch.