From 3bd941caaa89cabedcbf7d48cda54814d6dcac38 Mon Sep 17 00:00:00 2001 From: Cursor Agent Date: Mon, 1 Jun 2026 16:03:46 +0000 Subject: [PATCH] docs: add developer guides for config, runners, and HA cluster Document recently evolved subsystems that lacked in-repo guides: configuration (JSON/YAML, schema, env overrides), runner tag routing, and the admin cluster dashboard API. Link from CONTRIBUTING and README. Co-authored-by: Denis Gukov --- .github/copilot-instructions.md | 9 ++- CONTRIBUTING.md | 11 ++++ README.md | 1 + docs/README.md | 11 ++++ docs/cluster-dashboard.md | 109 ++++++++++++++++++++++++++++++++ docs/configuration.md | 109 ++++++++++++++++++++++++++++++++ docs/runners-and-tags.md | 78 +++++++++++++++++++++++ 7 files changed, 325 insertions(+), 3 deletions(-) create mode 100644 docs/README.md create mode 100644 docs/cluster-dashboard.md create mode 100644 docs/configuration.md create mode 100644 docs/runners-and-tags.md diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index d7a81101d4..d3a011c932 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -17,7 +17,7 @@ Semaphore UI is a modern web interface for managing popular DevOps tools like An ### Run the application: - ALWAYS run the bootstrapping steps first - Setup database and admin user: `./bin/semaphore setup` (interactive, use BoltDB option 2 for development) -- Start server: `./bin/semaphore server --config ./config.json` +- Start server: `./bin/semaphore server --config ./config.json` (or `config.yaml`; see `docs/configuration.md`) - Web UI: http://localhost:3000 (login: admin / changeme) - API: http://localhost:3000/api/ (test with: `curl http://localhost:3000/api/ping`) @@ -78,8 +78,10 @@ curl -I http://localhost:3000/ # Should return HTTP 200 ├── db/ - Database models and interfaces ├── services/ - Business logic services ├── util/ - Utility functions and configuration +├── docs/ - Developer guides (configuration, runners, HA) 
+├── config.schema.yaml - JSON Schema for config.json / config.yaml ├── bin/ - Built binaries (after build) -└── config.json - Runtime configuration (after setup) +└── config.json - Runtime configuration (after setup; YAML also supported) ``` ### Key Commands Reference @@ -150,7 +152,8 @@ During setup, choose option 2 (BoltDB) for simplest development setup: - **NEVER CANCEL** long-running builds or dependency installations - Set appropriate timeouts: deps (5+ min), build (3+ min), tests (2+ min) - The application serves the frontend from the Go backend - no separate frontend server needed -- Configuration is stored in `config.json` after running setup +- Configuration is stored in `config.json` after running setup (or use `config.yaml`; validate with `config.schema.yaml`) +- Developer docs: `docs/README.md` - Default admin credentials after setup: admin / changeme - Linting has known issues - focus on not introducing new ones - Always test changes by running the full application, not just unit tests \ No newline at end of file diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index a75bcf254a..37850f8cf4 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -55,11 +55,22 @@ When creating a pull-request you should: go run cli/main.go service --config ./config.json ``` + Setup writes `config.json` by default. You can also use `config.yaml`; see [docs/configuration.md](docs/configuration.md) for discovery paths, environment overrides, and [`config.schema.yaml`](config.schema.yaml). + Open [localhost:3000](http://localhost:3000) Note: for Windows, you may need [Cygwin](https://www.cygwin.com/) to run certain commands because the [reflex](github.com/cespare/reflex) package probably doesn't work on Windows. You may encounter issues when running `task watch`, but running `task build` etc... will still be OK. 
+## Developer documentation + +Repository guides for contributors and operators: + +- [docs/README.md](docs/README.md) — index +- [docs/configuration.md](docs/configuration.md) — config file, schema, env vars +- [docs/runners-and-tags.md](docs/runners-and-tags.md) — remote runners and tag routing +- [docs/cluster-dashboard.md](docs/cluster-dashboard.md) — HA cluster admin API + ## Integration tests Dredd is used for API integration tests, if you alter the API in any way you must make sure that the information in the api docs diff --git a/README.md b/README.md index d93a4ec2f5..8688d8343d 100644 --- a/README.md +++ b/README.md @@ -83,6 +83,7 @@ For more installation options, visit our [Installation page](https://semaphoreui * [User Guide](https://docs.semaphoreui.com) * [API Reference](https://semaphoreui.com/api-docs) * [Postman Collection](https://www.postman.com/semaphoreui) +* [Developer docs](docs/README.md) — configuration, runners, HA cluster dashboard (in-repo) ## Awesome Semaphore diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000000..2bb716352d --- /dev/null +++ b/docs/README.md @@ -0,0 +1,11 @@ +# Developer documentation + +Internal guides for contributors and operators. User-facing product docs live at [docs.semaphoreui.com](https://docs.semaphoreui.com). + +| Guide | Audience | Covers | +|-------|----------|--------| +| [Configuration](configuration.md) | Developers, operators | `config.json` / `config.yaml`, env vars, JSON Schema | +| [Runners and tags](runners-and-tags.md) | Developers, operators | Remote runners, tag routing, webhooks | +| [Cluster dashboard](cluster-dashboard.md) | Operators (HA) | Admin cluster API, task state inspection, recovery | + +Implementation plans for upcoming work are under `docs/plans/`. 
diff --git a/docs/cluster-dashboard.md b/docs/cluster-dashboard.md new file mode 100644 index 0000000000..c9b0549f79 --- /dev/null +++ b/docs/cluster-dashboard.md @@ -0,0 +1,109 @@ +# Cluster dashboard (HA) + +The cluster dashboard is an **admin-only** UI and API for inspecting high-availability (HA) deployments and the shared task state backend. It requires the enterprise HA feature (`features.high_availability`). + +## When it applies + +| `ha.enabled` | Dashboard | +|--------------|-----------| +| `false` | UI shows HA disabled; `GET /api/cluster` returns `{"ha_enabled": false}` only | +| `true` | Full node list, Redis stats, task snapshot, maintenance clear | + +Configure HA in the server config: + +```yaml +ha: + enabled: true + node_id: semaphore-1 # optional; auto-generated if empty + redis: + addr: redis.example.com:6379 + pass: "" +``` + +`util.HAEnabled()` is true when `ha` is set and `ha.enabled` is true. + +## Admin API + +All routes require an authenticated **admin** session (same as other `/api/...` admin routes). + +### `GET /api/cluster` + +Returns cluster status: + +- `ha_enabled` (boolean) — always present +- `node_id` (string) — this instance, when HA config exists +- `nodes` (array) — peer nodes, heartbeats, versions (when HA overlay is active) +- `redis` (object) — connection, memory, key groups (when inspector available) + +When HA is enabled but the cluster inspector is unavailable, the handler responds with **503** and a short error message. When HA is disabled, the response is **200** with only `ha_enabled: false` (no error). 
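A client can tell the three documented outcomes of `GET /api/cluster` apart by status code alone. The following is a minimal sketch, not the project's own client: `ClusterStatus`, `ErrInspectorUnavailable`, and `ParseClusterResponse` are hypothetical names, and `nodes` and `redis` are kept as raw JSON because their exact shape is not specified here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

// ClusterStatus mirrors the fields listed for GET /api/cluster.
// Only ha_enabled is always present; the others are optional.
type ClusterStatus struct {
	HAEnabled bool            `json:"ha_enabled"`
	NodeID    string          `json:"node_id,omitempty"`
	Nodes     json.RawMessage `json:"nodes,omitempty"`
	Redis     json.RawMessage `json:"redis,omitempty"`
}

// ErrInspectorUnavailable is a hypothetical sentinel for the 503 case.
var ErrInspectorUnavailable = errors.New("cluster inspector unavailable")

// ParseClusterResponse (hypothetical helper) maps a status code and body
// to a ClusterStatus or an error.
func ParseClusterResponse(status int, body []byte) (ClusterStatus, error) {
	var cs ClusterStatus
	switch status {
	case http.StatusOK:
		// Covers both HA disabled (ha_enabled: false only) and HA enabled.
		err := json.Unmarshal(body, &cs)
		return cs, err
	case http.StatusServiceUnavailable:
		// HA is enabled but the inspector cannot be reached.
		return cs, ErrInspectorUnavailable
	default:
		return cs, fmt.Errorf("unexpected status %d", status)
	}
}

func main() {
	cs, err := ParseClusterResponse(http.StatusOK, []byte(`{"ha_enabled": false}`))
	fmt.Println(cs.HAEnabled, err)
}
```

Treating 503 as a distinct error lets a caller retry later instead of concluding that HA is turned off.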
+ +### `GET /api/cluster/tasks` + +Returns a **task state snapshot** from the task pool store: + +| Field | Meaning | +|-------|---------| +| `queue` | Tasks waiting to start | +| `running` | Tasks currently executing | +| `active_by_project` | Per-project active task records | +| `aliases` | Alias string → task ID | +| `claims` | Task IDs claimed for distributed coordination | + +Works in non-HA mode too (in-memory store); fields may be empty arrays/objects if the store does not implement introspection. + +### `DELETE /api/cluster/tasks` + +Maintenance: clear selected record groups from the backend (Redis in HA). Body: + +```json +{ + "scope": { + "queue": true, + "running": false, + "active": false, + "aliases": false, + "claims": false, + "runtime_fields": false + } +} +``` + +At least one scope flag must be `true`. Use only when recovering from a stuck cluster state (orphaned queue entries, stale claims). Clearing **running** or **active** while real tasks execute can cause inconsistent behavior. + +The UI exposes the same scope checkboxes under **Clear tasks from Redis** (enabled only when `ha_enabled` is true). + +## UI entry + +**Admin → Cluster dashboard** (`web/src/views/Cluster.vue`): + +- Node table and Redis memory chart when HA is active +- Live task tables from `/api/cluster/tasks` +- Upgrade prompt when `features.high_availability` is false + +## Architecture sketch + +```mermaid +flowchart LR + subgraph nodes [Semaphore nodes] + N1[Node A] + N2[Node B] + end + Redis[(Redis task state)] + N1 --> Redis + N2 --> Redis + Admin[Admin UI] --> API["/api/cluster*"] + API --> N1 +``` + +`TaskStateStore` implementations may expose `TaskStateInspector` for snapshots and `ClearTasks`. See `services/tasks/task_state_store.go`. + +## OpenAPI + +Cluster endpoints are documented in `api-docs.yml` under the `cluster` tag (may be commented until Dredd hooks cover them). Regenerate the public Swagger bundle when enabling them in CI. 
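The request body for `DELETE /api/cluster/tasks` can be built and checked on the client before it is sent. This sketch assumes only the JSON shape shown above; `ClearScope`, `ClearRequest`, and `BuildClearBody` are illustrative names, not types from the Semaphore code base.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ClearScope mirrors the "scope" object of the DELETE body.
type ClearScope struct {
	Queue         bool `json:"queue"`
	Running       bool `json:"running"`
	Active        bool `json:"active"`
	Aliases       bool `json:"aliases"`
	Claims        bool `json:"claims"`
	RuntimeFields bool `json:"runtime_fields"`
}

// ClearRequest is the full request body.
type ClearRequest struct {
	Scope ClearScope `json:"scope"`
}

// Any reports whether at least one scope flag is set.
func (s ClearScope) Any() bool {
	return s.Queue || s.Running || s.Active || s.Aliases || s.Claims || s.RuntimeFields
}

// BuildClearBody (hypothetical helper) rejects an empty scope locally,
// matching the rule that at least one flag must be true.
func BuildClearBody(s ClearScope) ([]byte, error) {
	if !s.Any() {
		return nil, errors.New("at least one scope flag must be true")
	}
	return json.Marshal(ClearRequest{Scope: s})
}

func main() {
	body, err := BuildClearBody(ClearScope{Queue: true, Claims: true})
	fmt.Println(string(body), err)
}
```

Checking the scope locally avoids a round trip for a request that the rule above says is invalid.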
+ +## Related code + +- `api/cluster.go` — handlers +- `api/router.go` — route registration +- `pro_interfaces` — `ClusterInspector` for nodes/Redis +- `services/tasks/task_state_store.go` — snapshot and clear types diff --git a/docs/configuration.md b/docs/configuration.md new file mode 100644 index 0000000000..00e48d0278 --- /dev/null +++ b/docs/configuration.md @@ -0,0 +1,109 @@ +# Configuration + +Semaphore reads settings from a config file, then applies environment-variable overrides and built-in defaults. The canonical field list is maintained in [`config.schema.yaml`](../config.schema.yaml) (JSON Schema draft 2020-12), generated from `util.ConfigType` in Go. + +## File format and discovery + +Supported formats: **JSON** (`.json`) and **YAML** (`.yaml`, `.yml`). Keys use `snake_case` and match the `json` struct tags in `util/config.go`. + +### Search order + +When `--config` is not passed and `SEMAPHORE_CONFIG_PATH` is unset, the server looks for the first existing file among: + +1. `./config.json`, `./config.yaml`, `./config.yml` (current working directory) +2. `/usr/local/etc/semaphore/config.{json,yaml,yml}` +3. `/etc/semaphore/config.{json,yaml,yml}` + +Explicit path: + +```bash +./bin/semaphore server --config /etc/semaphore/config.yaml +# or +export SEMAPHORE_CONFIG_PATH=/etc/semaphore/config.yaml +``` + +Interactive setup (`semaphore setup`) still writes `config.json` by default; YAML is fully supported for hand-written or GitOps-managed installs. + +### Load order + +`util.ConfigInit` applies settings in this order (environment variables override the config file; defaults fill only fields that are still unset): + +1. Config file (if present and not disabled with `--no-config`) +2. Environment variables (`SEMAPHORE_*`, see `env:` tags on struct fields) +3. Defaults from struct `default:` tags (applied only to fields left empty by steps 1 and 2) + +Sensitive values can be loaded from companion files (for example `runner.token_file`, `subscription.key_file`) after the main file is parsed. 
+ +## Schema validation + +Use `config.schema.yaml` in your editor (YAML language server with JSON Schema) or in CI to validate configs before deploy. The schema `$id` is `https://semaphoreui.com/schemas/config.schema.json`. + +To regenerate the schema after changing `util.ConfigType`, follow [`.claude/skills/semaphore-config-schema/SKILL.md`](../.claude/skills/semaphore-config-schema/SKILL.md). + +## Common options (quick reference) + +| Area | Keys | Notes | +|------|------|-------| +| Database | `dialect`, `mysql` / `postgres` / `sqlite` / `bolt` | `bolt` is deprecated; prefer `sqlite` for embedded DB | +| HTTP | `port`, `interface`, `web_host` | `web_host` is the public URL used in links and emails | +| TLS | `tls.enabled`, `tls.cert_file`, `tls.key_file` | Optional HTTP→HTTPS redirect via `tls.http_redirect_addr` **or** `tls.http_redirect_port` (mutually exclusive) | +| Auth | `mfa.totp`, `mfa.email` | Former top-level `auth` was renamed to `mfa` | +| Runners | `use_remote_runner`, `runner_registration_token`, `runner` | Per-runner CLI config block when running `semaphore runner` | +| HA | `ha.enabled`, `ha.node_id`, `ha.redis` | Requires enterprise overlay; see [Cluster dashboard](cluster-dashboard.md) | +| Concurrency | `max_parallel_tasks` | Server-wide cap; per-runner limit is `runner.max_parallel_tasks` | + +Environment variable names mirror keys: `port` → `SEMAPHORE_PORT`, nested fields use underscores (`SEMAPHORE_TLS_ENABLED`, `SEMAPHORE_HA_REDIS_ADDR`). Fields tagged `sensitive` are cleared from the process environment after load so secrets do not leak to child processes. + +## Examples + +### Minimal development (SQLite) + +```yaml +dialect: sqlite +sqlite: + host: /tmp/semaphore.db +port: ":3000" +tmp_path: /tmp/semaphore +cookie_hash: +cookie_encryption: +access_key_encryption: +``` + +Generate secrets with `semaphore setup` or `openssl rand -base64 32`. 
+ +### TLS with HTTP redirect + +```yaml +tls: + enabled: true + cert_file: /etc/semaphore/tls.crt + key_file: /etc/semaphore/tls.key + http_redirect_port: 8080 +``` + +A second listener on port `8080` redirects clients to HTTPS. Use `http_redirect_addr` instead when you need a non-default bind address (for example `:8080` or `127.0.0.1:8080`). + +### Remote runner (server side) + +```yaml +use_remote_runner: true +runner_registration_token: "" +``` + +Runners register with that token; task routing uses project/global runners and optional tags (see [Runners and tags](runners-and-tags.md)). + +## Troubleshooting + +| Symptom | Check | +|---------|--------| +| Server exits on start | Run with explicit `--config`; validate against `config.schema.yaml` | +| Wrong database | `dialect` and the matching `mysql`/`postgres`/`sqlite` block | +| Broken login cookies after config change | `cookie_hash` / `cookie_encryption` must stay stable or all sessions invalidate | +| Runner never picks up jobs | `use_remote_runner`, runner `active`, tag match on template/inventory | +| HA features missing in UI | `ha.enabled` and enterprise subscription; cluster API returns `ha_enabled: false` when disabled | + +## Related code + +- `util/config.go`, `util/config_auth.go` — struct definitions and loading +- `util/config_test.go` — YAML/JSON load tests +- `cli/cmd/root.go` — `--config`, `--no-config` flags diff --git a/docs/runners-and-tags.md b/docs/runners-and-tags.md new file mode 100644 index 0000000000..fccad18f96 --- /dev/null +++ b/docs/runners-and-tags.md @@ -0,0 +1,78 @@ +# Runners and tags + +Semaphore can execute tasks on the server process or on **remote runners** (separate `semaphore runner` processes). Tags restrict which runner may execute a task. 
+ +## Modes + +| Mode | Config | Behavior | +|------|--------|----------| +| Local | `use_remote_runner: false` (default) | Task pool runs jobs on the Semaphore server | +| Remote | `use_remote_runner: true` | Tasks are assigned to registered runners via `RemoteJob` | + +Runners are **project-scoped** (bound to one project) or **global** (any project). Registration uses `runner_registration_token` on the server and `semaphore runner register` on the runner host. + +## Tags + +### Data model + +- Each runner has zero or more string **tags** (`db.Runner.Tags`). +- Templates and inventories may set optional `runner_tag`. When a task runs, the effective tag is **inventory overrides template** if the inventory defines one. + +### Routing rules + +When `use_remote_runner` is true and a task needs a runner (`TaskPool` / `RemoteJob`): + +1. If `runner_tag` is set → select **active** runners whose tags include that value (`RunnerFilterTagCompleteMatch`). +2. If `runner_tag` is empty → select runners marked **default** (`RunnerFilterIsDefault`). +3. Project runners are tried before global runners; order within each group is shuffled (`crypto/rand`) for load spreading. +4. A runner is preferred if it sent a heartbeat within **30 minutes** or has a **webhook** configured (webhook-only runners are treated as always reachable). +5. Among eligible runners, the first with `running_tasks < max_parallel_tasks` wins. + +If no runner matches, the task stays in **waiting** state with error `no runners available`. + +### UI and API + +- **Admin → Runners**: edit tags on global runners. +- **Project → Runners**: project-scoped runners and tags (requires `project_runners` feature). +- Template form: **Runner tag** dropdown populated from `GET /api/project/{id}/runner_tags`. +- Inventory form: optional **Runner tag** (overrides template). +- Tag catalog: `GET /api/runner_tags` (global), `GET /api/project/{id}/runner_tags` (project). 
+ +CLI registration: + +```bash +semaphore runner register --tags linux,amd64 +``` + +## Webhooks + +Runners may define a `webhook` URL. Semaphore POSTs JSON when a task is assigned: + +```json +{ + "action": "start", + "project_id": 1, + "task_id": 42, + "template_id": 3, + "runner_id": 7 +} +``` + +Use webhooks to spawn **one-off** runners (`runner.one_off` in config) in autoscaling environments. + +## Operational checklist + +1. Enable `use_remote_runner` and set `runner_registration_token`. +2. Register runners; confirm **Active** and recent **Last seen**. +3. Set template or inventory `runner_tag` when you need dedicated capacity. +4. Mark exactly one default runner per scope if you rely on untagged templates. +5. For stuck waiting tasks, verify tag spelling and that at least one active runner carries the tag. + +Manual test case: [test/test-cases/TC-028-runner-tags.md](../test/test-cases/TC-028-runner-tags.md). + +## Related code + +- `services/tasks/RemoteJob.go` — runner selection +- `services/tasks/TaskPool.go` — when remote jobs are created +- `db/Runner.go` — tag filter modes +- `api/runners.go`, `pro/api/projects/runners.go` — HTTP handlers
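A webhook receiver can decode the payload shown in the Webhooks section with a small struct. This is a sketch under the assumption that the JSON example lists every field; `WebhookEvent` and `decodeEvent` are hypothetical names, not Semaphore types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WebhookEvent mirrors the JSON body posted when a task is assigned.
type WebhookEvent struct {
	Action     string `json:"action"`
	ProjectID  int    `json:"project_id"`
	TaskID     int    `json:"task_id"`
	TemplateID int    `json:"template_id"`
	RunnerID   int    `json:"runner_id"`
}

// decodeEvent parses a webhook body into a WebhookEvent.
func decodeEvent(body []byte) (WebhookEvent, error) {
	var ev WebhookEvent
	err := json.Unmarshal(body, &ev)
	return ev, err
}

func main() {
	body := []byte(`{"action":"start","project_id":1,"task_id":42,"template_id":3,"runner_id":7}`)
	ev, err := decodeEvent(body)
	if err != nil {
		panic(err)
	}
	// An autoscaler could start a one-off runner for ev.RunnerID here.
	fmt.Printf("%s task %d on runner %d\n", ev.Action, ev.TaskID, ev.RunnerID)
}
```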