Releases · idemerge/llm-api-bench

15 May 06:39

idemerge

v2.15.3

c83e2d7

v2.15.3 — strict CI tag validation

Latest

Reverts v2.15.2's CI dedup: tag pushes now run the full quality job before the docker job.

Why

v2.15.2 skipped the quality job on tag pushes to avoid duplicate CI runs. That left a security gap: a tag pointing at a commit that never went through CI (e.g. one created directly with git tag v9.9.9 some-sha and pushed) would trigger a Docker push without passing type check, lint, or tests.

Trade-off

Each release now runs the quality job twice (~1 min each) instead of once, but Docker images are guaranteed to be built only from validated commits. Worth it.

No code changes. Same 906 tests.

🤖 Generated with Claude Code


15 May 06:29

idemerge

v2.15.2

fccabed

v2.15.2 — fix CI lint + dedup CI runs

Patch release: fix CI lint failures from v2.15.1 + stop double-triggering CI on releases.

Fixed

Backend lint: removed the unused DEFAULT_CONFIRM_DELAY_MS and unused test locals (mainStarted, warmupCount) flagged by ESLint
CI workflow: the quality job now skips on tag pushes. Releases previously fired CI twice (branch push + tag push for the same commit). The docker job still triggers on tags.

No code changes beyond lint cleanup. 906 tests, same as v2.15.0.

🤖 Generated with Claude Code


15 May 06:24

idemerge

v2.15.1

104ea6c

v2.15.1 — fix CI

Patch release to fix CI typecheck failure on v2.15.0.

supertest and @types/supertest were installed at the workspace root instead of in backend/package.json. Locally, TypeScript resolved them through the parent node_modules, but GitHub Actions installs each sub-package independently, so the type check failed with Cannot find module 'supertest'.

No code changes. Same 906 tests as v2.15.0.

🤖 Generated with Claude Code


15 May 06:21

idemerge

v2.15.0

cac8d8d

v2.15.0 — K-of-N alerts + 8 bug fixes + 906 tests

Reliability + correctness sweep on the monitor/alert pipeline, plus 8 frontend/backend bugs caught by a reverse-review of the test suite. Tests grew from ~148 → 906.

Changed

K-of-N alert voting (default 4-of-5) replaces the strict rule "N consecutive failures, and a single ok abandons the cycle." A flaky upstream that returns one healthy response between failures no longer suppresses real outage alerts for 30+ minutes. New alertConfirmFailThreshold config (range 1–alertConfirmCount).
Confirmation cycles exit early in both directions: the alert fires as soon as failCount reaches the threshold, and the cycle is abandoned as soon as the threshold becomes mathematically unreachable.
Health-check probe timeout: 180s → 90s for monitor probes and "Test Connection". Playground/benchmark calls keep their longer timeouts.
Confirmation probes within the same provider now run in parallel instead of serially.
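The K-of-N decision with early exit in both directions can be sketched as below. This is a minimal illustration, not the project's code: decide and runCycle are hypothetical names, k stands for alertConfirmFailThreshold, n for alertConfirmCount, and each probe result is reduced to a boolean.

```typescript
type Decision = "fire" | "abandon" | "continue";

// Hypothetical helper: evaluate a K-of-N confirmation cycle after each probe.
// k = failures needed to alert (alertConfirmFailThreshold), n = total probes (alertConfirmCount).
function decide(failCount: number, probesDone: number, k: number, n: number): Decision {
  if (failCount >= k) return "fire"; // threshold reached: alert now
  const remaining = n - probesDone;
  if (failCount + remaining < k) return "abandon"; // even all-fail remaining probes cannot reach k
  return "continue";
}

// Hypothetical driver: feed probe outcomes (true = healthy) until the cycle decides.
function runCycle(healthy: boolean[], k: number, n: number): { decision: Decision; probes: number } {
  let failCount = 0;
  for (let i = 0; i < n && i < healthy.length; i++) {
    if (!healthy[i]) failCount++;
    const decision = decide(failCount, i + 1, k, n);
    if (decision !== "continue") return { decision, probes: i + 1 };
  }
  return { decision: "continue", probes: Math.min(n, healthy.length) };
}
```

With the default 4-of-5, the sequence fail, ok, fail, fail, fail still fires on the fifth probe, whereas the old strict rule would have abandoned the cycle at the first healthy response. Any two healthy responses make four failures out of five unreachable, so the cycle is abandoned as soon as the second one arrives.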

Fixed

Race in the confirmation queue between delete(key) and the re-add after await confirmProbe could start duplicate parallel cycles. Added an inFlight token map plus a stale-token check.
Recovery alert no longer leaves a zombie down cycle in flight: recovery now explicitly cancels any pending or in-flight confirmation.
Webhook delivery failures now retry instead of silently consuming the alert: sendFeishuAlert throws on non-2xx responses, and the fire path re-queues the cycle instead of recording lastAlertAt (which would suppress retries for 6h).
useMonitor.saveConfig no longer "phantom-saves" on non-2xx responses; it previously reflected the saved state in the UI even when the server rejected the request.
usePlaygroundHistory.deleteEntry / clearAll no longer "phantom-delete"; they now check res.ok before mutating local state.
useWorkflow mutations (cancel/delete/duplicate) surface server errors into state.error instead of silently returning false.
useBenchmark rejects malformed responses: it validates the array shape before setBenchmarks(data) to prevent state pollution.
PUT /api/monitor/targets now accepts an empty array []: .min(1) was dropped so users can clear the monitor list.
startWorkflow correctly toggles isRunning: it sets the flag to true at the start of the try block, so the catch block's setIsRunning(false) actually has something to reset.
providerStore rejects duplicate model id/name within a provider; collisions previously corrupted monitor target tracking.
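The in-flight token idea behind the first two fixes can be sketched as follows. This is a hypothetical, simplified model (ConfirmationQueue, its methods, and the Probe type are assumptions, not the project's API): each cycle stores a unique token for its key, and after awaiting the probe it proceeds only if its token is still current, so a superseded or cancelled cycle (for example, one cancelled by a recovery) exits instead of running in parallel.

```typescript
// Hypothetical probe signature: resolves true when the target is healthy.
type Probe = (key: string) => Promise<boolean>;

type CycleResult = "alert" | "healthy" | "stale";

class ConfirmationQueue {
  private readonly inFlight = new Map<string, symbol>();
  private readonly probe: Probe;

  constructor(probe: Probe) {
    this.probe = probe;
  }

  // Start (or restart) a confirmation cycle for `key`; a newer start supersedes older ones.
  async start(key: string): Promise<CycleResult> {
    const token = Symbol(key);
    this.inFlight.set(key, token);
    const healthy = await this.probe(key);
    // While we awaited, another start() or cancel() may have replaced or removed our token.
    if (this.inFlight.get(key) !== token) return "stale";
    this.inFlight.delete(key);
    return healthy ? "healthy" : "alert";
  }

  // Called on recovery: any cycle still awaiting its probe will see a stale token.
  cancel(key: string): void {
    this.inFlight.delete(key);
  }
}
```

Using a fresh Symbol per cycle makes every token unique, so the identity check cannot be fooled by an older cycle for the same key.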

Added

Settings UI exposes the K threshold as a K / N selector that auto-adjusts options when N changes.
Comprehensive test coverage expansion: 906 total tests (712 backend + 194 frontend) covering alert state coordination, K-of-N decision math, multi-provider streaming token fields, route HTTP semantics via supertest, store CRUD with sqlite migrations, full executeWorkflow integration.
CLAUDE.md gains a "Writing tests" discipline section recording the meta-lesson: 8 of these fixes came from a reverse-review where tests had been silently rewritten to match buggy code. The Iron Rule: when a test fails, suspect the code first.
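The K / N selector behaviour can be sketched as a pair of pure helpers. This is an assumption about how the auto-adjust works, not the project's code: kOptions and adjustK are hypothetical names, and clamping K into 1..N when N changes is one plausible policy consistent with the 1–alertConfirmCount range.

```typescript
// Hypothetical helper: valid K choices for a given N are 1..N.
function kOptions(n: number): number[] {
  return Array.from({ length: n }, (_, i) => i + 1);
}

// Hypothetical policy: keep the current K if still valid, otherwise clamp it into 1..N.
function adjustK(k: number, n: number): number {
  return Math.min(Math.max(k, 1), n);
}
```

For example, lowering N from 5 to 3 while K is 4 leaves the options 1, 2, 3 and moves K to 3.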

🤖 Generated with Claude Code


13 May 15:57

idemerge

v2.14.0

01acbd6

v2.14.0 — Configurable alert confirmation & reminder fixes

Added

Configurable alert confirmation: the number of consecutive failures required before sending an alert (default 5, range 1–20) and the delay between checks (default 1 min, range 1–60 min), replacing the previous fixed single 1-minute re-check
Monitor settings UI exposes confirm count and confirm delay alongside language and reminder interval
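The ranges above translate into a small validator. This sketch assumes integer inputs and hand-rolled checks (the backend schemas may use a validation library instead); parseConfirmConfig and inRange are hypothetical names, while alertConfirmCount and alertConfirmDelayMinutes are the field names mentioned in the Fixed list below. Copying both fields explicitly into the result is what keeps them from being dropped between the request body and storage.

```typescript
interface ConfirmConfig {
  alertConfirmCount: number;        // consecutive failures before alerting, 1..20
  alertConfirmDelayMinutes: number; // minutes between confirmation checks, 1..60
}

const DEFAULTS: ConfirmConfig = { alertConfirmCount: 5, alertConfirmDelayMinutes: 1 };

// Hypothetical helper: check that a value is an integer within [min, max].
function inRange(value: unknown, min: number, max: number): value is number {
  return typeof value === "number" && Number.isInteger(value) && value >= min && value <= max;
}

// Hypothetical parser: missing fields fall back to defaults; out-of-range values are rejected.
function parseConfirmConfig(body: Record<string, unknown>): ConfirmConfig | { error: string } {
  const count = body.alertConfirmCount ?? DEFAULTS.alertConfirmCount;
  const delay = body.alertConfirmDelayMinutes ?? DEFAULTS.alertConfirmDelayMinutes;
  if (!inRange(count, 1, 20)) return { error: "alertConfirmCount must be an integer in 1..20" };
  if (!inRange(delay, 1, 60)) return { error: "alertConfirmDelayMinutes must be an integer in 1..60" };
  return { alertConfirmCount: count, alertConfirmDelayMinutes: delay };
}
```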

Fixed

Alert reminder interval ignored: every save of monitor settings wiped last_alert_at because setTargets/addTarget rebuilt the row without preserving that column, so reminders fired roughly every probe interval instead of every 6 hours
Status oscillation triggered spurious alerts: wasDown now treats down and very_slow as the same down state, so flipping between them no longer fires a new "down" alert
PUT /api/monitor/config silently dropped alertConfirmCount and alertConfirmDelayMinutes from the request body, so UI changes were not persisted
Alert confirmation probe now records a ping on error (previously failed probes left no DB trace) and re-queues on transient failures instead of silently dropping the confirmation
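The wasDown fix amounts to comparing health through a coarse up/down view instead of the raw status string. In this sketch, down and very_slow are the statuses named above; ok and slow are assumed for illustration, and isDown and transitionAlert are hypothetical names.

```typescript
// "down" and "very_slow" come from the release notes; "ok" and "slow" are assumed.
type Status = "ok" | "slow" | "very_slow" | "down";

// Both down-like statuses collapse into a single "down" state.
function isDown(status: Status): boolean {
  return status === "down" || status === "very_slow";
}

// Hypothetical helper: which alert (if any) a status change should trigger.
function transitionAlert(prev: Status, next: Status): "down" | "recovery" | null {
  const wasDown = isDown(prev);
  const nowDown = isDown(next);
  if (!wasDown && nowDown) return "down";
  if (wasDown && !nowDown) return "recovery";
  return null; // no change in the coarse state, e.g. down <-> very_slow
}
```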

Dev experience

Backend dev watcher switched from tsx watch to nodemon --legacy-watch (polling): tsx watch missed source edits made by atomic-replace writes (which change the file's inode), causing "the code didn't update" frustration
Frontend Vite watcher hardened with usePolling for parity

Full Changelog: v2.13.1...v2.14.0


11 May 13:28

idemerge

v2.13.1

6136eeb

v2.13.1

What's Changed

🔔 Alert Confirmation Check

Down/reminder alerts now require a second probe after 1 minute to reduce false positives from transient failures
Recovery alerts are still sent immediately without confirmation

🐳 Docker

docker-compose.yml now uses Docker Hub image (idemerge/llm-api-bench) instead of local build

🧹 Code Quality

Removed 8 unused variables flagged by code quality analysis

Full Changelog: v2.13.0...v2.13.1


11 May 11:43

idemerge

v2.13.0

0bd1cfa

v2.13.0

What's New

🌐 Full i18n Support

Chinese/English language switcher in sidebar and login page
All hardcoded UI strings replaced with translation keys
Language preference persisted in localStorage

🔔 Feishu Webhook Alerts

Per-target alert enable/disable toggle
Status change detection: new failure, repeated failure (configurable interval), recovery
DB-persisted alert state — survives server restarts
Optional webhook signature verification
Configurable notification language (en/zh, default en)
Alert bell indicator on monitor model cards (color-coded by health status)
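The three kinds of status change can be classified with a pure function. This is a hypothetical sketch: classifyAlert, its parameters, and millisecond timestamps are assumptions; lastAlertAt mirrors the persisted last-alert time, and reminderIntervalMs stands in for the configurable repeated-failure interval.

```typescript
type AlertKind = "new_failure" | "repeated_failure" | "recovery";

// Hypothetical helper: decide which alert, if any, to send for this probe result.
function classifyAlert(
  wasDown: boolean,
  isDown: boolean,
  lastAlertAt: number | null, // epoch ms of the last alert sent for this target
  now: number,                // epoch ms
  reminderIntervalMs: number,
): AlertKind | null {
  if (!wasDown && isDown) return "new_failure";
  if (wasDown && !isDown) return "recovery";
  if (wasDown && isDown) {
    // Still down: remind only once the configured interval has elapsed.
    const due = lastAlertAt === null || now - lastAlertAt >= reminderIntervalMs;
    return due ? "repeated_failure" : null;
  }
  return null; // healthy before and after
}
```

Because the reminder check depends on lastAlertAt, anything that resets that value makes reminders fire far more often than the configured interval.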

📝 Monitoring Settings

New alert configuration section: webhook URL, signing secret, language, reminder interval

Full Changelog: v2.12.0...v2.13.0


28 Apr 08:34

idemerge

v2.12.1

c2b443c

v2.12.1

Fixed

Touch targets undersized: removed size="small" from Settings buttons, increased model tag padding
Heading scale too flat: increased H1 from 20px to 24px
Capability tags (T/S/V) nearly illegible: increased font from 8px to 10px with larger padding
Mobile parameter labels overflow: responsive grid for Core Parameters section
Playground history panel overlaps form on mobile: full-screen overlay on mobile
Grammar: "1 models" now correctly pluralized across Monitor and History pages
antd deprecation: replaced Alert message prop with title (5 instances)
History page duplicate heading: removed redundant H2 title

Full Changelog: v2.12.0...v2.12.1


28 Apr 06:08

idemerge

v2.12.0

62e36d8

v2.12.0

What's Changed

Added

Naming validation for Provider name, Model ID, and DisplayName (backend schemas + frontend real-time hints)
- Provider name: [a-zA-Z0-9][a-zA-Z0-9_-]{0,63} — no spaces
- Model ID: [a-zA-Z0-9][a-zA-Z0-9._/-]{0,63} — LiteLLM vendor/model compatible
- DisplayName: [a-zA-Z0-9][a-zA-Z0-9 ._-]{0,63} — human-readable
Frontend validation unit tests (16 cases) + backend boundary tests (4 cases)
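The three naming rules can be expressed as anchored regular expressions. The character classes are copied from the list above; the ^ and $ anchors (requiring the whole value to match) and the isValidName helper are assumptions of this sketch.

```typescript
// Patterns from the v2.12.0 notes, anchored so the entire value must match.
const NAME_RULES = {
  providerName: /^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$/,  // no spaces
  modelId: /^[a-zA-Z0-9][a-zA-Z0-9._\/-]{0,63}$/,    // allows vendor/model
  displayName: /^[a-zA-Z0-9][a-zA-Z0-9 ._-]{0,63}$/, // allows spaces
} as const;

type NameKind = keyof typeof NAME_RULES;

// Hypothetical helper that a backend schema and a frontend hint could share.
function isValidName(kind: NameKind, value: string): boolean {
  return NAME_RULES[kind].test(value);
}
```

The leading class forces an alphanumeric first character, and {0,63} caps every name at 64 characters in total.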

Changed

Project renamed from LLM API Radar to LLM API Bench (repo, UI, docs, Docker, CI)
Playground history sidebar shows ProviderName/DisplayName instead of raw model ID
Backend stores model displayName in playground history
Adaptive QuickButtons: auto-shrink when >7 options to prevent line wrapping
Max concurrency raised to 5000, max iterations to 10M

Fixed

Getting Started hint no longer flashes on page refresh
Playground provider/model selectors no longer flash raw IDs before names load
Playground history correctly resolves model displayName from provider data
Quick Start cd path fixed in both READMEs

Tests

175 tests total (frontend 76 + backend 99), all passing

Full Changelog: v2.11.2...v2.12.0


28 Apr 03:53

idemerge

v2.11.3

0b83465

v2.11.3

What's Changed

Changed

Raised max concurrency from 1000 to 5000 (frontend, backend validation, route caps)
Raised max iterations from 1M to 10M (frontend, backend validation, route caps)
Added quick-select buttons for 2K/5K concurrency and 5M/10M iterations
Fixed Quick Start instructions in both READMEs: cd llm-benchmark → cd llm-api-radar
Updated README (EN/CN) with new concurrency/iterations limits

Tests

Added boundary validation tests for concurrency (5000/5001) and iterations (10M/10M+1)
All 155 tests passing (frontend 60, backend 95)
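The raised caps and their boundary tests can be mirrored in a small validator. MAX_CONCURRENCY and MAX_ITERATIONS take the values from these notes; the lower bound of 1 and the name validateRunParams are assumptions of this sketch.

```typescript
const MAX_CONCURRENCY = 5000;
const MAX_ITERATIONS = 10_000_000;

// Hypothetical validator: returns a list of problems (empty when the request is valid).
function validateRunParams(concurrency: number, iterations: number): string[] {
  const errors: string[] = [];
  if (!Number.isInteger(concurrency) || concurrency < 1 || concurrency > MAX_CONCURRENCY) {
    errors.push(`concurrency must be an integer in 1..${MAX_CONCURRENCY}`);
  }
  if (!Number.isInteger(iterations) || iterations < 1 || iterations > MAX_ITERATIONS) {
    errors.push(`iterations must be an integer in 1..${MAX_ITERATIONS}`);
  }
  return errors;
}
```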

Full Changelog: v2.11.2...v2.11.3

