Releases: idemerge/llm-api-bench
v2.15.3 — strict CI tag validation
Revert v2.15.2's CI dedup. Tag pushes now run the full quality job before docker.
Why
v2.15.2 skipped quality on tag pushes to avoid duplicate CI runs. That left a security gap: a tag pointing at an unvalidated commit (e.g. git tag v9.9.9 some-sha directly) would trigger a Docker push without going through type check / lint / tests.
Trade-off
Each release runs quality twice (~1m each) instead of once, but guarantees Docker images are only built from validated commits. Worth it.
No code changes. Same 906 tests.
🤖 Generated with Claude Code
v2.15.2 — fix CI lint + dedup CI runs
Patch release: fix CI lint failures from v2.15.1 + stop double-triggering CI on releases.
Fixed
- Backend lint: unused DEFAULT_CONFIRM_DELAY_MS and test locals (mainStarted, warmupCount) flagged by ESLint
- CI workflow: quality job now skips on tag pushes. Releases previously fired CI twice (branch push + tag push for the same commit). The docker job still triggers on tags
No code changes beyond lint cleanup. 906 tests, same as v2.15.0.
v2.15.1 — fix CI
Patch release to fix CI typecheck failure on v2.15.0.
supertest and @types/supertest were installed at the workspace root instead of backend/package.json. TypeScript resolved them locally via the parent node_modules, but GitHub Actions installs each sub-package independently, so the type check failed with Cannot find module 'supertest'.
No code changes. Same 906 tests as v2.15.0.
v2.15.0 — K-of-N alerts + 8 bug fixes + 906 tests
Reliability + correctness sweep on the monitor/alert pipeline, plus 8 frontend/backend bugs caught by a reverse-review of the test suite. Tests grew from ~148 to 906.
Changed
- K-of-N alert voting (default 4-of-5) replaces strict "N consecutive failures or one ok abandons cycle." A flaky upstream that returns one healthy response between failures no longer suppresses real outage alerts for 30+ minutes. New alertConfirmFailThreshold config (range 1–alertConfirmCount).
- Confirmation cycles exit early in both directions: the alert fires as soon as failCount reaches the threshold, and the cycle is abandoned as soon as the threshold becomes mathematically unreachable.
- Health-check probe timeout: 180s → 90s for monitor probes and "Test Connection". Playground/benchmark calls keep their longer timeouts.
- Confirmation probes within the same provider now run in parallel instead of serially.
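
The K-of-N rule with early exit in both directions can be sketched roughly as below. This is an illustration under assumptions, not the project's code: decideCycle, CycleDecision, and the parameter names are hypothetical; only the 4-of-5 default and the config names alertConfirmCount and alertConfirmFailThreshold come from the notes above.

```typescript
// Hypothetical sketch of a K-of-N confirmation decision (all names are illustrative).
type CycleDecision = "fire" | "abandon" | "continue";

/**
 * Decide what to do after some confirmation probes have completed.
 * @param failCount failed probes so far in this cycle
 * @param okCount   healthy probes so far in this cycle
 * @param n         total probes per cycle (alertConfirmCount in the notes)
 * @param k         failures required to alert (alertConfirmFailThreshold in the notes)
 */
function decideCycle(failCount: number, okCount: number, n = 5, k = 4): CycleDecision {
  if (k < 1 || k > n) throw new RangeError("k must be in the range 1..n");
  // Early exit 1: enough failures already, the remaining probes cannot change the outcome.
  if (failCount >= k) return "fire";
  // Early exit 2: even if every remaining probe fails, k can no longer be reached.
  const remaining = n - failCount - okCount;
  if (failCount + remaining < k) return "abandon";
  // Otherwise the outcome is still open and another probe is needed.
  return "continue";
}
```

With the 4-of-5 default, a single healthy response still leaves four possible failures, so the cycle continues; a second healthy response makes four failures unreachable and the cycle is abandoned.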
Fixed
- Race in confirmation queue between delete(key) and await confirmProbe's re-add caused duplicate parallel cycles. Added inFlight token map + stale-token check.
- Recovery alert no longer leaves a zombie down cycle in flight: explicitly cancels pending/in-flight confirmation on recovery.
- Webhook delivery failures now retry instead of silently consuming the alert: sendFeishuAlert throws on non-2xx; fire path re-queues the cycle instead of recording lastAlertAt (which would suppress retries for 6h).
- useMonitor.saveConfig no longer "phantom-saves" on non-2xx: was reflecting saved state into UI even when server rejected.
- usePlaygroundHistory.deleteEntry/clearAll no longer "phantom-deletes": checks res.ok before mutating local state.
- useWorkflow mutations (cancel/delete/duplicate) surface server errors into state.error instead of silently returning false.
- useBenchmark rejects malformed responses: validates array shape before setBenchmarks(data) to prevent state pollution.
- PUT /api/monitor/targets now accepts []: dropped .min(1) so users can clear the monitor list.
- startWorkflow correctly toggles isRunning: sets true at try-block start so the catch's setIsRunning(false) actually has work to do.
- providerStore rejects duplicate model id/name within a provider: collisions previously corrupted monitor target tracking.
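
The general idea behind an in-flight token map with a stale-token check might look like the sketch below. Everything except the name inFlight is an assumption made for illustration (beginCycle, isCurrent, cancelCycle, confirmStep, and the probe callback are hypothetical); the actual implementation may differ.

```typescript
// Hypothetical sketch: one token per key marks the current cycle; older cycles become stale.
const inFlight = new Map<string, symbol>();

/** Start (or restart) a cycle for `key` and return the token that owns it. */
function beginCycle(key: string): symbol {
  const token = Symbol(key);
  inFlight.set(key, token); // replaces any previous token, which makes it stale
  return token;
}

/** True only if `token` still owns the cycle for `key`. */
function isCurrent(key: string, token: symbol): boolean {
  return inFlight.get(key) === token;
}

/** Cancel whatever cycle is in flight for `key`, for example on recovery. */
function cancelCycle(key: string): void {
  inFlight.delete(key);
}

/**
 * Run one confirmation step. `probe` stands in for the real health check and
 * resolves to true when the target is still failing.
 * Returns "stale" if the cycle was cancelled or superseded during the await.
 */
async function confirmStep(
  key: string,
  token: symbol,
  probe: () => Promise<boolean>,
): Promise<"stale" | "failed" | "ok"> {
  const failed = await probe();
  // Stale-token check: the map may have changed while the probe was running.
  if (!isCurrent(key, token)) return "stale";
  return failed ? "failed" : "ok";
}
```

Because every re-queue checks its own token after each await, a cycle that was deleted or replaced in the meantime drops out instead of re-adding itself alongside the newer one.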
Added
- Settings UI exposes the K threshold as a K / N selector that auto-adjusts options when N changes.
- Comprehensive test coverage expansion: 906 total tests (712 backend + 194 frontend) covering alert state coordination, K-of-N decision math, multi-provider streaming token fields, route HTTP semantics via supertest, store CRUD with sqlite migrations, full executeWorkflow integration.
- CLAUDE.md gains a "Writing tests" discipline section recording the meta-lesson: 8 of these fixes came from a reverse-review where tests had been silently rewritten to match buggy code. The Iron Rule: when a test fails, suspect the code first.
v2.14.0 — Configurable alert confirmation & reminder fixes
Added
- Configurable alert confirmation — number of consecutive failures (default 5, range 1-20) and delay between checks (default 1 min, range 1-60) before sending alerts, replacing the previous fixed single 1-minute re-check
- Monitor settings UI exposes confirm count and confirm delay alongside language and reminder interval
Fixed
- Alert reminder interval ignored: every save of monitor settings was wiping last_alert_at because setTargets/addTarget rebuilt the row without preserving the column, so reminders fired roughly every probe interval instead of every 6 hours
- Status oscillation triggered spurious alerts: wasDown now treats down and very_slow as the same down state, so flipping between them doesn't fire a new "down" alert
- PUT /api/monitor/config silently dropped alertConfirmCount and alertConfirmDelayMinutes from the request body, so UI changes were not persisted
- Alert confirmation probe now records a ping on error (previously failed probes left no DB trace) and re-queues on transient failures instead of silently dropping the confirmation
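
The status-oscillation fix amounts to grouping statuses before comparing them. A minimal sketch, assuming a status set that includes "ok" and "slow" (only down and very_slow are named in these notes) and using hypothetical helper names:

```typescript
// Hypothetical status model; "ok" and "slow" are assumed, "very_slow" and "down" are from the notes.
type Status = "ok" | "slow" | "very_slow" | "down";

/** Treat "down" and "very_slow" as one and the same down state. */
function isDownState(status: Status): boolean {
  return status === "down" || status === "very_slow";
}

/** A new "down" alert is due only when crossing from a non-down state into a down state. */
function isNewDownTransition(previous: Status, current: Status): boolean {
  return !isDownState(previous) && isDownState(current);
}
```

Under this grouping, a target that flips from down to very_slow and back stays in the down state throughout, so no new transition is detected.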
Dev experience
- Backend dev watcher swapped from tsx watch to nodemon --legacy-watch polling: tsx watch was missing source edits made by atomic-replace writes (inode changes), causing "the code didn't update" frustration
- Frontend Vite watcher hardened with usePolling for parity
Full Changelog: v2.13.1...v2.14.0
v2.13.1
What's Changed
🔔 Alert Confirmation Check
- Down/reminder alerts now require a second probe after 1 minute to reduce false positives from transient failures
- Recovery alerts are still sent immediately without confirmation
🐳 Docker
- docker-compose.yml now uses Docker Hub image (idemerge/llm-api-bench) instead of local build
🧹 Code Quality
- Removed 8 unused variables flagged by code quality analysis
Full Changelog: v2.13.0...v2.13.1
v2.13.0
What's New
🌐 Full i18n Support
- Chinese/English language switcher in sidebar and login page
- All hardcoded UI strings replaced with translation keys
- Language preference persisted in localStorage
🔔 Feishu Webhook Alerts
- Per-target alert enable/disable toggle
- Status change detection: new failure, repeated failure (configurable interval), recovery
- DB-persisted alert state — survives server restarts
- Optional webhook signature verification
- Configurable notification language (en/zh, default en)
- Alert bell indicator on monitor model cards (color-coded by health status)
📝 Monitoring Settings
- New alert configuration section: webhook URL, signing secret, language, reminder interval
Full Changelog: v2.12.0...v2.13.0
v2.12.1
Fixed
- Touch targets undersized: removed size="small" from Settings buttons, increased model tag padding
- Heading scale too flat: increased H1 from 20px to 24px
- Capability tags (T/S/V) nearly illegible: increased font from 8px to 10px with larger padding
- Mobile parameter labels overflow: responsive grid for Core Parameters section
- Playground history panel overlaps form on mobile: full-screen overlay on mobile
- Grammar: "1 models" now correctly pluralized across Monitor and History pages
- antd deprecation: replaced Alert message prop with title (5 instances)
- History page duplicate heading: removed redundant H2 title
Full Changelog: v2.12.0...v2.12.1
v2.12.0
What's Changed
Added
- Naming validation for Provider name, Model ID, and DisplayName (backend schemas + frontend real-time hints)
  - Provider name: [a-zA-Z0-9][a-zA-Z0-9_-]{0,63} (no spaces)
  - Model ID: [a-zA-Z0-9][a-zA-Z0-9._/-]{0,63} (LiteLLM vendor/model compatible)
  - DisplayName: [a-zA-Z0-9][a-zA-Z0-9 ._-]{0,63} (human-readable)
- Frontend validation unit tests (16 cases) + backend boundary tests (4 cases)
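
As an illustration of how the three naming patterns behave, here is a minimal sketch. The patterns are copied from the list above; the ^...$ anchors, the constant names, and the isValidName helper are assumptions and not taken from the project's schemas.

```typescript
// Patterns from the release notes; whole-string anchoring (^...$) is an assumption.
const PROVIDER_NAME = /^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$/;
const MODEL_ID = /^[a-zA-Z0-9][a-zA-Z0-9._\/-]{0,63}$/;
const DISPLAY_NAME = /^[a-zA-Z0-9][a-zA-Z0-9 ._-]{0,63}$/;

type NameKind = "provider" | "modelId" | "displayName";

/** Hypothetical helper: check a value against the pattern for its kind. */
function isValidName(kind: NameKind, value: string): boolean {
  switch (kind) {
    case "provider":
      return PROVIDER_NAME.test(value);
    case "modelId":
      return MODEL_ID.test(value);
    case "displayName":
      return DISPLAY_NAME.test(value);
  }
}
```

Each pattern requires an alphanumeric first character followed by up to 63 more, so the maximum length is 64 characters. Only the Model ID pattern admits a slash, and only the DisplayName pattern admits spaces.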
Changed
- Project renamed from LLM API Radar to LLM API Bench (repo, UI, docs, Docker, CI)
- Playground history sidebar shows ProviderName/DisplayName instead of raw model ID
- Backend stores model displayName in playground history
- Adaptive QuickButtons: auto-shrink when >7 options to prevent line wrapping
- Max concurrency raised to 5000, max iterations to 10M
Fixed
- Getting Started hint no longer flashes on page refresh
- Playground provider/model selectors no longer flash raw IDs before names load
- Playground history correctly resolves model displayName from provider data
- Quick Start cd path fixed in both READMEs
Tests
- 175 tests total (frontend 76 + backend 99), all passing
Full Changelog: v2.11.2...v2.12.0
v2.11.3
What's Changed
Changed
- Raised max concurrency from 1000 to 5000 (frontend, backend validation, route caps)
- Raised max iterations from 1M to 10M (frontend, backend validation, route caps)
- Added quick-select buttons for 2K/5K concurrency and 5M/10M iterations
- Fixed Quick Start instructions in both READMEs: cd llm-benchmark → cd llm-api-radar
- Updated README (EN/CN) with new concurrency/iterations limits
Tests
- Added boundary validation tests for concurrency (5000/5001) and iterations (10M/10M+1)
- All 155 tests passing (frontend 60, backend 95)
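
The boundary cases mentioned above (5000/5001 and 10M/10M+1) correspond to an inclusive upper limit. A minimal sketch of such a check, where the constant and function names and the lower bound of 1 are assumptions rather than the project's actual validation code:

```typescript
// Limits from the release notes; names and the lower bound of 1 are assumptions.
const MAX_CONCURRENCY = 5000;
const MAX_ITERATIONS = 10000000;

/** True for whole numbers in the inclusive range 1..max. */
function isWithinLimit(value: number, max: number): boolean {
  return Number.isInteger(value) && value >= 1 && value <= max;
}

function isValidConcurrency(value: number): boolean {
  return isWithinLimit(value, MAX_CONCURRENCY);
}

function isValidIterations(value: number): boolean {
  return isWithinLimit(value, MAX_ITERATIONS);
}
```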
Full Changelog: v2.11.2...v2.11.3