
Add API benchmark suite #1966

Open
keilogic wants to merge 1 commit into SecureBananaLabs:main from keilogic:codex/api-benchmark-suite-30


@keilogic

/claim #30

Summary

  • Added a dependency-light Node API benchmark runner under benchmarks/.
  • Covers /health plus every mounted /api/* route: auth, users, jobs, proposals, payments, reviews, messages, notifications, uploads, search, and admin metrics.
  • Captures p50/p95/p99 latency, TTFB, sustained and peak RPS, status counts, and error rate.
  • Writes reproducible JSON and Markdown reports to benchmarks/results/.
  • Added reviewable thresholds, npm run benchmark, npm run benchmark:smoke, .env.benchmark.example, and a GitHub Actions smoke workflow.
  • Included a short demo video at demos/api-benchmark-suite-30.mp4.
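
For reviewers who want to see how the latency percentiles, status counts, and error rate could be derived from raw samples, here is a minimal sketch. It is not the runner's actual code: the `percentile` and `summarize` names, the `{ latencyMs, status }` sample shape, the nearest-rank method, and the rule that 5xx responses and failed connections count as errors are all assumptions made for illustration.

```javascript
// Nearest-rank percentile over an array of latency samples (milliseconds).
// Returns null for an empty array so callers can tell "no data" from 0 ms.
function percentile(samples, p) {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Summarize one endpoint's samples.
// Hypothetical sample shape: { latencyMs: number, status: number },
// where status 0 stands for a request that never got a response.
function summarize(results) {
  const latencies = results.map((r) => r.latencyMs);
  const statusCounts = {};
  for (const r of results) {
    statusCounts[r.status] = (statusCounts[r.status] || 0) + 1;
  }
  // Assumption: only 5xx responses and failed connections count as errors.
  const errors = results.filter((r) => r.status >= 500 || r.status === 0).length;
  return {
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
    p99: percentile(latencies, 99),
    statusCounts,
    errorRate: results.length === 0 ? 0 : errors / results.length,
  };
}
```

With only a handful of samples per endpoint, p95 and p99 from a nearest-rank calculation collapse to the slowest observed request, so tail numbers from short runs are best read as rough indicators.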

Local Benchmark Result

  • Mode: full local loopback, in-process API server
  • Endpoints: 21
  • Samples: 126
  • Aggregate RPS: 42
  • Error rate: 0%
  • Thresholds: passed
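
As a quick consistency check, the reported totals line up with the settings listed under Validation if two things are assumed: every endpoint reached its per-endpoint request cap, and aggregate RPS is total samples divided by the configured duration. The PR does not state how the runner computes either figure, so the sketch below only shows that the numbers are mutually consistent under those assumptions. The `expectedTotals` helper is hypothetical.

```javascript
// Hypothetical helper: expected totals if every endpoint hits its request cap
// and aggregate RPS is simply samples / duration.
function expectedTotals(endpoints, maxRequestsPerEndpoint, durationSeconds) {
  const samples = endpoints * maxRequestsPerEndpoint;
  return { samples, aggregateRps: samples / durationSeconds };
}

// Full local run: 21 endpoints, BENCHMARK_MAX_REQUESTS_PER_ENDPOINT=6,
// BENCHMARK_DURATION_SECONDS=3.
const full = expectedTotals(21, 6, 3);
console.log(full.samples, full.aggregateRps);
```

Under these assumptions the helper gives 126 samples and 42 requests per second, matching the figures above.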

Validation

  • npm run benchmark:smoke -> 21 endpoints, 42 samples, thresholds passed
  • BENCHMARK_DURATION_SECONDS=3 BENCHMARK_CONCURRENCY=4 BENCHMARK_WARMUP_REQUESTS=1 BENCHMARK_MAX_REQUESTS_PER_ENDPOINT=6 npm run benchmark -> 21 endpoints, 126 samples, thresholds passed
  • node --test apps/api/src/tests/health.test.js -> 1 passed
  • ffprobe -v error -select_streams v:0 -show_entries stream=codec_name,width,height,duration -of default=noprint_wrappers=1 demos/api-benchmark-suite-30.mp4 -> h264, 1280x720, 8s
  • git diff --cached --check -> passed before commit
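
The full-run command above is configured entirely through `BENCHMARK_*` environment variables. A minimal sketch of how a runner might parse and validate them follows. The helper names and every default value are hypothetical; only the four variable names come from the command in this PR.

```javascript
// Parse an integer environment variable, falling back to a default when unset.
// `min` is the smallest accepted value (0 for warmup, 1 for everything else).
function readInt(env, name, fallback, min = 1) {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < min) {
    throw new Error(`${name} must be an integer >= ${min}, got "${raw}"`);
  }
  return value;
}

// Build the benchmark configuration. All defaults here are illustrative
// assumptions, not the values used by the runner in this PR.
function loadBenchmarkConfig(env = process.env) {
  return {
    durationSeconds: readInt(env, "BENCHMARK_DURATION_SECONDS", 10),
    concurrency: readInt(env, "BENCHMARK_CONCURRENCY", 2),
    warmupRequests: readInt(env, "BENCHMARK_WARMUP_REQUESTS", 0, 0),
    maxRequestsPerEndpoint: readInt(env, "BENCHMARK_MAX_REQUESTS_PER_ENDPOINT", 100),
  };
}
```

Failing fast on a malformed value keeps a typo such as `BENCHMARK_CONCURRENCY=four` from silently falling back to a default and producing a report that looks valid but measured something else.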

Benchmark Environment

Hardware

  • CPU model & core count: AMD Ryzen 7 5800X, 8 cores / 16 logical processors
  • RAM (total & available during benchmark): 17.1 GB total; about 6.9 GB free before implementation checks
  • Storage type (SSD / NVMe / HDD): local SSD/NVMe-class Windows workstation storage
  • Network interface (Ethernet / WiFi / loopback): loopback for local benchmark runs
  • Machine type (local workstation / cloud VM / CI runner -- include instance type if cloud): local workstation
  • OS & version: Microsoft Windows 11 Pro 10.0.26200

Runtime

  • Node.js version (or relevant runtime): Node.js v24.15.0, npm 11.15.0
  • Any resource limits applied (Docker memory cap, cgroup limits, etc.): none intentionally applied
  • Other significant processes running during benchmark (yes / no -- if yes, describe): yes, normal Codex desktop and shell activity

If submitted by or with an AI agent

  • Agent or tool name (e.g. Claude Code, Devin, Copilot Workspace, AutoGPT): OpenAI Codex
  • Underlying model and version (e.g. claude-sonnet-4-5, gpt-4o -- if known): GPT-5 based Codex coding agent
  • Inference provider (e.g. Anthropic, OpenAI, Azure, self-hosted): OpenAI
  • Orchestration framework if any (e.g. LangChain, AutoGen, custom): none beyond Codex shell/GitHub tooling
  • Execution mode (fully autonomous / human-supervised / human-initiated per step): human-initiated, agent-executed implementation
  • Did the agent have shell/tool access during execution (yes / no): yes
  • Did the agent have internet access during execution (yes / no): yes, for GitHub issue/PR checks
  • Were benchmark commands run by the agent directly or handed off to the human to run: run directly by the agent locally
  • Any known agent constraints or sandboxing that may have affected execution: no production secrets or staging targets; validation used local loopback and synthetic benchmark data
