
test-sustained-load: review threshold + admin-exempt auth credential #132

@brandonrc

Description


Problem

`tests/stress/test-sustained-load.sh` fails deterministically on the v1.1.9-rc.1 release-gate runs:

| Run | Total reqs | Successful | Errors (4xx/5xx) | Timeouts | Error rate (threshold 30%) |
|---|---|---|---|---|---|
| 25265469440 | 10,174 | 4,705 | 5,469 | 0 | 53% |
| 25265748133 | 7,371 | 4,552 | 2,819 | 0 | 38% |
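The error rates in the table can be re-derived from the raw counts. Below is a minimal Python sketch. It assumes the gate computes (errors + timeouts) / total and fails when that fraction is strictly greater than 30%. The script's exact formula is not confirmed here, and `RunResult` and `THRESHOLD` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    run_id: str
    total: int
    successful: int
    errors: int
    timeouts: int

    def error_rate(self) -> float:
        # Assumed formula: errors (4xx/5xx) plus timeouts as a share of all requests.
        return (self.errors + self.timeouts) / self.total

    def is_consistent(self) -> bool:
        # Every request should land in exactly one bucket.
        return self.successful + self.errors + self.timeouts == self.total


THRESHOLD = 0.30  # the 30% release-gate threshold from the table

runs = [
    RunResult("25265469440", 10_174, 4_705, 5_469, 0),
    RunResult("25265748133", 7_371, 4_552, 2_819, 0),
]

for r in runs:
    rate = r.error_rate()
    verdict = "FAIL" if rate > THRESHOLD else "PASS"  # assumed strict comparison
    print(f"{r.run_id}: {rate:.1%} {verdict} (consistent={r.is_consistent()})")
```

In both rows, successful + errors + timeouts adds up to the total. Both runs also exceed the threshold by a wide margin.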

Zero timeouts in both runs. The backend stays responsive: the recovery check passes within 10s after the load ends. Throughput varied between runs (169 → 122 req/s), which suggests that load on the ARC runner host fluctuated.

The "errors but no timeouts" pattern strongly indicates rate-limit hits (429) or DB-pool saturation (503), not capacity collapse.
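To distinguish rate limiting (429) from pool saturation (503), per-request status codes need to be bucketed instead of lumped into a single "errors" count. Below is a minimal sketch. It assumes the stress script can emit one status code per request and records a client-side timeout as `None`. `classify` and `summarize` are hypothetical helpers and are not part of the existing script.

```python
from collections import Counter
from typing import Iterable, Optional


def classify(status: Optional[int]) -> str:
    """Bucket one response; None stands for a client-side timeout (assumed encoding)."""
    if status is None:
        return "timeout"
    if status == 429:
        return "rate_limited"  # points at candidate cause (A)
    if status == 503:
        return "unavailable"  # e.g. DB-pool saturation
    if 200 <= status < 300:
        return "ok"
    if 400 <= status < 500:
        return "other_4xx"
    if 500 <= status < 600:
        return "other_5xx"
    return "unexpected"  # 1xx/3xx or anything outside the HTTP range


def summarize(statuses: Iterable[Optional[int]]) -> Counter:
    """Count how many responses fall into each bucket."""
    return Counter(classify(s) for s in statuses)
```

If `rate_limited` accounts for most failures, candidate (A) is the likelier cause. A large `unavailable` share points instead at backend or DB-pool capacity.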

Two candidate root causes

(A) The test may not be using the rate-limit-exempt admin account

The backend now supports `RATE_LIMIT_EXEMPT_USERNAMES` (artifact-keeper#995). The default smoke admin user (`admin`) is configured as exempt in `helm/values-test.yaml`. Verify:

  1. Is `test-sustained-load.sh` authenticating as `admin` (or another exempt account)?
  2. If so, is the `X-RateLimit-Exempt: true` header present on its responses?
  3. If it is not using an exempt account, the test is a fair measure of rate-limiter behavior under burst load, but a 30% threshold is then unrealistic.
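Check 2 comes down to inspecting the response headers of requests the test makes. Below is a minimal sketch of that check over headers that have already been captured. Only the header name `X-RateLimit-Exempt` and the value `true` come from this issue. The function names and the idea of sampling several responses are illustrative assumptions.

```python
from typing import Mapping, Sequence

EXEMPT_HEADER = "X-RateLimit-Exempt"


def is_rate_limit_exempt(headers: Mapping[str, str]) -> bool:
    """True if the response carries `X-RateLimit-Exempt: true` (case-insensitive)."""
    for name, value in headers.items():
        if name.lower() == EXEMPT_HEADER.lower():
            return value.strip().lower() == "true"
    return False


def exempt_fraction(samples: Sequence[Mapping[str, str]]) -> float:
    """Share of sampled responses marked exempt; 0.0 when nothing was sampled."""
    if not samples:
        return 0.0
    return sum(is_rate_limit_exempt(h) for h in samples) / len(samples)
```

With `urllib.request.urlopen`, a response's headers can be passed in as `dict(resp.headers.items())`. The endpoint and credential to use are left to the test script. An `exempt_fraction` well below 1.0 would mean the load is not running as an exempt account.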

(B) Threshold is too aggressive for shared runner capacity

Even with rate-limit exemption, sustained 169 req/s × 5 workers across backend + postgres + meilisearch + scanners on a 4-CPU/8-Gi namespace quota is tight. ARC runner host CPU contention with other pods on rocky K8s adds variance.

Action items

  1. Inspect `tests/stress/test-sustained-load.sh` to confirm what credential it uses
  2. Capture a sample failed-response body (likely 429 with rate-limit headers)
  3. Decide: fix the test auth, raise the threshold, or both
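For action item 2, it helps to keep a few failed responses verbatim instead of only counting them. Below is a minimal sketch. It assumes the client can pass each response's status, headers, and body to a recorder. `FailureSampler` is a hypothetical helper. Keeping headers that start with `x-ratelimit`, plus the standard `Retry-After` header, is a guess at which headers matter; only `X-RateLimit-Exempt` is confirmed by this issue.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FailureSampler:
    """Keeps the first `limit` non-2xx responses for inspection (hypothetical helper)."""
    limit: int = 3
    max_body: int = 512
    samples: List[dict] = field(default_factory=list)

    def record(self, status: int, headers: Dict[str, str], body: bytes) -> None:
        # Ignore successes, and stop once enough samples have been collected.
        if 200 <= status < 300 or len(self.samples) >= self.limit:
            return
        # Assumed filter: rate-limit style headers plus Retry-After.
        kept = {
            name: value
            for name, value in headers.items()
            if name.lower().startswith("x-ratelimit") or name.lower() == "retry-after"
        }
        self.samples.append({
            "status": status,
            "headers": kept,
            # Truncate and decode leniently so binary bodies cannot raise.
            "body": body[: self.max_body].decode("utf-8", errors="replace"),
        })
```

You could dump `samples` at the end of a run (for example with `json.dumps`) to confirm whether the failures are 429s before deciding between fixing the auth and raising the threshold.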

Workaround in place

The release-gate workflow already has `continue-on-error: true` on `stress-tests` so this does not block v1.1.9-rc.1 tagging. See artifact-keeper-test#... for the parallel security-tests workaround.

Related

  • artifact-keeper#886 (v1.1.9 release coordination)
  • artifact-keeper#991 (v1.1.x auth-path perf investigation)
  • artifact-keeper#995 (RATE_LIMIT_EXEMPT_USERNAMES support)
  • artifact-keeper#1001 (companion Grype DB-seeding issue)

Metadata


Assignees

No one assigned

Labels

v1.2.0 (Targeted for v1.2.0 release)

Type

No type

Projects

Status: Done

Relationships

None yet

Development

No branches or pull requests
