ci: shard e2e tests with nextest --partition hash for ~3x speedup #832

@Evrard-Nil

Description

Follow-up to #831 (test workflow speedup).

Context

PR #831 split the test workflow into parallel jobs and got e2e down to ~6m. The e2e suite has 608 tests in a single binary (tests/e2e_all/). The next big win is sharding that binary across N parallel CI jobs using nextest's --partition hash:i/N, where i is the 1-based shard index.

Proposal

Add a matrix to the e2e-test job:

strategy:
  fail-fast: false
  matrix:
    shard: [1, 2, 3]
steps:
  - run: cargo nextest run --test e2e_all --partition hash:${{ matrix.shard }}/3

Safety

The shared-DB isolation (UUID-scoped orgs/workspaces + advisory-locked bootstrap in crates/api/tests/common/db_setup.rs) is already safe across parallel processes. --partition hash is deterministic (the same test always maps to the same shard) and assigns each test to exactly one shard, so no test runs in two shards and there is no cross-shard collision. Each shard gets its own PostgreSQL service container.
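To illustrate why hash partitioning gives disjoint shards that together cover the whole suite, here is a minimal sketch. It is an assumption-laden stand-in, not nextest's implementation: `shard_for` and `tests_in_shard` are hypothetical helpers, and `DefaultHasher` is used only because it ships with the standard library, so the actual shard a given test lands in under nextest will differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a test name to a 1-based shard index in `1..=total_shards`.
/// Stand-in hash only: this is NOT the hash function nextest uses.
fn shard_for(test_name: &str, total_shards: u64) -> u64 {
    assert!(total_shards > 0, "need at least one shard");
    let mut hasher = DefaultHasher::new();
    test_name.hash(&mut hasher);
    // The same name always produces the same hash, hence the same shard.
    hasher.finish() % total_shards + 1
}

/// Select the tests that one shard would run (hypothetical helper).
fn tests_in_shard<'a>(all: &[&'a str], shard: u64, total_shards: u64) -> Vec<&'a str> {
    all.iter()
        .copied()
        .filter(|name| shard_for(name, total_shards) == shard)
        .collect()
}
```

Calling `tests_in_shard(&names, s, 3)` for `s` in 1, 2, 3 yields three lists whose lengths sum to `names.len()`, because every name maps to exactly one index. The lists are generally not equal in size, which is why the slowest shard determines the wall-clock time.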

Consideration

This runs 3 e2e jobs in parallel on the same self-hosted runner (gpu11), which is shared with prod. If CPU contention with model-serving becomes visible, this should land together with a dedicated CI VM (see nearai/infra issue). Alternatively, start with 2-way sharding.

Expected impact

Up to ~3× e2e wall-clock reduction (5m58s → ~2-2.5m); the actual time is set by the slowest shard.
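The "slowest shard" bound can be made concrete with a small calculation. The sketch below is illustrative only: `wall_clock_secs` and `speedup` are hypothetical helpers, the model ignores per-job fixed overhead (checkout, build, container start-up), and any per-shard durations passed in are assumptions rather than measurements.

```rust
/// Wall-clock time of a sharded run: all shards run in parallel,
/// so the slowest one decides when the job set finishes.
fn wall_clock_secs(shard_secs: &[f64]) -> f64 {
    shard_secs.iter().copied().fold(0.0, f64::max)
}

/// Speedup relative to the unsharded baseline (hypothetical helper).
fn speedup(baseline_secs: f64, shard_secs: &[f64]) -> f64 {
    baseline_secs / wall_clock_secs(shard_secs)
}
```

With the 5m58s baseline (358 s), a perfectly balanced 3-way split would give three shards of about 119 s each and a 3× speedup. If the slowest shard instead took 150 s (2.5m), the speedup would be 358 / 150, roughly 2.4×, which matches the lower end of the estimate above.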

Labels: enhancement (New feature or request)
