feat: add automated competitive leaderboard system by byte-the-bot · Pull Request #61 · BattlesnakeOfficial/arena

byte-the-bot · 2026-02-21T12:32:55Z

Summary

Adds a complete leaderboard system with Weng-Lin skill-based ratings for automated Battlesnake competitions
Snakes opt-in to leaderboards, get matched automatically (~100 games/day via cron), and are ranked by conservative skill estimate (mu - 3*sigma)
Includes web UI (rankings, join/leave, placement section) and JSON API endpoints

Changes

Schema (4 new tables):

leaderboards — defines competition types (seeded with Standard 11x11)
leaderboard_entries — per-snake ratings (mu, sigma, display_score, games, wins, losses)
leaderboard_games — links games to leaderboards
leaderboard_game_results — audit trail of rating changes per game

New files:

server/src/models/leaderboard.rs — all model structs and DB queries (runtime sqlx, not macro)
server/src/routes/leaderboard.rs — web routes (list, detail, join, leave)
server/src/routes/api/leaderboards.rs — JSON API (list, rankings, opt-in/out)
server/src/leaderboard_matchmaker.rs — skill-band matchmaking with jitter + unit tests
server/src/leaderboard_ratings.rs — Weng-Lin multi-team rating updates via skillratings crate

Modified files:

server/Cargo.toml — added skillratings dependency
server/src/jobs.rs — LeaderboardMatchmakerJob + LeaderboardRatingUpdateJob
server/src/cron.rs — matchmaker runs every 15 minutes
server/src/game_runner.rs — triggers rating update after leaderboard game completion
server/src/routes.rs — leaderboard web + API routes registered, nav link added
server/src/main.rs — new modules declared

Notes

Uses runtime sqlx::query_as (not compile-time macros) since the new tables aren't in the offline query cache yet. The .sqlx/ cache will need updating after the migration runs against a real database.
Only public snakes can join leaderboards. Manual/custom games do not affect leaderboard ratings.
Minimum 10 games before a snake appears in rankings (shown in "Placement" section until then).

Test plan

cargo fmt — clean
cargo clippy — no warnings
cargo test --all-targets — all tests pass (23 total, 4 new matchmaker tests)
Deploy to staging, run migration, verify leaderboard seeded
Join a snake, verify matchmaker creates games on cron interval
Verify ratings update after game completion
Check leaderboard page renders rankings correctly

🤖 Generated with Claude Code

Implements a complete leaderboard system with Weng-Lin skill ratings for automated Battlesnake competitions. Schema: 4 new tables (leaderboards, leaderboard_entries, leaderboard_games, leaderboard_game_results) with the initial Standard 11x11 leaderboard seeded. Features: - Snake opt-in/out with pause/resume support (public snakes only) - Weng-Lin rating algorithm via skillratings crate (mu/sigma/display_score) - Skill-band matchmaking with jitter for variety (~100 games/day) - Automatic rating updates after game completion - Rankings page with placement section for new snakes - JSON API for leaderboard data (list, rankings, opt-in/out) - Leaderboards link added to home page navigation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Fix race condition in rating updates: wrap all rating operations in a database transaction with FOR UPDATE row locking to prevent concurrent game completions from overwriting each other's ratings - Extract pure rating calculation into testable `calculate_rating_updates` function and add 7 unit tests covering winner/loser rating changes, sigma convergence, display score math, upset bonus, and ID preservation - Refactor `update_rating`, `create_game_result`, and new `get_entry_for_update` to accept generic sqlx executors - Add TODO for matchmaker atomicity (transaction around game creation + job enqueue) - Document RUNS_PER_DAY dependency on cron interval - Add TODO for switching to compile-time sqlx query macros Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Fix transactional atomicity in matchmaker: game creation, leaderboard linking, and enqueued_at are now wrapped in a single transaction to prevent zombie games. Job enqueue remains outside the transaction. - Refactor create_game_with_snakes to expose a _tx variant accepting &mut PgConnection for transaction composition. - Add set_game_enqueued_at_tx and update create_leaderboard_game to accept generic executor for use within transactions. - Add TODO for recently-matched deprioritization in select_match. - Add E2E tests for leaderboard UI pages (list, detail, join, pause, resume, placement) and API endpoints (list, rankings, opt-in, opt-out, validation). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

byte-the-bot

Review feedback assisted by the critical-code-reviewer skill.

Summary

A well-structured leaderboard feature — solid schema design, clean separation of rating computation from DB, good transaction discipline in the matchmaker, thorough e2e tests, and proper authorization checks throughout. The codebase shows thoughtfulness. That said, there's a duplicate rating application bug that will corrupt leaderboard data on job retries, a subtle integer math error that makes the GAMES_PER_DAY constant a lie, and several smaller issues worth addressing.

Critical Issues (Blocking)

1. No idempotency guard on rating updates — double-application corrupts scores

leaderboard_ratings.rs:update_ratings has no check for whether ratings were already applied for a given leaderboard_game_id. If LeaderboardRatingUpdateJob runs twice (job retry after timeout, duplicate enqueue, worker crash after commit but before job ACK), every snake's rating gets moved twice for a single game.

The leaderboard_game_results table lacks a UNIQUE constraint on (leaderboard_game_id, leaderboard_entry_id), so duplicate inserts succeed silently.

Fix: Either add UNIQUE (leaderboard_game_id, leaderboard_entry_id) to leaderboard_game_results and use ON CONFLICT DO NOTHING (letting the first run win), or check at the start of update_ratings whether results already exist for this game and bail early:

let existing = sqlx::query_scalar::<_, i64>(
    "SELECT COUNT(*) FROM leaderboard_game_results WHERE leaderboard_game_id = $1"
)
.bind(leaderboard_game_id)
.fetch_one(pool).await?;

if existing > 0 {
    tracing::info!("Ratings already applied for game {}, skipping", leaderboard_game_id);
    return Ok(());
}

The unique constraint approach is stronger (DB-level guarantee vs application-level check).

2. `RUNS_PER_DAY` / `GAMES_PER_DAY` integer division silently drops games

leaderboard_matchmaker.rs:28:

const RUNS_PER_DAY: i32 = 96;
// ...
let games_per_run = (GAMES_PER_DAY / RUNS_PER_DAY).max(1); // 100 / 96 = 1

Integer division: 100 / 96 = 1. You get 96 games/day, not 100. The GAMES_PER_DAY constant is a lie.

Worse: RUNS_PER_DAY is a manual constant with a comment saying "Must stay in sync with the cron interval in cron.rs." That's a synchronization bug waiting to happen — change the cron interval in one file and forget the other.

Fix: Either:

Derive RUNS_PER_DAY from the cron interval (make the interval a shared const)
Or switch to a "fractional accumulator" approach: track games created per day in the DB and check against the budget
At minimum, use f64 division and round: ((GAMES_PER_DAY as f64 / RUNS_PER_DAY as f64).ceil() as i32).max(1) → gives 2 games per run, overshooting to ~192/day. Neither integer floor nor ceil gives you exactly 100. The fractional accumulator approach is the real fix.

Required Changes

3. `unwrap_or_default()` swallows database errors in web route

routes/leaderboard.rs (show_leaderboard):

let user_snakes = if let Some(ref u) = user {
    battlesnake::get_battlesnakes_by_user_id(&state.db, u.user_id)
        .await
        .unwrap_or_default()  // DB error? Shrug, empty list.
} else {
    vec![]
};

Same pattern for user_entries. If the database is having issues, the user sees an empty "Your Snakes" section with no indication anything is wrong. They might think they have no snakes and re-create them, or think they left the leaderboard.

Fix: Propagate the error with .with_status(StatusCode::INTERNAL_SERVER_ERROR)?.

4. `update_rating` has duplicate SQL for win/loss

models/leaderboard.rs — update_rating has two nearly-identical SQL statements differing only in wins = wins + 1 vs losses = losses + 1. If you ever change the rating update logic, you need to remember to update both.

Fix: Single query:

UPDATE leaderboard_entries
SET mu = $2, sigma = $3, display_score = $4,
    games_played = games_played + 1,
    wins = wins + CASE WHEN $5 THEN 1 ELSE 0 END,
    losses = losses + CASE WHEN $5 THEN 0 ELSE 1 END
WHERE leaderboard_entry_id = $1

5. `set_game_enqueued_at_tx` duplicates `set_game_enqueued_at`

models/game.rs now has both set_game_enqueued_at (takes &PgPool) and set_game_enqueued_at_tx (takes &mut PgConnection). This pattern will proliferate — every function that needs to participate in a transaction gets a _tx twin.

The leaderboard.rs model already shows the better pattern: use E: sqlx::Executor<'e, Database = Postgres> as the parameter type. Both &PgPool and &mut PgConnection satisfy this trait.

Fix: Refactor set_game_enqueued_at to accept a generic executor, then delete the _tx variant. Same applies to create_game_with_snakes / create_game_with_snakes_tx long-term, though that's a larger refactor.

6. Missing index on `leaderboard_game_results.leaderboard_game_id`

The update_ratings function queries leaderboard_game_results by leaderboard_game_id (for the idempotency check you'll add). Even without that, any future query for "show me the rating history for this game" will need this index. FK columns without indexes are a common PostgreSQL oversight.

Fix: Add to the migration:

CREATE INDEX idx_leaderboard_game_results_game_id
    ON leaderboard_game_results (leaderboard_game_id);

7. Win/loss semantics are misleading

leaderboard_ratings.rs:77: is_win: *placement == 1

In a 4-player game, only 1st place is a "win" and 2nd/3rd/4th are all "losses." A snake that consistently finishes 2nd will show a 0% win rate despite being excellent. Users will see "50 games, 0 wins, 50 losses" for a snake that never finished last.

This is a design decision, but "losses" is the wrong word for 2nd place. Consider renaming to first_place_count / non_first_count, or tracking placements more granularly (count 1st/2nd/3rd/4th separately), or at least documenting this behavior in the UI so users aren't confused.

Suggestions

8. Unbounded ranking queries

get_ranked_entries and get_placement_entries have no LIMIT. Once a leaderboard has thousands of participants, these return everything. Add pagination or at least a reasonable LIMIT 100.

9. Extra query per non-leaderboard game

game_runner.rs now runs find_leaderboard_game_by_game_id after every game completion. For the vast majority of games (manual/custom), this is a wasted query returning None. It's not expensive (indexed UUID lookup), but it's worth noting that this adds latency to every game, not just leaderboard games.

10. Test determinism

select_match uses rand::thread_rng() with no seed, making the matchmaker tests non-deterministic. The property tests (correct size, unique snakes) are fine for this, but if you ever need to debug a matchmaking failure, you can't reproduce it. Consider accepting an &mut impl Rng parameter so tests can pass StdRng::seed_from_u64(42).

11. `TODO` in shipped code

leaderboard_matchmaker.rs:100-102:

TODO: Add recently-matched deprioritization to prevent the same group of snakes
from being matched repeatedly in low-volume periods.

This is shipping a known limitation. Fine for v1, but track it as a task so it doesn't get forgotten.

Verdict

Request Changes — The idempotency bug (#1) is a data corruption risk that needs to be fixed before merge. The integer division issue (#2) is less critical but produces incorrect behavior. The rest are quality improvements.

The overall architecture is clean — pure computation separated from DB, proper transaction discipline, good auth checks, solid e2e coverage. Once the blocking issues are resolved, this is a solid feature.

- Add idempotency guard: early return check + UNIQUE constraint on leaderboard_game_results(game_id, entry_id) + ON CONFLICT DO NOTHING - Fix integer division: use ceiling division with shared MATCHMAKER_INTERVAL_SECS constant derived from cron.rs (eliminates manual sync bug) - Propagate DB errors instead of swallowing with unwrap_or_default in web routes - Merge duplicate win/loss SQL into single UPDATE with CASE expression - Rename wins/losses to first_place_finishes/non_first_finishes for accurate semantics in 4-player games where 2nd place isn't a "loss" - Add missing index on leaderboard_game_results.leaderboard_game_id - Add LIMIT 100 to unbounded ranking/placement queries - Accept seeded RNG in select_match for test determinism Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

getByText(snakeName) matched both a <span> in the form section and a <td> in the entries table, causing Playwright strict mode violations. Use getByRole('cell', { name }) to target the table cell specifically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

getByText('10') was matching both the "Minimum: 10 games" paragraph and the <td>10</td> cell, causing a strict mode violation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

byte-the-bot and others added 3 commits February 21, 2026 07:32

byte-the-bot commented Feb 21, 2026

View reviewed changes

byte-the-bot and others added 3 commits February 21, 2026 17:13

Fix e2e test: use specific cell locator for games remaining

7834084

getByText('10') was matching both the "Minimum: 10 games" paragraph and the <td>10</td> cell, causing a strict mode violation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

feat: add automated competitive leaderboard system#61

feat: add automated competitive leaderboard system#61
byte-the-bot wants to merge 6 commits intomainfrom
implement/BS-009

byte-the-bot commented Feb 21, 2026

Uh oh!

byte-the-bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

byte-the-bot commented Feb 21, 2026

Summary

Changes

Notes

Test plan

Uh oh!

byte-the-bot left a comment

Choose a reason for hiding this comment

Summary

Critical Issues (Blocking)

1. No idempotency guard on rating updates — double-application corrupts scores

2. RUNS_PER_DAY / GAMES_PER_DAY integer division silently drops games

Required Changes

3. unwrap_or_default() swallows database errors in web route

4. update_rating has duplicate SQL for win/loss

5. set_game_enqueued_at_tx duplicates set_game_enqueued_at

6. Missing index on leaderboard_game_results.leaderboard_game_id

7. Win/loss semantics are misleading

Suggestions

8. Unbounded ranking queries

9. Extra query per non-leaderboard game

10. Test determinism

11. TODO in shipped code

Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

2. `RUNS_PER_DAY` / `GAMES_PER_DAY` integer division silently drops games

3. `unwrap_or_default()` swallows database errors in web route

4. `update_rating` has duplicate SQL for win/loss

5. `set_game_enqueued_at_tx` duplicates `set_game_enqueued_at`

6. Missing index on `leaderboard_game_results.leaderboard_game_id`

11. `TODO` in shipped code