feat: add automated competitive leaderboard system#61
feat: add automated competitive leaderboard system#61byte-the-bot wants to merge 6 commits intomainfrom
Conversation
Implements a complete leaderboard system with Weng-Lin skill ratings for automated Battlesnake competitions. Schema: 4 new tables (leaderboards, leaderboard_entries, leaderboard_games, leaderboard_game_results) with the initial Standard 11x11 leaderboard seeded. Features: - Snake opt-in/out with pause/resume support (public snakes only) - Weng-Lin rating algorithm via skillratings crate (mu/sigma/display_score) - Skill-band matchmaking with jitter for variety (~100 games/day) - Automatic rating updates after game completion - Rankings page with placement section for new snakes - JSON API for leaderboard data (list, rankings, opt-in/out) - Leaderboards link added to home page navigation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix race condition in rating updates: wrap all rating operations in a database transaction with FOR UPDATE row locking to prevent concurrent game completions from overwriting each other's ratings - Extract pure rating calculation into testable `calculate_rating_updates` function and add 7 unit tests covering winner/loser rating changes, sigma convergence, display score math, upset bonus, and ID preservation - Refactor `update_rating`, `create_game_result`, and new `get_entry_for_update` to accept generic sqlx executors - Add TODO for matchmaker atomicity (transaction around game creation + job enqueue) - Document RUNS_PER_DAY dependency on cron interval - Add TODO for switching to compile-time sqlx query macros Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix transactional atomicity in matchmaker: game creation, leaderboard linking, and enqueued_at are now wrapped in a single transaction to prevent zombie games. Job enqueue remains outside the transaction. - Refactor create_game_with_snakes to expose a _tx variant accepting &mut PgConnection for transaction composition. - Add set_game_enqueued_at_tx and update create_leaderboard_game to accept generic executor for use within transactions. - Add TODO for recently-matched deprioritization in select_match. - Add E2E tests for leaderboard UI pages (list, detail, join, pause, resume, placement) and API endpoints (list, rankings, opt-in, opt-out, validation). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
byte-the-bot
left a comment
There was a problem hiding this comment.
Review feedback assisted by the critical-code-reviewer skill.
Summary
A well-structured leaderboard feature — solid schema design, clean separation of rating computation from DB, good transaction discipline in the matchmaker, thorough e2e tests, and proper authorization checks throughout. The codebase shows thoughtfulness. That said, there's a duplicate rating application bug that will corrupt leaderboard data on job retries, a subtle integer math error that makes the GAMES_PER_DAY constant a lie, and several smaller issues worth addressing.
Critical Issues (Blocking)
1. No idempotency guard on rating updates — double-application corrupts scores
leaderboard_ratings.rs:update_ratings has no check for whether ratings were already applied for a given leaderboard_game_id. If LeaderboardRatingUpdateJob runs twice (job retry after timeout, duplicate enqueue, worker crash after commit but before job ACK), every snake's rating gets moved twice for a single game.
The leaderboard_game_results table lacks a UNIQUE constraint on (leaderboard_game_id, leaderboard_entry_id), so duplicate inserts succeed silently.
Fix: Either add UNIQUE (leaderboard_game_id, leaderboard_entry_id) to leaderboard_game_results and use ON CONFLICT DO NOTHING (letting the first run win), or check at the start of update_ratings whether results already exist for this game and bail early:
let existing = sqlx::query_scalar::<_, i64>(
"SELECT COUNT(*) FROM leaderboard_game_results WHERE leaderboard_game_id = $1"
)
.bind(leaderboard_game_id)
.fetch_one(pool).await?;
if existing > 0 {
tracing::info!("Ratings already applied for game {}, skipping", leaderboard_game_id);
return Ok(());
}The unique constraint approach is stronger (DB-level guarantee vs application-level check).
2. RUNS_PER_DAY / GAMES_PER_DAY integer division silently drops games
leaderboard_matchmaker.rs:28:
const RUNS_PER_DAY: i32 = 96;
// ...
let games_per_run = (GAMES_PER_DAY / RUNS_PER_DAY).max(1); // 100 / 96 = 1Integer division: 100 / 96 = 1. You get 96 games/day, not 100. The GAMES_PER_DAY constant is a lie.
Worse: RUNS_PER_DAY is a manual constant with a comment saying "Must stay in sync with the cron interval in cron.rs." That's a synchronization bug waiting to happen — change the cron interval in one file and forget the other.
Fix: Either:
- Derive
RUNS_PER_DAYfrom the cron interval (make the interval a shared const) - Or switch to a "fractional accumulator" approach: track games created per day in the DB and check against the budget
- At minimum, use
f64division and round:((GAMES_PER_DAY as f64 / RUNS_PER_DAY as f64).ceil() as i32).max(1)→ gives 2 games per run, overshooting to ~192/day. Neither integer floor nor ceil gives you exactly 100. The fractional accumulator approach is the real fix.
Required Changes
3. unwrap_or_default() swallows database errors in web route
routes/leaderboard.rs (show_leaderboard):
let user_snakes = if let Some(ref u) = user {
battlesnake::get_battlesnakes_by_user_id(&state.db, u.user_id)
.await
.unwrap_or_default() // DB error? Shrug, empty list.
} else {
vec![]
};Same pattern for user_entries. If the database is having issues, the user sees an empty "Your Snakes" section with no indication anything is wrong. They might think they have no snakes and re-create them, or think they left the leaderboard.
Fix: Propagate the error with .with_status(StatusCode::INTERNAL_SERVER_ERROR)?.
4. update_rating has duplicate SQL for win/loss
models/leaderboard.rs — update_rating has two nearly-identical SQL statements differing only in wins = wins + 1 vs losses = losses + 1. If you ever change the rating update logic, you need to remember to update both.
Fix: Single query:
UPDATE leaderboard_entries
SET mu = $2, sigma = $3, display_score = $4,
games_played = games_played + 1,
wins = wins + CASE WHEN $5 THEN 1 ELSE 0 END,
losses = losses + CASE WHEN $5 THEN 0 ELSE 1 END
WHERE leaderboard_entry_id = $15. set_game_enqueued_at_tx duplicates set_game_enqueued_at
models/game.rs now has both set_game_enqueued_at (takes &PgPool) and set_game_enqueued_at_tx (takes &mut PgConnection). This pattern will proliferate — every function that needs to participate in a transaction gets a _tx twin.
The leaderboard.rs model already shows the better pattern: use E: sqlx::Executor<'e, Database = Postgres> as the parameter type. Both &PgPool and &mut PgConnection satisfy this trait.
Fix: Refactor set_game_enqueued_at to accept a generic executor, then delete the _tx variant. Same applies to create_game_with_snakes / create_game_with_snakes_tx long-term, though that's a larger refactor.
6. Missing index on leaderboard_game_results.leaderboard_game_id
The update_ratings function queries leaderboard_game_results by leaderboard_game_id (for the idempotency check you'll add). Even without that, any future query for "show me the rating history for this game" will need this index. FK columns without indexes are a common PostgreSQL oversight.
Fix: Add to the migration:
CREATE INDEX idx_leaderboard_game_results_game_id
ON leaderboard_game_results (leaderboard_game_id);7. Win/loss semantics are misleading
leaderboard_ratings.rs:77: is_win: *placement == 1
In a 4-player game, only 1st place is a "win" and 2nd/3rd/4th are all "losses." A snake that consistently finishes 2nd will show a 0% win rate despite being excellent. Users will see "50 games, 0 wins, 50 losses" for a snake that never finished last.
This is a design decision, but "losses" is the wrong word for 2nd place. Consider renaming to first_place_count / non_first_count, or tracking placements more granularly (count 1st/2nd/3rd/4th separately), or at least documenting this behavior in the UI so users aren't confused.
Suggestions
8. Unbounded ranking queries
get_ranked_entries and get_placement_entries have no LIMIT. Once a leaderboard has thousands of participants, these return everything. Add pagination or at least a reasonable LIMIT 100.
9. Extra query per non-leaderboard game
game_runner.rs now runs find_leaderboard_game_by_game_id after every game completion. For the vast majority of games (manual/custom), this is a wasted query returning None. It's not expensive (indexed UUID lookup), but it's worth noting that this adds latency to every game, not just leaderboard games.
10. Test determinism
select_match uses rand::thread_rng() with no seed, making the matchmaker tests non-deterministic. The property tests (correct size, unique snakes) are fine for this, but if you ever need to debug a matchmaking failure, you can't reproduce it. Consider accepting an &mut impl Rng parameter so tests can pass StdRng::seed_from_u64(42).
11. TODO in shipped code
leaderboard_matchmaker.rs:100-102:
TODO: Add recently-matched deprioritization to prevent the same group of snakes
from being matched repeatedly in low-volume periods.
This is shipping a known limitation. Fine for v1, but track it as a task so it doesn't get forgotten.
Verdict
Request Changes — The idempotency bug (#1) is a data corruption risk that needs to be fixed before merge. The integer division issue (#2) is less critical but produces incorrect behavior. The rest are quality improvements.
The overall architecture is clean — pure computation separated from DB, proper transaction discipline, good auth checks, solid e2e coverage. Once the blocking issues are resolved, this is a solid feature.
- Add idempotency guard: early return check + UNIQUE constraint on leaderboard_game_results(game_id, entry_id) + ON CONFLICT DO NOTHING - Fix integer division: use ceiling division with shared MATCHMAKER_INTERVAL_SECS constant derived from cron.rs (eliminates manual sync bug) - Propagate DB errors instead of swallowing with unwrap_or_default in web routes - Merge duplicate win/loss SQL into single UPDATE with CASE expression - Rename wins/losses to first_place_finishes/non_first_finishes for accurate semantics in 4-player games where 2nd place isn't a "loss" - Add missing index on leaderboard_game_results.leaderboard_game_id - Add LIMIT 100 to unbounded ranking/placement queries - Accept seeded RNG in select_match for test determinism Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
getByText(snakeName) matched both a <span> in the form section and
a <td> in the entries table, causing Playwright strict mode violations.
Use getByRole('cell', { name }) to target the table cell specifically.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
getByText('10') was matching both the "Minimum: 10 games" paragraph
and the <td>10</td> cell, causing a strict mode violation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Changes
Schema (4 new tables):
leaderboards— defines competition types (seeded with Standard 11x11)leaderboard_entries— per-snake ratings (mu, sigma, display_score, games, wins, losses)leaderboard_games— links games to leaderboardsleaderboard_game_results— audit trail of rating changes per gameNew files:
server/src/models/leaderboard.rs— all model structs and DB queries (runtime sqlx, not macro)server/src/routes/leaderboard.rs— web routes (list, detail, join, leave)server/src/routes/api/leaderboards.rs— JSON API (list, rankings, opt-in/out)server/src/leaderboard_matchmaker.rs— skill-band matchmaking with jitter + unit testsserver/src/leaderboard_ratings.rs— Weng-Lin multi-team rating updates viaskillratingscrateModified files:
server/Cargo.toml— addedskillratingsdependencyserver/src/jobs.rs—LeaderboardMatchmakerJob+LeaderboardRatingUpdateJobserver/src/cron.rs— matchmaker runs every 15 minutesserver/src/game_runner.rs— triggers rating update after leaderboard game completionserver/src/routes.rs— leaderboard web + API routes registered, nav link addedserver/src/main.rs— new modules declaredNotes
sqlx::query_as(not compile-time macros) since the new tables aren't in the offline query cache yet. The.sqlx/cache will need updating after the migration runs against a real database.Test plan
cargo fmt— cleancargo clippy— no warningscargo test --all-targets— all tests pass (23 total, 4 new matchmaker tests)🤖 Generated with Claude Code