
Add on-call rotation feature #1

Merged
ewels merged 3 commits into main from feature/on-call
Mar 19, 2026


@ewels (Member) commented Mar 19, 2026

Summary

  • Adds a complete on-call rotation system managed via /nf-core on-call slash commands, backed by the existing DynamoDB table
  • All on-call commands are restricted to @core-team members (permission checked centrally in the router)
  • Includes a background async scheduler that auto-extends the 8-week rolling roster, sends reminders (on assignment, one week before, and daily at 8am in each user's timezone), and posts Monday announcements to #core
  • Round-robin assignment draws on @core-team usergroup membership, so there is no separate member list to maintain
  • Fixes a pre-existing mypy error in admin.py:669 that was failing CI on main
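
A minimal sketch of the two timing checks such a scheduler needs, assuming a 60-second loop that calls them on every tick. The function names, the last_sent bookkeeping, and the use of a fixed UTC offset in seconds (like the tz_offset field of a Slack user object) are illustrative assumptions, not the code in oncall_jobs.py.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional


def monday_jobs_due(now_utc: datetime, last_weekly_run: Optional[date]) -> bool:
    """True on a Monday (UTC date) whose weekly jobs have not run yet.

    The last_weekly_run argument plays the role of a once-per-week guard.
    """
    today = now_utc.date()
    return today.weekday() == 0 and last_weekly_run != today


def daily_reminder_due(
    now_utc: datetime,
    tz_offset_seconds: int,
    last_sent: Optional[date],
    hour: int = 8,
) -> bool:
    """True once the user's local clock has reached `hour` and no reminder went out that local day.

    Assumes a fixed UTC offset in seconds (hypothetical input); a DST change
    mid-week would not be reflected.
    """
    local = now_utc.astimezone(timezone(timedelta(seconds=tz_offset_seconds)))
    return local.hour >= hour and last_sent != local.date()


if __name__ == "__main__":
    now = datetime(2026, 3, 16, 7, 30, tzinfo=timezone.utc)  # a Monday
    print(monday_jobs_due(now, None), daily_reminder_due(now, 3600, None))
```

Checking hour >= 8 rather than == 8 means a reminder missed during a restart still goes out later that day, while last_sent keeps it to once per local day. Resolving an IANA zone name with zoneinfo instead of a fixed offset would also handle DST transitions.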

Commands

All commands require @core-team membership.

| Command | Description |
|---|---|
| on-call list | Show the upcoming ~8 weeks of assignments |
| on-call me | Show the caller's upcoming on-call dates |
| on-call switch [YYYY-MM-DD] | Swap on-call weeks with another person and DM both parties |
| on-call skip | Skip your next on-call week; a replacement is assigned via round-robin |
| on-call unavailable <start> <end> | Mark yourself unavailable for a date range; overlapping weeks are skipped automatically |
| on-call reboot | Wipe and rebuild the entire schedule from scratch |
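
For on-call unavailable, the core step is deciding which roster weeks a date range touches. A minimal sketch, assuming each on-call week runs Monday to Sunday and is identified by its Monday; parse_range and overlapping_weeks are hypothetical names, not the PR's helpers.

```python
from datetime import date, timedelta


def parse_range(start: str, end: str) -> tuple[date, date]:
    """Parse the <start> <end> arguments (YYYY-MM-DD) and check their order."""
    start_day = date.fromisoformat(start)
    end_day = date.fromisoformat(end)
    if end_day < start_day:
        raise ValueError("end date must not be before start date")
    return start_day, end_day


def overlapping_weeks(mondays: list[date], start: date, end: date) -> list[date]:
    """Return the weeks (Monday..Sunday, assumed) sharing at least one day with [start, end]."""
    return [m for m in mondays if m <= end and m + timedelta(days=6) >= start]


if __name__ == "__main__":
    roster = [date(2026, 3, 16), date(2026, 3, 23), date(2026, 3, 30)]
    print(overlapping_weeks(roster, *parse_range("2026-03-22", "2026-03-24")))
```

Two intervals overlap exactly when each one starts no later than the other ends, which is the single comparison in overlapping_weeks; a range that ends on a Sunday therefore still catches that whole week.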

New files

  • src/nf_core_bot/db/oncall.py — DynamoDB CRUD (roster, round-robin state, unavailability, reminder tracking)
  • src/nf_core_bot/commands/oncall/ — 6 command handlers + shared helpers
  • src/nf_core_bot/scheduler/oncall_jobs.py — Background scheduler (roster extension, reminders, cleanup)
  • tests/test_oncall_db.py, tests/test_oncall_commands.py, tests/test_oncall_scheduler.py — 84 new tests
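
The roster-extension job in the scheduler amounts to finding gaps in a rolling window. A minimal sketch, assuming the window starts at the current week and that roster entries are keyed by the Monday that begins each week; week_start and missing_weeks are hypothetical names.

```python
from datetime import date, timedelta

ROSTER_WEEKS = 8  # length of the rolling roster described in this PR


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def missing_weeks(today: date, existing: set[date], weeks: int = ROSTER_WEEKS) -> list[date]:
    """Return the Mondays in the rolling window that have no roster entry yet."""
    first = week_start(today)
    window = [first + timedelta(weeks=i) for i in range(weeks)]
    return [monday for monday in window if monday not in existing]


if __name__ == "__main__":
    # Thursday 19 March 2026, with the first two weeks already assigned.
    print(missing_weeks(date(2026, 3, 19), {date(2026, 3, 16), date(2026, 3, 23)}))
```

Each returned Monday would then be handed to the round-robin picker; running the check every Monday keeps the window topped up to eight weeks.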

Modified files

| File | Change |
|---|---|
| app.py | Launches the scheduler as a background task, keeping a strong reference so it is not garbage-collected |
| router.py | Routes on-call subcommands; ack() and is_core_team() are called centrally before dispatch |
| help.py | Adds on-call help text, with all commands marked admin-only |
| admin.py | Fixes the pre-existing mypy error on line 669 (typed org_lists slice) |
| CLAUDE.md | Updated with on-call architecture, DynamoDB key patterns, and command docs |

DynamoDB key patterns

| PK | SK | Purpose |
|---|---|---|
| ONCALL#<YYYY-MM-DD> | ROSTER | Weekly roster entry |
| ONCALL_META | ROUND_ROBIN | Round-robin state |
| ONCALL_META | REMINDERS#<YYYY-MM-DD> | Reminder tracking per week |
| ONCALL_UNAVAIL#<user_id> | <start>#<end> | Unavailability range |
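
A minimal sketch of key builders for the patterns in this table. The attribute names PK and SK are taken from the table headers and assumed to be the real attribute names; the function names are hypothetical and not the API of db/oncall.py.

```python
from datetime import date

# Single fixed item holding the round-robin state.
ROUND_ROBIN_KEY = {"PK": "ONCALL_META", "SK": "ROUND_ROBIN"}


def roster_key(week: date) -> dict[str, str]:
    """Key for one weekly roster entry."""
    return {"PK": f"ONCALL#{week.isoformat()}", "SK": "ROSTER"}


def reminders_key(week: date) -> dict[str, str]:
    """Key for the reminder-tracking item of one week."""
    return {"PK": "ONCALL_META", "SK": f"REMINDERS#{week.isoformat()}"}


def unavailability_key(user_id: str, start: date, end: date) -> dict[str, str]:
    """Key for one unavailability range of one user."""
    return {"PK": f"ONCALL_UNAVAIL#{user_id}", "SK": f"{start.isoformat()}#{end.isoformat()}"}


def parse_unavailability_sk(sk: str) -> tuple[date, date]:
    """Recover (start, end) from an <start>#<end> sort key."""
    start, end = sk.split("#")
    return date.fromisoformat(start), date.fromisoformat(end)
```

DynamoDB orders string sort keys by byte value, and zero-padded ISO dates sort lexicographically in date order, so querying one user's ONCALL_UNAVAIL# partition returns their ranges ordered by start date.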

Architecture notes

  • Permission model: All on-call commands are gated by a @core-team membership check in _route_oncall(). ack() is called once, centrally; handlers do not receive ack.
  • Scheduler: Background asyncio task running a 60s loop. Monday jobs (roster extension, cleanup) fire once per week via a _last_weekly_run guard. Daily reminders respect each user's Slack timezone.
  • Round-robin: Picks the person with the oldest last_assigned date. The queue_front list gives priority to people who previously skipped. Once every member has been assigned, the rotation cycles through again.
  • Reboot: Deletes all roster entries and resets the round-robin state, then rebuilds the schedule. Useful for initial deployment or when core-team membership changes.
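
The round-robin rule described above can be sketched as follows. This is a minimal illustration, assuming a state shape with a last_assigned map and a queue_front list as named in the notes; the dataclass, pick_next, and the tie-break on user ID are hypothetical choices, not the PR's implementation.

```python
from collections.abc import Collection
from dataclasses import dataclass, field
from datetime import date


@dataclass
class RoundRobinState:
    # user_id -> Monday of that person's most recent assignment (hypothetical shape)
    last_assigned: dict[str, date] = field(default_factory=dict)
    # people who skipped a week and should be picked first, in skip order
    queue_front: list[str] = field(default_factory=list)


def pick_next(
    members: list[str],
    state: RoundRobinState,
    week: date,
    unavailable: Collection[str] = (),
) -> str:
    """Choose the assignee for `week` and record the assignment in `state`."""
    candidates = [m for m in members if m not in unavailable]
    if not candidates:
        raise RuntimeError("no available @core-team members")

    # 1. Anyone in the skip queue who is available this week goes first.
    for user in state.queue_front:
        if user in candidates:
            state.queue_front.remove(user)  # safe: we break immediately
            chosen = user
            break
    else:
        # 2. Otherwise the oldest last_assigned wins. Never-assigned members
        #    count as date.min, so newcomers go first; ties break on user ID.
        chosen = min(candidates, key=lambda m: (state.last_assigned.get(m, date.min), m))

    state.last_assigned[chosen] = week
    return chosen
```

Because the oldest assignment always wins, the rotation wraps around on its own once everyone has had a turn, and members who join the usergroup later are picked up at the next assignment without any list to update.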

Quality

  • 356/356 tests pass (84 new + 272 existing)
  • Ruff: clean
  • Mypy: 0 errors (including fix for pre-existing admin.py issue)

ewels and others added 3 commits March 19, 2026 13:19
Introduces on-call rotation management for @core-team members:
- Slash commands: list, me, switch, skip, unavailable (via /nf-core on-call)
- Background scheduler: auto-extends roster, sends reminders (assignment,
  week-before, daily at user's local 8am), posts #core announcements
- Round-robin assignment with skip queue priority and unavailability tracking
- DynamoDB operations for roster, round-robin state, unavailability, reminders

Includes simplification pass: deduplicated ddb_table test fixture into
conftest.py, extracted add_to_queue_front helper, parallelized independent
API/DB calls, added date pre-filtering in reminder loop, cached tz lookups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- All on-call commands now require @core-team membership (checked
  centrally in _route_oncall before dispatching)
- ack() called once in the router; individual handlers no longer
  receive or call ack
- New 'on-call reboot' command wipes schedule and rebuilds from
  scratch (useful for initial deployment and membership changes)
- Updated CLAUDE.md with on-call architecture, key patterns, and
  command documentation
- Updated help text to mark all on-call commands as admin-only
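
A minimal sketch of the centralised routing described in this commit: acknowledge once, check membership, then dispatch to a handler that never sees ack. The signature, with ack, is_core_team, and the handler map injected as arguments, is a hypothetical simplification for illustration; it is not the PR's _route_oncall nor Slack Bolt's listener API.

```python
import asyncio
from collections.abc import Awaitable, Callable, Mapping

Ack = Callable[[], Awaitable[None]]
MembershipCheck = Callable[[str], Awaitable[bool]]
Handler = Callable[[list[str], str], Awaitable[str]]


async def route_oncall(
    args: list[str],
    user_id: str,
    ack: Ack,
    is_core_team: MembershipCheck,
    handlers: Mapping[str, Handler],
) -> str:
    # Slack expects a slash command to be acknowledged within 3 seconds,
    # so ack first, once, before the membership lookup or any handler work.
    await ack()
    if not await is_core_team(user_id):
        return "Sorry, on-call commands are restricted to @core-team members."
    if not args or args[0] not in handlers:
        return "Unknown on-call subcommand. Try: " + ", ".join(sorted(handlers))
    # Handlers only receive their own arguments; they never see ack.
    return await handlers[args[0]](args[1:], user_id)


if __name__ == "__main__":

    async def _ack() -> None:
        pass

    async def _is_core(uid: str) -> bool:
        return uid == "U_CORE"

    async def _list(args: list[str], uid: str) -> str:
        return f"roster for {uid}"

    print(asyncio.run(route_oncall(["list"], "U_CORE", _ack, _is_core, {"list": _list})))
```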
ewels merged commit 8e36c19 into main on Mar 19, 2026
3 checks passed