yards-bot v1 — exec plan

Goals

Replace the manual reaction sweep with a real-time Discord bot. No database, no Rails integration. Reacting with the configured emoji grants a role; removing the reaction revokes it. A periodic backfill sweep heals anything the gateway missed.

Scope

Single-tenant. yards-bot only runs against the Fleetyards Discord. The Discord application is private (bot_public: false) and the config hardcodes Fleetyards-specific IDs. Anything we'd want to expose to other Star Citizen servers (slash-command lookups, release announcements, etc.) belongs in a separate, future, public bot — not in this repo.

Out of scope for v1

Account linking with the Fleetyards Rails app, slash commands, audit-to-channel, captcha, welcome DMs, role sync with fleets/supporters. Multi-tenant operation. Tracked separately for v2+ (or, for the multi-tenant pieces, the future public bot).

Stack

Repository layout

src/
  index.ts          # entry; wires intents, events, sweep, health, signals
  config.ts         # loads + zod-validates verifications.yaml
  client.ts         # discord.js Client setup (intents, partials)
  events/
    ready.ts        # logs guilds, kicks off initial sweep
    reactionAdd.ts  # grant role
    reactionRemove.ts # revoke role
  sweep.ts          # reconcile each verification against current reactors
  health.ts         # GET /healthz returns 200 when gateway is READY
  logger.ts         # pino instance
config/
  verifications.yaml          # gitignored; created from the example
  verifications.example.yaml  # committed reference
test/
  sweep.test.ts
  config.test.ts
Dockerfile
.dockerignore
.github/workflows/ci.yml
deploy.yml          # Kamal config (added with deploy step)
biome.json
tsconfig.json
tsconfig.build.json
package.json
README.md

Config format

YAML, validated with zod at startup. Bot exits non-zero on invalid config.

verifications:
  - name: rules-checkmark
    guild_id: "1000000000000000001"
    channel_id: "1000000000000000002"
    message_id: "1000000000000000003"
    emoji: ""           # unicode emoji, or "name:id" for custom
    role_id: "1000000000000000004"
    on_remove: revoke     # revoke | keep

sweep:
  on_startup: true
  cron: "0 */6 * * *"     # every 6h
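
Because the emoji field accepts two shapes, matching code has to normalize it before comparing it with gateway events. The following is a minimal sketch of that step, not the repo's implementation: parseEmoji, matchesEmoji, and the EmojiRef type are hypothetical names, and it assumes custom emoji IDs are numeric snowflakes.

```typescript
// Hypothetical helpers (not from the repo): normalize the `emoji` config value.
type EmojiRef =
  | { kind: "unicode"; name: string }
  | { kind: "custom"; name: string; id: string };

function parseEmoji(raw: string): EmojiRef {
  const value = raw.trim();
  if (value === "") throw new Error("emoji must not be empty");

  // Custom emoji are configured as "name:id"; assume the id is all digits.
  const sep = value.lastIndexOf(":");
  if (sep > 0) {
    const name = value.slice(0, sep);
    const id = value.slice(sep + 1);
    if (/^\w+$/.test(name) && /^\d+$/.test(id)) {
      return { kind: "custom", name, id };
    }
  }
  return { kind: "unicode", name: value };
}

// Reaction payloads carry an emoji with a nullable id and a nullable name.
function matchesEmoji(
  ref: EmojiRef,
  seen: { id: string | null; name: string | null },
): boolean {
  // Compare custom emoji by id so a later rename does not break matching.
  if (ref.kind === "custom") return seen.id === ref.id;
  return seen.id === null && seen.name === ref.name;
}
```

Comparing custom emoji by id rather than by name is a design choice of this sketch; the plan itself does not say which one the bot uses.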

Discord intents & partials

  • Intents: Guilds, GuildMessageReactions. No privileged intents required — members are fetched on demand via REST instead of cached via the GuildMembers intent.
  • Partials: Message, Channel, Reaction, User. The first three let the bot receive reaction events on messages that are not in its cache, such as those posted before it started. User is required for MESSAGE_REACTION_REMOVE on uncached users: the gateway payload carries only user_id (the add event also includes a member object), and without the partial discord.js silently drops the event.
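
To make the "no privileged intents" claim concrete: intents are sent to the gateway as a bitfield. The sketch below hard-codes the bit positions from Discord's gateway intents documentation so it runs without dependencies. The Intent object and usesPrivilegedIntents are illustrative names; in src/client.ts the values would presumably come from discord.js's GatewayIntentBits instead.

```typescript
// Bit positions as listed in Discord's gateway intents documentation.
const Intent = {
  Guilds: 1 << 0,
  GuildMembers: 1 << 1, // privileged
  GuildPresences: 1 << 8, // privileged
  GuildMessageReactions: 1 << 10,
  MessageContent: 1 << 15, // privileged
} as const;

const PRIVILEGED_MASK =
  Intent.GuildMembers | Intent.GuildPresences | Intent.MessageContent;

// The two intents this plan asks for.
const requestedIntents = Intent.Guilds | Intent.GuildMessageReactions;

// True if any privileged bit is set in the given intents value.
function usesPrivilegedIntents(bits: number): boolean {
  return (bits & PRIVILEGED_MASK) !== 0;
}
```

A check like usesPrivilegedIntents(requestedIntents) could serve as a unit-test guard against someone adding GuildMembers later without enabling it in the developer portal.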

Required bot permissions in the guild

  • View Channel on each configured channel
  • Read Message History
  • Manage Roles
  • The bot's top role must sit above every role_id it manages. Otherwise Discord rejects the role change with a 403 Missing Permissions error, which is easy to overlook in logs; the bot should detect this on startup and log a clear warning.
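
The startup check for the hierarchy rule reduces to a position comparison once the role positions are known. This is a minimal sketch under assumptions: RolePosition and findUnmanageableRoles are hypothetical names, and fetching the positions from the guild is left to the caller.

```typescript
// Hypothetical shape: a managed role and its position in the guild's role list.
interface RolePosition {
  roleId: string;
  position: number; // higher number = higher in the hierarchy
}

// Returns the role IDs the bot cannot manage because they sit at or above
// the bot's own highest role.
function findUnmanageableRoles(
  botTopPosition: number,
  managed: RolePosition[],
): string[] {
  return managed
    .filter((role) => role.position >= botTopPosition)
    .map((role) => role.roleId);
}
```

A non-empty result at startup is the condition worth logging as a warning, naming each affected role_id.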

Behavior

On READY

  1. Log connected guilds and the set of verifications it will manage.
  2. For each verification, fetch the message (warms partials) and run the initial sweep if sweep.on_startup is true.
  3. Register cron schedule.

On MESSAGE_REACTION_ADD

  1. Resolve partials.
  2. Match {guild_id, channel_id, message_id, emoji} against config; no-op if no match.
  3. Fetch member; skip if already has role.
  4. Add role; log a structured event with user/role/source.
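
The matching in step 2 can be sketched as a pure function over the loaded config. The Verification fields mirror the YAML above; ReactionEvent, sameEmoji, and findVerification are hypothetical names, and the event shape is a simplified stand-in for what discord.js provides after partials are resolved.

```typescript
interface Verification {
  name: string;
  guild_id: string;
  channel_id: string;
  message_id: string;
  emoji: string; // unicode emoji, or "name:id" for custom
  role_id: string;
  on_remove: "revoke" | "keep";
}

// Simplified, hypothetical view of a reaction event.
interface ReactionEvent {
  guildId: string | null; // null for reactions outside a guild (DMs)
  channelId: string;
  messageId: string;
  emoji: { id: string | null; name: string | null };
}

function sameEmoji(configured: string, seen: ReactionEvent["emoji"]): boolean {
  // Custom emoji carry an id; compare against the id part of "name:id".
  if (seen.id !== null) return configured.endsWith(`:${seen.id}`);
  return configured === seen.name;
}

// Returns the matching verification, or undefined so the handler can no-op.
function findVerification(
  verifications: Verification[],
  event: ReactionEvent,
): Verification | undefined {
  return verifications.find(
    (v) =>
      v.guild_id === event.guildId &&
      v.channel_id === event.channelId &&
      v.message_id === event.messageId &&
      sameEmoji(v.emoji, event.emoji),
  );
}
```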

On MESSAGE_REACTION_REMOVE

  1. Same matching as above.
  2. No-op if on_remove: keep.
  3. Otherwise remove role; log.
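
The remove path is a small decision table, sketched below. RemoveAction and decideOnReactionRemove are hypothetical names. Skipping members who no longer hold the role is an assumption that mirrors the "skip if already has role" step on the add path; the plan does not state it for removal.

```typescript
type RemoveAction =
  | { type: "noop"; reason: "no_match" | "keep" | "role_absent" }
  | { type: "remove_role"; roleId: string };

// `verification` is undefined when the event matched nothing in the config.
function decideOnReactionRemove(
  verification: { role_id: string; on_remove: "revoke" | "keep" } | undefined,
  memberHasRole: boolean,
): RemoveAction {
  if (verification === undefined) return { type: "noop", reason: "no_match" };
  if (verification.on_remove === "keep") return { type: "noop", reason: "keep" };
  // Assumption: avoid a pointless REST call when the role is already gone.
  if (!memberHasRole) return { type: "noop", reason: "role_absent" };
  return { type: "remove_role", roleId: verification.role_id };
}
```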

Sweep

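  1. For each verification: paginate reactors via REST (channels.messages.reactions.get, i.e. GET /channels/{channel.id}/messages/{message.id}/reactions/{emoji}, at most 100 users per page).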
  2. Fetch all members currently holding the role.
  3. Diff:
    • Reactor with no role → grant.
    • Role holder who didn't react → revoke (only if on_remove: revoke).
    • Reactor who left the guild → log, skip.
  4. Idempotent. Safe to run on a cron.
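
The diff in step 3 can be kept free of Discord calls, which is also what makes it easy to unit-test against a mocked REST layer. A minimal sketch, with hypothetical names (SweepInput, SweepPlan, planSweep) and the assumption that the caller has already resolved which reactors are still guild members:

```typescript
interface SweepInput {
  reactorIds: string[]; // users who reacted with the configured emoji
  roleHolderIds: string[]; // members currently holding the role
  memberIds: string[]; // reactors confirmed to still be in the guild (assumption)
  onRemove: "revoke" | "keep";
}

interface SweepPlan {
  grant: string[];
  revoke: string[];
  skippedLeft: string[]; // reactors who left the guild: log and skip
}

function planSweep(input: SweepInput): SweepPlan {
  const reactors = new Set(input.reactorIds);
  const holders = new Set(input.roleHolderIds);
  const members = new Set(input.memberIds);
  const plan: SweepPlan = { grant: [], revoke: [], skippedLeft: [] };

  for (const id of input.reactorIds) {
    if (holders.has(id)) continue; // already correct
    if (!members.has(id)) {
      plan.skippedLeft.push(id);
      continue;
    }
    plan.grant.push(id);
  }

  // Revocation only applies when the verification is configured for it.
  if (input.onRemove === "revoke") {
    for (const id of input.roleHolderIds) {
      if (!reactors.has(id)) plan.revoke.push(id);
    }
  }
  return plan;
}
```

Idempotence follows from the shape of the function: once the grants and revokes of one plan have been applied, a second call with the updated role holders yields empty grant and revoke lists.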

Observability

  • pino JSON to stdout — Kamal/Docker captures.
  • AppSignal: errors auto-captured; custom counters for role.grant, role.revoke, sweep.duration_ms, gateway.disconnect.
  • /healthz returns 200 when the gateway has emitted READY and the last sweep finished without error; 503 otherwise.
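
The /healthz rule can be expressed as a pure function that the HTTP handler calls on each request. HealthState and healthStatus are hypothetical names. One case the rule leaves open is what to return before any sweep has finished; this sketch assumes 503, which is the strict reading.

```typescript
interface HealthState {
  gatewayReady: boolean; // set once the gateway has emitted READY
  lastSweepOk: boolean | null; // null = no sweep has finished yet
}

// Maps bot state to the HTTP status /healthz should return.
function healthStatus(state: HealthState): 200 | 503 {
  // Assumption: "no sweep yet" counts as unhealthy.
  return state.gatewayReady && state.lastSweepOk === true ? 200 : 503;
}
```

If sweep.on_startup is false, that assumption keeps the endpoint at 503 until the first cron run, so treating null as healthy may be the better choice for that configuration.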

Deployment

  • Single instance — multiple would duplicate gateway connections and role grants.
  • Kamal service, alongside the existing fleetyards stack.
  • Secrets via Kamal env: DISCORD_TOKEN, APPSIGNAL_PUSH_API_KEY.
  • Source of truth for the bot token: 1Password (op://Fleetyards/DISCORD_BOT_LIVE/credential).
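
Since both secrets arrive as environment variables, a startup check can fail fast when Kamal has not injected one of them, in the same spirit as the non-zero exit on invalid config. A minimal sketch; readSecrets is a hypothetical helper, and whether the real bot treats a missing AppSignal key as fatal is an assumption.

```typescript
const REQUIRED_SECRETS = ["DISCORD_TOKEN", "APPSIGNAL_PUSH_API_KEY"] as const;
type SecretName = (typeof REQUIRED_SECRETS)[number];

// Pass process.env in production; a plain object works in tests.
function readSecrets(
  env: Record<string, string | undefined>,
): Record<SecretName, string> {
  const found: Partial<Record<SecretName, string>> = {};
  const missing: string[] = [];

  for (const name of REQUIRED_SECRETS) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      missing.push(name);
    } else {
      found[name] = value;
    }
  }

  // Report names only, never values, so secrets cannot leak into logs.
  if (missing.length > 0) {
    throw new Error(`missing required env vars: ${missing.join(", ")}`);
  }
  return found as Record<SecretName, string>;
}
```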

Testing

  • Unit: config validation (valid/invalid YAML), sweep diff logic with a mocked REST layer.
  • Manual smoke: deploy to a staging guild (or the live guild with a throwaway verification config), react + unreact, watch logs.

Implementation order

  1. Repo scaffold + tsconfig + biome + CI — done (PR #0, initial scaffold).
  2. Config loader with zod — done (PR #1).
  3. discord.js client + READY logging — done (PR #2).
  4. Sweep (TS port of the original bash script) with unit tests, run on startup — done (PR #3).
  5. reactionAdd + reactionRemove handlers — done (PR #4).
  6. Cron-scheduled sweep on top of the on-startup sweep.
  7. Health endpoint.
  8. Dockerfile finalization + local docker-compose.
  9. Kamal deploy config.
  10. AppSignal wiring.
  11. README polish (role-hierarchy gotcha, bot invite URL).

Risks

  • Role hierarchy misconfiguration is the most common silent failure for this kind of bot. Sweep should explicitly probe and log on startup.
  • Partials.User pitfall: MESSAGE_REACTION_REMOVE is silently dropped by discord.js if the user isn't cached and Partials.User isn't enabled (the gateway payload only carries user_id). We do enable it; don't remove it from src/client.ts without testing the remove path live.
  • Single-instance constraint. Kamal config must not scale horizontally — duplicate gateway connections would produce duplicate role grants.
  • Stateless trade-off: no in-bot audit trail beyond logs. Acceptable in v1; if richer audit is needed later, the cheapest add is a Discord audit channel before reaching for a DB.