| 1 | +# discord-archive |
| 2 | + |
| 3 | +Archive, search, and browse Discord server messages with a fast web viewer. |
| 4 | + |
| 5 | +- **Full archive**: channels, threads, attachments, reactions, embeds, users |
| 6 | +- **Incremental updates**: only fetches new messages on re-runs |
| 7 | +- **Full-text search**: FTS5-powered search with `#channel` and `@user` filters |
| 8 | +- **Keyboard-driven**: [use-kbd] omnibar (Cmd+K), shortcuts, arrow-key navigation |
| 9 | +- **Deployable**: Cloudflare Workers + D1 + Pages, with GitHub Actions CI/CD |
| 10 | +- **Versioned data**: [DVX]-tracked archive with S3 remote cache |
| 11 | + |
| 12 | +## Quick start |
| 13 | + |
| 14 | +```bash |
| 15 | +# 1. Set your Discord bot token and guild ID |
| 16 | +export DISCORD_TOKEN="your-bot-token" |
| 17 | +export DISCORD_GUILD="your-guild-id" |
| 18 | + |
| 19 | +# 2. Archive all messages |
| 20 | +./archive.py |
| 21 | + |
| 22 | +# 3. Build the SQLite database |
| 23 | +./build_db.py |
| 24 | + |
| 25 | +# 4. Start the local API server |
| 26 | +./server.py & |
| 27 | + |
| 28 | +# 5. Start the viewer |
| 29 | +cd app && pnpm install && pnpm dev |
| 30 | +# Open http://localhost:5272 |
| 31 | +``` |
| 32 | + |
| 33 | +## Architecture |
| 34 | + |
| 35 | +``` |
| 36 | +discord-archive/ |
| 37 | + archive.py # Discord API → JSON (incremental, per-channel files) |
| 38 | + build_db.py # JSON → SQLite (normalized, FTS5 search index) |
| 39 | + build_index.py # JSON → index.json (for static viewer) |
| 40 | + server.py # Local dev API server (Starlette + SQLite) |
| 41 | + archive/ # DVX-tracked raw JSON archive + attachments |
| 42 | + archive.db # SQLite database (derived from archive/) |
| 43 | + app/ # Vite + React viewer |
| 44 | + api/ # Cloudflare Worker (D1-backed API) |
| 45 | + d1-import.sh # Full SQLite → D1 import |
| 46 | + d1-sync.py # Incremental D1 sync (zero downtime) |
| 47 | + .github/workflows/ # CI/CD for app, worker, and archive updates |
| 48 | +``` |
| 49 | + |
| 50 | +## Scripts |
| 51 | + |
| 52 | +### `archive.py` |
| 53 | + |
| 54 | +Archives all messages from a Discord guild to per-channel JSON files. |
| 55 | + |
| 56 | +```bash |
| 57 | +./archive.py # archive all channels + threads |
| 58 | +./archive.py --no-threads # skip thread messages |
| 59 | +./archive.py --no-attachments # skip downloading attachments |
| 60 | +./archive.py --backfill-attachments # re-fetch expired CDN URLs and download |
| 61 | +./archive.py -g 123456789 # specify guild ID |
| 62 | +./archive.py -o my-archive # custom output directory |
| 63 | +``` |
| 64 | + |
| 65 | +Requires the `DISCORD_TOKEN` env var (a bot token with the Message Content intent enabled). The guild ID comes from `-g` or, as in the quick start, `DISCORD_GUILD`. |
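As a rough illustration of the incremental behaviour, the sketch below derives a resume cursor from an existing per-channel file and builds the next request against Discord's `GET /channels/{id}/messages` endpoint, which accepts `after` and `limit` query parameters. The file layout (a JSON list of message objects with an `id` field) and the helper names `last_message_id` and `next_page_request` are assumptions for illustration, not necessarily what `archive.py` does.

```python
from __future__ import annotations

import json
from pathlib import Path

DISCORD_API = "https://discord.com/api/v10"


def last_message_id(channel_file: Path) -> str | None:
    """Return the newest message ID stored in a per-channel file, or None."""
    if not channel_file.exists():
        return None
    messages = json.loads(channel_file.read_text())
    if not messages:
        return None
    # Snowflake IDs are numeric strings that grow over time: compare as integers.
    return max((m["id"] for m in messages), key=int)


def next_page_request(channel_id: str, channel_file: Path) -> tuple[str, dict[str, str]]:
    """Build the URL and query parameters for the next incremental fetch."""
    params = {"limit": "100"}  # 100 is the API's per-request maximum
    cursor = last_message_id(channel_file)
    if cursor is not None:
        params["after"] = cursor  # only ask for messages newer than the archive
    return f"{DISCORD_API}/channels/{channel_id}/messages", params
```

On a first run there is no cursor, so a full backfill would need its own paging loop; only the re-run case is sketched here.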
| 66 | + |
| 67 | +### `build_db.py` |
| 68 | + |
| 69 | +Builds a normalized SQLite database from the JSON archive. |
| 70 | + |
| 71 | +```bash |
| 72 | +./build_db.py # default: archive/ → archive.db |
| 73 | +./build_db.py -i my-archive -o my.db # custom paths |
| 74 | +``` |
| 75 | + |
| 76 | +Creates tables: `channels`, `messages`, `users`, `attachments`, `reactions`, `embeds`, `threads`, plus a `messages_fts` FTS5 index. |
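To make the FTS5 part concrete, here is a minimal sketch of a `messages` table paired with a `messages_fts` index, using Python's built-in `sqlite3` module. The column names and the helpers `build_search_index` and `search` are assumptions for illustration; the real schema in `build_db.py` is likely richer. It also assumes the local SQLite build has FTS5 enabled, which is typical but not guaranteed.

```python
import sqlite3


def build_search_index(rows):
    """Create an in-memory DB with a messages table and an FTS5 index on content."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE messages (id INTEGER PRIMARY KEY, channel_id INTEGER, content TEXT);
        CREATE VIRTUAL TABLE messages_fts USING fts5(content);
        """
    )
    db.executemany(
        "INSERT INTO messages (id, channel_id, content) VALUES (?, ?, ?)", rows
    )
    # Reuse the message ID as the FTS rowid so hits join straight back to messages.
    db.execute(
        "INSERT INTO messages_fts (rowid, content) SELECT id, content FROM messages"
    )
    db.commit()
    return db


def search(db, query):
    """Return (id, content) rows matching an FTS5 query, best match first."""
    return db.execute(
        "SELECT m.id, m.content FROM messages_fts "
        "JOIN messages m ON m.id = messages_fts.rowid "
        "WHERE messages_fts MATCH ? ORDER BY messages_fts.rank",
        (query,),
    ).fetchall()
```

Keeping the FTS rowid equal to the message ID means a search hit can be joined back to the full message row without a separate mapping table.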
| 77 | + |
| 78 | +### `server.py` |
| 79 | + |
| 80 | +Local development API server. |
| 81 | + |
| 82 | +```bash |
| 83 | +./server.py # serves archive.db on :5273 |
| 84 | +``` |
| 85 | + |
| 86 | +Endpoints: `/api/channels`, `/api/channels/:id/messages`, `/api/messages/:id`, `/api/search`, `/api/users`. Also serves downloaded attachments from `/attachments/`. |
| 87 | + |
| 88 | +## Viewer (`app/`) |
| 89 | + |
| 90 | +React + TypeScript + Vite application with: |
| 91 | + |
| 92 | +- Virtual scrolling ([TanStack Virtual]) for large channels |
| 93 | +- Full-text search with `#channel` and `@user` autocomplete |
| 94 | +- Permalink URLs (`#channelId/messageId`) |
| 95 | +- Message grouping, reactions with tooltips, embed rendering |
| 96 | +- Keyboard navigation via [use-kbd] (Cmd+K omnibar, `/` search, `?` shortcuts) |
| 97 | +- Responsive layout (collapsible sidebar, mobile support) |
| 98 | +- Prefetch on hover (channels, search results, mentions) |
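The `#channel` and `@user` search filters listed above could be separated from the free-text part of a query roughly as follows. This is a sketch under assumptions: the viewer itself is TypeScript, and the `ParsedQuery` type and `parse_query` function are hypothetical names, shown in Python only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    text: str = ""                                 # free text for the full-text index
    channels: list = field(default_factory=list)   # names from #channel tokens
    users: list = field(default_factory=list)      # names from @user tokens


def parse_query(raw: str) -> ParsedQuery:
    """Split a raw search string into free text plus channel and user filters."""
    parsed = ParsedQuery()
    words = []
    for token in raw.split():
        if len(token) > 1 and token[0] == "#":
            parsed.channels.append(token[1:])
        elif len(token) > 1 and token[0] == "@":
            parsed.users.append(token[1:])
        else:
            words.append(token)  # a bare "#" or "@" stays part of the text
    parsed.text = " ".join(words)
    return parsed
```

For example, `parse_query("deploy #general @alice failed")` yields the text `deploy failed` with one channel filter and one user filter.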
| 99 | + |
| 100 | +```bash |
| 101 | +cd app |
| 102 | +pnpm install |
| 103 | +pnpm dev # http://localhost:5272 (proxies /api to :5273) |
| 104 | +pnpm build # production build |
| 105 | +``` |
| 106 | + |
| 107 | +## Deployment |
| 108 | + |
| 109 | +### Cloudflare (Workers + D1 + Pages) |
| 110 | + |
| 111 | +The `api/` directory contains a Cloudflare Worker that serves the same API backed by D1. |
| 112 | + |
| 113 | +```bash |
| 114 | +cd api |
| 115 | +pnpm install |
| 116 | + |
| 117 | +# Create D1 database |
| 118 | +npx wrangler d1 create my-discord-archive |
| 119 | +# Update wrangler.toml with the database_id |
| 120 | + |
| 121 | +# Import data |
| 122 | +./d1-import.sh ../archive.db # local D1 |
| 123 | +./d1-import.sh --remote ../archive.db # remote D1 |
| 124 | + |
| 125 | +# Deploy worker |
| 126 | +npx wrangler deploy |
| 127 | + |
| 128 | +# Deploy viewer |
| 129 | +cd ../app |
| 130 | +VITE_API_BASE=https://your-worker.workers.dev pnpm build |
| 131 | +npx wrangler pages deploy dist --project-name my-discord-archive |
| 132 | +``` |
| 133 | + |
| 134 | +### Incremental updates |
| 135 | + |
| 136 | +```bash |
| 137 | +./archive.py # fetch new messages |
| 138 | +./build_db.py # rebuild SQLite |
| 139 | +cd api && ./d1-sync.py --remote # sync delta to D1 (zero downtime) |
| 140 | +``` |
| 141 | + |
| 142 | +### GitHub Actions |
| 143 | + |
| 144 | +Three workflows in `.github/workflows/`: |
| 145 | + |
| 146 | +| Workflow | Trigger | What it does | |
| 147 | +|---|---|---| |
| 148 | +| `deploy-app.yml` | Push to `app/`, manual | Build + deploy viewer to CF Pages | |
| 149 | +| `deploy-worker.yml` | Push to `api/`, manual | Deploy Worker to CF | |
| 150 | +| `update-archive.yml` | Manual (+ future cron) | Fetch new messages, rebuild DB, sync to D1 | |
| 151 | + |
| 152 | +- Required secrets: `CLOUDFLARE_TOKEN`, `DISCORD_TOKEN` |
| 153 | +- Required variables: `CLOUDFLARE_ACCOUNT_ID`, `VITE_API_BASE`, `AWS_ROLE_ARN` |
| 154 | + |
| 155 | +### DVX / Data versioning |
| 156 | + |
| 157 | +The `archive/` directory is tracked with [DVX] (a [DVC] fork). Each archive update creates a new snapshot; individual file blobs are deduplicated. |
| 158 | + |
| 159 | +```bash |
| 160 | +dvx add archive # track archive state |
| 161 | +dvx push # push to S3 remote |
| 162 | +dvx pull # restore archive from remote |
| 163 | +``` |
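The deduplication mentioned above follows the general idea of content-addressed storage: a file is stored under the hash of its bytes, so identical content is kept once no matter how many snapshots reference it. The sketch below shows that idea only. The hash function, cache layout, and the `store_blob` and `snapshot` helpers are assumptions, not DVX's actual on-disk format.

```python
import hashlib
import shutil
from pathlib import Path


def store_blob(src: Path, cache: Path) -> str:
    """Copy `src` into `cache` under its content hash and return the hash."""
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    dest = cache / digest[:2] / digest[2:]
    if not dest.exists():  # identical content is stored only once
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
    return digest


def snapshot(root: Path, cache: Path) -> dict:
    """Map each file's relative path to its content hash, storing new blobs."""
    return {
        p.relative_to(root).as_posix(): store_blob(p, cache)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Two snapshots taken before and after an archive update would then share every blob whose file did not change.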
| 164 | + |
| 165 | +## Discord bot setup |
| 166 | + |
| 167 | +1. Go to the [Discord Developer Portal] |
| 168 | +2. Create a new application, add a bot |
| 169 | +3. Enable **Message Content Intent** under Bot settings |
| 170 | +4. Generate a bot token → set as `DISCORD_TOKEN` |
| 171 | +5. Invite the bot to your server with the `Read Message History` and `Read Messages/View Channels` permissions |
| 172 | +6. Find your guild ID (enable Developer Mode under User Settings → Advanced, then right-click the server name → Copy Server ID) → set as `DISCORD_GUILD` |
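Once the steps above are done, a quick sanity check of the two environment variables can catch copy-paste mistakes before the first run. The sketch below is a hypothetical helper, not part of the repository. It relies on the documented snowflake format, in which the bits above the lowest 22 hold milliseconds since the Discord epoch (2015-01-01 UTC), to show when a server ID was created.

```python
import os
from datetime import datetime, timezone

DISCORD_EPOCH_MS = 1420070400000  # 2015-01-01T00:00:00Z in Unix milliseconds


def snowflake_created_at(snowflake: str) -> datetime:
    """Decode the creation time embedded in a Discord snowflake ID."""
    ms = (int(snowflake) >> 22) + DISCORD_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def check_env(env=os.environ) -> list:
    """Return a list of problems with the required environment variables."""
    problems = []
    if not env.get("DISCORD_TOKEN"):
        problems.append("DISCORD_TOKEN is not set")
    if not env.get("DISCORD_GUILD", "").isdigit():
        problems.append("DISCORD_GUILD should be a numeric server ID")
    return problems
```

If `check_env()` returns an empty list, `snowflake_created_at(os.environ["DISCORD_GUILD"])` should print a plausible creation date for the server.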
| 173 | + |
| 174 | +[use-kbd]: https://github.com/runsascoded/use-kbd |
| 175 | +[TanStack Virtual]: https://tanstack.com/virtual |
| 176 | +[DVX]: https://github.com/runsascoded/dvx |
| 177 | +[DVC]: https://dvc.org |
| 178 | +[Discord Developer Portal]: https://discord.com/developers/applications |