Commit 1cdd6fb

ryan-williams and claude committed
Generalize for reuse: remove hardcoded guild ID, add README

- `DEFAULT_GUILD` now reads from `DISCORD_GUILD` env var (required if not passed via `--guild` flag)
- Generic User-Agent string
- Comprehensive README: quick start, architecture, scripts, deployment, Discord bot setup

The repo is reusable for any Discord server — all Marin-specific config is in env vars, `wrangler.toml`, and `.dvc/config`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent 4edccca commit 1cdd6fb

3 files changed: +182 -4 lines changed


README.md

Lines changed: 178 additions & 0 deletions
@@ -0,0 +1,178 @@
# discord-archive

Archive, search, and browse Discord server messages with a fast web viewer.

- **Full archive**: channels, threads, attachments, reactions, embeds, users
- **Incremental updates**: only fetches new messages on re-runs
- **Full-text search**: FTS5-powered search with `#channel` and `@user` filters
- **Keyboard-driven**: [use-kbd] omnibar (Cmd+K), shortcuts, arrow-key navigation
- **Deployable**: Cloudflare Workers + D1 + Pages, with GitHub Actions CI/CD
- **Versioned data**: [DVX]-tracked archive with S3 remote cache

## Quick start

```bash
# 1. Set your Discord bot token and guild ID
export DISCORD_TOKEN="your-bot-token"
export DISCORD_GUILD="your-guild-id"

# 2. Archive all messages
./archive.py

# 3. Build the SQLite database
./build_db.py

# 4. Start the local API server
./server.py &

# 5. Start the viewer
cd app && pnpm install && pnpm dev
# Open http://localhost:5272
```

## Architecture

```
discord-archive/
  archive.py          # Discord API → JSON (incremental, per-channel files)
  build_db.py         # JSON → SQLite (normalized, FTS5 search index)
  build_index.py      # JSON → index.json (for static viewer)
  server.py           # Local dev API server (Starlette + SQLite)
  archive/            # DVX-tracked raw JSON archive + attachments
  archive.db          # SQLite database (derived from archive/)
  app/                # Vite + React viewer
  api/                # Cloudflare Worker (D1-backed API)
    d1-import.sh      # Full SQLite → D1 import
    d1-sync.py        # Incremental D1 sync (zero downtime)
  .github/workflows/  # CI/CD for app, worker, and archive updates
```

## Scripts

### `archive.py`

Archives all messages from a Discord guild to per-channel JSON files.

```bash
./archive.py                         # archive all channels + threads
./archive.py --no-threads            # skip thread messages
./archive.py --no-attachments        # skip downloading attachments
./archive.py --backfill-attachments  # re-fetch expired CDN URLs and download
./archive.py -g 123456789            # specify guild ID
./archive.py -o my-archive           # custom output directory
```

Requires `DISCORD_TOKEN` env var (bot token with Message Content intent).

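The incremental behavior described above can be pictured roughly as follows. This is a minimal sketch, not the code in `archive.py`: it assumes each per-channel file holds a JSON list of message objects with string `id` fields, and it relies on the `after` and `limit` query parameters of Discord's `GET /channels/{channel.id}/messages` endpoint. The helper names `last_message_id` and `next_page_params` are hypothetical.

```python
import json
from pathlib import Path


def last_message_id(channel_file):
    """Return the newest archived message ID as a string, or None if nothing is archived."""
    path = Path(channel_file)
    if not path.exists():
        return None
    messages = json.loads(path.read_text())
    if not messages:
        return None
    # Snowflake IDs are numeric strings that grow over time, so compare them as integers.
    return str(max(int(m["id"]) for m in messages))


def next_page_params(channel_file, limit=100):
    """Build query params that request only messages newer than the archive (assumed layout)."""
    params = {"limit": limit}
    newest = last_message_id(channel_file)
    if newest is not None:
        params["after"] = newest
    return params
```

On a first run there is no file, so the params carry no `after` and the whole channel is fetched; on later runs only messages newer than the stored maximum are requested.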
### `build_db.py`

Builds a normalized SQLite database from the JSON archive.

```bash
./build_db.py                         # default: archive/ → archive.db
./build_db.py -i my-archive -o my.db  # custom paths
```

Creates tables: `channels`, `messages`, `users`, `attachments`, `reactions`, `embeds`, `threads`, plus a `messages_fts` FTS5 index.

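To illustrate how a `messages` table and a `messages_fts` FTS5 index can work together, here is a small in-memory sketch using Python's built-in `sqlite3` module. Only the two table names come from the list above; the columns (`id`, `channel_id`, `content`), the sample rows, and the `search` helper are assumptions for illustration, and the real schema produced by `build_db.py` may differ. It also assumes the local SQLite build has FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, channel_id INTEGER, content TEXT);
    CREATE VIRTUAL TABLE messages_fts USING fts5(content);
""")

rows = [
    (1, 10, "deploy the worker to Cloudflare"),
    (2, 10, "rebuild the SQLite database"),
]
conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
# Index each message under its own rowid so search hits can be joined back.
conn.execute("INSERT INTO messages_fts (rowid, content) SELECT id, content FROM messages")


def search(query):
    """Return (id, content) pairs whose content matches the FTS5 query string."""
    sql = """
        SELECT m.id, m.content
        FROM messages_fts
        JOIN messages AS m ON m.id = messages_fts.rowid
        WHERE messages_fts MATCH ?
        ORDER BY m.id
    """
    return conn.execute(sql, (query,)).fetchall()
```

Keeping the index in a separate virtual table means the normalized tables stay plain SQLite, while text queries go through `MATCH`.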
### `server.py`

Local development API server.

```bash
./server.py  # serves archive.db on :5273
```

Endpoints: `/api/channels`, `/api/channels/:id/messages`, `/api/messages/:id`, `/api/search`, `/api/users`. Also serves downloaded attachments from `/attachments/`.

## Viewer (`app/`)

React + TypeScript + Vite application with:

- Virtual scrolling ([TanStack Virtual]) for large channels
- Full-text search with `#channel` and `@user` autocomplete
- Permalink URLs (`#channelId/messageId`)
- Message grouping, reactions with tooltips, embed rendering
- Keyboard navigation via [use-kbd] (Cmd+K omnibar, `/` search, `?` shortcuts)
- Responsive layout (collapsible sidebar, mobile support)
- Prefetch on hover (channels, search results, mentions)

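The `#channelId/messageId` permalink shape listed above is simple enough to split with a few lines of code. The viewer itself is TypeScript; the sketch below uses Python only to show the idea, and `parse_permalink` is a hypothetical helper, not a function from the app. It assumes the message part is optional, so a bare `#channelId` points at a channel.

```python
def parse_permalink(fragment):
    """Split '#channelId/messageId' into (channel_id, message_id).

    Either part is None when absent (assumed behavior, not taken from the app).
    """
    body = fragment.lstrip("#")
    if not body:
        return None, None
    channel_id, _, message_id = body.partition("/")
    return channel_id, (message_id or None)
```

IDs are kept as strings because Discord snowflakes exceed the range that JavaScript numbers represent exactly.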
```bash
cd app
pnpm install
pnpm dev    # http://localhost:5272 (proxies /api to :5273)
pnpm build  # production build
```

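The `#channel` and `@user` search filters mentioned in the feature list could be separated from the free-text part of a query along these lines. This is a sketch under assumptions: the README does not specify the filter grammar, so the rule used here (a filter is a `#` or `@` token at the start of a whitespace-delimited word) and the `parse_query` helper are hypothetical.

```python
import re

# A filter token starts a word with '#' or '@'; mid-word '@' (e.g. in emails) is left alone.
FILTER_RE = re.compile(r"(?<!\S)([#@])(\S+)")


def parse_query(raw):
    """Split a raw search string into channel filters, user filters, and free text."""
    channels, users = [], []

    def take(match):
        target = channels if match.group(1) == "#" else users
        target.append(match.group(2))
        return ""  # drop the filter token from the free-text part

    text = FILTER_RE.sub(take, raw)
    return {"channels": channels, "users": users, "text": " ".join(text.split())}
```

For example, `parse_query("#general @alice deploy failed")` yields one channel filter, one user filter, and the text `deploy failed`, which can then be passed to the full-text index.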
## Deployment

### Cloudflare (Workers + D1 + Pages)

The `api/` directory contains a Cloudflare Worker that serves the same API backed by D1.

```bash
cd api
pnpm install

# Create D1 database
npx wrangler d1 create my-discord-archive
# Update wrangler.toml with the database_id

# Import data
./d1-import.sh ../archive.db           # local D1
./d1-import.sh --remote ../archive.db  # remote D1

# Deploy worker
npx wrangler deploy

# Deploy viewer
cd ../app
VITE_API_BASE=https://your-worker.workers.dev pnpm build
npx wrangler pages deploy dist --project-name my-discord-archive
```

### Incremental updates

```bash
./archive.py                     # fetch new messages
./build_db.py                    # rebuild SQLite
cd api && ./d1-sync.py --remote  # sync delta to D1 (zero downtime)
```

### GitHub Actions

Three workflows in `.github/workflows/`:

| Workflow | Trigger | What it does |
|---|---|---|
| `deploy-app.yml` | Push to `app/`, manual | Build + deploy viewer to CF Pages |
| `deploy-worker.yml` | Push to `api/`, manual | Deploy Worker to CF |
| `update-archive.yml` | Manual (+ future cron) | Fetch new messages, rebuild DB, sync to D1 |

Required secrets: `CLOUDFLARE_TOKEN`, `DISCORD_TOKEN`
Required variables: `CLOUDFLARE_ACCOUNT_ID`, `VITE_API_BASE`, `AWS_ROLE_ARN`

### DVX / Data versioning

The `archive/` directory is tracked with [DVX] (a [DVC] fork). Each archive update creates a new snapshot; individual file blobs are deduplicated.

```bash
dvx add archive  # track archive state
dvx push         # push to S3 remote
dvx pull         # restore archive from remote
```

## Discord bot setup

1. Go to the [Discord Developer Portal]
2. Create a new application, add a bot
3. Enable **Message Content Intent** under Bot settings
4. Generate a bot token → set as `DISCORD_TOKEN`
5. Invite the bot to your server with `Read Message History` + `Read Messages` permissions
6. Find your guild ID (right-click server name → Copy Server ID) → set as `DISCORD_GUILD`

[use-kbd]: https://github.com/runsascoded/use-kbd
[TanStack Virtual]: https://tanstack.com/virtual
[DVX]: https://github.com/runsascoded/dvx
[DVC]: https://dvc.org
[Discord Developer Portal]: https://discord.com/developers/applications

api/src/index.ts

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 /**
- * Marin Discord Archive API — Cloudflare Worker with D1.
+ * Discord Archive API — Cloudflare Worker with D1.
  *
  * Endpoints:
  *   GET /api/channels

archive.py

Lines changed: 3 additions & 3 deletions
@@ -20,7 +20,7 @@
 err = lambda *a, **kw: print(*a, file=sys.stderr, **kw)

 BASE = "https://discord.com/api/v10"
-DEFAULT_GUILD = "1354881461060243556"
+DEFAULT_GUILD = os.environ.get("DISCORD_GUILD", "")
 # Text-like channel types: text(0), announcement(5), public thread(11), private thread(12), announcement thread(10)
 TEXT_CHANNEL_TYPES = {0, 5, 10, 11, 12}
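
For context, a set like `TEXT_CHANNEL_TYPES` is typically used to keep only message-bearing channels from a guild's channel list. The sketch below assumes channel objects are dicts with an integer `type` field, as in Discord's channel object; the `text_like` helper and the sample data are hypothetical and do not appear in `archive.py`.

```python
# Same values as in the hunk above: text(0), announcement(5),
# announcement thread(10), public thread(11), private thread(12).
TEXT_CHANNEL_TYPES = {0, 5, 10, 11, 12}


def text_like(channels):
    """Keep only channels whose type can contain messages worth archiving."""
    return [c for c in channels if c.get("type") in TEXT_CHANNEL_TYPES]


sample = [
    {"id": "1", "name": "general", "type": 0},
    {"id": "2", "name": "Voice", "type": 2},       # voice channel, skipped
    {"id": "3", "name": "Category", "type": 4},    # category, skipped
    {"id": "4", "name": "news", "type": 5},
]
kept = [c["name"] for c in text_like(sample)]
```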

@@ -285,7 +285,7 @@ async def run(guild_id, out_dir, download_att, fetch_threads, backfill_att=False
     token = os.environ["DISCORD_TOKEN"]
     headers = {
         "Authorization": f"Bot {token}",
-        "User-Agent": "MarinBot (https://github.com/Open-Athena/marin-bot, 0.1)",
+        "User-Agent": "discord-archive (https://github.com/Open-Athena/marin-bot, 0.1)",
     }

     out_dir = Path(out_dir)
@@ -355,7 +355,7 @@ async def run(guild_id, out_dir, download_att, fetch_threads, backfill_att=False
 @command()
 @option('-A', '--no-attachments', is_flag=True, help='Skip downloading attachments')
 @option('-b', '--backfill-attachments', is_flag=True, help='Download all missing attachments from existing archive')
-@option('-g', '--guild', default=DEFAULT_GUILD, help='Guild (server) ID')
+@option('-g', '--guild', default=DEFAULT_GUILD, required=not DEFAULT_GUILD, help='Guild (server) ID, or set DISCORD_GUILD env var')
 @option('-o', '--out-dir', default='archive', help='Output directory for JSON files')
 @option('-T', '--no-threads', is_flag=True, help='Skip fetching thread messages')
 def main(guild, no_attachments, backfill_attachments, no_threads, out_dir):
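
The `required=not DEFAULT_GUILD` pattern in this hunk makes `--guild` mandatory only when the `DISCORD_GUILD` env var is unset or empty. The sketch below reproduces that behavior with the standard library's `argparse` instead of the decorator-based CLI used in `archive.py`, so it runs without extra dependencies. `parse_guild` and its `env` parameter are hypothetical names introduced for illustration.

```python
import argparse
import os


def parse_guild(argv, env=None):
    """Resolve the guild ID from --guild, falling back to DISCORD_GUILD."""
    env = os.environ if env is None else env
    default_guild = env.get("DISCORD_GUILD", "")
    parser = argparse.ArgumentParser(prog="archive.py")
    parser.add_argument(
        "-g", "--guild",
        default=default_guild,
        # Only demand the flag when the env var gives us nothing to fall back on.
        required=not default_guild,
        help="Guild (server) ID, or set DISCORD_GUILD env var",
    )
    return parser.parse_args(argv).guild
```

An explicit flag wins over the env var, and when neither is present the parser exits with a usage error rather than proceeding with an empty guild ID.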
