    # Search indexing runs locally in Docker, not on the VPS; each update script's
    # final step delegates to `deploy.sh --search-docker --source <name>`.
    ./scripts/update.sh # All sources incrementally
    ./scripts/update.sh --source ecfr # One source
    ./scripts/update.sh --skip-deploy # Local only

Note: identifiers use `/us/cfr/` (content type), not `/us/ecfr/` (data source). …

- **Docker search index checkpoints**: The incremental indexing script writes checkpoint files (`.search-indexed-at-{source}`) into the content directory. For Docker runs, these are persisted in `downloads/.search-checkpoints/` and restored into the temporary content directory on each run. If `downloads/.search-checkpoints/` is deleted, the next Docker index run scans all files from scratch.
- **Docker volume profiles**: `MEILI_PROFILE=dev|full` selects the volume (`meili-data-dev` or `meili-data-full`). Dev mode runs without a master key (`MEILI_ENV=development`). Full mode requires `MEILI_MASTER_KEY` so the data is compatible with the VPS.
- **Cloudflare "Managed robots.txt"**: When enabled, Cloudflare overwrites the site's `robots.txt` to block AI crawlers. For LexBuild, which serves public-domain legal content, this setting should be **OFF**. The custom `robots.txt` at `apps/astro/public/robots.txt` blocks AI crawlers from `/_astro/` (hashed static assets), `/nav/` (internal JSON), and `/api/`, while allowing legal content.
- **VPS PM2 logs live at `/home/ubuntu/pm2/logs/lexbuild/`**, not `~/.pm2/logs/`. The latter is legacy: only `pm2-logrotate-out.log` is still written there. Check the new path when debugging PM2-managed services.
- **VPS has 6 GiB swap** at `/swapfile` (persisted in `/etc/fstab`). It was added as a defense against Meilisearch running out of memory (OOM) during bulk upserts on the 7.6 GiB RAM Lightsail instance. Don't remove it.
- **Stuck Meilisearch tasks crash-loop across restarts**: document-addition tasks that OOM Meilisearch are persisted in LMDB and re-attempted after every PM2 restart (observed: a ~60 s crash cycle and 160+ restarts in 2.5 hours). Cancel them with `curl -XPOST -H "Authorization: Bearer $MEILI_MASTER_KEY" "http://127.0.0.1:7700/tasks/cancel?uids=<list>"`; the cancellation typically executes during a healthy window, even though the stuck task itself can't complete.
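
The checkpoint round-trip described under **Docker search index checkpoints** can be sketched in TypeScript, the language of the repo's scripts. Only the `.search-indexed-at-{source}` naming and the `downloads/.search-checkpoints/` location come from these notes; the helper names, the temp-dir prefix, and the ISO-timestamp file contents are assumptions, not the real script's behavior.

```typescript
import {
  copyFileSync,
  existsSync,
  mkdirSync,
  mkdtempSync,
  readFileSync,
  readdirSync,
  writeFileSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Checkpoint naming from the notes: `.search-indexed-at-{source}`.
const CHECKPOINT_PREFIX = ".search-indexed-at-";
type Source = "usc" | "ecfr" | "fr";

// ASSUMPTION: the checkpoint stores an ISO timestamp; the real format may differ.
function writeCheckpoint(contentDir: string, source: Source, at: Date): void {
  writeFileSync(join(contentDir, CHECKPOINT_PREFIX + source), at.toISOString());
}

function readCheckpoint(contentDir: string, source: Source): Date | null {
  const file = join(contentDir, CHECKPOINT_PREFIX + source);
  return existsSync(file) ? new Date(readFileSync(file, "utf8").trim()) : null;
}

// Copies every checkpoint file from `fromDir` into `toDir`; returns the names copied.
function copyCheckpoints(fromDir: string, toDir: string): string[] {
  if (!existsSync(fromDir)) return []; // nothing persisted: the next run rescans everything
  mkdirSync(toDir, { recursive: true });
  const copied = readdirSync(fromDir).filter((name) => name.startsWith(CHECKPOINT_PREFIX));
  for (const name of copied) copyFileSync(join(fromDir, name), join(toDir, name));
  return copied.sort();
}

// Hypothetical Docker-run helpers.
function makeTempContentDir(): string {
  return mkdtempSync(join(tmpdir(), "lexbuild-content-"));
}
const restoreCheckpoints = (persistDir: string, tempContentDir: string) =>
  copyCheckpoints(persistDir, tempContentDir);
const saveCheckpoints = (tempContentDir: string, persistDir: string) =>
  copyCheckpoints(tempContentDir, persistDir);
```

A Docker run would call `restoreCheckpoints("downloads/.search-checkpoints", tmp)` before indexing and `saveCheckpoints(tmp, "downloads/.search-checkpoints")` afterwards; if the persisted directory is missing, the restore copies nothing and every file counts as changed.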
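
The `MEILI_PROFILE` rules under **Docker volume profiles** amount to a small lookup. This hypothetical helper encodes only what the note states (the two volume names, `MEILI_ENV=development` for dev, a required `MEILI_MASTER_KEY` for full); the fallback to `dev` when the variable is unset is an assumption, and the real selection may live in a compose file or shell script instead.

```typescript
// Hypothetical helper; volume names and variables come from the notes above.
type MeiliProfile = "dev" | "full";

interface MeiliDockerConfig {
  profile: MeiliProfile;
  volume: string; // Docker volume holding the Meilisearch data
  env: Record<string, string>; // variables passed to the container
}

function resolveMeiliProfile(env: Record<string, string | undefined>): MeiliDockerConfig {
  // ASSUMPTION: an unset MEILI_PROFILE falls back to "dev".
  const profile = env.MEILI_PROFILE ?? "dev";
  if (profile === "dev") {
    // Dev mode runs without a master key.
    return { profile, volume: "meili-data-dev", env: { MEILI_ENV: "development" } };
  }
  if (profile === "full") {
    // Full mode must use the master key so the data matches the VPS setup.
    const key = env.MEILI_MASTER_KEY;
    if (!key) throw new Error("MEILI_PROFILE=full requires MEILI_MASTER_KEY");
    return { profile, volume: "meili-data-full", env: { MEILI_MASTER_KEY: key } };
  }
  throw new Error(`MEILI_PROFILE must be "dev" or "full", got "${profile}"`);
}
```

Calling `resolveMeiliProfile(process.env)` fails fast on a `full` run with no key, instead of building a volume that isn't VPS-compatible.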
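
The same cancellation can be scripted from TypeScript with Node 18+'s global `fetch`. This is a sketch: the helper names are hypothetical, and the listing filters (`statuses=enqueued,processing`, `types=documentAdditionOrUpdate`) assume a Meilisearch v1.x `/tasks` API.

```typescript
// Minimal request shape accepted by the global fetch (Node 18+).
interface CancelRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string> };
}

// Builds the same request as the curl command: POST /tasks/cancel?uids=<list>.
function buildCancelRequest(baseUrl: string, masterKey: string, uids: number[]): CancelRequest {
  if (uids.length === 0) throw new Error("refusing to send an empty uids list");
  const url = new URL("/tasks/cancel", baseUrl);
  url.searchParams.set("uids", uids.join(","));
  return {
    url: url.toString(),
    init: { method: "POST", headers: { Authorization: `Bearer ${masterKey}` } },
  };
}

// Sends the cancellation; Meilisearch answers with a summarized taskCancelation task.
async function cancelTasks(baseUrl: string, masterKey: string, uids: number[]): Promise<unknown> {
  const { url, init } = buildCancelRequest(baseUrl, masterKey, uids);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`task cancelation failed: HTTP ${res.status}`);
  return res.json();
}

// Lists UIDs of enqueued/processing document-addition tasks (first page only;
// the /tasks endpoint paginates).
async function pendingAdditionUids(baseUrl: string, masterKey: string): Promise<number[]> {
  const url = new URL("/tasks", baseUrl);
  url.searchParams.set("statuses", "enqueued,processing");
  url.searchParams.set("types", "documentAdditionOrUpdate");
  const res = await fetch(url, { headers: { Authorization: `Bearer ${masterKey}` } });
  if (!res.ok) throw new Error(`task listing failed: HTTP ${res.status}`);
  const body = (await res.json()) as { results: { uid: number }[] };
  return body.results.map((task) => task.uid);
}
```

Pairing the two (`cancelTasks(base, key, await pendingAdditionUids(base, key))`) mirrors the manual step; as with curl, it only helps if the request lands during a window when Meilisearch is up.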

apps/astro/CLAUDE.md

    npx tsx scripts/index-search-incremental.ts --set-checkpoint # Set checkpoint w…

Script notes:
- **generate-highlights.ts**: Forks a child process for each chunk of 2k files (the default; tunable via `--chunk-size N`) to keep Shiki from running out of memory. Each child's heap is capped at 2 GB (`--max-old-space-size`). Uses `matter(raw, { cache: false })` to prevent gray-matter from caching every file in memory. Supports `--limit N` for testing. Changing themes requires updating both this script and `src/lib/shiki.ts`, then deleting the existing `.highlighted.html` files.
- **index-search.ts** and **index-search-incremental.ts**: Must stay in sync: the sources indexed, the `SearchDocument` shape, and the `configureIndex` settings have to match. Both index USC, eCFR, and FR. A full reindex deletes and rebuilds the index; an incremental run upserts only changed files (mtime-based per-source checkpoints in `.search-indexed-at-{usc,ecfr,fr}`). Checkpoints are always written after indexing, even with `--source`; each source is tracked independently. Defaults to 500 docs per batch (override with `--batch-size N` or the `MEILI_BATCH_SIZE` env var) and a 300 s `waitForTask` timeout. `--verbose-batches` logs the first and last document ID of each flushed batch; pair it with `--batch-size 1` to bisect poison docs. Document IDs are sanitized (dots and colons → underscores).
- **generate-nav.ts**: Includes reserved title placeholders (USC 53, eCFR 35). Chapter grouping for eCFR is derived from filesystem directories, not `_meta.json`.
- **All pipeline scripts support `--source usc|ecfr|fr`**: `generate-nav.ts`, `generate-sitemap.ts`, `generate-highlights.ts`, `index-search.ts` (full), and `index-search-incremental.ts` all accept `--source` to process a single source. With `--source`, the sitemap script doesn't rewrite the sitemap index (run it without `--source` to rebuild the full index). The highlights script's `--source` filters files by content path prefix.
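
The chunked forking described for **generate-highlights.ts** can be sketched as follows, assuming a hypothetical `workerScript` that receives its file list over IPC. Running chunks sequentially and sending paths with `child.send` are choices made for this sketch; the real script may split or dispatch work differently.

```typescript
import { fork } from "node:child_process";

// Splits `items` into consecutive chunks of at most `size` (2k files by default).
function chunk<T>(items: readonly T[], size = 2000): T[][] {
  if (!Number.isInteger(size) || size < 1) throw new Error(`invalid chunk size: ${size}`);
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) chunks.push(items.slice(i, i + size));
  return chunks;
}

// Runs one child process per chunk, one at a time, each with a 2 GB heap cap.
// HYPOTHETICAL: `workerScript` highlights the files it receives over IPC and
// exits with code 0 on success.
async function highlightInChunks(
  workerScript: string,
  files: readonly string[],
  chunkSize = 2000,
): Promise<void> {
  for (const part of chunk(files, chunkSize)) {
    await new Promise<void>((resolve, reject) => {
      const child = fork(workerScript, [], {
        // Keep the parent's flags (e.g. a TypeScript loader) and add the heap cap.
        execArgv: [...process.execArgv, "--max-old-space-size=2048"],
      });
      child.once("error", reject);
      child.once("exit", (code) =>
        code === 0 ? resolve() : reject(new Error(`highlight worker exited with code ${code}`)),
      );
      child.send(part);
    });
  }
}
```

Spreading `process.execArgv` matters because `fork`'s `execArgv` option replaces the parent's Node flags rather than adding to them; a fresh process per chunk means each chunk's memory is returned to the OS when its child exits.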
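
The batching knobs described for **index-search.ts** and **index-search-incremental.ts** fit in a few helpers. This is a sketch: the flag-over-env precedence in `resolveBatchSize` is an assumption (the note doesn't say which wins), and `batches` stands in for however the real scripts group documents before each upload.

```typescript
const DEFAULT_BATCH_SIZE = 500;

// ASSUMPTION: `--batch-size N` takes precedence over MEILI_BATCH_SIZE.
function resolveBatchSize(argv: readonly string[], env: Record<string, string | undefined>): number {
  const i = argv.indexOf("--batch-size");
  const raw = i >= 0 ? (argv[i + 1] ?? "") : env.MEILI_BATCH_SIZE;
  if (raw === undefined) return DEFAULT_BATCH_SIZE;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1) throw new Error(`invalid batch size: "${raw}"`);
  return n;
}

// Dots and colons become underscores (Meilisearch document IDs may contain only
// letters, digits, hyphens, and underscores).
function sanitizeId(id: string): string {
  return id.replace(/[.:]/g, "_");
}

// Yields consecutive batches; with `verbose`, logs the first and last ID of each
// batch before it is flushed (the --verbose-batches behavior).
function* batches<T extends { id: string }>(
  docs: readonly T[],
  size: number,
  verbose = false,
  log: (msg: string) => void = console.log,
): Generator<T[]> {
  if (!Number.isInteger(size) || size < 1) throw new Error(`invalid batch size: ${size}`);
  for (let i = 0; i < docs.length; i += size) {
    const batch = docs.slice(i, i + size);
    if (verbose) log(`batch ${i / size}: ${batch[0].id} .. ${batch[batch.length - 1].id}`);
    yield batch;
  }
}
```

Each yielded batch would then go to the Meilisearch client's document-upload call, followed by a wait on the returned task. With a batch size of 1, the last log line before a failure names the single offending document.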
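
The two behaviors noted for **generate-nav.ts** (reserved-title placeholders and directory-based eCFR chapter grouping) might look like this. The entry shape, the `[Reserved]` label, and the `.../chapter-X/part-N.md` path layout are hypothetical; only the reserved numbers (USC 53, eCFR 35) come from the note.

```typescript
import { posix } from "node:path";

type NavSource = "usc" | "ecfr";

// Hypothetical nav entry shape.
interface TitleNavEntry {
  number: number;
  name: string;
  reserved: boolean;
}

// Reserved title numbers from the notes: USC 53, eCFR 35.
const RESERVED_TITLES: Record<NavSource, number[]> = { usc: [53], ecfr: [35] };

// Adds a placeholder for each reserved title that has no entry, then sorts by number.
function withReservedPlaceholders(
  source: NavSource,
  titles: { number: number; name: string }[],
): TitleNavEntry[] {
  const entries: TitleNavEntry[] = titles.map((t) => ({ ...t, reserved: false }));
  for (const n of RESERVED_TITLES[source]) {
    if (!entries.some((e) => e.number === n)) {
      entries.push({ number: n, name: `Title ${n} [Reserved]`, reserved: true });
    }
  }
  return entries.sort((a, b) => a.number - b.number);
}

// Groups eCFR part files by their enclosing directory name (the chapter),
// instead of reading chapter data from `_meta.json`.
function groupByChapterDir(partFiles: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const file of partFiles) {
    const chapter = posix.basename(posix.dirname(file));
    const list = groups.get(chapter) ?? [];
    list.push(file);
    groups.set(chapter, list);
  }
  return groups;
}
```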
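
A sketch of the shared `--source usc|ecfr|fr` handling described in the last note. The helper names and the `content/<source>/` layout used for prefix filtering are assumptions; the sitemap rule encodes only the stated behavior that passing `--source` skips rewriting the sitemap index.

```typescript
type Source = "usc" | "ecfr" | "fr";
const SOURCES: readonly Source[] = ["usc", "ecfr", "fr"];

// `--source <name>` selects one source; without the flag every source is processed.
function parseSources(argv: readonly string[]): Source[] {
  const i = argv.indexOf("--source");
  if (i < 0) return [...SOURCES];
  const value = argv[i + 1];
  if (!SOURCES.some((s) => s === value)) {
    throw new Error(`--source must be one of ${SOURCES.join("|")}, got "${value}"`);
  }
  return [value as Source];
}

// Keeps only paths under the selected sources' directories.
// ASSUMPTION: content is laid out as `<contentRoot>/<source>/...`.
function filterBySource(
  paths: readonly string[],
  sources: readonly Source[],
  contentRoot = "content",
): string[] {
  const prefixes = sources.map((s) => `${contentRoot}/${s}/`);
  return paths.filter((p) => prefixes.some((prefix) => p.startsWith(prefix)));
}

// The sitemap index is only rebuilt when the script runs without `--source`.
function shouldRewriteSitemapIndex(argv: readonly string[]): boolean {
  return !argv.includes("--source");
}
```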