Commit 9c952ad

fix(astro): add batch-size override and ECONNRESET retry to incremental search indexer
Meilisearch can silently restart mid-task under memory pressure (observed
~60s crash cycles during FR bulk upserts on the 7.6 GiB VPS), causing
ECONNRESET on either the addDocuments POST or the waitForTask polling that
follows. The previous indexer died outright on the first failure even though
the submitted task was already persisted in LMDB and would typically resume
on server recovery.

Changes (apps/astro/scripts/index-search-incremental.ts):

- New flushWithRetry() in BatchIndexer waits for /health to return
  "available" (up to 180s) and retries the wait on the original taskUid
  rather than resubmitting the batch. Up to 5 attempts per flush.
- New --batch-size <n> CLI flag and MEILI_BATCH_SIZE env var override the
  default of 500 docs/batch. Smaller batches reduce per-flush Meilisearch
  memory and let crash recovery happen between batches instead of inside
  one. (A sketch of the override resolution follows this message.)
- New --verbose-batches flag prints the first/last doc ID of every flushed
  batch, with stdout force-flushed so the last logged ID is durable through
  a crash. Combined with --batch-size 1 this isolates poison documents.
  (See the logging sketch after the diff.)

apps/astro/CLAUDE.md already documents these flags; this commit brings the
code in line with the documentation.

The full-reindex sibling (index-search.ts) has the same OOM-vulnerable
pattern and should get the same treatment in a follow-up — scoped out of
this PR because only the incremental script was field-validated on the VPS.
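The flag and env-var handling itself is outside the hunks below. As a rough
sketch of the override order described above (assuming the CLI flag wins over
MEILI_BATCH_SIZE; resolveBatchSize is a hypothetical helper name, not code
from this commit):

import process from "node:process";

// Hypothetical sketch: --batch-size <n> beats MEILI_BATCH_SIZE, which beats
// the 500-doc default. Missing, non-numeric, or non-positive input falls
// through to the default.
function resolveBatchSize(argv: string[], env: NodeJS.ProcessEnv): number {
  const flagIdx = argv.indexOf("--batch-size");
  const raw = flagIdx !== -1 ? argv[flagIdx + 1] : env.MEILI_BATCH_SIZE;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : 500;
}

const batchSize = resolveBatchSize(process.argv.slice(2), process.env);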
1 parent 7b0daec · commit 9c952ad

1 file changed

apps/astro/scripts/index-search-incremental.ts

Lines changed: 59 additions & 3 deletions
@@ -217,9 +217,7 @@ class BatchIndexer {
       process.stdout.write(`  → flushing ${toSend.length} docs: ${label}\n`);
     }
 
-    const index = this.client.index(this.indexName);
-    const task = await index.addDocuments(toSend);
-    await this.client.tasks.waitForTask(task.taskUid, { timeout: 300_000 });
+    await this.flushWithRetry(toSend);
 
     this.totalSent += toSend.length;
     this.batchesSent++;
@@ -231,6 +229,64 @@ class BatchIndexer {
     }
   }
 
+  // Meilisearch can silently restart mid-task under memory pressure, causing
+  // ECONNRESET on either the addDocuments POST or waitForTask polling. Submitted
+  // tasks are persisted in LMDB and typically resume on server recovery, so we
+  // wait for /health to return "available" and retry — rather than giving up
+  // after a short backoff that can easily expire inside one crash cycle
+  // (observed ~60s between crashes). waitForTask reuses the original taskUid
+  // so we wait for the already-enqueued task rather than resubmitting.
+  private async flushWithRetry(toSend: SearchDocument[]): Promise<void> {
+    const maxAttempts = 5;
+    const healthWaitMs = 180_000;
+    const index = this.client.index(this.indexName);
+    let taskUid: number | null = null;
+
+    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
+      try {
+        if (taskUid === null) {
+          const task = await index.addDocuments(toSend);
+          taskUid = task.taskUid;
+        }
+        await this.client.tasks.waitForTask(taskUid, { timeout: 300_000 });
+        return;
+      } catch (err) {
+        if (attempt === maxAttempts) throw err;
+        const message = err instanceof Error ? err.message : String(err);
+        const firstId = toSend[0]?.id ?? "";
+        const context = taskUid !== null ? `waitForTask(${taskUid})` : "addDocuments";
+        process.stdout.write(
+          `  ⟳ attempt ${attempt}/${maxAttempts - 1}: ${context} failed (${message}) for batch starting ${firstId}\n`,
+        );
+        const recovered = await this.waitForMeiliHealth(healthWaitMs);
+        if (!recovered) {
+          process.stdout.write(`  ⟳ Meilisearch did not recover within ${healthWaitMs / 1000}s — giving up this batch\n`);
+          throw err;
+        }
+        // Small grace period after recovery lets Meilisearch finish its startup.
+        await new Promise((resolve) => setTimeout(resolve, 3000));
+      }
+    }
+  }
+
+  private async waitForMeiliHealth(maxWaitMs: number): Promise<boolean> {
+    const deadline = Date.now() + maxWaitMs;
+    const pollMs = 5000;
+    while (Date.now() < deadline) {
+      try {
+        const health = await this.client.health();
+        if (health.status === "available") {
+          process.stdout.write(`  ⟳ Meilisearch healthy — resuming\n`);
+          return true;
+        }
+      } catch {
+        // Connection refused / reset — Meilisearch still down or restarting
+      }
+      await new Promise((resolve) => setTimeout(resolve, pollMs));
+    }
+    return false;
+  }
+
   get total(): number {
     return this.totalSent;
   }
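The diff shows the retry path but not the force-flushed verbose-batch logging
the commit message mentions. A minimal sketch of one way to make the last
logged ID durable, assuming Node's synchronous fs.writeSync on file
descriptor 1 (logBatch and the message format are illustrative, not this
script's code):

import { writeSync } from "node:fs";

// Hypothetical sketch: process.stdout.write can buffer when stdout is a
// pipe, so an abrupt exit can drop the tail of the log. writeSync blocks
// until the bytes have left the process, so the last batch boundary survives
// a crash.
function logBatch(firstId: string, lastId: string): void {
  writeSync(1, `  → batch ${firstId} .. ${lastId}\n`); // fd 1 = stdout
}

Paired with --batch-size 1, each flushed batch is a single document, so the
last line printed before a crash names the poison document directly.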
