feat(#23): kb ingest accepts multiple ids and prints a batch summary (#27)
* feat(#23): kb ingest accepts multiple ids and prints a batch summary
paper7 kb ingest used to take exactly one identifier. Running 10-15
ingests in a research session meant launching the command once per
paper, then diffing paper7 kb list against the expected id list to find
the failures, then retrying each one by hand.
The argument is now variadic via Argument.variadic({ min: 1 }). Single-id
behaviour is preserved exactly — the paper's markdown still streams to
stdout, so existing pipes keep working. With two or more ids, a new
runKbIngestBatch path takes over: it ingests serially (arxiv enforces a
~3s rate limit and S2 caps at ~1 req/s on the unauth tier; concurrency
buys 429s rather than throughput), and prints one summary block:
    Ingested: N/M papers to <sources-dir>
    Failed:
      <id> — <reason>
Parse failures, network errors, and cache errors all land in the
Failed: list with a per-id reason. The batch exits 0 as long as at
least one paper landed; if every id failed the new KbIngestBatchFailed
error fires and the process exits 1 with 'error: all kb ingests failed'
on stderr while the summary still goes to stdout.
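The summary and exit rule above can be sketched in plain TypeScript. This is a minimal illustration, not the code in src/kb.ts or src/commands/kb.ts: the field names on BatchAttempt, the `renderSummary` and `exitCode` helpers, and the choice to omit the Failed: header when nothing failed are all assumptions made for the example.

```typescript
// Hypothetical shapes: the commit names BatchAttempt and KbIngestBatchResult
// but does not show their fields.
type BatchAttempt =
  | { readonly id: string; readonly ok: true }
  | { readonly id: string; readonly ok: false; readonly reason: string };

interface KbIngestBatchResult {
  readonly attempts: ReadonlyArray<BatchAttempt>;
  readonly sourcesDir: string;
}

// Build the summary block that would go to stdout.
// The "<id> — <reason>" separator mirrors the format quoted in the commit.
function renderSummary(result: KbIngestBatchResult): string {
  const failed = result.attempts.filter(
    (a): a is Extract<BatchAttempt, { ok: false }> => !a.ok,
  );
  const ingested = result.attempts.length - failed.length;
  const lines = [
    `Ingested: ${ingested}/${result.attempts.length} papers to ${result.sourcesDir}`,
  ];
  // Assumption: the Failed: section is skipped when there are no failures.
  if (failed.length > 0) {
    lines.push("Failed:");
    for (const f of failed) lines.push(`  ${f.id} — ${f.reason}`);
  }
  return lines.join("\n");
}

// Exit 0 if at least one paper landed, 1 if every id failed.
function exitCode(result: KbIngestBatchResult): 0 | 1 {
  return result.attempts.some((a) => a.ok) ? 0 : 1;
}
```

A caller would print `renderSummary(result)` to stdout first and only then act on `exitCode(result)`, so the summary is still visible when the whole batch fails.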
The renderer is intentionally terse — soft fallbacks from PR3
(ar5iv → abstract-only) print their own warnings via Effect.logWarning
during ingest and count toward Ingested:, so the summary just reports
the final tally.
Closes #23.
* feat(#23): preserve effect boundaries in kb ingest batch
Pull rendering and the final fail-decision out of src/kb.ts so the
domain module returns data and the CLI adapter decides how to present
it. runKbIngestBatch now returns KbIngestBatchResult (attempts +
sourcesDir); src/commands/kb.ts logs the summary, formats per-id
errors, and raises KbIngestBatchFailed when every id failed.
Narrow the per-id error boundary with Effect.catchTags. Only the four
external fetch failures (GetArxivError, GetAr5ivError, GetPubmedError,
GetCrossrefError) are converted to per-id Failed entries; KbIoError
and the rest of GetError stay in the typed error channel, so a wiki
write failure, a full disk, or a permission problem still fails the
whole batch loudly instead of being silently reported as a skipped paper.
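The narrowed boundary can be illustrated without the Effect runtime. The sketch below is a plain TypeScript analogue of the idea, not the Effect.catchTags call the commit uses: only the four fetch tags named above become per-id failures, and anything else is rethrown so it fails the batch. `TaggedError`, `recoverPerId`, and `ingestSerially` are hypothetical names introduced for this example.

```typescript
// Hypothetical stand-in for the repo's tagged error classes.
class TaggedError extends Error {
  readonly _tag: string;
  constructor(tag: string, message: string) {
    super(message);
    this._tag = tag;
  }
}

// The four external fetch failures named in the commit message.
const PER_ID_TAGS: ReadonlySet<string> = new Set([
  "GetArxivError",
  "GetAr5ivError",
  "GetPubmedError",
  "GetCrossrefError",
]);

type FailedAttempt = {
  readonly id: string;
  readonly ok: false;
  readonly error: { readonly _tag: string }; // typed payload, not a string
};
type Attempt = { readonly id: string; readonly ok: true } | FailedAttempt;

function hasTag(e: unknown): e is { _tag: string } {
  return (
    typeof e === "object" &&
    e !== null &&
    typeof (e as { _tag?: unknown })._tag === "string"
  );
}

// Convert a recoverable fetch failure into a per-id entry; rethrow the rest
// (for example KbIoError) so the whole batch fails loudly.
function recoverPerId(id: string, error: unknown): FailedAttempt {
  if (hasTag(error) && PER_ID_TAGS.has(error._tag)) {
    return { id, ok: false, error };
  }
  throw error;
}

// Serial loop: one id at a time, no concurrency.
async function ingestSerially(
  ids: ReadonlyArray<string>,
  ingestOne: (id: string) => Promise<void>, // hypothetical per-paper ingest
): Promise<Attempt[]> {
  const attempts: Attempt[] = [];
  for (const id of ids) {
    try {
      await ingestOne(id);
      attempts.push({ id, ok: true });
    } catch (e) {
      attempts.push(recoverPerId(id, e));
    }
  }
  return attempts;
}
```

Keeping the error object in the attempt, instead of a message string, matches the split described here: the domain code returns data and only the CLI renderer turns it into text.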
BatchAttempt now carries the typed BatchIngestError payload (new
KbInvalidIdentifier tag covers unparseable raw ids), and the CLI
renderer is the only place that stringifies it. KbIngestBatchFailed is
raised with bare 'yield* new KbIngestBatchFailed(...)' per repo
convention.