feat(#23): kb ingest accepts multiple ids and prints a batch summary by Vigtu · Pull Request #27 · p7dotorg/paper7

Vigtu · 2026-05-15T11:23:14Z

Summary

paper7 kb ingest argument is now variadic via Argument.variadic({ min: 1 }).
Single-id behaviour is unchanged: the paper's markdown still streams to stdout, preserving existing pipes.
With two or more ids, a new runKbIngestBatch path takes over: serial ingest, per-id catch, single summary block on stdout.
Adds KbIngestBatchFailed tagged error so the process exits 1 when every id in a batch fails.

Why

Running 10–15 ingests in a research session means launching the command per paper, then diffing paper7 kb list against the expected id list to find the failures, then retrying by hand. Variadic + summary collapses three steps into one.

Batch output shape

Ingested: N/M papers to <sources-dir>
Failed:
  <id> — <reason>

Hard failures (unparseable identifier, network/HTTP error, cache write error) hit the Failed: block with a per-id reason. Soft fallbacks from PR3 (ar5iv → abstract-only) emit their own warnings via Effect.logWarning during ingest and count toward Ingested:, so the summary just reports the final tally.

Exit codes

≥1 paper ingested → exit 0, summary on stdout.
Zero ingested with at least one attempt → exit 1, summary on stdout, error: all kb ingests failed on stderr.

Concurrency

Sequential, intentionally. arxiv export API enforces ~3s/request rate limiting and S2 caps at ~1 req/s on the unauth tier — concurrency buys 429s, not throughput. A --concurrency flag would be future work behind a retry strategy.

Test plan

npx vitest run → 171 pass / 0 fail.
npx tsc --noEmit clean.
New tests cover: two valid ids (Ingested: 2/2, no markdown in output), one valid + one bogus (Ingested: 1/2 + Failed: bogus.id — invalid identifier), all bogus (exit 1 + summary on stdout + error on stderr).
Existing single-id test guards regression on the pipe-friendly path (paper markdown still on stdout).

Closes #23.

…summary paper7 kb ingest used to take exactly one identifier. Running 10-15 ingests in a research session meant launching the command per paper, then diffing paper7 kb list against the expected id list to find the failures, then retrying by hand. The argument is now variadic via Argument.variadic({ min: 1 }). Single-id behaviour is preserved exactly — the paper's markdown still streams to stdout, so existing pipes keep working. With two or more ids, a new runKbIngestBatch path takes over: it ingests serially (arxiv enforces a ~3s rate limit and S2 caps at ~1 req/s on the unauth tier; concurrency buys 429s rather than throughput), and prints one summary block: Ingested: N/M papers to <sources-dir> Failed: <id> — <reason> Parse failures, network errors, and cache errors all land in the Failed: list with a per-id reason. The batch exits 0 as long as at least one paper landed; if every id failed the new KbIngestBatchFailed error fires and the process exits 1 with 'error: all kb ingests failed' on stderr while the summary still goes to stdout. The renderer is intentionally terse — soft fallbacks from PR3 (ar5iv → abstract-only) print their own warnings via Effect.logWarning during ingest and count toward Ingested:, so the summary just reports the final tally. Closes p7dotorg#23.

EduSantosBrito

Effect-idiomatic review: requesting changes.

The feature is useful, and keeping the batch sequential is a reasonable default. Effect's forEach defaults to concurrency 1, so the implementation is sequential as intended. The main changes I would ask for are about preserving Effect boundaries.

Why this matters in Effect: an Effect program should keep domain work, typed failures, and runtime/CLI concerns separate. That is what makes the code composable, testable, and easy to reason about. If a domain function prints to the console and catches every error as a string, it becomes much harder to reuse or inspect with Effect's typed model.

Concrete concerns:

runKbIngestBatch logs from src/kb.ts:

yield* Console.log(renderBatchSummary(attempts, paths.sources))

Existing runKb returns a string and commands/kb.ts handles Console.log. I would keep that boundary: have runKbIngestBatch return a KbIngestBatchResult, then render/log it in src/commands/kb.ts. The domain module should describe the batch result; the command adapter should decide how to print it.

ingestOneForBatch catches every KbError | GetError and converts it to { kind: "fail", reason: string }.

Effect.catch((error) =>
  Effect.succeed({ kind: "fail", raw, reason: batchErrorReason(error) })
)

That erases the typed error channel. In Effect, broad catches are best kept at clear boundaries. Here, only errors that are genuinely per-id/recoverable should become failed entries. Shared or systemic failures should remain typed failures. For example, a write/permission problem in the wiki directory is probably not "one id failed"; it may mean the whole batch cannot write.

Suggested shape:

Parse failures can become per-id failures.
Expected per-id Get* failures can become per-id failures if that is the desired UX.
Keep the typed error in the BatchAttempt payload as long as possible, and render it only in the CLI layer.
Let systemic KbIoError / setup failures fail the command normally.

For example:

type BatchAttempt =
  | { readonly _tag: "Ingested"; readonly raw: string }
  | { readonly _tag: "Failed"; readonly raw: string; readonly error: BatchIngestError }

Then commands/kb.ts can render the summary and format BatchIngestError for humans.

This keeps the nice batch UX while preserving the main Effect benefit: typed, explicit recoverability instead of catch-all stringification.

Verification: I checked this PR locally with tsc --noEmit and the kb tests; they pass.

EduSantosBrito · 2026-05-21T12:23:17Z

Follow-up: underused Effect APIs that would help this batch implementation.

Effect.result / Result.match

For batch work, this is often better than catching everything into a string. It lets each per-id attempt become success/failure data while preserving the typed error:

const attempts = yield* Effect.forEach(rawIds, (raw) =>
  ingestOne(raw).pipe(Effect.result)
)

Then the command layer can use Result.match to render the summary. This keeps domain errors typed until the CLI boundary.

Effect.catchTags

If only some failures should be per-id failures, catch just those tags:

ingest(id).pipe(
  Effect.map(() => ({ _tag: "Ingested", raw })),
  Effect.catchTags({
    GetArxivError: (error) => Effect.succeed({ _tag: "Failed", raw, error }),
    GetAr5ivError: (error) => Effect.succeed({ _tag: "Failed", raw, error }),
    GetPubmedError: (error) => Effect.succeed({ _tag: "Failed", raw, error }),
    GetCrossrefError: (error) => Effect.succeed({ _tag: "Failed", raw, error })
  })
)

Leave systemic errors like KbIoError in the error channel.

Effect.fn

runKbIngestBatch / ingestOneForBatch are reusable workflow functions and good candidates for Effect.fn("runKbIngestBatch") or Effect.fn("ingestOneForBatch"). It improves trace names and makes it clear these are Effect business operations.

Effect.forEach concurrency option, if needed later

Current sequential behavior is fine: Effect.forEach defaults to concurrency 1. If you later add controlled parallelism, the idiomatic way is:

Effect.forEach(rawIds, ingestOneForBatch, { concurrency: 2 })

No custom queue needed unless the policy becomes more complex.

Pull rendering and the final fail-decision out of src/kb.ts so the domain module returns data and the CLI adapter decides how to present it. runKbIngestBatch now returns KbIngestBatchResult (attempts + sourcesDir); src/commands/kb.ts logs the summary, formats per-id errors, and raises KbIngestBatchFailed when every id failed. Narrow the per-id error boundary with Effect.catchTags. Only the four external fetch failures (GetArxivError, GetAr5ivError, GetPubmedError, GetCrossrefError) are converted to per-id Failed entries; KbIoError and the rest of GetError stay in the typed error channel so a wiki write failure / disk-full / permission problem still fails the whole batch loudly instead of being silently reported as a skipped paper. BatchAttempt now carries the typed BatchIngestError payload (new KbInvalidIdentifier tag covers unparseable raw ids), and the CLI renderer is the only place that stringifies it. KbIngestBatchFailed is raised with bare 'yield* new KbIngestBatchFailed(...)' per repo convention.

Vigtu · 2026-05-21T21:39:06Z

Addressed both points in f78532a:

1. Domain / CLI boundary — runKbIngestBatch no longer logs or decides exit. It now returns:

type KbIngestBatchResult = {
  readonly attempts: ReadonlyArray<BatchAttempt>
  readonly sourcesDir: string
}

src/commands/kb.ts is now the only place that calls Console.log, renders the summary, and raises KbIngestBatchFailed when no ids succeeded — same UX, but the domain module is reusable/testable in isolation.

2. Narrowed catchTags + typed payload — replaced the blanket Effect.catch with Effect.catchTags({ GetArxivError, GetAr5ivError, GetPubmedError, GetCrossrefError }). KbIoError (wiki dir write/perm), GetCacheWriteError, etc. now stay in the typed error channel so a systemic failure fails the whole batch instead of being reported as a skipped paper.

BatchAttempt carries the typed BatchIngestError (new KbInvalidIdentifier tag for unparseable raw ids); the CLI renderer is the only place that turns it into a display string.

Also dropped the Effect.fail(new KbIngestBatchFailed(...)) for bare yield* new KbIngestBatchFailed(...) to match the rest of the repo.

tsc --noEmit clean, all 171 tests pass.

EduSantosBrito requested changes May 21, 2026

View reviewed changes

Vigtu requested a review from EduSantosBrito May 21, 2026 21:36

EduSantosBrito approved these changes May 22, 2026

View reviewed changes

lucianfialho merged commit 50c5786 into p7dotorg:main May 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(#23): kb ingest accepts multiple ids and prints a batch summary#27

feat(#23): kb ingest accepts multiple ids and prints a batch summary#27
lucianfialho merged 2 commits into
p7dotorg:mainfrom
Vigtu:feat/issue-23-batch-ingest

Vigtu commented May 15, 2026

Uh oh!

EduSantosBrito left a comment

Uh oh!

EduSantosBrito commented May 21, 2026

Uh oh!

Vigtu commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Vigtu commented May 15, 2026

Summary

Why

Batch output shape

Exit codes

Concurrency

Test plan

Uh oh!

EduSantosBrito left a comment

Choose a reason for hiding this comment

Uh oh!

EduSantosBrito commented May 21, 2026

Uh oh!

Vigtu commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants