feat(ollama): Ollama batch backend — ollama-batch.mjs + batch-runner.sh#680
feat(ollama): Ollama batch backend — ollama-batch.mjs + batch-runner.sh#680mattkirkey wants to merge 6 commits into
Conversation
…er.sh) - Add ollama-batch.mjs: drop-in worker replacement for `claude -p` in batch-runner.sh. Evaluates one offer via Ollama OpenAI-compatible API, writes report .md and tracker TSV, emits JSON summary to stdout. Loopback guard rejects remote OLLAMA_BASE_URL by default (opt-out via OLLAMA_ALLOW_REMOTE=1). Path inputs validated against regex before use in file paths. Min-score gate fires before any file writes. - Update batch/batch-runner.sh: add --backend [claude|ollama] flag and --model NAME flag. Backend-aware check_prerequisites probes Ollama via curl before starting. process_offer dispatches to node ollama-batch.mjs or existing claude -p array invocation depending on backend. - Document OLLAMA_BASE_URL / OLLAMA_MODEL / OLLAMA_TIMEOUT_MS in .env.example. - Add `ollama:batch` npm script. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Welcome to career-ops, @mattkirkey! Thanks for your first PR. A few things to know:
We'll review your PR soon. Join our Discord if you have questions. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds Ollama as an alternative LLM backend: environment and npm script entries, batch-runner backend selection and validation, and a new Node.js worker that evaluates offers via an Ollama-compatible chat completions API and writes reports/tracker rows. ChangesOllama LLM Backend Integration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@batch/batch-runner.sh`:
- Around line 104-105: The --model option is parsed into the MODEL variable but
never applied when BACKEND is "ollama", causing ollama to fall back to
${OLLAMA_MODEL:-llama3.3}; update the option-parsing and backend-selection logic
so that when BACKEND == "ollama" and MODEL is set (and OLLAMA_MODEL is not
already set), assign OLLAMA_MODEL="$MODEL" before invoking ollama, or explicitly
reject the combination (exit with an error) if you prefer strictness; apply the
same mapping/rejection in the other parsing block handling CLI args (the block
referencing MODEL/BACKEND around the later duplicate section) so both places
consistently honor --model for the ollama backend.
- Around line 169-176: The preflight currently only checks reachability of
${OLLAMA_BASE_URL} (assigned to ollama_url) but must also reject non-loopback
URLs to avoid later per-offer failures; update the block that defines local
ollama_url and does the curl check to parse the host from
${OLLAMA_BASE_URL:-http://localhost:11434} (strip scheme/port), verify the host
is loopback (localhost, 127.0.0.1, ::1, or a resolved loopback IP), and if not,
print a clear error and exit 1 before attempting curl; keep the existing curl
reachability check afterward so both host validation and connectivity are
enforced.
In `@ollama-batch.mjs`:
- Around line 392-397: Sanitize model-derived fields before writing the TSV:
ensure company, role, and notesStr are cleaned (e.g., replace tabs/newlines with
single spaces or escape them) prior to building tsvLine so they cannot inject
tab or newline characters and break the TSV; update the code that constructs
scoreStr/reportLink/notesStr and the array passed to writeFileSync (referencing
scoreStr, reportLink, notesStr, company, role, writeFileSync, PATHS.tracker, and
batchId) to use the sanitized values.
- Line 321: The assignment to score uses "parseFloat(extract('SCORE')) || null"
which converts a valid 0 score to null; change it to explicitly detect invalid
parses instead of using || so zero is preserved (e.g., parse the value into a
temporary, then set score to null only when Number.isNaN(parsed) or
!isFinite(parsed)); update the expression where score is assigned (the line
using extract('SCORE') / parseFloat) to use that explicit NaN check.
- Around line 192-205: The URL fetch currently only uses new URL(url) for
validation and then calls fetch; harden it against SSRF by resolving and
validating the URL's hostname before fetching: in the block around the new
URL(url) check (and before fetch(...)), parse the hostname, reject obvious
hostnames like "localhost" or "127.0.0.1", and perform a DNS resolution (e.g.,
using dns.promises.lookup or resolve) to obtain all IPs for that hostname and
reject any addresses in private/loopback/link-local ranges (127.0.0.0/8,
10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16, ::1, fe80::/10,
fc00::/7, etc.) and also reject direct IP literals that are in those ranges;
only if the resolved addresses pass these checks, proceed to the existing fetch
call (keeping the same headers and AbortSignal) otherwise call fail(...) with a
clear "disallowed or private IP" message.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 82978bb0-55a4-44eb-a365-ec54a47e9b73
📒 Files selected for processing (4)
.env.examplebatch/batch-runner.shollama-batch.mjspackage.json
batch/batch-runner.sh: - Map --model to OLLAMA_MODEL when --backend ollama is selected, so `--backend ollama --model qwen2.5:72b` works without silently falling back to llama3.3. Export the resolved value to child process env. - Add loopback guard in check_prerequisites for Ollama backend: reject non-localhost OLLAMA_BASE_URL before reserving report numbers, so a misconfigured remote URL fails fast rather than polluting batch state. ollama-batch.mjs: - SSRF guard on JD URL fetch: reject non-http/https protocols and private/loopback address ranges (127.*, 10.*, 192.168.*, 172.16-31.*, 169.254.*, ::1) before calling fetch(). - Fix zero-score bug: replace `parseFloat(...) || null` with `Number.isFinite()` check so a score of 0 is not coerced to null. - TSV injection fix: sanitize model-derived strings (company, role, archetype, legitimacy) by replacing tab/CR/LF with a space before assembling the TSV row. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
batch/batch-runner.sh (1)
463-468:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
skippednow persists, but the scheduler still requeues it.With the new
return 0, low-score offers remain in state asskipped. The pending-offer filter later only excludescompleted, so the next batch run will process the same low-score offers again and keep reserving new report numbers.Possible fix
- if [[ "$status" == "completed" ]]; then + if [[ "$status" == "completed" || "$status" == "skipped" ]]; then continue fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@batch/batch-runner.sh` around lines 463 - 468, The block that sets low-score offers to "skipped" (update_state called with "skipped") then returns, but the later pending-offer selection only excludes "completed" so skipped offers get requeued; fix by updating the pending-offer filter/query to also exclude "skipped" state (or add "skipped" to the same exclusion list as "completed"), keeping the update_state call and return intact; locate the logic that builds the pending-offer selection (the filter/query that currently excludes "completed") and add "skipped" to that condition so offers marked by update_state(..., "skipped", ...) are not picked up again.
♻️ Duplicate comments (1)
ollama-batch.mjs (1)
197-214:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftHostname-only SSRF checks are still bypassable.
This only rejects private targets when the URL string itself is
localhost/RFC1918/etc. A public hostname that resolves to127.0.0.1,::1,fe80::/10,fc00::/7, or other internal addresses will still pass and be fetched. Resolve the hostname first and reject the request if any A/AAAA answer is loopback, link-local, or private.As per coding guidelines, "Check for command injection, path traversal, and SSRF. Ensure scripts handle missing data/ directories gracefully."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ollama-batch.mjs` around lines 197 - 214, The current SSRF guard using parsedUrl.hostname (h) is insufficient because a public hostname may resolve to private/loopback addresses; update the check to perform a DNS resolution of parsedUrl.hostname (e.g., using dns.resolve or dns.lookup with all records) before fetching and examine every A/AAAA result for private/loopback/link-local ranges (127.0.0.0/8, 10/8, 172.16/12, 192.168/16, 169.254/16, ::1, fe80::/10, fc00::/7, etc.); if any resolved address is in a disallowed range, call fail(`JD URL points to a private/loopback address — refusing to fetch: ${url}`) and abort fetch; keep the existing scheme check for parsedUrl.protocol and retain the hostname-string checks as a fast-fail in addition to the resolution-based validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@batch/batch-runner.sh`:
- Around line 176-193: The preflight checks in batch-runner.sh use
OLLAMA_BASE_URL/OLLAMA_MODEL from the runner's environment while
ollama-batch.mjs re-resolves config from .env, causing mismatches; fix by
centralizing the config source: either source the same .env in batch-runner.sh
before performing the loopback and probe checks (so OLLAMA_BASE_URL and
OLLAMA_MODEL are identical), or change the runner to not assume defaults and
instead export the resolved values read by ollama-batch.mjs (ensure
OLLAMA_BASE_URL, OLLAMA_ALLOW_REMOTE and OLLAMA_MODEL are exported to the
child). Also remove or avoid forcing a hardcoded default model in the child (the
code that forces "llama3.3") so the model used in the preflight and worker are
the same.
---
Outside diff comments:
In `@batch/batch-runner.sh`:
- Around line 463-468: The block that sets low-score offers to "skipped"
(update_state called with "skipped") then returns, but the later pending-offer
selection only excludes "completed" so skipped offers get requeued; fix by
updating the pending-offer filter/query to also exclude "skipped" state (or add
"skipped" to the same exclusion list as "completed"), keeping the update_state
call and return intact; locate the logic that builds the pending-offer selection
(the filter/query that currently excludes "completed") and add "skipped" to that
condition so offers marked by update_state(..., "skipped", ...) are not picked
up again.
---
Duplicate comments:
In `@ollama-batch.mjs`:
- Around line 197-214: The current SSRF guard using parsedUrl.hostname (h) is
insufficient because a public hostname may resolve to private/loopback
addresses; update the check to perform a DNS resolution of parsedUrl.hostname
(e.g., using dns.resolve or dns.lookup with all records) before fetching and
examine every A/AAAA result for private/loopback/link-local ranges (127.0.0.0/8,
10/8, 172.16/12, 192.168/16, 169.254/16, ::1, fe80::/10, fc00::/7, etc.); if any
resolved address is in a disallowed range, call fail(`JD URL points to a
private/loopback address — refusing to fetch: ${url}`) and abort fetch; keep the
existing scheme check for parsedUrl.protocol and retain the hostname-string
checks as a fast-fail in addition to the resolution-based validation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: cb004ebb-417e-4c8c-9fb2-440bab210196
📒 Files selected for processing (2)
batch/batch-runner.shollama-batch.mjs
batch/batch-runner.sh: - Source .env from project root early (set -a / set +a) so OLLAMA_BASE_URL, OLLAMA_MODEL etc. are visible to both preflight checks and child workers without requiring users to export them from their shell. Removes the OLLAMA_MODEL hard-override on the node invocation that conflicted with values set only in .env. - Exclude 'skipped' offers from the pending list alongside 'completed', so low-score offers are not re-queued and assigned new report numbers on every subsequent batch run. ollama-batch.mjs: - Strengthen SSRF guard against DNS rebinding: after the fast-fail hostname-literal check, resolve A and AAAA records via dns.resolve4 / dns.resolve6 and reject any address that falls in a private/loopback/ link-local range. DNS errors are allowed to fall through to fetch() so NXDOMAIN surfaces as a clear network error rather than a silent block. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ollama-batch.mjs`:
- Around line 147-149: The readFile helper (function readFile) currently only
checks existsSync and then calls readFileSync, which will throw if the path is a
directory or unreadable; update readFile (and the similar usage at the other
occurrence) to first stat the path (fs.statSync or fs.lstatSync) and ensure it's
a regular file, then wrap the readFileSync call in a try/catch to return the
same placeholder string (`[${label} not found — skipping]` or a clear unreadable
placeholder) on error instead of throwing; make sure to catch and handle
permission errors, ENOENT, and EISDIR so the worker emits the structured failure
JSON rather than crashing.
- Around line 452-456: sanitizeTsv currently strips tabs/newlines but leaves
cells vulnerable to spreadsheet formula injection; update sanitizeTsv (used for
company, role and notesStr before building tsvLine) to detect a leading =,+,- or
@ after trimming and prefix it (e.g., with a single quote "'" or a space) so the
cell will be treated as plain text, then return the sanitized value; ensure you
apply this updated sanitizeTsv in the tsvLine construction where company, role
and notesStr are passed.
- Line 114: The switch case handling '--min-score' currently sets minScore with
"minScore = parseFloat(args[++i]) || 0" which silently converts invalid inputs
to 0; change it to parse the value with parseFloat into a temp (e.g., parsed =
parseFloat(args[++i])) and if parsed is NaN or the argument is missing, reject
the input by printing an error and exiting (or throwing) instead of assigning 0,
otherwise assign minScore = parsed; update the '--min-score' case and any
validation around minScore to ensure only valid numeric scores are accepted.
- Around line 252-256: The fetch call using the variable url currently follows
redirects automatically, allowing SSRF via redirect to private IPs; update the
fetch invocation where AbortSignal.timeout is used (the fetch call that sets
headers and signal) to disable automatic redirects by adding redirect: 'manual'
and then explicitly handle follow-up redirects: inspect the response's Location
header, validate the redirected hostname/IP using the same hostname/IP
validation logic you used earlier, and only perform a new fetch if the
redirected target passes validation (re-using the private-host check).
Alternatively, if you prefer following redirects, re-run the exact hostname/IP
validation on the final resolved URL before processing the response. Ensure the
validation logic referenced (the hostname/IP check at lines ~238–246) is applied
to any redirected Location targets.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 112b0857-4360-4a13-8eb5-c5ac99e61ab9
📒 Files selected for processing (2)
batch/batch-runner.shollama-batch.mjs
…ifer#680 round 3) - Validate --min-score at parse time: replace `parseFloat(...) || 0` (silently ignores invalid input) with Number.isFinite check + exit 1 so a typo like `--min-score foo` is caught immediately. - Harden readFile helper: wrap readFileSync in try/catch so permission errors, directory paths, and broken symlinks return a placeholder string instead of crashing the worker with a raw stack trace. - Block SSRF via redirect: replace auto-following fetch with a manual redirect loop (redirect:'manual', max 5 hops). Each Location header is parsed, protocol-checked, hostname-literal-checked, and DNS-resolved against private ranges before following. Mirrors the same guards applied to the initial URL. - Prevent spreadsheet formula injection in TSV output: sanitizeTsv now prefixes cells starting with =, +, -, or @ with a single quote so Excel/Google Sheets treats them as text rather than formulas. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ollama-batch.mjs`:
- Around line 206-208: The jdFile read can throw and crash the worker; replace
the direct readFileSync usage in the jdFile existence branch by calling the
existing readFile helper (or wrap readFileSync in a try/catch) so unreadable
files (permissions, directories, broken symlinks) are handled gracefully and
jdText is set to a safe default or empty string; update the block that sets
jdText (references: jdFile, jdText) to use readFile(...) and on error log or
swallow the error and leave jdText as '' so the script still emits the expected
structured JSON.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0da2b71f-ed12-402c-9986-42610c5f74c3
📒 Files selected for processing (1)
ollama-batch.mjs
…ound 4) The jdFile read path called readFileSync directly, bypassing the safe readFile() helper added in the previous round. A permission error or directory path would throw an uncaught exception, crashing the worker without emitting the structured JSON that batch-runner.sh expects. Now calls fail() with a clear message on any read error. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ollama-batch.mjs`:
- Around line 487-508: Wrap the filesystem calls that can throw
(mkdirSync(PATHS.reports, { recursive: true }) and writeFileSync(reportPath,
reportContent, 'utf-8') — also any other write/mkdir usage around lines that
create reports) in a try-catch and on error call the existing fail(error) helper
(or fail({message: ..., error}) consistent with other uses) so the script emits
the structured failure JSON instead of crashing; locate the mkdirSync and
writeFileSync invocations, surround them with try/catch, pass the caught error
to fail() and return/exit after calling fail() to prevent uncaught exceptions
after the Ollama run.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b2db65e6-1f53-408c-8f8d-a6efa2eba198
📒 Files selected for processing (1)
ollama-batch.mjs
…er#680 round 5) The four fs calls that run after the Ollama response (mkdirSync for reports/, writeFileSync for the .md report, mkdirSync for tracker/, writeFileSync for the .tsv row) were unguarded. A permission error or ENOSPC would throw and exit the worker without emitting the structured JSON summary that batch-runner.sh parses, making the failure invisible to the orchestrator. Each call now delegates to fail() on error. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
ollama-batch.mjs(new, ~415 lines) — drop-in worker replacingclaude -pin batch mode. Evaluates one offer via Ollama's OpenAI-compatible API, writesreports/<num>-<slug>-<date>.mdandbatch/tracker-additions/<id>.tsv, emits a JSON summary to stdout for the orchestrator.batch/batch-runner.sh(modified) — adds--backend [claude|ollama]flag. Backend-awarecheck_prerequisitesprobes Ollama withcurlbefore starting.process_offerdispatches tonode ollama-batch.mjsor the existingclaude -parray invocation depending on backend..env.example— documentsOLLAMA_BASE_URL,OLLAMA_MODEL,OLLAMA_TIMEOUT_MSand recommended models.package.json— addsollama:batchnpm script.Security
OLLAMA_BASE_URLmust resolve tolocalhost/127.0.0.1/::1; opt-out viaOLLAMA_ALLOW_REMOTE=1(strict equality).--report-num,--date,--batch-idvalidated by regex before use in file paths.reports/.Usage
Test plan
./batch/batch-runner.sh --backend ollama --dry-runlists pending offers without errorsOLLAMA_BASE_URL=http://remote:11434 node ollama-batch.mjs ...exits with loopback error--min-score 4.0with a low-scoring offer exits cleanly without writing files./batch/batch-runner.sh(no--backend) still works as before (Claude backend)Companion PR
PR B adds
ollama-eval.mjs(interactive single-offer evaluator, mirrorsgemini-eval.mjs).The Ollama env vars documented in
.env.examplehere serve both scripts.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Security
Chores