Skip to content

fix(scan): isolate detect() exceptions, clarify timeout errors, fix resolve error count#599

Open
luisjo88 wants to merge 6 commits into
santifer:mainfrom
luisjo88:fix/573-reliability-fixes
Open

fix(scan): isolate detect() exceptions, clarify timeout errors, fix resolve error count#599
luisjo88 wants to merge 6 commits into
santifer:mainfrom
luisjo88:fix/573-reliability-fixes

Conversation

@luisjo88
Copy link
Copy Markdown
Contributor

@luisjo88 luisjo88 commented May 8, 2026

Context

Follow-up to #573 (plugin-based provider architecture). Three reliability issues found during review that should be addressed before merge.

Changes

R1 — detect() exception isolation (scan.mjs)

A buggy community provider whose detect() throws (e.g. bad regex, .match() on undefined) would previously propagate up through resolveProvider and crash the entire scan run before any network call was made. Wrapped the detect loop body in a try/catch — a failing provider logs a warning and is skipped; the scan continues.

R3 — Silent timeout errors (providers/_http.mjs)

When fetchWithTimeout hit the 10s deadline, AbortError surfaced as "The operation was aborted." — indistinguishable from a network drop or OS signal. Now catches AbortError explicitly and rethrows as "Request timed out after Xms".

R2 — Resolve error count missing from summary (scan.mjs)

Companies with an unknown/typo provider: field fell into resolveErrors but were absent from the skippedCount in the summary line, so targets + skipped != total enabled companies. Summary now appends , N unknown provider when any resolve errors exist.

Test plan

  • node scan.mjs --dry-run still runs without error on a valid portals.yml
  • Adding a provider with a throwing detect() to providers/ logs a warning and does not abort the scan
  • Patching a company with provider: doesnotexist shows it counted in the summary line
  • Killing network mid-fetch surfaces "timed out after 10000ms" in the error list

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added Ashby, Greenhouse, and Lever job providers.
    • Introduced plugin-based provider architecture with automatic provider detection and explicit provider override.
    • Improved HTTP transport with request timeouts and richer error handling for more reliable fetches.
  • Documentation

    • Updated configuration example with provider routing/detection and optional provider/transport fields.

Review Change Stack

ageem23 and others added 5 commits May 4, 2026 15:19
…#521)

Refactor scan.mjs to load providers dynamically from providers/*.mjs at
startup instead of hardcoding Greenhouse/Ashby/Lever calls. Each plugin
exports { id, detect?, fetch }; tracked_companies entries can force a
specific provider via the optional `provider:` field.

New files:
- providers/_http.mjs — shared timeout-bounded fetch transport with
  body-snippet error reporting (underscore prefix excludes it from the
  loader).
- providers/_types.js — JSDoc @typedef catalog for Job, PortalEntry,
  Context, Provider. Pure docs; provider authors reference these via
  /** @typedef {import('./_types.js').Provider} Provider */ for IDE
  hints under // @ts-check.
- providers/greenhouse.mjs, providers/ashby.mjs, providers/lever.mjs
  — extracted from scan.mjs verbatim, no behavior change.

Resolution:
- Explicit `provider:` wins; otherwise each provider's detect() runs in
  alphabetical-filename order (deterministic across machines) and the
  first hit wins.
- Failed provider imports are logged and skipped — one broken plugin
  doesn't take down the scan.
- Entries with `provider: <unknown>` surface as a resolve error.

Pure refactor — no behavior change for existing portals.yml entries:
- Greenhouse/Ashby/Lever auto-detect from careers_url exactly as today.
- Source label keeps the `${provider.id}-api` suffix so existing
  scan-history.tsv rows continue to match for dedup.
- Title filter, pipeline output format, --dry-run, --company unchanged.
- New `provider:` and reserved `transport:` fields on tracked_companies
  entries are optional.

The `transport:` field is reserved for future transports (browser,
etc.); only http is implemented in Phase A.

This is Phase A of RFC santifer#521. PR santifer#454 bundled this refactor with the
browser transport, scraper provider, and Apify provider — reviewers
asked for a split, and the RFC carved that bundle into three slices:
this Phase A pure refactor, a follow-up browser/scraper PR, and a
follow-up Apify PR (closes santifer#325).

Refs: santifer#521, santifer#230
Replaces (in part): santifer#454
…adowing

The error branch declared `const body = await res.text()`, which shadowed
the outer `body` parameter (the request body). Renamed to `responseText`
for clarity. No behavior change.

Per CodeRabbit review on dry-run PR.

Refs: santifer#521
…esolve error count

- Wrap detect() call in resolveProvider in try/catch so a buggy
  community provider cannot crash the entire scan run (R1)
- Catch AbortError in _http.mjs fetchWithTimeout and rethrow with
  "timed out after Xms" message instead of the opaque AbortError (R3)
- Include unknown-provider count in the scan summary line so the
  company totals add up (R2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

Welcome to career-ops, @luisjo88! Thanks for your first PR.

A few things to know:

  • Tests will run automatically — check the status below
  • Make sure you've linked a related issue (required for features)
  • Read CONTRIBUTING.md if you haven't

We'll review your PR soon. Join our Discord if you have questions.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

📝 Walkthrough

Walkthrough

Refactors the career portal scanner into a plugin-based system: adds provider type docs, a shared HTTP transport (with timeout, headers, and helpers), three provider modules (Ashby, Greenhouse, Lever) that detect/normalize job feeds, and updates scan.mjs to load, validate, and route to providers per tracked company.

Changes

Provider Plugin Architecture

Layer / File(s) Summary
Type Contract
providers/_types.js
JSDoc typedefs define Job, PortalEntry, DetectHit, FetchOptions, Context, and Provider interface; exports an empty ESM export for tooling.
HTTP Transport
providers/_http.mjs
Adds fetchWithTimeout() (AbortController timeout, default User-Agent, enhanced non-2xx error info), fetchJson(), fetchText(), and makeHttpCtx() to supply transport-aware fetch helpers to providers.
Provider Implementations
providers/ashby.mjs, providers/greenhouse.mjs, providers/lever.mjs
Each provider implements detect(entry) to derive an API URL from careers_url (or entry.api for Greenhouse) and fetch(entry, ctx) to call ctx.fetchJson and normalize job records to the Job shape.
Scanner Refactoring
scan.mjs
Dynamically loads providers/*.mjs (skips _*.mjs), validates and deduplicates provider modules, resolves provider per tracked company via explicit provider or ordered detect() checks, and invokes provider.fetch(company, ctx). Preserves filtering, deduplication, and writes history with source = ${provider.id}-api.
Configuration Documentation
templates/portals.example.yml
Adds comments documenting provider auto-detection precedence, optional provider override, and reserved transport field.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 63.64% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the three key reliability fixes: detect() exception isolation, timeout error clarification, and resolve error count correction in the plugin-based provider refactor.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scan.mjs (1)

285-314: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject malformed jobs before deduping and writing results.

The new provider boundary only checks Array.isArray(jobs). Because providers normalize missing fields to '', one bad payload can add blank URLs/titles, poison seenUrls with '', and write unusable pipeline/history rows. Validate title and url after trim, and drop invalid records before they reach the dedup/output path.

Suggested guard
       const jobs = await provider.fetch(company, ctx);
       if (!Array.isArray(jobs)) {
         throw new Error(`${provider.id}: fetch() did not return an array`);
       }
       totalFound += jobs.length;

       for (const job of jobs) {
-        if (!titleFilter(job.title)) {
+        const title = typeof job.title === 'string' ? job.title.trim() : '';
+        const url = typeof job.url === 'string' ? job.url.trim() : '';
+        const jobCompany = typeof job.company === 'string' && job.company.trim() ? job.company.trim() : company.name;
+        const location = typeof job.location === 'string' ? job.location.trim() : '';
+
+        if (!title || !url) {
+          errors.push({ company: company.name, error: `${provider.id}: dropped malformed job record` });
+          continue;
+        }
+
+        if (!titleFilter(title)) {
           totalFiltered++;
           continue;
         }
-        if (seenUrls.has(job.url)) {
+        if (seenUrls.has(url)) {
           totalDupes++;
           continue;
         }
-        const key = `${job.company.toLowerCase()}::${job.title.toLowerCase()}`;
+        const key = `${jobCompany.toLowerCase()}::${title.toLowerCase()}`;
         if (seenCompanyRoles.has(key)) {
           totalDupes++;
           continue;
         }
         // Mark as seen to avoid intra-scan dupes
-        seenUrls.add(job.url);
+        seenUrls.add(url);
         seenCompanyRoles.add(key);
         // Source label keeps the `${provider.id}-api` suffix so existing
         // scan-history.tsv rows continue to match for dedup.
-        newOffers.push({ ...job, source: `${provider.id}-api` });
+        newOffers.push({ title, url, company: jobCompany, location, source: `${provider.id}-api` });
       }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scan.mjs` around lines 285 - 314, The loop over jobs returned by
provider.fetch currently dedupes and records jobs before validating fields;
update the loop inside the for (const job of jobs) block to trim and validate
job.title and job.url (reject if title.trim() === '' or url.trim() === '' or url
is not a valid/absolute URL) before calling titleFilter or touching
seenUrls/seenCompanyRoles/newOffers; when rejecting, increment totalFiltered (or
totalFound if you prefer) and continue so malformed records never pollute
seenUrls/seenCompanyRoles or get pushed to newOffers; keep provider.id-based
source labeling unchanged for valid offers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@providers/_types.js`:
- Around line 23-37: Add the missing JSDoc property for the optional api field
on the PortalEntry typedef: declare `@property` {string} [api] with a brief
description so providers that read entry.api (e.g., providers/greenhouse.mjs)
and TypeScript consumers won't error; update the existing PortalEntry block (the
typedef named PortalEntry) to include this property alongside name, enabled,
careers_url, provider, and transport.

In `@providers/greenhouse.mjs`:
- Around line 7-12: The resolveApiUrl function currently trusts
entry.api.includes('greenhouse') and raw regex on careers_url which allows host
confusion and SSRF; update resolveApiUrl to parse entry.api and
entry.careers_url with new URL(...) (guarding against missing/invalid values),
require the hostname to be one of an allowlist (e.g. "boards.greenhouse.io",
"greenhouse.io", "boards-api.greenhouse.io", and the regional
"job-boards.greenhouse.io" variants you support), and extract the board slug
from the pathname (not via regex on the whole string); if the URL is not valid
or not an allowlisted Greenhouse host return null. Ensure you reference and
change the resolveApiUrl function to stop using entry.api.includes and the raw
regex fallback.

In `@templates/portals.example.yml`:
- Around line 285-297: The template's provider routing block is misleading about
how tracked_companies are resolved: clarify that scan.mjs uses only an explicit
provider field or provider.detect(careers_url) to choose a provider (and thus
that scan_method values like playwright/websearch/greenhouse_api do NOT
influence provider resolution for tracked_companies); update the prose to state
that to force a provider set tracked_companies[].provider to a loaded provider
id, otherwise scan.mjs will call each provider's detect(careers_url) and pick
the first match, and add a short note that scan.mjs is used for zero-token
portal scanning (direct Greenhouse/Ashby/Lever API calls) per coding guidelines.

---

Outside diff comments:
In `@scan.mjs`:
- Around line 285-314: The loop over jobs returned by provider.fetch currently
dedupes and records jobs before validating fields; update the loop inside the
for (const job of jobs) block to trim and validate job.title and job.url (reject
if title.trim() === '' or url.trim() === '' or url is not a valid/absolute URL)
before calling titleFilter or touching seenUrls/seenCompanyRoles/newOffers; when
rejecting, increment totalFiltered (or totalFound if you prefer) and continue so
malformed records never pollute seenUrls/seenCompanyRoles or get pushed to
newOffers; keep provider.id-based source labeling unchanged for valid offers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: dbef78b8-ddc4-4694-a4ce-dd5bf906973d

📥 Commits

Reviewing files that changed from the base of the PR and between 8e554cc and ff09198.

📒 Files selected for processing (7)
  • providers/_http.mjs
  • providers/_types.js
  • providers/ashby.mjs
  • providers/greenhouse.mjs
  • providers/lever.mjs
  • scan.mjs
  • templates/portals.example.yml

Comment thread providers/_types.js
Comment on lines +23 to +37
/**
* A single `tracked_companies` entry from `portals.yml`.
*
* Provider-specific fields are opaque to scan.mjs and validated by the
* provider itself. Examples in current providers: `api`, `careers_url`.
* Providers read these directly off the entry object — no schema enforcement
* at the framework level.
*
* @typedef {object} PortalEntry
* @property {string} name User-facing label; appears in logs and placeholders.
* @property {boolean} [enabled] Default: true.
* @property {string} [careers_url] Public listing URL; consumed by detect().
* @property {string} [provider] Explicit provider id — bypasses detect().
* @property {('http')} [transport] Default: 'http'. Reserved for future transports.
*/
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that PortalEntry omits fields used by current providers/config.
rg -n -C2 '@typedef \{object\} PortalEntry|@property .*careers_url|@property .*provider|@property .*transport' providers/_types.js
rg -n -C2 '\bentry\.api\b|^\s+api:' providers/greenhouse.mjs templates/portals.example.yml

Repository: santifer/career-ops

Length of output: 15091


🏁 Script executed:

#!/bin/bash
# Check if other providers access entry.api besides greenhouse
rg -l 'entry\.api' providers/
# List all provider files
fd -e mjs providers/ | head -20

Repository: santifer/career-ops

Length of output: 452


🏁 Script executed:

#!/bin/bash
# Search for other entry.* accesses in providers to find undocumented fields
rg 'entry\.[a-zA-Z_]+' providers/ --only-matching -h | sort | uniq -c | sort -rn

Repository: santifer/career-ops

Length of output: 10379


🏁 Script executed:

#!/bin/bash
# Search for entry.* patterns in provider files more carefully
rg 'entry\.' providers/*.mjs -o | grep -o 'entry\.[a-zA-Z_]*' | sort | uniq -c | sort -rn

Repository: santifer/career-ops

Length of output: 78


🏁 Script executed:

#!/bin/bash
# Find all entry.FIELDNAME accesses in providers
rg -n 'entry\.[a-zA-Z_]+' providers/*.mjs

Repository: santifer/career-ops

Length of output: 835


Document api on PortalEntry.

The JSDoc prose mentions api as a current provider field example, and providers/greenhouse.mjs reads entry.api, but the typedef omits it. Adding @property {string} [api] would complete the shared contract and avoid TypeScript diagnostics for providers using this field.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@providers/_types.js` around lines 23 - 37, Add the missing JSDoc property for
the optional api field on the PortalEntry typedef: declare `@property` {string}
[api] with a brief description so providers that read entry.api (e.g.,
providers/greenhouse.mjs) and TypeScript consumers won't error; update the
existing PortalEntry block (the typedef named PortalEntry) to include this
property alongside name, enabled, careers_url, provider, and transport.

Comment thread providers/greenhouse.mjs
Comment on lines +7 to +12
function resolveApiUrl(entry) {
if (entry.api && entry.api.includes('greenhouse')) return entry.api;
const url = entry.careers_url || '';
const match = url.match(/job-boards(?:\.eu)?\.greenhouse\.io\/([^/?#]+)/);
if (match) return `https://boards-api.greenhouse.io/v1/boards/${match[1]}/jobs`;
return null;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tighten Greenhouse URL validation before fetching.

entry.api.includes('greenhouse') will accept arbitrary hosts such as https://greenhouse.attacker.tld/..., and the fallback regex still parses the raw string instead of the URL host. That makes host confusion possible and also misses valid boards.greenhouse.io boards. Parse with new URL(), require an allowlisted Greenhouse hostname, and derive the slug from the pathname.

Suggested hardening
 function resolveApiUrl(entry) {
-  if (entry.api && entry.api.includes('greenhouse')) return entry.api;
-  const url = entry.careers_url || '';
-  const match = url.match(/job-boards(?:\.eu)?\.greenhouse\.io\/([^/?#]+)/);
-  if (match) return `https://boards-api.greenhouse.io/v1/boards/${match[1]}/jobs`;
+  const parse = (value) => {
+    try {
+      return new URL(value);
+    } catch {
+      return null;
+    }
+  };
+
+  const explicit = entry.api ? parse(entry.api) : null;
+  if (explicit?.protocol === 'https:' && explicit.hostname === 'boards-api.greenhouse.io') {
+    return explicit.toString();
+  }
+
+  const careers = entry.careers_url ? parse(entry.careers_url) : null;
+  if (!careers || careers.protocol !== 'https:') return null;
+  if (!['job-boards.greenhouse.io', 'job-boards.eu.greenhouse.io', 'boards.greenhouse.io'].includes(careers.hostname)) {
+    return null;
+  }
+
+  const [slug] = careers.pathname.split('/').filter(Boolean);
+  return slug ? `https://boards-api.greenhouse.io/v1/boards/${slug}/jobs` : null;
-  return null;
 }

As per coding guidelines **/*.mjs: Check for command injection, path traversal, and SSRF. Ensure scripts handle missing data/ directories gracefully.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@providers/greenhouse.mjs` around lines 7 - 12, The resolveApiUrl function
currently trusts entry.api.includes('greenhouse') and raw regex on careers_url
which allows host confusion and SSRF; update resolveApiUrl to parse entry.api
and entry.careers_url with new URL(...) (guarding against missing/invalid
values), require the hostname to be one of an allowlist (e.g.
"boards.greenhouse.io", "greenhouse.io", "boards-api.greenhouse.io", and the
regional "job-boards.greenhouse.io" variants you support), and extract the board
slug from the pathname (not via regex on the whole string); if the URL is not
valid or not an allowlisted Greenhouse host return null. Ensure you reference
and change the resolveApiUrl function to stop using entry.api.includes and the
raw regex fallback.

Comment on lines +285 to 297
#
# Provider routing (scan.mjs):
# By default each entry is auto-detected from `careers_url` against the
# loaded providers in providers/*.mjs (greenhouse, ashby, lever — the
# first one whose detect() matches wins, alphabetical order).
#
# Optional fields on a tracked_companies entry:
# provider: <id> — force a specific provider, skipping detect().
# Useful when careers_url is ambiguous or you want
# to pin behavior. Value must match a loaded
# provider's `id`.
# transport: http — reserved for future transports. Defaults to http.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clarify that scan_method is not used for tracked-company provider resolution.

This new block explains provider routing, but the section still reads as if scan_method: playwright/websearch/greenhouse_api controls these entries. In the current implementation, scan.mjs only uses explicit provider or detect(careers_url), so entries like OpenAI look configured here but are skipped as "no provider matched". Please make that distinction explicit in the template.

As per coding guidelines scan.mjs: Use scan.mjs for zero-token portal scanning — hits Greenhouse/Ashby/Lever APIs directly, zero LLM cost.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@templates/portals.example.yml` around lines 285 - 297, The template's
provider routing block is misleading about how tracked_companies are resolved:
clarify that scan.mjs uses only an explicit provider field or
provider.detect(careers_url) to choose a provider (and thus that scan_method
values like playwright/websearch/greenhouse_api do NOT influence provider
resolution for tracked_companies); update the prose to state that to force a
provider set tracked_companies[].provider to a loaded provider id, otherwise
scan.mjs will call each provider's detect(careers_url) and pick the first match,
and add a short note that scan.mjs is used for zero-token portal scanning
(direct Greenhouse/Ashby/Lever API calls) per coding guidelines.

ageem23 added a commit to ageem23/career-ops that referenced this pull request May 8, 2026
…ount

- Wrap detect() in try/catch inside resolveProvider so a buggy community
  provider can't crash the entire scan (R1). Mirrors the catch-and-skip
  pattern the loader already applies to failed provider imports.
- Append unknown-provider count to the scan summary line so the totals
  reconcile when a tracked_companies entry references a non-loaded
  provider id (R2). Closes an accounting gap the refactor itself
  introduced.

R3 (AbortError → "timed out after Xms") deferred to a follow-up
hardening PR — pre-refactor fetchJson surfaced AbortError the same
opaque way, so it's a behavior improvement rather than a refactor.

Refs: santifer#521, santifer#599

Co-Authored-By: Luis Johan <luis.j.lithgow@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ageem23
Copy link
Copy Markdown
Contributor

ageem23 commented May 8, 2026

Thanks for the careful review @luisjo88 — really appreciate the thinking here.

Cherry-picked R1 (detect() isolation) and R2 (unknown-provider count) into #573 with you as co-author. Both fit Phase A: R1 extends the loader's existing catch-and-skip pattern, and R2 closes an accounting gap the refactor itself introduced.

R3 I'd like to defer — pre-refactor fetchJson already surfaced AbortError the same opaque way, so changing the message is a behavior improvement rather than a refactor. Belongs in the same follow-up hardening PR I flagged for SSRF. Happy to co-sign if you open it.

The bigger picture: moving to the plugin pattern unblocks contributors who want to add their own scanners, so getting the refactor merged is the priority right now. There's a lot of activity around people wanting to contribute scanners — a separate repo of community plugins would be a really useful next step once the contract is stable.

@santifer
Copy link
Copy Markdown
Owner

Hey @luisjo88, this one developed conflicts after we merged your #602 + the batch of CV/eval fixes. Could you rebase onto current main? The reliability improvements here (isolated detect() exceptions + accurate error counts) are a clean win — happy to merge once it's clean.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
templates/portals.example.yml (1)

318-330: 🧹 Nitpick | 🔵 Trivial

Past concern about scan_method confusion remains partially unaddressed.

The new documentation correctly explains how scan.mjs resolves providers (via explicit provider: field or auto-detection from careers_url), but it doesn't clarify that the scan_method field (still present in many entries below, e.g., lines 342, 374, 381) is not used by scan.mjs for provider resolution.

Users may still be confused about which field controls behavior. Consider adding a brief note that explicitly states: "Note: The scan_method field is not used by scan.mjs provider routing. Use provider: to force a specific provider."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@templates/portals.example.yml` around lines 318 - 330, The docs around
provider routing in scan.mjs are unclear about the unused scan_method field;
update the documentation text near the provider routing block to explicitly
state that scan_method is not used by scan.mjs for provider resolution and that
to force a provider users must set the provider: field; reference scan.mjs, the
scan_method field name, and the provider field name in the single-sentence
clarification so readers know which field controls routing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@templates/portals.example.yml`:
- Around line 318-330: The docs around provider routing in scan.mjs are unclear
about the unused scan_method field; update the documentation text near the
provider routing block to explicitly state that scan_method is not used by
scan.mjs for provider resolution and that to force a provider users must set the
provider: field; reference scan.mjs, the scan_method field name, and the
provider field name in the single-sentence clarification so readers know which
field controls routing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5a7310cd-f6ef-4dd8-b984-dfb5f4bbb491

📥 Commits

Reviewing files that changed from the base of the PR and between ff09198 and 37b6359.

📒 Files selected for processing (2)
  • scan.mjs
  • templates/portals.example.yml

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants