feat(scan): add Workable, SmartRecruiters, Recruitee ATS parsers #653

Open
jrojomartinez wants to merge 7 commits into santifer:main from jrojomartinez:feat/ats-parsers-workable-smartrecruiters-recruitee

@jrojomartinez jrojomartinez commented May 14, 2026

Rewritten for the 1.8.0 plugin-based provider architecture. The original PR edited detectApi / PARSERS in scan.mjs, but those have been removed in 1.8.0 — providers now live in providers/*.mjs and follow the contract in providers/_types.js. This PR delivers three new provider files.

Summary

Adds Workable, SmartRecruiters, and Recruitee as zero-token providers. Strictly additive — existing providers untouched; a user with none of these in tracked_companies sees no behaviour change.

Files

  • providers/workable.mjs — markdown-feed parser (Workable's only no-auth surface)
  • providers/smartrecruiters.mjs — public /postings API
  • providers/recruitee.mjs — public /api/offers/ per-tenant API
  • test-all.mjs — adds §11 / §12 / §13 with ~27 unit-test assertions
  • templates/portals.example.yml — documents the new URL patterns

Design note — fetchText is already there

1.8.0's providers/_http.mjs exports both fetchJson and fetchText. Workable's documented JSON API requires an auth token and the legacy unauthenticated endpoint 404s universally; the only no-auth public feed is a Markdown document at apply.workable.com/{slug}/jobs.md. The Workable provider uses ctx.fetchText + the new parseWorkableMarkdown parser. No _http.mjs changes needed.

SSRF defence (matches providers/greenhouse.mjs)

Each provider:

  1. Parses the resolved URL via new URL(...).
  2. Asserts https: protocol.
  3. Hostname allowlist (apply.workable.com, api.smartrecruiters.com) — or regex for Recruitee since slugs vary per tenant (^[a-z0-9][a-z0-9-]*\.recruitee\.com$).
  4. redirect: 'error' on the fetch call to prevent server-side-redirect SSRF.
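
The four checks above can be sketched as a single guard. This is a minimal illustration under stated assumptions, not the code shipped in the PR: `ALLOWED_HOSTS` and `assertSafeUrl` are hypothetical names, and `RECRUITEE_HOST_RE` simply reuses the regex quoted in step 3. Step 4 is an option on the fetch call itself, so it appears only as a comment.

```javascript
// Hypothetical guard illustrating steps 1-3 of the SSRF defence.
const ALLOWED_HOSTS = new Set(['apply.workable.com', 'api.smartrecruiters.com']);
// Regex quoted in the PR description for per-tenant Recruitee hosts.
const RECRUITEE_HOST_RE = /^[a-z0-9][a-z0-9-]*\.recruitee\.com$/;

function assertSafeUrl(rawUrl) {
  // Step 1: new URL() throws a TypeError on malformed input.
  const url = new URL(rawUrl);
  // Step 2: only HTTPS is accepted.
  if (url.protocol !== 'https:') {
    throw new Error(`refusing non-HTTPS URL: ${rawUrl}`);
  }
  // Step 3: exact allowlist match, or the anchored Recruitee tenant pattern.
  const hostOk = ALLOWED_HOSTS.has(url.hostname) || RECRUITEE_HOST_RE.test(url.hostname);
  if (!hostOk) {
    throw new Error(`hostname not allowed: ${url.hostname}`);
  }
  return url;
}

// Step 4 (not executed here): pass { redirect: 'error' } to fetch() so a
// server-side redirect to another host fails instead of being followed.
```

Because the check runs on the parsed `hostname` rather than on the raw string, an input such as `https://evil.example/apply.workable.com/x` is rejected even though it contains an allowed hostname as a substring.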

Tests

  • node test-all.mjs --quick passes (upstream baseline + ~27 new assertions across §11 / §12 / §13)
  • Each provider's detect() matches its URL pattern and returns null otherwise
  • Each provider's parser handles the documented response shape AND degenerate inputs (empty/null) without crashing
  • Workable parser strips .md suffix; SmartRecruiters parser rewrites j.ref to the public hostname; Recruitee parser prefers careers_url over url
  • fetch() honours the hostname allowlist (sample test exercises the success path)

Validated downstream

  • Workable: optimile (Ghent / Belgium / Hybrid)
  • SmartRecruiters: sgs (known-active tenant)
  • Recruitee: channable

Summary by CodeRabbit

  • New Features
    • Added Recruitee, SmartRecruiters, and Workable integrations — automatic detection, fetching, and normalized job records (title, location, company, apply link). Includes pagination for SmartRecruiters, markdown feed support for Workable, and stricter URL/hostname validation for safety.
  • Tests
    • Added provider-specific unit and integration tests covering detection, parsing, pagination, URL handling, security checks, and edge cases.
  • Documentation
    • Updated example config with notes on provider auto-detection, recognized careers URL patterns, and detection precedence.

coderabbitai Bot commented May 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 30841143-24c9-4056-86ef-deb567293a5d

📥 Commits

Reviewing files that changed from the base of the PR and between 09b6f2b and 434375b.

📒 Files selected for processing (4)
  • providers/recruitee.mjs
  • providers/smartrecruiters.mjs
  • providers/workable.mjs
  • test-all.mjs

📝 Walkthrough

Adds three providers (Workable, SmartRecruiters, Recruitee) that derive tenant feed/API URLs from careers URLs, validate HTTPS and allowlisted hostnames, fetch with redirects disabled, parse responses into normalized job objects, and add tests and documentation.

Changes

Job Feed Providers with Auto-Detection and Parsing

| Layer / File(s) | Summary |
| --- | --- |
| Workable provider with markdown feed parsing: providers/workable.mjs, test-all.mjs (lines 317–456) | Detects Workable tenant slugs and derives https://apply.workable.com/<slug>/jobs.md, enforces HTTPS on apply.workable.com, fetches markdown with redirect: 'error', parses markdown tables for [View](...) rows, extracts title/location and normalizes job URLs (strips .md). Tests validate detect/fetch, parsing, edge cases, and SSRF host allowlisting. |
| SmartRecruiters provider with API validation and response parsing: providers/smartrecruiters.mjs, test-all.mjs (lines 457–645) | Derives https://api.smartrecruiters.com/v1/companies/<slug>/postings from careers/jobs URLs, enforces HTTPS and api.smartrecruiters.com, fetches JSON (redirect: 'error') with pagination and early-exit logic, and normalizes postings (title, rewritten/synthesized URL, company, formatted location with Remote). Tests cover detection patterns, parsing, URL rewriting, fallback generation, and pagination behavior. |
| Recruitee provider with offers API endpoint derivation and response normalization: providers/recruitee.mjs, test-all.mjs (lines 647–744) | Derives https://<slug>.recruitee.com/api/offers/ from <slug>.recruitee.com careers URLs, enforces HTTPS and tenant-subdomain pattern, fetches JSON (redirect: 'error'), and normalizes offers preferring careers_url over url, composing location from explicit fields or city/country with Remote appended. Tests validate detection, parsing rules, and safety for missing/invalid inputs. |
| Provider auto-detection configuration guidance: templates/portals.example.yml | Adds comments explaining provider auto-detection via detect(), lists supported provider URL patterns, and clarifies that an explicit provider: field overrides auto-detection. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding three new ATS provider parsers (Workable, SmartRecruiters, Recruitee) as requested in the PR objectives. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

Workable's documented JSON API requires an auth token; the only
no-auth public surface is a Markdown feed at
`apply.workable.com/<slug>/jobs.md`. The provider auto-detects from
the `apply.workable.com/<slug>` careers_url pattern, fetches via
ctx.fetchText, and parses the table rows.

Follows the SSRF defence pattern from providers/greenhouse.mjs:
hostname allowlist + URL parse + HTTPS check + redirect:'error' on
the fetch call.

Exports parseWorkableMarkdown as a named export so test-all.mjs §11
can unit-test the parser independently of the network.

Tests in test-all.mjs §11:
  - detect() resolves apply.workable.com/<slug> → /jobs.md feed
  - detect() returns null for non-workable URLs
  - parseWorkableMarkdown extracts title/location/company correctly
  - parseWorkableMarkdown strips .md suffix from job URLs
  - empty / null inputs yield empty results without crashing
  - fetch() with allowed hostname reaches the http context

Refs santifer#651
Auto-detects from careers_url pattern
`https://(careers|jobs).smartrecruiters.com/<slug>` and hits the
public /postings endpoint. tracked_companies entries can also set
`provider: smartrecruiters` to bypass detection (useful when the
public careers URL is a branded custom domain like `careers.adyen.com`).

Follows the SSRF defence pattern from providers/greenhouse.mjs:
hostname allowlist (api.smartrecruiters.com) + URL parse + HTTPS
check + redirect:'error'.

Notable parse decisions:
  - location: prefer location.fullLocation; else assemble from
    city/region/country (skipping empties); append "Remote" when
    location.remote is true.
  - url: rewrite j.ref's api.smartrecruiters.com prefix to
    jobs.smartrecruiters.com so the link points at the public job
    page, not the API. Falls back to a synthetic URL when ref is
    missing.
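
The location rule above can be sketched as a small pure function. This is an illustration under assumptions rather than the PR's code: the helper name `formatSmartRecruitersLocation` and the `', '` separator are hypothetical, and the sketch assumes "Remote" is appended in both the `fullLocation` and the assembled case.

```javascript
// Hypothetical helper; the ', ' separator is an assumption, not from the PR.
function formatSmartRecruitersLocation(location) {
  const loc = location || {};
  // Prefer the pre-formatted string; otherwise join the non-empty parts.
  const base = loc.fullLocation
    || [loc.city, loc.region, loc.country].filter(Boolean).join(', ');
  // Append "Remote" only when the flag is strictly true.
  const parts = [base, loc.remote === true ? 'Remote' : ''].filter(Boolean);
  return parts.join(', ');
}

console.log(formatSmartRecruitersLocation({ city: 'Ghent', region: '', country: 'be', remote: true }));
```

Filtering with `Boolean` before joining is what keeps empty fields from producing doubled separators, and a missing `location` object degrades to an empty string instead of throwing.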

Exports parseSmartRecruitersResponse as a named export so
test-all.mjs §12 can unit-test the parser.

Tests in test-all.mjs §12:
  - detect() resolves both careers.* and jobs.* hostnames
  - detect() returns null for non-SR URLs
  - parser uses fullLocation when present
  - parser assembles city/country/remote when fullLocation absent
  - parser rewrites api.smartrecruiters.com → jobs.smartrecruiters.com
  - parser synthesises a URL when ref is missing
  - empty / malformed inputs yield empty results without crashing

Refs santifer#651
Auto-detects from careers_url pattern `https://<slug>.recruitee.com`
and hits the public /api/offers/ endpoint. tracked_companies entries
can also set `provider: recruitee` to bypass detection.

SSRF defence: per-tenant subdomains are the variable part, so a
static hostname allowlist isn't workable. Uses a regex match on
`<safe-slug>.recruitee.com` (`^[a-z0-9][a-z0-9-]*\.recruitee\.com$`)
+ HTTPS check + redirect:'error'. The regex constrains the slug to
safe characters, preventing attacker-controlled hostnames from
slipping through.

Notable parse decisions:
  - url: prefer `careers_url` (the public job page), fall back to
    `url` (some installs use it instead), empty string otherwise.
  - location: prefer the explicit `location` field; else assemble
    from city/country with "Remote" appended when remote is true.

Exports parseRecruiteeResponse as a named export for tests.

Tests in test-all.mjs §13:
  - detect() resolves <slug>.recruitee.com → /api/offers/
  - detect() returns null for non-recruitee URLs
  - parser prefers careers_url over url
  - parser assembles location from city/country/remote
  - parser uses explicit location field when present
  - empty / null inputs yield empty results without crashing

Refs santifer#651
jrojomartinez force-pushed the feat/ats-parsers-workable-smartrecruiters-recruitee branch from f88f2a2 to a67e794 on May 16, 2026 16:29
jrojomartinez marked this pull request as ready for review on May 16, 2026 16:29
@jrojomartinez
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@providers/recruitee.mjs`:
- Around line 26-30: resolveApiUrl currently uses a regex on the whole
careers_url and can be tricked by arbitrary strings; instead, parse
entry.careers_url with the URL constructor, verify protocol === 'https:',
validate hostname exactly matches the pattern "<slug>.recruitee.com" (where slug
matches /^[a-z0-9][a-z0-9-]*$/) by splitting hostname on '.' and checking parts
length and values, then extract slug from the hostname and return
`https://${slug}.recruitee.com/api/offers/`; ensure resolveApiUrl catches URL
parsing errors and returns null for missing, non-https, or non-matching
hostnames to avoid SSRF/command-injection/path-traversal risks.

In `@providers/smartrecruiters.mjs`:
- Around line 26-30: The resolveApiUrl function should parse entry.careers_url
with the URL constructor (guarding with try/catch for invalid/missing values),
then require urlObj.hostname to equal exactly "careers.smartrecruiters.com" or
"jobs.smartrecruiters.com" before extracting the slug from urlObj.pathname
(e.g., the first non-empty path segment) and returning the same API string
(https://api.smartrecruiters.com/v1/companies/{slug}/postings?limit=100&offset=0&status=PUBLIC);
if parsing fails, hostname doesn't match, or the slug is missing, return null.
- Around line 76-78: Validate and parse j.ref with the URL constructor before
doing any replace: check that j.ref is a valid URL whose hostname is
"api.smartrecruiters.com" and whose pathname starts with "/v1/companies/"; only
then map it to the jobs.smartrecruiters.com pattern (preserving protocol and
path parts) and otherwise fall back to a sanitized slug. Replace the current
inline replace logic for the url variable with a guarded branch: attempt to
parse j.ref, validate host/path, build the jobs URL from parsed parts if valid,
else construct the fallback using a slugified companyName (lowercase, trim,
collapse whitespace, remove/replace non-alphanumeric chars with hyphens and
strip leading/trailing hyphens) combined with j.id and slugified; ensure you
handle missing companyName/j.id safely and never trust raw j.ref to prevent
malformed URLs or SSRF.

In `@providers/workable.mjs`:
- Around line 26-30: The current resolveFeedUrl(entry) uses a substring regex
and can misdetect non-Workable URLs; instead, parse entry.careers_url with new
URL() inside resolveFeedUrl, catch any thrown errors and return null for
missing/invalid URLs, verify url.protocol === 'https:' and url.hostname ===
'apply.workable.com', then extract the slug from url.pathname (the first path
segment) and return `https://apply.workable.com/${slug}/jobs.md`; do not rely on
a regex on the raw string and ensure all error paths return null to avoid
SSRF/invalid inputs.

In `@test-all.mjs`:
- Around line 373-387: Add a true-negative SSRF test that ensures untrusted
hosts are rejected and fetchText/fetchJson are never invoked: call
workable.fetch with a careers_url like
"https://evil.example/apply.workable.com/slug" (or similar) and provide
transport handlers where fetchText and fetchJson throw if called; then assert
workable.fetch rejects (or throws) for that input so the test verifies the
untrusted-host path rejects before any network helper is invoked. Reference
workable.fetch and the transport methods fetchText/fetchJson when making the
change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3c7a9f83-d383-4497-ac17-9b85efab2eb7

📥 Commits

Reviewing files that changed from the base of the PR and between 5d1f3a3 and a67e794.

📒 Files selected for processing (5)
  • providers/recruitee.mjs
  • providers/smartrecruiters.mjs
  • providers/workable.mjs
  • templates/portals.example.yml
  • test-all.mjs

Comment thread providers/recruitee.mjs Outdated
Comment thread providers/smartrecruiters.mjs Outdated
Comment thread providers/smartrecruiters.mjs Outdated
Comment thread providers/workable.mjs Outdated
Comment thread test-all.mjs Outdated
Pre-emptive hardening following the same defensive pattern CodeRabbit
flagged on PR santifer#652. All changes are within the providers shipped in
this PR; no scan.mjs / framework changes.

- All three providers: `careers_url` is now type-checked before .match()
  so a non-string YAML value (number, object, array) returns null from
  detect() rather than throwing.

- smartrecruiters: ref-rewrite uses an anchored regex
  (`/^https:\/\/api\.smartrecruiters\.com\/v1\/companies\//`) so the
  replacement only fires at the URL prefix. The fallback URL path (when
  both j.ref AND j.id are missing) now returns an empty string instead
  of synthesising a URL containing the literal "undefined" — the empty
  string is the contract-allowed default for url per _types.js > Job.
  Magic 100 in the postings limit is now a named SR_PAGE_SIZE constant.

- workable: parseWorkableMarkdown now extracts URLs via a line-level
  regex `/\[View\]\(([^)]+)\)/` rather than a column-position match,
  so a title containing a stray `|` doesn't shift cols[7] and silently
  drop the URL. Rows that still don't resolve a URL are skipped (no
  empty-URL entries leak into the dedup tracker).

- test-all.mjs: 6 new assertions covering the defensive paths
  (non-string careers_url across all 3 providers, the SR no-ref/no-id
  fallback, the Workable stray-pipe survival, and a real Workable
  fetch() rejection test against an unresolvable careers_url).
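
The line-level extraction described for Workable can be illustrated as follows. This is a sketch, not the PR's `parseWorkableMarkdown`: the helper name `extractViewUrls`, the sample table layout, and the `/acme/j/ABC123.md` URL shape are all made up for the example. Only the `[View](...)` regex and the `.md` stripping come from the commit message.

```javascript
// Regex quoted in the commit message: matches the link anywhere on the line.
const VIEW_LINK_RE = /\[View\]\(([^)]+)\)/;

// Hypothetical helper: returns the job URLs found in a Markdown feed.
function extractViewUrls(markdown) {
  if (typeof markdown !== 'string') return []; // null / non-string input
  const urls = [];
  for (const line of markdown.split('\n')) {
    const match = line.match(VIEW_LINK_RE);
    if (!match) continue; // header, separator, or row without a link
    urls.push(match[1].replace(/\.md$/, '')); // strip the .md suffix
  }
  return urls;
}

// Made-up sample: the third row has a stray "|" inside the title cell.
const sample = [
  '| Title | Location | Link |',
  '| --- | --- | --- |',
  '| Backend | Infra Engineer | Ghent | [View](https://apply.workable.com/acme/j/ABC123.md) |',
].join('\n');

console.log(extractViewUrls(sample));
```

Since the regex does not depend on the cell index, the extra pipe in the title shifts the columns without affecting which URL is captured.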

Refs santifer#651

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
providers/smartrecruiters.mjs (1)

79-79: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Slugify companyName in fallback URL construction.

When j.ref is missing, the fallback URL uses (companyName || '').toLowerCase() directly, which preserves spaces and special characters (e.g., "SGS Group" → "sgs group"). This produces malformed URL paths.

🔧 Suggested fix
+  const companySlug = (companyName || '').toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-|-$/g, '');
   const url = j.ref
     ? j.ref.replace(/^https:\/\/api\.smartrecruiters\.com\/v1\/companies\//, 'https://jobs.smartrecruiters.com/')
-    : j.id ? `https://jobs.smartrecruiters.com/${(companyName || '').toLowerCase()}/${j.id}-${slugified}` : '';
+    : j.id ? `https://jobs.smartrecruiters.com/${companySlug}/${j.id}-${slugified}` : '';
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@providers/smartrecruiters.mjs` at line 79, The fallback URL uses (companyName
|| '').toLowerCase() which leaves spaces/special chars unescaped; update the
ternary branch that builds the URL for j.id to slugify companyName the same way
as the existing slugified job name (use the same slugifying logic/helper used to
compute slugified) and insert that slugifiedCompanyName in place of (companyName
|| '').toLowerCase() so the URL path becomes
https://jobs.smartrecruiters.com/{slugifiedCompanyName}/{j.id}-{slugified}.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b1eec1a7-9d2d-41d2-98cb-3e14f71d0722

📥 Commits

Reviewing files that changed from the base of the PR and between a67e794 and fcab2cc.

📒 Files selected for processing (4)
  • providers/recruitee.mjs
  • providers/smartrecruiters.mjs
  • providers/workable.mjs
  • test-all.mjs

Addresses 5 CodeRabbit comments on PR santifer#653 asking for tighter
validation than substring regex on raw URL strings.

- All 3 providers: detect()/resolveXxxUrl() now use new URL() to
  parse careers_url, verify protocol === 'https:', check hostname
  exactly (Workable: apply.workable.com; SmartRecruiters:
  careers./jobs.smartrecruiters.com; Recruitee: regex-validated
  <slug>.recruitee.com), then derive the slug from the parsed
  pathname/hostname. This rejects path-spoofed inputs like
  https://evil.example/apply.workable.com/slug (substring regex
  would have falsely matched).

- smartrecruiters parseSmartRecruitersResponse: j.ref is now
  validated (parses as URL, hostname must be api.smartrecruiters.com,
  pathname must start with /v1/companies/) before the prefix rewrite.
  Invalid refs fall through to the fallback URL path. The fallback
  companyName is now slugified (non-alphanumerics → -, strip
  leading/trailing -) so "My Acme & Co." → "my-acme-co" rather than
  producing a URL with raw spaces/symbols.

- test-all.mjs: 5 new assertions covering the path-spoof rejection
  for all 3 providers, the untrusted-ref-host fall-through, and the
  companyName slugification.
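
A minimal sketch of the detect-side check for Workable, assuming the behaviour described above. The function body is a reconstruction for illustration, not the code in providers/workable.mjs, and the slug character set is an assumption.

```javascript
// Illustrative version of a Workable feed resolver using new URL().
function resolveFeedUrl(entry) {
  const raw = entry && entry.careers_url;
  if (typeof raw !== 'string') return null; // non-string YAML values

  let url;
  try {
    url = new URL(raw);
  } catch (err) {
    return null; // not a parseable absolute URL
  }

  // Exact protocol and hostname match; no substring matching on the raw string.
  if (url.protocol !== 'https:' || url.hostname !== 'apply.workable.com') return null;

  // The tenant slug is the first non-empty path segment.
  const slug = url.pathname.split('/').filter(Boolean)[0];
  // Assumed slug charset; the PR does not state one for Workable.
  if (!slug || !/^[a-z0-9][a-z0-9-]*$/i.test(slug)) return null;

  return `https://apply.workable.com/${slug}/jobs.md`;
}

console.log(resolveFeedUrl({ careers_url: 'https://apply.workable.com/optimile' }));
console.log(resolveFeedUrl({ careers_url: 'https://evil.example/apply.workable.com/slug' }));
```

For the path-spoofed input the parsed hostname is `evil.example`, so the function returns `null` before any slug is extracted.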

Refs santifer#651
@jrojomartinez
Author

Thanks for the thorough review — addressed all 5 comments in the latest force-push. Summary:

  • All 3 providers: resolveXxxUrl now uses new URL() + strict hostname checks instead of substring regex. Path-spoofed URLs (e.g. https://evil.example/apply.workable.com/slug) no longer falsely match.
  • SmartRecruiters j.ref: validated via new URL() (hostname must be api.smartrecruiters.com, pathname must start with /v1/companies/) before the prefix rewrite. Invalid refs fall through to the fallback URL path.
  • SmartRecruiters fallback slug: companyName is now slugified (non-alphanumerics → -, strip leading/trailing) so a name like "My Acme & Co." produces my-acme-co instead of leaking spaces/symbols.
  • Workable / SR / Recruitee tests: added path-spoof rejection tests for each. The j.ref and slugification cases also get explicit tests in §12.
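
The `j.ref` validation and the slugified fallback can be sketched together. This is an illustration under assumptions, not the PR's parser: `slugify` and `publicJobUrl` are hypothetical names, the sketch drops any query string on the ref, and it omits the dash when the job name slug is empty.

```javascript
// Hypothetical slug helper: non-alphanumerics become "-", edges are trimmed.
function slugify(text) {
  return String(text || '')
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Hypothetical helper: maps an API ref to the public job page, or falls back.
function publicJobUrl(job, companyName) {
  const j = job || {};
  try {
    const ref = new URL(j.ref); // throws when ref is missing or malformed
    if (ref.protocol === 'https:'
      && ref.hostname === 'api.smartrecruiters.com'
      && ref.pathname.startsWith('/v1/companies/')) {
      const rest = ref.pathname.slice('/v1/companies/'.length);
      return `https://jobs.smartrecruiters.com/${rest}`;
    }
  } catch (err) {
    // Untrusted or unparseable ref: fall through to the fallback below.
  }
  if (!j.id) return ''; // neither a usable ref nor an id
  const nameSlug = slugify(j.name);
  const tail = nameSlug ? `${j.id}-${nameSlug}` : `${j.id}`;
  return `https://jobs.smartrecruiters.com/${slugify(companyName)}/${tail}`;
}

console.log(publicJobUrl({ id: '42', name: 'Data Engineer' }, 'My Acme & Co.'));
```

Validating the parsed ref first means a ref on any other host is never rewritten; it is treated the same as a missing ref.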

Ready for re-review.

@jrojomartinez
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@providers/recruitee.mjs`:
- Around line 80-84: parseRecruiteeResponse currently returns j.careers_url or
j.url verbatim which can lead to off-domain or non-HTTPS job.url values; modify
parseRecruiteeResponse to validate and normalize the URL: parse
j.careers_url/j.url with the URL constructor, ensure protocol === 'https:' and
hostname matches the tenant API host (or an allowed host list derived from the
request/tenant), and only return the validated URL (otherwise return '' or
undefined); reference parseRecruiteeResponse, job.url, j.careers_url, and j.url
when implementing this check.

In `@providers/smartrecruiters.mjs`:
- Line 41: The current postings URL builder hardcodes limit and offset (uses
SR_PAGE_SIZE and offset=0) and the fetch logic only issues a single request,
truncating results; modify the postings fetch flow to page through results by
looping requests: call the URL builder (the function that returns
`https://api.smartrecruiters.com/v1/companies/${slug}/postings?...`) with an
increasing offset (offset += SR_PAGE_SIZE) and fetch repeatedly until the API
returns an empty `content` array, aggregating each response's content into a
single results array; ensure SR_PAGE_SIZE is used for limit, handle HTTP errors
as the existing fetch logic does, and return the combined list instead of a
single-page response.

In `@providers/workable.mjs`:
- Around line 84-88: In parseWorkableMarkdown(), don't trust the raw url from
urlMatch; attempt to construct a URL object from the extracted string (wrap in
try/catch to handle malformed values) and validate that urlObj.protocol ===
'https:' and urlObj.hostname matches your allowed host(s) (or the same
origin/host used when fetching the feed) before pushing to jobs; if validation
fails or URL construction throws, skip that row (i.e., do not push to jobs). Use
the existing symbols urlMatch, url, and jobs and add hostname/protocol checks
and error-safe parsing inside that loop.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c4afeaa3-d7f3-4f45-873a-760e2b2b4cb2

📥 Commits

Reviewing files that changed from the base of the PR and between fcab2cc and 09b6f2b.

📒 Files selected for processing (4)
  • providers/recruitee.mjs
  • providers/smartrecruiters.mjs
  • providers/workable.mjs
  • test-all.mjs

Comment thread providers/recruitee.mjs
Comment thread providers/smartrecruiters.mjs Outdated
Comment thread providers/workable.mjs
Addresses 3 CodeRabbit comments on PR santifer#653 (round 2).

- recruitee: parseRecruiteeResponse now validates the offer URL via
  new URL() + protocol === 'https:' + RECRUITEE_HOST_RE hostname
  check. Off-domain or non-HTTPS values are dropped (url = '' per
  the Job contract) rather than passed through verbatim.

- workable: parseWorkableMarkdown now validates each [View] link
  the same way (hostname must be apply.workable.com, protocol
  must be https). Rows that fail validation are skipped (continue),
  matching the existing "skip rows with no resolvable URL" semantic.

- smartrecruiters: fetch() now paginates the /postings endpoint
  instead of returning only the first 100 results. Added
  resolveSlug() and buildPostingsUrl(slug, offset) helpers,
  refactored resolveApiUrl() to delegate to them, and the fetch
  loop walks offsets 0, SR_PAGE_SIZE, 2*SR_PAGE_SIZE, ... until
  either an empty page or a short page (less than SR_PAGE_SIZE).
  Safety cap SR_MAX_PAGES = 50 (= 5000 postings) prevents runaway
  loops against a broken API.

- test-all.mjs: 4 new assertions
  - Workable: off-domain + non-https [View] links are dropped
  - Recruitee: off-domain + non-https + missing offer URLs → url=''
  - SmartRecruiters: 2-page aggregation (150 items across 2 pages)
  - SmartRecruiters: stop on the first empty page (1 request)
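
The pagination loop described for SmartRecruiters might look roughly like the sketch below. The constants `SR_PAGE_SIZE` and `SR_MAX_PAGES` and the helper name `buildPostingsUrl` come from the commit message; the loop shape, the `fetchAllPostings` name, and the injected `fetchJson` parameter are assumptions made so the example runs without the network.

```javascript
const SR_PAGE_SIZE = 100; // postings requested per page
const SR_MAX_PAGES = 50;  // safety cap: at most 5000 postings

function buildPostingsUrl(slug, offset) {
  return `https://api.smartrecruiters.com/v1/companies/${encodeURIComponent(slug)}/postings`
    + `?limit=${SR_PAGE_SIZE}&offset=${offset}&status=PUBLIC`;
}

// fetchJson is injected (hypothetical signature: url -> Promise of parsed JSON).
async function fetchAllPostings(slug, fetchJson) {
  const all = [];
  for (let page = 0; page < SR_MAX_PAGES; page += 1) {
    const data = await fetchJson(buildPostingsUrl(slug, page * SR_PAGE_SIZE));
    const content = Array.isArray(data && data.content) ? data.content : [];
    all.push(...content);
    // An empty or short page means there is nothing further to fetch.
    if (content.length < SR_PAGE_SIZE) break;
  }
  return all;
}
```

With a stub that serves 150 postings, the loop issues two requests (offsets 0 and 100) and stops on the short second page. The page cap bounds the work if an API keeps returning full pages indefinitely.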

Refs santifer#651
@jrojomartinez
Author

Round 2 fixes pushed:

  • Recruitee parser: parseRecruiteeResponse now validates each offer URL via new URL() + https: + RECRUITEE_HOST_RE. Off-domain / non-HTTPS values become url: "" rather than leaking through.
  • Workable parser: parseWorkableMarkdown validates each [View] link the same way (apply.workable.com + https:). Rows that fail validation are skipped.
  • SmartRecruiters pagination: fetch() now loops offsets 0, 100, 200, ... (using SR_PAGE_SIZE) until it sees an empty page or a short page. Refactored into resolveSlug() + buildPostingsUrl(slug, offset) helpers. Safety cap at 50 pages (5000 postings).
  • Tests: 4 new assertions cover URL validation for both parsers and the 2-page-aggregation / empty-page-stop paths for SR pagination.
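
The Recruitee offer URL check can be sketched as below. This is an illustration, not the code in providers/recruitee.mjs: `pickOfferUrl` is a hypothetical name, and the sketch assumes only the preferred candidate (`careers_url`, else `url`) is validated, with no retry on the other field when validation fails.

```javascript
// Regex quoted earlier in the PR for per-tenant Recruitee hostnames.
const RECRUITEE_HOST_RE = /^[a-z0-9][a-z0-9-]*\.recruitee\.com$/;

// Hypothetical helper: returns a validated offer URL or '' (the Job default).
function pickOfferUrl(offer) {
  const o = offer || {};
  const candidate = o.careers_url || o.url; // prefer careers_url over url
  if (typeof candidate !== 'string') return '';
  try {
    const parsed = new URL(candidate);
    const ok = parsed.protocol === 'https:' && RECRUITEE_HOST_RE.test(parsed.hostname);
    return ok ? candidate : '';
  } catch (err) {
    return ''; // unparseable value
  }
}

console.log(pickOfferUrl({ careers_url: 'https://channable.recruitee.com/o/dev' }));
```

Returning an empty string rather than throwing keeps a single bad offer from aborting the whole parse.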

Ready for re-review.

@jrojomartinez
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@jrojomartinez
Author

@coderabbitai review — checking against latest commit (434375b) after the round-2 fixes for the 3 comments from 17:04.

@coderabbitai

coderabbitai Bot commented May 16, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== providers/workable.mjs ==="
cat providers/workable.mjs

echo ""
echo "=== providers/smartrecruiters.mjs ==="
cat providers/smartrecruiters.mjs

echo ""
echo "=== providers/recruitee.mjs ==="
cat providers/recruitee.mjs

echo ""
echo "=== templates/portals.example.yml ==="
cat templates/portals.example.yml

echo ""
echo "=== test-all.mjs (tail: new sections only) ==="
# Show sections 11-13 of tests
grep -n "§1[123]\|workable\|smartrecruiters\|recruitee\|SR_PAGE\|SR_MAX\|pagination\|parseWorkable\|parseSmartRec\|parseRecruitee" test-all.mjs | head -120

Length of output: 65977


@jrojomartinez — reviewed against commit 434375b. All five original comments and the three round-2 comments are fully addressed. Here's the final assessment:


✅ Round-2 fixes verified

| Issue | File | Status |
| --- | --- | --- |
| Recruitee offer URL validation | providers/recruitee.mjs | new URL() + https: + RECRUITEE_HOST_RE — off-domain/non-HTTPS → url: "" |
| Workable [View] link validation | providers/workable.mjs | new URL() + https: + hostname === 'apply.workable.com' — invalid rows skipped |
| SmartRecruiters pagination | providers/smartrecruiters.mjs | buildPostingsUrl(slug, offset), stops on empty page and short page, capped at SR_MAX_PAGES = 50 |

✅ All earlier comments confirmed resolved

  • Path-spoof prevention: all three providers use new URL() + exact hostname check (not substring/regex on the full URL string). ✅
  • j.ref validation: hostname === 'api.smartrecruiters.com' + pathname.startsWith('/v1/companies/') before rewrite; untrusted refs fall through to the fallback. ✅
  • Fallback slug sanitisation: companyName is slugified ([^a-z0-9]+ → -, trim leading/trailing -) before embedding in the fallback URL. ✅

🔍 One minor nit (non-blocking)

In parseSmartRecruitersResponse, if a posting has an id but an empty name, the fallback URL ends with a trailing dash:

// name = '' → slugified = '' → url = `…/${j.id}-`
url = `https://jobs.smartrecruiters.com/${companySlug}/${j.id}-${slugified}`;

You could guard this with:

url = slugified
  ? `https://jobs.smartrecruiters.com/${companySlug}/${j.id}-${slugified}`
  : `https://jobs.smartrecruiters.com/${companySlug}/${j.id}`;

Not a security concern — just cosmetic for the rare case of a nameless posting.


Overall

The implementation is solid — clean architecture, proper SSRF mitigations (redirect: 'error' + allowlists/regex on every fetch), good pagination design, and thorough test coverage (§11–§13 covers detect, parsing, URL validation, path-spoof, and pagination edge cases). The change is strictly additive and existing providers are untouched. Ready to merge.

[approve]
