feat(scan): add Workable, SmartRecruiters, Recruitee ATS parsers by jrojomartinez · Pull Request #653 · santifer/career-ops

jrojomartinez · 2026-05-14T23:10:51Z

Rewritten for the 1.8.0 plugin-based provider architecture. The original PR edited detectApi / PARSERS in scan.mjs, but those have been removed in 1.8.0 — providers now live in providers/*.mjs and follow the contract in providers/_types.js. This PR delivers three new provider files.

Summary

Adds Workable, SmartRecruiters, and Recruitee as zero-token providers. Strictly additive — existing providers untouched; a user with none of these in tracked_companies sees no behaviour change.

Files

providers/workable.mjs — markdown-feed parser (Workable's only no-auth surface)
providers/smartrecruiters.mjs — public /postings API
providers/recruitee.mjs — public /api/offers/ per-tenant API
test-all.mjs — adds §11 / §12 / §13 with ~27 unit-test assertions
templates/portals.example.yml — documents the new URL patterns

Design note — `fetchText` is already there

1.8.0's providers/_http.mjs exports both fetchJson and fetchText. Workable's documented JSON API requires an auth token and the legacy unauthenticated endpoint 404s universally; the only no-auth public feed is a Markdown document at apply.workable.com/{slug}/jobs.md. The Workable provider uses ctx.fetchText + the new parseWorkableMarkdown parser. No _http.mjs changes needed.

SSRF defence (matches `providers/greenhouse.mjs`)

Each provider:

Parses the resolved URL via new URL(...).
Asserts https: protocol.
Hostname allowlist (apply.workable.com, api.smartrecruiters.com) — or regex for Recruitee since slugs vary per tenant (^[a-z0-9][a-z0-9-]*\.recruitee\.com$).
redirect: 'error' on the fetch call to prevent server-side-redirect SSRF.
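
The four checks above can be sketched as one guard function. This is a minimal illustration rather than the code shipped in providers/*.mjs: `assertAllowedUrl`, `ALLOWED_HOSTS`, `RECRUITEE_HOST` and `FETCH_OPTIONS` are hypothetical names; only the hostnames, the Recruitee regex and the `redirect: 'error'` option come from the description.

```javascript
// Hypothetical helper names; only the hosts and the regex come from the PR text.
const ALLOWED_HOSTS = new Set(['apply.workable.com', 'api.smartrecruiters.com']);
const RECRUITEE_HOST = /^[a-z0-9][a-z0-9-]*\.recruitee\.com$/;

function assertAllowedUrl(rawUrl) {
  let parsed;
  try {
    parsed = new URL(rawUrl); // 1. parse the resolved URL
  } catch {
    throw new Error(`invalid URL: ${rawUrl}`);
  }
  if (parsed.protocol !== 'https:') { // 2. HTTPS only
    throw new Error(`non-HTTPS URL rejected: ${rawUrl}`);
  }
  const host = parsed.hostname; // already lowercased by the URL parser
  if (!ALLOWED_HOSTS.has(host) && !RECRUITEE_HOST.test(host)) { // 3. allowlist or tenant regex
    throw new Error(`host not allowed: ${host}`);
  }
  return parsed.href;
}

// 4. With redirect: 'error', fetch rejects instead of following a redirect.
const FETCH_OPTIONS = { redirect: 'error' };
```

Checking `parsed.hostname` rather than the raw string is what stops a value such as `https://evil.example/apply.workable.com/acme` from passing: the allowlisted name appears only in the path there.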

Tests

node test-all.mjs --quick passes (upstream baseline + ~27 new assertions across §11 / §12 / §13)
Each provider's detect() matches its URL pattern and returns null otherwise
Each provider's parser handles the documented response shape AND degenerate inputs (empty/null) without crashing
Workable parser strips .md suffix; SmartRecruiters parser rewrites j.ref to the public hostname; Recruitee parser prefers careers_url over url
fetch() honours the hostname allowlist (sample test exercises the success path)

Validated downstream

Workable: optimile (Ghent / Belgium / Hybrid)
SmartRecruiters: sgs (known-active tenant)
Recruitee: channable

Summary by CodeRabbit

New Features
- Added Recruitee, SmartRecruiters, and Workable integrations — automatic detection, fetching, and normalized job records (title, location, company, apply link). Includes pagination for SmartRecruiters, markdown feed support for Workable, and stricter URL/hostname validation for safety.
Tests
- Added provider-specific unit and integration tests covering detection, parsing, pagination, URL handling, security checks, and edge cases.
Documentation
- Updated example config with notes on provider auto-detection, recognized careers URL patterns, and detection precedence.

coderabbitai · 2026-05-14T23:10:57Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 30841143-24c9-4056-86ef-deb567293a5d

📥 Commits

Reviewing files that changed from the base of the PR and between 09b6f2b and 434375b.

📒 Files selected for processing (4)

providers/recruitee.mjs
providers/smartrecruiters.mjs
providers/workable.mjs
test-all.mjs

📝 Walkthrough

Walkthrough

Adds three providers (Workable, SmartRecruiters, Recruitee) that derive tenant feed/API URLs from careers URLs, validate HTTPS and allowlisted hostnames, fetch with redirects disabled, parse responses into normalized job objects, and add tests and documentation.

Changes

Job Feed Providers with Auto-Detection and Parsing

Layer / File(s)	Summary
Workable provider with markdown feed parsing `providers/workable.mjs`, `test-all.mjs` (lines 317–456)	Detects Workable tenant slugs and derives `https://apply.workable.com/<slug>/jobs.md`, enforces HTTPS on `apply.workable.com`, fetches markdown with `redirect: 'error'`, parses markdown tables for `[View](...)` rows, extracts title/location and normalizes job URLs (strips `.md`). Tests validate detect/fetch, parsing, edge cases, and SSRF host allowlisting.
SmartRecruiters provider with API validation and response parsing `providers/smartrecruiters.mjs`, `test-all.mjs` (lines 457–645)	Derives `https://api.smartrecruiters.com/v1/companies/<slug>/postings` from careers/jobs URLs, enforces HTTPS and `api.smartrecruiters.com`, fetches JSON (`redirect: 'error'`) with pagination and early-exit logic, and normalizes postings (title, rewritten/synthesized URL, company, formatted location with Remote). Tests cover detection patterns, parsing, URL rewriting, fallback generation, and pagination behavior.
Recruitee provider with offers API endpoint derivation and response normalization `providers/recruitee.mjs`, `test-all.mjs` (lines 647–744)	Derives `https://<slug>.recruitee.com/api/offers/` from `<slug>.recruitee.com` careers URLs, enforces HTTPS and tenant-subdomain pattern, fetches JSON (`redirect: 'error'`), and normalizes offers preferring `careers_url` over `url`, composing location from explicit fields or city/country with Remote appended. Tests validate detection, parsing rules, and safety for missing/invalid inputs.
Provider auto-detection configuration guidance `templates/portals.example.yml`	Adds comments explaining provider auto-detection via `detect()`, lists supported provider URL patterns, and clarifies that an explicit `provider:` field overrides auto-detection.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

scan.mjs: add Workable, SmartRecruiters, Recruitee ATS parsers #651: Implements the same set of provider detection/parsing support (Workable, SmartRecruiters, Recruitee); likely related and can be linked to this PR.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: adding three new ATS provider parsers (Workable, SmartRecruiters, Recruitee) as requested in the PR objectives.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

Workable's documented JSON API requires an auth token; the only no-auth public surface is a Markdown feed at `apply.workable.com/<slug>/jobs.md`. The provider auto-detects from the `apply.workable.com/<slug>` careers_url pattern, fetches via ctx.fetchText, and parses the table rows.

Follows the SSRF defence pattern from providers/greenhouse.mjs: hostname allowlist + URL parse + HTTPS check + redirect:'error' on the fetch call.

Exports parseWorkableMarkdown as a named export so test-all.mjs §11 can unit-test the parser independently of the network.

Tests in test-all.mjs §11:
- detect() resolves apply.workable.com/<slug> → /jobs.md feed
- detect() returns null for non-workable URLs
- parseWorkableMarkdown extracts title/location/company correctly
- parseWorkableMarkdown strips .md suffix from job URLs
- empty / null inputs yield empty results without crashing
- fetch() with allowed hostname reaches the http context

Refs santifer#651

Auto-detects from careers_url pattern `https://(careers|jobs).smartrecruiters.com/<slug>` and hits the public /postings endpoint. tracked_companies entries can also set `provider: smartrecruiters` to bypass detection (useful when the public careers URL is a branded custom domain like `careers.adyen.com`).

Follows the SSRF defence pattern from providers/greenhouse.mjs: hostname allowlist (api.smartrecruiters.com) + URL parse + HTTPS check + redirect:'error'.

Notable parse decisions:
- location: prefer location.fullLocation; else assemble from city/region/country (skipping empties); append "Remote" when location.remote is true.
- url: rewrite j.ref's api.smartrecruiters.com prefix to jobs.smartrecruiters.com so the link points at the public job page, not the API. Falls back to a synthetic URL when ref is missing.

Exports parseSmartRecruitersResponse as a named export so test-all.mjs §12 can unit-test the parser.

Tests in test-all.mjs §12:
- detect() resolves both careers.* and jobs.* hostnames
- detect() returns null for non-SR URLs
- parser uses fullLocation when present
- parser assembles city/country/remote when fullLocation absent
- parser rewrites api.smartrecruiters.com → jobs.smartrecruiters.com
- parser synthesises a URL when ref is missing
- empty / malformed inputs yield empty results without crashing

Refs santifer#651
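
The location rule described in this commit could look roughly like the sketch below. `formatLocation` is a hypothetical name, and two details are assumptions because the commit message does not spell them out: parts are joined with ", ", and "Remote" is appended whether or not fullLocation was used.

```javascript
// Hypothetical helper: prefer fullLocation, else join the non-empty parts.
function formatLocation(location) {
  if (!location || typeof location !== 'object') return '';
  const base = location.fullLocation
    || [location.city, location.region, location.country].filter(Boolean).join(', ');
  if (!location.remote) return base;
  // Assumption: "Remote" is appended as one more comma-separated part.
  return base ? `${base}, Remote` : 'Remote';
}
```

Filtering with `Boolean` is what skips empty strings and missing fields, so a posting with only a city and a country does not produce a doubled separator.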

Auto-detects from careers_url pattern `https://<slug>.recruitee.com` and hits the public /api/offers/ endpoint. tracked_companies entries can also set `provider: recruitee` to bypass detection.

SSRF defence: per-tenant subdomains are the variable part, so a static hostname allowlist isn't workable. Uses a regex match on `<safe-slug>.recruitee.com` (`^[a-z0-9][a-z0-9-]*\.recruitee\.com$`) + HTTPS check + redirect:'error'. The regex constrains the slug to safe characters, preventing attacker-controlled hostnames from slipping through.

Notable parse decisions:
- url: prefer `careers_url` (the public job page), fall back to `url` (some installs use it instead), empty string otherwise.
- location: prefer the explicit `location` field; else assemble from city/country with "Remote" appended when remote is true.

Exports parseRecruiteeResponse as a named export for tests.

Tests in test-all.mjs §13:
- detect() resolves <slug>.recruitee.com → /api/offers/
- detect() returns null for non-recruitee URLs
- parser prefers careers_url over url
- parser assembles location from city/country/remote
- parser uses explicit location field when present
- empty / null inputs yield empty results without crashing

Refs santifer#651
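
As an illustration of the tenant-subdomain check, here is one way the slug could be derived and the API URL built. The function name `resolveRecruiteeApiUrl` is hypothetical, and the sketch applies the regex to the parsed hostname rather than to the raw string, which is the stricter form the later review rounds ask for.

```javascript
const RECRUITEE_HOST_RE = /^[a-z0-9][a-z0-9-]*\.recruitee\.com$/;

// Hypothetical sketch: returns the offers API URL, or null when the input is not
// an https URL on a single-label <slug>.recruitee.com host.
function resolveRecruiteeApiUrl(careersUrl) {
  if (typeof careersUrl !== 'string') return null; // non-string YAML values
  let parsed;
  try {
    parsed = new URL(careersUrl);
  } catch {
    return null;
  }
  if (parsed.protocol !== 'https:') return null;
  if (!RECRUITEE_HOST_RE.test(parsed.hostname)) return null;
  const slug = parsed.hostname.split('.')[0];
  return `https://${slug}.recruitee.com/api/offers/`;
}
```

Because the slug character class excludes dots, a nested host such as `a.b.recruitee.com` does not match, and neither does a URL that only mentions `recruitee.com` in its path.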

…erns

jrojomartinez · 2026-05-16T16:29:46Z

@coderabbitai review

coderabbitai · 2026-05-16T16:29:51Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@providers/recruitee.mjs`:
- Around line 26-30: resolveApiUrl currently uses a regex on the whole
careers_url and can be tricked by arbitrary strings; instead, parse
entry.careers_url with the URL constructor, verify protocol === 'https:',
validate hostname exactly matches the pattern "<slug>.recruitee.com" (where slug
matches /^[a-z0-9][a-z0-9-]*$/) by splitting hostname on '.' and checking parts
length and values, then extract slug from the hostname and return
`https://${slug}.recruitee.com/api/offers/`; ensure resolveApiUrl catches URL
parsing errors and returns null for missing, non-https, or non-matching
hostnames to avoid SSRF/command-injection/path-traversal risks.

In `@providers/smartrecruiters.mjs`:
- Around line 26-30: The resolveApiUrl function should parse entry.careers_url
with the URL constructor (guarding with try/catch for invalid/missing values),
then require urlObj.hostname to equal exactly "careers.smartrecruiters.com" or
"jobs.smartrecruiters.com" before extracting the slug from urlObj.pathname
(e.g., the first non-empty path segment) and returning the same API string
(https://api.smartrecruiters.com/v1/companies/{slug}/postings?limit=100&offset=0&status=PUBLIC);
if parsing fails, hostname doesn't match, or the slug is missing, return null.
- Around line 76-78: Validate and parse j.ref with the URL constructor before
doing any replace: check that j.ref is a valid URL whose hostname is
"api.smartrecruiters.com" and whose pathname starts with "/v1/companies/"; only
then map it to the jobs.smartrecruiters.com pattern (preserving protocol and
path parts) and otherwise fall back to a sanitized slug. Replace the current
inline replace logic for the url variable with a guarded branch: attempt to
parse j.ref, validate host/path, build the jobs URL from parsed parts if valid,
else construct the fallback using a slugified companyName (lowercase, trim,
collapse whitespace, remove/replace non-alphanumeric chars with hyphens and
strip leading/trailing hyphens) combined with j.id and slugified; ensure you
handle missing companyName/j.id safely and never trust raw j.ref to prevent
malformed URLs or SSRF.

In `@providers/workable.mjs`:
- Around line 26-30: The current resolveFeedUrl(entry) uses a substring regex
and can misdetect non-Workable URLs; instead, parse entry.careers_url with new
URL() inside resolveFeedUrl, catch any thrown errors and return null for
missing/invalid URLs, verify url.protocol === 'https:' and url.hostname ===
'apply.workable.com', then extract the slug from url.pathname (the first path
segment) and return `https://apply.workable.com/${slug}/jobs.md`; do not rely on
a regex on the raw string and ensure all error paths return null to avoid
SSRF/invalid inputs.

In `@test-all.mjs`:
- Around line 373-387: Add a true-negative SSRF test that ensures untrusted
hosts are rejected and fetchText/fetchJson are never invoked: call
workable.fetch with a careers_url like
"https://evil.example/apply.workable.com/slug" (or similar) and provide
transport handlers where fetchText and fetchJson throw if called; then assert
workable.fetch rejects (or throws) for that input so the test verifies the
untrusted-host path rejects before any network helper is invoked. Reference
workable.fetch and the transport methods fetchText/fetchJson when making the
change.
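
A true-negative test of the kind requested for test-all.mjs might be shaped like the sketch below. Everything here is illustrative: the `workable` object is a stand-in written so the example runs on its own, the `fetch(entry, ctx)` signature is an assumption, and `rejectsWithoutNetwork` is a hypothetical helper.

```javascript
// Stand-in provider (hypothetical) so the sketch is self-contained.
const workable = {
  async fetch(entry, ctx) {
    let parsed = null;
    try { parsed = new URL(entry.careers_url); } catch { /* handled below */ }
    if (!parsed || parsed.protocol !== 'https:' || parsed.hostname !== 'apply.workable.com') {
      throw new Error('workable: untrusted careers_url');
    }
    const slug = parsed.pathname.split('/').filter(Boolean)[0];
    return ctx.fetchText(`https://apply.workable.com/${slug}/jobs.md`, { redirect: 'error' });
  },
};

// Resolves to true only if fetch() rejects AND no transport helper was invoked.
async function rejectsWithoutNetwork(provider, careersUrl) {
  let calls = 0;
  const ctx = {
    fetchText: async () => { calls += 1; throw new Error('fetchText must not be called'); },
    fetchJson: async () => { calls += 1; throw new Error('fetchJson must not be called'); },
  };
  try {
    await provider.fetch({ careers_url: careersUrl }, ctx);
    return false; // resolved: the untrusted host was not rejected
  } catch {
    return calls === 0; // rejected before any network helper ran
  }
}
```

Counting calls, instead of only asserting a rejection, distinguishes "rejected by the host check" from "rejected because the throwing transport was reached".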

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3c7a9f83-d383-4497-ac17-9b85efab2eb7

📥 Commits

Reviewing files that changed from the base of the PR and between 5d1f3a3 and a67e794.

📒 Files selected for processing (5)

providers/recruitee.mjs
providers/smartrecruiters.mjs
providers/workable.mjs
templates/portals.example.yml
test-all.mjs

Pre-emptive hardening following the same defensive pattern CodeRabbit flagged on PR santifer#652. All changes are within the providers shipped in this PR; no scan.mjs / framework changes.

- All three providers: `careers_url` is now type-checked before .match() so a non-string YAML value (number, object, array) returns null from detect() rather than throwing.
- smartrecruiters: ref-rewrite uses an anchored regex (`/^https:\/\/api\.smartrecruiters\.com\/v1\/companies\//`) so the replacement only fires at the URL prefix. The fallback URL path (when both j.ref AND j.id are missing) now returns an empty string instead of synthesising a URL containing the literal "undefined"; the empty string is the contract-allowed default for url per _types.js > Job. Magic 100 in the postings limit is now a named SR_PAGE_SIZE constant.
- workable: parseWorkableMarkdown now extracts URLs via a line-level regex `/\[View\]\(([^)]+)\)/` rather than a column-position match, so a title containing a stray `|` doesn't shift cols[7] and silently drop the URL. Rows that still don't resolve a URL are skipped (no empty-URL entries leak into the dedup tracker).
- test-all.mjs: 6 new assertions covering the defensive paths (non-string careers_url across all 3 providers, the SR no-ref/no-id fallback, the Workable stray-pipe survival, and a real Workable fetch() rejection test against an unresolvable careers_url).

Refs santifer#651
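
To illustrate why a line-level match survives a stray pipe, here is a minimal sketch of the URL extraction step only. `extractViewUrls` is a hypothetical name, and the sample row layout used to exercise it is assumed, since the feed format is not shown in this thread.

```javascript
// Matches the link target in a Markdown "[View](...)" cell anywhere on the line.
const VIEW_LINK_RE = /\[View\]\(([^)]+)\)/;

// Hypothetical helper: returns one public job URL per row that carries a [View] link.
function extractViewUrls(markdown) {
  if (typeof markdown !== 'string') return [];
  const urls = [];
  for (const line of markdown.split('\n')) {
    const match = line.match(VIEW_LINK_RE);
    if (!match) continue; // header, separator, or a row without a resolvable URL
    urls.push(match[1].replace(/\.md$/, '')); // strip the .md suffix
  }
  return urls;
}
```

Since the regex never counts `|` separators, a title such as `Engineer | Backend` changes the number of cells in the row but not the position of the match.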

coderabbitai

♻️ Duplicate comments (1)

providers/smartrecruiters.mjs (1)

79-79: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Slugify companyName in fallback URL construction.

When j.ref is missing, the fallback URL uses (companyName || '').toLowerCase() directly, which preserves spaces and special characters (e.g., "SGS Group" → "sgs group"). This produces malformed URL paths.

🔧 Suggested fix

+  const companySlug = (companyName || '').toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-|-$/g, '');
   const url = j.ref
     ? j.ref.replace(/^https:\/\/api\.smartrecruiters\.com\/v1\/companies\//, 'https://jobs.smartrecruiters.com/')
-    : j.id ? `https://jobs.smartrecruiters.com/${(companyName || '').toLowerCase()}/${j.id}-${slugified}` : '';
+    : j.id ? `https://jobs.smartrecruiters.com/${companySlug}/${j.id}-${slugified}` : '';

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@providers/smartrecruiters.mjs` at line 79, The fallback URL uses (companyName
|| '').toLowerCase() which leaves spaces/special chars unescaped; update the
ternary branch that builds the URL for j.id to slugify companyName the same way
as the existing slugified job name (use the same slugifying logic/helper used to
compute slugified) and insert that slugifiedCompanyName in place of (companyName
|| '').toLowerCase() so the URL path becomes
https://jobs.smartrecruiters.com/{slugifiedCompanyName}/{j.id}-{slugified}.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@providers/smartrecruiters.mjs`:
- Line 79: The fallback URL uses (companyName || '').toLowerCase() which leaves
spaces/special chars unescaped; update the ternary branch that builds the URL
for j.id to slugify companyName the same way as the existing slugified job name
(use the same slugifying logic/helper used to compute slugified) and insert that
slugifiedCompanyName in place of (companyName || '').toLowerCase() so the URL
path becomes
https://jobs.smartrecruiters.com/{slugifiedCompanyName}/{j.id}-{slugified}.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b1eec1a7-9d2d-41d2-98cb-3e14f71d0722

📥 Commits

Reviewing files that changed from the base of the PR and between a67e794 and fcab2cc.

📒 Files selected for processing (4)

providers/recruitee.mjs
providers/smartrecruiters.mjs
providers/workable.mjs
test-all.mjs

Addresses 5 CodeRabbit comments on PR santifer#653 asking for tighter validation than substring regex on raw URL strings.

- All 3 providers: detect()/resolveXxxUrl() now use new URL() to parse careers_url, verify protocol === 'https:', check hostname exactly (Workable: apply.workable.com; SmartRecruiters: careers./jobs.smartrecruiters.com; Recruitee: regex-validated <slug>.recruitee.com), then derive the slug from the parsed pathname/hostname. This rejects path-spoofed inputs like https://evil.example/apply.workable.com/slug (substring regex would have falsely matched).
- smartrecruiters parseSmartRecruitersResponse: j.ref is now validated (parses as URL, hostname must be api.smartrecruiters.com, pathname must start with /v1/companies/) before the prefix rewrite. Invalid refs fall through to the fallback URL path. The fallback companyName is now slugified (non-alphanumerics → -, strip leading/trailing -) so "My Acme & Co." → "my-acme-co" rather than producing a URL with raw spaces/symbols.
- test-all.mjs: 5 new assertions covering the path-spoof rejection for all 3 providers, the untrusted-ref-host fall-through, and the companyName slugification.

Refs santifer#651
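
The ref validation and the slugified fallback described in this commit can be sketched as follows. `slugify` and `resolvePostingUrl` are hypothetical names; the https check on the ref and the use of the same slugifier for the job name are assumptions, since the commit message only names the hostname and pathname checks.

```javascript
// Lowercase, collapse runs of non-alphanumerics to "-", trim hyphens at both ends.
function slugify(text) {
  return String(text || '')
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Hypothetical sketch of the public-URL decision for one posting `j`.
function resolvePostingUrl(j, companyName) {
  if (!j || typeof j !== 'object') return '';
  const prefix = '/v1/companies/';
  try {
    const ref = new URL(j.ref); // throws when ref is missing or malformed
    if (ref.protocol === 'https:' // assumption: implied by the anchored https prefix
        && ref.hostname === 'api.smartrecruiters.com'
        && ref.pathname.startsWith(prefix)) {
      return `https://jobs.smartrecruiters.com/${ref.pathname.slice(prefix.length)}`;
    }
  } catch {
    // untrusted or unparsable ref: fall through to the synthetic URL
  }
  if (!j.id) return ''; // nothing to build a link from
  return `https://jobs.smartrecruiters.com/${slugify(companyName)}/${j.id}-${slugify(j.name)}`;
}
```

With this shape, a ref pointing at another host is never rewritten; it is simply ignored and the posting falls back to the synthetic URL or to an empty string.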

jrojomartinez · 2026-05-16T16:59:03Z

Thanks for the thorough review — addressed all 5 comments in the latest force-push. Summary:

All 3 providers: resolveXxxUrl now uses new URL() + strict hostname checks instead of substring regex. Path-spoofed URLs (e.g. https://evil.example/apply.workable.com/slug) no longer falsely match.
SmartRecruiters j.ref: validated via new URL() (hostname must be api.smartrecruiters.com, pathname must start with /v1/companies/) before the prefix rewrite. Invalid refs fall through to the fallback URL path.
SmartRecruiters fallback slug: companyName is now slugified (non-alphanumerics → -, strip leading/trailing) so a name like "My Acme & Co." produces my-acme-co instead of leaking spaces/symbols.
Workable / SR / Recruitee tests: added path-spoof rejection tests for each. The j.ref and slugification cases also get explicit tests in §12.

Ready for re-review.

jrojomartinez · 2026-05-16T16:59:36Z

@coderabbitai review

coderabbitai · 2026-05-16T16:59:40Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-05-16T17:04:01Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@providers/recruitee.mjs`:
- Around line 80-84: parseRecruiteeResponse currently returns j.careers_url or
j.url verbatim which can lead to off-domain or non-HTTPS job.url values; modify
parseRecruiteeResponse to validate and normalize the URL: parse
j.careers_url/j.url with the URL constructor, ensure protocol === 'https:' and
hostname matches the tenant API host (or an allowed host list derived from the
request/tenant), and only return the validated URL (otherwise return '' or
undefined); reference parseRecruiteeResponse, job.url, j.careers_url, and j.url
when implementing this check.

In `@providers/smartrecruiters.mjs`:
- Line 41: The current postings URL builder hardcodes limit and offset (uses
SR_PAGE_SIZE and offset=0) and the fetch logic only issues a single request,
truncating results; modify the postings fetch flow to page through results by
looping requests: call the URL builder (the function that returns
`https://api.smartrecruiters.com/v1/companies/${slug}/postings?...`) with an
increasing offset (offset += SR_PAGE_SIZE) and fetch repeatedly until the API
returns an empty `content` array, aggregating each response's content into a
single results array; ensure SR_PAGE_SIZE is used for limit, handle HTTP errors
as the existing fetch logic does, and return the combined list instead of a
single-page response.

In `@providers/workable.mjs`:
- Around line 84-88: In parseWorkableMarkdown(), don't trust the raw url from
urlMatch; attempt to construct a URL object from the extracted string (wrap in
try/catch to handle malformed values) and validate that urlObj.protocol ===
'https:' and urlObj.hostname matches your allowed host(s) (or the same
origin/host used when fetching the feed) before pushing to jobs; if validation
fails or URL construction throws, skip that row (i.e., do not push to jobs). Use
the existing symbols urlMatch, url, and jobs and add hostname/protocol checks
and error-safe parsing inside that loop.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c4afeaa3-d7f3-4f45-873a-760e2b2b4cb2

📥 Commits

Reviewing files that changed from the base of the PR and between fcab2cc and 09b6f2b.

📒 Files selected for processing (4)

providers/recruitee.mjs
providers/smartrecruiters.mjs
providers/workable.mjs
test-all.mjs

Addresses 3 CodeRabbit comments on PR santifer#653 (round 2).

- recruitee: parseRecruiteeResponse now validates the offer URL via new URL() + protocol === 'https:' + RECRUITEE_HOST_RE hostname check. Off-domain or non-HTTPS values are dropped (url = '' per the Job contract) rather than passed through verbatim.
- workable: parseWorkableMarkdown now validates each [View] link the same way (hostname must be apply.workable.com, protocol must be https). Rows that fail validation are skipped (continue), matching the existing "skip rows with no resolvable URL" semantic.
- smartrecruiters: fetch() now paginates the /postings endpoint instead of returning only the first 100 results. Added resolveSlug() and buildPostingsUrl(slug, offset) helpers, refactored resolveApiUrl() to delegate to them, and the fetch loop walks offsets 0, SR_PAGE_SIZE, 2*SR_PAGE_SIZE, ... until either an empty page or a short page (less than SR_PAGE_SIZE). Safety cap SR_MAX_PAGES = 50 (= 5000 postings) prevents runaway loops against a broken API.
- test-all.mjs: 4 new assertions
  - Workable: off-domain + non-https [View] links are dropped
  - Recruitee: off-domain + non-https + missing offer URLs → url=''
  - SmartRecruiters: 2-page aggregation (150 items across 2 pages)
  - SmartRecruiters: stop on the first empty page (1 request)

Refs santifer#651
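
The pagination loop described in this commit could be sketched like this. SR_PAGE_SIZE, SR_MAX_PAGES and buildPostingsUrl are names from the commit message; `fetchAllPostings` and the injected `fetchJson(url, options)` signature are assumptions made so the example can run without the real provider context.

```javascript
const SR_PAGE_SIZE = 100;
const SR_MAX_PAGES = 50; // safety cap: at most 5000 postings

function buildPostingsUrl(slug, offset) {
  return `https://api.smartrecruiters.com/v1/companies/${encodeURIComponent(slug)}/postings`
    + `?limit=${SR_PAGE_SIZE}&offset=${offset}&status=PUBLIC`;
}

// Hypothetical loop: fetchJson is injected and assumed to resolve to the parsed body.
async function fetchAllPostings(slug, fetchJson) {
  const all = [];
  for (let page = 0; page < SR_MAX_PAGES; page += 1) {
    const url = buildPostingsUrl(slug, page * SR_PAGE_SIZE);
    const data = await fetchJson(url, { redirect: 'error' });
    const content = Array.isArray(data?.content) ? data.content : [];
    all.push(...content);
    if (content.length < SR_PAGE_SIZE) break; // empty or short page: last one
  }
  return all;
}
```

A single `length < SR_PAGE_SIZE` test covers both stop conditions, because an empty page is just the shortest possible short page. The cost is one extra request when the total is an exact multiple of the page size.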

jrojomartinez · 2026-05-16T17:10:56Z

Round 2 fixes pushed:

Recruitee parser: parseRecruiteeResponse now validates each offer URL via new URL() + https: + RECRUITEE_HOST_RE. Off-domain / non-HTTPS values become url: "" rather than leaking through.
Workable parser: parseWorkableMarkdown validates each [View] link the same way (apply.workable.com + https:). Rows that fail validation are skipped.
SmartRecruiters pagination: fetch() now loops offsets 0, 100, 200, ... (using SR_PAGE_SIZE) until it sees an empty page or a short page. Refactored into resolveSlug() + buildPostingsUrl(slug, offset) helpers. Safety cap at 50 pages (5000 postings).
Tests: 4 new assertions cover URL validation for both parsers and the 2-page-aggregation / empty-page-stop paths for SR pagination.
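
For the Recruitee parser change in particular, the offer-URL check might look like the sketch below. RECRUITEE_HOST_RE is named in the commit message; `safeOfferUrl` is a hypothetical helper, and whether an invalid `careers_url` should fall back to `url` is not stated, so this sketch validates only the first non-empty candidate.

```javascript
const RECRUITEE_HOST_RE = /^[a-z0-9][a-z0-9-]*\.recruitee\.com$/;

// Hypothetical helper: returns a validated https Recruitee URL, or '' otherwise.
function safeOfferUrl(offer) {
  const candidate = offer?.careers_url || offer?.url; // prefer careers_url
  if (typeof candidate !== 'string') return '';
  try {
    const parsed = new URL(candidate);
    const ok = parsed.protocol === 'https:' && RECRUITEE_HOST_RE.test(parsed.hostname);
    return ok ? parsed.href : '';
  } catch {
    return ''; // malformed URL
  }
}
```

Returning an empty string instead of throwing keeps one bad offer from discarding the rest of the tenant's listings.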

Ready for re-review.

jrojomartinez · 2026-05-16T17:12:32Z

@coderabbitai review

coderabbitai · 2026-05-16T17:12:37Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

jrojomartinez · 2026-05-16T17:20:17Z

@coderabbitai review — checking against latest commit (434375b) after the round-2 fixes for the 3 comments from 17:04.

coderabbitai · 2026-05-16T17:22:07Z

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== providers/workable.mjs ==="
cat providers/workable.mjs

echo ""
echo "=== providers/smartrecruiters.mjs ==="
cat providers/smartrecruiters.mjs

echo ""
echo "=== providers/recruitee.mjs ==="
cat providers/recruitee.mjs

echo ""
echo "=== templates/portals.example.yml ==="
cat templates/portals.example.yml

echo ""
echo "=== test-all.mjs (tail: new sections only) ==="
# Show sections 11-13 of tests
grep -n "§1[123]\|workable\|smartrecruiters\|recruitee\|SR_PAGE\|SR_MAX\|pagination\|parseWorkable\|parseSmartRec\|parseRecruitee" test-all.mjs | head -120

Length of output: 65977

@jrojomartinez — reviewed against commit 434375b. All five original comments and the three round-2 comments are fully addressed. Here's the final assessment:

✅ Round-2 fixes verified

Issue	File	Status
Recruitee offer URL validation	`providers/recruitee.mjs`	✅ `new URL()` + `https:` + `RECRUITEE_HOST_RE` — off-domain/non-HTTPS → `url: ""`
Workable `[View]` link validation	`providers/workable.mjs`	✅ `new URL()` + `https:` + `hostname === 'apply.workable.com'` — invalid rows skipped
SmartRecruiters pagination	`providers/smartrecruiters.mjs`	✅ `buildPostingsUrl(slug, offset)`, stops on empty page and short page, capped at `SR_MAX_PAGES = 50`

✅ All earlier comments confirmed resolved

Path-spoof prevention — all three providers use new URL() + exact hostname check (not substring/regex on the full URL string). ✅
j.ref validation — hostname === 'api.smartrecruiters.com' + pathname.startsWith('/v1/companies/') before rewrite; untrusted refs fall through to the fallback. ✅
Fallback slug sanitisation — companyName is slugified ([^a-z0-9]+ → -, trim leading/trailing) before embedding in the fallback URL. ✅

🔍 One minor nit (non-blocking)

In parseSmartRecruitersResponse, if a posting has an id but an empty name, the fallback URL ends with a trailing dash:

// name = '' → slugified = '' → url = `…/${j.id}-`
url = `https://jobs.smartrecruiters.com/${companySlug}/${j.id}-${slugified}`;

You could guard this with:

url = slugified
  ? `https://jobs.smartrecruiters.com/${companySlug}/${j.id}-${slugified}`
  : `https://jobs.smartrecruiters.com/${companySlug}/${j.id}`;

Not a security concern — just cosmetic for the rare case of a nameless posting.

Overall

The implementation is solid — clean architecture, proper SSRF mitigations (redirect: 'error' + allowlists/regex on every fetch), good pagination design, and thorough test coverage (§11–§13 covers detect, parsing, URL validation, path-spoof, and pagination edge cases). The change is strictly additive and existing providers are untouched. Ready to merge.

[approve]

github-actions Bot added the 🔧 scripts label May 14, 2026

jrojomartinez mentioned this pull request May 14, 2026

scan.mjs: add Workable, SmartRecruiters, Recruitee ATS parsers #651

Open

2 tasks

jrojomartinez added 4 commits May 16, 2026 18:26

docs(portals): document Workable, SmartRecruiters, Recruitee URL patt…

a67e794

…erns

jrojomartinez force-pushed the feat/ats-parsers-workable-smartrecruiters-recruitee branch from f88f2a2 to a67e794 Compare May 16, 2026 16:29

jrojomartinez marked this pull request as ready for review May 16, 2026 16:29

coderabbitai Bot reviewed May 16, 2026

coderabbitai Bot reviewed May 16, 2026

