feat(scan): add optional always_allow tier to location_filter #652

Open
jrojomartinez wants to merge 1 commit into santifer:main from jrojomartinez:feat/location-filter-always-allow

@jrojomartinez jrojomartinez commented May 14, 2026

Rebased onto upstream main (1.8.0) — buildLocationFilter is unchanged in 1.8.0, so the always_allow tier applies cleanly. Marked ready for review.

Summary

Adds an optional always_allow list to location_filter, checked before block. A location matching always_allow passes regardless of block. Fully backward-compatible.

Motivation

The current filter checks block first, and a block match is final. A multi-location posting such as "Remote, Belgium or France" is therefore dropped when france is in block, even though Belgium is an acceptable option.

Worked example

Config: always_allow: ["belgium"], block: ["france"]

| Job location | Before | With always_allow |
|---|---|---|
| Remote, Belgium | pass | pass |
| Remote, Belgium or France | reject | pass |
| Remote, France | reject | reject |
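
The worked example can be reproduced with a minimal sketch of the precedence described above. This is an illustration under the assumption that matching is a case-insensitive substring check, as the thread states; it is not the actual scan.mjs code, and the `before` / `after` names are only for the comparison.

```javascript
// Sketch of the decision order: empty location, always_allow, block, allow.
function buildLocationFilter(locationFilter) {
  if (!locationFilter) return () => true;
  const lowerAll = (list) => (list || []).map((k) => k.toLowerCase());
  const alwaysAllow = lowerAll(locationFilter.always_allow);
  const allow = lowerAll(locationFilter.allow);
  const block = lowerAll(locationFilter.block);

  return (location) => {
    if (!location) return true; // empty location passes
    const loc = location.toLowerCase();
    if (alwaysAllow.some((k) => loc.includes(k))) return true; // new tier, checked first
    if (block.some((k) => loc.includes(k))) return false;
    if (allow.length === 0) return true;
    return allow.some((k) => loc.includes(k));
  };
}

// "before": config without always_allow; "after": config from the worked example.
const before = buildLocationFilter({ block: ["france"] });
const after = buildLocationFilter({ always_allow: ["belgium"], block: ["france"] });

for (const loc of ["Remote, Belgium", "Remote, Belgium or France", "Remote, France"]) {
  const verdict = (ok) => (ok ? "pass" : "reject");
  console.log(`${loc} | before: ${verdict(before(loc))} | after: ${verdict(after(loc))}`);
}
```

Only the middle row changes between the two configs, which is the backward-compatibility claim in the summary.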

Changes

  • scan.mjs: buildLocationFilter reads always_allow, checks it before block (~2 lines + doc-comment refresh). Also adds export + main() guard so the function is unit-testable.
  • templates/portals.example.yml: commented always_allow: example.
  • test-all.mjs: new §11 with 6 unit tests covering home-region pass, always_allow-beats-block, block-still-rejects, empty location, case-insensitivity, and backward compatibility.

Test plan

  • node test-all.mjs --quick passes locally (all sections + the 6 new always_allow cases)
  • node scan.mjs --dry-run still runs correctly as a CLI (main() guard preserves the script-mode behaviour)

Summary by CodeRabbit

  • New Features

    • Added an always_allow option to location filtering that takes highest precedence and uses case-insensitive substring matching.
  • Bug Fixes

    • Clarified decision order: empty location passes; always_allow overrides block; block rejects; non-empty allow requires a match.
    • Filters now trim whitespace, ignore non-string entries, accept single-string or list inputs, and handle missing/empty safely.
  • Documentation

    • Updated config examples to describe always_allow semantics and precedence.
  • Tests

    • Added tests for always_allow, case-insensitivity, empty locations, mixed-type inputs, and backward compatibility.
  • Behavior

    • Startup now runs main only when executed directly.

coderabbitai Bot commented May 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Exports buildLocationFilter and adds an always_allow tier with higher precedence than block; documents precedence and matching rules; adds tests for normalization/case-insensitivity and guards main() so it runs only when executed directly.

Changes

Location filter always_allow tier

| Layer | File(s) | Summary |
|---|---|---|
| always_allow filtering logic | scan.mjs | Adds normalizeKeywordList and exports buildLocationFilter; implements case-insensitive substring matching and precedence: missing/empty → pass; always_allow → pass (highest); block → reject; allow → empty passes / non-empty requires a match. |
| Module invocation guard | scan.mjs | Replaces unconditional main().catch(...) with an import.meta.url direct-execution guard so main() runs only when the script is executed directly, preserving fatal error handling. |
| Docs and tests | templates/portals.example.yml, test-all.mjs | Updates example location_filter docs to include always_allow and precedence; adds tests validating always_allow precedence, case-insensitive and trimmed matching, empty-location behavior, normalization of string/array/mixed inputs, and backward compatibility. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • santifer/career-ops#570: Modifies buildLocationFilter and location filtering semantics in scan.mjs, overlapping with this PR's core feature.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding an optional always_allow tier to location_filter in the scan module. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@jrojomartinez
Author

@coderabbitai review

coderabbitai Bot commented May 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scan.mjs`:
- Around line 129-133: The buildLocationFilter function currently assumes
locationFilter.always_allow, .allow, and .block are arrays of strings and will
throw on bad YAML or nulls; normalize each by coercing to an array (e.g., if
value is a string wrap it in an array, if null/undefined use []), filter out
non-string entries, then map remaining items to lowercase before assigning to
alwaysAllow, allow, and block so the function no longer crashes on malformed
config data.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 86b2ea42-33bb-4a17-b0a8-594a9f6c4232

📥 Commits

Reviewing files that changed from the base of the PR and between 5d1f3a3 and 3490e02.

📒 Files selected for processing (3)
  • scan.mjs
  • templates/portals.example.yml
  • test-all.mjs

Comment thread scan.mjs Outdated
@jrojomartinez jrojomartinez force-pushed the feat/location-filter-always-allow branch from 3490e02 to 77feb8f Compare May 16, 2026 16:21
@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
scan.mjs (1)

129-133: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize location_filter inputs defensively to avoid runtime crashes.

Line 131–133 assumes arrays of strings; malformed YAML like scalar values, null entries, or mixed types will throw at runtime. Line 137 also assumes location is always a string. This can abort a scan instead of degrading gracefully.

Proposed fix
 export function buildLocationFilter(locationFilter) {
   if (!locationFilter) return () => true;
-  const alwaysAllow = (locationFilter.always_allow || []).map(k => k.toLowerCase());
-  const allow = (locationFilter.allow || []).map(k => k.toLowerCase());
-  const block = (locationFilter.block || []).map(k => k.toLowerCase());
+  const normalizeKeywords = (value) => {
+    const list = Array.isArray(value) ? value : (typeof value === 'string' ? [value] : []);
+    return list
+      .filter((k) => typeof k === 'string')
+      .map((k) => k.toLowerCase().trim())
+      .filter(Boolean);
+  };
+
+  const alwaysAllow = normalizeKeywords(locationFilter.always_allow);
+  const allow = normalizeKeywords(locationFilter.allow);
+  const block = normalizeKeywords(locationFilter.block);

   return (location) => {
-    if (!location) return true;
-    const lower = location.toLowerCase();
+    if (typeof location !== 'string' || location.trim() === '') return true;
+    const lower = location.toLowerCase();
     if (alwaysAllow.length > 0 && alwaysAllow.some(k => lower.includes(k))) return true;
     if (block.length > 0 && block.some(k => lower.includes(k))) return false;
     if (allow.length === 0) return true;
     return allow.some(k => lower.includes(k));
   };
 }

As per coding guidelines, **/*.mjs: “Ensure scripts handle missing data/ directories gracefully.”

Also applies to: 136-138

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scan.mjs` around lines 129 - 133, The buildLocationFilter function assumes
locationFilter.always_allow/allow/block are arrays of strings and that the
passed-in location is a string; to fix, defensively normalize those fields by
coercing missing fields to empty arrays, filtering out non-string items,
trimming and lowercasing each entry (use the symbols alwaysAllow, allow, block
within buildLocationFilter to locate the arrays), and when building the returned
predicate validate the incoming location (in the predicate closure) is a string
before calling toLowerCase — either coerce it to a safe string or return false
for non-strings so malformed YAML or nulls do not throw at runtime.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e2e61344-0200-49ca-a8f4-e8c79a40c4a7

📥 Commits

Reviewing files that changed from the base of the PR and between 3490e02 and 77feb8f.

📒 Files selected for processing (3)
  • scan.mjs
  • templates/portals.example.yml
  • test-all.mjs

@jrojomartinez jrojomartinez force-pushed the feat/location-filter-always-allow branch from 77feb8f to e2e04a3 Compare May 16, 2026 16:32
@jrojomartinez
Author

Thanks — fixed in the latest force-push. Added a normalizeKeywordList helper that handles bare strings, null/undefined, and non-string entries across all three lists (always_allow, allow, block). Three new test cases (§11 cases 8–10) cover string-instead-of-array, mixed-type entries, and all-non-string lists. Ready for re-review.
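
To make the three malformed-config cases concrete, here is a minimal sketch of a helper with the shape described in this comment. It follows the untrimmed form quoted in the review diffs in this thread; the sample inputs are illustrative assumptions, not the PR's actual §11 test cases.

```javascript
// Sketch: coerce a YAML value into a list of lowercase string keywords.
function normalizeKeywordList(value) {
  if (value == null) return []; // null or undefined: no keywords
  const arr = Array.isArray(value) ? value : [value]; // bare scalar becomes a one-item list
  return arr
    .filter((k) => typeof k === "string") // drop numbers, objects, nulls
    .map((k) => k.toLowerCase());
}

console.log(normalizeKeywordList("Belgium"));              // [ 'belgium' ]
console.log(normalizeKeywordList(null));                   // []
console.log(normalizeKeywordList(["France", 42, null]));   // [ 'france' ]
console.log(normalizeKeywordList([1, 2, 3]));              // []
```

Because every list goes through the same helper, a config such as `block: france` (a string instead of a list) degrades to a one-keyword list instead of throwing on `.map`.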

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
scan.mjs (1)

129-136: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Trim normalized keywords to avoid silent non-matches.

normalizeKeywordList lowercases but does not trim, so entries like " belgium " from YAML will never match and silently break filtering behavior.

Proposed fix
 function normalizeKeywordList(value) {
   if (value == null) return [];
   const arr = Array.isArray(value) ? value : [value];
-  return arr.filter(k => typeof k === 'string').map(k => k.toLowerCase());
+  return arr
+    .filter(k => typeof k === 'string')
+    .map(k => k.toLowerCase().trim())
+    .filter(Boolean);
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scan.mjs` around lines 129 - 136, normalizeKeywordList currently lowercases
entries but doesn't trim them, so values like " belgium " won't match; update
normalizeKeywordList to first coerce value to an array (as it already does),
then map each entry to a trimmed string, filter out non-strings and empty
results (after trimming), and finally lowercase the survivors; reference the
function name normalizeKeywordList and ensure trimming happens before filtering
empties and lowercasing so whitespace-only entries are dropped.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@templates/portals.example.yml`:
- Around line 43-46: Update the commented example under the always_allow setting
so it matches the surrounding description that targets US-based locations:
replace the placeholder entries currently showing "Belgium" and "Brussels" with
appropriate US examples (e.g., "United States", "New York" or another home
region/city), keeping the comment style and the key name always_allow intact so
the sample and surrounding text are consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7022bbf5-0583-48ff-9807-4931a347d2c9

📥 Commits

Reviewing files that changed from the base of the PR and between 77feb8f and e2e04a3.

📒 Files selected for processing (3)
  • scan.mjs
  • templates/portals.example.yml
  • test-all.mjs

Comment thread templates/portals.example.yml Outdated
@jrojomartinez jrojomartinez force-pushed the feat/location-filter-always-allow branch from e2e04a3 to 025c314 Compare May 16, 2026 16:45
@jrojomartinez
Author

Thanks — fixed in the latest force-push. The commented always_allow example now uses "United States" / "New York" to match the surrounding US-centric allow/block example. The (replace with your home region) hint is kept so a user copy-pasting the block knows it's illustrative.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scan.mjs`:
- Around line 132-136: normalizeKeywordList currently preserves empty or
whitespace-only strings which, when later used in matching (e.g.,
lower.includes(k)), makes an empty k always match and can bypass block/allow
rules; update normalizeKeywordList to trim each string and filter out those that
are === '' (or only whitespace) so it returns only non-empty lowercase keywords,
and apply the same trimming+empty-filtering where keywords from always_allow /
block / allow are normalized before matching to ensure no empty entries can
cause a pass-all match.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 176bec03-f5ca-4a2b-a5af-58cd739795b7

📥 Commits

Reviewing files that changed from the base of the PR and between e2e04a3 and 025c314.

📒 Files selected for processing (3)
  • scan.mjs
  • templates/portals.example.yml
  • test-all.mjs

Comment thread scan.mjs
jrojomartinez added a commit to jrojomartinez/career-ops-contrib that referenced this pull request May 16, 2026
Pre-emptive hardening following the same defensive pattern CodeRabbit
flagged on PR santifer#652. All changes are within the providers shipped in
this PR; no scan.mjs / framework changes.

- All three providers: `careers_url` is now type-checked before .match()
  so a non-string YAML value (number, object, array) returns null from
  detect() rather than throwing.

- smartrecruiters: ref-rewrite uses an anchored regex
  (`/^https:\/\/api\.smartrecruiters\.com\/v1\/companies\//`) so the
  replacement only fires at the URL prefix. The fallback URL path (when
  both j.ref AND j.id are missing) now returns an empty string instead
  of synthesising a URL containing the literal "undefined" — the empty
  string is the contract-allowed default for url per _types.js > Job.
  Magic 100 in the postings limit is now a named SR_PAGE_SIZE constant.

- workable: parseWorkableMarkdown now extracts URLs via a line-level
  regex `/\[View\]\(([^)]+)\)/` rather than a column-position match,
  so a title containing a stray `|` doesn't shift cols[7] and silently
  drop the URL. Rows that still don't resolve a URL are skipped (no
  empty-URL entries leak into the dedup tracker).

- test-all.mjs: 6 new assertions covering the defensive paths
  (non-string careers_url across all 3 providers, the SR no-ref/no-id
  fallback, the Workable stray-pipe survival, and a real Workable
  fetch() rejection test against an unresolvable careers_url).

Refs santifer#651
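
The Workable point in the commit message above can be illustrated with a short sketch. The regex is the one quoted in the message; the `extractViewUrl` helper and the sample rows are hypothetical, since the real Workable markdown layout and parseWorkableMarkdown are not shown in this thread.

```javascript
// Line-level pattern quoted in the commit message.
const VIEW_LINK = /\[View\]\(([^)]+)\)/;

// Hypothetical helper: returns the linked URL, or null when the row has none.
function extractViewUrl(line) {
  const match = line.match(VIEW_LINK);
  return match ? match[1] : null;
}

// Hypothetical rows (assumed layout, for illustration only).
const cleanRow = "| Data Engineer | Remote | [View](https://example.com/jobs/1) |";
const strayPipeRow = "| Data | Platform Engineer | Remote | [View](https://example.com/jobs/2) |";
const noLinkRow = "| Data Engineer | Remote | n/a |";

// A column-position lookup breaks as soon as a title contains a pipe.
const cells = (line) => line.split("|").map((c) => c.trim());
console.log(cells(cleanRow)[3]);     // the link cell
console.log(cells(strayPipeRow)[3]); // "Remote": the stray pipe shifted the columns

// The line-level regex is unaffected, and rows without a URL are skipped.
const urls = [cleanRow, strayPipeRow, noLinkRow].map(extractViewUrl).filter(Boolean);
console.log(urls);
```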
@jrojomartinez jrojomartinez force-pushed the feat/location-filter-always-allow branch from 025c314 to 75a1595 Compare May 16, 2026 16:53
@jrojomartinez
Author

Thanks — fixed in the latest force-push. normalizeKeywordList now applies .trim() to each survivor and drops empties via .filter(Boolean) so a malformed YAML entry like always_allow: [""] or [" "] no longer makes String.includes("") match every location. Two new test cases (§11 cases 11–12) cover the empty-keyword-dropped and whitespace-trimmed-but-still-matches paths.
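
The failure mode is easy to reproduce in isolation: in JavaScript every string includes the empty string, so one empty keyword makes `some(k => lower.includes(k))` true for any location. A minimal sketch, assuming the trimmed helper has the shape proposed in the review diff above:

```javascript
// Sketch of the hardened helper: lowercase, trim, then drop empty results.
function normalizeKeywordList(value) {
  if (value == null) return [];
  const arr = Array.isArray(value) ? value : [value];
  return arr
    .filter((k) => typeof k === "string")
    .map((k) => k.toLowerCase().trim())
    .filter(Boolean); // "" is falsy, so empty and whitespace-only entries are removed
}

const location = "remote, france";

// Why an empty keyword is dangerous: it matches every location.
console.log(location.includes(""));                     // true

// After normalization the empty entries are gone, so nothing can match by accident.
console.log(normalizeKeywordList(["", "   "]));         // []

// Padded entries still work once trimmed.
const keywords = normalizeKeywordList(["  France  "]);
console.log(keywords);                                  // [ 'france' ]
console.log(keywords.some((k) => location.includes(k))); // true
```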

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scan.mjs`:
- Around line 150-153: The predicate currently calls location.toLowerCase()
without ensuring location is a string; update the returned function (the
(location) => { ... } predicate that uses alwaysAllow) to guard non-string
values before calling toLowerCase by checking typeof location === 'string' and
if it's not a string, return true (or otherwise handle gracefully) so truthy
non-string payloads won't throw; then use const lower = location.toLowerCase()
for the subsequent alwaysAllow.some(...) check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4358df07-24bd-479f-bc5c-50a7ceab2b32

📥 Commits

Reviewing files that changed from the base of the PR and between 025c314 and 75a1595.

📒 Files selected for processing (3)
  • scan.mjs
  • templates/portals.example.yml
  • test-all.mjs

Comment thread scan.mjs
@jrojomartinez
Author

@coderabbitai review

coderabbitai Bot commented May 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Adds an optional always_allow list checked BEFORE block. A location
matching always_allow passes regardless of block. Fully backward
compatible: a config without always_allow: behaves exactly as today.

Motivation: the current filter checks block first and absolutely, so
a multi-location posting like "Remote, Belgium or France" is dropped
the moment "france" is in block — even though Belgium is an
acceptable location in the same string. always_allow is the
home-region escape hatch.

Worked example with always_allow: ["belgium"], block: ["france"]:
  - "Remote, Belgium"           pass (unchanged)
  - "Remote, Belgium or France" PASS  (was REJECT)
  - "Remote, France"            reject (unchanged)

Also:
- Adds `export` to buildLocationFilter + gates main() behind an
  import.meta.url check so the function is importable from tests
  without running scan.mjs as a script.
- Adds test-all.mjs §11 covering the 6 boundary cases (home-region
  match, always_allow beats block, block still rejects when no
  always_allow hit, empty location, case-insensitivity, backward
  compatibility when always_allow is omitted).
- templates/portals.example.yml documents the commented
  always_allow: example with an ordering note.

Refs santifer#650
@jrojomartinez jrojomartinez force-pushed the feat/location-filter-always-allow branch from 75a1595 to e0d2a0c Compare May 16, 2026 17:02
@jrojomartinez
Author

Thanks — fixed in the latest force-push. The returned filter closure now guards location with typeof location !== "string" || location.trim() === "" before calling .toLowerCase(), so a non-string provider payload (number, object) or a whitespace-only string passes through to downstream evaluation rather than crashing. Three new test cases (§11 cases 13–15) cover whitespace-only, non-string types (42, {}, null, undefined), and pass-through-not-silently-dropped semantics. Doc-comment above buildLocationFilter updated to mention the new behaviour.
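
A minimal sketch of the guard described in this comment, reduced to a block-only filter so the pass-through behaviour is easy to see. The `makeGuardedFilter` name and the sample values are hypothetical; only the guard expression itself comes from the thread.

```javascript
// Hypothetical reduced filter: block list only, with the location guard up front.
function makeGuardedFilter(block) {
  const blocked = block.map((k) => k.toLowerCase());
  return (location) => {
    // Non-strings and whitespace-only strings pass through instead of throwing.
    if (typeof location !== "string" || location.trim() === "") return true;
    const lower = location.toLowerCase();
    return !blocked.some((k) => lower.includes(k));
  };
}

const filter = makeGuardedFilter(["france"]);

// Without the guard, 42 and {} would throw on .toLowerCase().
for (const value of ["   ", 42, {}, null, undefined, "Remote, France"]) {
  console.log(String(JSON.stringify(value)), "->", filter(value) ? "pass" : "reject");
}
```

Passing unknown shapes through, rather than rejecting them, keeps a posting with an unexpected location payload visible to later stages instead of silently dropping it.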
