
feat(search): add search-tavily.mjs — Tavily search/extract utility for liveness, company research, and job scanning #674

Open
Schlaflied wants to merge 2 commits into santifer:main from Schlaflied:feat/search-tavily

@Schlaflied commented May 16, 2026

Closes #673

What this adds

search-tavily.mjs gives career-ops a model-agnostic web search layer. Other scripts can import it directly; it also works as a standalone CLI.

Why Tavily

career-ops currently has no search provider outside of Claude's built-in tool calls. This creates three gaps:

  • Liveness — Playwright fails in headless/batch mode with no fallback
  • Research — deep mode company intel varies unpredictably by model/CLI
  • Discovery — scan.mjs only hits ATS APIs; jobs on Indeed/company career pages are invisible

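Tavily addresses all three through one REST API whose behavior does not depend on which model or CLI is running career-ops. The free tier provides 1,000 API credits per month (a basic search costs one credit, an advanced search two), which is ample for an individual job search.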

Exported API

```js
import { tavilySearch, tavilyExtract, checkLiveness, researchCompany, searchJobs } from './search-tavily.mjs';
```

| Export | Purpose |
|---|---|
| `tavilySearch(query, opts)` | General web search |
| `tavilyExtract(urls)` | Extract structured content from one or more URLs |
| `checkLiveness(url)` | Job posting liveness → `'active' \| 'closed' \| 'unknown'` |
| `researchCompany(company, role)` | Deep company research for interview prep |
| `searchJobs(role, location)` | Find job postings via web search |
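For readers unfamiliar with the Tavily REST API, here is a minimal sketch of the request a function like `tavilySearch` might send. It assumes the public `https://api.tavily.com/search` endpoint and the `api_key`-in-body style that the review comments refer to (`body.api_key`). `buildSearchRequest` and its option names (`depth`, `maxResults`, `includeAnswer`) are hypothetical, not the PR's actual signature.

```javascript
// Hypothetical helper (not the PR's code): builds fetch() arguments for a Tavily search.
// Assumes the api_key-in-body auth style referenced in the review comments.
const TAVILY_SEARCH_URL = 'https://api.tavily.com/search';

function buildSearchRequest(apiKey, query, opts = {}) {
  if (!apiKey) throw new Error('Missing Tavily API key (set TAVILY_API_KEY)');
  const body = {
    api_key: apiKey,
    query,
    search_depth: opts.depth ?? 'basic', // 'basic' | 'advanced'
    max_results: opts.maxResults ?? 5,
    include_answer: Boolean(opts.includeAnswer),
  };
  return [
    TAVILY_SEARCH_URL,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  ];
}

// Usage inside an async function (makes a real network call):
//   const [url, init] = buildSearchRequest(process.env.TAVILY_API_KEY, 'Acme Corp engineering culture');
//   const res = await fetch(url, init);
//   if (!res.ok) throw new Error(`Tavily search failed: HTTP ${res.status}`);
//   const data = await res.json(); // { answer?, results: [{ title, url, content, ... }] }
```

Keeping request construction separate from `fetch` lets the payload be unit-tested without spending API credits.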

CLI

```bash
node search-tavily.mjs "query"
node search-tavily.mjs "query" --depth advanced --max 10 --answer
node search-tavily.mjs --extract https://jobs.acme.com/123
node search-tavily.mjs --liveness https://boards.greenhouse.io/acme/jobs/123
node search-tavily.mjs --company "Acme Corp" --role "AI Enablement"
node search-tavily.mjs --jobs "Training Coordinator" --location "Toronto"
node search-tavily.mjs "query" --json   # raw JSON output for piping
node search-tavily.mjs --auth           # test API key
```
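One way to parse these flags is to walk `argv` by index, so that a flag's value is consumed by position rather than removed by matching its text. With this approach, a query such as `basic chemistry --depth basic` keeps the word "basic". `parseArgs`, the two flag sets, and the error messages are assumptions for illustration; the `--depth` and `--max` rules mirror the validation described in the follow-up commit.

```javascript
// Hypothetical parseArgs (not the PR's code): index-based flag parsing.
// Flags that take a value consume the next token by position, so query words
// that happen to equal a flag value are never dropped.
const VALUE_FLAGS = new Set(['--depth', '--max', '--role', '--location', '--extract', '--liveness', '--company', '--jobs']);
const BOOL_FLAGS = new Set(['--json', '--answer', '--auth', '--help']);

function parseArgs(argv) {
  const flags = {};
  const bools = new Set();
  const queryTokens = [];
  for (let i = 0; i < argv.length; i++) {
    const tok = argv[i];
    if (VALUE_FLAGS.has(tok)) {
      const val = argv[i + 1];
      if (val === undefined || val.startsWith('--')) throw new Error(`${tok} requires a value`);
      flags[tok.slice(2)] = val;
      i++; // skip the value token by index
    } else if (BOOL_FLAGS.has(tok)) {
      bools.add(tok.slice(2));
    } else {
      queryTokens.push(tok); // anything else is part of the free-text query
    }
  }
  if (flags.depth !== undefined && !['basic', 'advanced'].includes(flags.depth)) {
    throw new Error("--depth must be 'basic' or 'advanced'");
  }
  if (flags.max !== undefined) {
    const n = Number(flags.max);
    if (!Number.isInteger(n) || n < 1) throw new Error('--max must be a positive integer');
    flags.max = n;
  }
  return { flags, bools, query: queryTokens.join(' ') };
}

// Typical call site: const args = parseArgs(process.argv.slice(2));
```

Throwing on bad values lets the CLI entrypoint print one clear message and exit early, instead of silently sending `NaN` to the API.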

Liveness detection logic

checkLiveness uses tavilyExtract to fetch the page, then scans the content for closed/active signals:

closed signals: "no longer accepting", "position has been filled", "posting has closed" …
active signals: "apply now", "job description", "responsibilities", "we are hiring" …

Returns 'unknown' when signals are ambiguous — callers can decide whether to fall back to Playwright.
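A minimal sketch of the classification step, assuming the page text has already been fetched (the PR does this with `tavilyExtract`). `classifyLiveness` is a hypothetical name, the signal lists are abbreviated from the description above, and checking closed signals before active ones is an assumed precedence.

```javascript
// Hypothetical classifier (not the PR's code). Signal lists are abbreviated.
const CLOSED_SIGNALS = ['no longer accepting', 'position has been filled', 'posting has closed'];
const ACTIVE_SIGNALS = ['apply now', 'job description', 'responsibilities', 'we are hiring'];

function classifyLiveness(rawText) {
  // Lowercase and collapse whitespace so multi-word signals match across line breaks.
  const text = String(rawText ?? '').toLowerCase().replace(/\s+/g, ' ');
  if (CLOSED_SIGNALS.some((s) => text.includes(s))) return 'closed';
  if (ACTIVE_SIGNALS.some((s) => text.includes(s))) return 'active';
  return 'unknown'; // ambiguous: the caller may fall back to Playwright
}
```

Checking closed signals first matters because an expired posting often still shows boilerplate such as "Responsibilities" or "Job description". If active signals won, those pages would be reported as live.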

Config

```yaml
# config/profile.yml
tavily:
  api_key: tvly-xxx      # https://tavily.com — free tier available
  search_depth: basic    # basic (fast) | advanced (thorough, 2× cost)
  max_results: 5
```

Alternatively, set the TAVILY_API_KEY environment variable, which takes precedence over profile.yml.
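The precedence rule (environment variable first, then `config/profile.yml`) could look roughly like this. The sketch takes the YAML already parsed (for example by js-yaml's `load`) so it stays free of file I/O. `resolveTavilyConfig` and the returned field names are hypothetical; the defaults match the sample config.

```javascript
// Hypothetical resolveTavilyConfig (not the PR's code). `profile` is the already-parsed
// contents of config/profile.yml, or null/undefined when the file does not exist.
const TAVILY_DEFAULTS = { search_depth: 'basic', max_results: 5 };

function resolveTavilyConfig(env, profile) {
  const fromYaml = (profile && profile.tavily) || {};
  return {
    apiKey: env.TAVILY_API_KEY || fromYaml.api_key || null, // env var wins over YAML
    searchDepth: fromYaml.search_depth ?? TAVILY_DEFAULTS.search_depth,
    maxResults: fromYaml.max_results ?? TAVILY_DEFAULTS.max_results,
  };
}

// In the module one might call: resolveTavilyConfig(process.env, yaml.load(profileText)).
```

A `null` key gives the CLI a single place to print setup instructions and exit.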

Scope

  • 1 new file: search-tavily.mjs (318 lines)
  • 0 existing files modified — integration into check-liveness.mjs and deep mode is a follow-up
  • 0 new npm dependencies: js-yaml (already in package.json) + native fetch

Test plan

  • --auth validates API key and returns OK
  • tavilySearch("query") returns structured results
  • tavilyExtract(url) returns page content
  • checkLiveness returns 'active' for a live job posting URL
  • checkLiveness returns 'closed' for a filled/expired posting URL
  • researchCompany returns enriched results with --depth advanced
  • searchJobs returns job-site-filtered results
  • Missing API key exits with clear setup instructions
  • --json flag outputs raw JSON for piping to other scripts

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Tavily-powered web search for career research and job discovery.
    • Job search with role and location filtering and adjustable depth/results.
    • Company research tool with optional role context and summarized answers.
    • Job listing liveness checker distinguishing active vs closed vs unknown.
    • Content extraction with validation and timeout handling.
    • CLI supporting search, extract, liveness, company/jobs workflows, auth testing, JSON output, and configurable result formatting.


Adds a dual-mode Tavily integration: importable module for other scripts
and standalone CLI tool. Fills the gap where career-ops relied entirely
on Claude's built-in web search with no standalone search provider.

Exports:
  tavilySearch(query, opts)         web search
  tavilyExtract(urls)               structured content extraction from URLs
  checkLiveness(url)                job posting liveness (active/closed/unknown)
  researchCompany(company, role)    deep company research for interview prep
  searchJobs(role, location)        find job postings via web search
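`searchJobs` and `researchCompany` are described only at a high level: a jobs-focused query with a higher result count, and an advanced search that includes an answer. Below is a hedged sketch of the Tavily parameters they might produce. The query templates, `JOB_DOMAINS`, and `max_results: 10` are illustrative assumptions. `search_depth`, `max_results`, `include_answer`, and `include_domains` are standard Tavily search parameters.

```javascript
// Hypothetical option builders (not the PR's code). JOB_DOMAINS and the
// numbers below are illustrative, not taken from the PR.
const JOB_DOMAINS = ['linkedin.com', 'indeed.com', 'greenhouse.io', 'lever.co'];

function jobSearchParams(role, location) {
  // Quote the role so multi-word titles are kept together; location is optional.
  const query = [`"${role}"`, 'job', location ? `in ${location}` : ''].filter(Boolean).join(' ');
  return { query, search_depth: 'basic', max_results: 10, include_domains: JOB_DOMAINS };
}

function companyResearchParams(company, role) {
  const query = role
    ? `${company} company overview culture ${role} team`
    : `${company} company overview culture`;
  return { query, search_depth: 'advanced', include_answer: true, max_results: 5 };
}
```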

CLI modes:
  node search-tavily.mjs "query"
  node search-tavily.mjs --extract <url>
  node search-tavily.mjs --liveness <url>
  node search-tavily.mjs --company "Acme" --role "AI Enablement"
  node search-tavily.mjs --jobs "Training Coordinator" --location "Toronto"
  node search-tavily.mjs --auth

Config via config/profile.yml under tavily: key (api_key, search_depth,
max_results) or TAVILY_API_KEY env var.

Zero new npm dependencies — js-yaml (already in package.json) + native fetch.

Closes #(TBD)
coderabbitai Bot commented May 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 75423b17-b88a-4871-a1bf-e8a2ad3a1b06

📥 Commits

Reviewing files that changed from the base of the PR and between c3051c8 and f41518a.

📒 Files selected for processing (1)
  • search-tavily.mjs

📝 Walkthrough

search-tavily.mjs adds a dual-mode Node.js utility (importable module + CLI) that loads Tavily config (env or YAML), provides fetch-with-timeout and URL validation, exposes tavilySearch/tavilyExtract, checkLiveness, researchCompany, searchJobs, formatResults, and a CLI with argument parsing and mode dispatch.

Changes

Tavily Search Module and CLI

| Layer | File(s) | Summary |
|---|---|---|
| Configuration loading and validation | `search-tavily.mjs` | Adds module header, constants, `PROFILE_PATH`; implements `loadTavilyConfig()` (env-first, YAML fallback) and `requireConfig()` for CLI failures. |
| Fetch timeout and URL validation | `search-tavily.mjs` | Adds `fetchWithTimeout()` (AbortController) and URL vetting helpers (`isPrivateHost`, `validateExtractUrls`) to reject non-HTTP(S) and private/local hosts before extraction. |
| Core Tavily HTTP request wrappers | `search-tavily.mjs` | Implements `tavilySearch()` and `tavilyExtract()` as timed POST clients that include config/options, throw on non-OK responses, and return parsed JSON. |
| Job liveness classification | `search-tavily.mjs` | Adds closed/active keyword lists and `checkLiveness(url)`, which extracts content, normalizes text, matches signals, and returns `'active' \| 'closed' \| 'unknown'`. |
| Domain-specific search helpers | `search-tavily.mjs` | Adds `researchCompany(company, role)` (advanced + includeAnswer) and `searchJobs(role, location)` (jobs-focused query, higher maxResults). |
| Human-readable result formatting | `search-tavily.mjs` | Implements `formatResults(data)` to render an optional AI answer and numbered results with URLs and single-line truncated snippets. |
| CLI argument parsing and execution | `search-tavily.mjs` | Implements `runCli()` with `--help`, `--auth`, `--liveness`, `--extract`, `--company`, `--jobs`, `--depth`, `--max`, `--answer`, `--json`, the default plain-search flow, and mode dispatch. |
| Module entrypoint | `search-tavily.mjs` | Wires `isMain()` to invoke `runCli()` on direct execution and exit with status 1 on uncaught errors. |
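As a rough illustration of the formatting layer, here is a sketch in the spirit of `formatResults(data)`. It assumes Tavily's response shape (`answer` plus a `results` array of `{ title, url, content }`); the 160-character snippet limit and the exact layout are guesses, not the PR's output.

```javascript
// Hypothetical formatter (not the PR's code): optional AI answer, then numbered
// results with URL and a single-line snippet truncated to maxSnippet characters.
function formatResults(data, maxSnippet = 160) {
  const lines = [];
  if (data.answer) lines.push(`Answer: ${data.answer}`, '');
  (data.results ?? []).forEach((r, i) => {
    // Collapse newlines and runs of whitespace so each snippet prints on one line.
    const flat = String(r.content ?? '').replace(/\s+/g, ' ').trim();
    const snippet = flat.length > maxSnippet ? `${flat.slice(0, maxSnippet - 1)}…` : flat;
    lines.push(`${i + 1}. ${r.title ?? '(untitled)'}`, `   ${r.url}`, `   ${snippet}`);
  });
  return lines.join('\n');
}
```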

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant User as CLI
  participant Module as search-tavily.mjs
  participant Tavily as Tavily API
  participant FS as config/profile.yml
  User->>Module: runCli() with args (--auth/--liveness/--company/--jobs/QUERY)
  Module->>FS: loadTavilyConfig() (env or YAML)
  Module->>Tavily: tavilySearch or tavilyExtract POST (with API key)
  Tavily-->>Module: JSON response (results or extracted content)
  Module-->>User: formatted text or JSON output
```

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 76.92%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the primary change: adding search-tavily.mjs with Tavily search/extract utility capabilities for liveness, company research, and job scanning. |
| Linked Issues check | ✅ Passed | The pull request fully implements all coding requirements from #673: tavilySearch, tavilyExtract, checkLiveness, researchCompany, and searchJobs functions; CLI support for all specified commands; config loading from TAVILY_API_KEY or config/profile.yml; and timeout/SSRF validation. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: only search-tavily.mjs is added (as required); no existing files are modified; integration points are deferred as follow-ups per the issue description. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@search-tavily.mjs`:
- Around line 247-253: The argument parsing currently finds flag positions
(depthIdx, maxIdx, roleIdx, locationIdx) but removes tokens by value which can
inadvertently drop legitimate query words and doesn’t validate numeric flags;
update parsing to build the cleaned query by filtering out args at the specific
indices (flag index and index+1 for flags that take values) rather than by
value, validate that max and depth (from depth and max variables) are present
only when their parsed values are valid integers (reject or set undefined on NaN
and return an error/exit early), and preserve other tokens while keeping asJson
from args.includes('--json'); apply the same index-based removal and validation
logic to the other occurrence at the noted location.
- Around line 101-107: The Tavily fetch calls (the POST to TAVILY_SEARCH_URL
that builds `body` and returns `res.json()`, and the second similar fetch around
lines 122-128) need request timeouts: create an AbortController, pass its signal
into fetch, start a timer (e.g. 5–15s) that calls controller.abort(), and clear
the timer after fetch resolves; ensure you catch aborts and surface a clear
timeout error before the existing `if (!res.ok)` handling and keep the same
response parsing (return `res.json()`), and apply the same change to the other
fetch call as well.
- Around line 116-121: In tavilyExtract, validate and sanitize the incoming urls
before building the body: ensure each item is a parseable URL (use the URL
constructor), only allow http or https schemes, and reject or filter hosts that
resolve to local addresses (localhost, 127.0.0.0/8, ::1) or private/rfc1918
ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) and non-DNS/opaque hosts;
keep the existing Array.isArray(urls) handling but replace the unvalidated list
with the vetted list and throw or return an error if any URL is invalid/blocked
before calling the Tavily API (refer to tavilyExtract and the
body.api_key/body.urls construction).
- Around line 39-47: The module currently uses a CWD-relative PROFILE_PATH which
breaks when the package is imported from elsewhere; update loadTavilyConfig to
resolve the config file relative to the module using import.meta.url (e.g.
derive PROFILE_PATH via fileURLToPath(new URL('./config/profile.yml',
import.meta.url))) and then use that absolute path with existsSync/readFileSync
so config discovery is stable; keep the existing existsSync guard so missing
files are handled gracefully. Reference: PROFILE_PATH and loadTavilyConfig.
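Here is a sketch of the host check requested in the third comment. It matches hostnames literally after WHATWG URL parsing, which also normalizes forms such as `http://2130706433/` to `127.0.0.1`. It does not resolve DNS, so a public name pointing at a private address would still pass; closing that gap would require resolving the name before the request. `isPrivateHost` and `validateExtractUrls` reuse the helper names from the PR, but the bodies are assumptions, and the 169.254.0.0/16 link-local check goes beyond the ranges the comment lists.

```javascript
// Hypothetical bodies for isPrivateHost / validateExtractUrls (names from the PR).
// Literal hostname check only: no DNS resolution is performed.
function isPrivateHost(hostname) {
  const h = hostname.toLowerCase().replace(/^\[|\]$/g, ''); // URL keeps IPv6 brackets
  if (h === 'localhost' || h.endsWith('.localhost') || h === '::1') return true;
  const m = h.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false; // ordinary DNS names (and other IPv6 literals) pass through
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 127 ||                         // loopback, 127.0.0.0/8
    a === 10 ||                          // RFC 1918, 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // RFC 1918, 172.16.0.0/12
    (a === 192 && b === 168) ||          // RFC 1918, 192.168.0.0/16
    (a === 169 && b === 254)             // link-local, 169.254.0.0/16 (beyond the PR's list)
  );
}

function validateExtractUrls(urls) {
  const list = Array.isArray(urls) ? urls : [urls];
  return list.map((raw) => {
    let u;
    try {
      u = new URL(raw);
    } catch {
      throw new Error(`Invalid URL: ${raw}`);
    }
    if (u.protocol !== 'http:' && u.protocol !== 'https:') {
      throw new Error(`Only http/https URLs are allowed: ${raw}`);
    }
    if (isPrivateHost(u.hostname)) {
      throw new Error(`Refusing private or local host: ${u.hostname}`);
    }
    return u.href; // normalized URL string for body.urls
  });
}
```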

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 603b08c8-b447-46be-b915-02a8aff3230f

📥 Commits

Reviewing files that changed from the base of the PR and between 5d1f3a3 and c3051c8.

📒 Files selected for processing (1)
  • search-tavily.mjs

- PROFILE_PATH: resolve relative to import.meta.url via fileURLToPath so
  config discovery works when the module is imported from other directories
- fetchWithTimeout: add AbortController with 10s timeout to both
  tavilySearch and tavilyExtract fetch calls; surface clear timeout error
- validateExtractUrls: validate URLs before sending to Tavily — require
  http/https scheme and reject localhost, 127.x, 10.x, 172.16-31.x,
  192.168.x (RFC1918/loopback) to prevent SSRF
- CLI arg parsing: filter query tokens by index (not by value) to avoid
  dropping legitimate query words that match flag values; validate --depth
  as 'basic' | 'advanced' and --max as a positive integer with early exit
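The `fetchWithTimeout` change listed above can be sketched as follows. The helper name matches the commit, but the signature, the injectable `fetchImpl` parameter (added here only so the timeout path can be tested offline), and the error text are assumptions.

```javascript
// Hypothetical fetchWithTimeout (not the PR's code), using the commit's 10 s default.
// fetchImpl defaults to the global fetch; a fake can be passed in for tests.
async function fetchWithTimeout(url, init = {}, timeoutMs = 10_000, fetchImpl = globalThis.fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...init, signal: controller.signal });
  } catch (err) {
    // fetch rejects with an AbortError when the signal fires; turn it into a clear message.
    if (err && err.name === 'AbortError') {
      throw new Error(`Tavily request timed out after ${timeoutMs} ms: ${url}`);
    }
    throw err;
  } finally {
    clearTimeout(timer); // always clear so a fast response leaves no pending timer
  }
}
```

The timer is cleared once response headers arrive, so a slow `res.json()` is not bounded by this timeout. Clearing it only after the body has been read would cover that case.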
@Schlaflied (Author)

@coderabbitai resolve

coderabbitai Bot commented May 16, 2026

✅ Actions performed

Comments resolved. Approval is disabled; enable reviews.request_changes_workflow to allow explicit top-level @coderabbitai resolve or @coderabbitai approve commands.


Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: search-tavily.mjs — Tavily search/extract as shared utility (liveness fallback + company research + job scan)

1 participant