feat(search): add search-tavily.mjs, a Tavily search/extract utility for liveness, company research, and job scanning #674
Adds a dual-mode Tavily integration: an importable module for other scripts and a standalone CLI tool. Fills the gap where career-ops relied entirely on Claude's built-in web search with no standalone search provider.
Exports:
- tavilySearch(query, opts): web search
- tavilyExtract(urls): structured content extraction from URLs
- checkLiveness(url): job posting liveness (active/closed/unknown)
- researchCompany(company, role): deep company research for interview prep
- searchJobs(role, location): find job postings via web search
CLI modes:
- node search-tavily.mjs "query"
- node search-tavily.mjs --extract <url>
- node search-tavily.mjs --liveness <url>
- node search-tavily.mjs --company "Acme" --role "AI Enablement"
- node search-tavily.mjs --jobs "Training Coordinator" --location "Toronto"
- node search-tavily.mjs --auth
Config via config/profile.yml under the tavily: key (api_key, search_depth, max_results) or the TAVILY_API_KEY env var. Zero new npm dependencies: js-yaml (already in package.json) plus native fetch.
Closes #(TBD)
No actionable comments were generated in the recent review.
📝 Walkthrough
search-tavily.mjs adds a dual-mode Node.js utility (importable module + CLI) that loads Tavily config (env or YAML), provides fetch-with-timeout and URL validation, exposes tavilySearch, tavilyExtract, checkLiveness, researchCompany, searchJobs, and formatResults, and includes a CLI with argument parsing and mode dispatch.
Changes: Tavily Search Module and CLI
Sequence Diagram(s)
sequenceDiagram
participant User as CLI
participant Module as search-tavily.mjs
participant Tavily as Tavily API
participant FS as config/profile.yml
User->>Module: runCli() with args (--auth/--liveness/--company/--jobs/QUERY)
Module->>FS: loadTavilyConfig() (env or YAML)
Module->>Tavily: tavilySearch or tavilyExtract POST (with API key)
Tavily-->>Module: JSON response (results or extracted content)
Module-->>User: formatted text or JSON output
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 ESLint skipped: no ESLint configuration detected in root package.json.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@search-tavily.mjs`:
- Around line 247-253: The argument parsing currently finds flag positions
(depthIdx, maxIdx, roleIdx, locationIdx) but removes tokens by value which can
inadvertently drop legitimate query words and doesn’t validate numeric flags;
update parsing to build the cleaned query by filtering out args at the specific
indices (flag index and index+1 for flags that take values) rather than by
value, validate that max and depth (from depth and max variables) are present
only when their parsed values are valid integers (reject or set undefined on NaN
and return an error/exit early), and preserve other tokens while keeping asJson
from args.includes('--json'); apply the same index-based removal and validation
logic to the other occurrence at the noted location.
- Around line 101-107: The Tavily fetch calls (the POST to TAVILY_SEARCH_URL
that builds `body` and returns `res.json()`, and the second similar fetch around
lines 122-128) need request timeouts: create an AbortController, pass its signal
into fetch, start a timer (e.g. 5–15s) that calls controller.abort(), and clear
the timer after fetch resolves; ensure you catch aborts and surface a clear
timeout error before the existing `if (!res.ok)` handling and keep the same
response parsing (return `res.json()`), and apply the same change to the other
fetch call as well.
- Around line 116-121: In tavilyExtract, validate and sanitize the incoming urls
before building the body: ensure each item is a parseable URL (use the URL
constructor), only allow http or https schemes, and reject or filter hosts that
resolve to local addresses (localhost, 127.0.0.0/8, ::1) or private/rfc1918
ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) and non-DNS/opaque hosts;
keep the existing Array.isArray(urls) handling but replace the unvalidated list
with the vetted list and throw or return an error if any URL is invalid/blocked
before calling the Tavily API (refer to tavilyExtract and the
body.api_key/body.urls construction).
- Around line 39-47: The module currently uses a CWD-relative PROFILE_PATH which
breaks when the package is imported from elsewhere; update loadTavilyConfig to
resolve the config file relative to the module using import.meta.url (e.g.
derive PROFILE_PATH via fileURLToPath(new URL('./config/profile.yml',
import.meta.url))) and then use that absolute path with existsSync/readFileSync
so config discovery is stable; keep the existing existsSync guard so missing
files are handled gracefully. Reference: PROFILE_PATH and loadTavilyConfig.
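The timeout finding above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the name `fetchWithTimeout` and the 10 s default are taken from the follow-up commit message, while the signature and the injectable `fetchImpl` parameter are assumptions made so the sketch can run without network access.

```javascript
// Hypothetical helper; the PR's actual fetchWithTimeout may differ.
// fetchImpl is injectable (an assumption) so the sketch can be exercised offline.
async function fetchWithTimeout(url, options = {}, timeoutMs = 10_000, fetchImpl = fetch) {
  const controller = new AbortController();
  // Abort the request if it has not settled within timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...options, signal: controller.signal });
  } catch (err) {
    // Translate our own abort into a clear timeout error; rethrow anything else.
    if (controller.signal.aborted) {
      throw new Error(`Request to ${url} timed out after ${timeoutMs} ms`);
    }
    throw err;
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

In this sketch the timer is cleared as soon as the response headers arrive, so a slow body read through `res.json()` is not bounded by the timeout.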
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 603b08c8-b447-46be-b915-02a8aff3230f
📒 Files selected for processing (1)
search-tavily.mjs
- PROFILE_PATH: resolve relative to import.meta.url via fileURLToPath so config discovery works when the module is imported from other directories
- fetchWithTimeout: add AbortController with 10s timeout to both tavilySearch and tavilyExtract fetch calls; surface clear timeout error
- validateExtractUrls: validate URLs before sending to Tavily; require http/https scheme and reject localhost, 127.x, 10.x, 172.16-31.x, 192.168.x (RFC1918/loopback) to prevent SSRF
- CLI arg parsing: filter query tokens by index (not by value) to avoid dropping legitimate query words that match flag values; validate --depth as 'basic' | 'advanced' and --max as a positive integer with early exit
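As an illustration of the URL vetting described in this commit, here is a minimal sketch. The function name `validateExtractUrls` and the blocked ranges come from the commit message; the body, error messages, and return value are assumptions, not the PR's implementation.

```javascript
// Hypothetical sketch; only the name and the blocked ranges come from the commit message.
const PRIVATE_V4 = [
  /^127\./,                      // loopback 127.0.0.0/8
  /^10\./,                       // RFC 1918 10.0.0.0/8
  /^172\.(1[6-9]|2\d|3[01])\./,  // RFC 1918 172.16.0.0/12
  /^192\.168\./,                 // RFC 1918 192.168.0.0/16
];

function validateExtractUrls(urls) {
  const list = Array.isArray(urls) ? urls : [urls];
  return list.map((raw) => {
    let parsed;
    try {
      parsed = new URL(raw);
    } catch {
      throw new Error(`Invalid URL: ${raw}`);
    }
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
      throw new Error(`Unsupported scheme in ${raw}`);
    }
    const host = parsed.hostname.toLowerCase();
    // Apply the range checks only to IPv4 literals, so "10.example.com" is not rejected.
    const isIpv4 = /^\d{1,3}(\.\d{1,3}){3}$/.test(host);
    const isPrivate = isIpv4 && PRIVATE_V4.some((re) => re.test(host));
    if (host === 'localhost' || host === '[::1]' || isPrivate) {
      throw new Error(`Blocked host in ${raw}`);
    }
    return parsed.href; // normalized URL that is safe to forward
  });
}
```

This sketch inspects only the literal hostname. A public-looking name that resolves to a private address would pass, so a stricter check would also need a DNS lookup.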
@coderabbitai resolve
✅ Actions performed: Comments resolved. Approval is disabled.
Closes #673
What this adds
`search-tavily.mjs` gives career-ops a model-agnostic web search layer. Other scripts can import it directly; it also works as a standalone CLI.
Why Tavily
career-ops currently has no search provider outside of Claude's built-in tool calls. This creates three gaps:
Tavily fills all three with a single, deterministic REST call. The free tier is 1,000 searches/month, more than enough for a job search.
Exported API
- `tavilySearch(query, opts)`
- `tavilyExtract(urls)`
- `checkLiveness(url)`: `'active' | 'closed' | 'unknown'`
- `researchCompany(company, role)`
- `searchJobs(role, location)`
CLI
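The review thread asks the CLI to strip flags from the query by index rather than by value, and to validate `--depth` and `--max`. A minimal sketch of that approach follows. The flag names come from this PR; the function name `parseArgs`, its return shape, and the error messages are hypothetical, and mode flags such as `--company` or `--jobs` are left out for brevity.

```javascript
// Hypothetical parser; only the flag names are taken from the PR.
const VALUE_FLAGS = ['--depth', '--max', '--role', '--location'];

function parseArgs(args) {
  const consumed = new Set(); // indices of flags and of the values they take
  const values = {};
  for (const flag of VALUE_FLAGS) {
    const idx = args.indexOf(flag);
    if (idx === -1) continue;
    if (idx + 1 >= args.length) throw new Error(`${flag} requires a value`);
    values[flag.slice(2)] = args[idx + 1];
    consumed.add(idx).add(idx + 1);
  }
  const jsonIdx = args.indexOf('--json');
  if (jsonIdx !== -1) consumed.add(jsonIdx);

  if (values.depth !== undefined && !['basic', 'advanced'].includes(values.depth)) {
    throw new Error(`--depth must be 'basic' or 'advanced'`);
  }
  let max;
  if (values.max !== undefined) {
    max = Number(values.max);
    if (!Number.isInteger(max) || max <= 0) {
      throw new Error('--max must be a positive integer');
    }
  }
  // Drop tokens by position, so a query word equal to a flag value survives.
  const query = args.filter((_, i) => !consumed.has(i)).join(' ');
  return { query, depth: values.depth, max, role: values.role, location: values.location, asJson: jsonIdx !== -1 };
}
```

For example, `parseArgs(['remote', 'advanced', 'jobs', '--depth', 'advanced'])` keeps the word `advanced` in the query, which value-based removal would have dropped.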
Liveness detection logic
`checkLiveness` uses `tavilyExtract` to fetch the page, then scans the content for closed/active signals:
Returns `'unknown'` when signals are ambiguous; callers can decide whether to fall back to Playwright.
Config
Or via the `TAVILY_API_KEY` env var.
Scope
- `search-tavily.mjs` (318 lines)
- `check-liveness.mjs` and deep mode is a follow-up
- `js-yaml` (already in `package.json`) + native `fetch`
Test plan
- `--auth` validates API key and returns OK
- `tavilySearch("query")` returns structured results
- `tavilyExtract(url)` returns page content
- `checkLiveness` returns `'active'` for a live job posting URL
- `checkLiveness` returns `'closed'` for a filled/expired posting URL
- `researchCompany` returns enriched results with `--depth advanced`
- `searchJobs` returns job-site-filtered results
- `--json` flag outputs raw JSON for piping to other scripts
🤖 Generated with Claude Code
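The liveness scan described earlier, and the two `checkLiveness` cases in the test plan, could be approximated by a pure classifier over the extracted page text. The sketch below is illustrative only: the signal phrases and the name `classifyLiveness` are assumptions, since the PR's actual signal lists are not shown on this page.

```javascript
// Illustrative phrase lists (assumptions); the PR's real signals are not shown here.
const CLOSED_SIGNALS = ['no longer accepting applications', 'position has been filled', 'job has expired'];
const ACTIVE_SIGNALS = ['apply now', 'submit your application'];

// Hypothetical helper: maps extracted page text to 'active' | 'closed' | 'unknown'.
function classifyLiveness(content) {
  const text = String(content ?? '').toLowerCase();
  const closed = CLOSED_SIGNALS.some((s) => text.includes(s));
  const active = ACTIVE_SIGNALS.some((s) => text.includes(s));
  if (closed && !active) return 'closed';
  if (active && !closed) return 'active';
  return 'unknown'; // no signals, or conflicting signals
}
```

Treating conflicting signals as `'unknown'` leaves the decision to the caller, which matches the stated option of falling back to Playwright.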