All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Documented cookie file, web crawling, CSS selector, PDF, and TLS preset features
0.1.6 - 2026-02-14
- Netscape HTTP Cookie File support via
--cookie-fileflag andAGENT_FETCH_COOKIE_FILEenv var - Cookie file cookies merge with
--cookieflags; explicit--cookievalues take precedence on conflicts
0.1.5 - 2026-02-14
- Mobile API extraction with configurable auth token and token type for publisher endpoints
- Web crawler with sliding-window concurrency
- 183 new integration tests; coverage thresholds enforced in CI
- Decomposed monolithic modules into focused sub-modules (content-extractors, http-fetch)
- E2E analysis scripts now group results by site name by default (
--no-groupto disable) - E2E tests default to
TEST_SET=all(previouslystable)
- CLI process hangs caused by httpcloak orphaned libuv references —
process.exit(0)in CLI entry point - E2E
minWordsthreshold bug - Type-safe
FetchError;TurndownServicecached as singleton for batch performance - Request correlation IDs added to fetch-layer logs for tracing concurrent requests
0.1.4 - 2026-02-05
- CLI
--version/-vflag to display package version - CLI warns on unknown flags instead of silently ignoring them
- Configurable request timeout with
--timeout <ms>flag (default: 20s)
- Prevent pino-pretty crash when running via npx by moving to dependencies and adding availability check
0.1.3 - 2026-02-05
- Send logs to stderr in JSON mode (#18)
0.1.2 - 2026-02-05
- npm publish workflow triggered on
v*tag push with provenance attestation - Releasing checklist in CONTRIBUTING.md
0.1.1 - 2026-02-04
- Nuxt 3 payload extraction strategy (#5)
- React Router / Remix hydration data extraction strategy (#10)
- WP AJAX content extraction strategy (#9)
- Arc XP Prism content API extraction strategy (#2)
isAccessibleForFreedetection from JSON-LD structured data (#3)- Next.js
__NEXT_DATA__extraction made unconditional (#7) - README badges, CI node matrix, extraction pipeline diagram (#13)
- Responsible use disclaimer (#15)
- Catastrophic regex backtracking in content validator (#14)
- HTTP 304 handling and GET request headers (#11)
- Truncated WP REST API response detection (#4)
- Removed unused
allowCookiesconfig property (#12)
0.1.0 - 2026-02-03
Initial release.
- HTTP client with Chrome TLS fingerprinting via httpcloak
- SSRF protection with DNS validation
- Multi-strategy content extraction (Readability, JSON-LD, Text Density, Next.js RSC, CSS selectors)
- WordPress REST API auto-detection and extraction
- Per-field metadata composition across strategies
- Markdown output via Turndown
- CLI with 5 output modes (default, --json, --raw, --text, -q)
- Site-specific configuration via
AGENT_FETCH_SITES_JSON - E2E test framework with SQLite database recording