Skrobak

Resilient web scraper with automatic retry, proxy rotation, and strategy cascade. Supports fetch, Playwright browsers, and custom fetch implementations.

Why Skrobak?

When scraping multiple websites, you face unpredictable challenges:

  • Unknown accessibility requirements - You never know whether a page will be accessible through a simple HTTP fetch or if it blocks requests and requires a headless browser
  • IP-based restrictions - Sites might block your server's IP address, requiring proxy rotation as a fallback

Skrobak solves these problems by:

  • Optimizing for cost and efficiency - Starting with the cheapest method (a simple fetch) and falling back to resource-intensive options (headless browser) or paid solutions (proxies) only when necessary
  • Eliminating trial and error - Automatically testing different browser engine and proxy combinations to find one that works, without manual intervention

Define your own list of strategies to try in order, and Skrobak automatically cascades through them until one succeeds.

Quick Start

Installation

npm install skrobak

Usage

import { scrape } from 'skrobak'

const result = await scrape('https://example.com', {
  strategies: [{ mechanism: 'fetch' }]
})

// When the response is HTML, use Cheerio for parsing
if (result.mechanism === 'fetch') {
  const title = result.$('title').text()
  console.log(title)
}

// When the response is JSON, read it from the underlying Response object
if (result.mechanism === 'fetch') {
  const data = await result.response.json()
  console.log(data)
}

Core Concepts

Strategy Cascade

Skrobak tries strategies in order until one succeeds. If a strategy fails, it automatically moves to the next one:

const result = await scrape('https://example.com', {
  strategies: [
    { mechanism: 'fetch' },    // Try simple fetch first
    { mechanism: 'browser' },  // Fallback to browser if fetch fails
  ]
})

Mechanisms

| Mechanism | Description |
| --- | --- |
| fetch | Fast HTTP requests with lazy-loaded Cheerio for HTML parsing |
| browser | Full browser rendering with Playwright (chromium/firefox/webkit) |
| custom | Your own fetch implementation |
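
A single cascade can mix all three mechanisms. A minimal sketch (the custom fn below is a placeholder for any HTTP client of your choice):

const result = await scrape('https://example.com', {
  strategies: [
    { mechanism: 'fetch' },   // cheapest: plain HTTP request
    { mechanism: 'browser' }, // full Playwright rendering
    { mechanism: 'custom' }   // delegates to config.custom.fn
  ],
  custom: {
    // Placeholder: any async function returning your response type works
    fn: async (url) => fetch(url)
  }
})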

API Reference

scrape(url, config)

Main scraping function with automatic retry and strategy cascade.

scrape(url: string, config: ScrapeConfig): Promise<ScrapeResult>

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| url | string | The URL to scrape |
| config | ScrapeConfig | Configuration object |

Returns

Promise<ScrapeResult> - Result object with mechanism-specific properties. See Return Types.

Complete Configuration Example

const result = await scrape('https://example.com', {
  // Strategy cascade - tries in order until one succeeds
  strategies: [
    { mechanism: 'fetch', useProxy: true },
    { mechanism: 'fetch', useProxy: false },
    { mechanism: 'browser' }
  ],

  // Global options applied to all strategies
  options: {
    timeout: 30000,
    retries: {
      count: 3,
      delay: 2000,
      type: 'exponential',
      statusCodes: [408, 429, 500, 502, 503, 504]
    },
    proxies: [
      'http://proxy1.example.com:8080',
      'http://proxy2.example.com:8080'
    ],
    userAgents: [
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    ],
    headers: {
      'Accept-Language': 'en-US,en;q=0.9'
    },
    validateResponse: ({ mechanism, response }) => {
      if (mechanism === 'fetch') return response.ok
      if (mechanism === 'browser') return response.status() === 200
      return true
    }
  },

  // Browser-specific configuration
  browser: {
    engine: 'chromium',
    waitUntil: 'networkidle',
    resources: ['document', 'script', 'xhr', 'fetch']
  },

  // Custom fetch implementation
  custom: {
    fn: async (url, options) => {
      // Your custom fetch logic
      const response = await customFetch(url, options)
      return response
    }
  },

  // Event hooks for monitoring and logging
  hooks: {
    onRetryAttempt: ({ attempt, maxAttempts, nextRetryDelay }) => {
      console.log(`Retry ${attempt}/${maxAttempts}, waiting ${nextRetryDelay}ms`)
    },
    onRetryExhausted: ({ totalAttempts }) => {
      console.log(`All ${totalAttempts} retries exhausted`)
    },
    onStrategyFailed: ({ strategy, strategyIndex, totalStrategies }) => {
      console.log(`Strategy ${strategyIndex + 1}/${totalStrategies} (${strategy.mechanism}) failed`)
    },
    onAllStrategiesFailed: ({ strategies }) => {
      console.error(`All ${strategies.length} strategies failed`)
    }
  }
})

Configuration Reference

ScrapeConfig

Root configuration object passed to scrape().

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| strategies | ScrapeStrategy[] | Yes | List of strategies to try in order |
| options | ScrapeOptions | No | Global scraping options |
| browser | BrowserConfig | No | Browser-specific configuration |
| custom | CustomConfig | No | Custom fetch function configuration |
| hooks | ScrapeHooks | No | Event hooks for monitoring and logging |

ScrapeStrategy (config.strategies)

Individual strategy in the cascade. Skrobak tries each strategy in order until one succeeds.

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| mechanism | 'fetch' \| 'browser' \| 'custom' | - | Scraping mechanism to use |
| useProxy | boolean | false | Whether to use a proxy for this strategy |

Example:

strategies: [
  { mechanism: 'fetch', useProxy: false },  // Try without a proxy first
  { mechanism: 'fetch', useProxy: true },   // Fallback with proxy
  { mechanism: 'browser' }                  // Last resort: full browser
]

ScrapeOptions (config.options)

Global options applied across all strategies.

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| timeout | number | - | Request timeout in milliseconds |
| retries | RetryConfig | { count: 0, delay: 5000, type: 'exponential' } | Retry configuration |
| proxies | string[] | - | Proxy pool (randomly selected per request) |
| userAgents | string[] | - | User agent pool (randomly selected per request) |
| viewports | ViewportSize[] | - | Viewport pool (randomly selected per request) |
| headers | object | - | HTTP headers as key-value pairs |
| validateResponse | ValidateResponse | - | Custom response validation function |

Example:

options: {
  timeout: 30000,
  retries: { count: 3, delay: 2000, type: 'exponential' },
  proxies: ['http://proxy1.com:8080', 'http://proxy2.com:8080'],
  userAgents: ['Mozilla/5.0...'],
  headers: { 'Accept-Language': 'en-US' }
}

RetryConfig (config.options.retries)

Controls retry behavior when requests fail.

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| count | number | 0 | Number of retry attempts |
| delay | number | 5000 | Base delay between retries in milliseconds |
| type | 'exponential' \| 'linear' \| 'constant' | 'exponential' | Retry delay calculation strategy |
| statusCodes | number[] | [408, 429, 500, 502, 503, 504] | HTTP status codes that trigger a retry |

Retry delay calculation (shown for a 1000ms base delay):

  • exponential: doubles each attempt (1000ms → 2000ms → 4000ms)
  • linear: grows by the base delay each attempt (1000ms → 2000ms → 3000ms)
  • constant: stays fixed (1000ms → 1000ms → 1000ms)
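
A hypothetical helper (not part of Skrobak's API) reproducing these progressions for a 1000ms base delay:

const delayFor = (attempt, base, type) => {
  if (type === 'exponential') return base * 2 ** (attempt - 1) // 1000 → 2000 → 4000
  if (type === 'linear') return base * attempt                 // 1000 → 2000 → 3000
  return base                                                  // constant: 1000 → 1000 → 1000
}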

Status code retry behavior:

Skrobak automatically retries only on specific HTTP status codes:

  • Retriable status codes (in statusCodes list) → Retry the same mechanism
  • Non-retriable status codes (NOT in list, e.g., 404, 401) → Skip to next strategy immediately
  • Network errors (no status code) → Always retry (temporary issues)
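
In pseudocode, the per-failure decision described above looks roughly like this (a sketch of the behavior, not the library's internals):

const shouldRetry = (statusCode, retriableCodes) => {
  if (statusCode === undefined) return true   // network error: always retry
  return retriableCodes.includes(statusCode)  // e.g. 404 is not listed, so skip to the next strategy
}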

Example:

retries: {
  count: 3,
  delay: 2000,
  type: 'exponential',
  statusCodes: [429, 503]  // Only retry on rate limiting and service unavailable
}

ViewportSize (config.options.viewports)

Viewport dimensions for browser-based scraping.

| Property | Type | Description |
| --- | --- | --- |
| width | number | Viewport width in pixels |
| height | number | Viewport height in pixels |

Example:

viewports: [
  { width: 1920, height: 1080 },
  { width: 1366, height: 768 },
  { width: 390, height: 844 }
]

ValidateResponse (config.options.validateResponse)

Custom validation function to verify response before accepting it.

Type: (context) => boolean

Function receives a context object with mechanism and response properties. Return true to accept the response, false to retry or move to the next strategy.

Example:

validateResponse: ({ mechanism, response }) => {
  if (mechanism === 'fetch') {
    return response.status === 200 && response.headers.get('content-type')?.includes('json')
  }

  if (mechanism === 'browser') {
    return response.status() === 200
  }

  return true
}

BrowserConfig (config.browser)

Browser-specific configuration for the browser mechanism.

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| engine | 'chromium' \| 'firefox' \| 'webkit' | 'chromium' | Browser engine to use |
| resources | ResourceType[] | - (allows all) | Allowed resource types (all others are blocked) |
| waitUntil | 'load' \| 'domcontentloaded' \| 'networkidle' \| 'commit' | - | When to consider navigation successful |

Example:

browser: {
  engine: 'chromium',
  waitUntil: 'networkidle',
  resources: ['document', 'script', 'xhr', 'fetch']
}

ResourceType (config.browser.resources)

Types of resources that can be loaded by the browser. When specified, all other resource types are blocked.

Type: Playwright's ResourceType (string)

Common values: 'document' 'stylesheet' 'image' 'script' 'xhr' 'fetch'

See Playwright's ResourceType for all available options.

Example:

// Only allow essential resources, block images/CSS for faster loading
resources: ['document', 'script', 'xhr', 'fetch']

CustomConfig (config.custom)

Configuration for custom fetch implementation when using mechanism: 'custom'.

| Property | Type | Description |
| --- | --- | --- |
| fn | (url, options) => Promise<TCustomResponse> | Custom fetch function |

Function parameters:

  • url (string): The URL to fetch
  • options (object): Request options composed from global config
    • proxy? (string): Proxy URL (when useProxy: true)
    • userAgent? (string): User agent string
    • viewport? (object): Viewport dimensions with width and height
    • headers? (object): HTTP headers as key-value pairs
    • timeout? (number): Request timeout in milliseconds

Example: See Custom Fetch Function example.
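
For a minimal inline sketch, the fn below forwards the composed options to the built-in fetch. This is an illustration under stated assumptions: plain fetch has no proxy support (omitted here), and AbortSignal.timeout requires Node 17.3+.

custom: {
  fn: async (url, options) => {
    return fetch(url, {
      headers: {
        ...options.headers,
        // userAgent arrives separately from headers
        ...(options.userAgent ? { 'User-Agent': options.userAgent } : {})
      },
      signal: options.timeout ? AbortSignal.timeout(options.timeout) : undefined
    })
  }
}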

ScrapeHooks (config.hooks)

Event hooks for monitoring scraping progress, logging, metrics, or debugging. All hooks are optional.

| Property | Type | Description |
| --- | --- | --- |
| onRetryAttempt | (context) => void | Called when a retry attempt fails |
| onRetryExhausted | (context) => void | Called when all retry attempts are exhausted |
| onStrategyFailed | (context) => void | Called when a strategy fails |
| onAllStrategiesFailed | (context) => void | Called when all strategies fail |

onRetryAttempt

Context object:

{
  error: unknown              // The error that occurred
  attempt: number             // Current attempt number (1-indexed)
  maxAttempts: number         // Total number of attempts
  nextRetryDelay: number      // Delay before next retry in ms
  retryConfig: RetryConfig    // Retry configuration
}

onRetryExhausted

Context object:

{
  error: unknown              // The final error
  totalAttempts: number       // Total number of attempts made
  retryConfig: RetryConfig    // Retry configuration
}

onStrategyFailed

Context object:

{
  error: unknown              // The error that occurred
  strategy: ScrapeStrategy    // The strategy that failed
  strategyIndex: number       // Index of failed strategy (0-indexed)
  totalStrategies: number     // Total number of strategies
}

onAllStrategiesFailed

Context object:

{
  lastError: unknown          // The last error encountered
  strategies: Array<ScrapeStrategy>  // All strategies that were tried
  totalAttempts: number       // Number of strategies attempted
}

Example:

const result = await scrape('https://example.com', {
  strategies: [
    { mechanism: 'fetch', useProxy: true },
    { mechanism: 'browser' }
  ],
  options: {
    retries: { count: 3, delay: 1000, type: 'exponential' }
  },
  hooks: {
    onRetryAttempt: ({ attempt, maxAttempts, nextRetryDelay, error }) => {
      console.log(`Retry ${attempt}/${maxAttempts} failed, waiting ${nextRetryDelay}ms`)
      console.error('Error:', error)
    },
    onRetryExhausted: ({ totalAttempts }) => {
      console.log(`All ${totalAttempts} retries exhausted`)
    },
    onStrategyFailed: ({ strategy, strategyIndex, totalStrategies }) => {
      console.log(`Strategy ${strategyIndex + 1}/${totalStrategies} (${strategy.mechanism}) failed`)
    },
    onAllStrategiesFailed: ({ strategies }) => {
      console.error(`All ${strategies.length} strategies failed`)
    }
  }
})

Return Types

The result of scrape() depends on which mechanism succeeded. Use the mechanism property to determine the type.
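
In TypeScript, mechanism acts as a discriminant, so a switch statement narrows the result to the matching shape (a sketch using only the properties documented below):

switch (result.mechanism) {
  case 'fetch':
    console.log(result.$('title').text())
    break
  case 'browser':
    console.log(await result.page.title())
    await result.cleanup() // always release the browser context
    break
  case 'custom':
    console.log(result.response)
    break
}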

ScrapeResultFetch

When: mechanism: 'fetch' strategy succeeds

| Property | Type | Description |
| --- | --- | --- |
| mechanism | 'fetch' | Indicates the fetch mechanism was used |
| response | Response | Standard fetch Response object |
| $ | CheerioAPI | Lazy-loaded Cheerio instance for HTML parsing |

Example:

if (result.mechanism === 'fetch') {
  const title = result.$('title').text()
  const links = result.$('a').map((_, element) => result.$(element).attr('href')).get()
}

ScrapeResultBrowser

When: mechanism: 'browser' strategy succeeds

| Property | Type | Description |
| --- | --- | --- |
| mechanism | 'browser' | Indicates the browser mechanism was used |
| response | PlaywrightResponse | Playwright Response object |
| page | Page | Playwright Page instance for interaction |
| cleanup | () => Promise<void> | Function to close the browser context (must be called) |

Example:

if (result.mechanism === 'browser') {
  await result.page.screenshot({ path: 'screenshot.png' })

  const text = await result.page.textContent('h1')

  // Important: cleanup resources after use to avoid memory leaks
  await result.cleanup()
}
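
Because cleanup() must run even when extraction throws, wrapping the work in try/finally is a safer pattern (a sketch):

if (result.mechanism === 'browser') {
  try {
    const text = await result.page.textContent('h1')
    console.log(text)
  } finally {
    // Runs whether or not the extraction above threw
    await result.cleanup()
  }
}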

ScrapeResultCustom

When: mechanism: 'custom' strategy succeeds

| Property | Type | Description |
| --- | --- | --- |
| mechanism | 'custom' | Indicates the custom mechanism was used |
| response | TCustomResponse | Your custom response type |

Example:

if (result.mechanism === 'custom') {
  // Your custom response type
  console.log(result.response)
}

Examples

Basic Fetch

const result = await scrape('https://example.com', {
  strategies: [{ mechanism: 'fetch' }]
})

if (result.mechanism === 'fetch') {
  const links = result.$('a').map((_, el) => result.$(el).attr('href')).get()
}

Browser with Custom Viewport

const result = await scrape('https://example.com', {
  strategies: [{ mechanism: 'browser' }],
  options: {
    viewports: [{ width: 1920, height: 1080 }]
  },
  browser: {
    engine: 'chromium',
    waitUntil: 'networkidle'
  }
})

if (result.mechanism === 'browser') {
  await result.page.screenshot({ path: 'screenshot.png' })
  // Important: cleanup resources after use to avoid memory leaks
  await result.cleanup()
}

Basic Custom Fetch

import axios from 'axios'

const result = await scrape('https://api.example.com/data', {
  strategies: [{ mechanism: 'custom' }],
  custom: {
    fn: async (url, options) => {
      // Use any HTTP client: axios, got, ofetch, etc.
      const response = await axios.get(url, {
        headers: options.headers,
        timeout: options.timeout,
        proxy: options.proxy ? { host: options.proxy } : undefined
      })
      return response.data
    }
  }
})

if (result.mechanism === 'custom') {
  console.log(result.response) // Your custom response data
}

Proxy Rotation with Retry

const result = await scrape('https://example.com', {
  strategies: [
    { mechanism: 'fetch', useProxy: true }
  ],
  options: {
    proxies: [
      'http://proxy1.example.com:8080',
      'http://proxy2.example.com:8080'
    ],
    retries: {
      count: 3,
      delay: 2000,
      type: 'exponential'
    }
  }
})

Custom User Agents

const result = await scrape('https://example.com', {
  strategies: [{ mechanism: 'fetch' }],
  options: {
    userAgents: [
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
    ]
  }
})

Block Images and Media

const result = await scrape('https://example.com', {
  strategies: [{ mechanism: 'browser' }],
  browser: {
    // Only allow these
    resources: ['document', 'script', 'xhr', 'fetch']
  }
})

if (result.mechanism === 'browser') {
  // Images and media are blocked, page loads faster
  await result.cleanup()
}

Response Validation

const result = await scrape('https://api.example.com/data', {
  strategies: [{ mechanism: 'fetch' }],
  options: {
    validateResponse: ({ mechanism, response }) => {
      if (mechanism === 'fetch') {
        return response.status === 200 && response.headers.get('content-type')?.includes('json')
      }

      return true
    },
    retries: { count: 3, delay: 1000 }
  }
})

Monitoring with Hooks

const result = await scrape('https://example.com', {
  strategies: [
    { mechanism: 'fetch', useProxy: true },
    { mechanism: 'browser' }
  ],
  options: {
    retries: { count: 3, delay: 1000, type: 'exponential' }
  },
  hooks: {
    onRetryAttempt: ({ attempt, maxAttempts, nextRetryDelay }) => {
      console.log(`Retry ${attempt}/${maxAttempts}, waiting ${nextRetryDelay}ms`)
    },
    onStrategyFailed: ({ strategy, strategyIndex, totalStrategies }) => {
      console.log(`Strategy ${strategyIndex + 1}/${totalStrategies} failed: ${strategy.mechanism}`)
    }
  }
})

Custom Fetch Function

import { ofetch } from 'ofetch'

const result = await scrape('https://example.com', {
  strategies: [{ mechanism: 'custom' }],
  custom: {
    fn: async (url, options) => {
      const response = await ofetch(url, {
        headers: options.headers,
        timeout: options.timeout
      })

      return { data: response }
    }
  }
})

if (result.mechanism === 'custom') {
  console.log(result.response.data)
}

Strategy Cascade with Fallback

const result = await scrape('https://example.com', {
  strategies: [
    { mechanism: 'fetch', useProxy: true },   // Try with proxy first
    { mechanism: 'fetch', useProxy: false },  // Fallback to no proxy
    { mechanism: 'browser' }                  // Last resort: full browser
  ],
  options: {
    proxies: ['http://proxy.example.com:8080'],
    retries: { count: 2, delay: 1000 }
  }
})

Complete Example

const result = await scrape('https://example.com/products', {
  strategies: [
    { mechanism: 'fetch', useProxy: true },
    { mechanism: 'browser' }
  ],
  options: {
    timeout: 30000,
    retries: {
      count: 3,
      delay: 2000,
      type: 'exponential'
    },
    proxies: [
      'http://proxy1.example.com:8080',
      'http://proxy2.example.com:8080'
    ],
    userAgents: [
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    ],
    headers: {
      'Accept-Language': 'en-US,en;q=0.9'
    },
    validateResponse: ({ mechanism, response }) => {
      if (mechanism === 'fetch') {
        return response.ok
      }

      if (mechanism === 'browser') {
        return response.status() === 200
      }

      return true
    }
  },
  browser: {
    engine: 'chromium',
    waitUntil: 'networkidle',
    resources: ['document', 'script', 'xhr', 'fetch']
  }
})

if (result.mechanism === 'fetch') {
  const products = result.$('.product').map((_, element) => ({
    title: result.$(element).find('.title').text(),
    price: result.$(element).find('.price').text()
  })).get()
}

if (result.mechanism === 'browser') {
  const products = await result.page.$$eval('.product', (elements) =>
    elements.map((element) => ({
      title: element.querySelector('.title')?.textContent,
      price: element.querySelector('.price')?.textContent
    }))
  )

  // Important: cleanup resources after use to avoid memory leaks
  await result.cleanup()
}

Bulk Scraping

scrapeMany(urls, scrapeConfig, config?)

Scrape multiple URLs sequentially with automatic browser cleanup, random delays, dynamic URL discovery, and error resilience.

Perfect for:

  • Batch scraping multiple pages
  • Web crawling with link discovery

scrapeMany<TCustomResponse = unknown>(
  urls: string[],
  scrapeConfig: ScrapeConfig<TCustomResponse>,
  scrapeManyConfig?: ScrapeManyConfig<TCustomResponse>
): Promise<ScrapeManyResult>

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| urls | string[] | Array of URLs to scrape |
| scrapeConfig | ScrapeConfig | Same config as scrape() - strategies, options, etc. |
| scrapeManyConfig | ScrapeManyConfig | Optional batch scraping configuration |

ScrapeManyConfig

| Property | Type | Description |
| --- | --- | --- |
| onSuccess | (context) => Promise<void> | Called after each successful scrape |
| onError | (context) => Promise<void> | Called after each failed scrape |
| delays | { min: number; max: number } | Random delay range between requests (in milliseconds) |

Success Context:

{
  result: ScrapeResult        // Scrape result (fetch/browser/custom)
  url: string                 // Current URL
  index: number               // URL index (0-based)
  addUrls: (urls) => void     // Add more URLs to queue (for crawling)
  stats: {
    initial: number           // Initial URL count
    discovered: number        // URLs added via addUrls()
    processed: number         // Completed (success + failed)
    remaining: number         // URLs left in queue
    succeeded: number         // Successful scrapes
    failed: number            // Failed scrapes
  }
}

Error Context:

{
  error: unknown              // The error that occurred
  url: string                 // Current URL
  index: number               // URL index (0-based)
  addUrls: (urls) => void     // Add more URLs to queue
  stats: { /* same as above */ }
}

Returns

{
  total: number      // Total URLs processed
  succeeded: number  // Successful scrapes
  failed: number     // Failed scrapes
}

Bulk Scraping Examples

Basic Batch Scraping

import { scrapeMany } from 'skrobak'

const result = await scrapeMany(
  [
    'https://example.com/page1',
    'https://example.com/page2',
    'https://example.com/page3'
  ],
  {
    strategies: [{ mechanism: 'fetch' }],
    options: { retries: { count: 3 } }
  },
  {
    delays: { min: 2000, max: 5000 },  // 2-5 second delays
    onSuccess: async ({ result, url }) => {
      if (result.mechanism === 'fetch') {
        const title = result.$('title').text()
        await saveToDatabase({ url, title })
      }
    }
  }
)

console.log(`Scraped ${result.succeeded}/${result.total} pages`)

Web Crawling with Link Discovery

const result = await scrapeMany(
  ['https://example.com/category'],
  { strategies: [{ mechanism: 'fetch' }] },
  {
    delays: { min: 1000, max: 3000 },
    onSuccess: async ({ result, addUrls }) => {
      if (result.mechanism === 'fetch') {
        // Extract and save data
        // ...

        // Discover more URLs to scrape
        const productLinks = result.$('.product a').map((_, el) => result.$(el).attr('href')).get()

        addUrls(productLinks)
      }
    }
  }
)

console.log(`Crawl complete: ${result.total} pages`)

Error Handling and Monitoring

const errors: Array<{ url: string; error: unknown }> = []

const result = await scrapeMany(
  [/* … */],
  {
    strategies: [
      { mechanism: 'fetch' },
      { mechanism: 'browser' }  // Fallback to browser
    ],
    options: { retries: { count: 3 } }
  },
  {
    delays: { min: 2000, max: 5000 },
    onSuccess: async ({ result, url, stats }) => {
      console.log(`✓ ${url} (${stats.succeeded}/${stats.processed})`)

      if (result.mechanism === 'fetch') {
        await processData(result.$('body').html())
      }

      if (result.mechanism === 'browser') {
        await processData(await result.page.content())
        // Browser cleanup happens automatically!
      }
    },
    onError: async ({ error, url, stats }) => {
      console.error(`✗ ${url} (${stats.failed} failures)`)
      errors.push({ url, error })
    }
  }
)

console.log(`Completed: ${result.succeeded} succeeded, ${result.failed} failed`)
