
fix(website): crawl_raw uses raw sitemap chain on chrome builds #393

Merged
j-mendez merged 1 commit into spider-rs:main from zanmato:fix/raw-sitemap
Jun 1, 2026

Conversation


@zanmato zanmato commented Jun 1, 2026

Contributor

What does this PR do?

Previously, on builds with the chrome feature enabled, spider used chrome for the sitemap pass of crawl_raw. This PR makes crawl_raw use sitemap_crawl_raw for that pass instead, as it already does when the chrome feature is not enabled.

Co-Authored by Claude Opus 4.8

Checklist

  • cargo test passes (apart from what appear to be some preexisting failures)
  • cargo fmt applied
  • New public APIs have doc comments
  • Feature-gated behind a flag (if adding optional functionality)

crawl_raw runs the main crawl over plain HTTP via crawl_concurrent_raw,
but then calls sitemap_crawl_chain for the sitemap pass. On a
chrome-enabled build that chain resolves to sitemap_crawl_chrome, so an
HTTP-only crawl unexpectedly launches a browser (this also affects the
CLI's --http flag). Add sitemap_crawl_chain_raw and call it from
crawl_raw so the sitemap pass stays on the HTTP path in every build.
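
The dispatch problem described above can be sketched with a simplified model. This is not spider's actual code: the `Website` struct here, the `Transport` enum, the `chrome_build` field (a runtime stand-in for the compile-time `chrome` cargo feature), and the `crawl_raw_before` / `crawl_raw_after` helpers are all hypothetical, and the real methods are async and do real work. Only the names `crawl_raw`, `sitemap_crawl_chain`, and `sitemap_crawl_chain_raw` come from the commit message.

```rust
/// Which transport a crawl pass ends up using (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transport {
    Http,
    Chrome,
}

/// Hypothetical stand-in for the crawler. `chrome_build` models whether the
/// `chrome` cargo feature was compiled in; in the real crate this is assumed
/// to be a compile-time cfg, not a runtime field.
pub struct Website {
    chrome_build: bool,
}

impl Website {
    pub fn new(chrome_build: bool) -> Self {
        Self { chrome_build }
    }

    /// Build-dependent chain: resolves to the chrome path on chrome builds.
    fn sitemap_crawl_chain(&self) -> Transport {
        if self.chrome_build {
            Transport::Chrome
        } else {
            Transport::Http
        }
    }

    /// Raw chain: stays on HTTP regardless of the build.
    fn sitemap_crawl_chain_raw(&self) -> Transport {
        Transport::Http
    }

    /// Old behavior: (main crawl, sitemap pass). The main crawl is HTTP,
    /// but the sitemap pass follows the build-dependent chain.
    pub fn crawl_raw_before(&self) -> (Transport, Transport) {
        (Transport::Http, self.sitemap_crawl_chain())
    }

    /// New behavior: both passes stay on HTTP in every build.
    pub fn crawl_raw_after(&self) -> (Transport, Transport) {
        (Transport::Http, self.sitemap_crawl_chain_raw())
    }
}
```

In this model, `Website::new(true).crawl_raw_before()` reports a chrome sitemap pass, while `crawl_raw_after()` reports HTTP for both passes whether or not `chrome_build` is set, which is the invariant the PR is after.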
@zanmato zanmato force-pushed the fix/raw-sitemap branch from 70eb935 to 401527c Compare June 1, 2026 07:58
@j-mendez j-mendez merged commit 6d496c0 into spider-rs:main Jun 1, 2026
j-mendez added a commit that referenced this pull request Jun 1, 2026
Includes #394 (on_link_blocked_callback for robots.txt-blocked URLs, with the
robots check kept short-circuited behind a clean else-if branch) and #393
(crawl_raw uses a raw sitemap chain so HTTP-only crawls never launch chrome).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@zanmato zanmato deleted the fix/raw-sitemap branch June 1, 2026 12:01

2 participants