Fix Brave search result classification so snippet-only results are sent to the scraper instead of being treated as already-fetched full content.
Brave search results populate body with snippet text, not full-page content. In ResearchConductor._search_relevant_source_urls(), results were classified as already-fetched whenever len(raw_content) > 100, where raw_content fell back to result["body"]. That caused Brave results with long snippets to bypass scraping entirely.
In the failing reproduction, this manifested as every sub-query logging "Scraping content from 0 URLs" because all Brave hits were treated as prefetched content before the scraper ever saw them.
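For illustration, the old heuristic can be sketched roughly as below. This is a minimal re-creation under stated assumptions, not the actual code in ResearchConductor: the function name `classify_old`, the `href` key, and the sample result are hypothetical. Only the fallback to `body` and the `len(raw_content) > 100` check come from the description above.

```python
def classify_old(results):
    """Hypothetical re-creation of the old classification heuristic."""
    prefetched, urls_to_scrape = [], []
    for result in results:
        # Old behavior: fall back to the snippet in "body" when no raw_content is set.
        raw_content = result.get("raw_content") or result.get("body") or ""
        if len(raw_content) > 100:
            # Treated as already-fetched, so the scraper never sees this URL.
            prefetched.append(result)
        else:
            urls_to_scrape.append(result["href"])  # "href" key is an assumption
    return prefetched, urls_to_scrape


# A Brave-style hit: no raw_content, but a snippet longer than 100 characters.
brave_like_hit = {"href": "https://example.com/a", "body": "x" * 150}
prefetched, urls_to_scrape = classify_old([brave_like_hit])
```

With this input the long snippet lands in `prefetched` and `urls_to_scrape` stays empty, which matches the "Scraping content from 0 URLs" symptom.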
This PR intentionally contains only the snippet-classification fix. The separate scraped-count tracking used by local safeguard logic belongs to the safeguards PR, not this one.
- Stop treating body as full page content in _search_relevant_source_urls()
- Use only explicit raw_content to identify retriever results that already contain full text
- Leave snippet-only results in the URL list so the normal scraper path runs
- Add focused tests covering:
  - snippet-only retriever results are queued for scraping
  - explicit raw_content results remain prefetched
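The changes above can be sketched as follows. This is a simplified illustration rather than the code in this PR: `classify_fixed`, the `href` key, and the sample results are hypothetical, and keeping the 100-character threshold for explicit `raw_content` is an assumption not confirmed here.

```python
def classify_fixed(results):
    """Hypothetical sketch: only explicit raw_content counts as prefetched."""
    prefetched, urls_to_scrape = [], []
    for result in results:
        # No fallback to "body": a snippet is never mistaken for full text.
        raw_content = result.get("raw_content") or ""
        if len(raw_content) > 100:  # threshold kept as an assumption
            prefetched.append(result)
        else:
            # Snippet-only results stay in the URL list for the normal scraper path.
            urls_to_scrape.append(result["href"])  # "href" key is an assumption
    return prefetched, urls_to_scrape


snippet_only = {"href": "https://example.com/a", "body": "s" * 150}
full_text = {"href": "https://example.com/b", "body": "short", "raw_content": "f" * 500}
prefetched, urls_to_scrape = classify_fixed([snippet_only, full_text])
```

The two sample results mirror the two test cases listed above: the snippet-only result is queued for scraping, and the result with explicit `raw_content` remains prefetched.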
Verified locally with:
uv run python -c "from gpt_researcher.skills.researcher import ResearchConductor; print('ok')"
uv run python -m unittest tests.test_research_conductor_retrieval
None.
- Potentially related: #1263