Make it trivial to use Pydoll inside Scrapy without custom glue code. The plugin should let a spider opt in per request to drive a headless tab, run small actions (clicks, waits), and return a rendered HtmlResponse that plays nicely with Scrapy selectors. It should feel like standard Scrapy, just powered by Pydoll when needed.
Proposed API
- Installable optional plugin:
pip install scrapy-pydoll
- Enable via settings:
PYDOLL_ENABLED = True
PYDOLL_CONCURRENCY = 2
PYDOLL_BROWSER_OPTIONS = { "geolocation": "GB", "headless": True }
- Per-request opt-in (meta) or helper Request:
yield scrapy.Request(
    url,
    meta={
        "pydoll": {
            "actions": [
                {"type": "wait", "for": "networkidle"},
                {"type": "click", "selector": "#show-more"},
            ],
            "timeout": 15000,
        },
        "cookiejar": "sessionA",
    },
    callback=self.parse_page,
)
# or
yield PydollRequest(url, actions=[...], timeout=15000)
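To make the per-request contract above concrete, here is a minimal sketch of how a middleware might normalize and validate `meta["pydoll"]` before touching the browser. Everything in it is an assumption for illustration: `PydollOptions`, `validate_action`, and `parse_pydoll_meta` are hypothetical names, the set of action types is taken only from the ones named in this proposal, and the timeout is assumed to be in milliseconds.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed vocabulary, based only on the action types named in this proposal.
ACTION_TYPES = {"wait", "click", "type", "scroll"}
DEFAULT_TIMEOUT_MS = 15000  # assumption: timeouts are expressed in milliseconds


@dataclass(frozen=True)
class PydollOptions:
    """Normalized view of request.meta["pydoll"] (hypothetical helper type)."""
    actions: tuple = ()
    timeout_ms: int = DEFAULT_TIMEOUT_MS


def validate_action(action: dict) -> dict:
    """Reject malformed actions early, before any browser work starts."""
    kind = action.get("type")
    if kind not in ACTION_TYPES:
        raise ValueError(f"unknown action type: {kind!r}")
    if kind == "wait" and not action.get("for"):
        raise ValueError("wait action needs a 'for' condition")
    if kind in ("click", "type") and not action.get("selector"):
        raise ValueError(f"{kind} action needs a 'selector'")
    return dict(action)  # copy so later mutation of meta cannot affect us


def parse_pydoll_meta(meta: dict) -> Optional[PydollOptions]:
    """Return options for opted-in requests, or None for plain Scrapy requests."""
    raw = meta.get("pydoll")
    if raw is None:
        return None
    if not isinstance(raw, dict):
        raise TypeError("meta['pydoll'] must be a dict")
    actions = tuple(validate_action(a) for a in raw.get("actions", ()))
    timeout_ms = int(raw.get("timeout", DEFAULT_TIMEOUT_MS))
    if timeout_ms <= 0:
        raise ValueError("timeout must be positive")
    return PydollOptions(actions=actions, timeout_ms=timeout_ms)


# Usage with the meta dict from the request example above.
opts = parse_pydoll_meta({
    "pydoll": {
        "actions": [
            {"type": "wait", "for": "networkidle"},
            {"type": "click", "selector": "#show-more"},
        ],
        "timeout": 15000,
    },
    "cookiejar": "sessionA",
})
print(opts.timeout_ms, [a["type"] for a in opts.actions])
```

Returning `None` for requests without a `pydoll` key keeps the opt-in explicit: such requests would fall through to Scrapy's normal downloader untouched. A `PydollRequest` helper could plausibly be a thin wrapper that builds the same dict.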
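Requirements (MVP)
- Return a rendered HtmlResponse compatible with .css()/.xpath()
- Waits: networkidle, selector, sleep(ms)
- Actions: click, type, scroll
- […] cookiejar; graceful shutdown on spider_closed
- […] IgnoreRequest or similar
Follow-ups
- Markdown output (return_markdown=True) once exporter exists
- […] Tab or rendered HTML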
Example Spider
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"

    def start_requests(self):
        # Render the listing page in a headless tab and wait for the network to settle.
        yield scrapy.Request(
            "https://example.com/products",
            meta={"pydoll": {
                "actions": [{"type": "wait", "for": "networkidle"}],
                "timeout": 15000,
            }},
            callback=self.parse_list,
        )

    def parse_list(self, response):
        # The rendered response supports the usual Scrapy selectors.
        for href in response.css(".item a::attr(href)").getall():
            yield scrapy.Request(
                response.urljoin(href),
                meta={"pydoll": {"actions": [{"type": "click", "selector": "#accept"}]}},
                callback=self.parse_item,
            )

    def parse_item(self, response):
        yield {
            "title": response.css("h1::text").get(),
            "price": response.css(".price::text").get(),
        }
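For a spider like the one above, the plugin would have to run each request's actions in a tab, bound by `PYDOLL_CONCURRENCY` and the per-request `timeout`. The following is a rough sketch of that loop using only `asyncio`. `FakeTab` is a stand-in written for this example and is not Pydoll's API; `run_actions` and `render` are hypothetical names, and treating `timeout` as milliseconds covering the whole action sequence is an assumption.

```python
import asyncio


class FakeTab:
    """Stand-in for a browser tab, for illustration only (NOT Pydoll's API)."""

    def __init__(self):
        self.log = []

    async def wait_for(self, condition):
        await asyncio.sleep(0)  # yield control, as a real wait would
        self.log.append(("wait", condition))

    async def click(self, selector):
        await asyncio.sleep(0)
        self.log.append(("click", selector))

    async def content(self):
        return "<html><body><h1>rendered</h1></body></html>"


async def run_actions(tab, actions):
    """Execute the declared actions in order against one tab."""
    for action in actions:
        if action["type"] == "wait":
            await tab.wait_for(action["for"])
        elif action["type"] == "click":
            await tab.click(action["selector"])
        else:
            raise ValueError(f"unsupported action: {action['type']!r}")


async def render(tab, options, limiter):
    """Run actions under the concurrency cap and timeout, then return HTML."""
    timeout_s = options.get("timeout", 15000) / 1000  # assumed to be ms
    async with limiter:  # at most N tabs are driven at the same time
        await asyncio.wait_for(
            run_actions(tab, options.get("actions", [])), timeout=timeout_s
        )
        return await tab.content()


async def main():
    limiter = asyncio.Semaphore(2)  # mirrors PYDOLL_CONCURRENCY = 2
    options = {
        "actions": [
            {"type": "wait", "for": "networkidle"},
            {"type": "click", "selector": "#accept"},
        ],
        "timeout": 15000,
    }
    tabs = [FakeTab() for _ in range(3)]
    pages = await asyncio.gather(*(render(t, options, limiter) for t in tabs))
    return tabs, pages


tabs, pages = asyncio.run(main())
print(len(pages), tabs[0].log)
```

In a real middleware the returned HTML would presumably be wrapped in a `scrapy.http.HtmlResponse` carrying the original request, and an `asyncio.TimeoutError` from `wait_for` would be translated into whatever failure signal the plugin settles on.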