Get current URL in customCrawl() #364

Open
@popstas

What is the current behavior?
customCrawl() receives no information about the URL currently being crawled.

What is the motivation / use case for changing the behavior?
I want to skip the request but still add the URL to the CSV for some file types, such as zip, doc, and pdf.
My code that does this: https://github.com/viasite/sites-scraper/blob/59449b1b03/src/scrap-site.js#L240-L255
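
To make the use case concrete, here is a minimal sketch of the kind of check involved. It is not the code from the linked file; `shouldSkipRequest` and `SKIP_EXTENSIONS` are hypothetical names, and the extension list is only the three types mentioned above.

```javascript
// Hypothetical helper (not part of any library): decide whether a URL points
// to a file that should be recorded in the CSV but not actually requested.
// The extension list is an assumption based on the examples in this issue.
const SKIP_EXTENSIONS = new Set(['zip', 'doc', 'pdf']);

function shouldSkipRequest(rawUrl) {
  let pathname;
  try {
    // The WHATWG URL parser separates the path from query string and fragment.
    pathname = new URL(rawUrl).pathname;
  } catch (err) {
    return false; // not an absolute URL, so make no decision here
  }
  const lastSegment = pathname.split('/').pop();
  const dot = lastSegment.lastIndexOf('.');
  if (dot <= 0) return false; // no extension, or a dotfile such as ".htaccess"
  return SKIP_EXTENSIONS.has(lastSegment.slice(dot + 1).toLowerCase());
}
```

A check like this only helps if the URL is known at the point where the decision is made, which is exactly what is missing inside customCrawl().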

Proposal
Pass crawler to customCrawl as a third argument:
customCrawl: async (page, crawl, crawler)

I tried storing the current URL in a requeststarted event handler, but that fails when concurrency > 1.
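
The failure can be reproduced without the real library. The sketch below uses a toy stand-in (`ToyCrawler`, `demo`, and the delays are all hypothetical) that emits `requeststarted` and later calls `customCrawl`; the toy passes the real URL to the callback only so the mismatch can be detected.

```javascript
const { EventEmitter } = require('events');

// Toy stand-in for a crawler; this is NOT the real library. It emits
// 'requeststarted', waits for a per-request delay, then calls customCrawl.
class ToyCrawler extends EventEmitter {
  constructor(customCrawl) {
    super();
    this.customCrawl = customCrawl;
  }

  async request(url, delayMs) {
    this.emit('requeststarted', { url });
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    // The real URL is passed here only so the demo can check for mismatches.
    return this.customCrawl(url);
  }
}

async function demo() {
  let currentURL = null; // one variable shared by every in-flight request
  const mismatches = [];

  const crawler = new ToyCrawler(async (actualUrl) => {
    if (currentURL !== actualUrl) {
      mismatches.push({ actualUrl, seen: currentURL });
    }
  });
  crawler.on('requeststarted', ({ url }) => {
    currentURL = url;
  });

  // Two requests in flight at once, as with concurrency > 1.
  // The second one starts later but finishes first.
  await Promise.all([
    crawler.request('https://example.com/a', 20),
    crawler.request('https://example.com/b', 5),
  ]);
  return mismatches;
}
```

In this toy run the slower request sees the URL written by the faster one, because both share a single variable. Passing per-request context into customCrawl, as proposed above, would avoid relying on shared state.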

What do you think? I can make a PR.
