Sitemap Extractor: Make sitemap discovery in parallel to http requests #263

@nicklamonov

(This may be affected by the memory limit; that needs to be researched first.)

Current flow: discover all sitemaps first, then ping all pages.

Update the flow so that sitemap discovery runs in parallel with the HTTP requests, so that results arrive sooner.

Edit:
Well, this already happens now, but sitemap discovery is slow and uses quite a lot of memory, which leaves less memory for the HTTP requests.
Example run: https://console.apify.com/organization/ZscMwFR5H7eCtWtyh/actors/rGeTNESChDZ65EbYh/runs/3ojy96GAEjvOQDwZU#output
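
For discussion, here is a minimal sketch of the proposed flow, not the actor's actual code. It assumes a producer/consumer layout: discovery pushes page URLs into a bounded queue as soon as they are found, and a few workers ping them concurrently. All names (`FAKE_SITEMAPS`, `discover`, `ping_worker`, `run`) and the example.com URLs are hypothetical, and the network calls are replaced with placeholders so the snippet runs on its own.

```python
import asyncio

# Hypothetical in-memory stand-in for real sitemap files: a sitemap index
# lists child sitemaps, and a leaf sitemap lists page URLs.
FAKE_SITEMAPS = {
    "https://example.com/sitemap.xml": [
        "https://example.com/sitemap-a.xml",
        "https://example.com/sitemap-b.xml",
    ],
    "https://example.com/sitemap-a.xml": [
        "https://example.com/a1",
        "https://example.com/a2",
    ],
    "https://example.com/sitemap-b.xml": ["https://example.com/b1"],
}


async def discover(root, queue):
    """Walk the sitemap tree and enqueue page URLs as soon as they are found."""
    pending = [root]
    while pending:
        sitemap_url = pending.pop()
        await asyncio.sleep(0)  # placeholder for fetching and parsing the sitemap
        for entry in FAKE_SITEMAPS.get(sitemap_url, []):
            if entry in FAKE_SITEMAPS:
                pending.append(entry)  # nested sitemap, walk it later
            else:
                # Waits while the queue is full, so discovery cannot run
                # arbitrarily far ahead of the HTTP requests.
                await queue.put(entry)


async def ping_worker(queue, results):
    """Take page URLs off the queue and 'ping' them until cancelled."""
    while True:
        url = await queue.get()
        try:
            await asyncio.sleep(0)  # placeholder for the real HTTP request
            results.append(url)
        finally:
            queue.task_done()


async def run(root, concurrency=2, max_queued=2):
    queue = asyncio.Queue(maxsize=max_queued)
    results = []
    workers = [
        asyncio.create_task(ping_worker(queue, results))
        for _ in range(concurrency)
    ]
    await discover(root, queue)  # runs interleaved with the workers
    await queue.join()           # wait until every queued URL was pinged
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(sorted(asyncio.run(run("https://example.com/sitemap.xml"))))
```

The bounded queue (`max_queued`) is the part that relates to the memory concern: it caps how many discovered URLs wait in memory at once. Whether that is enough to help here still depends on how much memory the sitemap parsing itself uses.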
