Hello,
We have a use case where some URLs are prioritised (boosted), but the crawler should terminate after XX URLs are fetched. To implement this, we planned to use CrawlConfig.maxPagesToFetch, whose javadoc states "Maximum number of pages to fetch". However, this documentation and variable name are misleading, as the option actually limits the number of URLs scheduled (i.e. added to the frontier), not the number fetched. If you agree, I would propose renaming this option and adding another that limits the number of pages actually fetched. If all URLs have equal priority, the two options are semantically equivalent; with boosted URLs they diverge, because a high-priority URL discovered late may never be scheduled at all.
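To illustrate the difference, here is a minimal standalone sketch (not crawler4j internals; names like `fetchWithScheduleLimit` are hypothetical) that contrasts applying the limit at schedule time with applying it at fetch time, when a boosted URL is discovered after the limit is reached:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class FetchLimitSketch {
    // Discovery order: the boosted URL is found only after three others.
    static final String[] NAMES = {"a", "b", "c", "boosted", "d"};

    static int priority(String name) { return name.equals("boosted") ? 10 : 1; }

    // Highest priority first; ties broken by name for determinism.
    static final Comparator<String> BY_PRIORITY =
        Comparator.comparingInt((String n) -> -priority(n)).thenComparing(n -> n);

    // Behaviour as currently implemented (per this report):
    // stop *scheduling* after `limit` URLs have entered the frontier.
    static List<String> fetchWithScheduleLimit(int limit) {
        PriorityQueue<String> frontier = new PriorityQueue<>(BY_PRIORITY);
        for (int i = 0; i < NAMES.length && i < limit; i++) {
            frontier.add(NAMES[i]);
        }
        List<String> fetched = new ArrayList<>();
        while (!frontier.isEmpty()) {
            fetched.add(frontier.poll());
        }
        return fetched; // "boosted" never entered the frontier
    }

    // Proposed behaviour: schedule everything discovered,
    // stop *fetching* after `limit` pages.
    static List<String> fetchWithFetchLimit(int limit) {
        PriorityQueue<String> frontier = new PriorityQueue<>(BY_PRIORITY);
        frontier.addAll(Arrays.asList(NAMES));
        List<String> fetched = new ArrayList<>();
        while (!frontier.isEmpty() && fetched.size() < limit) {
            fetched.add(frontier.poll());
        }
        return fetched; // the boosted URL is fetched first
    }

    public static void main(String[] args) {
        System.out.println("schedule-limited: " + fetchWithScheduleLimit(3));
        System.out.println("fetch-limited:    " + fetchWithFetchLimit(3));
    }
}
```

With a limit of 3, the schedule-time limit fetches only the first three discovered URLs and the boosted URL is silently dropped, while the fetch-time limit fetches the boosted URL first. This is the behaviour difference the proposed second option would expose.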