get_revisions in perceval/backends/core/mediawiki.py doesn't paginate yet. If a page has a very long edit history, it fetches only the first 500 revisions and stops, because it never requests the batches beyond the API's per-request limit.
There’s actually a TODO right in the code that points this out:
TODO: Iterate if more than self.max reviews (500)
The goal is to:
- Use the continue parameter to loop through the API responses until the full history is fetched.
- Keep the existing last_date filtering intact so we don't over-fetch.
- Ensure it stays compatible with the current request flow.
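
Here is a rough sketch of the loop I have in mind, not the final patch. The names `iter_revisions` and `call` are hypothetical: `call` stands in for whatever the client already uses to send a request and decode the JSON response. I'm also assuming the request uses `rvdir=newer` with `rvstart` for the last_date filter, which may not match the current code exactly. The continuation handling follows the MediaWiki API convention of merging the returned `continue` object into the next request.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_revisions(call: Callable[[Dict[str, str]], dict],
                   pageid: int,
                   last_date: Optional[str] = None,
                   limit: int = 500) -> Iterator[dict]:
    """Yield every revision of a page, following API continuation.

    `call` is a hypothetical stand-in for the client's request helper:
    it takes the query parameters and returns the decoded JSON response.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "pageids": str(pageid),
        "rvlimit": str(limit),
        "rvdir": "newer",
        "format": "json",
    }
    # Assumption: last_date maps to rvstart, so it is sent with every batch
    # and the filter keeps applying while we paginate.
    if last_date:
        params["rvstart"] = last_date

    cont: Dict[str, str] = {}
    while True:
        # Merge the continuation values from the previous response, if any.
        response = call({**params, **cont})
        pages = response.get("query", {}).get("pages", {})
        for page in pages.values():
            yield from page.get("revisions", [])
        # No "continue" object means this was the last batch.
        cont = response.get("continue") or {}
        if not cont:
            break
```

Writing it as a generator means callers can keep iterating over revisions as they do now, and the base parameters (including the date filter) are built once and reused for every batch.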
I think this would be a big plus for anyone using Perceval on high-activity wikis where the current partial datasets might lead to inaccurate analysis.
@sduenas please assign me this issue.