Description
Last year, I used this repository as part of my research analysing the release practices of all Java repositories on GitHub. While doing so, I discovered that this repository has a few issues, partly because it has not been updated in a while. I hope it is not too presumptuous of me to suggest a rework, but I think it would be a nice thing to do and I am willing to take it on myself.
This is a tracking issue, documenting all the things I've found (and still remember).
When I encounter/remember more, I'll add them to this issue.
- Rate limits not being respected, see #122
- Retrying is not done correctly (sort of related to rate limits)
- Outdated dependencies and Rust edition
  - This also includes using libraries like `failure` which are deprecated
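To make the first two points concrete, here is a minimal sketch of how a scraper might decide how long to wait before retrying a rate-limited request. It is not the code from this repository or from my Java scraper. The `RateLimitInfo` struct and the `wait_before_retry` function are hypothetical names, and the sketch assumes the caller has already parsed GitHub's `retry-after`, `x-ratelimit-remaining` and `x-ratelimit-reset` response headers. The one-minute starting delay and the one-hour cap are assumptions as well.

```rust
use std::time::Duration;

/// Rate-limit values parsed from a GitHub REST API response (hypothetical type).
/// The fields mirror the `retry-after`, `x-ratelimit-remaining` and
/// `x-ratelimit-reset` headers; a missing header is `None`.
#[derive(Debug, Clone, Copy, Default)]
pub struct RateLimitInfo {
    pub retry_after_secs: Option<u64>,
    pub remaining: Option<u64>,
    pub reset_epoch_secs: Option<u64>,
}

/// Decide how long to sleep before retrying a rate-limited request.
/// `attempt` is the number of retries already made (0 for the first retry).
pub fn wait_before_retry(info: RateLimitInfo, now_epoch_secs: u64, attempt: u32) -> Duration {
    // 1. An explicit `retry-after` value takes priority.
    if let Some(secs) = info.retry_after_secs {
        return Duration::from_secs(secs);
    }
    // 2. Quota exhausted: wait until the reset time, plus one second of slack.
    if info.remaining == Some(0) {
        if let Some(reset) = info.reset_epoch_secs {
            return Duration::from_secs(reset.saturating_sub(now_epoch_secs) + 1);
        }
    }
    // 3. Otherwise back off exponentially, starting at one minute and
    //    capped at one hour (both values are assumptions of this sketch).
    let exp = attempt.min(6);
    Duration::from_secs((60u64 << exp).min(3600))
}
```

A caller would sleep for the returned duration and then retry, increasing `attempt` each time a retry fails again.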
The final scraper I have implemented for Java can be found here, specifically in `src/scraper`. I'd mostly want to port that code to `rust-repos`, as I've verified that it works and it should be mostly applicable.
A natural issue I ran into when scraping millions of repositories is that it can take weeks to scrape all of GitHub when respecting the rate limits (even while using some tricks).
There are different solutions to this, but it is important to first find out how much of an issue this is for Rust, as there are far fewer Rust repositories than Java ones. This is also related to #65: in its current state it may simply not be feasible to do that, but I can look into whether it is.
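As a rough way to gauge this, the sketch below estimates the minimum wall-clock time for a given number of API requests under a fixed hourly limit. The function name `min_scrape_time` is hypothetical, and the estimate assumes a single token, no retries and no secondary limits. Any request count fed into it is a placeholder until someone actually measures how many requests a Rust scrape needs.

```rust
use std::time::Duration;

/// Lower bound on the time needed to issue `total_requests` requests
/// when at most `requests_per_hour` are allowed (hypothetical helper).
pub fn min_scrape_time(total_requests: u64, requests_per_hour: u64) -> Duration {
    assert!(requests_per_hour > 0, "rate limit must be positive");
    // seconds = requests * 3600 / rate, rounded up; u128 avoids overflow.
    let rate = requests_per_hour as u128;
    let secs = (total_requests as u128 * 3600 + rate - 1) / rate;
    Duration::from_secs(secs as u64)
}
```

For example, a purely hypothetical 10,000,000 requests at 5,000 requests per hour (GitHub's documented primary limit for authenticated REST requests with a personal access token) comes out at 2,000 hours, or roughly 12 weeks.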
Activity
NULLx76 commented on Oct 15, 2024
@rustbot claim
rustbot commented on Oct 15, 2024
Error: This repository is not enabled to use triagebot.
Add a `triagebot.toml` in the root of the default branch to enable it.
Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.