Tracking issue for rework of rust-repos scraper #121

Open
@NULLx76

Description

Last year, I used this repository as part of my research analysing the release practices of all Java repositories on GitHub. While doing so, I discovered that this repository has a few issues, partly because it has not been updated in a while. I hope it is not too presumptuous of me to suggest a rework, but I think it would be worthwhile and I am willing to take it on myself.

This is a tracking issue documenting all the problems I've found (and still remember).
When I encounter or remember more, I'll add them to this issue.

  • Rate limits not being respected (#122)
    • Retrying is not done correctly (somewhat related to rate limits)
  • Outdated Dependencies and Rust Edition
    • This also includes the use of libraries like failure, which is deprecated
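
To make the rate-limit and retry items concrete, here is a minimal sketch of how the waiting logic could look. It is not the code from this repository or from my Java scraper. The struct and function names are hypothetical, and it assumes the values have already been parsed from GitHub's `x-ratelimit-remaining`, `x-ratelimit-reset` (UTC epoch seconds) and `retry-after` response headers.

```rust
use std::time::Duration;

/// Rate-limit values parsed from a GitHub API response.
/// Hypothetical type: the fields mirror the `x-ratelimit-remaining`,
/// `x-ratelimit-reset` and `retry-after` headers.
#[derive(Debug, Clone, Copy)]
pub struct RateLimitInfo {
    pub remaining: Option<u64>,
    pub reset_epoch_secs: Option<u64>,
    pub retry_after_secs: Option<u64>,
}

/// Returns how long to wait before sending the next request,
/// or `None` if the request can be sent immediately.
pub fn wait_before_next_request(info: &RateLimitInfo, now_epoch_secs: u64) -> Option<Duration> {
    // An explicit `retry-after` value takes priority over everything else.
    if let Some(secs) = info.retry_after_secs {
        return Some(Duration::from_secs(secs));
    }
    // Quota exhausted: wait until the reset time, plus one second of slack.
    if info.remaining == Some(0) {
        if let Some(reset) = info.reset_epoch_secs {
            // `saturating_sub` avoids underflow if the local clock is ahead.
            let secs = reset.saturating_sub(now_epoch_secs) + 1;
            return Some(Duration::from_secs(secs));
        }
    }
    None
}

/// Exponential backoff for retrying failed requests: `base * 2^attempt`,
/// capped at `max`. `attempt` starts at 0 for the first retry.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}
```

A scraper loop would call `wait_before_next_request` after every response and sleep for the returned duration, and use `backoff_delay` only for errors that carry no rate-limit information, such as timeouts or 5xx responses.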

The final scraper I implemented for Java can be found here, specifically in src/scraper. I'd mostly want to port that code to rust-repos, as I've verified that it works and it should be mostly applicable.

An issue I naturally ran into when scraping millions of repositories is that it can take weeks to scrape all of GitHub while respecting the rate limits (even when using some tricks).
There are different solutions to this, but it is important to first find out how much of an issue this is for Rust, as there are far fewer Rust repositories than Java ones. This is also related to #65: in the scraper's current state it may simply not be feasible to do that, but I can look into whether it is.
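
As a back-of-the-envelope illustration of why this takes weeks, the sketch below estimates scrape time from a request budget. It assumes GitHub's documented primary limit of 5,000 REST requests per hour for an authenticated user. The function name, the repository count and the one-request-per-repository figure are hypothetical inputs, not measurements.

```rust
/// Assumed primary rate limit for authenticated GitHub REST requests.
pub const REQUESTS_PER_HOUR: u64 = 5_000;

/// Estimated number of hours (rounded up) needed to scrape `repositories`
/// repositories at `requests_per_repo` requests each.
///
/// Panics if `requests_per_hour` is zero.
pub fn estimated_hours(repositories: u64, requests_per_repo: u64, requests_per_hour: u64) -> u64 {
    assert!(requests_per_hour > 0, "requests_per_hour must be non-zero");
    let total_requests = repositories.saturating_mul(requests_per_repo);
    // Ceiling division without risking overflow from `total + divisor - 1`.
    total_requests / requests_per_hour + u64::from(total_requests % requests_per_hour != 0)
}
```

For example, a hypothetical 3,000,000 repositories at one request each gives `estimated_hours(3_000_000, 1, REQUESTS_PER_HOUR) == 600`, which is 25 days of continuous scraping with a single token.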

Activity

NULLx76 commented on Oct 15, 2024

@rustbot claim

rustbot commented on Oct 15, 2024

Error: This repository is not enabled to use triagebot.
Add a triagebot.toml in the root of the default branch to enable it.

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

Tracking issue for rework of rust-repos scraper · Issue #121 · rust-lang/rust-repos