Optimize GitHub API usage by leveraging search response data for repository information [fixes #11] #15
Problem
When searching for repositories, the GitHub REST API Search endpoint returns much more data per result than we expected. We currently make multiple API requests per repository to fetch information that is already available in the search response, which is inefficient and needlessly consumes our API rate limit.
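To illustrate the point, here is a minimal sketch of pulling repository data straight from one item of a search response. The fields on `SearchRepoItem` are a subset of what the repository search endpoint returns; the `RepositoryData` shape, and which fields this PR actually keeps, are assumptions made for the example.

```typescript
// Subset of the fields present on each item of a repository search response.
interface SearchRepoItem {
  full_name: string;
  html_url: string;
  description: string | null;
  stargazers_count: number;
  forks_count: number;
  language: string | null;
  created_at: string;
  pushed_at: string;
  archived: boolean;
  topics?: string[];
  license: { spdx_id: string | null } | null;
}

// Hypothetical output record; the real shape used by the PR may differ.
interface RepositoryData {
  name: string;
  url: string;
  description: string;
  stars: number;
  forks: number;
  language: string | null;
  createdAt: string;
  lastPushAt: string;
  archived: boolean;
  topics: string[];
  license: string | null;
}

// Maps one search result to the record we store, with no extra API request.
function extractRepositoryData(item: SearchRepoItem): RepositoryData {
  return {
    name: item.full_name,
    url: item.html_url,
    description: item.description ?? "",
    stars: item.stargazers_count,
    forks: item.forks_count,
    language: item.language,
    createdAt: item.created_at,
    lastPushAt: item.pushed_at,
    archived: item.archived,
    topics: item.topics ?? [],
    license: item.license?.spdx_id ?? null,
  };
}
```

Because the function is pure, it can be unit-tested against a recorded search response without touching the network.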
Solution
This PR optimizes our GitHub API usage by:
Changes
fetchRepositoriesWithTopic: Efficiently fetches repositories with pagination
extractRepositoryData: Extracts needed data directly from search results
Benefits
Notes
As discussed in issue #11, we're no longer collecting first commit date and first release date as this information isn't critical and would require additional API calls.
For the json-schema topic on GitHub, there are under 2k results, and we can now request up to 1,000 results per query with the Search API (at most 100 per page). The response contains all the initial data we need, so no additional requests are required.
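The pagination arithmetic behind these numbers can be sketched as follows. This is an illustration, not the code in this PR: `buildSearchUrl`, `pagesNeeded`, `SearchPage`, and the injected `getPage` callback are hypothetical names, and the sketch makes no attempt to reach results past the 1,000-result cap.

```typescript
const SEARCH_RESULT_CAP = 1000; // the Search API serves at most 1,000 results per query
const MAX_PER_PAGE = 100;       // largest per_page value the API accepts

// Minimal view of a search response body.
interface SearchPage<T> {
  total_count: number;
  items: T[];
}

// Builds the repository search URL for one page of a topic query.
function buildSearchUrl(topic: string, page: number, perPage: number = MAX_PER_PAGE): string {
  const q = encodeURIComponent(`topic:${topic}`);
  return `https://api.github.com/search/repositories?q=${q}&per_page=${perPage}&page=${page}`;
}

// Number of pages required to read every result the API will serve.
function pagesNeeded(totalCount: number, perPage: number = MAX_PER_PAGE): number {
  const reachable = Math.min(totalCount, SEARCH_RESULT_CAP);
  return Math.ceil(reachable / perPage);
}

// Walks the pages one after another. getPage is supplied by the caller
// (assumption) so that auth headers and error handling stay outside the sketch.
async function fetchRepositoriesWithTopic<T>(
  topic: string,
  getPage: (url: string) => Promise<SearchPage<T>>,
): Promise<T[]> {
  const first = await getPage(buildSearchUrl(topic, 1));
  const items = [...first.items];
  const lastPage = pagesNeeded(first.total_count);
  for (let page = 2; page <= lastPage; page++) {
    const next = await getPage(buildSearchUrl(topic, page));
    items.push(...next.items);
  }
  return items;
}
```

With 100 results per page, a topic with 1,800 matches needs 10 requests to read the first 1,000 results, compared with several requests per repository before this change.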
Testing
The changes have been tested with: