Optimize GitHub API usage by leveraging search response data for repository information [fixes #11] #15

Open: wants to merge 1 commit into base: main

adityajha2005

Problem

When searching for repositories, the GitHub REST API Search endpoint already returns most of the repository metadata we need. Currently, we make multiple additional API requests per repository to fetch information that is already present in the search response. This is inefficient and needlessly consumes our API rate limit.

Solution

This PR optimizes our GitHub API usage by:

  1. Leveraging the rich data already available in the Search API response
  2. Paginating the Search API at 100 results per page, up to the 1000 results GitHub returns per search query (GitHub's maximum)
  3. Extracting repository information (name, topics, creation date) directly from search results
  4. Eliminating additional API requests per repository
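
The pagination described in the steps above can be sketched roughly as follows. This is a minimal illustration, not the implementation in this PR: `fetchAllPages`, `FetchPage`, `SearchItem`, and `SearchPage` are hypothetical names, and the HTTP call is injected as a function so the loop can be exercised without network access. The field names `total_count`, `items`, `full_name`, `topics`, and `created_at` follow the GitHub search response.

```typescript
// Subset of one repository item in the search response.
interface SearchItem {
  full_name: string;
  created_at: string;
  topics?: string[];
}

// Subset of one page of the search response.
interface SearchPage {
  total_count: number;
  items: SearchItem[];
}

// Hypothetical injection point: performs one search request for one page.
type FetchPage = (query: string, page: number, perPage: number) => Promise<SearchPage>;

const PER_PAGE = 100;     // largest page size the Search API accepts
const MAX_RESULTS = 1000; // the Search API serves at most 1000 results per query

async function fetchAllPages(topic: string, fetchPage: FetchPage): Promise<SearchItem[]> {
  const query = `topic:${topic}`;
  const collected: SearchItem[] = [];
  const maxPages = MAX_RESULTS / PER_PAGE; // 10 pages at most

  for (let page = 1; page <= maxPages; page++) {
    const { total_count, items } = await fetchPage(query, page, PER_PAGE);
    collected.push(...items);

    // Stop once everything reachable has been collected, or a page comes back empty.
    const reachable = Math.min(total_count, MAX_RESULTS);
    if (items.length === 0 || collected.length >= reachable) break;
  }
  return collected;
}
```

A real `fetchPage` would issue `GET /search/repositories` with `q`, `per_page`, and `page` query parameters; keeping it injectable also makes it easy to stub in unit tests.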

Changes

  • Replaced per-repository API calls with paginated Search API requests
  • Added new functions:
    • fetchRepositoriesWithTopic: Fetches all repositories for a topic, page by page
    • extractRepositoryData: Extracts the needed data directly from search results
  • Removed unnecessary API calls for repository details
  • Updated tests to verify the new functionality
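
To illustrate the extraction step, here is a minimal sketch of mapping one search result to the fields this PR keeps (name, topics, creation date). The names `toRepositoryRecord` and `RepositoryRecord` are hypothetical, and the actual `extractRepositoryData` may be shaped differently; only `full_name`, `topics`, and `created_at` are taken from the GitHub search response.

```typescript
// Subset of one repository item in the search response.
interface SearchItem {
  full_name: string;
  created_at: string; // ISO 8601 timestamp
  topics?: string[];
}

// Hypothetical output shape: only the fields the PR description lists.
interface RepositoryRecord {
  name: string;
  topics: string[];
  createdAt: string;
}

function toRepositoryRecord(item: SearchItem): RepositoryRecord {
  return {
    name: item.full_name,
    topics: item.topics ?? [], // assumption: treat a missing topics field as empty
    createdAt: item.created_at,
  };
}
```

Because every field comes from the search item itself, this step needs no further API requests.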

Benefits

  • Significantly reduced API calls: From 4 requests per repository (about 4000 for 1000 repositories) to just 10 paginated search requests in total
  • Faster execution: No waiting for multiple sequential API calls per repository
  • Better rate limit usage: More efficient use of our GitHub API quota
  • Simplified code: Cleaner implementation with fewer API-specific functions
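
The request counts above follow from simple arithmetic, sketched here under the assumption of 4 calls per repository before the change and 100 results per page after it. The function names are illustrative only.

```typescript
// Requests needed when every repository costs a fixed number of calls.
function perRepositoryRequests(repoCount: number, callsPerRepo: number = 4): number {
  return repoCount * callsPerRepo;
}

// Requests needed when results arrive in pages of `perPage` items.
function paginatedRequests(repoCount: number, perPage: number = 100): number {
  return Math.ceil(repoCount / perPage);
}

const requestsBefore = perRepositoryRequests(1000); // 4000
const requestsAfter = paginatedRequests(1000);      // 10
```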

Notes

As discussed in issue #11, we're no longer collecting first commit date and first release date as this information isn't critical and would require additional API calls.

For the json-schema topic on GitHub, there are under 2k results. The Search API returns at most 1000 results per query, fetched in pages of up to 100, and each result already contains all the initial data we need, so no additional per-repository requests are required.

Testing

The changes have been tested with:

  • Unit tests for the new functions
  • Manual testing with the "json-schema" topic
  • Verification that the CSV output maintains the expected format
