
@trishanu-init

Pull Request: Fix project details handling for forked repositories and Unicode encoding

Related Issues

Closes #162 and #116

Summary

This PR resolves two key issues in github.py:

  1. Repository Filtering & Fork Handling (#162: "Bug: Project details of forked repository are not accounted for and skipped from all_repositories because of false positive of forks_count<5")

    • Removes the unnecessary filtering that skipped forked repositories with a low forks_count.
    • Fetches data from the parent repository when it is available.
    • Falls back to the fork's own details if fetching the parent data fails.
    • Corrects inaccurate metrics for forked open-source projects.
  2. Unicode Encoding Fix (#116: "'charmap' codec can't encode characters when caching GitHub data")

    • Explicitly writes cache files using UTF-8 encoding.
    • Prevents encoding errors on Windows systems.
    • Maintains consistent handling of Unicode characters across platforms.
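The fork handling in item 1 can be sketched roughly as follows. This is an illustrative sketch, not the code in github.py: it assumes repository dicts shaped like GitHub REST API responses, where the full object returned by GET /repos/{owner}/{repo} carries a fork flag and, for forks, a parent object with full_name. The names resolve_project_details, collect_repositories and the injected fetch_repo callable are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Repo = Dict[str, Any]
# Hypothetical fetcher: takes "owner/name" and returns the full repo dict,
# or raises / returns None when the lookup fails.
Fetcher = Callable[[str], Optional[Repo]]


def resolve_project_details(repo: Repo, fetch_repo: Fetcher) -> Repo:
    """Prefer the parent repo's details for a fork; fall back to the fork itself."""
    if not repo.get("fork"):
        return repo  # not a fork: use its own details unchanged
    parent_name = (repo.get("parent") or {}).get("full_name")
    if not parent_name:
        return repo  # no parent info available: keep the fork's own details
    try:
        parent = fetch_repo(parent_name)
    except Exception:
        parent = None  # API/network error: fall back to the fork
    return parent if parent else repo


def collect_repositories(repos: List[Repo], fetch_repo: Fetcher) -> List[Repo]:
    # No forks_count threshold: every repository, fork or not, is kept.
    return [resolve_project_details(r, fetch_repo) for r in repos]


# Usage with an in-memory fetcher (hypothetical data, no network access).
def fake_fetch(name: str) -> Optional[Repo]:
    if name == "upstream/project":
        return {"full_name": "upstream/project", "fork": False, "stargazers_count": 1200}
    raise RuntimeError("not found")


repos = [
    {"full_name": "me/project", "fork": True, "forks_count": 0,
     "parent": {"full_name": "upstream/project"}},
    {"full_name": "me/broken", "fork": True, "forks_count": 1,
     "parent": {"full_name": "gone/repo"}},
    {"full_name": "me/own", "fork": False, "forks_count": 2},
]
result = collect_repositories(repos, fake_fetch)
print([r["full_name"] for r in result])
# ['upstream/project', 'me/broken', 'me/own']
```

Passing the fetcher in as a parameter keeps the sketch testable offline; in practice it would wrap whatever HTTP client github.py already uses.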

Impact

  • Accurate and complete repository metadata, including for forked repositories.
  • More reliable evaluation of open-source projects.
  • Stable, consistent cache encoding across platforms, including Windows.

Testing

  • Verified the accuracy of project data for forked repositories.
  • Confirmed UTF-8 cache writing on both Windows and Linux.
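A round-trip check in the spirit of the UTF-8 test above might look like this. It assumes the cache is stored as JSON and written with ensure_ascii=False (with the default ensure_ascii=True, json escapes non-ASCII characters and the 'charmap' error would not occur); write_cache, read_cache and github_cache.json are hypothetical names, not the ones used in github.py. Passing encoding="utf-8" to open() overrides the platform's locale encoding (commonly cp1252 on Windows), which is what raises the 'charmap' error.

```python
import json
import os
import tempfile
from typing import Any


def write_cache(path: str, data: Any) -> None:
    # Explicit UTF-8 instead of the platform default locale encoding.
    # Assumption: non-ASCII text is written as-is (ensure_ascii=False).
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def read_cache(path: str) -> Any:
    # Read back with the same explicit encoding used for writing.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Round trip with characters outside cp1252 (CJK and an emoji).
sample = {"description": "Ünïcödé ✓ 日本語 🚀"}
with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "github_cache.json")
    write_cache(cache_path, sample)
    loaded = read_cache(cache_path)
    with open(cache_path, "rb") as f:
        raw = f.read()  # raw bytes, to confirm they are UTF-8 encoded

print(loaded == sample)  # True
```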

@trishanu-init
Author

Resolves #155 as well.

@trishanu-init
Author

Evaluated against the GitHub user PavitKaur05 (https://github.com/PavitKaur05):

Previous: [screenshot of the evaluation before this change]

Current: [screenshot of the evaluation with this change]

@trishanu-init
Author

@sp2hari Can you please review this PR?
