Bug: Project details of forked repository are not accounted for and skipped from all_repositories because of false positive of forks_count<5 #162

@trishanu-init

Bug Report: Incorrect Repository Filtering and Fork Data Handling

Description

The current code skips any forked repository whose forks_count is below 5, which also excludes valid open-source projects.
This happens because the script reads the child fork's own GitHub details (stars, forks, issues) instead of fetching them from the upstream parent repository.
As a result, forked projects show incomplete or incorrect metadata or are dropped from all_repositories, producing false negatives in open-source contribution evaluations.

Files Affected

  • github.py

Impact

This bug directly affects the open-source project evaluation metrics, as the Jinja prompt templates rely on:

  • stars
  • forks_count
  • open_issues

Projects that are forks of active upstream repositories get undervalued or skipped entirely.


Comparison: Current vs Expected Behavior

Expected Output

{
  "name": "Indiekart",
  "description": null,
  "github_url": "https://github.com/trishanu-init/Indiekart",
  "live_url": "https://indiekart.vercel.app/",
  "technologies": ["TypeScript"],
  "project_type": "open_source",
  "contributor_count": 27,
  "author_commit_count": 97,
  "total_commit_count": 174,
  "github_details": {
    "forked_from": "https://github.com/Indie-Kart/ecommerce-store",
    "parent_full_name": "Indie-Kart/ecommerce-store",
    "stars": 34,
    "forks": 63,
    "language": "TypeScript",
    "description": "Repo Owner: Trishanu Nayak",
    "topics": ["gssoc", "gssoc24"],
    "open_issues": 88,
    "created_at": "2024-02-22T08:48:17Z",
    "updated_at": "2025-09-02T18:38:35Z",
    "size": 3029,
    "fork": true,
    "archived": false,
    "default_branch": "main"
  }
}

Current Output

{
  "name": "Indiekart",
  "description": null,
  "github_url": "https://github.com/trishanu-init/Indiekart",
  "live_url": "https://indiekart.vercel.app/",
  "technologies": ["TypeScript"],
  "project_type": "open_source",
  "contributor_count": 27,
  "author_commit_count": 97,
  "total_commit_count": 174,
  "github_details": {
    "stars": 0,
    "forks": 0,
    "language": "TypeScript",
    "description": null,
    "created_at": "2024-06-22T08:00:36Z",
    "updated_at": "2024-07-29T11:07:24Z",
    "topics": [],
    "open_issues": 0,
    "size": 3029,
    "fork": true,
    "archived": false,
    "default_branch": "main",
    "contributors": 27
  }
}

Root Cause

The logic currently skips any repository where:

if repo.get("fork") and repo.get("forks_count", 0) < 5:
    continue

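Here forks_count belongs to the fork itself, not to its upstream parent, so it is usually 0 even when the parent is widely forked. This leads to premature exclusion of legitimate projects.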
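To make the false positive concrete, here is a minimal sketch that runs the check against the fork counts reported above (0 on the child, 63 on the parent). The function names and the parent-aware variant are hypothetical and only illustrate one possible fix; they are not taken from github.py.

```python
from typing import Optional

MIN_FORKS = 5  # threshold used by the current check


def skips_with_current_check(repo: dict) -> bool:
    """Mirror of the existing condition: looks only at the fork's own counts."""
    return bool(repo.get("fork")) and repo.get("forks_count", 0) < MIN_FORKS


def skips_with_parent_aware_check(repo: dict, parent: Optional[dict] = None) -> bool:
    """Hypothetical variant: apply the threshold to the parent when it is known."""
    if not repo.get("fork"):
        return False
    source = parent or repo  # fall back to the fork's own counts
    return source.get("forks_count", 0) < MIN_FORKS


# Fork counts taken from the "Current Output" and "Expected Output" above.
child = {"fork": True, "forks_count": 0}
parent = {"fork": False, "forks_count": 63}

print(skips_with_current_check(child))               # fork is dropped
print(skips_with_parent_aware_check(child, parent))  # fork is kept
```

With the parent's count available the fork is kept; without it, the variant behaves exactly like the current check.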

Proposed Solution

  1. Update the forks_count logic
    Avoid skipping forks based solely on count.

  2. Fetch details from the parent repository
    If the project is a fork, take stars, forks, open issues, description, and topics from the parent repository.

  3. Add a fallback mechanism
    Use parent data if available.
    If the parent is not found (API error or missing data), fall back to the child repository details.
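The three steps above could be sketched roughly as follows. This is an illustration under assumptions, not the actual github.py code: `resolve_metrics_source`, `build_github_details`, and the injected `fetch_repo` callable are hypothetical names, and the field names (`stargazers_count`, `forks_count`, `open_issues_count`, `parent`) assume the shape of a GitHub REST API repository object.

```python
from typing import Callable, Optional

# Hypothetical fetcher type: takes "owner/name", returns the full repo JSON or None.
RepoFetcher = Callable[[str], Optional[dict]]


def resolve_metrics_source(repo: dict, fetch_repo: RepoFetcher) -> dict:
    """Return the parent repo for a fork when it can be found, else the repo itself."""
    if not repo.get("fork"):
        return repo
    parent = repo.get("parent")
    if not parent:
        try:
            # Assumption: the single-repo response carries a "parent" object for forks.
            full = fetch_repo(repo.get("full_name", "")) or {}
        except Exception:
            full = {}  # API error: fall back to the child details below
        parent = full.get("parent")
    return parent or repo


def build_github_details(repo: dict, fetch_repo: RepoFetcher) -> dict:
    """Build the metric fields, preferring parent data for forks."""
    source = resolve_metrics_source(repo, fetch_repo)
    details = {
        "stars": source.get("stargazers_count", 0),
        "forks": source.get("forks_count", 0),
        "open_issues": source.get("open_issues_count", 0),
        "description": source.get("description"),
        "topics": source.get("topics", []),
        "fork": bool(repo.get("fork")),
    }
    if source is not repo:
        # Record where the metrics came from, as in the expected output.
        details["forked_from"] = source.get("html_url")
        details["parent_full_name"] = source.get("full_name")
    return details
```

Passing the fetcher in as a parameter keeps the fallback path easy to exercise without network access: a fetcher that raises or returns None yields the child's own counts, which matches step 3.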
