
fix: Improve renamed package detection #575

Open: wants to merge 3 commits into master

behalshabnam

This PR fixes #441 by improving the detection of renamed packages.

Package Detection Improvements

Package verification now uses several identifiers to establish package identity:

  1. Repository URL Matching

    • Primary identifier for package identity
    • Handles cases where package names change but repository remains the same
  2. Author Verification

    • Checks for common authors between versions
    • Helps verify package lineage across renames
  3. Version Correlation

    • Matches exact versions to ensure continuity
    • Prevents false positives from similarly named packages
  4. Description Similarity Analysis

    • Uses text similarity matching for package descriptions
    • Threshold-based comparison (30%) to accommodate minor description updates
    • Helps confirm package identity when other metadata changes
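The PR does not say how repository URLs are compared for the first check. A minimal sketch, assuming light normalization before an exact comparison (dropping trailing slashes and a ".git" suffix, and treating http and https as equivalent); normalize_repository_url and same_repository are hypothetical names, not functions from the PR.

```rust
/// Hypothetical helper: normalize a repository URL so that trivially
/// different spellings of the same repository compare equal.
/// The normalization rules here are assumptions, not the PR's behavior.
fn normalize_repository_url(url: &str) -> String {
    let mut s = url.trim().to_string();
    // Drop trailing slashes: ".../project/" -> ".../project"
    while s.ends_with('/') {
        s.pop();
    }
    // Drop a trailing ".git": ".../project.git" -> ".../project"
    if s.ends_with(".git") {
        s.truncate(s.len() - ".git".len());
    }
    // Treat http and https as the same scheme: "http://" -> "https://"
    if s.starts_with("http://") {
        s.replace_range(..4, "https");
    }
    s
}

/// Hypothetical helper: two packages share a repository if both declare
/// one and the normalized URLs are equal. A missing URL never matches.
fn same_repository(a: Option<&str>, b: Option<&str>) -> bool {
    match (a, b) {
        (Some(a), Some(b)) => normalize_repository_url(a) == normalize_repository_url(b),
        _ => false,
    }
}
```

Treating a missing URL as a non-match keeps the primary check conservative: two packages without repository metadata are never assumed to be the same package.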

Implementation Details

  • Added an is_same_package_except_name function that combines the checks above to verify package identity
  • Introduced SIMILARITY_SCALE (100) and SIMILARITY_THRESHOLD (30) constants for description matching
  • Implemented helper functions:
    • have_common_author: Checks author overlap
    • high_similarity: Performs description similarity analysis
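The helpers listed above can be sketched as follows. This is a minimal illustration, not the PR's code: the PR gives the function names and the two constant values but not the similarity measure, so word-set (Jaccard) similarity scaled to 0..=SIMILARITY_SCALE is an assumption, as are the hypothetical PackageInfo struct and the plain equality check on repository URLs. The conditions combined are the three listed in the Example section (repository URL, common author, similar description); version correlation is left out because the PR does not describe how it is applied.

```rust
use std::collections::HashSet;

// Constant values taken from the PR description.
const SIMILARITY_SCALE: usize = 100;
const SIMILARITY_THRESHOLD: usize = 30;

/// Hypothetical subset of registry metadata compared between the
/// old and new package names.
struct PackageInfo<'a> {
    repository: Option<&'a str>,
    authors: &'a [&'a str],
    description: Option<&'a str>,
}

/// True if at least one author appears in both author lists.
fn have_common_author(a: &[&str], b: &[&str]) -> bool {
    a.iter().any(|author| b.contains(author))
}

/// Assumed measure: shared lowercase words divided by all distinct words,
/// scaled to 0..=SIMILARITY_SCALE and compared against SIMILARITY_THRESHOLD.
fn high_similarity(a: &str, b: &str) -> bool {
    let words = |s: &str| -> HashSet<String> {
        s.split_whitespace().map(|w| w.to_lowercase()).collect()
    };
    let (wa, wb) = (words(a), words(b));
    let union = wa.union(&wb).count();
    if union == 0 {
        // Two empty descriptions carry no evidence of identity.
        return false;
    }
    let common = wa.intersection(&wb).count();
    common * SIMILARITY_SCALE / union >= SIMILARITY_THRESHOLD
}

/// Hypothetical combination: every listed condition must hold.
fn is_same_package_except_name(old: &PackageInfo, new: &PackageInfo) -> bool {
    let same_repo =
        matches!((old.repository, new.repository), (Some(a), Some(b)) if a == b);
    let similar_description = match (old.description, new.description) {
        (Some(a), Some(b)) => high_similarity(a, b),
        _ => false,
    };
    same_repo && have_common_author(old.authors, new.authors) && similar_description
}
```

Requiring all three conditions favors avoiding false matches between unrelated packages over catching every rename; a looser rule (for example, repository URL plus either of the other two) would trade in the opposite direction.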

Testing

  • Added an integration test, renamed_package_not_flagged, to verify this behavior
  • The test uses the icu-rename fixture to validate package detection
  • Confirms that renamed packages are not incorrectly flagged as unmaintained

Example

A package such as icu_locid, which was previously flagged incorrectly as "not in repository", is now correctly recognized when:

  • The repository URL matches
  • Authors overlap with the original package
  • Package descriptions are sufficiently similar

behalshabnam requested a review from smoelius as a code owner April 9, 2025 12:59
@CLAassistant

CLAassistant commented Apr 9, 2025

CLA assistant check
All committers have signed the CLA.

@behalshabnam
Author

Hello @smoelius,

Here is the PR for #441. Please review and let me know if anything needs to be changed.

Thank you!

@smoelius
Collaborator

@behalshabnam Thanks very much for working on this. I have a little bit of a PR backlog, but I will try to get to this soon.

@smoelius
Collaborator

@behalshabnam Sorry, I haven't had a chance to review this. I am getting ready to travel, and I will be back early next week. I will make it a priority to review this then.


3 participants