Implement HIP progression analysis framework and enhance evaluation logic#86

Draft
danielmarv wants to merge 9 commits into hiero-hackers:main from danielmarv:hip-an-br
Conversation

@danielmarv
Contributor

This pull request introduces documentation and output updates for the HIP Progression pipeline pilot, specifically for the hiero-ledger/hiero-sdk-js repository. The changes add detailed instructions for running the pipeline, and generate new CSV and markdown outputs summarizing HIP-related evidence, artifact features, repo status, and evaluation results. The outputs are structured to support both automated and manual review, enabling ongoing feedback and annotation.

The most important changes are:

Documentation improvements:

  • Added a new "HIP Progression Pilot" section to README.md with instructions for running the HIP progression pipeline, including usage examples, output descriptions, and details on the manual feedback loop for reviewing predictions.

Generated pipeline outputs:

  • Created artifact-level outputs:

    • artifact_features.md: Summarizes feature extraction results for issues and pull requests, including HIP mentions, keywords, code changes, and author roles.
    • hip_evidence.md: Details evidence and scoring for each artifact-HIP pair, including confidence levels and rationale.
    • pr_evaluation.md and issue_evaluation.md: Provide tables for manual review of predictions, with fields for human feedback and correctness annotation. [1] [2]
  • Created repo-level outputs:

    • hip_repo_status.md: Summarizes the inferred status and confidence for each HIP in the repository, with supporting artifact references and rationale.
    • repo_evaluation.md: Enables manual review of repo-level HIP status predictions, including links to supporting evidence and annotation columns.
  • Added evaluation and review summary outputs:

    • evaluation_summary.md: Aggregates prediction and review coverage, accuracy, precision, and recall by artifact type and dataset split.
    • prediction_review_breakdown.md: Presents a compact breakdown of review outcomes (matches, misses, overcalls, etc.) for each scope and split.
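
As a rough illustration of the kind of aggregation evaluation_summary.md describes, the sketch below computes accuracy, precision, and recall from manually reviewed predictions. The `ReviewedPrediction` type and `summarize` function are hypothetical names for illustration and are not taken from this PR; the real pipeline may define these metrics differently.

```python
from dataclasses import dataclass


@dataclass
class ReviewedPrediction:
    """One reviewed row (hypothetical shape, not the PR's schema)."""
    predicted_hip: bool  # the pipeline flagged the artifact as HIP-related
    actual_hip: bool     # the human reviewer's annotation


def summarize(rows):
    """Return accuracy, precision and recall; None where undefined."""
    tp = sum(r.predicted_hip and r.actual_hip for r in rows)
    fp = sum(r.predicted_hip and not r.actual_hip for r in rows)
    fn = sum(not r.predicted_hip and r.actual_hip for r in rows)
    tn = sum(not r.predicted_hip and not r.actual_hip for r in rows)
    total = len(rows)
    return {
        "accuracy": (tp + tn) / total if total else None,
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
    }
```

Returning `None` instead of 0 for an undefined ratio keeps "no reviewed rows yet" distinguishable from "every prediction was wrong".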

…rt functionality, and pipeline execution

Signed-off-by: Daniel Ntege <danientege785@gmail.com>
Signed-off-by: Daniel Ntege <danientege785@gmail.com>
@danielmarv danielmarv marked this pull request as draft March 26, 2026 22:22
Contributor

@exploreriii left a comment

I think you can add debug outputs, in .csv form, under tests/output. The most important part is to figure out how accurate your HIP categorisation is.

Your outputs/ directory should hold only the end-user content, which should be .csv files and, for example, a bar chart. There should be as few outputs as possible, giving the easiest, clearest view of what the end user is interested in.

Here are some ideas, but this is quite a difficult problem, we should debate this:

For a given repo:

  1. csv of all raised issues, HIP likelihood score, development status, url to that issue.

Sorted by HIP id (largest to smallest)

stacked bar plot: HIP raised by development status
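
A minimal sketch of the per-repo csv idea, assuming each issue has already been reduced to a dict. The field names (`hip_id`, `hip_likelihood`, `status`, `issue_url`) and the function name are placeholders for illustration, not names from this PR.

```python
import csv
import io


def write_repo_issue_csv(rows, out):
    """Write one row per issue, sorted by HIP id from largest to smallest.

    rows: iterable of dicts with hip_id (int), hip_likelihood, status, issue_url.
    out:  any text file-like object.
    """
    fields = ["hip_id", "hip_likelihood", "status", "issue_url"]
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["hip_id"], reverse=True):
        writer.writerow({name: row[name] for name in fields})


# Usage with made-up data; the URLs are placeholders.
buffer = io.StringIO()
write_repo_issue_csv(
    [
        {"hip_id": 991, "hip_likelihood": 0.6, "status": "issue raised",
         "issue_url": "https://example.invalid/issues/1"},
        {"hip_id": 1064, "hip_likelihood": 0.9, "status": "in progress",
         "issue_url": "https://example.invalid/issues/2"},
    ],
    buffer,
)
```

The same rows could feed the stacked bar plot by counting rows per `status`.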

For a group of repos:
A group of repos meaning e.g. 'sdks' (for now, I'd suggest limiting this proof of work to just the SDKs)

  1. csv of HIP issue ID, status flag python, status flag js, status flag go..

Sorted by HIP id (largest to smallest)

  2. csv of HIP issue ID, number of repos with this HIP issue raised, completion rate across group

Sorted by HIP id (largest to smallest)

stacked bar plot: HIP raised by development status (not raised, issue raised, in progress, completed)
bar plot: HIP raised by % of repos completing that HIP
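
The two group-level csvs could be derived from one small table, sketched below. This assumes the four status labels listed above and defines completion rate as completed repos divided by all repos in the group; both the definition and the function names are assumptions for discussion.

```python
STATUSES = ("not raised", "issue raised", "in progress", "completed")


def group_status_table(observations, repos):
    """Build {hip_id: {repo: status}}, sorted by HIP id descending.

    observations: iterable of (hip_id, repo, status) tuples.
    Repos with no observation for a HIP default to "not raised".
    """
    table = {}
    for hip_id, repo, status in observations:
        row = table.setdefault(hip_id, {r: "not raised" for r in repos})
        row[repo] = status
    return dict(sorted(table.items(), reverse=True))


def completion_rates(table):
    """Per HIP: how many repos raised it and what share completed it."""
    summary = {}
    for hip_id, by_repo in table.items():
        raised = sum(s != "not raised" for s in by_repo.values())
        completed = sum(s == "completed" for s in by_repo.values())
        summary[hip_id] = {
            "repos_raised": raised,
            "completion_rate": completed / len(by_repo),
        }
    return summary
```

Each row of `group_status_table` maps directly onto the "status flag python, status flag js, status flag go" columns, and `completion_rates` gives the second csv.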

For the org:

  1. csv of approved HIP, count of repos without issues raised, count of repos with issues raised, count of repos in progress, count of repos completed. List of repos completed.

Sorted by HIP id (largest to smallest)

then:
stacked bar: approved HIP by raised, in progress, completed
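
For the org-level view, one possible shape is a rollup per approved HIP, sketched here. The input layout (`{repo: {hip_id: status}}`) and the rule that a missing entry counts as "not raised" are assumptions, and the list of approved HIPs would have to come from elsewhere.

```python
from collections import Counter


def org_rollup(approved_hips, repo_status):
    """Count repos per status for each approved HIP, largest HIP id first.

    repo_status: {repo: {hip_id: status}}; a missing hip_id means "not raised".
    """
    rows = []
    for hip_id in sorted(approved_hips, reverse=True):
        statuses = {
            repo: by_hip.get(hip_id, "not raised")
            for repo, by_hip in repo_status.items()
        }
        counts = Counter(statuses.values())
        rows.append({
            "hip_id": hip_id,
            "not_raised": counts["not raised"],
            "issue_raised": counts["issue raised"],
            "in_progress": counts["in progress"],
            "completed": counts["completed"],
            "repos_completed": sorted(
                repo for repo, s in statuses.items() if s == "completed"
            ),
        })
    return rows
```

The four count columns are exactly the segments of the stacked bar.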

bonus:
eventually, we can create a manual mapping from each HIP on the website to the repo group it impacts (and where it should be implemented). But I don't think I have the expertise to correctly categorise them by repo group, and there will be some challenges in creating the right repo groups, so until we do, or find someone who does, I think we can rely on the issues raised within each repo and limit this to the group of SDKs.

This is quite a hard issue and some of the points above are ideas and need to be planned out.

…uses for artifacts and repositories.

Signed-off-by: Daniel Ntege <danientege785@gmail.com>
…uses for artifacts and repositories.

Signed-off-by: Daniel Ntege <danientege785@gmail.com>
Contributor

@exploreriii left a comment

Is it possible to cut the code by 85% by removing low-value features?

Also, how accurate is this model?

Contributor

I do not understand what the 0's show here

Contributor

All these chart titles need to be a lot clearer.
For example, in this case, is this total HIPs, or just the last 2 months?

"HIP Issues: Development Status Across SDKs (Last 3 Months)"

Contributor

can we use existing colour schemes or styling where possible?

Contributor

these numbers are not necessary imo
the legend overlaps your count
clearer title please

how did you find out which hips were not raised?

MAINTAINER_AUTHOR_ASSOCIATIONS = {"OWNER", "MEMBER"}
COMMITTER_AUTHOR_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}

SOURCE_DIR_HINTS = {"src", "lib", "app", "package", "packages", "sdk", "client", "clients"}
Contributor

There is so much hard-coded in here and these files are very long, which will be hard to maintain. Can we simplify by reducing the things we search for to the highest-signal ones? Cut files to 1/3?

Must you search for doc files to identify a HIP? I'm sure you could achieve very similar signal with much less searching.
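
To make the "less searching" idea concrete, here is a minimal sketch that looks only at an issue or PR title and body for explicit HIP references. The pattern and the `hip_ids` helper are assumptions for discussion; whether this alone gives enough signal would need to be checked against the reviewed data.

```python
import re

# Matches "HIP-1064", "HIP 1064" or "hip1064"; the word boundary avoids
# false hits inside other words such as "ship-12".
HIP_PATTERN = re.compile(r"\bHIP[-\s]?(\d+)\b", re.IGNORECASE)


def hip_ids(title, body=""):
    """Return the sorted, de-duplicated HIP ids mentioned in title or body."""
    text = f"{title}\n{body or ''}"
    return sorted({int(number) for number in HIP_PATTERN.findall(text)})
```

An artifact with an empty result would simply be treated as not HIP-related, with no path or doc-file heuristics involved.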

Contributor

does all this need to be in src/export?
some is for testing, right?
also again can cut this majorly, to the essentials to track the accuracy

linked_issue_bonus: float = 5.0
linked_pr_bonus: float = 5.0
maintainer_linked_bonus: float = 5.0
merged_bonus: float = 10.0
Contributor

narrow down on most high value stuff, cut the code by 2/3
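
One way the scoring could shrink is to keep the weights in a single table and sum the ones whose signal is present, instead of one branch per feature. The sketch below reuses the four bonus values shown in the diff above; the signal names and the `score` function are hypothetical, and which signals are actually worth keeping is an open question.

```python
# Bonus values copied from the diff context; the key names are placeholders.
WEIGHTS = {
    "linked_issue": 5.0,
    "linked_pr": 5.0,
    "maintainer_linked": 5.0,
    "merged": 10.0,
}


def score(signals, weights=WEIGHTS):
    """Sum the weights of every signal that is present and truthy.

    signals: dict of signal name -> bool; unknown or missing names count as absent.
    """
    return sum(weight for name, weight in weights.items() if signals.get(name))
```

Dropping a low-value feature then means deleting one line from `WEIGHTS` rather than touching the scoring logic.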

if not feature_vector.has_code_evidence:
    uncertainty_reasons.append("No implementation file changes were detected.")
if feature_vector.has_code_evidence and not feature_vector.has_test_evidence:
    uncertainty_reasons.append("Implementation exists, but test corroboration is missing.")
Contributor

for example a hip most definitely will include test changes!

your model can be simplified to identify a hip accurately.
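
A sketch of what such a simplification might look like, treating test changes as a required signal rather than an uncertainty note. The flag names mirror `has_code_evidence` and `has_test_evidence` from the diff above; the `merged` flag, the status labels, and the rules themselves are assumptions and not the PR's logic.

```python
def implementation_status(has_code_evidence, has_test_evidence, merged):
    """Map three boolean signals onto a development status label.

    Assumed rule: a HIP only counts as completed when a merged change
    touches both implementation files and tests.
    """
    if has_code_evidence and has_test_evidence:
        return "completed" if merged else "in progress"
    if has_code_evidence:
        # Code without tests is treated as unfinished, even if merged.
        return "in progress"
    return "issue raised"
```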

Signed-off-by: Daniel Ntege <danientege785@gmail.com>
Signed-off-by: Daniel Ntege <danientege785@gmail.com>
Signed-off-by: Daniel Ntege <danientege785@gmail.com>

2 participants