Implement HIP progression analysis framework and enhance evaluation logic #86
danielmarv wants to merge 9 commits into hiero-hackers:main
Conversation
…rt functionality, and pipeline execution
Signed-off-by: Daniel Ntege <danientege785@gmail.com>
exploreriii
left a comment
I think in your tests/ output you can add debug outputs, in .csv form. The most important part is to figure out how accurate your HIP categorisation is.
Your outputs/ directory should hold the end-user content, which should be .csv and e.g. a bar chart. There should be as few outputs as possible, providing the easiest, clearest view of what the end-user is interested in.
Here are some ideas, but this is quite a difficult problem, we should debate this:
For a given repo:
- csv of all raised issues: HIP likelihood score, development status, URL to that issue
  - sorted by HIP id (largest to smallest)
- stacked bar plot: HIPs raised by development status
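The per-repo CSV idea above could be sketched like this; the column names, HIP ids, scores, and URLs are illustrative placeholders, not the pipeline's actual schema:

```python
import csv

# Hypothetical rows for a single repo -- values are made up for illustration.
rows = [
    {"hip_id": 412, "hip_likelihood": 0.91, "dev_status": "in progress",
     "url": "https://github.com/example/repo/issues/10"},
    {"hip_id": 850, "hip_likelihood": 0.73, "dev_status": "completed",
     "url": "https://github.com/example/repo/issues/42"},
    {"hip_id": 583, "hip_likelihood": 0.55, "dev_status": "issue raised",
     "url": "https://github.com/example/repo/issues/7"},
]

# Sort by HIP id, largest to smallest, as suggested above.
rows.sort(key=lambda r: r["hip_id"], reverse=True)

with open("hip_issues.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["hip_id", "hip_likelihood", "dev_status", "url"])
    writer.writeheader()
    writer.writerows(rows)
```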
For a group of repos:
A group of repos meaning e.g. 'sdks' (for now, I'd suggest limiting this proof of concept to just the sdks)
- csv of HIP issue ID, status flag python, status flag js, status flag go, …
  - sorted by HIP id (largest to smallest)
- csv of HIP issue ID, number of repos with this HIP issue raised, completion rate across the group
  - sorted by HIP id (largest to smallest)
- stacked bar plot: HIPs raised by development status (not raised, issue raised, in progress, completed)
- bar plot: HIPs raised by % of repos completing that HIP
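The group-level CSV with raise counts and completion rate could be derived roughly like this; the repo names, HIP ids, and statuses are made up for illustration:

```python
from collections import defaultdict

# Hypothetical per-repo status maps: {hip_id: status}.
repo_status = {
    "hiero-sdk-python": {412: "completed", 583: "in progress"},
    "hiero-sdk-js":     {412: "completed", 850: "issue raised"},
    "hiero-sdk-go":     {412: "in progress"},
}

raised = defaultdict(int)     # repos that raised an issue for this HIP
completed = defaultdict(int)  # repos that completed this HIP
for statuses in repo_status.values():
    for hip_id, status in statuses.items():
        raised[hip_id] += 1
        if status == "completed":
            completed[hip_id] += 1

# One row per HIP: id, repos raising it, completion rate across the group,
# sorted by HIP id largest to smallest.
summary = [
    (hip_id, raised[hip_id], completed[hip_id] / raised[hip_id])
    for hip_id in sorted(raised, reverse=True)
]
```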
For the org:
- extract the approved HIP list from https://hips.hedera.com (I think the status is "approved", but check)
- csv of approved HIP id, count of repos without issues raised, count of repos with issues raised, count of repos in progress, count of repos completed, and the list of completed repos
  - sorted by HIP id (largest to smallest)
- stacked bar: approved HIPs by raised, in progress, completed
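A rough sketch of the org-level per-HIP status counts, assuming the approved HIP list has already been fetched; all ids, repo names, and statuses here are hypothetical:

```python
# Hypothetical inputs: approved HIP ids and per-repo status maps.
approved_hips = [850, 583, 412]
repo_status = {
    "repo-a": {412: "completed"},
    "repo-b": {412: "in progress", 583: "issue raised"},
    "repo-c": {},
}

# One row per approved HIP, sorted largest to smallest, counting repos
# in each development-status bucket (None means the repo never raised it).
rows = []
for hip_id in sorted(approved_hips, reverse=True):
    statuses = [s.get(hip_id) for s in repo_status.values()]
    rows.append({
        "hip_id": hip_id,
        "not_raised": statuses.count(None),
        "raised": statuses.count("issue raised"),
        "in_progress": statuses.count("in progress"),
        "completed": statuses.count("completed"),
        "completed_repos": [r for r, s in repo_status.items()
                            if s.get(hip_id) == "completed"],
    })
```

These rows map directly onto the stacked bar above: one bar per approved HIP, one segment per status bucket.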
bonus:
Eventually, we can create a manual mapping from each HIP on the website to the repo group it impacts (and should be implemented in). But I don't think I have the expertise to correctly categorise them by repo group, and there will be some challenges in creating the right repo groups, so until we do, or find someone who does, I think we can rely on issues raised within each repo and limit ourselves to the group of sdks.
This is quite a hard issue and some of the points above are ideas and need to be planned out.
…uses for artifacts and repositories. Signed-off-by: Daniel Ntege <danientege785@gmail.com>
exploreriii
left a comment
Is it possible to cut the code by 85% by removing low-value features?
Also, how accurate is this model?
I do not understand what the 0's show here
All these chart titles need to be a lot clearer.
For example, in this case: is this total HIPs? Just the last 2 months? Something like:
"HIP Issues: Development Status Across SDKs (Last 3 Months)"
can we use existing colour schemes or styling where possible?
These numbers are not necessary imo.
The legend overlaps your count.
Clearer title please.
How did you find out which HIPs were not raised?
```python
MAINTAINER_AUTHOR_ASSOCIATIONS = {"OWNER", "MEMBER"}
COMMITTER_AUTHOR_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}

SOURCE_DIR_HINTS = {"src", "lib", "app", "package", "packages", "sdk", "client", "clients"}
```
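For context, a hint set like SOURCE_DIR_HINTS would typically be used to flag whether a changed file path looks like implementation code; the helper below is a hypothetical sketch, not the PR's actual function:

```python
SOURCE_DIR_HINTS = {"src", "lib", "app", "package", "packages", "sdk", "client", "clients"}

def looks_like_source(path: str) -> bool:
    """Return True if any directory component of the path matches a hint."""
    parts = path.split("/")
    # Only directory components count; the final component is the file name.
    return any(part in SOURCE_DIR_HINTS for part in parts[:-1])
```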
There is so much hard-coded in here, and these files are very long; this will be hard to maintain. Can we simplify by reducing the things we search for to the most high-signal ones? Cut the files to 1/3?
Must you search for doc files to identify a HIP? I'm sure you could achieve very similar signal with much less searching.
Does all this need to be in src/export?
Some is for testing, right?
Also, again, this can be cut majorly, down to the essentials needed to track accuracy.
```python
linked_issue_bonus: float = 5.0
linked_pr_bonus: float = 5.0
maintainer_linked_bonus: float = 5.0
merged_bonus: float = 10.0
```
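Additive bonuses like these presumably combine along the following lines; the ScoreConfig class and score_artifact function are hypothetical sketches under that assumption, not the PR's actual code:

```python
from dataclasses import dataclass

@dataclass
class ScoreConfig:
    # Field names mirror the diff; values are the diff's defaults.
    linked_issue_bonus: float = 5.0
    linked_pr_bonus: float = 5.0
    maintainer_linked_bonus: float = 5.0
    merged_bonus: float = 10.0

def score_artifact(cfg: ScoreConfig, *, has_linked_issue: bool,
                   has_linked_pr: bool, maintainer_linked: bool,
                   merged: bool) -> float:
    """Sum the bonuses whose conditions hold for this artifact."""
    score = 0.0
    if has_linked_issue:
        score += cfg.linked_issue_bonus
    if has_linked_pr:
        score += cfg.linked_pr_bonus
    if maintainer_linked:
        score += cfg.maintainer_linked_bonus
    if merged:
        score += cfg.merged_bonus
    return score
```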
Narrow down to the most high-value stuff; cut the code by 2/3.
```python
if not feature_vector.has_code_evidence:
    uncertainty_reasons.append("No implementation file changes were detected.")
if feature_vector.has_code_evidence and not feature_vector.has_test_evidence:
    uncertainty_reasons.append("Implementation exists, but test corroboration is missing.")
```
For example, a HIP will most definitely include test changes!
Your model can be simplified and still identify a HIP accurately.
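The simplification being suggested could look something like this minimal sketch, which treats code changes plus test changes as the core implementation signal; the FeatureVector shape and labels are hypothetical, not the PR's actual model:

```python
from typing import NamedTuple

class FeatureVector(NamedTuple):
    has_code_evidence: bool
    has_test_evidence: bool

def classify(fv: FeatureVector) -> str:
    """Collapse the evidence flags into a single coarse status label."""
    if fv.has_code_evidence and fv.has_test_evidence:
        return "likely implemented"
    if fv.has_code_evidence:
        return "uncertain: no test corroboration"
    return "not implemented"
```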
Signed-off-by: Daniel Ntege <danientege785@gmail.com>
This pull request introduces documentation and output updates for the HIP Progression pipeline pilot, specifically for the hiero-ledger/hiero-sdk-js repository. The changes add detailed instructions for running the pipeline and generate new CSV and markdown outputs summarizing HIP-related evidence, artifact features, repo status, and evaluation results. The outputs are structured to support both automated and manual review, enabling ongoing feedback and annotation. The most important changes are:
Documentation improvements:
- README.md: instructions for running the HIP progression pipeline, including usage examples, output descriptions, and details on the manual feedback loop for reviewing predictions.

Generated pipeline outputs:

Artifact-level outputs:
- artifact_features.md: summarizes feature extraction results for issues and pull requests, including HIP mentions, keywords, code changes, and author roles.
- hip_evidence.md: details evidence and scoring for each artifact-HIP pair, including confidence levels and rationale.
- pr_evaluation.md and issue_evaluation.md: provide tables for manual review of predictions, with fields for human feedback and correctness annotation. [1] [2]

Repo-level outputs:
- hip_repo_status.md: summarizes the inferred status and confidence for each HIP in the repository, with supporting artifact references and rationale.
- repo_evaluation.md: enables manual review of repo-level HIP status predictions, including links to supporting evidence and annotation columns.

Evaluation and review summary outputs:
- evaluation_summary.md: aggregates prediction and review coverage, accuracy, precision, and recall by artifact type and dataset split.
- prediction_review_breakdown.md: presents a compact breakdown of review outcomes (matches, misses, overcalls, etc.) for each scope and split.