Summary
Follow-up from planning M042.
We should evaluate whether Kodiai's contributor expertise/tier system needs broader calibration beyond the immediate CrystalP mislabeling bugfix. The current M042 scope is to fix stored tier truthfulness, review-surface correctness, cache/fallback consistency, and the real repro on xbmc/xbmc#28132.
This issue tracks the larger question we are intentionally leaving out of that bugfix milestone.
Why this is separate
The immediate bug appears to be a correctness problem in how stored contributor tiers and review-time classification interact. That can likely be fixed without reopening the whole scoring model.
A broader calibration pass is different work:
- sampling a wider set of contributors across the repo
- checking whether score decay, weights, or percentile thresholds match reality
- deciding whether the current tier bands are still the right shape
- validating the model against more than one obvious bad output
That is useful, but it should not block the focused correctness fix.
Questions to answer
- Are the current expertise weights (commit, pr_review, pr_authored) still the right relative signals?
- Is percentile-based tiering the right mechanism for this repo's contributor distribution?
- Should tier recalculation happen on a schedule, on every meaningful update, or both?
- Do the current 4 stored tiers map well to the review-surface tone behavior?
- What real contributor samples should be used as calibration fixtures?
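To make the questions about weights, decay, and percentile thresholds concrete, here is a minimal sketch of how a weighted, decayed score could feed percentile-based tiering. It is not Kodiai's implementation. The weight values, the half-life, the percentile cutoffs, the tier labels, and every function name below are assumptions for illustration; only the three signal names and the count of 4 tiers come from this issue.

```python
from dataclasses import dataclass

# Hypothetical values: only the signal names come from this issue.
WEIGHTS = {"commit": 1.0, "pr_review": 2.0, "pr_authored": 1.5}
HALF_LIFE_DAYS = 180.0  # assumed exponential decay half-life

# Assumed percentile cutoffs for 4 tiers, checked from highest to lowest.
# Anything below the last cutoff falls into "tier_1".
TIER_CUTOFFS = [(0.90, "tier_4"), (0.60, "tier_3"), (0.25, "tier_2")]


@dataclass(frozen=True)
class Activity:
    kind: str        # one of the keys in WEIGHTS
    age_days: float  # how long ago the activity happened


def expertise_score(activities):
    """Sum of signal weights, each halved for every HALF_LIFE_DAYS of age."""
    return sum(
        WEIGHTS[a.kind] * 0.5 ** (a.age_days / HALF_LIFE_DAYS)
        for a in activities
    )


def percentile_rank(score, all_scores):
    """Fraction of contributors whose score is strictly below `score`."""
    if not all_scores:
        return 0.0
    return sum(1 for s in all_scores if s < score) / len(all_scores)


def assign_tier(score, all_scores):
    """Map a score to a tier label using its percentile rank in the repo."""
    rank = percentile_rank(score, all_scores)
    for cutoff, tier in TIER_CUTOFFS:
        if rank >= cutoff:
            return tier
    return "tier_1"
```

In a sketch like this, the tier depends on the whole repo's score distribution as well as on the contributor's own activity. That is why the recalculation-timing and distribution-shape questions above matter: a stored tier can go stale when other contributors' scores change, even if the contributor's own score does not.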
Out of scope for M042
- redesigning the contributor scoring model from scratch
- repo-wide threshold retuning unless required for the immediate correctness fix
- expanding the milestone from bugfix to product redesign
Done looks like
- we have concrete sample contributors to test against
- we understand whether the current model is structurally sound or just operationally buggy
- if recalibration is needed, it is split into its own milestone/slice with proof criteria