Evaluate broader contributor-tier calibration beyond immediate mislabeling bugfix #78

@keithah

Summary

Follow-up from planning M042.

We should evaluate whether Kodiai's contributor expertise/tier system needs broader calibration beyond the immediate CrystalP mislabeling bugfix. The current M042 scope is limited to four things: the truthfulness of stored tiers, review-surface correctness, cache/fallback consistency, and the real repro on xbmc/xbmc#28132.

This issue tracks the larger question we are intentionally leaving out of that bugfix milestone.

Why this is separate

The immediate bug appears to be a correctness problem in how stored contributor tiers and review-time classification interact. That can likely be fixed without reopening the whole scoring model.

A broader calibration pass is different work:

  • sampling a wider set of contributors across the repo
  • checking whether score decay, weights, or percentile thresholds match reality
  • deciding whether the current tier bands are still the right shape
  • validating the model against more than one obvious bad output

That is useful, but it should not block the focused correctness fix.

Questions to answer

  • Are the current expertise weights (commit, pr_review, pr_authored) still the right relative signals?
  • Is percentile-based tiering the right mechanism for this repo's contributor distribution?
  • Should tier recalculation happen on a schedule, on every meaningful update, or both?
  • Do the current four stored tiers map well to the review-surface tone behavior?
  • What real contributor samples should be used as calibration fixtures?
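
To make the first two questions concrete, here is a minimal sketch of how a weighted, decayed score and percentile-based tiering could fit together. Everything specific in it is an assumption for illustration: the weight values, the half-life, the tier names, the percentile cutoffs, and the function names are hypothetical and are not taken from Kodiai's code.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass

# Hypothetical weights for the three signals named above (not Kodiai's real values).
WEIGHTS = {"commit": 1.0, "pr_review": 0.5, "pr_authored": 0.8}

# Assumed exponential-decay half-life: an event this old counts half as much.
HALF_LIFE_DAYS = 180.0

# Assumed tier bands as (lower percentile bound, tier name), in ascending order.
# Four bands to mirror the four stored tiers; the names are placeholders.
TIER_BANDS = [
    (0.00, "newcomer"),
    (0.25, "developing"),
    (0.60, "established"),
    (0.90, "senior"),
]


@dataclass(frozen=True)
class Event:
    kind: str        # one of the keys in WEIGHTS
    age_days: float  # how long ago the event happened


def score(events):
    """Sum of per-event weights, each halved for every HALF_LIFE_DAYS of age."""
    return sum(
        WEIGHTS[e.kind] * 0.5 ** (e.age_days / HALF_LIFE_DAYS) for e in events
    )


def percentile_rank(value, population):
    """Fraction of population scores strictly below `value` (0.0 to 1.0)."""
    ordered = sorted(population)
    return bisect_left(ordered, value) / len(ordered)


def tier_for(value, population):
    """Pick the highest band whose lower bound is <= the percentile rank."""
    rank = percentile_rank(value, population)
    bounds = [bound for bound, _ in TIER_BANDS]
    return TIER_BANDS[bisect_right(bounds, rank) - 1][1]
```

Percentile tiering is relative: the same score can land in a different tier as the population changes, so any calibration pass has to look at the shape of the repo's contributor distribution and not only at individual scores.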

Out of scope for M042

  • redesigning the contributor scoring model from scratch
  • repo-wide threshold retuning unless required for the immediate correctness fix
  • expanding the milestone from bugfix to product redesign

Done looks like

  • we have concrete sample contributors to test against
  • we understand whether the current model is structurally sound or just operationally buggy
  • if recalibration is needed, it is split into its own milestone/slice with proof criteria
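
The first two criteria could be checked with a small fixture harness along these lines. This is a sketch under stated assumptions: `classify` is a hypothetical callable standing in for whatever returns the model's tier for a contributor, and the fixture shape is invented for illustration. Real fixtures would come from the sampling work this issue describes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixture:
    login: str          # contributor handle (placeholder values in examples)
    expected_tier: str  # tier a maintainer would assign by hand


def evaluate(fixtures, classify):
    """Compare hand-assigned tiers against `classify(login)`.

    Returns (agreement ratio, list of mismatches, confusion counts keyed by
    (expected_tier, actual_tier)). `classify` is a hypothetical hook.
    """
    mismatches = []
    confusion = Counter()
    for fixture in fixtures:
        actual = classify(fixture.login)
        confusion[(fixture.expected_tier, actual)] += 1
        if actual != fixture.expected_tier:
            mismatches.append((fixture.login, fixture.expected_tier, actual))
    total = sum(confusion.values())
    agreement = (total - len(mismatches)) / total if total else 0.0
    return agreement, mismatches, confusion
```

If mismatches cluster in one cell of the confusion counts, that would hint at thresholds or weights being off; isolated mismatches would point more toward operational bugs such as stale stored tiers. Either reading is a starting hypothesis, not a conclusion.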
