Description
In the typing world, we used to face a similar problem: changes caused regressions with a larger impact on users than we expected. We mostly resolved this by adding ecosystem impact analyses to the CI of various typing-related projects, such as typeshed, mypy, and pyright.
Given issues like #4910, #4519 from a few months ago, and a few others, it appears setuptools sometimes releases changes that have a broader impact than expected. Maybe ecosystem checks would be useful here too?
This could look like scripts that attempt to build/install a large number of third party packages.
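To make that a bit more concrete, here is a rough sketch of what such a script might look like. It is only an illustration, not a proposal for the actual implementation: the function names (`build_command`, `check_packages`, `regressions`) are made up, and it assumes two pre-made virtual environments, one with the released setuptools and one with the candidate change, each with the needed build requirements already installed (hence `--no-build-isolation`).

```python
import subprocess
import sys
import tempfile


def build_command(package, wheel_dir, python=sys.executable):
    """Return a pip command that builds `package` from source.

    --no-binary :all: forces pip to use the sdist, and --no-build-isolation
    makes the build use whatever setuptools is installed for `python`.
    """
    return [
        python, "-m", "pip", "wheel",
        "--no-deps",
        "--no-binary", ":all:",
        "--no-build-isolation",
        "--wheel-dir", wheel_dir,
        package,
    ]


def check_packages(packages, python=sys.executable, runner=subprocess.run):
    """Try to build each package; map package name -> True if the build succeeded."""
    results = {}
    for package in packages:
        # Each build gets a throwaway wheel directory that is deleted afterwards.
        with tempfile.TemporaryDirectory() as wheel_dir:
            proc = runner(
                build_command(package, wheel_dir, python),
                capture_output=True,
                text=True,
            )
        results[package] = proc.returncode == 0
    return results


def regressions(baseline, candidate):
    """Packages that built with the baseline setuptools but not with the candidate."""
    return sorted(
        name for name, ok in baseline.items()
        if ok and not candidate.get(name, False)
    )
```

Running `check_packages` once per environment (passing each environment's interpreter as `python`) and feeding both results to `regressions` would filter out packages that fail for unrelated reasons, such as missing compilers or extra build dependencies, since only packages that built on the baseline are counted.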
Data-driven estimates would help us weigh the benefit of a change against its impact. For example, if we knew a change would break XYZ projects, we might decide that enforcing an underscore vs. a hyphen isn't worth it. Backward compatibility in build tools is especially valuable given the role it plays in reproducibility (Python is used in a lot of science!), so I'm excited to explore the space of better quantifying backward compatibility concerns.
Thanks for everything!