Problem
We have three styles of tests — unit, selftest, and e2e — but their organization and quality have drifted:
- Selftests are embedded inside the e2e suite. They aren't isolated in source, so running e2e requires filtering them out. Discovery and targeted runs are harder than they should be.
- The e2e suite contains noise. A number of tests appear unused or duplicative, and it's unclear which ones validate actual product functionality and which were generated speculatively by AI assistants and landed without scrutiny.
- There is no consistent code coverage reporting across the three test styles. We can't currently answer "what does our combined test coverage actually look like?" with a single number.
The net effect: tests are padding a vanity metric rather than giving us confidence that the product works.
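To make the "single number" concrete: one reasonable definition (an assumption here, not something already agreed) is the share of executable lines hit by at least one of the three suites. The sketch below uses made-up line sets and a hypothetical `combined_coverage` helper to show why per-suite percentages can't simply be added or averaged when the suites overlap.

```python
def combined_coverage(executable, covered_by_suite):
    """Return the fraction of executable lines hit by at least one suite."""
    # Union the per-suite line sets so overlapping lines are counted once.
    covered = set().union(*covered_by_suite.values()) & executable
    return len(covered) / len(executable) if executable else 0.0


# Illustrative data only: one file with 10 executable lines.
executable = {("app.py", n) for n in range(1, 11)}
covered_by_suite = {
    "unit": {("app.py", n) for n in range(1, 5)},      # lines 1-4
    "selftest": {("app.py", n) for n in range(3, 7)},  # lines 3-6
    "e2e": {("app.py", n) for n in range(5, 9)},       # lines 5-8
}

combined = combined_coverage(executable, covered_by_suite)
print(f"combined coverage: {combined:.0%}")  # prints "combined coverage: 80%"
```

Each suite alone covers 40% of the lines in this example, yet the combined figure is 80%, not 120% and not 40%. Real tooling would need to merge the raw coverage data from all three suites before computing the percentage.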
Proposed work
Out of scope
Adding new test coverage for untested features — that's a follow-up once the existing suite is trustworthy.