At the moment, workload creates tests and we don't know their quality. We need to introduce a step as part of tests that requires the agent to see the test fail before calling it "good". This could be introducing a brown m&m to find. It could be creating a regression workload for known fixed issues and rolling back before the fix to make sure it actually is fixed.
I think this is probably at least 2 parts. There is the deeper knowledge we get from the above, but we should also do intentional simple changes to the tests themselves to see them fail. This simple test manipulation that works for example based tests and more focused "unit level properties" might not be appropriate in the context of a "we don't see data corruption" level property.
The how here is up in the air. But tests that we haven't seen fail are tests that we can't know have any value.
At the moment, workload creates tests and we don't know their quality. We need to introduce a step as part of tests that requires the agent to see the test fail before calling it "good". This could be introducing a brown m&m to find. It could be creating a regression workload for known fixed issues and rolling back before the fix to make sure it actually is fixed.
I think this is probably at least 2 parts. There is the deeper knowledge we get from the above, but we should also do intentional simple changes to the tests themselves to see them fail. This simple test manipulation that works for example based tests and more focused "unit level properties" might not be appropriate in the context of a "we don't see data corruption" level property.
The how here is up in the air. But tests that we haven't seen fail are tests that we can't know have any value.