Description
When capturing traces from a production environment that does a lot of processing, one often finds that many failures are in fact caused by the same bug or issue. Small changes in the inputs are often the difference between triggering an issue or not, and such differences can say a lot about the nature of the bug.
Interesting questions include:
- How many distinct failure classes exist, and how many failures fall into each?
- What are the boundary conditions between failure and success?
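
To make the two questions concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Trace` record, its `inputs` and `error` fields, and the helpers `failure_classes` and `boundary_pairs` are hypothetical and not part of any existing trace format or tool. Failures are grouped by a simple error signature, and boundary conditions are approximated as passing/failing pairs whose inputs differ in only a few positions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Trace:
    # Hypothetical record shape; a real trace format will differ.
    inputs: tuple          # input values fed to the processing step
    error: Optional[str]   # failure signature, or None on success


def failure_classes(traces):
    """Count failing traces per failure signature."""
    return Counter(t.error for t in traces if t.error is not None)


def boundary_pairs(traces, max_diff=1):
    """Return (passing, failing) pairs whose inputs differ in at most max_diff positions."""
    passing = [t for t in traces if t.error is None]
    failing = [t for t in traces if t.error is not None]
    pairs = []
    for f in failing:
        for p in passing:
            if len(p.inputs) != len(f.inputs):
                continue  # only compare traces with the same input shape
            diff = sum(a != b for a, b in zip(p.inputs, f.inputs))
            if 0 < diff <= max_diff:
                pairs.append((p, f))
    return pairs


# Made-up example data, not taken from real traces.
traces = [
    Trace((1, "a"), None),
    Trace((0, "a"), "ZeroDivisionError"),
    Trace((0, "b"), "ZeroDivisionError"),
    Trace((2, "c"), "KeyError"),
]

classes = failure_classes(traces)
pairs = boundary_pairs(traces)
```

The pairwise comparison is quadratic in the number of traces, so at production scale it would likely need indexing or sampling; the sketch only shows the shape of the questions, not a scalable design.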
With the ability to upload traces to a service (#2), it becomes interesting to have cloud-based tools that help answer these questions at scale. The performance of the needed analysis, both how quickly results arrive and how much it costs, is a key consideration.