I had a describe() block with several hundred tests. Only two of them failed, but there was no way to tell that from looking at the test tree. Adding a counter to the label would help a lot, something like:

> (2 failed out of 282)
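To make the request concrete, here is a rough sketch of how such a label could be computed by walking the tree and counting failed leaf tests under a describe() node. The `TestNode` shape and the `countResults`/`labelFor` helpers are hypothetical and are not taken from any real test runner's API. A node with `children` stands for a describe() block, and a node without `children` stands for a single test.

```typescript
// Hypothetical test-tree node (not a real runner's API):
// nodes with `children` are describe() blocks, nodes without are individual tests.
interface TestNode {
  name: string;
  children?: TestNode[];
  status?: "passed" | "failed" | "skipped";
}

// Recursively counts leaf tests and failed leaf tests under a node.
function countResults(node: TestNode): { total: number; failed: number } {
  if (node.children === undefined) {
    // A leaf test counts once, and counts as failed only if its status says so.
    return { total: 1, failed: node.status === "failed" ? 1 : 0 };
  }
  let total = 0;
  let failed = 0;
  for (const child of node.children) {
    const c = countResults(child);
    total += c.total;
    failed += c.failed;
  }
  return { total, failed };
}

// Builds a label in the requested format, e.g. "big suite (2 failed out of 282)".
function labelFor(node: TestNode): string {
  const { total, failed } = countResults(node);
  return `${node.name} (${failed} failed out of ${total})`;
}

// Usage: a suite of 282 tests in which the first two fail.
const suite: TestNode = {
  name: "big suite",
  children: Array.from({ length: 282 }, (_, i): TestNode => ({
    name: `test ${i}`,
    status: i < 2 ? "failed" : "passed",
  })),
};
console.log(labelFor(suite)); // "big suite (2 failed out of 282)"
```

Because the counts are aggregated recursively, a nested describe() would get its own counter, and its failures would also roll up into its parent's label.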