# Metrics
- Response times: use the column 'evaluated', a percentile with extreme spikes filtered out
- Threshold violations: how many transactions exceed their requirement
- Response time grand averages (average of averages, average of 90th percentiles): keep an eye on the trend over the last x tests
- Breaking point stress: watch the trend over tests to see the breaking point rise or decline. A decline means the application is becoming less capable of handling peak load.
  - Don't mind the response times in this test. See to it that the application has 'enough' margin between the normal production load level and the breaking point.
- Breaking point endurance: should be 0%, meaning the test runs flawlessly without hiccups (depending on the application).
  - Don't mind the response times in this test.
This metric indicates the level of load at which the application breaks in the test environment. A test environment usually has limitations compared to production, so use this metric only for trending in the development context. If this metric is low or declines over subsequent tests, the margins are low or decreasing. The stress breaking point is expressed as the number of virtual users at which response times increase beyond a factor of the baselined response times.
**Algorithm** The average response times in the first stage of the test (minute 5 to 10) are used as a baseline. While the load gradually increases, average response times are monitored. As soon as they rise above 2x the baseline, the trend is considered broken. The breaking point is expressed as a number of virtual users. Be aware that a virtual user in a performance test is usually more active than a real user.
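The baseline-and-threshold logic above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation; the function name, data shape, and parameters are assumptions.

```python
def stress_breaking_point(samples, baseline_window=(5, 10), factor=2.0):
    """Detect the stress breaking point in a ramp-up test.

    samples: list of (minute, virtual_users, avg_response_time_ms).
    The average response time between minutes 5 and 10 forms the
    baseline; the breaking point is the virtual-user count of the
    first later sample whose average response time exceeds
    factor * baseline. Returns None if the trend never breaks.
    """
    start, end = baseline_window
    baseline = [rt for minute, _, rt in samples if start <= minute < end]
    if not baseline:
        return None  # no baseline stage measured
    threshold = factor * (sum(baseline) / len(baseline))
    for minute, vusers, rt in samples:
        if minute >= end and rt > threshold:
            return vusers
    return None


# Illustrative ramp: stable 100 ms responses, then a spike at 200 users.
ramp = [(m, m * 10, 100.0) for m in range(5, 20)]
ramp += [(20, 200, 250.0)]
print(stress_breaking_point(ramp))  # -> 200
```

Note that the result is a virtual-user count, not a real-user count, matching the caveat above.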
**Use** The question is: do your margins give you enough headroom to process spikes in traffic? This metric answers that question. The value should be compared to previous tests; the trend is more useful to evaluate than the value itself. If the metric is decreasing, the ability of your application to digest traffic spikes is decreasing. Be sure to create a stress test that pushes the application beyond its limits, so you actually see the application break. If reports are shared with customers, be sure to explain that a virtual user is not the same as a real user.
This algorithm does not work well for:
- Applications that use buffers to handle overload situations; they are designed never to break
- Tests that generate huge fluctuations in throughput or average response times; the algorithm will trigger too soon
The percentage of test progress at which a trendbreak becomes noticeable. If the application starts generating faults or high response times, the percentage of the duration at which the trendbreak occurs is reported: 100% = stable, 75% = breaks at 75% of the planned duration. Evaluate this metric only for endurance testing. If this metric is less than 100%, usually some resource is exhausted during the test (memory, storage, ...).
**Algorithm** Evaluating the average response times and errors of the last 5 intervals, the next expected values should be within a range of 25% above or below the historical average. When the measured values exceed this range, the trend is considered broken.
NEW: Evaluating the average throughput (number of transactions) and errors of the last 5 intervals, the next interval average should be within the historical range, allowing a margin of 50% positive or negative. The trend is considered broken as soon as the historical bandwidth is exceeded. The 50% margin allows for variation in bandwidth.
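The rolling-window check described above can be sketched like this. It illustrates the NEW throughput variant with its 50% margin; the function name and data shapes are assumptions, not the tool's API.

```python
def trend_broken(history, next_value, window=5, margin=0.5):
    """Return True when next_value falls outside the historical band.

    history: per-interval measurements (e.g. transactions per interval).
    The band is the average of the last `window` intervals, plus or
    minus `margin` (0.5 = the 50% margin described above).
    """
    recent = history[-window:]
    avg = sum(recent) / len(recent)
    return abs(next_value - avg) > margin * avg


# Illustrative throughput history: roughly 100 transactions per interval.
throughput = [100, 110, 95, 105, 100]
print(trend_broken(throughput, 104))  # -> False, within the band
print(trend_broken(throughput, 40))   # -> True, throughput collapsed
```

The older response-time variant would be the same check with `margin=0.25` applied to average response times and errors instead of throughput.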
**Use** This measure is meant to monitor whether an application keeps running stable during an x-hour endurance test. The Trendbreak 'stability' measure should be 100%. If response times or error rates rise, this algorithm reports the percentage of the test at which the anomaly or break was detected: if a trendbreak shows up at 10% of the test duration, the Trendbreak 'stability' will show 10%. This measure should always be 100%.
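A sketch of how the reported percentage could be derived: the position of the first broken interval relative to the planned number of intervals. The names here are illustrative assumptions.

```python
def stability_percentage(broken_flags):
    """broken_flags: one boolean per planned interval, True = trend broken.

    Returns 100 when the test stays stable for its full duration,
    otherwise the percentage of the planned duration completed
    before the first break was detected.
    """
    for i, broken in enumerate(broken_flags):
        if broken:
            return round(100 * i / len(broken_flags))
    return 100


print(stability_percentage([False] * 10))                      # -> 100
print(stability_percentage([False] * 6 + [True] + [False] * 3))  # -> 60
```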
This algorithm does not work well for all applications or situations:
- Applications with really fast interfaces (e.g. APIs with low response times)
- Tests containing very long transactions executed at low frequency during the test

