Triage Safari differences between Azure Pipelines and Buildbot setup #646
Description
With web-platform-tests/wpt#14836 we've begun running Safari Technology Preview on Azure Pipelines, currently every 12 hours.
The first aligned run between Azure Pipelines and Buildbot has now occurred:
https://wpt.fyi/results/?diff&run_id=5992310068215808&run_id=6012154763280384
The latest aligned run at any time will be:
https://wpt.fyi/results/?label=master&label=experimental&product=safari%5Bbuildbot%5D&product=safari%5Bazure%5D&aligned&diff
And all of the runs that could be compared are:
https://wpt.fyi/runs?label=master&label=experimental&max-count=100&product=safari%5Bbuildbot%5D&product=safari%5Bazure%5D&aligned
There are some differences in results, and the differences in webdriver/ are the largest.
We should understand the differences before committing to relying on only Azure Pipelines.
@mariestaver we'll need to discuss when to do this work, weighed against the cost of keeping Buildbot running. Waiting has its benefits because it gives us more data to compare, which could help filter out flakiness.
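
As a rough sketch of how the triage could be scripted once more aligned runs exist: the snippet below counts differing tests per top-level directory for one pair of runs, and keeps only the tests that differ in every aligned pair, which should drop most flaky differences. It assumes each run has already been loaded into a plain dict mapping test path to status; that shape and the function names `diff_by_directory` and `persistent_diffs` are hypothetical, not the wpt.fyi API.

```python
from collections import Counter


def differing_tests(buildbot, azure):
    """Return the set of test paths whose status differs between two runs.

    Each run is assumed to be a dict of test path -> status string.
    A test present in only one run counts as a difference.
    """
    all_tests = buildbot.keys() | azure.keys()
    return {t for t in all_tests if buildbot.get(t) != azure.get(t)}


def diff_by_directory(buildbot, azure):
    """Count differing tests grouped by top-level directory (e.g. 'webdriver')."""
    counts = Counter()
    for test in differing_tests(buildbot, azure):
        top_level = test.strip("/").split("/")[0]
        counts[top_level] += 1
    return counts


def persistent_diffs(run_pairs):
    """Return tests that differ in every aligned (buildbot, azure) pair.

    Differences that show up in only some pairs are more likely flaky.
    """
    persistent = None
    for buildbot, azure in run_pairs:
        current = differing_tests(buildbot, azure)
        persistent = current if persistent is None else persistent & current
    return persistent or set()
```

Sorting the output of `diff_by_directory` by count would show whether webdriver/ stays the largest source of differences as more aligned runs come in.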