We currently use data only from autoland, which is a subset of the failure data.
Developers often push to try, see failures, and then fix them before pushing to autoland, so those failures never show up in the autoland data. This means there is a large set of failures we are currently ignoring when training the model.