Previously, Jenkins used only testfailure's results to decide whether a patch would pass or fail. Because testfailure on its own is not very accurate, while testselect is, an improved algorithm that also uses testselect's predictions now decides whether to pass or fail a patch.
testoverall is proposed to integrate testselect’s predictions into testfailure. Compared to testfailure, testoverall raises failure recall from 54% to 71%, while pass recall drops slightly from 70% to 65%. Since failure recall is much more important than pass recall, the model is a clear improvement.
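For reference, the two recall figures can be read as "the share of actually failing patches that were predicted to fail" and "the share of actually passing patches that were predicted to pass". The following is a minimal Python sketch of that calculation. The function name `recalls` and the boolean-list inputs are assumptions made for illustration, not part of the actual evaluation code.

```python
def recalls(actual_failed, predicted_failed):
    """Return (failure_recall, pass_recall) for two parallel lists of booleans.

    actual_failed[i]    -- True if patch i really failed (hypothetical input)
    predicted_failed[i] -- True if the model predicted patch i to fail
    """
    pairs = list(zip(actual_failed, predicted_failed))
    # Predictions for the patches that really failed / really passed.
    on_failures = [pred for actual, pred in pairs if actual]
    on_passes = [pred for actual, pred in pairs if not actual]

    # Failure recall: correctly flagged failures / all real failures.
    failure_recall = sum(on_failures) / len(on_failures) if on_failures else 0.0
    # Pass recall: correctly cleared passes / all real passes.
    pass_recall = (
        sum(not pred for pred in on_passes) / len(on_passes) if on_passes else 0.0
    )
    return failure_recall, pass_recall
```

A higher failure recall means fewer failing patches slip through as predicted passes, which is why it is weighted more heavily here.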
In addition, a new condition is added to the pass/fail decision. Originally, the decision only checked whether the overall failing probability had reached a threshold (0.4). Now the number of failed unit tests is also counted, and if that count reaches its own threshold (10), the patch is also considered failed. With the improved algorithm, inference recalls 91% of failures while reducing computation by 57%.
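The combined rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and constant names are hypothetical, "reached" is read as greater than or equal to, and the two inputs are assumed to come from the model's output. Only the threshold values 0.4 and 10 are taken from the text.

```python
PROBABILITY_THRESHOLD = 0.4  # overall failing probability threshold (from the text)
FAILED_TEST_THRESHOLD = 10   # failed unit test count threshold (from the text)


def patch_predicted_to_fail(overall_fail_probability, failed_test_count):
    """Return True if either condition marks the patch as failed.

    Both parameter names are hypothetical; they stand for the overall
    failing probability and the number of failed unit tests.
    """
    return (
        overall_fail_probability >= PROBABILITY_THRESHOLD
        or failed_test_count >= FAILED_TEST_THRESHOLD
    )


# A patch with a low overall probability is still flagged when many tests fail.
print(patch_predicted_to_fail(0.25, 12))  # True
print(patch_predicted_to_fail(0.25, 3))   # False
```

Because the two conditions are joined with a logical OR, the new condition can only add predicted failures and never remove any that the probability threshold already flags.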
Currently, the model is integrated into the Jenkins job gerrit_master_ml. The job first runs the machine learning model to predict whether the patch will pass or fail. If the patch is likely to fail, the fast track is run. If it is likely to pass, the normal build is run.
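The routing step can be pictured as a simple branch on the prediction. The Python sketch below is hypothetical: `choose_build` and the returned labels are illustrative names, not identifiers from the gerrit_master_ml job, and it assumes the normal build is the path taken whenever the patch is not predicted to fail.

```python
def choose_build(predicted_to_fail: bool) -> str:
    """Pick which build to run based on the model's prediction.

    The labels "fast_track" and "normal_build" are placeholders for
    whatever the Jenkins job actually triggers.
    """
    if predicted_to_fail:
        # Likely failure: run the fast track.
        return "fast_track"
    # Otherwise: run the normal build.
    return "normal_build"


print(choose_build(True))   # fast_track
print(choose_build(False))  # normal_build
```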