<p>LibreOffice CI Test Selection with Machine Learning, a GSoC'23 project by Baole Fang. Feed generated by Jekyll on 2023-09-07 (<a href="https://baolef.github.io/libreoffice-ci/feed.xml">feed.xml</a>).</p>

<h2 id="week-11"><a href="https://baolef.github.io/libreoffice-ci/2023/08/10/week11">Week 11</a> (2023-08-10)</h2>
<h3 id="model-improvement">Model improvement</h3>
<p>Now, the <a href="https://github.com/baolef/libreoffice-ci/blob/main/models/testfailure.py">testfailure</a> model no longer considers author features.</p>

<h2 id="week-10"><a href="https://baolef.github.io/libreoffice-ci/2023/08/03/week10">Week 10</a> (2023-08-03)</h2>
<h3 id="smart-inference">Smart inference</h3>
<p>Previously, Jenkins used only <a href="https://github.com/baolef/libreoffice-ci/blob/main/models/testfailure.py">testfailure</a> results to decide whether a patch would pass or fail. Since testfailure alone is not very accurate while <a href="https://github.com/baolef/libreoffice-ci/blob/main/models/testselect.py">testselect</a> is, a better algorithm that also uses testselect's predictions is now used to decide whether a patch passes or fails.</p>
<p><a href="https://github.com/baolef/libreoffice-ci/blob/main/models/testoverall.py">testoverall</a> is proposed to integrate <a href="https://github.com/baolef/libreoffice-ci/blob/main/models/testselect.py">testselect's</a> predictions into <a href="https://github.com/baolef/libreoffice-ci/blob/main/models/testfailure.py">testfailure</a>. Compared to <a href="https://github.com/baolef/libreoffice-ci/blob/main/models/testfailure.py">testfailure</a>, its failure recall increases significantly from 54% to 71%, while its pass recall drops slightly from 70% to 65%. Since failure recall matters much more than pass recall, this is a substantial improvement.</p>
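<p>One plausible way such an integration could work, as a sketch only: summarize testselect's per-test failure probabilities and append the summary to the commit features before classifying. The function and feature layout below are hypothetical; the actual testoverall implementation may combine the predictions differently.</p>

```python
# Hypothetical sketch: extend the commit feature vector with a summary
# of testselect's per-test failure probabilities, so a single overall
# classifier can see both signals. Not the project's actual code.

def overall_features(commit_features, test_probs):
    """Commit features extended with a summary of testselect's output."""
    summary = [
        max(test_probs),                    # riskiest single test
        sum(test_probs) / len(test_probs),  # average risk across tests
        sum(p >= 0.5 for p in test_probs),  # tests predicted to fail
    ]
    return commit_features + summary

# Toy commit with two commit features and three per-test probabilities.
x = overall_features([3.0, 120.0], [0.9, 0.4, 0.1])
print(x)
```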
<p>Because of its outstanding performance, <a href="https://github.com/baolef/libreoffice-ci/blob/main/models/testoverall.py">testoverall</a> replaces <a href="https://github.com/baolef/libreoffice-ci/blob/main/models/testfailure.py">testfailure</a> during inference.</p>
<p>In addition, a new condition is added to decide whether a patch should pass or fail. Originally, the decision only looked at whether the overall failing probability reached a threshold (0.4). Now the number of unit tests predicted to fail is also counted: if it reaches a threshold (10), the patch is likewise considered failed. With the improved algorithm, inference is able to recall 91% of failures while reducing computation by 57%.</p>
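<p>The combined decision rule can be sketched as follows (a minimal sketch: only the thresholds 0.4 and 10 come from the text above; the function name and per-test cutoff are hypothetical):</p>

```python
# Decision rule combining the overall failure probability with the
# number of unit tests individually predicted to fail.
# The 0.4 and 10 thresholds are the ones quoted above; everything
# else is illustrative.

PROB_THRESHOLD = 0.4   # overall failing probability cutoff
COUNT_THRESHOLD = 10   # number of individually failing tests cutoff

def predict_patch_fails(overall_prob, test_probs, per_test_cutoff=0.5):
    """Return True if the patch is predicted to fail.

    overall_prob: failure probability from the overall model.
    test_probs: per-unit-test failure probabilities from testselect.
    """
    if overall_prob >= PROB_THRESHOLD:
        return True
    # New condition: count how many unit tests are predicted to fail.
    n_failing = sum(1 for p in test_probs if p >= per_test_cutoff)
    return n_failing >= COUNT_THRESHOLD

# Example: low overall probability, but many individually risky tests.
print(predict_patch_fails(0.2, [0.9] * 12))  # True
print(predict_patch_fails(0.2, [0.9] * 3))   # False
```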
<h3 id="jenkins-integration">Jenkins integration</h3>
<p>Currently, the model is integrated into the Jenkins job <a href="https://ci.libreoffice.org/job/gerrit_master_ml/">gerrit_master_ml</a>. It first runs the machine learning model to predict whether the patch will pass or fail. If the patch is likely to fail, the <a href="https://ci.libreoffice.org/job/gerrit_master_seq/">fast track</a> is run; if it is likely to pass, the <a href="https://ci.libreoffice.org/job/gerrit_master/">normal build</a> is run.</p>

<h2 id="week-9"><a href="https://baolef.github.io/libreoffice-ci/2023/07/27/week9">Week 9</a> (2023-07-27)</h2>
<h3 id="model-improvement">Model improvement</h3>
<p>To improve model performance, a model based on grouped unit tests is implemented. Originally, the model was trained to predict at the level of around 700 individual unit tests, which is too many targets. To reduce the number of predictions, unit tests are grouped into 80 groups based on their parent folders, using the functions in <a href="https://github.com/baolef/libreoffice-ci/blob/group/dataset/mapping.py">mapping.py</a>. The performance has improved to:</p>
<table>
<thead>
<tr>
<th> </th>
<th>Fail (Predicted)</th>
<th>Pass (Predicted)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Fail (Actual)</td>
<td>3860</td>
<td>203</td>
</tr>
<tr>
<td>Pass (Actual)</td>
<td>191593</td>
<td>1109768</td>
</tr>
</tbody>
</table>
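<p>The failure recall and computation savings quoted for this model can be read directly off the confusion matrix above; as a quick check:</p>

```python
# Reading the confusion matrix above:
# rows are actual outcomes, columns are predicted outcomes.
fail_fail, fail_pass = 3860, 203          # actual failures
pass_fail, pass_pass = 191593, 1109768    # actual passes

# Failure recall: share of actual failures the model selects to run.
recall = fail_fail / (fail_fail + fail_pass)

# Computation saved: share of (commit, test) pairs predicted to pass,
# i.e. unit test runs that can be skipped.
total = fail_fail + fail_pass + pass_fail + pass_pass
saved = (fail_pass + pass_pass) / total

print(f"failure recall: {recall:.0%}")    # 95%
print(f"computation saved: {saved:.0%}")  # 85%
```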
<p><a href="https://github.com/baolef/libreoffice-ci/blob/main/models/testselect.py">testselect</a> is now able to recognize 95% of all failures (94% previously), while reducing computation by 85% (84% previously).</p>

<h2 id="week-8"><a href="https://baolef.github.io/libreoffice-ci/2023/07/20/week8">Week 8</a> (2023-07-20)</h2>
<h3 id="result-archive">Result archive</h3>
<p>Every time Jenkins runs the model, the inference results will be saved to <code class="language-plaintext highlighter-rouge">probability.csv</code>, which is archived by Jenkins.</p>
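<p>A minimal sketch of what such an archive step might look like (the column names, test names and file layout here are assumptions, not the project's actual format):</p>

```python
import csv

# Hypothetical inference results: per-unit-test failure probabilities
# for the current patch. In the real job these come from the model.
results = {"sw_uwriter": 0.82, "sc_subsequent": 0.07, "vcl_lifecycle": 0.01}

# Write them to probability.csv so Jenkins can archive the file
# as a build artifact, riskiest tests first.
with open("probability.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["test", "failure_probability"])
    for test, prob in sorted(results.items(), key=lambda kv: -kv[1]):
        writer.writerow([test, prob])
```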
<h3 id="jenkins-integration">Jenkins integration</h3>
<p>The model is integrated into a <a href="https://ci.libreoffice.org/job/gerrit_master_ml/">master job</a>. In this job, the model first decides whether the commit is likely to fail. If it is, <a href="https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/">gerrit_linux_clang_dbgutil</a> is run first: if that build fails, the job returns -1; otherwise the remaining builds are run. If the model predicts that the commit is unlikely to fail, all builds are run in parallel as before.</p>

<h2 id="week-7"><a href="https://baolef.github.io/libreoffice-ci/2023/07/12/week7">Week 7</a> (2023-07-12)</h2>
<h3 id="jenkins-integration">Jenkins integration</h3>
<p>Currently, the model is integrated into <a href="https://ci.libreoffice.org/job/machine_learning_model/">Jenkins</a>. The average build duration is around 15s, and it is able to support 5 builds in parallel.</p>
<p>Its <a href="https://ci.libreoffice.org/job/machine_learning_model/lastBuild/console">output log</a> mainly contains, for each unit test, the probability that the patch fails it, together with the overall probability that the patch fails any test. The overall probability is shown on the build summary page.</p>

<h2 id="week-6"><a href="https://baolef.github.io/libreoffice-ci/2023/07/06/week6">Week 6</a> (2023-07-06)</h2>
<h3 id="jenkins-integration">Jenkins integration</h3>
<p>Currently, the model is integrated into <a href="https://ci.libreoffice.org/job/machine_learning_model/">Jenkins</a>. The average build duration is around 15s, and it is able to support 5 builds in parallel.</p>
<p>Its <a href="https://ci.libreoffice.org/job/machine_learning_model/lastBuild/console">output log</a> mainly contains, for each unit test, the probability that the patch fails it, together with the overall probability that the patch fails any test.</p>
<p>Further work will be done to better integrate the model into Jenkins.</p>

<h2 id="week-5"><a href="https://baolef.github.io/libreoffice-ci/2023/06/29/week5">Week 5</a> (2023-06-29)</h2>
<h3 id="model-inference">Model inference</h3>
<p><a href="https://github.com/baolef/libreoffice-ci/blob/main/test.py">Inference</a> is completed using <a href="https://github.com/baolef/libreoffice-ci/blob/main/models/testfailure.py">testfailure</a> and <a href="https://github.com/baolef/libreoffice-ci/blob/main/models/testselect.py">testselect</a> to predict which unit tests should be run for a commit.</p>
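<p>The two-stage inference can be sketched as follows (the predict functions, test names and 0.5 cutoffs are illustrative stand-ins, not the project's actual interfaces):</p>

```python
# Two-stage inference: testfailure gates whether any tests are
# scheduled at all, and testselect then picks which ones to run.

def select_tests(commit, predict_failure, predict_per_test, cutoff=0.5):
    """Return the list of unit tests to run for a commit."""
    # Stage 1 (testfailure): is this commit risky at all?
    if predict_failure(commit) < cutoff:
        return []  # unlikely to fail anything: skip all tests
    # Stage 2 (testselect): which tests are likely to catch the failure?
    probs = predict_per_test(commit)
    return [test for test, p in probs.items() if p >= cutoff]

# Toy stand-ins for the trained models.
tests = select_tests(
    commit={"files": ["sw/source/core/doc.cxx"]},
    predict_failure=lambda c: 0.9,
    predict_per_test=lambda c: {"sw_uwriter": 0.8, "sc_subsequent": 0.1},
)
print(tests)  # ['sw_uwriter']
```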
<h3 id="model-sharing">Model sharing</h3>
<p>More training results, such as models, intermediate data and metrics, are shared in the <a href="https://github.com/baolef/libreoffice-ci">repository</a>.</p>

<h2 id="week-4"><a href="https://baolef.github.io/libreoffice-ci/2023/06/22/week4">Week 4</a> (2023-06-22)</h2>
<h3 id="model-training">Model training</h3>
<p>Two models are trained with the full dataset:</p>
<ul>
<li><a href="https://github.com/baolef/libreoffice-ci/blob/main/models/testfailure.py">testfailure</a> predicts whether a commit will fail any unit test. It only considers commit features.</li>
<li><a href="https://github.com/baolef/libreoffice-ci/blob/main/models/testselect.py">testselect</a> predicts which unit tests the commit will fail. It only considers unit test features.</li>
</ul>
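<p>A minimal illustration of this feature split (all feature names here are hypothetical; only the commit-versus-test separation comes from the two models above):</p>

```python
# Each model sees exactly one feature family, never both together.

commit_features = {   # seen only by testfailure
    "files_changed": 3,
    "lines_added": 120,
    "touches_sw_module": 1,
}

test_features = {     # seen only by testselect
    "past_failures": 7,
    "runs_since_last_failure": 52,
}

# testfailure: commit features -> will any test fail?
X_failure = [list(commit_features.values())]

# testselect: test features -> will this particular test fail?
X_select = [list(test_features.values())]

print(len(X_failure[0]), len(X_select[0]))  # 3 2
```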
<p>These two models are based on <a href="https://github.com/mozilla/bugbug">bugbug</a>, but they share one main limitation: commit and unit test features are considered independently. A better approach would be to consider both kinds of features together to predict whether a <code class="language-plaintext highlighter-rouge">(commit, test)</code> pair will pass or fail.</p>

<h2 id="week-3"><a href="https://baolef.github.io/libreoffice-ci/2023/06/15/week3">Week 3</a> (2023-06-15)</h2>
<h3 id="model-training">Model training</h3>
<p>The basic <a href="https://github.com/baolef/libreoffice-ci/blob/main/train.py">model training</a> pipeline is completed with the <a href="https://github.com/baolef/libreoffice-ci/blob/main/models/testselect.py">testselect</a> model. Further optimization is needed to reduce memory and time costs as well as to improve performance.</p>
<p>Currently, <a href="https://github.com/baolef/libreoffice-ci/blob/main/models/testselect.py">testselect</a> is trained on a subset of size 16384 (containing both the training and testing sets) of the full dataset of size 122019, due to memory cost. It reaches a failure recall of 91.4% while saving 90% of the unit test computational cost. Its detailed confusion matrix is shown below:</p>
<table>
<thead>
<tr>
<th> </th>
<th>Fail (Predicted)</th>
<th>Pass (Predicted)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Fail (Actual)</td>
<td>480</td>
<td>45</td>
</tr>
<tr>
<td>Pass (Actual)</td>
<td>556910</td>
<td>5045893</td>
</tr>
</tbody>
</table>

<h2 id="week-2"><a href="https://baolef.github.io/libreoffice-ci/2023/06/07/week2">Week 2</a> (2023-06-07)</h2>
<h3 id="commit-feature-extraction">Commit feature extraction</h3>
<p><a href="https://github.com/baolef/libreoffice-ci/blob/data/dataset/mining.py">Commit feature extraction</a> is finished with multiprocessing. The commits come from the CSV table. Features are based on the patch (what the commit changes), code features, author features and so on. The output is saved in <code class="language-plaintext highlighter-rouge">data/commits.json</code>.</p>
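<p>A minimal sketch of the multiprocessing approach (the commit structure and feature names are hypothetical; the real logic lives in the linked mining.py):</p>

```python
import json
from multiprocessing import Pool

# Sketch of mining commit features in parallel: one worker function
# per commit, fanned out over a process pool, results saved as JSON.

def extract_features(commit):
    """Compute patch-based features for a single commit."""
    lines = commit["patch"].splitlines()
    return {
        "hash": commit["hash"],
        "files_changed": len(commit["files"]),
        "lines_added": sum(line.startswith("+") for line in lines),
        "lines_removed": sum(line.startswith("-") for line in lines),
    }

if __name__ == "__main__":
    commits = [
        {"hash": "abc123", "files": ["sw/doc.cxx"], "patch": "+new\n-old\n"},
        {"hash": "def456", "files": ["sc/view.cxx", "sc/view.hxx"], "patch": "+fix\n"},
    ]
    # Fan the extraction out over worker processes, then save the result.
    with Pool() as pool:
        features = pool.map(extract_features, commits)
    with open("commits.json", "w") as f:
        json.dump(features, f)
```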
<h3 id="unit-test-feature-extraction">Unit test feature extraction</h3>
<p><a href="https://github.com/baolef/libreoffice-ci/blob/data/dataset/test_history.py">Unit test feature extraction</a> is finished; it runs single-threaded with some speed-ups. It computes features of unit tests from <code class="language-plaintext highlighter-rouge">data/commits.json</code>.</p>