Week 3

Model training

The basic model training pipeline is complete, using the testselect model. Further optimization is needed to reduce memory and time cost and to improve performance.

Currently, due to memory cost, testselect is trained on a subset of size 16384 (covering both the training and testing sets) of the full dataset of size 122019. It reaches a failure recall of 91.4% while saving 90% of the unit test computational cost. Its detailed confusion matrix is shown below:

	Fail (Predicted)	Pass (Predicted)
Fail (Actual)	480	45
Pass (Actual)	556910	5045893
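
As a sanity check, the two headline numbers can be recomputed from the confusion matrix. The sketch below is not part of the testselect pipeline. The function names are hypothetical, and the cost saving rests on an assumption: every test execution costs the same, and every test predicted to pass is skipped.

```python
# Confusion-matrix counts copied from the table above.
tp = 480        # actual fail, predicted fail (failures caught)
fn = 45         # actual fail, predicted pass (failures missed)
fp = 556910     # actual pass, predicted fail (tests run unnecessarily)
tn = 5045893    # actual pass, predicted pass (tests skipped)


def failure_recall(tp: int, fn: int) -> float:
    """Fraction of actually failing tests that the model flags as failing."""
    return tp / (tp + fn)


def cost_saving(tp: int, fn: int, fp: int, tn: int) -> float:
    """Fraction of test executions skipped.

    Assumption: uniform cost per test, and only tests predicted
    to fail are executed.
    """
    total = tp + fn + fp + tn
    skipped = fn + tn
    return skipped / total


recall = failure_recall(tp, fn)
saving = cost_saving(tp, fn, fp, tn)

print(f"failure recall: {recall:.1%}")
print(f"cost saving:    {saving:.1%}")
```

Under these assumptions the computed values agree with the reported 91.4% recall and roughly 90% cost saving.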