C4.5, RainForest, decision trees, machine learning, performance optimization
Classification is an important machine learning problem, and decision tree construction algorithms are an important class of solutions to it. RainForest is a scalable way to implement decision tree construction algorithms. It consists of several algorithms, of which the best is a hybrid between a traditional recursive implementation and an iterative implementation that uses more memory but involves fewer write operations. We propose an optimized algorithm inspired by RainForest. By using a more sophisticated criterion for switching between the two implementations, we obtain a performance gain even when all statistical information fits in memory. Evaluations show that our method achieves an average performance improvement of 2.8 times over the traditional recursive implementation.
Tsinghua University Press
Yi Yang, Wenguang Chen. Taiga: Performance Optimization of the C4.5 Decision Tree Construction Algorithm. Tsinghua Science and Technology 2016, 21(4): 415-425.