Adaptive Fast XGBoost for Binary Classification
Abstract
Modern machine learning algorithms must consume data streams quickly while maintaining accurate results, even in the presence of concept drift. This work proposes AFXGB, an Adaptive Fast binary classification algorithm based on XGBoost, focused on fast induction from labeled data streams. AFXGB uses an alternate-model training strategy to obtain lean models that adapt to concept drift. We compared AFXGB with other data stream classifiers on synthetic and real datasets. The results show that AFXGB is four times faster than Adaptive Random Forest (ARF) and 22 times faster than Adaptive XGBoost (AXGB), with equivalent accuracy and the fastest recovery from concept drift, thus preserving long-term accuracy.
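As a rough illustration of the alternate-model idea described above, the sketch below keeps a main model in use while a windowed error-rate check decides when to swap in a model trained only on recent data. This is a hypothetical simplification, not the paper's method: the real AFXGB trains XGBoost ensembles on mini-batches, and its drift detection is more sophisticated than the naive threshold check used here; the majority-class learner is a toy stand-in chosen so the sketch is self-contained.

```python
from collections import deque

class MajorityClassModel:
    """Toy stand-in learner; AFXGB would train an XGBoost ensemble instead."""
    def __init__(self, labels=()):
        self.counts = {0: 0, 1: 0}
        for y in labels:
            self.learn(y)

    def learn(self, y):
        self.counts[y] += 1

    def predict(self):
        return max(self.counts, key=self.counts.get)

class AlternateModelClassifier:
    """Sketch of an alternate-model strategy: when a (naive) windowed
    error-rate signal fires, the main model is replaced by a fresh model
    trained only on the most recent window of the stream."""
    def __init__(self, window=50, drift_threshold=0.5):
        self.main = MajorityClassModel()
        self.window = window
        self.threshold = drift_threshold
        self.recent_labels = deque(maxlen=window)  # recent training data
        self.errors = deque(maxlen=window)         # recent 0/1 prediction errors
        self.swaps = 0

    def update(self, y):
        self.errors.append(int(self.main.predict() != y))
        self.main.learn(y)
        self.recent_labels.append(y)
        # naive drift signal: windowed error rate exceeds the threshold
        if len(self.errors) == self.window and \
           sum(self.errors) / self.window > self.threshold:
            # swap in an alternate model built from recent data only
            self.main = MajorityClassModel(self.recent_labels)
            self.errors.clear()
            self.swaps += 1

    def predict(self):
        return self.main.predict()

# Simulated binary stream with an abrupt concept drift at t = 500
clf = AlternateModelClassifier()
for t in range(1000):
    clf.update(0 if t < 500 else 1)
```

Because the replacement model is trained only on the recent window, it reflects the post-drift concept immediately after the swap, which is the intuition behind the fast drift recovery the abstract reports.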
References
H. M. Gomes, A. Bifet, J. Read, J. P. Barddal, F. Enembreck, B. Pfahringer, G. Holmes, and T. Abdessalem (2017). Adaptive random forests for evolving data stream classification. Machine Learning, vol. 106, no. 9, pp. 1469-1495.
K. K. Wankhade, S. S. Dongre, and K. C. Jondhale (2020). Data stream classification: a review. Iran Journal of Computer Science, vol. 3, no. 4, pp. 239-260.
H. M. Gomes, J. P. Barddal, L. Boiko Ferreira, and A. Bifet (2018). Adaptive random forests for data stream regression. Apr. 2018.
R. D. Baruah, P. Angelov, and D. Baruah (2014). Dynamically evolving fuzzy classifier for real-time classification of data streams. In 2014 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 383-389.
H. Guo, H. Li, Q. Ren, and W. Wang (2022). Concept drift type identification based on multi-sliding windows. Information Sciences, vol. 585, pp. 1-23.
H. Binder, O. Gefeller, M. Schmid, and A. Mayr (2014). The evolution of boosting algorithms. Methods of Information in Medicine, vol. 53, no. 6, pp. 419-427.
H. M. Gomes, J. P. Barddal, F. Enembreck, and A. Bifet (2017). A survey on ensemble learning for data stream classification. ACM Computing Surveys, vol. 50, no. 2.
T. Chen and C. Guestrin (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). New York, NY, USA: ACM, pp. 785-794.
R. Santhanam, S. Raman, N. Uzir, and S. Banerjee (2016). Experimenting XGBoost algorithm for prediction and classification of different datasets. International Journal of Control Theory and Applications, vol. 9, no. 40.
P. B. Dongre and L. G. Malik (2014). A review on real time data stream classification and adapting to various concept drift scenarios. In 2014 IEEE International Advance Computing Conference (IACC), pp. 533-537.
M. Datar and R. Motwani (2007). The sliding-window computation model and results. Boston, MA: Springer US, pp. 149-167.
G. Hulten, L. Spencer, and P. Domingos (2001). Mining time-changing data streams. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01). New York, NY, USA: ACM, pp. 97-106.
A. Bifet and R. Gavaldà (2009). Adaptive learning from evolving data streams. In Advances in Intelligent Data Analysis VIII, N. M. Adams, C. Robardet, A. Siebes, and J.-F. Boulicaut, Eds. Berlin, Heidelberg: Springer, pp. 249-260.
A. Bifet and R. Gavaldà (2007). Learning from time-changing data with adaptive windowing. In Proceedings of the 2007 SIAM International Conference on Data Mining (SDM), pp. 443-448.
J. Montiel, R. Mitchell, E. Frank, B. Pfahringer, T. Abdessalem, and A. Bifet (2020). Adaptive XGBoost for evolving data streams. In 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1-8.
X. Wu, P. Li, and X. Hu (2012). Learning from concept drifting data streams with unlabeled data. Neurocomputing, vol. 92, pp. 145-155 (special issue: Data Mining Applications and Case Study).
B. Calvo and G. Santafé Rodrigo (2016). scmamp: Statistical comparison of multiple algorithms in multiple problems. The R Journal, vol. 8, no. 1.
J. Demšar (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, vol. 7, pp. 1-30.