BERT: Improving Text Classification with Extremely Random Trees, Bagging and Boosting

  • Raphael R. Campos Federal University of Minas Gerais
  • Marcos A. Gonçalves Federal University of Minas Gerais

Abstract


One of the most effective methods for text classification is the recently proposed BROOF classifier, a boosted version of Random Forest (RF). In this work, we propose to improve the BROOF strategy by exploiting Extremely Randomized Trees (Extra-Trees) as the “weak learner” in the boosting framework. We also introduce the Bagging procedure into the Extra-Trees models so that we can obtain a better Out-of-Bag (OOB) error estimate than the original BROOF. Our experiments with several textual datasets, comparing against up to nine state-of-the-art classifiers, show that our proposed method (named BERT) is among the top-performing classifiers on all tested datasets, outperforming the original BROOF in several cases.
Keywords: Sorting Methods, BROOF, Extremely Randomized Trees, Bagging
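
The combination described in the abstract (randomized trees, bagging, and boosting driven by an OOB error estimate) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several simplifying assumptions that the abstract does not state: labels are binary (-1/+1), each "tree" is a one-split stump with a randomly drawn feature and threshold, and the reweighting step is a plain AdaBoost-style update applied only to samples that received an OOB vote. All function names and parameters below are hypothetical.

```python
import math
import random


def fit_random_stump(X, y, w, bag, rng):
    """Extra-Trees-style stump: feature and threshold are drawn at random."""
    f = rng.randrange(len(X[0]))
    vals = [X[i][f] for i in bag]
    t = rng.uniform(min(vals), max(vals))
    # Only the polarity is fitted: keep "+1 above the threshold" unless its
    # weighted error on the bootstrap sample exceeds one half.
    total = sum(w[i] for i in bag)
    err = sum(w[i] for i in bag if (1 if X[i][f] > t else -1) != y[i])
    return f, t, (1 if err <= total / 2 else -1)


def stump_predict(stump, x):
    f, t, s = stump
    return s if x[f] > t else -s


def fit_bagged_stumps(X, y, w, n_stumps, rng):
    """Bagging: each stump gets its own bootstrap sample and OOB index set."""
    n = len(X)
    members = []
    for _ in range(n_stumps):
        bag = [rng.randrange(n) for _ in range(n)]
        oob = set(range(n)) - set(bag)
        members.append((fit_random_stump(X, y, w, bag, rng), oob))
    return members


def vote(members, x, i=None):
    """Majority vote; if i is given, only stumps that never saw sample i vote."""
    total = sum(stump_predict(s, x) for s, oob in members if i is None or i in oob)
    return (total > 0) - (total < 0)  # 0 signals a tie or no voters


def fit_boosted(X, y, n_rounds=5, n_stumps=51, seed=0):
    """Assumed AdaBoost-style loop whose error term is the weighted OOB error."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        members = fit_bagged_stumps(X, y, w, n_stumps, rng)
        oob_pred = [vote(members, X[i], i) for i in range(n)]
        covered = [i for i in range(n) if oob_pred[i] != 0]
        if not covered:
            break
        mass = sum(w[i] for i in covered)
        err = sum(w[i] for i in covered if oob_pred[i] != y[i]) / mass
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        if err >= 0.5:
            break  # the bagged ensemble is no better than chance on OOB data
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, members))
        # Reweight only samples with an OOB prediction, then renormalize.
        for i in covered:
            w[i] *= math.exp(-alpha * y[i] * oob_pred[i])
        z = sum(w)
        w = [wi / z for wi in w]
    return model


def predict(model, x):
    score = sum(alpha * vote(members, x) for alpha, members in model)
    return 1 if score >= 0 else -1


# Toy, well-separated data: the label is the sign of both features.
X = [(-1.0 + 0.01 * i, -0.9 + 0.005 * i) for i in range(20)] + \
    [(0.8 + 0.01 * i, 0.9 - 0.005 * i) for i in range(20)]
y = [-1] * 20 + [1] * 20
model = fit_boosted(X, y)
acc = sum(predict(model, x) == t for x, t in zip(X, y)) / len(X)
print(f"training accuracy: {acc:.2f}")
```

The point of the sketch is the role of bagging: because each stump is fitted on a bootstrap sample, every training example is left out by some stumps, and the votes of those stumps give an error estimate that does not reuse the data the stumps were fitted on. A text classifier would need multiclass labels and full trees over sparse term features, which this toy example leaves out.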

References

Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2):123–140.

Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.

Fernández-Delgado, M., Cernadas, E., Barro, S., and Amorim, D. (2014). Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res., 15(1):3133–3181.

Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55(1):119–139.

Geurts, P., Ernst, D., and Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1):3–42.

Hastie, T., Tibshirani, R., and Friedman, J. H. (2009). The Elements of Statistical Learning. Springer.

Salles, T., Gonçalves, M., Rodrigues, V., and Rocha, L. (2015). BROOF: Exploiting out-of-bag errors, boosting and random forests for effective automated classification. In Proc. of the 38th International ACM SIGIR Conference on Inf. Retrieval, pages 353–362.

Segal, M. R. (2004). Machine learning benchmarks and random forest regression. Technical report, University of California.
Published
2016-10-04
CAMPOS, Raphael R.; GONÇALVES, Marcos A. BERT: Improving Text Classification with Extremely Random Trees, Bagging and Boosting. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 31., 2016, Salvador/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 127-132. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2016.24316.