Evaluation of Probabilistic Approaches to Topic Extraction in Short Documents
Abstract
Short texts are widespread in social media; comments and reviews are common examples found on the Web. Extracting topics from text is a challenging task in content analysis, and probabilistic topic modelling has recently been used as a tool for it. Extracting topics from short documents is harder still, because word co-occurrence is sparser. The aim of this work is therefore to evaluate several topic modelling approaches for short documents and identify which is most suitable in the proposed scenarios. We conduct experiments on three short-text collections, and the results show that the approaches perform similarly.
