CUR: Group Profiling with Community-based Users’ Representation
Abstract
Group profiling methods aim to construct a descriptive profile for communities in social networks. Before a profiling algorithm is applied, the users' content information must be collected and preprocessed, i.e., a representation of each user in the network must be built. Existing group profiling strategies usually define the users' representation by uniformly processing the entire content information in the network, and then apply traditional feature selection methods over the user features in a group. However, such a strategy may ignore specific characteristics of each group. This can lead to a limited representation for some communities, disregarding attributes that are relevant from the network perspective and that describe a particular community more clearly than the others. In this context, we propose the community-based users' representation method (CUR). In this proposal, feature selection algorithms are applied over user features for each network community individually, aiming to assign a relevant feature set to each particular community. This strategy avoids the bias that larger communities introduce into the overall user representation. Experiments on a co-authorship network evaluated the CUR representation with different group profiling strategies, and the resulting profiles were assessed by human evaluators. The results showed that profiles obtained after applying the CUR module were judged better than those obtained with the conventional users' representation in 76.54% of the evaluations, on average.
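The idea of selecting features separately for each community can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the input format, the function names `select_features_per_community` and `represent_users`, the toy data, and the scoring rule (fraction of a community's users that use a term), which is a simple stand-in and not the feature selection algorithm used in the paper.

```python
from collections import Counter, defaultdict


def select_features_per_community(user_terms, community_of, k=2):
    """Return the top-k terms of each community, scored inside that community only.

    user_terms:   dict user -> list of terms (hypothetical input format)
    community_of: dict user -> community label
    The score is a placeholder, not the criterion used in the paper.
    """
    members = defaultdict(list)
    for user, community in community_of.items():
        members[community].append(user)

    selected = {}
    for community, users in members.items():
        # Count in how many of the community's users each term appears.
        doc_freq = Counter()
        for user in users:
            doc_freq.update(set(user_terms[user]))
        # Rank by in-community frequency; break ties alphabetically.
        ranked = sorted(doc_freq, key=lambda t: (-doc_freq[t] / len(users), t))
        selected[community] = ranked[:k]
    return selected


def represent_users(user_terms, community_of, selected):
    """Represent each user only with the features chosen for its community."""
    representation = {}
    for user, terms in user_terms.items():
        counts = Counter(terms)
        features = selected[community_of[user]]
        representation[user] = {f: counts[f] for f in features}
    return representation


# Toy data: community "A" is larger than community "B".
user_terms = {
    "ana": ["graph", "network", "graph"],
    "bob": ["graph", "community"],
    "cid": ["graph", "network"],
    "dee": ["protein", "gene"],
    "eve": ["protein", "gene", "network"],
}
community_of = {"ana": "A", "bob": "A", "cid": "A", "dee": "B", "eve": "B"}

selected = select_features_per_community(user_terms, community_of, k=2)
representation = represent_users(user_terms, community_of, selected)
print(selected)
print(representation["dee"])
```

In this toy data, a single network-wide ranking with the same score and k=2 would keep only "graph" and "network", the terms favoured by the larger community, whereas the per-community selection keeps "gene" and "protein" for community "B".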