An overview of Brazilian researches in the Computer Science field in last years

Leandro Peres; Pablo Cecilio; Francielly Rodrigues; Nícollas Silva; Leonardo Rocha

doi:10.5753/kdmile.2019.8783

Leandro Peres, Universidade Federal de São João del Rei (UFSJ)
Pablo Cecilio, Universidade Federal de São João del Rei (UFSJ)
Francielly Rodrigues, Laboratório Nacional de Computação Científica (LNCC)
Nícollas Silva, Universidade Federal de Minas Gerais (UFMG)
Leonardo Rocha, Universidade Federal de São João del Rei (UFSJ)

Abstract

Recently, most traditional market services have moved to online platforms. Despite the practicality gained, these services generate a large amount of data on the Web, which makes data analysis, data engineering, and data science activities essential. In general, these activities extract additional information about systems and users, allowing service owners to derive insights and analyze patterns. We therefore propose an evaluation methodology for online platforms that register publications and scientific output, such as ResearchGate and the Lattes Platform of CNPq. The methodology is unsupervised and divided into three main stages: (i) obtaining and representing the data; (ii) applying topic modeling; and (iii) labeling the topics. It differs from proposals in the literature, which are based on collaborative networks and supervised techniques. We applied the methodology to a Lattes database and observed the evolution of Computer Science research in Brazil. Based on this analysis, it is possible to identify the most popular and the least explored lines of research, which can help direct public investment according to specific interests.

Keywords: Topic Modeling, Topic Labeling, Data Mining
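
The three stages named in the abstract can be illustrated with a minimal sketch. The abstract does not state which representation, topic model, or labeling rule the authors used, so everything concrete here is an assumption made for illustration: TF-IDF for stage (i), non-negative matrix factorization with multiplicative updates for stage (ii), and the highest-weighted terms of each topic as its label for stage (iii). The corpus, the stopword list, and all function names are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus standing in for publication titles (not real Lattes data).
DOCS = [
    "topic modeling of scientific text",
    "text mining and topic labeling",
    "labeling topic models for text collections",
    "wireless sensor network routing",
    "routing protocols for sensor network",
    "energy aware wireless network protocols",
]
STOPWORDS = {"of", "and", "for", "the"}  # assumed, minimal stopword list


def tfidf(docs):
    """Stage (i): represent each document as a TF-IDF vector over the vocabulary."""
    tokens = [[w for w in d.lower().split() if w not in STOPWORDS] for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    df = Counter(w for t in tokens for w in set(t))  # document frequency per term
    n = len(docs)
    rows = []
    for t in tokens:
        tf = Counter(t)
        rows.append([tf[w] * math.log(1 + n / df[w]) for w in vocab])
    return rows, vocab


def transpose(m):
    return [list(r) for r in zip(*m)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Stage (ii): factor V ~ W H (all entries non-negative) by multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def label_topics(h, vocab, top=3):
    """Stage (iii): label each topic with its `top` highest-weighted terms."""
    return [
        [vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:top]]
        for row in h
    ]


rows, vocab = tfidf(DOCS)
W, H = nmf(rows, k=2)
labels = label_topics(H, vocab)
# Assign each document to the topic with the largest weight in its row of W.
assignments = [max(range(2), key=lambda t: w_row[t]) for w_row in W]

for topic_id, terms in enumerate(labels):
    print(f"topic {topic_id}: {', '.join(terms)}")
print("document assignments:", assignments)
```

Grouping the assigned documents by publication year and counting them per topic would give a simple view of how each research line grows or shrinks over time, which is the kind of analysis the abstract describes. A pure-Python factorization like this one only suits toy inputs; a real collection would call for a numerical library.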
