A clustering algorithm to evaluate the attitude of Brazilian researchers regarding open access research data
Abstract
Data are the core of the research process. They are the records of scientific investigation that support the results published in journals and conferences. Making research data available in open access digital repositories has many advantages, such as increasing the visibility of the associated publications, enabling experiments to be reproduced, and allowing results to be validated. In Brazil, full and unrestricted sharing of research data is not yet accepted by most researchers. This paper presents an initial study that describes a model for analyzing the attitude of Brazilian researchers toward open access research data. A clustering algorithm was used to identify different researcher profiles. The results indicate the main reasons why researchers object to sharing their data.
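As an illustration of how clustering can separate respondents into profiles, the following is a minimal sketch only. The abstract does not name the algorithm or the survey variables, so the choice of k-means (Lloyd-style iterations with a deterministic farthest-point initialization), the two survey questions, and all answer values are assumptions made for this example, not the authors' data or method.

```python
def one_hot(rows, categories):
    """Encode each categorical answer as a 0/1 indicator vector."""
    encoded = []
    for row in rows:
        vec = []
        for answer, options in zip(row, categories):
            vec.extend(1.0 if answer == opt else 0.0 for opt in options)
        encoded.append(vec)
    return encoded


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iters=100):
    """Alternate assignment and centroid update until labels stop changing."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical survey answers: (shares data?, main concern about sharing).
categories = [("yes", "no"), ("none", "misuse", "scooped")]
rows = [
    ("yes", "none"),
    ("yes", "none"),
    ("yes", "misuse"),
    ("no", "misuse"),
    ("no", "scooped"),
    ("no", "scooped"),
]

points = one_hot(rows, categories)
labels, centroids = kmeans(points, k=2)
print(labels)
```

In this toy input, respondents who share data and those who do not end up in different clusters. With real survey data, the number of clusters would have to be chosen and validated, for instance with an internal validity index.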