DSAdvisor: A Tool to Support Predictive Tasks in Data Science

  • José Augusto Câmara Filho, Universidade Federal do Ceará (UFC)
  • José Maria Monteiro, Universidade Federal do Ceará (UFC)

Abstract

Professionals across many fields now need to explore their data repositories to extract knowledge and create new products or services. Several tools have been proposed to facilitate the tasks in the Data Science lifecycle. However, these tools require their users to have specific, in-depth knowledge of several areas of Computing and Statistics, which makes them practically unusable for professionals who are not data science specialists. In this paper, we propose DSAdvisor, a tool that aims to encourage non-expert users to build machine learning models to solve predictive tasks, extracting knowledge from their own data repositories. More specifically, DSAdvisor supports these professionals in predictive tasks involving regression and classification.

Keywords: Data Science

Published
October 4, 2021
CÂMARA FILHO, José Augusto; MONTEIRO, José Maria. DSAdvisor: A Tool to Support Predictive Tasks in Data Science. In: DEMONSTRAÇÕES E APLICAÇÕES - SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 36., 2021, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 81-86. DOI: https://doi.org/10.5753/sbbd_estendido.2021.18167.