Bug Analysis in Jupyter Notebook Projects: An Empirical Study

Taijara L. Santana; Paulo A. da M. Silveira Neto; Eduardo S. Almeida; Iftekhar Ahmed

doi:10.5753/ctd.2025.8680

Taijara L. Santana (UFBA)
Paulo A. da M. Silveira Neto (UFRPE)
Eduardo S. Almeida (UFBA)
Iftekhar Ahmed (University of California)

Abstract

Computational notebooks such as Jupyter are widely adopted in data science for building data-driven code. Despite their popularity, the software development challenges that arise in these environments remain underinvestigated. This study systematically analyzes the bugs and difficulties faced by Jupyter practitioners. A total of 14,740 commits from 105 GitHub projects were mined, and 30,416 Stack Overflow posts were analyzed to identify common issues. In addition, 19 interviews with data scientists were conducted to gather more detail on these challenges. To validate the results, a survey of professionals was carried out, along with an analysis based on the Apriori algorithm. Based on these findings, a taxonomy of bugs is proposed to classify the types of issues found in Jupyter projects.
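
The abstract does not describe how the Apriori-based analysis was set up, so the following is only a minimal sketch of the general technique (frequent itemset mining), not the authors' pipeline. The function name `apriori`, the `min_support` threshold, and the labels in `records` are all hypothetical and chosen purely for illustration; they are not the study's data or taxonomy categories.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = keep_frequent({frozenset([i]) for i in items})
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical records: each set holds labels that co-occur in one report.
records = [
    {"kernel", "environment"},
    {"kernel", "environment", "data"},
    {"data", "visualization"},
    {"kernel", "environment"},
    {"environment", "data"},
]
result = apriori(records, min_support=0.6)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With these made-up records, the only pair that reaches the 0.6 threshold is `kernel` with `environment`, which is the kind of co-occurrence pattern such an analysis is meant to surface.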
