Proposal for Comparing the Efficiency and Scalability of Python Libraries for Data Manipulation and Analysis
Abstract
This research proposal aims to analyze the efficiency and scalability of Python libraries for data manipulation and analysis. The central goal is to identify the solutions best suited to handling large volumes of data, considering performance in terms of execution time, memory usage, and scalability. The study proposes to compare libraries such as Pandas, Polars, Dask, Modin, and PySpark, contributing to more effective guidelines for using these libraries.
References
Apache Software Foundation (2024). Apache Spark: A unified analytics engine for large-scale data processing. Accessed: 23 Nov 2024.
Dask Development Team (2024). Dask: Scale the Python tools you love. Accessed: 23 Nov 2024.
McKinney, W. (2011). pandas: a foundational Python library for data analysis and statistics. Python for High Performance and Scientific Computing.
Modin (2024). Modin: Scale your pandas workflows by changing a single line of code. Accessed: 23 Nov 2024.
Petersohn, D. (2018). Scaling interactive data science transparently with Modin. Master’s thesis, EECS Department, University of California, Berkeley.
Pola-rs (2024). Polars: Lightning-fast DataFrame library for Rust and Python. Accessed: 23 Nov 2024.
Pöss, M. and Floyd, C. (2000). New TPC benchmarks for decision support and web commerce. ACM SIGMOD Record, 29(4):64–71.
Published
2025-04-23
How to Cite
CIRILLO, Gabriel R. B.; GALANTE, Guilherme. Proposal for Comparing the Efficiency and Scalability of Python Libraries for Data Manipulation and Analysis. In: REGIONAL SCHOOL OF HIGH PERFORMANCE COMPUTING FROM SOUTHERN BRAZIL (ERAD-RS), 25., 2025, Foz do Iguaçu/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 133-134. ISSN 2595-4164. DOI: https://doi.org/10.5753/eradrs.2025.6563.
