Specification of a Data Science Experiment for Contaminated Area Analysis

  • Rosa Virginia Encinas Quille USP / IPT
  • Gabriela dos Santos Luchetti Vieira UNESP
  • Leandro Gomes de Freitas IPT
  • Pedro Luiz Pizzigatti Corrêa USP
  • Solange Nice Alves de Souza USP
  • Alexandre Muselli Barbosa IPT
  • Felipe Valencia de Almeida USP

Abstract


This work presents the methodological aspects of a Data Science experiment to optimize the management of environmental data in contaminated site management systems. Recent studies point to a lack of maturity models for managing environmental data, such as data from the monitoring of contaminated sites. This scenario presents opportunities for innovation through the application of computational tools associated with digital transformation, intelligence, and data analytics. A real case study is therefore presented, involving the preliminary qualitative analysis of an environmental monitoring dataset from the EACH USP university campus. The analysis showed that, despite the quality of the database, there are management and availability problems, confirming the opportunity to carry out a proof-of-concept experiment with this methodology.

Published
2025-07-20
QUILLE, Rosa Virginia Encinas; VIEIRA, Gabriela dos Santos Luchetti; FREITAS, Leandro Gomes de; CORRÊA, Pedro Luiz Pizzigatti; SOUZA, Solange Nice Alves de; BARBOSA, Alexandre Muselli; ALMEIDA, Felipe Valencia de. Specification of a Data Science Experiment for Contaminated Area Analysis. In: WORKSHOP ON COMPUTING APPLIED TO THE MANAGEMENT OF THE ENVIRONMENT AND NATURAL RESOURCES (WCAMA), 16., 2025, Maceió/AL. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 316-325. ISSN 2595-6124. DOI: https://doi.org/10.5753/wcama.2025.9385.