Prediction of Air Pollutants Using Machine Learning Methods
Abstract
Exposure to fine particulate matter (PM2.5) poses a health risk in urban centers and calls for reliable forecasting systems. This paper proposes a machine learning prediction model applied to a real-world dataset of 730,558 records collected by low-cost sensors in the city of Fortaleza, Brazil. After data preprocessing and calibration, we evaluated the performance of the Random Forest, XGBoost, MLP, and SVR algorithms. The Random Forest model achieved the best performance, with R² = 0.988 and RMSE = 0.125. SHAP analysis identified PM10 and O3 as the most relevant variables for prediction. The results suggest that artificial intelligence techniques can improve urban environmental monitoring and could support data-driven e-Science platforms.
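The two scores reported above, R² and RMSE, follow the standard regression definitions: R² = 1 − SS_res / SS_tot and RMSE = sqrt(SS_res / n). The short Python sketch below computes both from paired observed and predicted values. It is illustrative only. The function name `regression_metrics` and the sample values are hypothetical, and the sketch assumes the usual unweighted formulas. The abstract does not state the evaluation protocol, for example whether RMSE is computed on raw or normalized concentrations.

```python
import math
from typing import Sequence, Tuple


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Tuple[float, float]:
    """Return (R^2, RMSE) for paired observed/predicted values.

    Hypothetical helper using the standard unweighted definitions;
    not taken from the paper's code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_obs = sum(y_true) / n
    # Residual sum of squares: total squared prediction error.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_obs) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r2, rmse


# Hypothetical PM2.5 observations and model predictions, for illustration only.
observed = [10.0, 12.0, 15.0, 11.0]
predicted = [10.5, 11.5, 14.0, 11.0]
r2, rmse = regression_metrics(observed, predicted)
print(round(r2, 3), round(rmse, 3))  # prints 0.893 0.612
```

An R² close to 1 means the residual error is small relative to the natural variability of the observations. RMSE is expressed in the same units as the target variable, so its magnitude depends on how the data were scaled.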
