A Comparative Analysis of Graph Construction Methods for Fairness in Graph-Based Semi-Supervised Learning

Willian Dihanster Gomes de Oliveira; Lilian Berton

doi:10.5753/eniac.2025.14296

Willian Dihanster Gomes de Oliveira UNIFESP
Lilian Berton UNIFESP

DOI: https://doi.org/10.5753/eniac.2025.14296

Resumo

Graph-based semi-supervised learning (GSSL) relies on the quality of the input graph, often constructed from tabular data. We compare KNN, Mutual KNN (M-KNN), and Sequential KNN (S-KNN) on fairness benchmark datasets. While KNN achieves the highest AUC-ROC, M-KNN and S-KNN, which promote more balanced degree distributions, yield fairer outcomes as measured by Disparate Impact and Average Absolute Odds Difference, with only a small loss in AUC-ROC. Additionally, our analysis reveals that degree disparities between protected groups are linked to unfair outcomes, and that more balanced connectivity can mitigate these effects, highlighting the importance of fairness-aware graph construction in GSSL.

Referências

Angwin, J., Larson, J., Mattu, S., and Kirchner, L. (2016). How we analyzed the compas recidivism algorithm.

Becker, B. and Kohavi, R. (1996). Adult. UCI Machine Learning Repository. DOI: 10.24432/C5XW20.

Bellamy, R. K. E., Dey, K., Hind, M., Hoffman, S. C., Houde, S., Kannan, K., Lohia, P., Martino, J., Mehta, S., Mojsilovic, A., Nagar, S., Ramamurthy, K. N., Richards, J., Saha, D., Sattigeri, P., Singh, M., Varshney, K. R., and Zhang, Y. (2018). AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias.

Berton, L., de Andrade Lopes, A., and Vega-Oliveros, D. A. (2018). A comparison of graph construction methods for semi-supervised learning. In 2018 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE.

Bose, A. and Hamilton, W. (2019). Compositional fairness constraints for graph embeddings. In International conference on machine learning, pages 715–724. PMLR.

Chen, A., Rossi, R. A., Park, N., Trivedi, P., Wang, Y., Yu, T., Kim, S., Dernoncourt, F., and Ahmed, N. K. (2024). Fairness-aware graph neural networks: A survey. ACM Transactions on Knowledge Discovery from Data, 18(6):1–23.

de Oliveira, W. D. G. and Berton, L. (2023). A systematic review for class-imbalance in semi-supervised learning. Artificial Intelligence Review, 56(Suppl 2):2349–2382.

de Oliveira, W. D. G. and Berton, L. (2024). A comparative study of pre-processing algorithms for fair classification in few labeled data context. AI and Ethics, pages 1–21.

de Oliveira, W. D. G., Penatti, O. A., and Berton, L. (2020). A comparison of graph-based semi-supervised learning for data augmentation. In 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), pages 264–271. IEEE.

Dong, Y., Ma, J., Wang, S., Chen, C., and Li, J. (2023). Fairness in graph mining: A survey. IEEE Transactions on Knowledge and Data Engineering, 35(10):10583–10602.

Dong, Y., Wang, S., Wang, Y., Derr, T., and Li, J. (2022). On structural explanation of bias in graph neural networks. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 316–326.

Hofmann, H. (1994). Statlog (German Credit Data). UCI Machine Learning Repository. DOI: 10.24432/C5NC77.

Jebara, T., Wang, J., and Chang, S.-F. (2009). Graph construction and b-matching for semi-supervised learning. In Proceedings of the 26th annual international conference on machine learning, pages 441–448.

Kose, O. D. and Shen, Y. (2022). Fair node representation learning via adaptive data augmentation. arXiv preprint arXiv:2201.08549.

Liao, J. C. and Li, C.-T. (2023). Tabgsl: Graph structure learning for tabular data prediction. arXiv preprint arXiv:2305.15843.

Nadkarni, P. (2016). Chapter 10 - core technologies: Data mining and “big data”. In Nadkarni, P., editor, Clinical Research Computing, pages 187–204. Academic Press.

Ozaki, K., Shimbo, M., Komachi, M., and Matsumoto, Y. (2011a). Using the mutual k-nearest neighbor graphs for semi-supervised classification on natural language data. In Proceedings of the fifteenth conference on computational natural language learning, pages 154–162.

Ozaki, K., Shimbo, M., Komachi, M., and Matsumoto, Y. (2011b). Using the mutual k-nearest neighbor graphs for semi-supervised classification on natural language data. In Goldwater, S. and Manning, C., editors, Proceedings of the Fifteenth Conference on Computational Natural Language Learning, pages 154–162, Portland, Oregon, USA. Association for Computational Linguistics.

Shi, Y., Ke, G., Soukhavong, D., Lamb, J., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., Liu, T.-Y., Titov, N., and Cortes, D. (2025). lightgbm: Light Gradient Boosting Machine. R package version 4.6.0.99.

Song, Z., Yang, X., Xu, Z., and King, I. (2022). Graph-based semi-supervised learning: A comprehensive review. IEEE Transactions on Neural Networks and Learning Systems, 34(11):8174–8194.

Spinelli, I., Scardapane, S., Hussain, A., and Uncini, A. (2021). Fairdrop: Biased edge dropout for enhancing fairness in graph representation learning. IEEE Transactions on Artificial Intelligence, 3(3):344–354.

Vega-Oliveros, D. A., Berton, L., Eberle, A. M., de Andrade Lopes, A., and Zhao, L. (2014). Regular graph construction for semi-supervised learning. Journal of physics: Conference series, 490(1):012022.

Wang, Y., Zhao, Y., Dong, Y., Chen, H., Li, J., and Derr, T. (2022). Improving fairness in graph neural networks via mitigating sensitive attribute leakage. In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining, pages 1938–1948.

Wang, Z., Qiu, M., Chen, M., Salem, M. B., Yao, X., and Zhang, W. (2024). Toward fair graph neural networks via real counterfactual samples. Knowledge and Information Systems, 66(11):6617–6641.

Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., and Yu, P. S. (2020). A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems.

Zhou, D., Bousquet, O., Lal, T., Weston, J., and Schölkopf, B. (2003). Learning with local and global consistency. Advances in neural information processing systems, 16.

Zhu, X., Ghahramani, Z., and Lafferty, J. (2002). Learning from labeled and unlabeled data with label propagation. In CMU CALD Tech Report.

Zhu, X., Ghahramani, Z., and Lafferty, J. D. (2003). Semi-supervised learning using gaussian fields and harmonic functions. ICML, 3(2):912–919.

A Comparative Analysis of Graph Construction Methods for Fairness in Graph-Based Semi-Supervised Learning

Resumo

Referências

Artigos mais lidos do(s) mesmo(s) autor(es)