Beyond Single Models: Leveraging LLM Ensembles for Human Value Detection in Text
Abstract
Every text may reflect its writer’s opinions, and these opinions, especially in political contexts, are often tied to specific human values that they either attain or constrain. Identifying these values can give policymakers deeper insight into the underlying factors that shape public discourse and decision-making. While current large language models (LLMs) have shown promise across a variety of tasks, no single model may generalize well enough to excel at tasks such as human value detection. In this work, we use data from the Human Value Detection task at CLEF 2024 and propose leveraging multiple ensembles of LLMs to improve the identification of human values in text. Our ensembles achieved higher F1 scores than all baseline models, suggesting that combining multiple models can deliver performance comparable to that of very large models at much lower memory requirements.
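The ensembling idea described above can be illustrated with a minimal soft-voting sketch: each member model produces per-label probabilities for the multi-label value-detection task, the ensemble averages them, and labels above a threshold are predicted. This is a hypothetical illustration only; the probabilities, labels, and the 0.5 threshold are assumptions, and the paper's actual fusion strategy may differ.

```python
# Minimal soft-voting sketch for multi-label human value detection.
# Assumption: each member model yields one probability per value label;
# in a real system these would come from fine-tuned LLM classifiers.

def ensemble_predict(model_probs, threshold=0.5):
    """Average per-label probabilities across models, then threshold.

    model_probs: list of per-model probability lists, all the same length.
    Returns a binary vector: 1 if a value label is predicted, else 0.
    """
    n_models = len(model_probs)
    n_labels = len(model_probs[0])
    avg = [sum(p[i] for p in model_probs) / n_models for i in range(n_labels)]
    return [1 if a >= threshold else 0 for a in avg]

# Example: three models scoring four (hypothetical) value labels for one text.
probs = [
    [0.9, 0.2, 0.60, 0.10],
    [0.7, 0.4, 0.40, 0.20],
    [0.8, 0.3, 0.55, 0.15],
]
print(ensemble_predict(probs))  # -> [1, 0, 1, 0]
```

Soft voting is only one possible combination rule; majority voting over hard predictions, or learned fusion as in LLM-Blender (Jiang et al., 2023), are common alternatives.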
References
Bench-Capon, T. J. M. (2003). Persuasion in practical argument using value-based argumentation frameworks. Journal of Logic and Computation, 13(3):429–448. DOI: 10.1093/logcom/13.3.429
Dellaert, F., Polzin, T., and Waibel, A. (1996). Recognizing emotion in speech. In Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP ’96), volume 3, pages 1970–1973. DOI: 10.1109/ICSLP.1996.608022
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. [link] DOI: 10.48550/arXiv.1810.04805
He, P., Liu, X., Gao, J., and Chen, W. (2021). DeBERTa: Decoding-enhanced BERT with disentangled attention. In International Conference on Learning Representations. [link] DOI: 10.48550/arXiv.2006.03654
Hoang, M., Bihorac, O. A., and Rouces, J. (2019). Aspect-based sentiment analysis using BERT. In Hartmann, M. and Plank, B., editors, Proceedings of the 22nd Nordic Conference on Computational Linguistics, pages 187–196, Turku, Finland. Linköping University Electronic Press. [link]
Jiang, D., Ren, X., and Lin, B. Y. (2023). LLM-Blender: Ensembling large language models with pairwise ranking and generative fusion. [link] DOI: 10.48550/arXiv.2306.02561
Kiesel, J., Alshomary, M., Handke, N., Cai, X., Wachsmuth, H., and Stein, B. (2022). Identifying the Human Values behind Arguments. In Muresan, S., Nakov, P., and Villavicencio, A., editors, 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), pages 4459–4471. Association for Computational Linguistics. DOI: 10.18653/v1/2022.acl-long.306
Kiesel, J., Çöltekin, Ç., Heinrich, M., Fröbe, M., Alshomary, M., De Longueville, B., Erjavec, T., Handke, N., Kopp, M., Ljubešić, N., Meden, K., Mirzakhmedova, N., Morkevičius, V., Reitis-Münstermann, T., Scharfbillig, M., Stefanovitch, N., Wachsmuth, H., Potthast, M., and Stein, B. (2024a). Overview of Touché 2024: Argumentation Systems. In Goharian, N., Tonellotto, N., He, Y., Lipani, A., McDonald, G., Macdonald, C., and Ounis, I., editors, Advances in Information Retrieval, pages 466–473, Cham. Springer Nature Switzerland.
Kiesel, J., Çöltekin, Ç., Heinrich, M., Fröbe, M., Alshomary, M., Longueville, B. D., Erjavec, T., Handke, N., Kopp, M., Ljubešić, N., Meden, K., Mirzakhmedova, N., Morkevičius, V., Reitis-Münstermann, T., Scharfbillig, M., Stefanovitch, N., Wachsmuth, H., Potthast, M., and Stein, B. (2024b). Overview of Touché 2024: Argumentation Systems. In Goeuriot, L., Mulhem, P., Quénot, G., Schwab, D., Nunzio, G. M. D., Soulier, L., Galuscakova, P., Herrera, A. G. S., Faggioli, G., and Ferro, N., editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction. 15th International Conference of the CLEF Association (CLEF 2024), Lecture Notes in Computer Science, Berlin Heidelberg New York. Springer.
Legkas, S., Christodoulou, C., Zidianakis, M., Koutrintzes, D., Petasis, G., and Dagioglou, M. (2024). Hierocles of Alexandria at Touché: Multi-task & multi-head custom architecture with transformer-based models for human value detection. In Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), CEUR Workshop Proceedings, [link].
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. DOI: 10.48550/arXiv.1907.11692
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1):81–106. DOI: 10.1007/bf00116251
Schwartz, S. H. (1994). Are there universal aspects in the structure and contents of human values? Journal of Social Issues, 50(4):19–45. DOI: 10.1111/j.1540-4560.1994.tb01196.x
Schwartz, S. H., Cieciuch, J., Vecchione, M., Davidov, E., Fischer, R., Beierlein, C., Ramos, A., Verkasalo, M., Lönnqvist, J.-E., Demirutku, K., Dirilen-Gumus, O., and Konty, M. (2012). Refining the theory of basic individual values. Journal of Personality and Social Psychology, 103(4):663–688. DOI: 10.1037/a0029393
Sobhanam, H. and Prakash, J. (2023). Analysis of fine tuning the hyper parameters in RoBERTa model using genetic algorithm for text classification. International Journal of Information Technology, 15(7):3669–3677. DOI: 10.1007/s41870-023-01395-4
Sun, C., Qiu, X., Xu, Y., and Huang, X. (2019). How to fine-tune BERT for text classification? In Sun, M., Huang, X., Ji, H., Liu, Z., and Liu, Y., editors, Chinese Computational Linguistics, pages 194–206, Cham. Springer International Publishing. DOI: 10.1007/978-3-030-32381-3_16
Tariq, Z., Shah, S. K., and Lee, Y. (2019). Speech emotion detection using IoT-based deep learning for health care. In 2019 IEEE International Conference on Big Data (Big Data), pages 4191–4196. DOI: 10.1109/BigData47090.2019.9005638
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. CoRR, abs/1706.03762. DOI: 10.48550/arXiv.1706.03762
Xian, G., Guo, Q., Zhao, Z., Luo, Y., and Mei, H. (2023). Short text classification model based on DeBERTa-DPCNN. In 2023 4th International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), pages 56–59. DOI: 10.1109/ICBAIE59714.2023.10281320
Yeste, V., Ardanuy, M., and Rosso, P. (2024). Philo of Alexandria at Touché: A cascade model approach to human value detection. In Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), CEUR Workshop Proceedings, [link].
Yunis, H. (2024). Arthur Schopenhauer at Touché 2024: Multi-lingual text classification using ensembles of large language models. In Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), CEUR Workshop Proceedings, [link].