Beyond Single Models: Leveraging LLM Ensembles for Human Value Detection in Text

Abstract


Every text may reflect its writer's opinions, and these opinions, especially in political contexts, are often tied to specific human values that they either attain or constrain. Identifying these values can give policymakers deeper insight into the underlying factors that shape public discourse and decision-making. While current large language models (LLMs) have shown promise across various tasks, no single model may generalize well enough to excel at tasks like human value detection. In this work, we use data from the Human Value Detection task at CLEF 2024 and propose leveraging multiple ensembles of LLMs to improve the identification of human values in text. Our results show that the ensemble models achieved higher F1 scores than all baseline models, suggesting that combining multiple models can offer performance comparable to that of very large models at much lower memory requirements.
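
As an illustrative sketch only, not the authors' implementation: one simple way to ensemble per-label outputs from several fine-tuned classifiers is soft voting, i.e., averaging each base model's per-label probabilities and thresholding the average. The function name, the truncated label list, the 0.5 threshold, and the probability values below are hypothetical placeholders.

import numpy as np

# Truncated, illustrative subset of the multi-label value taxonomy.
VALUE_LABELS = ["Self-direction: thought", "Achievement", "Security: societal"]

def soft_vote(prob_matrices, threshold=0.5):
    """Average per-label probabilities across models and binarize.

    prob_matrices: list of (n_examples, n_labels) arrays, one per base model.
    Returns a binary (n_examples, n_labels) prediction matrix.
    """
    avg = np.mean(np.stack(prob_matrices, axis=0), axis=0)
    return (avg >= threshold).astype(int)

# Hypothetical per-label probabilities from three base models on two sentences.
p_model_a = np.array([[0.7, 0.2, 0.6], [0.1, 0.8, 0.4]])
p_model_b = np.array([[0.6, 0.3, 0.5], [0.2, 0.7, 0.6]])
p_model_c = np.array([[0.8, 0.1, 0.4], [0.3, 0.9, 0.5]])
print(soft_vote([p_model_a, p_model_b, p_model_c]))  # [[1 0 1] [0 1 1]]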

Keywords: LLM, LLM Ensembles, Human Value Detection, Text Classification, Natural Language Processing (NLP), Ensemble Learning, Text Analysis, Multi-Model Approaches

References

Ammanabrolu, P., Jiang, L., Sap, M., Hajishirzi, H., and Choi, Y. (2022). Aligning to social norms and values in interactive narratives. In Carpuat, M., de Marneffe, M.-C., and Meza Ruiz, I. V., editors, Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5994–6017, Seattle, United States. Association for Computational Linguistics. [link] DOI: 10.18653/v1/2022.naacl-main.439

Bench-Capon, T. J. M. (2003). Persuasion in practical argument using value-based argumentation frameworks. Journal of Logic and Computation, 13(3):429–448. DOI: 10.1093/logcom/13.3.429

Dellaert, F., Polzin, T., and Waibel, A. (1996). Recognizing emotion in speech. In Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96), volume 3, pages 1970–1973. DOI: 10.1109/ICSLP.1996.608022

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. [link] DOI: 10.48550/arXiv.1810.04805

He, P., Liu, X., Gao, J., and Chen, W. (2021). DeBERTa: Decoding-enhanced BERT with disentangled attention. In International Conference on Learning Representations. [link] DOI: 10.48550/arXiv.2006.03654

Hoang, M., Bihorac, O. A., and Rouces, J. (2019). Aspect-based sentiment analysis using BERT. In Hartmann, M. and Plank, B., editors, Proceedings of the 22nd Nordic Conference on Computational Linguistics, pages 187–196, Turku, Finland. Linköping University Electronic Press. [link]

Jiang, D., Ren, X., and Lin, B. Y. (2023). LLM-Blender: Ensembling large language models with pairwise ranking and generative fusion. [link] DOI: 10.48550/arXiv.2306.02561

Kiesel, J., Alshomary, M., Handke, N., Cai, X., Wachsmuth, H., and Stein, B. (2022). Identifying the Human Values behind Arguments. In Muresan, S., Nakov, P., and Villavicencio, A., editors, 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), pages 4459–4471. Association for Computational Linguistics. DOI: 10.18653/v1/2022.acl-long.306

Kiesel, J., Çöltekin, Ç., Heinrich, M., Fröbe, M., Alshomary, M., De Longueville, B., Erjavec, T., Handke, N., Kopp, M., Ljubešić, N., Meden, K., Mirzakhmedova, N., Morkevičius, V., Reitis-Münstermann, T., Scharfbillig, M., Stefanovitch, N., Wachsmuth, H., Potthast, M., and Stein, B. (2024a). Overview of Touché 2024: Argumentation Systems. In Goharian, N., Tonellotto, N., He, Y., Lipani, A., McDonald, G., Macdonald, C., and Ounis, I., editors, Advances in Information Retrieval, pages 466–473, Cham. Springer Nature Switzerland.

Kiesel, J., Çöltekin, Ç., Heinrich, M., Fröbe, M., Alshomary, M., Longueville, B. D., Erjavec, T., Handke, N., Kopp, M., Ljubešić, N., Meden, K., Mirzakhmedova, N., Morkevičius, V., Reitis-Münstermann, T., Scharfbillig, M., Stefanovitch, N., Wachsmuth, H., Potthast, M., and Stein, B. (2024b). Overview of Touché 2024: Argumentation Systems. In Goeuriot, L., Mulhem, P., Quénot, G., Schwab, D., Nunzio, G. M. D., Soulier, L., Galuscakova, P., Herrera, A. G. S., Faggioli, G., and Ferro, N., editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction. 15th International Conference of the CLEF Association (CLEF 2024), Lecture Notes in Computer Science, Berlin Heidelberg New York. Springer.

Legkas, S., Christodoulou, C., Zidianakis, M., Koutrintzes, D., Petasis, G., and Dagioglou, M. (2024). Hierocles of Alexandria at Touché: Multi-task & multi-head custom architecture with transformer-based models for human value detection. In Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), CEUR Workshop Proceedings, [link].

Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. DOI: 10.48550/arXiv.1907.11692

Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1):81–106. DOI: 10.1007/bf00116251

Schwartz, S. H. (1994). Are there universal aspects in the structure and contents of human values? Journal of Social Issues, 50(4):19–45. DOI: 10.1111/j.1540-4560.1994.tb01196.x

Schwartz, S. H., Cieciuch, J., Vecchione, M., Davidov, E., Fischer, R., Beierlein, C., Ramos, A., Verkasalo, M., Lönnqvist, J.-E., Demirutku, K., Dirilen-Gumus, O., and Konty, M. (2012). Refining the theory of basic individual values. Journal of Personality and Social Psychology, 103(4):663–688. DOI: 10.1037/a0029393

Sobhanam, H. and Prakash, J. (2023). Analysis of fine tuning the hyper parameters in RoBERTa model using genetic algorithm for text classification. International Journal of Information Technology, 15(7):3669–3677. DOI: 10.1007/s41870-023-01395-4

Sun, C., Qiu, X., Xu, Y., and Huang, X. (2019). How to fine-tune BERT for text classification? In Sun, M., Huang, X., Ji, H., Liu, Z., and Liu, Y., editors, Chinese Computational Linguistics, pages 194–206, Cham. Springer International Publishing.

Tariq, Z., Shah, S. K., and Lee, Y. (2019). Speech emotion detection using IoT based deep learning for health care. In 2019 IEEE International Conference on Big Data (Big Data), pages 4191–4196. DOI: 10.1109/BigData47090.2019.9005638

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. CoRR, abs/1706.03762. DOI: 10.48550/arXiv.1706.03762

Xian, G., Guo, Q., Zhao, Z., Luo, Y., and Mei, H. (2023). Short text classification model based on DeBERTa-DPCNN. In 2023 4th International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), pages 56–59. DOI: 10.1109/ICBAIE59714.2023.10281320

Yeste, V., Ardanuy, M., and Rosso, P. (2024). Philo of Alexandria at Touché: A cascade model approach to human value detection. In Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), CEUR Workshop Proceedings, [link].

Yunis, H. (2024). Arthur Schopenhauer at Touché 2024: Multi-lingual text classification using ensembles of large language models. In Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), CEUR Workshop Proceedings, [link].
Published
17/11/2024
RODRIGUES, Diego Dimer; RECAMONDE-MENDOZA, Mariana; MOREIRA, Viviane P. Beyond Single Models: Leveraging LLM Ensembles for Human Value Detection in Text. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 15., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 17-22. DOI: https://doi.org/10.5753/stil.2024.245441.