Open-World Text Classification by Combining Weak Models and Large Language Models

  • Daniel P. Zitei (USP)
  • Kenzo M. Sakiyama (USP)
  • Ricardo M. Marcacini (USP)

Abstract

Open-world classification presents significant challenges in text classification. Large Language Models (LLMs) have made advances in addressing these challenges by leveraging their contextual understanding to improve classification accuracy without requiring knowledge of the entire label space. However, current LLM-based approaches still face limitations, such as context size constraints and poor computational scalability. To overcome these issues, we draw inspiration from strategies like Retrieval-Augmented Generation (RAG) to adapt LLMs more effectively to open-world classification problems. Our proposed approach combines a Weak Model (WM) with LLMs: the WM filters the label space down to the top-k most probable classes, and an LLM then makes the final classification decision among them.
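The two-stage pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the keyword-overlap scorer stands in for any cheap weak model that ranks classes, and `llm_decide` is a hypothetical placeholder for a real LLM call whose prompt restricts the label space to the top-k candidates, so the prompt stays small regardless of how many classes exist overall.

```python
# Toy weak model: scores each class by keyword overlap with the input.
# In practice this would be any inexpensive classifier (e.g., one built
# on sentence embeddings) that outputs per-class scores.
CLASS_KEYWORDS = {
    "sports": {"game", "team", "score"},
    "politics": {"election", "vote", "policy"},
    "tech": {"software", "model", "data"},
    "finance": {"market", "stock", "price"},
}

def weak_model_topk(text, k=2):
    """Return the k highest-scoring candidate classes for `text`."""
    tokens = set(text.lower().split())
    scores = {c: len(tokens & kw) for c, kw in CLASS_KEYWORDS.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

def llm_decide(text, candidates):
    """Placeholder for the LLM stage (hypothetical helper).

    A real system would send `prompt` to an LLM and parse its answer;
    here we return the top-ranked candidate so the pipeline runs.
    """
    prompt = (
        f"Classify the text into exactly one of: {', '.join(candidates)}.\n"
        f"Text: {text}\nLabel:"
    )
    return candidates[0]

def classify(text, k=2):
    candidates = weak_model_topk(text, k)  # WM narrows the label space
    return llm_decide(text, candidates)    # LLM makes the final choice

print(classify("the team won the game with a late score"))  # → sports
```

Because the LLM only ever sees k candidate labels, the prompt length is independent of the total number of classes, which is what makes the approach viable when the full (open) label space would not fit in the model's context window.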
Keywords: Weak Model, LLM, Classification

Published
17/11/2024
ZITEI, Daniel P.; SAKIYAMA, Kenzo M.; MARCACINI, Ricardo M. Open-World Text Classification by Combining Weak Models and Large Language Models. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 21., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 13-24. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2024.245272.
