CRF2: Context Reasoning Faithful Framework
Abstract
Large Language Models (LLMs) excel at tasks such as text generation and question answering, but they often fall back on parametric knowledge, which leads to hallucinations when they are required to reason strictly within a given context. In this paper, we propose a conceptual framework for context-faithful reasoning that ensures responses are grounded solely in the provided information. We evaluate our approach on the RealTime QA dataset, which features open-domain, time-sensitive questions that must be answered from a specific document. Our experiments show that the proposed method achieves an accuracy of 0.95 on the abstention task, outperforming existing prompt engineering baselines.
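As a purely illustrative sketch (not the exact prompt or code used in this work), a context-faithful QA prompt with an explicit abstention option, in the spirit of Zhou et al. (2023), could be assembled as follows; the function name, instruction wording, and example context are assumptions introduced only for illustration:

    def build_context_faithful_prompt(context: str, question: str) -> str:
        # Restrict the model to the supplied context and offer an explicit
        # abstention option (illustrative only; not the paper's actual prompt).
        return (
            "Answer the question using ONLY the context below. "
            "If the context does not contain the answer, reply exactly with \"I don't know\".\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}\n"
            "Answer:"
        )

    # Example usage with a hypothetical time-sensitive context; the question is
    # not answerable from the context, so a faithful model should abstain.
    print(build_context_faithful_prompt(
        "The 2024 summit was held in Nairobi.",
        "Where was the 2023 summit held?",
    ))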
References
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.
Dhuliawala, S., Komeili, M., Xu, J., Raileanu, R., Li, X., Celikyilmaz, A., and Weston, J. (2023). Chain-of-verification reduces hallucination in large language models.
Huang, G., Long, Y., Luo, C., Shen, J., and Sun, X. (2024). Prompting explicit and implicit knowledge for multi-hop question answering based on human reading process.
Huyen, C. (2025). AI Engineering: Building Applications with Foundation Models. O’Reilly.
Jurafsky, D. and Martin, J. H. (2025). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models. 3rd edition. Online manuscript released January 12, 2025.
Kasai, J., Sakaguchi, K., Takahashi, Y., Bras, R. L., Asai, A., Yu, X., Radev, D., Smith, N. A., Choi, Y., and Inui, K. (2024). RealTime QA: What’s the answer right now?
Li, Y., Zhou, K., Qiao, Q., Nguyen, B., Wang, Q., and Li, Q. (2024). Investigating context-faithfulness in large language models: The roles of memory strength and evidence style.
Minaee, S., Mikolov, T., Nikzad, N., Chenaghlu, M., Socher, R., Amatriain, X., and Gao, J. (2024). Large language models: A survey.
Ming, Y., Purushwalkam, S., Pandit, S., Ke, Z., Nguyen, X.-P., Xiong, C., and Joty, S. (2025). FaithEval: Can your language model stay faithful to context, even if “the moon is made of marshmallows”.
OpenAI (2022). ChatGPT: Optimizing language models for dialogue.
Pan, Z., Luo, H., Li, M., and Liu, H. (2024). Chain-of-action: Faithful and multimodal question answering through large language models. arXiv preprint arXiv:2403.17359.
Sahoo, P., Singh, A. K., Saha, S., Jain, V., Mondal, S., and Chadha, A. (2024). A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv preprint arXiv:2402.07927.
Saravia, E. (2022). Prompt Engineering Guide. [link].
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2023). Attention is all you need.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., and Zhou, D. (2023). Chain-of-thought prompting elicits reasoning in large language models.
Xiao, T. and Zhu, J. (2025). Foundations of large language models.
Xu, R., Qi, Z., Guo, Z., Wang, C., Wang, H., Zhang, Y., and Xu, W. (2024). Knowledge conflicts for llms: A survey.
Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., and Narasimhan, K. (2023). Tree of thoughts: Deliberate problem solving with large language models.
Yu, W., Zhang, H., Pan, X., Ma, K., Wang, H., and Yu, D. (2024). Chain-of-note: Enhancing robustness in retrieval-augmented language models.
Zhang, Y., Yang, J., Yuan, Y., and Yao, A. C.-C. (2024). Cumulative reasoning with large language models.
Zhou, W., Zhang, S., Poon, H., and Chen, M. (2023). Context-faithful prompting for large language models. In Bouamor, H., Pino, J., and Bali, K., editors, Findings of the Association for Computational Linguistics: EMNLP 2023, pages 14544–14556, Singapore. Association for Computational Linguistics.
