PeacemakerBot: A LLM-Powered Bot for Identifying and Reducing Signs of Incivility in GitHub Conversations
Abstract
Context: Developers' interactions on collaborative software development platforms like GitHub are key to maintaining technical alignment and community engagement. However, uncivil behaviors such as disrespectful, sarcastic, or offensive comments can undermine these efforts, discouraging contributions and harming code quality. Goal: This study introduces PeacemakerBot, an automated moderation tool that detects signs of incivility in GitHub conversations and warns developers about them. Method: We leverage Large Language Models (LLMs) to analyze conversations, identify signs of incivility, and generate reformulation suggestions in real time. To evaluate the tool, we conducted a user study with six developers, followed by a survey based on the Technology Acceptance Model (TAM) to understand their perception of its usefulness. Results: Our results suggest that PeacemakerBot successfully identifies multiple types of incivility and promotes more constructive conversations. The moderation feedback loop allows users to revise flagged comments, enhancing awareness and reducing harmful language over time. Conclusion: Our tool fills a key gap in open-source software (OSS) by providing AI-assisted moderation to enhance the social climate and inclusiveness of developer interactions. Video link: https://doi.org/10.5281/zenodo.15485535
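The moderation feedback loop summarized in the Method can be illustrated with a minimal Python sketch. This is not PeacemakerBot's actual implementation: the names `Verdict`, `moderate`, and `fake_llm`, the prompt wording, and the `UNCIVIL:<category>` answer format are all assumptions made for illustration. The model is represented as a plain callable so that the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: an "LLM" is any callable mapping a prompt string to a reply string.
LLM = Callable[[str], str]


@dataclass
class Verdict:
    """Outcome of moderating one comment (hypothetical structure)."""
    uncivil: bool
    category: Optional[str] = None    # e.g. "insult"; label set is illustrative
    suggestion: Optional[str] = None  # proposed civil rewrite, if flagged


def moderate(comment: str, llm: LLM) -> Verdict:
    """Classify one comment and, only if it is flagged, request a rewrite."""
    answer = llm(f"Classify as CIVIL or UNCIVIL:<category>:\n{comment}").strip()
    if not answer.upper().startswith("UNCIVIL"):
        return Verdict(uncivil=False)
    # Everything after the first colon is treated as the incivility category.
    _, _, category = answer.partition(":")
    suggestion = llm(
        f"Rewrite this comment politely, keeping its technical point:\n{comment}"
    ).strip()
    return Verdict(True, category.strip() or None, suggestion)


def fake_llm(prompt: str) -> str:
    """Keyword-based stand-in for a real model, for demonstration only."""
    first_line, _, comment = prompt.partition("\n")
    if first_line.startswith("Classify"):
        return "UNCIVIL:insult" if "stupid" in comment.lower() else "CIVIL"
    return "I think this approach has a problem; could we revisit it?"


if __name__ == "__main__":
    for text in ("This is a stupid patch.", "Thanks, looks good to me."):
        print(text, "->", moderate(text, fake_llm))
```

In a real deployment the stand-in would be replaced by a call to an actual model, and the returned suggestion would be shown to the comment's author, who decides whether to revise the flagged comment.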
Replication package: Antônio Cruz Gomes, Eric Mesquita, Emanuel Ávila, Carlos Jefté, Arthur Mesquita, Lucas Sousa, Matheus Rabelo, Mairieli Wessel, and Anderson Uchôa. 2025. Replication package for the paper: "PeacemakerBot: A LLM-Powered Bot for Identifying and Reducing Signs of Incivility in GitHub Conversations". DOI: 10.5281/zenodo.15485535
