A TLA+ framework for the specification and verification of distributed algorithms using the Heard-Of model.

Abstract


Distributed fault-tolerant algorithms are essential in many critical systems, making their formal verification indispensable. The TLA+ has been increasingly adopted for this purpose, despite the challenges faced by developers, such as the inherent complexity of using formalisms and the choice of the appropriate abstraction level: too detailed models suffer from state space explosion, while simplified models can compromise the validity of formal verification. The Heard-Of model is an important abstraction for specifying and verifying round-based distributed algorithms, allowing both integrated reasoning about synchrony and failures and helping to mitigate state space explosion. The Heard-Of model represents the sets of messages that arrive at each process in a round, and not the order between them, leading to a significant reduction in the state space. Given the importance of the TLA+ and the Heard-Of model, in this paper we propose a framework that provides the main abstractions of the Heard-Of model in TLA+. This framework is exemplified and evaluated using the Uniform Voting algorithm, for which properties and their verifications through model checking are also discussed, evaluating the state space size and time taken for its construction.
Keywords: Distributed Algorithms, Modeling for Fault Tolerance, Fault Tolerance, Experimental Validation and Simulation

References

Agência Nacional de Vigilância Sanitária (ANVISA) (2020). Guia para Validação de Sistemas Computadorizados - Guia nº 33/2020. Agência Nacional de Vigilância Sanitária.

Barbosa, R., Fonseca, A., and Araujo, F. (2021). Reductions and abstractions for formal verification of distributed round-based algorithms. Software Quality Journal, 29(3): 705–731.

Biely, M., Widder, J., Charron-Bost, B., Gaillard, A., Hutle, M., and Schiper, A. (2007). Tolerating corrupted communication. In Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing, pages 244–253.

Bögli, R., Lerena, L., Tsigkanos, C., and Kehrer, T. (2025). A systematic literature review on a decade of industrial TLA+ practice. In International Conference on Integrated Formal Methods, pages 24–34. Springer.

Castro, M., Liskov, B., et al. (1999). Practical Byzantine fault tolerance. In OSDI, volume 99, pages 173–186.

Chaouch-Saad, M., Charron-Bost, B., and Merz, S. (2009). A reduction theorem for the verification of round-based distributed algorithms. In International Workshop on Reachability Problems, pages 93–106. Springer.

Charron-Bost, B. and Schiper, A. (2009). The Heard-Of model: computing in distributed systems with benign faults. Distributed Computing, 22(1): 49–71.

Cimatti, A., Clarke, E., Giunchiglia, E., Giunchiglia, F., Pistore, M., Roveri, M., Sebastiani, R., and Tacchella, A. (2002). NuSMV 2: An open-source tool for symbolic model checking. In Computer Aided Verification: 14th International Conference, CAV 2002 Copenhagen, Denmark, July 27–31, 2002 Proceedings 14, pages 359–364. Springer.

Hutle, M. and Schiper, A. (2007). Communication predicates: A high-level abstraction for coping with transient and dynamic faults. In 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN’07), pages 92–101. IEEE.

Lamport, L. (2002). Specifying systems: The TLA+ language and tools for hardware and software engineers.

Lamport, L., Shostak, R. E., and Pease, M. C. (1982). The Byzantine generals problem. ACM Transactions on Programming Languages and Systems, 4: 382–401.

Newcombe, C., Rath, T., Zhang, F., Munteanu, B., Brooker, M., and Deardeuff, M. (2014). Use of formal methods at amazon web services. See [link], page 16.

RTCA (2011). DO-178C: Software Considerations in Airborne Systems and Equipment Certification. Supersedes DO-178B (1992).

Tsuchiya, T. and Schiper, A. (2007). Model checking of consensus algorithms. In 2007 26th IEEE International Symposium on Reliable Distributed Systems (SRDS 2007), pages 137–148. IEEE.

Van Steen, M. (2002). Distributed systems principles and paradigms. Network, 2(28): 1.

Zhai, S., Li, X., and Ge, N. (2023). HOME: Heard-Of based formal modeling and verification environment for consensus protocols. In 2023 IEEE/ACM 45th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), pages 16–20. IEEE.
Published
2025-05-19
PAZIN, Yuri de Souza; DOTTI, Fernando Luis. A TLA+ framework for the specification and verification of distributed algorithms using the Heard-Of model.. In: FAULT TOLERANCE WORKSHOP (WTF), 25. , 2025, Natal/RN. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025 . p. 126-139. ISSN 2595-2684. DOI: https://doi.org/10.5753/wtf.2025.9510.