Adaptive Failure Detection in Logical Time: The Eventual Relative-Speed Model over Causal Blocks

Raimundo José de Araújo Macêdo

doi:10.5753/wtf.2026.23211

Raimundo José de Araújo Macêdo UFBA https://orcid.org/0000-0002-7672-2208

DOI: https://doi.org/10.5753/wtf.2026.23211

Resumo

In this paper, we introduce the Eventual Relative-Speed (ERS) system model, a logical-time abstraction for distributed systems that lack an absolute physical time reference. In the ERS model, the relative progress of processes is governed by a finite parameter Γ, which characterizes the logical-time divergence among correct processes and holds only after an unknown Global Stabilization Time (GST). We propose a unified Causal Progress and Failure Detection algorithm built atop the Causal Blocks framework. By employing an adaptive suspicion threshold η_i, our algorithm ensures causal consistency and satisfies the properties of an Eventually Perfect (⋄P) failure detector. Because it operates entirely in logical time, the system is intrinsically adaptive to fluctuations in relative speed— such as slowing or accelerating—without requiring the dynamic calibration of physical timeouts. We formally prove that the detector converges to Eventual Strong Accuracy as it adapts to transient delays, eventually stabilizing within the Γ bound. This provides a robust foundation for distributed computing in environments subject to arbitrary global load fluctuations.

Referências

Aguilera, M. K., Chen, W., and Toueg, S. (1997). Heartbeat: A timeout-free failure detector for quiescent reliable communication. In Proceedings of the 11th International Workshop on Distributed Algorithms (WDAG ’97), pages 126–140. Springer-Verlag.

Chandra, T. D. and Toueg, S. (1996). Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2):225–267.

Chen, W., Toueg, S., and Aguilera, M. K. (2002). On the quality of service of failure detectors. IEEE Trans. Comput., 51(1):13–32.

de Araújo Macêdo, R. J. (1994). Fault-tolerant group communication protocols for asynchronous systems. Phd thesis, University of Newcastle upon Tyne.

de Araújo Macêdo, R. J. and e Lima, F. R. L. (2004). Improving the quality of service of failure detectors with snmp and artificial neural networks. In Anais do XXII Simpósio Brasileiro de Redes de Computadores (SBRC 2004), Fortaleza, CE, Brazil. SBC.

de Araújo Macêdo, R. J., Ezhilchelvan, P. D., and Shrivastava, S. K. (1995). Flow control schemes for a fault-tolerant multicast protocol. In Proceedings of the 1995 Pacific Rim International Symposium on Fault-Tolerant Systems (PRFTS ’95), pages 15–21, Newport Beach, California, USA. IEEE Computer Society.

de Araújo Macêdo, R. J. (2000). Failure detection in asynchronous distributed systems. In Anais do II Workshop de Testes e Tolerância a Falhas, pages 76–81. SBC.

de Araújo Macêdo, R. J. (2025). Synchronous partitioning and its effects on fault tolerance. In Anais do XXVI Workshop de Testes e Tolerância a Falhas, pages 43–56, Porto Alegre, RS, Brasil. SBC.

de Araújo Macêdo, R. J. and Freitas, A. E. (2009). A generic group communication approach for hybrid distributed systems. In 9th IFIP International Conference on Distributed Applications and Interoperable Systems (LNCS, Springer).

Dwork, C., Lynch, N., and Stockmeyer, L. (1988). Consensus in the presence of partial synchrony. Journal of the ACM, 35(2):288–323.

Ezhilchelvan, P. D., Macêdo, R. A., and Shrivastava, S. K. (1995). Newtop: a fault-tolerant group communication protocol. In Proceedings of the 15th International Conference on Distributed Computing Systems, ICDCS ’95, USA. IEEE Computer Society.

Falai, L. and Bondavalli, A. (2005). Experimental evaluation of the qos of failure detectors on wide area network. In 2005 International Conference on Dependable Systems and Networks (DSN’05), pages 624–633.

Fischer, M. J., Lynch, N. A., and Paterson, M. S. (1985). Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374–382.

Freiling, F. C., Guerraoui, R., and Kuznetsov, P. (2011). The failure detector abstraction. ACM Comput. Surv., 43(2).

Kenwright, L., Roop, P., Allen, N., Cascaval, C., and Malik, A. (2025). Timetide: A programming model for logically synchronous distributed systems. ACM Trans. Embed. Comput. Syst., 24(5s).

Lamport, L. (1978). Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558–565.

Lamport, L., Shostak, R., and Pease, M. (1982). The byzantine generals problem. ACM Transactions on Programming Languages and Systems, 4(3):382–401.

Macêdo, R. A., Ezhilchelvan, P., and Shrivastava, S. K. (1993). Modeling group communication using causal blocks. In Proceedings of the 5th European Workshop on Dependable Computing, Lisbon, Portugal.

Raynal, M. (2005). A short introduction to failure detectors for asynchronous distributed systems. SIGACT News, 36(1):53–70.

Robinson, P. and Schmid, U. (2011). The asynchronous bounded-cycle model. Theoretical Computer Science, 412(40):5580–5601.

Sá, A. S. and de Araújo Macêdo, R. J. (2010). Qos self-configuring failure detectors for distributed systems. In Distributed Applications and Interoperable Systems (DAIS 2010), volume 6115 of LNCC, pages 126–140, Amsterdam, Netherlands. Springer.

Schneider, F. B. (1990). Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys, 22(4):299–319.

Spirakis, P. and Tampakas, B. (1988). Efficient distributed algorithms by using the archimedean time assumption. In Cori, R. and Wirsing, M., editors, STACS 88, pages 248–263, Berlin, Heidelberg. Springer Berlin Heidelberg.

Veríssimo, P. and Casimiro, A. (2002). The timely computing base model and architecture. IEEE Trans. Comput., 51(8):916–930.

Widder, J. and Schmid, U. (2009). The Theta-model: achieving synchrony without clocks. Distributed Computing, 22(1):29–47.

Zhou, Y., Peng, S., Lyu, H., Tong, F., Huang, C., and Niu, J. (2025). Klotski: Towards consensus enabled collaborative vehicles in intelligent transportation. IEEE Transactions on Vehicular Technology, 74(12):18620–18634.