Improving the Quality of Service of Fault Detection in Distributed Platforms under Adverse Network Conditions

  • Fernando Tarlá Cardoso Lemos, Universidade de São Paulo
  • Liria Matsumoto Sato, Universidade de São Paulo

Abstract


Fault detection is core functionality required by most fault tolerance strategies, but it often depends on reliable communication between computing nodes exchanging monitoring information. We present techniques to improve the robustness of fault detectors for distributed platforms in situations where network connectivity is affected by packet loss and delays. Similar network conditions can be found in computing grids connecting geographically distant resources. We present results from experimental tests conducted in a simulated environment. The results show significant improvement over traditional approaches.
Keywords: Detectors, Heart beat, Monitoring, Computational modeling, Payloads, Software, Biomedical monitoring
Published
2012-10-17
LEMOS, Fernando Tarlá Cardoso; SATO, Liria Matsumoto. Improving the Quality of Service of Fault Detection in Distributed Platforms under Adverse Network Conditions. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 13., 2012, Petrópolis. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2012. p. 171-178.