Uma Estratégia de Testes Logarítmica para o Algoritmo Hi-ADSD

  • Vinicius K. Ruoso UFPR
  • Luis Carlos E. Bona UFPR
  • Elias P. Duarte Jr. UFPR

Abstract


The Hierarchical Adaptive Distributed System-level Diagnosis (Hi- ADSD) is a distributed diagnosis algorithm that creates a virtual topology based on a hypercube. A hypercube is a scalable structure by definition, presenting important topological features like: symmetry, logarithmic diameter and good fault tolerance properties. The latency of the algorithm executed on a system with N nodes is at most log2^2 N However, the number of executed tests in the worst case is quadratic. This work presents a new testing strategy for the Hi-ADSD algorithm, which guarantees that at most N logN tests are executed at each logN testing rounds. The maximum latency is mantained in log2^2 N testing rounds. The algorithm is adapted to the new testing strategy. Furthermore, the use of timestamps is adopted to allow each node to retrieve diagnosis information from several other nodes, thus reducing the average latency. The new algorithm is specified, formal proofs are given, and experimental results obtained by simulations are presented and compared with the Hi-ADSD algorithm.

References

Bona, L., Fonseca, K., and Duarte Jr., E. P. (2008). Hyperbone: A scalable overlay network based on a virtual hypercube. In Network Operations and Management Symposium, 2008. NOMS 2008. IEEE, pages 1025–1030.

Duarte Jr., E., Ziwich, R., and Albini, L. (2011). A survey of comparison-based system-level diagnosis. ACM Computing Surveys.

Duarte Jr., E. P., Albini, L., Brawerman, A., and Guedes, A. (2009). A hierarquical distributed fault diagnosis algorithm based on clusters with detours. In Network Operations and Management Symposium, 2009. LANOMS 2009. Latin American, pages 1–6.

Duarte Jr., E. P. and Nanya, T. (1998). A Hierarquical Adaptive Distributed System-Level Diagnosis Algotithm. IEEE Transactions on Computers., 47(1):34–45.

Hakimi, S. L. and Nakajima, K. (1984). On Adaptive System Diagnosis. IEEE Transactions on Computers., C-33(3):234–240.

Hosseini, S. H., Kuhl, J. G., and Reddy, S. M. (1984). A Diagnosis Algorithm for Distributed Computing Systems with Dynamic Failure and Repair. IEEE Transactions on Computers., C-33(3):223–233.

LaForge, L., Korver, K., and Fadali, M. (2003). What designers of bus and network architectures should know about hypercubes. Computers, IEEE Transactions on, 52(4):525–544.

MacDougall, M. (1987). Simulating computer systems: techniques and tools. MIT Press, Cambridge, MA, USA.

Masson, G., Blough, D., and Sullivan, G. (1996). Fault-tolerant computer system design. chapter System diagnosis, pages 478–536. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

Preparata, F., Metze, G., and Chien, R. T. (1967). On the Connection Assignment Problem of Diagnosable Systems. IEEE Transactions on Computers., 16:848–854.
Published
2014-05-05
RUOSO, Vinicius K.; BONA, Luis Carlos E.; DUARTE JR., Elias P.. Uma Estratégia de Testes Logarítmica para o Algoritmo Hi-ADSD. In: FAULT TOLERANCE WORKSHOP (WTF), 15. , 2014, Florianópolis/SC. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2014 . p. 31-44. ISSN 2595-2684. DOI: https://doi.org/10.5753/wtf.2014.22945.