A New Architecture for Implementing a Failure Detection Service on the Internet
Abstract
This work describes a novel SNMP-based architecture for deploying a failure detection service on the Internet. The architecture is built around the fdMIB, through which the state of processes and hosts is monitored using heartbeat messages. The fdMIB encompasses all the tasks required for monitoring a given LAN, without requiring any additional components. Monitors on different LANs communicate across the Internet using Web Services. A prototype was implemented and evaluated in terms of quality of service, considering failure detection time, communication cost, and CPU usage.
References
Aguilera, M. K., Chen, W., and Toueg, S. (1997). Heartbeat: A timeout-free failure detector for quiescent reliable communication. In WDAG.
Bertier, M., Marin, O., and Sens, P. (2003). Performance analysis of a hierarchical failure detector. In DSN.
Borran, F., Hutle, M., Santos, N., and Schiper, A. (2012). Quantitative analysis of consensus algorithms. IEEE Trans. Dependable Sec. Comput., 9(2).
Chandra, T. D. and Toueg, S. (1996). Unreliable failure detectors for reliable distributed systems. J. ACM, 43(2).
Charron-Bost, B., Pedone, F., and Schiper, A. (2010). Replication: Theory and Practice. Springer.
Chen, W., Toueg, S., and Aguilera, M. K. (2000). On the quality of service of failure detectors. In Proceedings of the 2000 International Conference on Dependable Systems and Networks (DSN). IEEE Computer Society.
de Lima, W. Q., Alves, R. S., Vianna, R. L., Almeida, M. J. B., Tarouco, L. M. R., and Granville, L. Z. (2006). Evaluating the performance of SNMP and web services notifications. In NOMS.
Dialani, V., Miles, S., Moreau, L., Roure, D. D., and Luck, M. (2002). Transparent fault tolerance for web services based architectures. In Euro-Par. Springer.
dos Santos Sá, A. and de Araújo Macêdo, R. J. (2005). An adaptive failure detection approach for real-time distributed control systems over shared Ethernet. In COBEM2005.
Felber, P., Défago, X., Guerraoui, R., and Oser, P. (1999). Failure detectors as first class objects. In DOA.
Fischer, M. J., Lynch, N. A., and Paterson, M. S. (1985). Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2).
Jacobson, V. (1988). Congestion avoidance and control. In Symposium Proceedings on Communications Architectures and Protocols, SIGCOMM '88.
Moraes, D. M. and Duarte Jr., E. P. (2011). A failure detection service for Internet-based multi-AS distributed systems. In ICPADS. IEEE.
Net-SNMP (2014). Net-SNMP: http://www.net-snmp.org/. Accessed 18/02/2014.
Nunes, R. C. and Jansch-Pôrto, I. (2004). QoS of timeout-based self-tuned failure detectors: The effects of the communication delay predictor and the safety margin. In DSN.
Renesse, R. V., Minsky, Y., and Hayden, M. (1998). A gossip-style failure detection service. Technical report, Cornell University, Ithaca, NY, USA.
Sergent, N., Défago, X., and Schiper, A. (2001). Impact of a failure detection mechanism on the performance of consensus. In Proc. IEEE Pacific Rim Symp. on Dependable Computing (PRDC), Seoul, Korea.
Wiesmann, M., Urbán, P., and Défago, X. (2006). An SNMP based failure detection service. In SRDS. IEEE Computer Society.
Published
2014-07-28
How to Cite
TURCHETTI, Rogério; DUARTE JR., Elias. Uma Nova Arquitetura para a Implementação de um Serviço de Detecção de Falhas na Internet. In: PRE-IETF WORKSHOP (WPIETF), 1., 2014, Brasília. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2014. p. 17-31. ISSN 2595-6388.
