Development of a Failure Detector for Distributed Systems Based on Artificial Neural Networks
Abstract
Failure detectors are mechanisms used to implement fault-tolerant distributed systems, i.e., systems that can provide continuous service even when failures occur. In so-called asynchronous distributed systems, where message transfer and local processing have no known, bounded delay, perfect (or reliable) failure detectors cannot be implemented. In such systems, failures can only be suspected. To avoid false suspicions, adaptive timeouts can be calculated based on, for instance, the load of the communication network. In [2] we discussed the use of such adaptive timeouts to implement a mechanism called CTI (Connectivity Time Indicator), which in turn was used to implement a failure detector of class ◇S [1]. In another paper [11], we showed how the CTI could be used to control the quality-of-service (QoS) levels of communication time between distributed processes at the application level. In this paper we present an implementation of the CTI mechanism using an artificial neural network that interacts with SNMP (Simple Network Management Protocol) [5] agents and the MIB (Management Information Base) [6] to predict communication times based on the dynamic operating conditions of an IP (Internet Protocol) network.
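The idea of suspecting a process once an adaptive timeout expires can be illustrated with a minimal Python sketch. It is not the CTI implementation described in this paper: the class `AdaptiveTimeoutDetector`, the `predict_delay` hook, and the `safety_factor` margin are hypothetical names introduced only for illustration. `predict_delay` stands in for whatever predictor supplies the expected communication time, such as a neural network fed with network-load measurements.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class AdaptiveTimeoutDetector:
    """Heartbeat-based detector that suspects, but never confirms, failures."""

    # Hypothetical hook: returns the predicted communication time in seconds.
    predict_delay: Callable[[], float]
    # Hypothetical safety margin applied to the prediction (an assumption).
    safety_factor: float = 2.0
    # Time of the most recent heartbeat received from each process.
    last_heartbeat: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, process: str, now: float) -> None:
        """Record a heartbeat; a later check may then stop suspecting the process."""
        self.last_heartbeat[process] = now

    def timeout(self) -> float:
        """Recompute the timeout from the current delay prediction."""
        return self.safety_factor * self.predict_delay()

    def suspects(self, process: str, now: float) -> bool:
        """Suspect a process whose last heartbeat is older than the timeout."""
        last = self.last_heartbeat.get(process)
        if last is None:
            return True  # no heartbeat has ever been received
        return (now - last) > self.timeout()
```

With a predicted delay of 0.1 s the timeout is 0.2 s, so a process last heard from 0.3 s ago is suspected. If the predicted delay then rises to 0.5 s because the network is loaded, the timeout grows to 1.0 s and the same process is no longer suspected, which is how an adaptive timeout helps avoid false suspicions.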
References
Macêdo, R. Failure Detection in Asynchronous Distributed Systems. Proc. of the II Workshop on Tests and Fault-Tolerance, pp. 76-81, Curitiba, Brazil, July 2000.
Rietman, E.A. and Frye, R.C. Neural Control of a Nonlinear System with Inherent Time Delays. Conference on Analysis of Neural Network Applications, pp.140-145, 1991.
Chow, M. and Yee, S.O. Real Time Application of Artificial Neural Networks for Incipient Fault Detection of Induction Machines. Proceedings of the 3rd International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE 90), pp. 1030-36, Charleston, USA, July 1990.
Case, J., Fedor, M., Schoffstall, M. and Davin, J. A Simple Network Management Protocol (SNMP). RFC 1157. In Connected: An Internet Encyclopedia, available at http://deese.univ-lemans.fr:8003/connected/RFC/1157/index.html
McCloghrie, K. and Rose, M. Management Information Base for Network Management of TCP/IP-based internets: MIB-II. RFC 1213. In Connected: An Internet Encyclopedia, available at http://deese.univ-lemans.fr:8003/connected/RFC/1213/index.html
Russell, S. and Norvig, P. Artificial Intelligence - A Modern Approach. 1st ed. New Jersey, Prentice-Hall, 1995.
Haykin, S. Neural Networks – A Comprehensive Foundation. 1st ed. New York, Macmillan, 1994.
Sotoma, I. and Madeira, E.R.M. DPCP (Discard Past Consider Present) – A Novel Approach to Adaptive Fault Detection in Distributed Systems. 8th IEEE Workshop on Future Trends of Distributed Computing Systems (FTDCS’2001), vol. 1, pp. 76-82, Bologna, Italy, October 2001.
Hood, C.S. and Ji, C. Intelligent Agents for Proactive Fault Detection. IEEE Internet Computing, vol. 2, no. 2, pp. 65-72, March/April 1998.
Batalha, M. and Macêdo, R.J.A. Um Serviço Tolerante a Falhas para o Gerenciamento de Sistemas Distribuídos Sobre CORBA. Proceedings of the Latin-American Conference on Informatics (CLEI’2001), Mérida, Venezuela, September 2001.
Chen, W., Toueg, S. and Aguilera, M.K. On the Quality of Service of Failure Detectors. Proceedings of the International Conference on Dependable Systems and Networks (DSN 2000), New York, June 2000.
Hiramatsu, A. Training Techniques for Neural Network Applications in ATM. IEEE Communications Magazine, pp. 58-67, October 1995.
Shokri, E. and Beltas, P. An Experiment with Adaptive Fault Tolerance in Highly-Constrained Systems. Proceedings of the Fifth International Workshop on Object-Oriented Real-Time Dependable Systems, California, USA, November 1999.
