Cluster Monitoring Platform Based On Self Adaptable Probes
Abstract
Distributed systems based on clusters of workstations are now widely adopted in industry and universities, but they are more complex to manage. Some value-added tools add new services to the operating system to exploit the cluster's power efficiently. Such a tool usually needs to monitor a set of resources with a variable granularity, whereas most existing tools use a fixed granularity. In this paper, we propose a generic monitoring platform with variable granularity based on self-adaptable probes. The probes tune their granularity automatically, which avoids the performance cost of fine-grained monitoring, and they provide a reliable mechanism for handling a resource emergency state. Self-adaptable monitoring also reduces the overhead produced by the observation system.
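The idea of a probe that tunes its own granularity can be illustrated with a minimal sketch. This is not the platform's actual design: the class name `AdaptiveProbe`, the halving/doubling policy, and every threshold below are assumptions chosen only for illustration. The probe samples more often when the observed resource changes quickly, backs off when it is stable, and drops to its finest granularity when the resource enters an emergency state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdaptiveProbe:
    """Hypothetical self-adapting probe; names and thresholds are illustrative."""
    min_interval: float = 0.5      # finest granularity, in seconds (assumed)
    max_interval: float = 8.0      # coarsest granularity, in seconds (assumed)
    change_threshold: float = 0.1  # absolute change treated as significant (assumed)
    emergency_level: float = 0.9   # value at or above which the resource is in emergency (assumed)
    interval: float = 2.0          # current sampling period
    last_value: Optional[float] = None
    emergency: bool = False

    def observe(self, value: float) -> float:
        """Record one sample and return the delay before the next sample."""
        self.emergency = value >= self.emergency_level
        if self.emergency:
            # Emergency state: watch the resource as closely as allowed.
            self.interval = self.min_interval
        elif self.last_value is not None:
            if abs(value - self.last_value) >= self.change_threshold:
                # Resource is changing quickly: sample twice as often.
                self.interval = max(self.min_interval, self.interval / 2)
            else:
                # Resource is stable: back off to reduce monitoring overhead.
                self.interval = min(self.max_interval, self.interval * 2)
        self.last_value = value
        return self.interval
```

With these assumed defaults, a stable load such as 0.20, 0.21, 0.22 lengthens the interval from 2 s to 8 s, a jump to 0.60 halves it to 4 s, and a sample of 0.95 sets the emergency flag and forces the 0.5 s minimum. A multiplicative policy was chosen here only because it is simple to reason about; other tuning rules would fit the same interface.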
