A Stall Metric to Track Communication Performance

  • Alan Mink National Institute of Standards and Technology
  • Wayne Salamon National Institute of Standards and Technology
  • Michael Indovina National Institute of Standards and Technology

Abstract


Probing the communication protocol stack in Linux PC-based clusters to investigate erratic TCP/IP performance has led to a new metric, data stream stall, which is analogous to instruction stream stall in CPUs. Data stream stalling correlates well with unexpected throughput performance dips; the dips are usually due to delayed ACKs or questionable handling of them. We illustrate the use of this data stream stall metric by isolating and correcting the cause of these communication throughput dips in our version of Linux (2.0.29). The availability of this data stream stall metric would provide useful feedback to users by indicating deficient communications performance.

Keywords: ATM, Communication Protocols, Fast Ethernet, Linux, Performance Measurement, TCP/IP

Published
29/09/1999
MINK, Alan; SALAMON, Wayne; INDOVINA, Michael. A Stall Metric to Track Communication Performance. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 11., 1999, Natal. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 1999. p. 193-196. DOI: https://doi.org/10.5753/sbac-pad.1999.19789.