The Performance of Cache Coherency in SCI-based Multiprocessors
Abstract
The Scalable Coherent Interface (SCI) is an IEEE/ANSI standard that defines a hardware platform for scalable shared-memory multiprocessors. This paper presents a quantitative performance evaluation of SCI-connected multiprocessors that assesses both the communication and the cache coherence subsystems. 1-D, 2-D and 3-D tori with 16 and 64 nodes are investigated. For the simulated architecture (100 MHz SPARC processors with two levels of cache) and workload, the raw network bandwidth seen by each processing element was found to be under 100 Mbytes/s. The 3-D torus is 10-15% faster than the 2-D torus for programs that generate high levels of network traffic; for other programs, the performance differences between 2-D and 3-D tori are negligible.
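One static factor behind the difference between tori of different dimensionality is path length. The following is a minimal, hypothetical sketch (not the simulation model used in this paper) that computes the mean number of link hops between distinct nodes of a torus. It assumes that every dimension is a unidirectional ring, as SCI links are, and that a packet moves forward along each dimension's ring in turn. The function name `mean_hops` and the example dimension shapes are illustrative assumptions; the sketch ignores contention, buffering and coherence traffic entirely.

```python
from itertools import product


def mean_hops(dims):
    """Mean hop count between distinct nodes of a torus built from
    unidirectional rings (hypothetical model, not the paper's simulator).

    dims: ring sizes per dimension, e.g. (8, 8) for an 8x8 2-D torus.
    """
    # The torus is vertex-symmetric, so distances from node (0, ..., 0)
    # are representative of every source node.
    total_hops = 0
    destinations = 0
    for node in product(*(range(k) for k in dims)):
        if not any(node):
            continue  # skip the source node itself
        # On a unidirectional ring the forward distance from position 0
        # to position c is c hops, so the path length is the coordinate sum.
        total_hops += sum(node)
        destinations += 1
    return total_hops / destinations


if __name__ == "__main__":
    for shape in [(64,), (8, 8), (4, 4, 4)]:
        print(shape, round(mean_hops(shape), 2))
```

Under these assumptions, a 64-node system gives 32.0 hops as a single ring, about 7.11 hops as an 8x8 torus and about 4.57 hops as a 4x4x4 torus. Shorter paths alone do not determine performance, which is consistent with the negligible 2-D versus 3-D differences for programs that generate little network traffic.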