Integrating Message-Passing with Vector Architectures
Abstract
Vector architectures provide excellent computational throughput while successfully tolerating memory latency by pipelining memory accesses. In this paper, we propose a generalization of vector architectures to message-passing multicomputers that combines the efficiency of vector computation with the scalability of distributed-memory systems. In our proposed architecture, each node is a conventional vector processor (with chaining capability and pipelined functional units) augmented by native instructions to send and receive messages through vector registers. In this scheme, inter-node communication can be performed via vector-send/receive instructions, gaining the benefits of communication pipelining, reduced memory copies (memory-to-register-to-register instead of memory-to-memory-to-cache), and lower communication latency (due to tight processor-communication coupling). We show that this tight integration between functional and communication units can lead to substantial performance improvements over conventional message-passing multicomputers. We model pipelined computation-communication systems both analytically and with a detailed instruction-level simulation, and compare the simulation data with empirical results from an Intel Paragon. Preliminary data from a matrix multiplication example indicate that our proposed vector-parallel architecture offers significant scalability benefits over existing message-passing systems.
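The abstract's claim about reduced memory copies and communication pipelining can be illustrated with a simple analytical cost model. The sketch below is not the paper's model; it is a hypothetical first-order comparison with assumed latency parameters (`startup`, `per_word`, `copy`, `vlen`), contrasting conventional memory-to-memory message passing (which pays buffer-copy costs on each side) with streaming a message through vector registers in register-length chunks.

```python
# Hedged sketch: a first-order cost model (assumed parameters, not taken
# from the paper) comparing conventional message passing with pipelined
# vector-register communication.

def conventional_time(n_words, startup=100.0, per_word=2.0, copy=1.0):
    """Memory-to-memory transfer: one startup latency, a per-word network
    cost, plus an assumed buffer-copy cost on both sender and receiver."""
    return startup + n_words * (per_word + 2 * copy)

def vector_pipelined_time(n_words, vlen=64, startup=100.0, per_word=2.0):
    """Vector-send/receive: the message streams through vector registers
    in chunks of `vlen` words; intermediate memory copies are avoided and
    only one startup latency is paid before the pipeline fills."""
    chunks = -(-n_words // vlen)          # ceiling division
    pipe_fill = vlen * per_word           # streaming the first chunk
    return startup + pipe_fill + (chunks - 1) * vlen * per_word

if __name__ == "__main__":
    for n in (64, 1024, 16384):
        print(f"{n:6d} words: conventional {conventional_time(n):9.1f}"
              f"  vector-pipelined {vector_pipelined_time(n):9.1f}")
```

Under these assumed parameters, the copy-elimination advantage grows linearly with message length, which is consistent with the scalability argument made in the abstract.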
References
DICKENS, P. M., HEIDELBERGER, P., AND NICOL, D. M. Parallel Direct Execution Simulation of Message-Passing Parallel Programs. ICASE/NASA Langley Research Center, June 1994.
EICKEN, T. V., CULLER, D. E., GOLDSTEIN, S. C., AND SCHAUSER, K. E. Active messages: A mechanism for integrated communication and computation. In Proceedings of the 19th International Symposium on Computer Architecture (Gold Coast, Australia, May 1992), pp. 256-266.
FRANKE, H., HOCHSCHILD, P., PATTNAIK, P., PROST, J.-P., AND SNIR, M. MPI on IBM SP1/SP2: Current Status and Future Directions. IBM T. J. Watson Research Center, 1994.
HENNESSY, J. L., AND PATTERSON, D. A. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers Inc., 1990.
HOSTETLER, L. B., AND MIRTICH, B. DLXsim - A Simulator for DLX. University of California, 1990.
LEE, C. G., AND SMITH, J. E. A study of partitioned vector register files. In Proceedings of Supercomputing'92 (Minneapolis, November 1992), pp. 94-103.
MENDES, C. L. Extending DLXsim for parallel architectures. In Proceedings of the 6th Brazilian Symposium on Computer Architecture (Caxambu/ MG, August 1994).
MENDES, C. L., AND REED, D. A. Performance stability and prediction. In Proceedings of the IEEE/USP Workshop on High Performance Computing WHPC'94 (São Paulo, March 1994), pp. 1-15.
MONTRY, G. Panel: Massively parallel vs. parallel vector supercomputers: A user's perspective. In Proceedings of Supercomputing'93 (Portland, November 1993), pp. 918-920.
REINHARDT, S. K., HILL, M. D., LARUS, J. R., LEBECK, A. R., LEWIS, J. C., AND WOOD, D. A. The Wisconsin Wind Tunnel: Virtual prototyping of parallel computers. In Proceedings of the ACM Conference on Measurement & Modeling of Computer Systems - SIGMETRICS'93 (Santa Clara, May 1993), pp. 48-60.
TAKAMURA, M., AND UTSUMI, T. Why vector parallel? In Proceedings of the High Performance Computing Conference'94 (Singapore, September 1994), pp. 394-398.
THINKING MACHINES CORPORATION. CM5 Technical Summary, October 1991.
UTSUMI, T., IKEDA, M., AND TAKAMURA, M. Architecture of the VPP500 parallel supercomputer. In Proceedings of Supercomputing'94 (Washington, November 1994), pp. 478-487.
WEISS, S. Optimizing a superscalar machine to run vector code. IEEE Parallel & Distributed Technology 1, 2 (May 1993), 73-83.