Matrix calculations with SIMD floating point instructions on x86 processors

  • André Muezerie USP
  • Raul J. Nakashima USP
  • Gonzalo Travieso USP
  • Jan Slaets USP

Abstract

This paper describes and evaluates the use of SIMD floating-point instructions for scientific calculations. The performance of these instructions is compared with that of ordinary floating-point code, using matrix multiplication as the benchmark. Implementation concerns and the effects of loop unrolling and of matrix size are analyzed. Because the SIMD floating-point implementations of different manufacturers are mutually incompatible, two instruction sets are used: 3DNow! on the AMD K6 processor and the Streaming SIMD Extensions (SSE) on the Intel Pentium III processor.
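
As an illustration of the kind of comparison the abstract describes, the following minimal sketch contrasts an ordinary scalar matrix multiplication with an SSE variant that is unrolled by two in the inner loop. It is not the authors' code: the function names `matmul_scalar` and `matmul_sse`, the row-major layout, the use of compiler intrinsics from `<xmmintrin.h>`, and the restriction to matrix sizes that are multiples of four are all assumptions made for this example. 3DNow! is not shown.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics (assumed toolchain support) */
#endif

/* Scalar reference: C = A * B for n x n row-major float matrices. */
void matmul_scalar(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* SSE sketch: computes four adjacent elements of one row of C at a time.
 * The k loop is unrolled by two, with one accumulator per unrolled step.
 * Assumption for this example: n is a multiple of 4. */
void matmul_sse(const float *a, const float *b, float *c, size_t n)
{
    assert(n % 4 == 0);
#if defined(__SSE__)
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j += 4) {
            __m128 acc0 = _mm_setzero_ps();
            __m128 acc1 = _mm_setzero_ps();
            for (size_t k = 0; k < n; k += 2) {
                /* Broadcast A[i][k] and A[i][k+1] to all four lanes. */
                __m128 a0 = _mm_set1_ps(a[i * n + k]);
                __m128 a1 = _mm_set1_ps(a[i * n + k + 1]);
                /* Load B[k][j..j+3] and B[k+1][j..j+3] (unaligned loads). */
                __m128 b0 = _mm_loadu_ps(&b[k * n + j]);
                __m128 b1 = _mm_loadu_ps(&b[(k + 1) * n + j]);
                acc0 = _mm_add_ps(acc0, _mm_mul_ps(a0, b0));
                acc1 = _mm_add_ps(acc1, _mm_mul_ps(a1, b1));
            }
            /* Combine the two partial sums and store C[i][j..j+3]. */
            _mm_storeu_ps(&c[i * n + j], _mm_add_ps(acc0, acc1));
        }
    }
#else
    /* Fallback when SSE is unavailable at compile time. */
    matmul_scalar(a, b, c, n);
#endif
}
```

Both functions take pointers to `n * n` floats. Because the SSE variant accumulates the even and odd `k` terms separately, its results can differ from the scalar version in the last bits for general inputs, so a comparison of the two should normally use a tolerance.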

Keywords: SIMD, 3DNow!, SSE, vector operations, performance evaluation

Published
10/09/2001
MUEZERIE, André; NAKASHIMA, Raul J.; TRAVIESO, Gonzalo; SLAETS, Jan. Matrix calculations with SIMD floating point instructions on x86 processors. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 13., 2001, Pirenópolis. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2001. p. 50-55. DOI: https://doi.org/10.5753/sbac-pad.2001.22192.