BatchQueue: Fast and Memory-Thrifty Core to Core Communication

  • Thomas Preud'homme LIP6 – UPMC/CNRS/INRIA
  • Julien Sopena LIP6 – UPMC/CNRS/INRIA
  • Gael Thomas LIP6 – UPMC/CNRS/INRIA
  • Bertil Folliot LIP6 – UPMC/CNRS/INRIA

Abstract


Sequential applications can exploit multi-core systems through pipeline parallelism to improve their performance. In such parallelism, core-to-core communication overhead is the main limit on speedup. This paper presents BatchQueue, a fast and memory-thrifty core-to-core communication system based on batch processing of whole cache lines. BatchQueue can send a 32-bit word of data in just 12.5 ns on a Xeon X5472 and needs only 2 full cache lines plus 3 byte-sized variables, each on a different cache line for optimal performance, to work. The characteristics of BatchQueue (high throughput and increased latency resulting from its batch processing) make it well suited to highly communicative tasks with no real-time requirements, such as monitoring.
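
The memory footprint described above (2 full cache lines plus 3 byte-sized variables, each on its own cache line) can be illustrated with a minimal C11 sketch. This is not the authors' implementation: the 64-byte line size, the names batch_queue, bq_init, bq_try_push and bq_try_pop, the roles given to the three bytes (a producer index, a consumer index and a shared "batch ready" flag), and the non-blocking try-style interface are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed: 64-byte cache lines, so one line holds 16 32-bit words. */
enum { CACHE_LINE = 64, WORDS_PER_LINE = CACHE_LINE / sizeof(uint32_t) };

/* Hypothetical layout: two data lines plus three one-byte variables,
 * each byte padded onto its own cache line to avoid false sharing. */
struct batch_queue {
    alignas(CACHE_LINE) uint32_t buf[2][WORDS_PER_LINE]; /* 2 full lines */
    alignas(CACHE_LINE) uint8_t prod_idx;  /* touched by the producer only */
    alignas(CACHE_LINE) uint8_t cons_idx;  /* touched by the consumer only */
    alignas(CACHE_LINE) atomic_uchar full; /* shared: 1 = a batch is published */
};

static_assert(sizeof(struct batch_queue) == 5 * CACHE_LINE,
              "expected 2 data lines + 3 padded bytes");

void bq_init(struct batch_queue *q) {
    memset(q->buf, 0, sizeof q->buf);
    q->prod_idx = 0;
    q->cons_idx = 0;
    atomic_init(&q->full, 0);
}

/* Producer side. Words accumulate in the current line; the shared flag
 * is touched only when the last slot of a line is written, which is
 * what batches the synchronization. Returns false if the batch cannot
 * be published yet because the consumer has not drained the other line. */
bool bq_try_push(struct batch_queue *q, uint32_t word) {
    unsigned half = q->prod_idx / WORDS_PER_LINE;
    unsigned slot = q->prod_idx % WORDS_PER_LINE;
    bool last = (slot == WORDS_PER_LINE - 1);

    if (last && atomic_load_explicit(&q->full, memory_order_acquire))
        return false;
    q->buf[half][slot] = word;
    if (last)  /* publish the whole line at once */
        atomic_store_explicit(&q->full, 1, memory_order_release);
    q->prod_idx = (uint8_t)((q->prod_idx + 1) % (2 * WORDS_PER_LINE));
    return true;
}

/* Consumer side. The shared flag is checked only at the start of a
 * line and cleared only after its last word has been read. */
bool bq_try_pop(struct batch_queue *q, uint32_t *out) {
    unsigned half = q->cons_idx / WORDS_PER_LINE;
    unsigned slot = q->cons_idx % WORDS_PER_LINE;

    if (slot == 0 && !atomic_load_explicit(&q->full, memory_order_acquire))
        return false;  /* no published batch yet */
    *out = q->buf[half][slot];
    if (slot == WORDS_PER_LINE - 1)  /* line drained: hand it back */
        atomic_store_explicit(&q->full, 0, memory_order_release);
    q->cons_idx = (uint8_t)((q->cons_idx + 1) % (2 * WORDS_PER_LINE));
    return true;
}
```

In this sketch the consumer sees nothing until a full line of 16 words has been written, which is the latency cost of batching mentioned in the abstract; in exchange, the two cores exchange the shared flag once per line rather than once per word.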
Keywords: Multicore processing, Pipelines, Parallel processing, Synchronization, Monitoring, Hardware, Indexes
Published
2010-10-27
PREUD'HOMME, Thomas; SOPENA, Julien; THOMAS, Gael; FOLLIOT, Bertil. BatchQueue: Fast and Memory-Thrifty Core to Core Communication. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 22., 2010, Petrópolis/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2010. p. 215-222.