Using Petri-Net Modelling to Support the Case for HW-Assisted Task Scheduling
Given the pervasiveness of multi-core processors in systems from various domains, the need for efficient parallelization tools has only grown over the last decade. Among the paradigms built to meet this demand, Task Parallelism stands out as a highly productive tool for leveraging data parallelism with minimal code changes. Nonetheless, the runtime systems that currently support it cannot efficiently execute workloads made of fine-grained tasks in the 1-100 µs range, which limits its applicability. By performing a thorough Petri-Net-based analysis of task-parallel systems with several degrees of hardware (HW) assistance, we show that native CPU support for Task Parallelism is the key to efficiently serving these challenging workloads.
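To make the modelling idea concrete, the following is a minimal sketch of a place/transition Petri net for task dispatch. It is not the model used in this work. The class `PetriNet`, the functions `build_sw_runtime_net` and `run_to_completion`, and all place and transition names are hypothetical. The model assumes that a software runtime serializes dispatch through a `runtime` place holding `runtime_tokens` tokens (one token standing in for a single scheduler lock). Timing is omitted: this is an untimed token game that only checks reachability and two place invariants. A performance analysis would instead attach delays to transitions, for example in a generalised stochastic Petri net.

```python
from typing import Dict, List, Tuple

Marking = Dict[str, int]


class PetriNet:
    """Minimal place/transition net with weighted input and output arcs."""

    def __init__(self, marking: Marking) -> None:
        self.marking: Marking = dict(marking)
        self.transitions: Dict[str, Tuple[Marking, Marking]] = {}

    def add_transition(self, name: str, inputs: Marking, outputs: Marking) -> None:
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name: str) -> bool:
        # A transition is enabled when every input place holds enough tokens.
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= w for p, w in inputs.items())

    def fire(self, name: str) -> None:
        # Firing consumes tokens from input places and produces tokens in output places.
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p, w in inputs.items():
            self.marking[p] -= w
        for p, w in outputs.items():
            self.marking[p] = self.marking.get(p, 0) + w


def build_sw_runtime_net(tasks: int, cores: int, runtime_tokens: int = 1) -> PetriNet:
    """Hypothetical model: the 'runtime' place bounds how many tasks can be
    dispatched concurrently (1 token = one software scheduler lock)."""
    net = PetriNet({"ready": tasks, "idle_core": cores, "runtime": runtime_tokens,
                    "dispatching": 0, "running": 0, "done": 0})
    # A ready task, an idle core and the runtime must all be available to start dispatch.
    net.add_transition("acquire", {"ready": 1, "idle_core": 1, "runtime": 1},
                       {"dispatching": 1})
    # Dispatch completes: the task starts running and the runtime is released.
    net.add_transition("dispatch", {"dispatching": 1}, {"running": 1, "runtime": 1})
    # The task finishes and its core becomes idle again.
    net.add_transition("finish", {"running": 1}, {"done": 1, "idle_core": 1})
    return net


def run_to_completion(net: PetriNet, order: List[str],
                      cores: int, runtime_tokens: int = 1) -> List[str]:
    """Fire the first enabled transition in `order` until none is enabled,
    checking two place invariants after every firing. Returns the firing trace."""
    trace: List[str] = []
    while True:
        nxt = next((t for t in order if net.enabled(t)), None)
        if nxt is None:
            return trace
        net.fire(nxt)
        trace.append(nxt)
        m = net.marking
        # Runtime tokens are either free or held by a task being dispatched.
        if m["runtime"] + m["dispatching"] != runtime_tokens:
            raise RuntimeError("runtime invariant violated")
        # Every core is idle, dispatching a task, or running one.
        if m["idle_core"] + m["dispatching"] + m["running"] != cores:
            raise RuntimeError("core invariant violated")


if __name__ == "__main__":
    net = build_sw_runtime_net(tasks=4, cores=2)
    trace = run_to_completion(net, ["acquire", "dispatch", "finish"], cores=2)
    print(len(trace), net.marking)
```

With 4 tasks and 2 cores, every task fires `acquire`, `dispatch` and `finish` once, so the net stops after 12 firings with all tokens in `done` and both cores idle. Raising `runtime_tokens` is only a rough stand-in for less serialized (for instance, HW-assisted) dispatch. That mapping is our assumption for illustration, not the paper's model.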