partitioned shared memory, Non-Uniform Memory Access (NUMA), heap manager, multithread, manycore
The Distributed Shared Memory (DSM) architecture is widely used in today’s computer design to mitigate the ever-widening processing-memory gap, and it inevitably exhibits Non-Uniform Memory Access (NUMA) to shared-memory parallel applications. Failure to adapt to the NUMA effect can significantly downgrade application performance, especially on today’s manycore platforms with tens to hundreds of cores. However, traditional approaches such as first-touch and memory policy fall short in false page-sharing, fragmentation, or ease of use. In this paper, we propose a partitioned shared-memory approach that allows multithreaded applications to achieve full NUMA-awareness with only minor code changes and develop an accompanying NUMA-aware heap manager which eliminates false page-sharing and minimizes fragmentation. Experiments on a 256-core cc-NUMA computing node show that the proposed approach helps applications to adapt to NUMA with only minor code changes and improves the performance of typical multithreaded scientific applications by up to 4.3 folds with the increased use of cores.
Tsinghua University Press
Zhang Yang, Aiqing Zhang, Zeyao Mo. PsmArena: Partitioned Shared Memory for NUMA-Awareness in Multithreaded Scientific Applications. Tsinghua Science and Technology 2021, 26(3): 287-295.