GASNet-EX 2018 Performance Examples
The following graphs show the performance of GASNet-EX release 2018.9.0, measured in late 2018.
Many of these graphs are reprinted (with permission) from the following
publication, which also contains further discussion of the study and results:
Bonachea D, Hargrove P.
GASNet-EX: A High-Performance, Portable Communication Library for Exascale,
Proceedings of Languages and Compilers for Parallel Computing (LCPC'18). Oct 2018.
doi:10.25344/S4QP4W.
Test Methodology:
All tests use two physical nodes, with a single core injecting communication
operations targeting the remote node while all other cores remain idle.
Hardware configuration details are provided in each section.
- GASNet-EX RMA (Put and Get) results report the output of
testlarge and testsmall, provided as part of the 2018.9.0
source distribution (a minimal sketch of their measurement pattern appears after this list).
- MPI RMA (Put and Get) results report the output of the
Unidir_put and Unidir_get tests from the IMB-RMA portion of the
Intel MPI Benchmarks, v2018.1.
These tests measure the performance of MPI_Put() and
MPI_Get() in a passive-target access epoch synchronized with
MPI_Win_flush() (see the MPI RMA sketch after this list).
- MPI message-passing (Isend/Irecv) results report the output of the
Uniband test from the IMB-MPI1 portion of the
Intel MPI Benchmarks, v2018.1 (see the Isend/Irecv sketch after this list).
- Flood Bandwidth Graphs show uni-directional non-blocking flood
bandwidth, and compare GASNet-EX testlarge with the "MODE: AGGREGATE"
bandwidth reports of the Unidir_put and Unidir_get tests
and the bandwidth report of the Uniband test.
All bandwidth is reported here in units of binary gigabytes per second (GiB/s),
where 1 GiB = 2^30 bytes. Note that the GASNet-EX tests report binary megabytes
(1 MB = 2^20 bytes) while the IMB tests report decimal megabytes
(1 MB = 10^6 bytes); the conversion helpers after this list make the
normalization explicit.
Command lines used:
- [mpirun -np 2] testlarge -m -in [iters] 4194304 B
- [mpirun -np 2] IMB-RMA -time 600 -iter_policy off -iter [iters] -msglog 4:22 Unidir_put
- [mpirun -np 2] IMB-RMA -time 600 -iter_policy off -iter [iters] -msglog 4:22 Unidir_get
- [mpirun -np 2] IMB-MPI1 -time 600 -iter_policy off -iter [iters] -msglog 4:22 Uniband
- Latency Graphs show uni-directional blocking operation latency, and
compare GASNet-EX testsmall with the "MODE: NON-AGGREGATE" latency
reports of the Unidir_put and Unidir_get tests (the non-aggregate portion
of the MPI RMA sketch after this list corresponds to this measurement).
Latency is reported as total operation completion time (i.e. a
wire-level round-trip) in microseconds (μs).
Command lines used:
- [mpirun -np 2] testsmall -m -in [iters] 4096 A
- [mpirun -np 2] IMB-RMA -time 600 -iter_policy off -iter [iters] -msglog 2:12 Unidir_put
- [mpirun -np 2] IMB-RMA -time 600 -iter_policy off -iter [iters] -msglog 2:12 Unidir_get
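For illustration only, here is a minimal C sketch (not the actual testlarge/testsmall
source) of the two GASNet-EX measurement patterns used above: a non-blocking put flood
and a blocking put latency loop. It assumes the 2018.9.0-era RMA calls
(gex_RMA_PutNBI, gex_NBI_Wait, gex_RMA_PutBlocking); client startup (gex_Client_Init,
gex_Segment_Attach), warm-up iterations, and the out-of-band exchange that produces
peer_addr are elided.

  #include <gasnetex.h>
  #include <stddef.h>
  #include <sys/time.h>

  static double now_us(void) {
    struct timeval tv; gettimeofday(&tv, NULL);
    return tv.tv_sec * 1e6 + tv.tv_usec;
  }

  /* Flood bandwidth: issue 'iters' non-blocking puts, then wait once for
     remote completion of all of them. peer_addr must point into the peer's
     attached segment (address exchange elided, as noted above). */
  static double put_flood_GiBps(gex_TM_t tm, gex_Rank_t peer, void *peer_addr,
                                void *src, size_t sz, int iters) {
    double t0 = now_us();
    for (int i = 0; i < iters; i++)
      gex_RMA_PutNBI(tm, peer, peer_addr, src, sz, GEX_EVENT_DEFER, 0);
    gex_NBI_Wait(GEX_EC_PUT, 0);          /* all outstanding NBI puts done */
    double secs = (now_us() - t0) * 1e-6;
    return (double)sz * iters / secs / 1073741824.0;  /* bytes/s -> GiB/s */
  }

  /* Latency: each blocking put returns only after remote completion, i.e.
     a wire-level round-trip; report the mean time per operation in us. */
  static double put_latency_us(gex_TM_t tm, gex_Rank_t peer, void *peer_addr,
                               void *src, size_t sz, int iters) {
    double t0 = now_us();
    for (int i = 0; i < iters; i++)
      gex_RMA_PutBlocking(tm, peer, peer_addr, src, sz, 0);
    return (now_us() - t0) / iters;
  }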
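The MPI-3 RMA counterpart can be sketched as follows (an illustration in the spirit
of IMB-RMA Unidir_put, not its source). Rank 0 opens a passive-target epoch on rank 1's
window and measures both modes: the AGGREGATE pattern (a flood of MPI_Put calls
completed by one MPI_Win_flush) and the NON-AGGREGATE pattern (a flush after every put).
The size and iteration constants are placeholders; run with exactly two ranks.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    const int sz = 4194304, iters = 1000;        /* placeholders */
    char *buf; MPI_Win win;
    MPI_Win_allocate(sz, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &buf, &win);
    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0) {
      MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);  /* passive-target epoch */

      /* AGGREGATE: flood of puts, one flush at the end (bandwidth) */
      double t0 = MPI_Wtime();
      for (int i = 0; i < iters; i++)
        MPI_Put(buf, sz, MPI_BYTE, 1, 0, sz, MPI_BYTE, win);
      MPI_Win_flush(1, win);                     /* remote completion */
      printf("flood bandwidth: %g GiB/s\n",
             (double)sz * iters / (MPI_Wtime() - t0) / 1073741824.0);

      /* NON-AGGREGATE: flush after every put (per-operation latency) */
      t0 = MPI_Wtime();
      for (int i = 0; i < iters; i++) {
        MPI_Put(buf, 8, MPI_BYTE, 1, 0, 8, MPI_BYTE, win);
        MPI_Win_flush(1, win);
      }
      printf("8-byte put latency: %g us\n", (MPI_Wtime() - t0) * 1e6 / iters);

      MPI_Win_unlock(1, win);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
  }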
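Similarly, a sketch in the spirit of the IMB-MPI1 Uniband test: rank 0 keeps a window
of non-blocking sends in flight to rank 1, which posts matching receives. WINDOW and
reps are illustrative placeholders; the real benchmark also has the receiver
acknowledge each repetition so the sender's timer covers actual delivery, which is
omitted here for brevity. Run with exactly two ranks.

  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  #define WINDOW 64   /* sends kept in flight per repetition (placeholder) */

  int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    const int sz = 4194304, reps = 100;          /* placeholders */
    char *buf = malloc(sz);
    MPI_Request req[WINDOW];
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int r = 0; r < reps; r++) {
      for (int i = 0; i < WINDOW; i++) {
        if (rank == 0) MPI_Isend(buf, sz, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &req[i]);
        else           MPI_Irecv(buf, sz, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &req[i]);
      }
      MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
    }
    if (rank == 0)
      printf("uni-directional bandwidth: %g GiB/s\n",
             (double)sz * WINDOW * reps / (MPI_Wtime() - t0) / 1073741824.0);
    free(buf);
    MPI_Finalize();
    return 0;
  }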
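Finally, the unit normalization mentioned under the flood-bandwidth methodology, made
explicit (the helper names are ours). For example, an IMB report of 10000 MB/s
corresponds to 10000 x 10^6 / 2^30 ≈ 9.31 GiB/s.

  /* Normalize the two suites' differing megabyte conventions to GiB/s. */
  double GiBps_from_gasnet_MBps(double mbps) { return mbps / 1024.0; }           /* 2^20 / 2^30 */
  double GiBps_from_imb_MBps(double mbps) { return mbps * 1e6 / 1073741824.0; }  /* 10^6 / 2^30 */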
Jump to:
aries-conduit vs Cray MPI: on 'Cori (Phase-I)' at NERSC
Cori-I:
Cray XC40, Cray Aries Interconnect,
Node config: 2 x 16-core 2.3 GHz Intel "Haswell", PE 6.0.4, Intel C 18.0.1.163, Cray MPICH 7.7.0
aries-conduit vs Cray MPI: on 'Cori (Phase-II)' at NERSC
Cori-II:
Cray XC40, Cray Aries Interconnect,
Node config: 68-core 1.4 GHz Intel Phi "Knights Landing", PE 6.0.4, Intel C 18.0.1.163, Cray MPICH 7.7.0
ibv-conduit vs IBM Spectrum MPI: on 'Summit' at OLCF
Summit:
Mellanox EDR InfiniBand,
Node config: 2 x IBM POWER9, Red Hat Linux 7.5, GNU C 6.4.0, IBM Spectrum MPI 10.2.0.7-20180830
These are results for a single InfiniBand HCA.
We gratefully acknowledge the assistance of Geoffroy Vallee of ORNL, who collected the results on Summit.
ibv-conduit vs MVAPICH2: on 'Gomez' at the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory
Gomez:
Mellanox EDR InfiniBand,
Node config: 2 x Intel Xeon E7-8867v3 "Haswell-EX", Red Hat Linux 7.4, GNU C 4.8.5, MVAPICH2 2.3
gemini-conduit vs Cray MPI: on 'Titan' at the Oak Ridge Leadership Computing Facility (OLCF)
Titan:
Cray XK7, Cray Gemini Interconnect, Node config: 16-core 2.2 GHz AMD Opteron 6274 (GPUs not used), PE 5.2.82, PGI C 18.4, Cray MPICH 7.6.3
pami-conduit vs IBM MPI: on 'Cetus' at Argonne Leadership Computing Facility (ALCF)
Cetus:
IBM Blue Gene/Q, 5D Torus Proprietary Interconnect, Node config: 16 x 1.6 GHz PowerPC64 A2 cores, BG/Q driver V1R2M4, GCC 4.4.7, IBM MPI (V1R2M4, MPICH2 1.5 based)
MPI-3 RMA is not supported in the IBM MPI implementation for this system.
This research was funded in part by the Exascale Computing Project
(17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office
of Science and the National Nuclear Security Administration.
This research used resources of the National Energy Research Scientific
Computing Center, a DOE Office of Science User Facility supported by the Office
of Science of the U.S. Department of Energy under Contract No.
DE-AC02-05CH11231.
This research used resources of the Argonne Leadership Computing Facility,
which is a DOE Office of Science User Facility supported under Contract
DE-AC02-06CH11357.
This research used resources of the Oak Ridge Leadership Computing Facility
at the Oak Ridge National Laboratory, which is supported by the Office of
Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.