GASNet-EX 2018 Performance Examples
The following graphs show the performance of GASNet-EX release 2018.9.0, measured in late 2018.
Many of these graphs are reprinted (with permission) from the following
publication, which also contains further discussion of the study and results:
Bonachea D, Hargrove P.
GASNet-EX: A High-Performance, Portable Communication Library for Exascale,
Proceedings of Languages and Compilers for Parallel Computing (LCPC'18). Oct 2018.
doi:10.25344/S4QP4W.
Test Methodology:
All tests use two physical nodes, with a single core injecting communication
operations targeting the remote node while all other cores remain idle.
Hardware configuration details are provided in each section.
- GASNet-EX RMA (Put and Get) results report the output of
testlarge and testsmall, provided as part of the 2018.9.0
source distribution (a minimal sketch of their measurement pattern appears after this list).
- MPI RMA (Put and Get) results report the output of the
Unidir_put and Unidir_get tests from the IMB-RMA portion of the
Intel MPI Benchmarks, v2018.1.
These tests measure the performance of MPI_Put() and
MPI_Get() in a passive-target access epoch synchronized with
MPI_Win_flush() (see the MPI RMA sketch after this list).
- MPI message-passing (Isend/Irecv) results report the output of the
Uniband test from the IMB-MPI1 portion of the
Intel MPI Benchmarks, v2018.1 (see the Isend/Irecv sketch after this list).
- Flood Bandwidth Graphs show uni-directional non-blocking flood
bandwidth, and compare GASNet-EX testlarge with the "MODE: AGGREGATE"
bandwidth reports of the Unidir_put and Unidir_get tests
and the bandwidth report of the Uniband test.
All bandwidth is reported here in units of binary gigabytes per second (GiB/s),
where 1 GiB = 2^30 bytes. Note that the GASNet-EX tests report binary megabytes
(1 MB = 2^20 bytes) while the IMB tests report decimal megabytes
(1 MB = 10^6 bytes); the conversion helpers after this list make the
normalization explicit.
Command lines used:
- [mpirun -np 2] testlarge -m -in [iters] 4194304 B
- [mpirun -np 2] IMB-RMA -time 600 -iter_policy off -iter [iters] -msglog 4:22 Unidir_put
- [mpirun -np 2] IMB-RMA -time 600 -iter_policy off -iter [iters] -msglog 4:22 Unidir_get
- [mpirun -np 2] IMB-MPI1 -time 600 -iter_policy off -iter [iters] -msglog 4:22 Uniband
- Latency Graphs show uni-directional blocking operation latency, and
compare GASNet-EX testsmall with the "MODE: NON-AGGREGATE" latency
reports of the Unidir_put and Unidir_get tests (the non-aggregate portion
of the MPI RMA sketch after this list corresponds to this measurement).
Latency is reported as total operation completion time (i.e. a
wire-level round-trip) in microseconds (μs).
Command lines used:
- [mpirun -np 2] testsmall -m -in [iters] 4096 A
- [mpirun -np 2] IMB-RMA -time 600 -iter_policy off -iter [iters] -msglog 2:12 Unidir_put
- [mpirun -np 2] IMB-RMA -time 600 -iter_policy off -iter [iters] -msglog 2:12 Unidir_get
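For illustration only, here is a minimal C sketch (not the actual testlarge/testsmall
source) of the two GASNet-EX measurement patterns used above: a non-blocking put flood
and a blocking put latency loop. It assumes the 2018.9.0-era RMA calls
(gex_RMA_PutNBI, gex_NBI_Wait, gex_RMA_PutBlocking); client startup (gex_Client_Init,
gex_Segment_Attach), warm-up iterations, and the out-of-band exchange that produces
peer_addr are elided.

  #include <gasnetex.h>
  #include <stddef.h>
  #include <sys/time.h>

  static double now_us(void) {
    struct timeval tv; gettimeofday(&tv, NULL);
    return tv.tv_sec * 1e6 + tv.tv_usec;
  }

  /* Flood bandwidth: issue 'iters' non-blocking puts, then wait once for
     remote completion of all of them. peer_addr must point into the peer's
     attached segment (address exchange elided, as noted above). */
  static double put_flood_GiBps(gex_TM_t tm, gex_Rank_t peer, void *peer_addr,
                                void *src, size_t sz, int iters) {
    double t0 = now_us();
    for (int i = 0; i < iters; i++)
      gex_RMA_PutNBI(tm, peer, peer_addr, src, sz, GEX_EVENT_DEFER, 0);
    gex_NBI_Wait(GEX_EC_PUT, 0);          /* all outstanding NBI puts done */
    double secs = (now_us() - t0) * 1e-6;
    return (double)sz * iters / secs / 1073741824.0;  /* bytes/s -> GiB/s */
  }

  /* Latency: each blocking put returns only after remote completion, i.e.
     a wire-level round-trip; report the mean time per operation in us. */
  static double put_latency_us(gex_TM_t tm, gex_Rank_t peer, void *peer_addr,
                               void *src, size_t sz, int iters) {
    double t0 = now_us();
    for (int i = 0; i < iters; i++)
      gex_RMA_PutBlocking(tm, peer, peer_addr, src, sz, 0);
    return (now_us() - t0) / iters;
  }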
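The MPI-3 RMA counterpart can be sketched as follows (an illustration in the spirit
of IMB-RMA Unidir_put, not its source). Rank 0 opens a passive-target epoch on rank 1's
window and measures both modes: the AGGREGATE pattern (a flood of MPI_Put calls
completed by one MPI_Win_flush) and the NON-AGGREGATE pattern (a flush after every put).
The size and iteration constants are placeholders; run with exactly two ranks.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    const int sz = 4194304, iters = 1000;        /* placeholders */
    char *buf; MPI_Win win;
    MPI_Win_allocate(sz, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &buf, &win);
    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0) {
      MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);  /* passive-target epoch */

      /* AGGREGATE: flood of puts, one flush at the end (bandwidth) */
      double t0 = MPI_Wtime();
      for (int i = 0; i < iters; i++)
        MPI_Put(buf, sz, MPI_BYTE, 1, 0, sz, MPI_BYTE, win);
      MPI_Win_flush(1, win);                     /* remote completion */
      printf("flood bandwidth: %g GiB/s\n",
             (double)sz * iters / (MPI_Wtime() - t0) / 1073741824.0);

      /* NON-AGGREGATE: flush after every put (per-operation latency) */
      t0 = MPI_Wtime();
      for (int i = 0; i < iters; i++) {
        MPI_Put(buf, 8, MPI_BYTE, 1, 0, 8, MPI_BYTE, win);
        MPI_Win_flush(1, win);
      }
      printf("8-byte put latency: %g us\n", (MPI_Wtime() - t0) * 1e6 / iters);

      MPI_Win_unlock(1, win);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
  }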
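Similarly, a sketch in the spirit of the IMB-MPI1 Uniband test: rank 0 keeps a window
of non-blocking sends in flight to rank 1, which posts matching receives. WINDOW and
reps are illustrative placeholders; the real benchmark also has the receiver
acknowledge each repetition so the sender's timer covers actual delivery, which is
omitted here for brevity. Run with exactly two ranks.

  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  #define WINDOW 64   /* sends kept in flight per repetition (placeholder) */

  int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    const int sz = 4194304, reps = 100;          /* placeholders */
    char *buf = malloc(sz);
    MPI_Request req[WINDOW];
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int r = 0; r < reps; r++) {
      for (int i = 0; i < WINDOW; i++) {
        if (rank == 0) MPI_Isend(buf, sz, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &req[i]);
        else           MPI_Irecv(buf, sz, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &req[i]);
      }
      MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
    }
    if (rank == 0)
      printf("uni-directional bandwidth: %g GiB/s\n",
             (double)sz * WINDOW * reps / (MPI_Wtime() - t0) / 1073741824.0);
    free(buf);
    MPI_Finalize();
    return 0;
  }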
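Finally, the unit normalization mentioned under the flood-bandwidth methodology, made
explicit (the helper names are ours). For example, an IMB report of 10000 MB/s
corresponds to 10000 x 10^6 / 2^30 ≈ 9.31 GiB/s.

  /* Normalize the two suites' differing megabyte conventions to GiB/s. */
  double GiBps_from_gasnet_MBps(double mbps) { return mbps / 1024.0; }           /* 2^20 / 2^30 */
  double GiBps_from_imb_MBps(double mbps) { return mbps * 1e6 / 1073741824.0; }  /* 10^6 / 2^30 */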
Jump to:
aries-conduit vs Cray MPI: on 'Cori (Phase-I)' at NERSC
Cori-I:
Cray XC40, Cray Aries Interconnect,
Node config: 2 x 16-core 2.3 GHz Intel "Haswell", PE 6.0.4, Intel C 18.0.1.163, Cray MPICH 7.7.0
aries-conduit vs Cray MPI: on 'Cori (Phase-II)' at NERSC
Cori-II:
Cray XC40, Cray Aries Interconnect,
Node config: 68-core 1.4 GHz Intel Phi "Knights Landing", PE 6.0.4, Intel C 18.0.1.163, Cray MPICH 7.7.0
ibv-conduit vs IBM Spectrum MPI: on 'Summit' at OLCF
Summit:
Mellanox EDR InfiniBand,
Node config: 2 x IBM POWER9, Red Hat Linux 7.5, GNU C 6.4.0, IBM Spectrum MPI 10.2.0.7-20180830
These are results for a single InfiniBand HCA.
We gratefully acknowledge the assistance of Geoffroy Vallee of ORNL, who collected the results on Summit.
ibv-conduit vs MVAPICH2: on 'Gomez' at the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory
Gomez:
Mellanox EDR InfiniBand,
Node config: 2 x Intel Xeon E7-8867v3 "Haswell-EX", Red Hat Linux 7.4, GNU C 4.8.5, MVAPICH2 2.3
gemini-conduit vs Cray MPI: on 'Titan' at the Oak Ridge Leadership Computing Facility (OLCF)
Titan:
Cray XK7, Cray Gemini Interconnect, Node config: 16-core 2.2 GHz AMD Opteron 6274 (GPUs not used), PE 5.2.82, PGI C 18.4, Cray MPICH 7.6.3
pami-conduit vs IBM MPI: on 'Cetus' at Argonne Leadership Computing Facility (ALCF)
Cetus:
IBM Blue Gene/Q, 5D Torus Proprietary Interconnect, Node config: 16 x 1.6 GHz PowerPC64 A2 cores, BG/Q driver V1R2M4, GCC 4.4.7, IBM MPI (V1R2M4, MPICH2 1.5 based)
MPI-3 RMA is not supported in the IBM MPI implementation for this system.
This research was funded in part by the Exascale Computing Project
(17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office
of Science and the National Nuclear Security Administration.
This research used resources of the National Energy Research Scientific
Computing Center, a DOE Office of Science User Facility supported by the Office
of Science of the U.S. Department of Energy under Contract No.
DE-AC02-05CH11231.
This research used resources of the Argonne Leadership Computing Facility,
which is a DOE Office of Science User Facility supported under Contract
DE-AC02-06CH11357.
This research used resources of the Oak Ridge Leadership Computing Facility
at the Oak Ridge National Laboratory, which is supported by the Office of
Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.