GASNet 2004 Performance (historical)

The results on this page are VERY old. They demonstrate GASNet performance on long-outdated or decommissioned systems and are shown here only for historical interest.

Click here for more recent GASNet performance results


The following graphs show performance of GASNet release v1.4, measured 11/2004.

Jump to: Summary
elan: Itanium2-QsNet2/elan4, AlphaServer-QsNet1/elan3
gm: Itanium2-Myrinet/GM2
lapi: Power4-Federation/LAPI and GASNet vs MPI
shmem: Itanium2-Altix3000/SHMEM, CrayX1/SHMEM and GASNet vs MPI
udp: Itanium2-GigabitEthernet - GASNet vs MPI
vapi: Opteron-Infiniband/VAPI - GASNet vs MPI

Summary: GASNet performance across many networks, machines and conduits

[Graph: latbw]

elan-conduit: on 'MPP2' at PNNL

1.5 GHz Intel Itanium-2, Quadrics QsNet2/Elan4 interconnect, 8GB main mem

[Graph: mpp2]

elan-conduit: on the Lemieux TSC at the Pittsburgh Supercomputing Center

Compaq Alphaserver SC, ES45 Elan3, double-rail (only tested w/single) 
750-node, 4-way 1GHz Alpha, 4GB, libelan1.3, OSF 5.1

[Graph: lemieux]

gm-conduit: on the Berkeley CITRIS cluster

Itanium-2 Linux Cluster, LANai10.0 PCI-X, GM 2.0.8 
64-node, 2-way 1.3 GHz Itanium-2, 4GB 
PCI-X bus bandwidth: 482 MB/sec read, 501 MB/sec read

[Graph: citris]

lapi-conduit: on the NPACI/SDSC 'DataStar' IBM SP

IBM Federation Interconnect, LAPI v.2.3.2.1, 8-way 1.5GHz Power4, 16GB main mem

[Graph: datastar]
Comparing GASNet lapi-conduit and IBM MPI back-to-back on the same Power4/Federation hardware:
Both GASNet and IBM MPI are built on top of the IBM LAPI interface. GASNet consistently and significantly outperforms IBM MPI because GASNet's lightweight semantics are a closer match to the operations exposed by the LAPI interface. They also eliminate MPI's tag/communicator matching and message-ordering enforcement, which generally impose additional copies and CPU overhead.
[Graph: datastar-mpi]
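
The cost of tag matching mentioned above can be illustrated with a toy model. The sketch below is not MPI or LAPI code; the class `TwoSidedEndpoint` and its methods are hypothetical names invented for this illustration. It assumes a receiver that must search a queue of posted receives for every arrival, and that stages the payload in a temporary buffer (an extra copy) whenever a message arrives before its matching receive has been posted.

```python
from collections import deque


class TwoSidedEndpoint:
    """Toy receiver for tagged, two-sided messaging (illustrative only, not real MPI)."""

    def __init__(self):
        self.posted = deque()      # (tag, user buffer) receives posted by the application
        self.unexpected = deque()  # (tag, staged copy) arrivals with no matching receive yet
        self.copies = 0            # number of payload copies performed so far

    def post_recv(self, tag, buf):
        # Search earlier unexpected arrivals first, in arrival order.
        for i, (t, staged) in enumerate(self.unexpected):
            if t == tag:
                del self.unexpected[i]
                buf[:len(staged)] = staged  # second copy: staging area -> user buffer
                self.copies += 1
                return
        self.posted.append((tag, buf))

    def on_arrival(self, tag, payload):
        # Every arrival pays for a search of the posted-receive queue.
        for i, (t, buf) in enumerate(self.posted):
            if t == tag:
                del self.posted[i]
                buf[:len(payload)] = payload  # single copy straight into the user buffer
                self.copies += 1
                return
        # No receive posted yet: stage the payload, costing an extra copy.
        self.unexpected.append((tag, bytes(payload)))
        self.copies += 1


early = TwoSidedEndpoint()
buf_a = bytearray(4)
early.on_arrival(7, b"abcd")   # message arrives before the receive is posted
early.post_recv(7, buf_a)

late = TwoSidedEndpoint()
buf_b = bytearray(4)
late.post_recv(7, buf_b)       # receive is already posted when the message arrives
late.on_arrival(7, b"abcd")

print(early.copies, late.copies)
```

In this model the unexpected-arrival path copies the payload twice, while a layer that names the destination address in the request itself needs neither the queue search nor the staging copy.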

shmem-conduit: on the Altix 3000 'Ram' at ORNL

1.5 GHz Intel Itanium-2, 6MB L3, 256KB L2, 32K L1, 2 TB system memory (8GB/node)

[Graph: ram]

shmem-conduit: on the Cray X1 'Phoenix' at ORNL

512 MSPs, 2MB cache per MSP, 16 GB per node

[Graph: phoenix]
[Graph: phoenix-mpi]

udp-conduit vs MPICH_p4 MPI

Comparing GASNet udp-conduit and MPICH_p4 MPI back-to-back on the same Gigabit Ethernet hardware:
1.3 GHz Dual Itanium-2, 4 GB, Broadcom NetXtreme BCM5701 Gigabit Ethernet, Linux 2.4.20, PCI-X  
PCI-X bus bandwidth: 482 MB/sec read, 501 MB/sec read

Both layers pay a high latency cost due to the kernel crossings and store-and-forward routing required by Ethernet (which generally make Ethernet unsuitable as network hardware for parallel scientific computing). However, GASNet consistently and significantly outperforms MPICH_p4. MPICH_p4 is built on TCP, a non-scalable and heavyweight connection-based protocol that often imposes extra copies and includes reliability and ordering protocols designed for streaming applications. GASNet builds its operations directly on UDP, a lightweight unordered datagram protocol that is the lowest-level protocol portably available in the TCP/IP stack. Because UDP is unreliable, GASNet achieves reliability with a protocol designed and tuned specifically for the needs of HPC (see AMUDP). GASNet semantics never guarantee ordering, and the implementation gains performance by not providing it.
[Graph: citris-mpi]
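
To make the idea of reliability without ordering concrete, here is a minimal simulation sketch. It is not the AMUDP protocol, whose details are not described on this page. The function `deliver_reliably` and its parameters are hypothetical, and the scheme shown (retransmit until acknowledged, with the receiver discarding duplicates by sequence number) is a generic textbook approach chosen for illustration. The channel is assumed to drop and reorder datagrams at random.

```python
import random


def deliver_reliably(messages, loss_rate=0.3, seed=0, max_rounds=1000):
    """Simulate retransmit-until-acked delivery over a lossy, reordering channel.

    Returns (payloads in the order the receiver delivered them, datagrams sent).
    """
    rng = random.Random(seed)
    unacked = dict(enumerate(messages))  # sender: seq -> payload still awaiting an ack
    seen = set()                         # receiver: sequence numbers already delivered
    delivered = []                       # receiver: payloads in arrival order
    transmissions = 0

    for _ in range(max_rounds):
        if not unacked:
            break
        batch = list(unacked.items())
        rng.shuffle(batch)               # datagrams may arrive in any order
        for seq, payload in batch:
            transmissions += 1
            if rng.random() < loss_rate:
                continue                 # request datagram lost; retry next round
            if seq not in seen:          # discard duplicates caused by retransmission
                seen.add(seq)
                delivered.append(payload)  # delivered immediately, no reorder buffer
            if rng.random() < loss_rate:
                continue                 # ack lost; sender will retransmit
            del unacked[seq]
    return delivered, transmissions


delivered, sent = deliver_reliably(list(range(20)))
print(len(delivered), sent)
```

Each message is delivered exactly once, but in whatever order it happens to arrive. The receiver never holds a message back while waiting for an earlier one, which is the kind of work an ordered stream protocol has to do.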

vapi-conduit vs MVAPICH MPI

Comparing GASNet vapi-conduit and OSU MVAPICH MPI back-to-back on the same InfiniBand hardware:
Dual 1.4 GHz Opteron, 1GB main memory, Linux 2.4.21-bigphys, Mellanox InfiniHost (Cougar) IB-4X HCAs, Mellanox Drivers 3.1, Firmware 3.0.0
MVAPICH numbers are from their own unmodified tester, re-normalized to report bandwidth in MB = 2^20 bytes and latency as full round-trip time. GASNet numbers are from the GASNet testsmall and testlarge benchmarks.
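
The re-normalization amounts to two unit conversions, sketched below. The page does not state which units the MVAPICH tester originally reported, so the sketch assumes bandwidth in decimal megabytes (10^6 bytes) per second and one-way latency (half the round trip). The function names are hypothetical.

```python
def decimal_mb_to_binary_mb(mb_per_sec):
    """Convert bandwidth from MB = 10^6 bytes to MB = 2^20 bytes (assumed input unit)."""
    return mb_per_sec * 1e6 / 2**20


def one_way_to_round_trip(latency_us):
    """Convert one-way latency to full round-trip latency (assumes symmetric paths)."""
    return 2 * latency_us


print(decimal_mb_to_binary_mb(1048.576))
print(one_way_to_round_trip(4.5))
```

Under these assumptions the bandwidth figure shrinks by about 4.6 percent and the latency figure doubles, so skipping the step would make a side-by-side comparison misleading in both directions.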

GASNet consistently and significantly outperforms MVAPICH because GASNet's one-sided put/get semantics are fundamentally a better match for the capabilities of the underlying RDMA hardware than MPI's two-sided message-passing semantics. GASNet's puts and gets turn into simple, fully one-sided RDMA operations in the common case and therefore reap the hardware's peak performance, whereas MPI pays a performance cost for enforcing its ordering and tag-matching semantics.
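
A toy model of one-sided access may help here. The sketch below is not the GASNet API and does not perform real RDMA. `Segment`, `put`, and `get` are hypothetical stand-ins in which the initiator names the target offset directly, so the target side runs no matching logic at all.

```python
class Segment:
    """Toy remotely accessible memory segment (stand-in for RDMA-registered memory)."""

    def __init__(self, size):
        self.mem = bytearray(size)


def put(target, offset, data):
    """Write data into the target segment at an offset chosen by the initiator."""
    if offset < 0 or offset + len(data) > len(target.mem):
        raise ValueError("put outside segment bounds")
    target.mem[offset:offset + len(data)] = data  # one copy, no receiver-side matching


def get(target, offset, nbytes):
    """Read nbytes from the target segment at an offset chosen by the initiator."""
    if offset < 0 or nbytes < 0 or offset + nbytes > len(target.mem):
        raise ValueError("get outside segment bounds")
    return bytes(target.mem[offset:offset + nbytes])


remote = Segment(16)
put(remote, 4, b"data")
print(get(remote, 4, 4))
```

Because the request carries its own destination, the target needs no posted receive, tag comparison, or staging buffer. That is why such operations can map onto a single hardware RDMA transfer.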

[Graph: lambda-mpi]
