Dan Bonachea
Dan Hettena
Active Messages (AM) is a lightweight messaging protocol used to optimize network communications, with an emphasis on reducing latency by removing the software overheads associated with buffering and by providing applications with direct user-level access to the network hardware [1]. AM provides asymmetric network messaging primitives that have come into popular use as a low-level substrate in the implementations of higher-level parallel languages and systems, such as MPI, Split-C, and Titanium [2], [3], [4].
Most implementations of AM are highly hardware-specific, and in particular they usually require the support of high-performance, non-commodity "smart" network interfaces such as Myrinet [5]. This unfortunately presents a problem when trying to run AM-based software on systems that have commodity network interface hardware, or network interfaces for which no AM implementation is readily available. The first part of this project attempts to bridge that gap by providing an AM-2 implementation that runs on UDP, a standard component of the TCP/IP protocol suite that is ubiquitous across platforms and operating systems [6]. We don't expect to achieve latency competitive with a native implementation of AM optimized for special-purpose hardware; instead, we seek to provide a compatibility layer that will allow AM-based systems to quickly get up and running on virtually any platform. The motivation for choosing UDP over other protocols (such as TCP) is that it typically provides the lowest-overhead access to the network with little or no internal buffering, and its connectionless model is best suited for scaling up the number of distributed processors. Because UDP does not guarantee delivery and occasionally drops packets, we add a thin reliability layer that provides the guaranteed delivery required by AM-2, with the aim of providing this fault tolerance at lower cost than full-blown TCP.
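To make the idea of a thin reliability layer over UDP concrete, here is a minimal sketch of one possible scheme: stop-and-wait with sequence numbers, acknowledgements, and retransmission on timeout. It is not the protocol used by this implementation, whose details are not described here. The wire format (`HEADER`), the function names, the timeout values, and the deterministic loss model (`drop_first_copy`) are all assumptions made for illustration.

```python
import socket
import struct
import threading

# Hypothetical wire format: 1-byte packet kind, 4-byte sequence number.
HEADER = struct.Struct("!BI")
DATA, ACK = 0, 1


def reliable_send(sock, dest, seq, payload, timeout=0.05, max_tries=20):
    """Send one datagram, retransmitting until the matching ACK arrives.

    Returns the number of transmissions that were needed.
    """
    packet = HEADER.pack(DATA, seq) + payload
    sock.settimeout(timeout)
    for attempt in range(1, max_tries + 1):
        sock.sendto(packet, dest)
        try:
            reply, _ = sock.recvfrom(2048)
        except socket.timeout:
            continue  # the data packet or its ACK was lost: retransmit
        if len(reply) >= HEADER.size:
            kind, ack_seq = HEADER.unpack_from(reply)
            if kind == ACK and ack_seq == seq:
                return attempt
        # Anything else (e.g. a stale duplicate ACK) is ignored.
    raise TimeoutError(f"no ACK for sequence {seq}")


def receive_loop(sock, count, delivered, should_drop):
    """Deliver `count` messages exactly once and in order."""
    expected = 0
    while expected < count:
        packet, sender = sock.recvfrom(2048)
        if should_drop(packet):
            continue  # simulate a datagram dropped by the network
        kind, seq = HEADER.unpack_from(packet)
        if kind != DATA:
            continue
        if seq == expected:
            delivered.append(packet[HEADER.size:])
            expected += 1
        if seq < expected:
            # ACK new packets and duplicates alike, so that a sender whose
            # earlier ACK went missing can still make progress.
            sock.sendto(HEADER.pack(ACK, seq), sender)


def drop_first_copy():
    """Loss model for the demo: drop the first transmission of each packet."""
    seen = set()

    def should_drop(packet):
        if packet in seen:
            return False
        seen.add(packet)
        return True

    return should_drop


def demo(messages):
    """Send `messages` over loopback UDP through the lossy receiver."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # let the OS pick a free port
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    delivered = []
    worker = threading.Thread(
        target=receive_loop,
        args=(receiver, len(messages), delivered, drop_first_copy()),
        daemon=True,
    )
    worker.start()
    dest = receiver.getsockname()
    attempts = [reliable_send(sender, dest, seq, msg)
                for seq, msg in enumerate(messages)]
    worker.join()
    sender.close()
    receiver.close()
    return delivered, attempts


if __name__ == "__main__":
    delivered, attempts = demo([b"alpha", b"beta", b"gamma"])
    print(delivered, attempts)
```

Stop-and-wait keeps the sketch short but allows only one outstanding message per destination. A layer tuned for throughput would more plausibly keep several messages in flight, which this example does not attempt.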
In the second part of this project, we attempt to further optimize the AM over UDP layer by targeting a specific commodity Ethernet interface (the Intel EtherExpress Pro 100). We provide a special version of the AM layer that bypasses the kernel implementation of UDP for that network interface card (NIC) and uses high-performance direct hardware access with minimal buffering. This provides a high-performance AM implementation on systems that use the Intel EtherExpress Pro 100 commodity hardware (which is very common on x86 architectures), and in particular on the IStore architecture, which also uses that NIC [7].
[1] Active Messages - 2 Specification. https://www2.eecs.berkeley.edu/Pubs/TechRpts/1996/5768.html
[2] Message Passing Interface (MPI) Forum home page. http://www.mpi-forum.org/
[3] Split-C. https://en.wikipedia.org/wiki/Split-C
[4] Titanium home page. http://titanium.cs.berkeley.edu
[5] Myrinet home page. http://www.myricom.com/
[6] RFC 768, UDP Specification. http://www.ietf.org/rfc/rfc0768.txt
[7] ISTORE home page. http://iram.cs.berkeley.edu/istore/
[8] AMMPI home page. https://gasnet.lbl.gov/ammpi
This paper is based on version 1.3, which is available in the download archive.
An implementation of the AM-2 spec over UDP that should run on any standard POSIX system (including Mac OS X and MS Windows with Cygwin).
Archived versions
README with ChangeLog
Browse the full archive