GASNet extended-ref documentation Dan Bonachea User Information: ----------------- The extended-ref directory is not intended for direct use by end users. It contains common code used to implement parts of the GASNet Extended API within many conduits. Recognized environment variables: --------------------------------- * All the standard GASNet environment variables (see top-level README) Optional compile-time settings: ------------------------------ * GASNETE_GETPUT_MEDIUM_LONG_THRESHOLD - size threshold where gets/puts stop using medium messages and start using longs (if/when enabled, see below) * GASNETE_USE_LONG_GETS - true if we should try to use Long replies in gets (only possible if dest falls in segment) * GASNETE_USE_LONG_PUTS - true if we should try to use Long requests in puts * GASNETI_THREADINFO_OPT - optimize thread discovery using hidden local variable * GASNETI_LAZY_BEGINFUNCTION - postpone thread discovery to first use Known problems: --------------- * See the GASNet Bugzilla server for details on known bugs: https://gasnet-bugs.lbl.gov/ Future work: ------------ =============================================================================== Design Overview: ---------------- The extended-ref directory provides a reference implementation of the extended API, written solely in terms of the GASNet core API. This provides an easy porting path for new networks - once the core API is implemented, the extended-ref files provide working implementations of everything else in GASNet. Conduits for networks with good hardware support can subsequently customize and optimize selected functions from the extended API using hardware support. gex_Event_t is a pointer to a gasnet_op_t, which can be either an "eop" type (handle for explicit non-blocking op) or an "iop" type (handle for an implicit non-blocking op or access region). eops are allocated from a thread-specific pool to reduce allocation and locking costs, and basically just contain a flag block to track completion state. iops contain initiation and completion counters for puts and gets that are incremented atomically as operations complete. iops are also kept in a free list once freed to avoid reallocation costs (only created/freed for access regions). Each thread has an iop which is active for any new iop operations. Puts and gets are implemented on AM as follows: gasnet_put(_bulk) is translated to a gasnete_put_nb(_bulk) + sync gasnet_get(_bulk) is translated to a gasnete_get_nb(_bulk) + sync gasnete_put_nb(_bulk) translates to if nbytes < GASNETE_GETPUT_MEDIUM_LONG_THRESHOLD AMMedium(payload) else if nbytes < AMMaxLongRequest AMLongRequest(payload) else gasnete_put_nbi(_bulk)(payload) gasnete_get_nb(_bulk) translates to if nbytes < GASNETE_GETPUT_MEDIUM_LONG_THRESHOLD AMSmall request + AMMedium(payload) reply else gasnete_get_nbi(_bulk)() gasnete_put_nbi(_bulk) translates to if nbytes < GASNETE_GETPUT_MEDIUM_LONG_THRESHOLD AMMedium(payload) else if nbytes < AMMaxLongRequest AMLongRequest(payload) else chunks of AMMaxLongRequest with AMLongRequest() AMLongRequestAsync is used instead of AMLongRequest for put_bulk gasnete_get_nbi(_bulk) translates to if nbytes < GASNETE_GETPUT_MEDIUM_LONG_THRESHOLD AMSmall request + AMMedium(payload) reply else chunks of AMMaxMedium with AMSmall request + AMMedium() reply The current implementation uses AMLongs for large puts because the destination is guaranteed to fall within the registered GASNet segment. The spec allows gets to be received anywhere into the virtual memory space, so we can only use AMLong when the destination happens to fall within the segment - GASNETE_USE_LONG_GETS indicates whether or not we should try to do this. (conduits which can support AMLongs to areas outside the segment could improve on this through the use of this conduit-specific information). Barrier: There are two implementations available to choose between at runtime via the GASNET_BARRIER environment variable. The compile-time constant GASNETE_BARRIER_DEFAULT can be used to set the default barrier. "AMCENTRAL" - a silly, centralized barrier implementation: everybody sends notifies to a single node, where we count them up central node eventually notices the barrier is complete (probably when it calls wait) and then it broadcasts the completion to all the nodes The main problem is the need for the master to call wait before the barrier can make progress - we really need a way for the "last thread" to notify all the threads when completion is detected, but AM semantics don't provide a simple way to do this. The centralized nature also makes it non-scalable - we really want to use a tree-based barrier or pairwise exchange algorithm for scalability (but these impose even greater potential delays due to the lack of attentiveness to barrier progress) "AMDISSEM" - an AM-based Dissemination barrier implementation: With N nodes, the barrier takes ceil(lg(N)) steps (lg = log-base-2). At step i (i=0..): node n first sends to node ((n + 2^i) mod N) then node n waits to receive (from node ((n + N - 2^i) mod N)) once we receive for step i, we can move the step i+1 (or finish) The distributed nature makes this barrier more scalable than a centralized barrier, but also more sensitive to any lack of attentiveness to the network. We use a static allocation, limiting us to 2^GASNETE_AMBARRIER_MAXSTEP nodes. Algorithm is described in section 3.3 of John M. Mellor-Crummey and Michael L. Scott. "Algorithms for scalable synchronization on shared-memory multiprocessors." ACM ToCS, 9(1):21 65, 1991.