GASNet pami-conduit documentation
Paul H. Hargrove <PHHargrove@lbl.gov>

User Information:
-----------------

This is an implementation of the GASNet CORE and EXTENDED
API using the IBM PAMI communication protocol.

Where this conduit runs:
-----------------------

pami-conduit implements GASNet over IBM's Parallel Active Messaging Interface
(PAMI) which is available on the IBM Blue Gene/Q, IBM PERCS/POWER 775 systems,
and InfiniBand-connected clusters running IBM's Parallel Environment (PE)
software.  There is a good collection of PAMI-related links available from
https://github.com/jeffhammond/pami-examples/blob/master/README

pami-conduit is the recommended GASNet conduit on Blue Gene/Q and POWER 775
(a.k.a. PERCS) systems, and has been developed and tested on both.  On
InfiniBand-connected clusters, ibv-conduit is likely to provide superior
performance.

There are no known minimum required versions of PAMI or related software.

Optional compile-time settings:
------------------------------

* The following compile-time settings from extended-ref are recognized
  (see the extended-ref README):

 + GASNETI_THREADINFO_OPT - optimize thread discovery using hidden local variable

 + GASNETI_LAZY_BEGINFUNCTION - postpone thread discovery to first use

 + GASNETE_SCATTER_EOPS_ACROSS_CACHELINES(1/0) - scatter newly allocated eops
    across cache lines to reduce false sharing

Recognized environment variables:
---------------------------------

* All the standard GASNet environment variables (see top-level README)

* GASNET_BARRIER - barrier algorithm selection
  In addition to the algorithms in the top-level README, there are two
  PAMI-specific values supported:
    PAMIDISSEM - like AMDISSEM, but implemented using PAMI-level AMs.
    PAMIALLREDUCE - barrier matching is implemented in terms of a
                    PAMI-level ALLREDUCE collective operation.
  Currently PAMIDISSEM is the default on all PAMI platforms.

* GASNET_USE_PAMI_COLL - enable use of native-PAMI collectives
  Not all collectives are supported for all input conditions, but when
  support is available this setting controls whether it will be used.
  [NOTE: currently only blocking collectives are implemented over PAMI]
  Additionally, the following allow finer-grained control over which
  collective operations use PAMI_Collective() when GASNET_USE_PAMI_COLL
  is enabled:
    GASNET_USE_PAMI_BROADCAST - gasnet_coll_broadcast functions
    GASNET_USE_PAMI_EXCHANGE  - gasnet_coll_exchange functions
    GASNET_USE_PAMI_GATHER    - gasnet_coll_gather functions
    GASNET_USE_PAMI_GATHERALL - gasnet_coll_gather_all functions
    GASNET_USE_PAMI_SCATTER   - gasnet_coll_scatter functions
  The default value for every variable in this family is YES.

* GASNET_NETWORKDEPTH - depth of AM Request queues (default 1024)
  This integer parameter sets the limit on the number of outstanding
  Active Message Requests, where outstanding is defined in terms of
  local completion of the network send.
  Too-small values may reduce performance of AM-intensive applications.
  Too-large values may result in excessive buffering requirements in
  AM-intensive applications, which can both reduce performance and
  increase memory use.
  Applications not sending "floods" of AMs will be insensitive to
  the value of this parameter.

* GASNET_AMPOLL_MAX - limit on work done in AMPoll (default 16)
  This integer parameter sets the maximum number of PAMI operations
  to be retired by a call to gasnet_AMPoll().
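
The interaction between GASNET_NETWORKDEPTH and GASNET_AMPOLL_MAX can be
pictured with a toy model.  The sketch below is NOT the conduit's code: the
am_throttle type and both functions are hypothetical, and it assumes that
every outstanding send has reached local completion by the time it is polled.
It only illustrates a cap on outstanding requests combined with a cap on the
work done per poll.

```c
#include <assert.h>

/* Hypothetical toy model, not part of GASNet or PAMI. */
typedef struct {
    int depth;        /* cap on outstanding requests (cf. GASNET_NETWORKDEPTH) */
    int poll_max;     /* cap on operations retired per poll (cf. GASNET_AMPOLL_MAX) */
    int outstanding;  /* requests injected but not yet retired */
    int polls;        /* number of polls performed so far */
} am_throttle;

/* Retire at most poll_max operations and return how many were retired.
 * Assumption: every outstanding operation is already locally complete. */
static int throttle_poll(am_throttle *t)
{
    int n;
    assert(t->depth > 0 && t->poll_max > 0);
    n = (t->outstanding < t->poll_max) ? t->outstanding : t->poll_max;
    t->outstanding -= n;
    t->polls++;
    return n;
}

/* Inject one request, polling first for as long as the queue is full. */
static void throttle_send(am_throttle *t)
{
    assert(t->depth > 0 && t->poll_max > 0);
    while (t->outstanding >= t->depth)
        throttle_poll(t);
    t->outstanding++;
}
```

In this model a sender that never fills the queue never polls from
throttle_send(), which matches the statement above that applications not
sending "floods" of AMs are insensitive to the queue depth.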

Known problems:
---------------

* See the GASNet Bugzilla server for details on known bugs:
  http://gasnet-bugs.lbl.gov/

Future work:
------------

The following are planned work items for pami-conduit:

* Use dynamic registration (firehose) when local address is out-of-segment?
  Initial benchmarks seem to show PAMI getting RDMA speeds for xfers of
  sufficient size even when using PAMI_Put/Get, suggesting that some
  dynamic registration is already used internally.
  However, the gap between Put and Rput bandwidth between 2KB and 64KB as
  measured with 1 proc-per-node on PERCS shows that there is currently a
  *possibility* that dynamic registration (firehose) could be beneficial.

* Register bounce buffers used for AM headers and payloads and apply
  the appropriate "use_rdma" hints.

* Use multiple PAMI contexts/endpoints.  At a minimum it would be desirable
  to separate AM and RDMA traffic so that each can make independent progress.
  Use of multiple endpoints when using pthreads is also worth some
  implementation effort.
  A separate context used for the exit coordination would prevent deadlock
  when exiting from an AM handler.

* Explore use of PAMI's "remote_async_progress" hint.

* Explore use of bounce buffers to avoid blocking for local completion
  of non-blocking/non-bulk Puts.

* Explore use of conduit-level flow control for AMs, though it is not yet
  certain that this is needed as it was with dcmf-conduit.

* Improve exit handling to raise SIGQUIT for non-collective exits.

* For sufficiently small payloads, AMRequestLong could use a bounce buffer
  to avoid stalling for local completion.

* Explore use of PAMI_Send_immediate() for small enough Medium and/or Long AMs.

==============================================================================

Design Overview:
----------------

* Core API:
  + GASNet's AMs are implemented in terms of PAMI's AMs, and execute
    handlers directly from the PAMI callbacks.
    - Short AMs use PAMI_Send_immediate() because they are small.
    - Medium and Long AMs use PAMI_Send().
    - Medium AMs copy their payloads to bounce buffers to avoid
      stalling for local completion.
    - Long Request AMs block for local completion.
    - LongAsync Request AMs do NOT block for local completion.
    - Long Reply AMs copy their payloads to bounce buffers, like
      a Medium AM, since our running of AM handlers from PAMI's
      callbacks precludes blocking for local completion.
  + The current default barrier is a PAMI-specific implementation of the
    dissemination barrier in terms of PAMI_Send_immediate().
  + GASNet's exit handling is done using a PAMI "all reduce" operation
    (w/ a timeout) to determine the MAX() of the exit codes, and whether
    the exit is collective.  For non-collective exits, the conduit is
    currently calling exit(1) and using the fact that IBM's software
    will take care of aborting the job.  However this does NOT get the
    desired behavior of raising SIGQUIT on the non-exiting nodes.
  + PSHM is supported through the default mechanisms.
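
For readers unfamiliar with the dissemination barrier mentioned above, the
following is a minimal sketch of the textbook communication schedule.  The
function names are hypothetical, and the assumption that pami-conduit uses
exactly this peer ordering is not confirmed by this README.  In round r each
rank notifies the rank 2^r positions ahead of it (modulo the job size) and
waits for the rank 2^r positions behind it.

```c
#include <assert.h>

/* Number of rounds: the smallest r such that 2^r >= nranks
 * (zero rounds when there is only one rank). */
static int dissem_rounds(int nranks)
{
    int rounds = 0;
    assert(nranks > 0);
    while ((1L << rounds) < nranks)
        rounds++;
    return rounds;
}

/* Rank that `rank` notifies in round `round`. */
static int dissem_send_peer(int rank, int nranks, int round)
{
    long dist = 1L << round;
    assert(nranks > 0 && rank >= 0 && rank < nranks);
    return (int)((rank + dist) % nranks);
}

/* Rank whose notification `rank` waits for in round `round`. */
static int dissem_recv_peer(int rank, int nranks, int round)
{
    long dist = (1L << round) % nranks;
    assert(nranks > 0 && rank >= 0 && rank < nranks);
    return (int)((rank - dist + nranks) % nranks);
}
```

Each notification in this schedule carries little or no payload, which is
consistent with the README's choice of PAMI_Send_immediate() for the barrier.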

* Extended API:
  + GASNET_SEGMENT_FAST and GASNET_SEGMENT_LARGE are identical:
    - The segment is allocated using mmap() via the default mechanisms.
    - The segment is pinned/registered as a single PAMI memory region.
  + All Extended API operations are performed using PAMI_Rput and _Rget
    when both addresses fall in the GASNet segment, and PAMI_Put and
    _Get otherwise.  As a result, GASNET_SEGMENT_EVERYTHING "just works".
  + The blocking operations block for remote completion, of course.
  + The non-blocking NON-BULK Put operations will stall for the required
    local completion, but don't need to stall for remote completion.
    There is not currently any use of bounce buffers for these Puts.
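
The choice between the registered (PAMI_Rput/_Rget) and unregistered
(PAMI_Put/_Get) paths described above can be sketched as a pair of bounds
checks.  This is an illustration only: segment_t, xfer_path, in_segment() and
choose_path() are hypothetical names that do not appear in the conduit, and
no PAMI calls are made.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one node's GASNet segment. */
typedef struct {
    uintptr_t base;  /* first address of the segment */
    size_t    size;  /* length of the segment in bytes */
} segment_t;

typedef enum {
    XFER_REGISTERED,   /* both ends in-segment: Rput/Rget-style transfer */
    XFER_UNREGISTERED  /* otherwise: Put/Get-style transfer */
} xfer_path;

/* Return 1 if [addr, addr+len) lies entirely inside the segment.
 * Written to avoid overflow in addr+len. */
static int in_segment(const segment_t *seg, uintptr_t addr, size_t len)
{
    uintptr_t off;
    assert(seg != NULL);
    if (addr < seg->base)
        return 0;
    off = addr - seg->base;
    return off <= seg->size && len <= seg->size - off;
}

/* Pick the transfer path for a Put or Get of len bytes. */
static xfer_path choose_path(const segment_t *local_seg, uintptr_t local_addr,
                             const segment_t *remote_seg, uintptr_t remote_addr,
                             size_t len)
{
    if (in_segment(local_seg, local_addr, len) &&
        in_segment(remote_seg, remote_addr, len))
        return XFER_REGISTERED;
    return XFER_UNREGISTERED;
}
```

Because the fallback path needs no segment membership at either end, a
check of this shape is compatible with the "just works" behavior noted for
GASNET_SEGMENT_EVERYTHING.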