Of the ADI layer, we implemented those functions required to run a
ping pong benchmark for timing: MPID_Init(), MPID_End(),
MPID_CH_Eagerb_send_short() and MPID_CH_Check_incoming(). The latter
two belong to the MPICH channel abstraction, and are called by ADI
layer functions on send and receive, respectively. Communication
progresses only while the thread of control is with the ADI
layer. For simplicity, we maintain only a single retransmission timer for the left
edge of the send window.
Since short sends are non-blocking, they are performed as follows:
- Prepare the 16-byte MPI header, of which three words are
reserved for the MPI layer's tag, length and mode fields, and one word
carries the 15-bit sequence and acknowledgment numbers along with a
SEQ and an ACK bit indicating the validity of those fields.
- Move the header and user data as a packet into the
transmission buffer area inside the memory segment shared with the
U-Net device.
- Push a descriptor containing the destination, length, and a
pointer to the packet into the U-Net transmission FIFO.
- Trap into the kernel to transfer the packet.
- If there are no older unacknowledged packets, reset the timer.
- Advance the right edge of the send window by one message.
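The fourth header word described above can be sketched in C as follows. This is only an illustration: the text fixes the field widths (two 15-bit numbers plus a SEQ and an ACK bit, 32 bits in total) but not the bit positions, so the layout, the word order inside mpi_header, and all names below are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the control word (bit positions are NOT given in the text):
 *   bits  0-14  sequence number        bit 15  SEQ valid
 *   bits 16-30  acknowledgment number  bit 31  ACK valid */
#define NUM_MASK   0x7FFFu
#define SEQ_VALID  (1u << 15)
#define ACK_SHIFT  16
#define ACK_VALID  (1u << 31)

/* Four 32-bit words = 16 bytes; the word order is an assumption. */
typedef struct {
    uint32_t tag;
    uint32_t length;
    uint32_t mode;
    uint32_t ctrl;
} mpi_header;

/* Pack sequence/acknowledgment numbers and their validity bits into one word. */
static uint32_t pack_ctrl(uint32_t seq, int seq_valid, uint32_t ack, int ack_valid)
{
    uint32_t w;
    assert(seq <= NUM_MASK && ack <= NUM_MASK);  /* both must fit in 15 bits */
    w = seq | (ack << ACK_SHIFT);
    if (seq_valid) w |= SEQ_VALID;
    if (ack_valid) w |= ACK_VALID;
    return w;
}

/* Accessors for the receiving side. */
static uint32_t ctrl_seq(uint32_t w)     { return w & NUM_MASK; }
static uint32_t ctrl_ack(uint32_t w)     { return (w >> ACK_SHIFT) & NUM_MASK; }
static int      ctrl_has_seq(uint32_t w) { return (w & SEQ_VALID) != 0; }
static int      ctrl_has_ack(uint32_t w) { return (w & ACK_VALID) != 0; }
```

A sender with nothing to acknowledge would call pack_ctrl(seq, 1, 0, 0); a pure acknowledgment would clear the SEQ bit instead.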
When the ADI layer is invoked to receive a message in blocking mode,
the following happens:
- The U-Net device is polled in a spin loop until a message comes
in. While no message is coming in, we check whether the oldest
unacknowledged message has become due for retransmission. If it is
overdue, we retransmit this message, reset the timer, and go back to
spinning for an incoming message.
- If the incoming message is a duplicate or out-of-order packet,
an acknowledgment carrying the left edge of the receive window
(duplicate ack) is sent out immediately.
- If the incoming message is in sequence, advance the left
edge of the receive window, and remember the sequence number so that
the acknowledgment can be piggybacked on the next send.
- If the incoming message has a valid ACK field, reset the
retransmission timer and advance the left edge of the send window
according to the acknowledgment number.
All of this happens inside MPID_CH_Check_incoming(), which is
essentially the ADI progress engine.
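The window bookkeeping described in the two lists above can be sketched as follows. This is a simplified model, not the actual MPICH code: the struct, the function names, and the counter standing in for the retransmission timer are hypothetical, and we assume cumulative acknowledgments in which the acknowledgment number is the next sequence number the peer expects.

```c
#include <assert.h>
#include <stdint.h>

#define SEQ_MOD 0x8000u  /* 15-bit sequence space */

typedef struct {
    uint32_t send_left;    /* oldest unacknowledged sequence number */
    uint32_t send_right;   /* sequence number of the next new packet */
    uint32_t recv_left;    /* next in-order sequence number expected */
    int      ack_pending;  /* ack waiting to be piggybacked on the next send */
    int      timer_resets; /* stand-in for the single retransmission timer */
} window;

enum rx_action { RX_DELIVER, RX_DUP_ACK, RX_ACK_ONLY };

/* Distance from `from` to `to` in the 15-bit circular sequence space. */
static uint32_t seq_dist(uint32_t from, uint32_t to)
{
    return (to - from) & (SEQ_MOD - 1u);
}

/* Short send: returns the sequence number assigned to the packet. */
static uint32_t on_send(window *w)
{
    uint32_t seq = w->send_right;
    assert(seq_dist(w->send_left, w->send_right) < SEQ_MOD - 1u); /* not full */
    if (w->send_left == w->send_right)   /* no older unacknowledged packet */
        w->timer_resets++;
    w->send_right = (w->send_right + 1u) & (SEQ_MOD - 1u);
    w->ack_pending = 0;                  /* pending ack rides on this packet */
    return seq;
}

/* Incoming packet: tells the caller whether to deliver it or to send a
 * duplicate ack carrying recv_left immediately. */
static enum rx_action on_receive(window *w, int has_seq, uint32_t seq,
                                 int has_ack, uint32_t ack)
{
    enum rx_action act = RX_ACK_ONLY;
    if (has_seq) {
        if (seq == w->recv_left) {       /* in sequence */
            w->recv_left = (w->recv_left + 1u) & (SEQ_MOD - 1u);
            w->ack_pending = 1;
            act = RX_DELIVER;
        } else {                         /* duplicate or out of order */
            act = RX_DUP_ACK;
        }
    }
    if (has_ack) {
        w->timer_resets++;
        /* Advance only if the ack falls inside the outstanding window. */
        if (seq_dist(w->send_left, ack) <= seq_dist(w->send_left, w->send_right))
            w->send_left = ack;
    }
    return act;
}
```

The masked subtraction in seq_dist() keeps the comparisons correct when sequence numbers wrap around from 32767 to 0.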
Finally, MPID_End() finishes by retransmitting until all packets are
acknowledged, or a timer goes off. If it has not received
acknowledgments for all packets by then, it issues a warning message
and exits. It is then the responsibility of the user to ensure that
the application has completed correctly.
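The shutdown behavior can be sketched as a bounded drain loop. This is a toy model under stated assumptions: the timeout is expressed as a maximum number of retransmission rounds rather than a real timer, and link_state and retransmit_and_poll() are hypothetical stand-ins for the U-Net device.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of the link at shutdown. */
typedef struct {
    int outstanding;     /* packets not yet acknowledged */
    int acks_per_round;  /* acks assumed to arrive per retransmission round */
} link_state;

/* Stand-in for "retransmit the oldest packet and poll for acknowledgments". */
static void retransmit_and_poll(link_state *l)
{
    l->outstanding -= l->acks_per_round;
    if (l->outstanding < 0)
        l->outstanding = 0;
}

/* Returns 0 if everything was acknowledged, -1 if we gave up with a warning. */
static int drain_on_exit(link_state *l, int max_rounds)
{
    int round;
    assert(max_rounds >= 0);
    for (round = 0; l->outstanding > 0 && round < max_rounds; round++)
        retransmit_and_poll(l);
    if (l->outstanding > 0) {
        fprintf(stderr, "warning: %d packet(s) unacknowledged at exit\n",
                l->outstanding);
        return -1;
    }
    return 0;
}
```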
On the sender side, there is only a single memory copy from the user's
send buffer into the U-Net transmission buffer. From there, the packet
gets DMAed directly into the network. On the receive side, we were
able to collapse the ADI layer to perform only a single memory copy
from the U-Net receive buffer space into the user's receive buffer.
This was accomplished by leaving the packet inside the U-Net receive
buffer until the MPI header is inspected, and the destination of
the data is known. The Tulip device does not have this flexibility:
the incoming packet is first placed into kernel buffers and, after
the U-Net channel is known, moved to the appropriate U-Net receive
buffers. In summary, we have two memory copies on the receive side
(kernel buffer to U-Net receive buffer, then U-Net receive buffer to
user buffer), and one memory copy for the sender.
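The single ADI-level copy on the receive side can be illustrated as follows: the 16-byte header is read while the packet still sits in the U-Net receive buffer, and only then is the payload copied once to its final destination. The function name, the header word order, and the error convention are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HDR_BYTES 16u

/* Assumed word order: tag, length, mode, control word. */
typedef struct {
    uint32_t tag;
    uint32_t length;   /* payload length in bytes */
    uint32_t mode;
    uint32_t ctrl;
} mpi_header;

/* Inspect the header in place, then copy the payload exactly once into the
 * user's buffer. Returns the payload size, or -1 if the packet is malformed
 * or does not fit. */
static long deliver_in_place(const unsigned char *pkt, size_t pkt_len,
                             void *user_buf, size_t user_cap)
{
    mpi_header h;
    assert(pkt != NULL && user_buf != NULL);
    if (pkt_len < HDR_BYTES)
        return -1;
    memcpy(&h, pkt, HDR_BYTES);           /* read the 16-byte header only */
    if (h.length > pkt_len - HDR_BYTES || h.length > user_cap)
        return -1;
    memcpy(user_buf, pkt + HDR_BYTES, h.length);  /* the single data copy */
    return (long)h.length;
}
```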
Bernd Pfrommer
Mon May 26 12:18:25 PDT 1997