
Description of the Protocol

What follows describes how the reliable stream is used to implement the MPI short message protocol. When the application sends a short message, the message is placed into the reliable message stream and assigned a sequence number, and a queue descriptor is pushed onto the U-Net transmission FIFO. The U-Net device transmits the message immediately. On the receiving side, the U-Net device picks the message up and places it into the U-Net receive buffer. When the application on the receiver calls a receive routine, the ADI layer checks the receive buffer for messages. If it detects an out-of-order or duplicate (retransmitted) packet, it immediately sends out a duplicate acknowledgment. If the received packet is in order, it is not acknowledged right away, and the receive call returns immediately. All acknowledgments are piggybacked onto outgoing messages unless they are triggered by retransmissions or out-of-order packets. This lazy acknowledgment strategy works well for small loss rates and for applications that exchange comparable numbers of messages in a fairly synchronous fashion. It does not work well for a producer-consumer type of application, where the consumer sends few messages that could carry acknowledgments back. In that case, one could configure MPI with an environment variable to follow an eager acknowledgment protocol.
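
The receiver-side rule described above can be sketched as follows. This is a minimal illustration, not the actual ADI code: the class name, its fields, and the convention of acknowledging the last in-order sequence number are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LazyAckReceiver:
    """Hypothetical receiver that accepts only the next expected sequence number."""
    expected: int = 0            # next in-order sequence number
    ack_pending: bool = False    # an ack is owed and waits for an outgoing message
    delivered: list = field(default_factory=list)
    acks_sent: list = field(default_factory=list)  # explicit, non-piggybacked acks

    def on_packet(self, seq, payload):
        """Process one packet taken from the receive buffer."""
        if seq == self.expected:
            # In order: deliver, but defer the ack until there is data to send.
            self.delivered.append(payload)
            self.expected += 1
            self.ack_pending = True
            return True
        # Out of order or duplicate: discard the packet and immediately
        # re-acknowledge the last packet that was received in order
        # (this is -1 if nothing has been received yet).
        self.acks_sent.append(self.expected - 1)
        self.ack_pending = False  # the explicit ack covers any deferred one
        return False

    def piggyback_ack(self):
        """Return the ack to attach to an outgoing message, or None."""
        if not self.ack_pending:
            return None
        self.ack_pending = False
        return self.expected - 1
```

In a producer-consumer pattern the consumer would rarely call something like `piggyback_ack()`, which is why the lazy scheme suits it poorly.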

For simplicity, only a single retransmission timer is maintained, for the packet at the left edge of the sender window. Upon retransmission, the timer is backed off exponentially, clamped by a constant. Incoming acknowledgments reset the timer.
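
A small sketch of such a timer is given below. The initial interval, the backoff factor, and the ceiling are hypothetical values chosen for illustration, and the sketch assumes that "reset" means returning the interval to its initial value.

```python
class RetransmitTimer:
    """Single retransmission timer with clamped exponential backoff (illustrative)."""

    def __init__(self, initial=1.0, factor=2.0, ceiling=8.0):
        self.initial = initial    # assumed starting timeout
        self.factor = factor      # assumed multiplicative backoff factor
        self.ceiling = ceiling    # the clamping constant
        self.timeout = initial

    def on_retransmit(self):
        """Back off after a retransmission, never exceeding the ceiling."""
        self.timeout = min(self.timeout * self.factor, self.ceiling)
        return self.timeout

    def on_ack(self):
        """An incoming acknowledgment resets the timeout."""
        self.timeout = self.initial
        return self.timeout
```

With these values, successive retransmissions yield timeouts of 2, 4, 8, 8, and so on, until an acknowledgment brings the interval back to 1.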

A correct MPI program must call MPI_Finalize() before a group member process exits. As mentioned before, our protocol relies on this. Before a process leaves the group, it makes sure that it has received acknowledgments for all messages it has sent, and it sends out acknowledgments for all packets it has received. If it still has unacknowledged packets after retransmitting a fixed number of times, the process exits with a warning message, and it is up to the user to make sure the application finished correctly. Assuming that message (and acknowledgment) losses rarely happen, we feel this two-way handshake release protocol is adequate to handle the three-army problem[2].
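
The sender half of this release step might look like the sketch below. The function name, the callback parameters, and the retry limit are hypothetical, and the step of sending out acknowledgments for received packets is omitted.

```python
def drain_before_exit(unacked, retransmit, poll_acks, max_retries=5):
    """Wait for outstanding acks before exiting (illustrative only).

    unacked     -- set of sequence numbers sent but not yet acknowledged
    retransmit  -- callback taking the oldest unacknowledged sequence number
    poll_acks   -- callback returning the set of newly acknowledged numbers
    Returns True if everything was acknowledged, False if the caller
    should exit with a warning.
    """
    pending = set(unacked)   # work on a copy, leave the caller's set alone
    retries = 0
    while True:
        pending -= poll_acks()
        if not pending:
            return True
        if retries == max_retries:
            return False     # give up after a fixed number of retransmissions
        retransmit(min(pending))
        retries += 1
```

A caller would print the warning when the function returns `False`, leaving it to the user to check that the application finished correctly.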

In summary, we suggest using a reliable message stream with a "go-back-n"[2] window-based protocol. The send window is limited only by the size of the U-Net transmission queue, whereas at the receiver, a window size of one at the ADI level means that out-of-order packets are discarded. At the U-Net level, however, the receive window size is given by the size of the U-Net receive queue, so that packets can arrive in a burst without being dropped.
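
The sender side of a go-back-n scheme can be sketched as follows. The class and method names are hypothetical, `queue_size` stands in for the size of the U-Net transmission queue, and cumulative acknowledgments are assumed, as is usual for go-back-n.

```python
from collections import deque


class GoBackNSender:
    """Illustrative go-back-n sender; not the actual MPI/U-Net implementation."""

    def __init__(self, queue_size):
        self.queue_size = queue_size  # assumed bound on the send window
        self.next_seq = 0
        self.window = deque()         # (seq, payload) sent but not yet acked

    def send(self, payload, transmit):
        """Send a new message unless the window is full."""
        if len(self.window) >= self.queue_size:
            return False
        self.window.append((self.next_seq, payload))
        transmit(self.next_seq, payload)
        self.next_seq += 1
        return True

    def on_ack(self, acked_seq):
        """Cumulative ack: drop every packet up to and including acked_seq."""
        while self.window and self.window[0][0] <= acked_seq:
            self.window.popleft()

    def on_timeout(self, transmit):
        """Go back n: resend everything from the left edge of the window."""
        for seq, payload in self.window:
            transmit(seq, payload)
```

Because the receiver keeps a window of one, a single loss causes every later packet in flight to be discarded, which is why the sender resends the whole window rather than only the missing packet.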






Bernd Pfrommer
Mon May 26 12:18:25 PDT 1997