
Details of the Implementation

From the ADI layer, we implemented only the functions needed to run a ping-pong timing benchmark: MPID_Init(), MPID_End(), MPID_CH_Eagerb_send_short(), and MPID_CH_Check_incoming(). The latter two belong to the MPICH channel abstraction and are called by ADI layer functions on send and receive, respectively. Communication progresses only while the thread of control is inside the ADI layer. For simplicity, we maintain a single retransmission timer, which tracks the left edge of the send window.

Short sends are non-blocking: the sender does not wait for an acknowledgment, and each send performs the following steps:

  1. Prepare the 16-byte MPI header. Three of its 32-bit words hold the MPI layer's tag, length, and mode fields; the fourth carries the 15-bit sequence and acknowledgment numbers, along with a SEQ bit and an ACK bit indicating whether those fields are valid.
  2. Copy the header and the user data, as a single packet, into the transmission buffer area inside the memory segment shared with the U-Net device.
  3. Push a descriptor containing the destination, the length, and a pointer to the packet into the U-Net transmission FIFO.
  4. Trap into the kernel to transfer the packet.
  5. If there are no older unacknowledged packets, reset the retransmission timer.
  6. Advance the right edge of the send window by one message.
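The following C fragment is a minimal sketch of the header layout and of steps 1 through 6. It is illustrative only. The names (struct mpi_hdr, pack_ctl(), send_short(), struct sender, kernel_trap()) are hypothetical, and so are the window size WIN, the payload limit MAX_SHORT, and the bit positions in the control word. The U-Net transmission FIFO and the kernel trap are replaced by in-memory stand-ins.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of the fourth header word (only the field widths come
 * from the text): bit 31 = SEQ, bits 30..16 = sequence number,
 * bit 15 = ACK, bits 14..0 = acknowledgment number. */
#define SEQ_MASK  0x7FFFu          /* 15-bit sequence space */
#define SEQ_VALID (1u << 31)       /* sequence field is valid */
#define ACK_VALID (1u << 15)       /* acknowledgment field is valid */

/* 16-byte MPI header: four 32-bit words. */
struct mpi_hdr {
    uint32_t tag;   /* MPI tag */
    uint32_t len;   /* payload length in bytes */
    uint32_t mode;  /* MPI send mode */
    uint32_t ctl;   /* SEQ | seq | ACK | ack */
};

static uint32_t pack_ctl(int has_seq, uint32_t seq, int has_ack, uint32_t ack)
{
    assert(seq <= SEQ_MASK && ack <= SEQ_MASK);
    uint32_t w = 0;
    if (has_seq) w |= SEQ_VALID | (seq << 16);
    if (has_ack) w |= ACK_VALID | ack;
    return w;
}
static uint32_t ctl_seq(uint32_t w) { return (w >> 16) & SEQ_MASK; }
static uint32_t ctl_ack(uint32_t w) { return w & SEQ_MASK; }

#define WIN       8     /* hypothetical send window, in packets */
#define MAX_SHORT 64    /* hypothetical short-message payload limit */
#define PKT_MAX   (sizeof(struct mpi_hdr) + MAX_SHORT)

struct tx_desc { int dest; uint32_t len; unsigned char *pkt; };

struct sender {
    uint32_t left, right;            /* unacknowledged packets: [left, right) */
    long     timer_start;            /* single timer for the left edge */
    int      ack_pending;            /* piggyback ack_no on the next send? */
    uint32_t ack_no;
    unsigned char tx[WIN][PKT_MAX];  /* stand-in for the shared segment */
    struct tx_desc fifo[WIN];        /* stand-in for the U-Net send FIFO */
    int      fifo_n;
    int      transmitted;            /* packets handed to the "kernel" */
};

/* Stand-in for the kernel trap: the device consumes queued descriptors. */
static void kernel_trap(struct sender *s)
{
    s->transmitted += s->fifo_n;
    s->fifo_n = 0;
}

/* Non-blocking short send. Returns 0, or -1 if the message is too long
 * or the send window is full. */
static int send_short(struct sender *s, int dest, uint32_t tag,
                      const void *buf, uint32_t len, long now)
{
    if (len > MAX_SHORT || s->right - s->left >= WIN)
        return -1;
    /* Step 1: build the header, piggybacking any pending acknowledgment. */
    struct mpi_hdr h = { tag, len, 0 /* hypothetical mode code */,
                         pack_ctl(1, s->right & SEQ_MASK,
                                  s->ack_pending, s->ack_no & SEQ_MASK) };
    /* Step 2: the only copy of the user data, into the transmit buffer. */
    unsigned char *pkt = s->tx[s->right % WIN];
    memcpy(pkt, &h, sizeof h);
    memcpy(pkt + sizeof h, buf, len);
    /* Step 3: queue a descriptor (destination, length, packet pointer). */
    s->fifo[s->fifo_n++] = (struct tx_desc){ dest, (uint32_t)(sizeof h + len), pkt };
    /* Step 4: trap into the kernel to transmit. */
    kernel_trap(s);
    /* Step 5: reset the timer only if no older packet is unacknowledged. */
    if (s->left == s->right)
        s->timer_start = now;
    s->ack_pending = 0;
    /* Step 6: advance the right edge of the send window. */
    s->right++;
    return 0;
}
```

The window check uses unsigned differences of full 32-bit counters, and only the low 15 bits go on the wire. The sketch therefore stays correct across counter wraparound as long as the window is much smaller than the 2^15 sequence space.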

When the ADI layer is invoked to receive a message in blocking mode, the following happens:

  1. The U-Net device is polled in a busy-wait (spin) loop until a message arrives. While no message is arriving, we check whether the oldest outstanding message has become due for retransmission. If it is overdue, we retransmit it, reset the timer, and resume polling for an incoming message.
  2. If the incoming message is a duplicate or out-of-order packet, an acknowledgment carrying the left edge of the receive window (a duplicate ACK) is sent immediately.
  3. If the incoming message is in sequence, we advance the left edge of the receive window and record the sequence number so that the acknowledgment can be piggybacked on the next send.
  4. If the incoming message has a valid ACK field, we reset the retransmission timer and advance the left edge of the send window according to the acknowledgment number.
All of this happens inside MPID_CH_Check_incoming(), which effectively serves as the ADI progress engine.
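The receive-side rules can be sketched as two C functions. One is called on each spin iteration in which no packet has arrived, and the other is called once per incoming control word. This is a hypothetical illustration, not the actual MPID_CH_Check_incoming() code. The struct conn fields, the timeout value RTO, and the bit layout of the control word are assumptions. Sending a duplicate ACK and retransmitting a packet are only counted here, not performed. Acknowledgments are treated as cumulative, each one carrying the left edge of the receiver's window.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed control-word layout (hypothetical bit positions):
 * bit 31 = SEQ, bits 30..16 = seq, bit 15 = ACK, bits 14..0 = ack. */
#define SEQ_MASK  0x7FFFu
#define SEQ_VALID (1u << 31)
#define ACK_VALID (1u << 15)
#define RTO       1000      /* hypothetical retransmission timeout (ticks) */

struct conn {
    uint32_t rcv_left;      /* next in-sequence number expected */
    uint32_t snd_left;      /* oldest unacknowledged packet */
    uint32_t snd_right;     /* next sequence number to be sent */
    long     timer_start;   /* the single retransmission timer */
    int      ack_pending;   /* piggyback rcv_left on the next send */
    int      dup_acks_sent; /* counted here instead of actually sent */
    int      retransmits;   /* counted here instead of actually sent */
};

/* Step 1: called while spinning with no packet available. */
static void check_timeout(struct conn *c, long now)
{
    if (c->snd_left != c->snd_right && now - c->timer_start >= RTO) {
        c->retransmits++;       /* a real engine resends packet snd_left */
        c->timer_start = now;   /* reset the timer and keep spinning */
    }
}

/* Steps 2-4 for one incoming control word. Returns 1 if the payload is
 * in sequence and should be passed up, 0 if it must be dropped. */
static int check_incoming(struct conn *c, uint32_t ctl, long now)
{
    /* The send window must stay well inside the 15-bit sequence space. */
    assert(c->snd_right - c->snd_left < SEQ_MASK);
    int deliver = 0;
    if (ctl & SEQ_VALID) {
        uint32_t seq = (ctl >> 16) & SEQ_MASK;
        if (seq == (c->rcv_left & SEQ_MASK)) {
            c->rcv_left++;        /* step 3: advance the receive window */
            c->ack_pending = 1;   /* the ack rides on the next send */
            deliver = 1;
        } else {
            c->dup_acks_sent++;   /* step 2: immediate ack of rcv_left */
        }
    }
    if (ctl & ACK_VALID) {        /* step 4: cumulative acknowledgment */
        uint32_t ack = ctl & SEQ_MASK;
        uint32_t delta = (ack - c->snd_left) & SEQ_MASK;
        if (delta <= c->snd_right - c->snd_left)
            c->snd_left += delta; /* slide the left edge of the send window */
        c->timer_start = now;     /* reset the retransmission timer */
    }
    return deliver;
}
```

A blocking receive would alternate between three actions until the awaited message is delivered. It polls the U-Net receive queue, calls check_timeout() when nothing has arrived, and calls check_incoming() when a packet appears.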

Finally, MPID_End() keeps retransmitting until either all packets have been acknowledged or a timer expires. If acknowledgments for some packets are still missing at that point, it issues a warning and exits. It is then the user's responsibility to verify that the application completed correctly.
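The shutdown logic of MPID_End() amounts to a bounded retransmission loop. The sketch below is hypothetical. It models the shutdown timer as a fixed number of retransmission rounds (max_rounds). A resend callback, demonstrated with a simulated peer (struct sim_peer, sim_resend()), stands in for the real retransmit-and-poll step.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical connection state, reduced to what shutdown needs. */
struct conn { uint32_t snd_left, snd_right; };

/* Hypothetical hook: retransmit outstanding packets, poll once, and return
 * how many additional packets the peer has acknowledged. */
typedef uint32_t (*resend_fn)(struct conn *c, void *ctx);

/* Retransmit until everything is acknowledged or max_rounds (standing in
 * for the shutdown timer) is exhausted. Returns 0 on success, or -1 after
 * printing a warning. */
static int drain_on_end(struct conn *c, resend_fn resend, void *ctx,
                        int max_rounds)
{
    assert(max_rounds > 0);
    for (int round = 0; round < max_rounds && c->snd_left != c->snd_right; round++) {
        uint32_t acked = resend(c, ctx);
        uint32_t outstanding = c->snd_right - c->snd_left;
        c->snd_left += acked < outstanding ? acked : outstanding;
    }
    if (c->snd_left == c->snd_right)
        return 0;
    fprintf(stderr, "warning: %u packet(s) never acknowledged; "
            "check that the application completed correctly\n",
            (unsigned)(c->snd_right - c->snd_left));
    return -1;
}

/* Simulated peer for illustration: acknowledges one packet per round
 * while alive, and nothing once it has exited. */
struct sim_peer { int alive; };

static uint32_t sim_resend(struct conn *c, void *ctx)
{
    const struct sim_peer *p = ctx;
    (void)c;
    return p->alive ? 1u : 0u;
}
```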

On the send side, there is only a single memory copy, from the user's send buffer into the U-Net transmission buffer; from there, the packet is DMAed directly onto the network. On the receive side, we were able to collapse the ADI layer so that it performs only a single memory copy, from the U-Net receive buffer into the user's receive buffer. This was accomplished by leaving the packet in the U-Net receive buffer until the MPI header has been inspected and the destination of the data is known. The Tulip device does not offer this flexibility: an incoming packet is first placed into kernel buffers and, once its U-Net channel has been determined, moved into the appropriate U-Net receive buffer. In summary, the receive side performs two memory copies and the send side performs one.
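The receive-side copy avoidance can be sketched as follows. The function inspects the header while the packet still sits in the U-Net receive buffer and matches it against posted receives. It then copies the payload exactly once, directly into the user's buffer. Several parts are simplifying assumptions: the posted-receive table, the tag-only matching, and the name deliver_in_place(). In particular, a message that arrives before a matching receive is posted would have to be buffered, and this sketch does not do that.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed four-word header layout: tag, length, mode, control word. */
struct mpi_hdr { uint32_t tag, len, mode, ctl; };

/* Hypothetical posted-receive entry: where the MPI layer wants the data. */
struct posted_recv { uint32_t tag; void *buf; uint32_t cap; int done; };

/* Deliver one packet that is still in the U-Net receive buffer. Only the
 * 16-byte header is read first; the payload is copied once, straight into
 * the matching user buffer. Returns the number of payload bytes copied,
 * or -1 if no receive is posted for this tag. */
static long deliver_in_place(const unsigned char *rxbuf,
                             struct posted_recv *posted, int n)
{
    struct mpi_hdr h;
    memcpy(&h, rxbuf, sizeof h);   /* read the header, avoiding misalignment */
    for (int i = 0; i < n; i++) {
        struct posted_recv *r = &posted[i];
        if (!r->done && r->tag == h.tag) {
            assert(h.len <= r->cap);                  /* truncation not handled */
            memcpy(r->buf, rxbuf + sizeof h, h.len);  /* the single data copy */
            r->done = 1;
            return (long)h.len;
        }
    }
    return -1;
}
```

On the Tulip device, the kernel-to-U-Net copy has already happened by the time a function like this runs, which accounts for the second receive-side copy.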






Bernd Pfrommer
Mon May 26 12:18:25 PDT 1997