Performance Measurements

To assess the performance of our implementation, we ran a ping-pong benchmark between two hosts and measured the round-trip time for a four-byte message (twenty bytes including the MPI header). We find latencies of about 32 µs. Timing the raw U-Net ping-pong benchmark revealed that the ADI layer is responsible for about 11 of the 32 µs, which leaves roughly 21 µs for U-Net itself. A more detailed timing of our ADI code was severely hampered by the lack of accurate timers. Nevertheless, it appears that the overhead added by the reliability protocol is only about 3-4 µs. Another 7-8 µs are spent on searching the MPI posted and unexpected queues, on message handling, and on overhead incurred by the multiple device support.
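The structure of such a ping-pong measurement can be illustrated with a minimal sketch. It is not the benchmark used here: it assumes a local socket pair as a stand-in for the U-Net/MPI transport, runs both ends in one process, and the names `recv_exact`, `echo_peer`, and `ping_pong` are hypothetical. The absolute numbers it prints therefore say nothing about U-Net; only the method (many timed exchanges of a four-byte message, averaged, after a few untimed warm-up rounds) carries over.

```python
import socket
import threading
import time

MSG_SIZE = 4  # payload size in bytes, matching the four-byte message in the text


def recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_peer(sock, count):
    """The 'pong' side: send each received message straight back."""
    for _ in range(count):
        sock.sendall(recv_exact(sock, MSG_SIZE))


def ping_pong(iterations=1000, warmup=100):
    """Return the mean round-trip time in microseconds over `iterations` exchanges."""
    # Assumption: a local socket pair stands in for the real network transport.
    ping_sock, pong_sock = socket.socketpair()
    peer = threading.Thread(target=echo_peer, args=(pong_sock, iterations + warmup))
    peer.start()
    payload = b"\x00" * MSG_SIZE
    try:
        for _ in range(warmup):  # untimed exchanges to get past start-up effects
            ping_sock.sendall(payload)
            recv_exact(ping_sock, MSG_SIZE)
        start = time.perf_counter()
        for _ in range(iterations):  # timed exchanges
            ping_sock.sendall(payload)
            recv_exact(ping_sock, MSG_SIZE)
        elapsed = time.perf_counter() - start
    finally:
        ping_sock.close()  # unblocks the peer if the loop above failed early
        peer.join()
        pong_sock.close()
    return elapsed / iterations * 1e6


if __name__ == "__main__":
    print(f"mean round-trip time: {ping_pong():.1f} us")
```

Averaging over many exchanges is what makes the measurement usable when the available timer is coarse: only the total elapsed time has to be resolved, not each individual round trip.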

Our first naive implementations showed a latency of 60 µs, which we brought down to 32 µs with considerable effort. There may still be substantial room for improvement by optimizing the queue searches and hand-coding the most time-critical routines in assembly language.



Bernd Pfrommer
Mon May 26 12:18:25 PDT 1997