Next: Reliability Without Context
Up: No Title
Previous: Network
This section reports our experience implementing the MPI startup
phase (MPI_Init()) under U-Net on Fast Ethernet. We encountered and
overcame several difficulties that we feel are worth mentioning.
Setting up a U-Net channel to a remote host requires knowledge of:
- the local U-Net port
- the remote host's MAC address
- the remote host's U-Net port
While the remote host's MAC address is a static parameter and could be
read from a configuration file at startup time, the U-Net port numbers
are not known beforehand. Since port numbers are shared by all
processes on a host, one cannot rely on a particular port number being
available. For this reason, we had to add a port reservation function
to the U-Net device. This allowed the following startup mechanism.
- The master queries its MAC address from the U-Net Tulip device, and
reserves a set of ports, one for each channel to a slave.
- The master reads the names of the slaves from the host file, and
starts up the slaves with rsh, passing its host name as an argument.
- When the slaves come up, they likewise query their MAC addresses
and reserve a set of ports for communicating with all peers.
- The slaves set up a TCP connection to the master and report
their MAC addresses and reserved port numbers.
- The master now has complete information about the MAC addresses and
port numbers of every slave, and it sends each slave the information
relevant to it.
- At that point, the master and all slaves know the MAC addresses and
corresponding port numbers of their peers, and they set up the U-Net
channels.
- Master and slaves synchronize via another TCP two-way
handshake before returning from MPID_Init(). This ensures that no
communication is attempted before all U-Net channels are set up
completely.
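The bookkeeping behind the steps above can be sketched as follows. This is a minimal illustration in Python, not the code used in this work. All names (Endpoint, PortTable, reserve_for_peers, build_peer_tables) are hypothetical, and the MAC addresses are made up. The U-Net port reservation function is modelled by a simple in-memory pool. We assume that every process reserves exactly one port per peer. The rsh startup and the TCP transport are left out, so only the gather and redistribute logic of the master remains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """What a process must know about one peer to open a channel to it."""
    mac: str   # peer's MAC address
    port: int  # port the peer reserved for this particular channel


class PortTable:
    """Hypothetical stand-in for the pool of ports shared by one host."""

    def __init__(self, num_ports, in_use=()):
        busy = set(in_use)  # ports already taken by other processes
        self.free = [p for p in range(num_ports) if p not in busy]

    def reserve(self, count):
        """Take `count` free ports out of the pool and return them."""
        if count > len(self.free):
            raise RuntimeError("not enough free ports")
        taken, self.free = self.free[:count], self.free[count:]
        return taken


def reserve_for_peers(rank, nprocs, table):
    """Reserve one local port per peer; returns {peer_rank: local_port}."""
    peers = [r for r in range(nprocs) if r != rank]
    return dict(zip(peers, table.reserve(len(peers))))


def build_peer_tables(macs, reserved):
    """Master's step: turn all reports into one table per process.

    macs[r] is the MAC address of rank r, and reserved[r][p] is the port
    that rank r reserved for its channel to rank p. In the result,
    tables[r][p] is the endpoint rank r must address to reach rank p,
    that is, p's MAC address and the port p reserved for r.
    """
    return {
        r: {p: Endpoint(macs[p], reserved[p][r]) for p in reserved[r]}
        for r in reserved
    }


# Example with a master (rank 0) and two slaves (ranks 1 and 2).
nprocs = 3
macs = {r: f"02:00:00:00:00:{r + 1:02x}" for r in range(nprocs)}  # made up
# On the host of rank 1, ports 0 and 1 are already used by other processes.
tables = {r: PortTable(8, in_use=(0, 1) if r == 1 else ())
          for r in range(nprocs)}
reserved = {r: reserve_for_peers(r, nprocs, tables[r]) for r in range(nprocs)}
peer_tables = build_peer_tables(macs, reserved)
```

In this example the host of rank 1 already has two ports in use, so rank 1 reports different port numbers than the other processes do. This is the situation that rules out fixed port numbers and makes the exchange through the master necessary.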
One could also design a faster startup procedure in which the slaves
first set up U-Net channels with the master in a star topology and then
use those channels to communicate the port numbers. We decided to
use TCP for two reasons: first, the startup phase is not performance
critical, and second, we avoid dealing with reliability issues.
Bernd Pfrommer
Mon May 26 12:18:25 PDT 1997