ICS 632, Fall 2008: Homework #2
Send a .tar.gz file containing all your source code and your report
to henric@hawaii.edu. The top-level directory should be
called ics632_<lastname>_hw2. In that directory there
should be sub-directories called exercise_1, exercise_2, etc. Each such directory should itself have sub-directories called question_1, question_2, etc. These directories contain your code with Makefiles and reports whenever applicable.
Question #1: A Simple broadcast
MPI provides a broadcast function (MPI_Bcast()), which we described in class. Implement an MPI program
that takes one integer command-line argument, let's call it N. Each MPI process allocates
an array of N integers. Process with rank 0 (i.e., the "master") fills its array with random elements and broadcasts it to
all other processes by calling a function with the following prototype:
void MPI_MyBcast(int *buffer, int count, int root, MPI_Comm comm);
You can see that this function is essentially a simplified version of the
original MPI_Bcast(), which handles multiple data types, returns
all types of error codes, etc. It can be implemented easily via a sequence
of send and receive operations.
Each process that has received the array must
send it back to the master so that the master can check the data (making sure
that it receives the same data it sent out in the first place).
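The master's check can be sketched as an element-by-element comparison between the array it broadcast and the copy a process sent back. This is only an illustration: the helper name count_mismatches is hypothetical (it is not part of MPI), and the MPI send and receive calls that move the data are deliberately left out.

```c
#include <stddef.h>

/* Hypothetical helper (not part of MPI): returns the number of positions at
 * which the array echoed back by a process differs from the array the master
 * originally broadcast. A return value of 0 means the round trip preserved
 * the data. */
size_t count_mismatches(const int *sent, const int *echoed, size_t n)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < n; i++) {
        if (sent[i] != echoed[i]) {
            mismatches++;
        }
    }
    return mismatches;
}
```

The master would call this once per non-root process, after receiving that process's copy into a scratch buffer, and report any non-zero count.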
The program then repeats the same procedure using the original MPI_Bcast() function.
Question #2: Performance Comparison
Modify your code from Question #1 so that MPI_Bcast()
and MPI_MyBcast() are each called 5,000 times. The program also times how
long the 5,000 broadcasts take for each of the two functions (10,000 broadcasts
in total), so that your function can be compared with the one provided by MPI.
Run your code using 7 nodes, for the following values of
N: 1000, 50000, 100000, 200000.
In a README file, discuss the results and the performance difference between
your code and the MPICH implementation of MPI_Bcast().
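When reporting the measurements, it can help to normalize the raw elapsed time. The sketch below assumes the two timestamps were taken around the loop of broadcasts (for instance with MPI_Wtime(), which returns wall-clock seconds as a double). The function names and the choice of metrics are illustrative assumptions, not something the assignment prescribes.

```c
#include <stddef.h>

/* Average wall-clock time of one broadcast, in seconds. */
double seconds_per_bcast(double t_start, double t_end, int iterations)
{
    return (t_end - t_start) / (double)iterations;
}

/* Bytes of payload delivered per second across all receivers: each broadcast
 * delivers n_ints integers to each of the (nprocs - 1) non-root processes. */
double delivered_bytes_per_second(double t_start, double t_end, int iterations,
                                  size_t n_ints, int nprocs)
{
    double total_bytes = (double)iterations * (double)(nprocs - 1)
                       * (double)n_ints * (double)sizeof(int);
    return total_bytes / (t_end - t_start);
}
```

Comparing the per-broadcast time of MPI_MyBcast() and MPI_Bcast() at each value of N makes it easier to see how the gap changes with message size.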
Question #3 [EXTRA CREDIT]: A Better Broadcast
Modify your implementation of the MPI_MyBcast() function so that
it uses a broadcast tree, by which broadcasting can happen in parallel.
Report on performance improvements (or lack thereof) for the same
experiment as in Question #2. Note that the number 7 is conveniently
chosen so that it's easy to have a binary broadcast tree (7 nodes form a
complete binary tree with three levels).
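One possible way to lay out such a tree is the heap ordering sketched below, where the process at relative position r forwards the data to positions 2r+1 and 2r+2. This layout is an assumption made for illustration (other trees, such as binomial trees, are equally valid), and the function names tree_parent and tree_children are hypothetical. The actual sends and receives are not shown.

```c
/* Parent of `rank` in a heap-ordered binary tree rooted at `root`,
 * or -1 if `rank` is the root itself. `size` is the number of processes. */
int tree_parent(int rank, int root, int size)
{
    int rel = (rank - root + size) % size;   /* position relative to the root */
    if (rel == 0) {
        return -1;
    }
    return ((rel - 1) / 2 + root) % size;
}

/* Writes the (at most two) children of `rank` into children[] and
 * returns how many there are. */
int tree_children(int rank, int root, int size, int children[2])
{
    int rel = (rank - root + size) % size;
    int count = 0;
    for (int k = 1; k <= 2; k++) {
        int child_rel = 2 * rel + k;
        if (child_rel < size) {
            children[count++] = (child_rel + root) % size;
        }
    }
    return count;
}
```

With 7 processes and root 0, rank 0 sends to ranks 1 and 2, rank 1 to ranks 3 and 4, and rank 2 to ranks 5 and 6, so the two subtrees can forward the data at the same time.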
Question #1: Parallel Code
Implement a Master/Worker version of the mystery function minimization from
homework #1, with 100 trials and a step size of 1 (or whatever step size makes the code run in a reasonable amount of time!!). Your code should have a
master processor that gives out work to the workers dynamically, i.e.,
only when a worker process is idle. Your code should print out the
load imbalance as defined in homework #1 and the best local minimum found.
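The effect of handing out work only to idle workers can be illustrated with a small sequential simulation. This is not an MPI program and not the mystery function from homework #1: the task costs are made-up inputs, communication time is ignored, and the name simulate_dispatch is hypothetical. It does not compute the load imbalance metric, whose definition is given in homework #1.

```c
#include <stddef.h>

/* Simulates demand-driven dispatch: tasks are handed out in order, each one
 * to the worker that becomes idle earliest (ties go to the lowest index).
 * busy[w] receives the total work time of worker w; the return value is the
 * time at which the last worker finishes. Requires nworkers >= 1. */
double simulate_dispatch(const double *cost, size_t ntasks,
                         double *busy, size_t nworkers)
{
    for (size_t w = 0; w < nworkers; w++) {
        busy[w] = 0.0;
    }
    for (size_t t = 0; t < ntasks; t++) {
        size_t idle = 0;
        for (size_t w = 1; w < nworkers; w++) {
            if (busy[w] < busy[idle]) {
                idle = w;
            }
        }
        busy[idle] += cost[t];
    }
    double finish = 0.0;
    for (size_t w = 0; w < nworkers; w++) {
        if (busy[w] > finish) {
            finish = busy[w];
        }
    }
    return finish;
}
```

For task costs 4, 1, 1, 1, 1 and two workers, the first worker takes the long task while the second absorbs the four short ones, so both finish at time 4. A static round-robin split of the same tasks would leave one worker with 6 units of work and the other with 2.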