ICS 632: List of Projects (Fall 2008)
- Most of the projects have an "open-ended" flavor to them. The idea is
to approach them like small research projects in which you define your own
methodology and (some of) your objectives. What I am looking for is
"mature" approaches expected from graduate students. In particular, your
first task should be to come up with a precise formulation of the project.
- Some projects may be harder than others, and expectations will be
tailored to the projects.
- At most 2 students can pick the same project.
- Group projects involving 2 students are possible, but expectations will
be higher for the end result and I refuse to get involved in "I did
everything and my partner did nothing because he/she was out surfing all
the time" arguments.
- Students are strongly encouraged to define their own projects,
but these projects have to be approved by me beforehand and approval will
be contingent on the project being sufficiently involved.
- Looking at previous work, papers, and results in the literature is
encouraged. Downloading actual code that does the project is ok for
comparison with your own implementation, but using that code instead
of your own implementation constitutes grounds for getting a zero.
- For the projects that require that you write a parallel application,
it is understood that you should write a sequential version (if need
be) and that you compute speedups with respect to the sequential
version. It is also understood that you will perform in-depth
performance measurements. It is your responsibility to come up with
interesting things to say in your report! One way to do this is coming
up with multiple versions of your implementation so that you can study
what worked and what didn't in terms of performance. Producing only one
implementation doesn't really give you anything interesting to talk
about. Performance analysis/modeling is a plus.
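As a reminder of the quantities mentioned above, here is a minimal sketch of how speedup and parallel efficiency are computed from measured wall-clock times. The function names and the timings in the table are made-up placeholders for illustration, not measurements from our cluster.

```python
def speedup(t_seq, t_par):
    """Speedup of a parallel run relative to the sequential version."""
    return t_seq / t_par


def efficiency(t_seq, t_par, num_procs):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup(t_seq, t_par) / num_procs


if __name__ == "__main__":
    # Placeholder timings in seconds (processors -> wall-clock time).
    # These numbers are invented purely to show the computation.
    t_seq = 120.0
    timings = {2: 64.0, 4: 36.0, 8: 24.0}
    for p, t_par in timings.items():
        print(p, round(speedup(t_seq, t_par), 2),
              round(efficiency(t_seq, t_par, p), 2))
```

Plotting these two quantities against the number of processors, for each version of your implementation, is usually a good starting point for the report.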
IMPORTANT: You should discuss progress with me, and
not hesitate to ask me questions and for directions!
You must turn in your code and a report (PDF) of around 10 pages, and be prepared
to present your project to the class and answer questions (about 15-20 minutes).
Projects are due on 12/8, with presentations starting around 12/1.
Sorting a list of numbers is a key operation in many applications, and
it is well known that it is difficult to parallelize efficiently. For instance,
because the amount of data is large compared to the amount
of computation, the cost of I/O may be overwhelming. In this project you
will consider the following problem:
Problem Statement: You have a binary file in your home area
that contains a list of N random integers. You must write another
binary file in your home area that contains the sorted list. Your
goal is to do this on our cluster as fast as possible.
This is a difficult
problem because sorting is not very compute intensive, and so the result
may be disappointing. The point is to see whether some speedup can indeed
be achieved and to see what matters for performance.
You will develop several parallel sorting algorithms (we saw one in class,
you can probably come up with a few of your own, or research existing
algorithms), and most likely several versions of each algorithm. For your
performance evaluation, focus on measuring I/O and computing costs. You
probably need to spend some time thinking about how the data gets to the
processors initially. This could be done by some script before the call to
mpirun in your PBS script, or in the MPI code itself. Note that each node has
its own local disk, to which I/O is of course faster than over the
NFS-mounted home areas. Comparing different ways of doing the data
distribution is most likely in order. And of course you should vary the
value of N in your experiments. Report on performance discounting file I/O
and on performance including file I/O.
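To make the "with and without file I/O" reporting concrete, here is a minimal sequential sketch that times the I/O and the sorting phases separately. It assumes the file holds native-endian C ints (the format is not specified above), and the function name sort_file is hypothetical. A parallel version would time its phases in the same spirit.

```python
import time
from array import array


def sort_file(in_path, out_path):
    """Sort a binary file of native C ints.

    Returns (io_seconds, sort_seconds) so that performance can be
    reported both including and discounting file I/O.
    """
    t0 = time.perf_counter()
    data = array("i")  # assumption: file contains native-endian C ints
    with open(in_path, "rb") as f:
        data.frombytes(f.read())
    t1 = time.perf_counter()

    values = sorted(data)  # compute phase only
    t2 = time.perf_counter()

    # Conversion back to a packed array is counted as part of the write.
    with open(out_path, "wb") as f:
        array("i", values).tofile(f)
    t3 = time.perf_counter()

    io_seconds = (t1 - t0) + (t3 - t2)
    sort_seconds = t2 - t1
    return io_seconds, sort_seconds
```

Running this once against the home area and once against a node's local disk gives a first idea of how much of the total time is I/O.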
The Smith-Waterman algorithm is a dynamic programming algorithm used to
compute local alignments of biological sequences, e.g., to align full
genomes of prokaryotic organisms (e.g., bacteria). Research this algorithm
(it's famous and information on the algorithm is readily available) and
implement a fast parallel implementation (using existing or random DNA
sequences of arbitrary lengths). The input sequences are stored in files in
the user's home area and the resulting alignment should be stored to an
output file. Come up with reasonable ways of distributing the data. You can
also find parallel implementations of the SW algorithm and compare them with
your own.
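For orientation, here is a minimal sequential sketch of the Smith-Waterman scoring recurrence. It only returns the best local alignment score (no traceback), and the scoring parameters (match, mismatch, linear gap penalty) are assumptions chosen for illustration. Note that each cell depends only on its left, upper, and upper-left neighbors, so cells on the same anti-diagonal are independent of each other, which is what many parallel versions exploit.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of sequences a and b.

    Assumed scoring scheme: fixed match/mismatch scores and a linear
    gap penalty. Only two rows of the DP matrix are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                # start a new local alignment
                prev[j - 1] + s,  # align a[i-1] with b[j-1]
                prev[j] + gap,    # gap in b
                curr[j - 1] + gap # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

Producing the actual alignment requires keeping the full matrix (or enough of it) for the traceback, which matters a lot for memory and data distribution.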
In this project you'll try to implement the fastest possible matrix
multiplication on our cluster. You should implement the
outer-product algorithm using both a 2-D non-cyclic distribution and a 2-D
cyclic distribution (you can assume nicely divisible matrix dimensions in
this project). Your code should be self-checking (for instance using the
same scheme as used in HW #3). Perform an in-depth performance analysis of
your implementations and converge towards the fastest implementation,
explaining the steps you followed and explaining the behaviors you
observed. Your programs should all work on a rectangular grid of
processors (e.g. 2x4). One of your goals should be to find out how fast
you can multiply a matrix of total size ~8GB.
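The difference between the two distributions can be sketched as a mapping from a global matrix element to the grid coordinates of the processor that owns it. This is a minimal sketch assuming an n x n matrix, a p x q processor grid with p and q dividing n, and an element-wise cyclic distribution (a block-cyclic scheme would apply the same modulo to block indices). The function names are hypothetical.

```python
def block_owner(i, j, n, p, q):
    """Owner (grid row, grid column) of element (i, j) under a 2-D
    non-cyclic (block) distribution of an n x n matrix on a p x q grid."""
    rows_per_proc = n // p
    cols_per_proc = n // q
    return (i // rows_per_proc, j // cols_per_proc)


def cyclic_owner(i, j, p, q):
    """Owner of element (i, j) under an element-wise 2-D cyclic
    distribution on a p x q grid."""
    return (i % p, j % q)


if __name__ == "__main__":
    # Print the owner of every element of an 8 x 8 matrix on a 2 x 4 grid.
    n, p, q = 8, 2, 4
    for i in range(n):
        print([block_owner(i, j, n, p, q) for j in range(n)])
    for i in range(n):
        print([cyclic_owner(i, j, p, q) for j in range(n)])
```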
An N-body simulation is one that studies how bodies (represented by a mass,
a location, and a velocity) move in space (in our case a 2-D space),
according to the laws of (Newtonian) physics. Here is an implementation of
the N-body problem in Matlab (runnable using the free implementation Octave
on any good Linux system): nbod.m and mkbod.m (courtesy of Howard Motteler at UMBC).
Implement an MPI version of this sequential program (your MPI code should
read a file that specifies the problem's parameters). Describe your
data distribution strategies. You'll probably need to implement adaptive
load-balancing. All these should be part of an in-depth performance analysis
of successive versions of the code. Design a way in which your program
will output the results in a verifiable and viewable format (best would
probably be a sequence of bitmap images that can be animated as an animated
gif, but don't spend all your time doing this if you have no idea how to do
it!).
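To give an idea of the computation involved, here is a minimal sketch of one time step of a direct (all-pairs) 2-D N-body simulation. It is not a translation of nbod.m: the gravitational constant G = 1, the softening term eps, the semi-implicit Euler update, and the dictionary layout of a body are all assumptions made for illustration.

```python
import math

G = 1.0  # assumed gravitational constant (arbitrary units)


def step(bodies, dt, eps=1e-9):
    """Advance all bodies in place by one time step dt.

    bodies: list of dicts with keys m, x, y, vx, vy.
    All-pairs O(n^2) force computation, then a semi-implicit Euler
    update (velocities first, then positions). eps softens the force
    to avoid division by zero when two bodies coincide.
    """
    accelerations = []
    for i, bi in enumerate(bodies):
        ax = ay = 0.0
        for j, bj in enumerate(bodies):
            if i == j:
                continue
            dx = bj["x"] - bi["x"]
            dy = bj["y"] - bi["y"]
            r2 = dx * dx + dy * dy + eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            ax += G * bj["m"] * dx * inv_r3
            ay += G * bj["m"] * dy * inv_r3
        accelerations.append((ax, ay))

    for b, (ax, ay) in zip(bodies, accelerations):
        b["vx"] += ax * dt
        b["vy"] += ay * dt
        b["x"] += b["vx"] * dt
        b["y"] += b["vy"] * dt
```

The outer loop over i is the natural candidate for distribution across processors, and the cost per body depends on how the bodies are partitioned, which is where load balancing comes in.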
Consider a 2-D rectangular area and a number n of random line
segments of some maximum length l in this area. Write the fastest
possible MPI program that returns the list of all intersections, each as a
triplet (segment1, segment2, intersection point), for large n. Do the
usual performance analysis describing what your different optimizations
were and what worked and what didn't.
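As a baseline to compare against, here is a minimal sequential sketch of the brute-force approach: test every pair of segments and report the triplets. The representation of a segment as a pair of (x, y) endpoints, the function names, and the choice to treat parallel or collinear segments as non-intersecting are assumptions made for illustration.

```python
from itertools import combinations


def intersection(s1, s2, tol=1e-12):
    """Return the intersection point of two segments, or None.

    Each segment is ((x1, y1), (x2, y2)). Parallel and collinear
    segments are treated as having no single intersection point.
    """
    (x1, y1), (x2, y2) = s1
    (x3, y3), (x4, y4) = s2
    dx1, dy1 = x2 - x1, y2 - y1
    dx2, dy2 = x4 - x3, y4 - y3
    denom = dx1 * dy2 - dy1 * dx2  # cross product of the two directions
    if abs(denom) < tol:
        return None
    # Solve p1 + t*d1 == p3 + u*d2 for the parameters t and u.
    t = ((x3 - x1) * dy2 - (y3 - y1) * dx2) / denom
    u = ((x3 - x1) * dy1 - (y3 - y1) * dx1) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * dx1, y1 + t * dy1)
    return None


def all_intersections(segments):
    """Brute-force O(n^2) list of (index1, index2, point) triplets."""
    result = []
    for (i, s1), (j, s2) in combinations(enumerate(segments), 2):
        point = intersection(s1, s2)
        if point is not None:
            result.append((i, j, point))
    return result
```

Since segments have a maximum length l, two segments that are far apart cannot intersect, which suggests partitioning the area spatially rather than testing all pairs.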
In this project you will verify the notion that DAG-scheduling heuristics
that base their scheduling decisions on the critical path are indeed more
effective than the standard MaxMin, MinMin, and Sufferage heuristics. This
will be done entirely in simulation (i.e., by constructing Gantt charts,
etc.). You will have to define a DAG generator that generates random DAGs
with given characteristics, and perhaps synthetic DAGs with idiosyncratic
properties to highlight the behavior of the different algorithms. The end
result will be a set of graphs plotting the performance of relevant
scheduling algorithms versus important DAG characteristics and perhaps
number of available processors. An important component of this project is
defining the experimental framework. Trying your own heuristics, or trying some
available in the literature, is obviously a great idea.
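Critical-path-based heuristics typically prioritize tasks by their "bottom level", i.e., the length of the longest path from the task to the end of the DAG. Here is a minimal sketch of that computation. It assumes the DAG is given as two dictionaries (task costs and successor lists), ignores communication costs, and uses recursion, so it is only meant for small DAGs. The function name and data layout are hypothetical.

```python
def bottom_levels(cost, succ):
    """Bottom level of every task in a DAG.

    cost: dict mapping task -> execution time.
    succ: dict mapping task -> list of successor tasks.
    The bottom level of a task is its own cost plus the largest bottom
    level among its successors (0 if it has none). Communication costs
    are ignored in this sketch.
    """
    memo = {}

    def bl(task):
        if task not in memo:
            memo[task] = cost[task] + max(
                (bl(s) for s in succ.get(task, [])), default=0
            )
        return memo[task]

    for task in cost:
        bl(task)
    return memo


if __name__ == "__main__":
    # Small diamond-shaped DAG: a -> b -> d and a -> c -> d.
    cost = {"a": 2, "b": 3, "c": 1, "d": 4}
    succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
    levels = bottom_levels(cost, succ)
    # Tasks sorted by decreasing bottom level give a priority list.
    print(sorted(levels, key=levels.get, reverse=True))
```

The largest bottom level is the critical path length, which is also a lower bound on the makespan of any schedule and is therefore useful for normalizing results across DAGs.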
henric@hawaii.edu