Parallel debugging with MPI

BASICS

mpirun can launch non-MPI programs, as long as those programs eventually launch LAM/MPI programs. Therefore, you can use mpirun to launch debuggers on remote nodes.

Alternatively (and maybe in combination), I often use cout and cerr statements liberally. The downsides of this approach are that output from different nodes can overlap and that this sort of I/O is slow.
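One way to make interleaved output easier to attribute is to tag each line with the rank that produced it. The following is a minimal sketch, not part of the original setup: a Bourne-shell wrapper (the script name tag_output.sh and the function tag_lines are made up for illustration) that assumes LAM sets LAMRANK for each process, as the csh script later on this page does. Piping through sed may change output buffering, so the ordering of lines between ranks is still not guaranteed.

```shell
#!/bin/sh
# tag_output.sh (hypothetical name): prefix each output line with the rank.

# tag_lines RANK: copy stdin to stdout, prepending "[rank RANK] " to each line.
tag_lines() {
    sed "s/^/[rank $1] /"
}

# When given a command, run it and tag its stdout and stderr.
# LAMRANK is assumed to be set by LAM; "?" is used when it is not.
if [ "$#" -gt 0 ]; then
    "$@" 2>&1 | tag_lines "${LAMRANK:-?}"
fi
```

It would be launched the same way as the other wrapper scripts on this page, for example with mpirun N tag_output.sh my_program_name.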

Launching a debugger for each rank: background

Since all ranks except Rank 0 have their stdin tied to /dev/null by default, you must start text-based debuggers (such as gdb) in separate X windows.

For text debuggers, you will need a short shell script that launches the debugger in an xterm (or in another terminal program such as konsole or gnome-terminal, since not all systems have xterm). You then start that script on every node with mpirun. For example:

    % mpirun N -x DISPLAY run_gdb.csh my_program_name

The shell script run_gdb.csh needs to be in your path (I usually put it in ~/bin), and my_program_name is the name of your LAM/MPI executable. An example run_gdb.csh is shown below:

    #!/bin/csh -f

    echo "Running GDB on node `hostname`"
    xterm -e gdb $*
    exit 0

Also note that the DISPLAY environment variable is exported to the remote processes by the -x DISPLAY option to mpirun. This is necessary so that the remote processes know where to send the X display of the xterm.
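Because a missing DISPLAY makes the xterm fail without much explanation, it can help to check for it before starting the debugger. The following is a sketch of a Bourne-shell variant of run_gdb.csh under that assumption; the function name require_display is made up for illustration, and the rest mirrors the csh script above.

```shell
#!/bin/sh
# Bourne-shell sketch of run_gdb.csh with an up-front DISPLAY check.

# require_display VALUE: succeed only if VALUE (normally $DISPLAY) is non-empty.
require_display() {
    if [ -z "$1" ]; then
        echo "DISPLAY is not set on $(uname -n); was it exported with -x DISPLAY?" >&2
        return 1
    fi
}

# When given a program name, open gdb on it in its own xterm.
if [ "$#" -gt 0 ]; then
    require_display "${DISPLAY:-}" || exit 1
    echo "Running GDB on node $(uname -n)"
    exec xterm -e gdb "$@"
fi
```

The check only confirms that the variable is set; it does not verify that the X server accepts the connection.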

NB: At this point, I have only had success with this procedure on local machines, so I recommend using a single machine for it. The buzzard cluster does not forward X11 correctly, I think because the NFS-mounted home directory setup confuses xauth.

Once gdb starts, set whatever breakpoints you need and begin execution in each window with the run command followed by the program's command-line arguments, e.g.:

    run -mpi -f script0

Alternatively, it is sometimes easier to debug one process at a time. The following script runs rank 0 under gdb and lets the other ranks run normally:

    #!/bin/csh -f

    if ("$LAMRANK" == "0") then
      gdb $*
    else
      $*
    endif
    exit 0
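The same idea can be extended to debug a rank other than 0. The following is a sketch in Bourne shell, not the original script: DEBUG_RANK is a hypothetical environment variable (defaulting to 0) and launch_prefix is a made-up helper name. Because only rank 0 has a usable stdin, the sketch assumes any other rank needs gdb in its own xterm, which in turn needs DISPLAY exported as described earlier.

```shell
#!/bin/sh
# Sketch: run one chosen rank under gdb and all other ranks normally.

# launch_prefix RANK TARGET: print the command prefix for this rank.
#   rank is not the target -> nothing (run the program directly)
#   target is rank 0       -> gdb (rank 0 keeps its stdin)
#   target is another rank -> xterm -e gdb (stdin is /dev/null there)
launch_prefix() {
    if [ "$1" != "$2" ]; then
        return 0
    elif [ "$2" = "0" ]; then
        echo "gdb"
    else
        echo "xterm -e gdb"
    fi
}

# LAMRANK is assumed to be set by LAM; DEBUG_RANK is hypothetical.
if [ "$#" -gt 0 ]; then
    # Word splitting of the prefix is intentional here.
    exec $(launch_prefix "${LAMRANK:-}" "${DEBUG_RANK:-0}") "$@"
fi
```

DEBUG_RANK would have to be set in your environment and exported to the remote processes in the same way as DISPLAY. With it unset, the behavior matches the csh script above.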

Send suggestions, questions, and feedback to WEINBERG at ASTRO dot UMASS dot EDU.
Documentation generated at Fri Mar 26 00:35:11 2010 by doxygen