
In this lesson, I will show you a basic MPI Hello World application and also discuss how to run an MPI program. The lesson will cover the basics of initializing MPI and running an MPI job across several processes. This lesson is intended to work with installations of MPICH2 (specifically 1.4). If you have not installed MPICH2, please refer back to the installing MPICH2 lesson.

MPI Hello World

First of all, the source code for this lesson can be downloaded here or can be viewed/cloned on GitHub. Download it, extract it, and change to the example directory. The directory should contain three files: makefile, mpi_hello_world.c, and run.perl. Here is the output from my terminal for downloading and extracting the example code.

>>> wget http://www.mpitutorial.com/lessons/mpi_hello_world.tgz
--2011-06-20 19:33:54-- http://www.mpitutorial.com/lessons/mpi_hello_world.tgz
Resolving www.mpitutorial.com... 50.56.34.184
Connecting to www.mpitutorial.com|50.56.34.184|80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1162 (1.1K) [application/x-gzip]
Saving to: `mpi_hello_world.tgz'
100%[============================>] 1,162 --.-K/s in 0s
2011-06-20 19:33:54 (222 MB/s) - `mpi_hello_world.tgz' saved [1162/1162]
>>> tar -xzf mpi_hello_world.tgz
>>> cd mpi_hello_world
>>> ls
makefile  mpi_hello_world.c  run.perl

Open the mpi_hello_world.c source code. Below are some excerpts from the code.

#include <mpi.h>
#include <stdio.h>
 
int main(int argc, char** argv) {
  // Initialize the MPI environment
  MPI_Init(NULL, NULL);
 
  // Get the number of processes
  int world_size;
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);
 
  // Get the rank of the process
  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
 
  // Get the name of the processor
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  MPI_Get_processor_name(processor_name, &name_len);
 
  // Print off a hello world message
  printf("Hello world from processor %s, rank %d"
         " out of %d processors\n",
         processor_name, world_rank, world_size);
 
  // Finalize the MPI environment.
  MPI_Finalize();
}

You will notice that the first step to building an MPI program is including the MPI header files with #include <mpi.h>. After this, the MPI environment must be initialized with:

MPI_Init(int *argc, char ***argv)

During MPI_Init, all of MPI's global and internal variables are constructed. For example, a communicator is formed around all of the processes that were spawned, and unique ranks are assigned to each process. The two arguments are pointers to argc and argv; some implementations use them to process and strip out command-line arguments that the launcher added for its own use. Our program does not rely on this, so it simply passes NULL for both.
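
If you would rather forward the actual command-line arguments instead of passing NULL, the call looks like this (MPI accepts either form):

int main(int argc, char** argv) {
  // Passing the addresses of argc and argv lets the MPI
  // implementation inspect, and possibly remove, arguments
  // that the launcher added for its own use.
  MPI_Init(&argc, &argv);
  // ... the rest of the program is unchanged ...
  MPI_Finalize();
}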

After MPI_Init, there are two main functions that are called. These two functions are used in almost every single MPI program that you will write.

MPI_Comm_size(MPI_Comm communicator, int* size)

MPI_Comm_size returns the size of a communicator. In our example, MPI_COMM_WORLD (which is constructed for us by MPI) encloses all of the processes in the job, so this call should return the number of processes that were requested for the job.

MPI_Comm_rank(MPI_Comm communicator, int* rank)

MPI_Comm_rank returns the rank of a process in a communicator. Each process inside of a communicator is assigned an incremental rank starting from zero. The ranks of the processes are primarily used for identification purposes when sending and receiving messages.
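
For example, programs commonly branch on the rank so that one process, conventionally rank zero, takes a coordinating role. Here is a minimal sketch, reusing the world_rank and world_size variables from the program above:

if (world_rank == 0) {
  // By convention, rank 0 often acts as the "root" of the job.
  printf("I am the root process of %d\n", world_size);
} else {
  printf("I am worker %d of %d\n", world_rank, world_size - 1);
}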

A miscellaneous and less-used function in this program is:

MPI_Get_processor_name(char* name, int* name_length)

MPI_Get_processor_name obtains the actual name of the processor on which the process is executing. The final call in this program is:

MPI_Finalize()

MPI_Finalize is used to clean up the MPI environment. No more MPI calls can be made after this one.
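
One detail the excerpt does not show: every MPI routine, including MPI_Finalize, returns an integer error code, with MPI_SUCCESS meaning success. The default error handler aborts the job on failure, so explicit checks rarely fire, but a defensive sketch would look like this:

// This check only matters if the error handler on MPI_COMM_WORLD has
// been changed to MPI_ERRORS_RETURN; by default, MPI aborts on error
// before the call can return.
if (MPI_Finalize() != MPI_SUCCESS) {
  fprintf(stderr, "MPI_Finalize reported an error\n");
  return 1;
}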

Running MPI Hello World

Now compile the example by typing make. My makefile looks for the MPICC environment variable. If you installed MPICH2 to a local directory, set your MPICC environment variable to point to your mpicc binary. The mpicc program in your installation is really just a wrapper around gcc, and it makes compiling and linking all of the necessary MPI routines much easier.

>>> export MPICC=/home/kendall/bin/mpicc
>>> make
/home/kendall/bin/mpicc -o mpi_hello_world mpi_hello_world.c
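
In case you want to build without the provided makefile, a minimal one along these lines should work. This is a hypothetical sketch; the makefile in the download may differ:

# Hypothetical minimal makefile; MPICC should point at your mpicc
# wrapper (note that the recipe line must be indented with a tab).
mpi_hello_world: mpi_hello_world.c
	$(MPICC) -o mpi_hello_world mpi_hello_world.c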

After your program is compiled, it is ready to be executed. Now comes the part where you might have to do some additional configuration. If you are running MPI programs on a cluster of nodes, you will have to set up a host file. If you are simply running MPI on a laptop or a single machine, disregard the next piece of information.

The host file contains the names of all of the computers on which your MPI job will execute. For ease of execution, you should be sure that all of these computers have SSH access, and you should also set up an authorized_keys file to avoid a password prompt for SSH. My host file looks like this.

>>> cat host_file
cetus1
cetus2
cetus3
cetus4

For the run script that I have provided in the download, you should set an environment variable called MPI_HOSTS and have it point to your host file. My script will automatically include it on the command line when the MPI job is launched. If you do not need a host file, simply do not set the environment variable. Also, if you have a local installation of MPI, you should set the MPIRUN environment variable to point to the mpirun binary from that installation. After this, call ./run.perl mpi_hello_world to run the example application.

>>> export MPIRUN=/home/kendall/bin/mpirun
>>> export MPI_HOSTS=host_file
>>> ./run.perl mpi_hello_world
/home/kendall/bin/mpirun -n 4 -f host_file ./mpi_hello_world
Hello world from processor cetus2, rank 1 out of 4 processors
Hello world from processor cetus1, rank 0 out of 4 processors
Hello world from processor cetus4, rank 3 out of 4 processors
Hello world from processor cetus3, rank 2 out of 4 processors

As expected, the MPI program was launched across all of the hosts in my host file. Each process was assigned a unique rank, which was printed along with its processor name. As you can see from the example output, the processes print in an arbitrary order, since there is no synchronization before the printf call.
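
If you want the greetings to come out in rank order, one common trick is to take turns printing, separated by barriers. Treat this as a sketch rather than a guarantee: the MPI runtime buffers and forwards each process's output on its own schedule, so lines can still interleave.

// Each rank prints only on its own iteration, and the barrier keeps
// faster ranks from racing ahead. Output ordering is still ultimately
// up to the runtime's stdout forwarding, so this is best effort only.
int i;
for (i = 0; i < world_size; i++) {
  if (i == world_rank) {
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);
    fflush(stdout);
  }
  MPI_Barrier(MPI_COMM_WORLD);
}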

Notice how the script called mpirun. This is the program that the MPI implementation uses to launch the job. Processes are spawned across all the hosts in the host file, and the MPI program executes on each of them. My script automatically supplies the -n flag to set the number of MPI processes to four. Try changing the run script and launching more processes! Just don't accidentally crash your system. :-)

Now you might be asking, “My hosts are actually dual-core machines. How can I get MPI to spawn processes across the individual cores before spreading across machines?” The solution is pretty simple: just modify your host file and place a colon and the number of cores per machine after each host name. For example, here I specify that each of my hosts has two cores.

>>> cat host_file
cetus1:2
cetus2:2
cetus3:2
cetus4:2

When I execute the run script again, voilà, the MPI job spawns two processes on each of only two of my hosts.

>>> ./run.perl mpi_hello_world
/home/kendall/bin/mpirun -n 4 -f host_file ./mpi_hello_world
Hello world from processor cetus1, rank 0 out of 4 processors
Hello world from processor cetus2, rank 2 out of 4 processors
Hello world from processor cetus2, rank 3 out of 4 processors
Hello world from processor cetus1, rank 1 out of 4 processors

Up Next

Now that you have a basic understanding of how an MPI program is executed, it is time to learn fundamental point-to-point communication routines. In the next lesson, I cover basic sending and receiving routines in MPI. Feel free to also examine the beginner MPI tutorial for a complete reference of all of the beginning MPI lessons.

Having trouble? Confused? Feel free to leave a comment below and perhaps I or another reader can be of help.


12 Responses to “MPI Hello World”

  1. mwood

    Hi Wes,

    Really like the site so far! I am trying to run a simple MPI program on two computers, one at my house, the other at school. I have SSH'd into the school computer, set the secret password, started mpd, and transferred the program file over. When I try to run mpirun with the above options, specifically the -f flag, it spits back errors and brings up the help screen. It seems like mpirun does not like the way I am trying to use the -f flag.

    At any rate, I can run the program locally just fine, but have not been able to get it working over ssh. Any thoughts?

    Thanks for your time

    • Wes

      Hey Matt, apologies for the really late reply to your question. I have never tried to run an MPI program over a local and remote computer. I am assuming that MPI is getting confused on finding the executable though. Normally people will execute MPI programs over a networked file system of some sort so that all of the processes can find the executable in the same place.

  2. David

    First: Thanks for putting this tutorial together.

    But…
    I’ve set up a 4-node system using Raspberry Pis. The basic tests (hostname and example/cpi) work fine.

    When I run the hello world program on mpmaster, I get this:
    ————————-
    mpirun was unable to launch the specified application as it could not access
    or execute an executable:

    Executable: ./mpi_hello_world
    Node: mpnode2

    while attempting to start process rank 1.
    ———
    INFO:
    from running env:
    MPI_HOSTS=/home/sysop/dev/hostsfile

    sysop@mpmaster:~/dev$ cat hostsfile
    mpmaster
    mpnode2
    mpnode3
    mpnode4

    And I can ssh without a passphrase into the other three nodes.

    One other “strange” thing: I am using MPICH2 1.5. It doesn’t like the -f parameter. I had to change that to -machinefile. Don’t know why, just the -f doesn’t work.

    I was reading somewhere else and it said to scp the executable to all the nodes. Is that correct? Does the executable have to be on all the nodes?

    • Wes

      Hey David, that is very strange that the -f flag doesn’t work for you. I just installed MPICH2 1.5 and the -f flag still works for me. I will update all of my code to use the -machinefile flag instead.

      Regarding your question – Yes, mpirun needs to have access to the executable on all of the nodes. I’m not positive if copying the file to all of the nodes will work, but I would recommend trying that (and make sure to copy the executable to the same location across all nodes). If that doesn’t work, I would recommend setting up NFS on the nodes or setting up PVFS (http://www.pvfs.org/). Unfortunately I have never set up a networked file system and only have limited experience setting up a parallel file system on clusters. Also, in your case, I believe you are restricted to using only the SD drive on your Raspberry Pi, right? I’m really interested to know if that presents any restrictions in setting up a networked or parallel file system of some sort.

      Keep me updated on your progress, and sorry I couldn’t have been more help. I’m really interested to know how your Raspberry Pi cluster works out!

  3. Rahul

    Hey Wes, if you are still answering questions on this site… can I run the hello world MPI program on a virtual cluster, say 2 instances of Linux installed on VirtualBox? Also, I am having trouble with the MPICC environment variable. Can you help me out?

    • Wes

      Hello Rahul, MPI programs can still be executed on virtual machines running Linux. Just make sure that you have set up your host file properly and that you can SSH into the Linux machines via a key (rather than typing in a password). Let me know if there is anything else I can do to help.

      • Rahul

        Hey Wes, I created an SSH key to log in to the other machine and I was able to ssh into it, but when I run the program I get the following error:
        /usr/local/bin/hydra_pmi_proxy: No such file or directory

  4. Ben

    So far, so good, with some slight modifications. I thought I would share a couple of minor things I had to do differently…
    1) On the Linux cluster I’m using, there are some hosts that I can connect to and others that I simply can’t!
    2) Here’s a site that was helpful for me figuring out password-less ssh access to the remote hosts, http://andrew.triumf.ca/pssh/linux-ssh.html.
    3) Each time I log into a remote machine to run mpi processes, I have to enter “export MPI_HOSTS=host_file”.
    4) I had to modify the perl scripts to avoid the same problem with “-f” that David had (Open MPI version 1.4.3). Just edited the $ARGV inputs so they correspond to the number of nodes used and the name of the program to run, and the last two lines changed to “$mpirun -n $program_nodes $hosts ./$program_to_run”. Now I enter “./run.perl 5 send_recv”, for example, to run the send_recv program using 5 processes (nodes).
    5) Last, maybe obvious to some, but in the host_file file, you need the full host name, such as “hydra4.some.hostname.edu”.
    Hope this helps some folks out.

  5. Ben

    …continued…
    on point (4)… you also have to modify line 17 so that “$host” returns “-hostfile host_file” in the output. Check the required syntax of Open MPI 1.4.3 by typing “mpirun” and you will see that there is no “-f” argument, but that it does require “-hostfile” followed by the absolute or relative path to “host_file”.

