
How do I run MMC across a cluster in parallel?

As you may already know, MMC supports multi-core processors through multi-threaded computing. This means that if you launch MMC on a single PC with a multi-core CPU, it will execute several parallel threads to use all available resources and accelerate the computation. However, this is limited to SMP (shared-memory processor) systems, such as a stand-alone PC. If you have a distributed-memory system, such as a cluster, MMC must be launched across the network.

GNU parallel is a free software tool that helps you run multiple MMC simulations in parallel on a multi-core PC or a distributed system (such as a cluster). Below we give examples of how to use this tool to do parallel computing over a cluster.

install GNU parallel

To use GNU parallel, you first need to install it, as it is not shipped by all Unix/Linux distributions (on many modern distributions you can also get it from the package manager, e.g. "sudo apt-get install parallel"). You do not have to be an administrator to install it manually. Here are the commands:

 cd /tmp
 wget http://ftp.gnu.org/gnu/parallel/parallel-20110205.tar.bz2
 tar jxvf parallel-20110205.tar.bz2
 cd parallel-20110205

You can find an executable named "parallel" under the src/ directory. It is a Perl script and can be readily copied and used.

You now need to copy the parallel script to a common folder, for example "~/bin/", and add the path of this folder to your PATH environment variable (typically by editing your shell startup file, such as ~/.bashrc).
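For example, in bash you can append the folder to your PATH as follows (assuming you copied the script to ~/bin):

```shell
# Add ~/bin to the search path for the current session
export PATH="$HOME/bin:$PATH"

# Make the change permanent for future bash sessions
echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc

# Verify that the shell can now find the parallel script
command -v parallel || echo "parallel not found in PATH"
```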

After updating your PATH variable, start a new console window, and type

  parallel --help

If you see the usage information, then parallel has been installed successfully.

prepare your MMC session

Of course, you have to prepare the necessary input files for the MMC session. This includes the mesh files and the input files. It is recommended to do this in a separate folder for each simulation, so the output files won't get mixed up or overwritten. You can find more details on this page.

prepare the server list

GNU parallel can launch jobs across the network using ssh remote execution. You can specify the servers with the command-line options -S (or --sshlogin) or --sshloginfile. In the latter case, you simply create an ASCII file with each row being the hostname of a computer that you want to use. If a computer requires a separate login, you can specify the protocol and username as well. Example host files can be found in the manual.
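A minimal sshloginfile might look like the following (the hostnames and username here are placeholders; the optional "4/" prefix is GNU parallel's syntax for declaring how many jobs that host may run):

```shell
# Create an example server list; one host per row.
# Plain rows are hostnames; "user@host" supplies a login name,
# and "4/host" caps that host at 4 simultaneous jobs.
cat > nodes.txt <<'EOF'
node01
node02
user@node03
4/node04
EOF
```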

If you don't know the names of the node computers in your cluster, you can browse the /etc/hosts file, where the hostnames are typically listed.

run parallel simulations

You are now ready to launch a distributed MMC simulation. Here is a set of example commands (assuming your shell is bash):

 MMCROOT=/home/fangq/Project/mmc
 seq -f "%02g" 1 10 | parallel -j1 --sshloginfile nodes.txt --progress \
 "cd $MMCROOT/examples/meshtest && $MMCROOT/src/bin/mmc -f sph1.inp -s sph1_{} -n 1000000 -b 1 -E 12345678{} -D T"

Let me explain each part of the above command. The first line

 MMCROOT=/home/fangq/Project/mmc
defines a variable named "MMCROOT" pointing to the full path of the MMC root directory. This has two uses: 1) you don't have to type the path again in later commands, and 2) using the full path ensures a unique execution path when logging in to the remote servers (here we assume all nodes in your cluster share a common file system).

The first part of the second line

 seq -f "%02g" 1 10
generates a list of 10 zero-padded strings, "01" through "10". Each string corresponds to one job.
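You can run the seq command by itself to see the list it produces:

```shell
# Print 10 zero-padded job labels, one per line: 01, 02, ..., 10
seq -f "%02g" 1 10
```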

The second part of the second line

 parallel -j1 --sshloginfile nodes.txt --progress ...
launches the parallel jobs. The "-j1" option tells parallel to run 1 job per server at a time. Because the mmc binary is multi-threaded, running 1 job per node is sufficient to use all of its resources. If you happen to have a single-threaded mmc binary, you should instead use "-j0", which tells parallel to launch as many jobs as there are CPU cores on each server.

The "--sshloginfile nodes.txt" option tells parallel to read the server name list from a file, nodes.txt. The content of this file is

 fangq@launchpad:[meshtest]$ cat nodes.txt 
 c0-90
 c0-91
 c0-92
 c0-93
 c0-94

Each row is a computer name. You may decide how many servers you want to use; here we use only 5 servers in this example.

The --progress flag is optional; it prints progress information while the jobs are running.

The third line is the real command for the simulation.

 "cd $MMCROOT/examples/meshtest && $MMCROOT/src/bin/mmc -f sph1.inp -s sph1_{} -n 1000000 -b 1 -E 12345678{} -D T"
You quote it with double quotes ("") so that the whole string is sent to each server and executed there. The "cd $MMCROOT/examples/meshtest" command simply tells parallel to change directory to the test folder after logging on to the remote server. Then the mmc command runs as it normally would:
 $MMCROOT/src/bin/mmc -f sph1.inp -s sph1_{} -n 1000000 -b 1 -E 12345678{} -D T
The only difference is the "{}" placeholder. When running under parallel, each occurrence of {} is replaced by one element from the list piped in from the "seq" command; in this case, the strings "01" through "10".
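You can mimic this substitution with a plain bash loop to preview the commands that will be generated (the echo here is a stand-in for the real mmc command line):

```shell
# Mimic parallel's {} substitution with a plain bash loop;
# each label from seq is spliced into the -s and -E arguments
seq -f "%02g" 1 3 | while read i; do
    echo "mmc -f sph1.inp -s sph1_${i} -E 12345678${i}"
done
```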

To put everything together, you are now telling parallel to run 10 jobs (as defined by the seq output) over 5 remote servers (as defined in nodes.txt). The full job list is as follows (it can be printed by inserting --dry-run after the parallel command)

 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_01 -n 1000000 -b 1 -E 1234567801 -D T
 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_02 -n 1000000 -b 1 -E 1234567802 -D T
 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_03 -n 1000000 -b 1 -E 1234567803 -D T
 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_04 -n 1000000 -b 1 -E 1234567804 -D T
 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_05 -n 1000000 -b 1 -E 1234567805 -D T
 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_06 -n 1000000 -b 1 -E 1234567806 -D T
 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_07 -n 1000000 -b 1 -E 1234567807 -D T
 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_08 -n 1000000 -b 1 -E 1234567808 -D T
 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_09 -n 1000000 -b 1 -E 1234567809 -D T
 cd .../meshtest && .../bin/mmc -f sph1.inp -s sph1_10 -n 1000000 -b 1 -E 1234567810 -D T

The simulation specified by the input file sph1.inp will be executed 10 times. Each time, the output files are labeled differently (by the -s option) to avoid overwriting, and for each simulation the random number generator (RNG) seed is set explicitly by the -E option. Here I used "..." to save space, but in a real case you need to make sure those are full path names. Also make sure the seeds are different for all the simulations; otherwise you are just repeating the same simulation 10 times, which is not helpful. You can also set the seed in the sph1.inp file to -1, which lets MMC automatically generate a seed at run time based on the system clock. However, you won't know what the actual seed was, so it will not be possible to repeat the simulation exactly.

Now let's see how parallel executes this command. Because we have 5 servers and 10 jobs in total, parallel will first launch one job on each of the 5 servers (as specified by -j1). Whenever one of these jobs completes, parallel submits a new job from the unfinished ones, until no jobs are left. Every time a job finishes, parallel prints its command-line output so you can see if there were any errors. When all jobs complete, parallel gives a short summary. In the end, you will see 10 sets of output files in the working folder:

 fangq@launchpad:[meshtest]$ ls -S sph1_*
 sph1_01.dat  sph1_03.dat  sph1_05.dat  sph1_07.dat  sph1_09.dat
 sph1_01.mch  sph1_03.mch  sph1_05.mch  sph1_07.mch  sph1_09.mch
 sph1_02.dat  sph1_04.dat  sph1_06.dat  sph1_08.dat  sph1_10.dat
 sph1_02.mch  sph1_04.mch  sph1_06.mch  sph1_08.mch  sph1_10.mch

These are the flux output (.dat files) and detected photon output (.mch files) from the 10 completed jobs. Now you can load the flux data in MATLAB/Octave and average them to produce the final solution. For the detected photon data, you can simply load the .mch files in MATLAB and concatenate the data sections.
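If the .dat files happen to be plain single-column ASCII text with identical row ordering (an assumption; check the actual output format of your MMC build), a quick way to compute the row-wise average without MATLAB is a paste/awk one-liner:

```shell
# Sketch: row-wise average across all sph1_*.dat files, assuming each
# is single-column ASCII with the same number of rows in the same order.
# paste joins the files column by column; awk averages each row.
paste sph1_*.dat | awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s / NF }' > sph1_avg.dat
```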
