As the name indicates, this example runs a short simulation to test the command-line options built into MCX.
To run this example, you first need to compile the mcx binary. Then you can run any of the following shell scripts:
A script that calls mcx to print the GPU information only. Run this script to find out how many GPUs your graphics card has and their parameters (memory, cores, etc.).
This runs MCX for a simple homogeneous medium, the same example used in the Fang2009 paper. It launches only about 1 million photons and takes roughly half a second.
This script does the same thing as run_qtest.sh, except that it lets mcx print messages to a log file rather than to the screen (so-called silent mode).
In this example, we benchmark the speed of the MCX code with different combinations of compilation options, numbers of photons, and atomic/non-atomic memory access.
The detailed report can be found in the paper Fang2009 Section 3.4 and Fig. 7.
To run this example, you first need to compile the mcx binary, and then run the "runspeedbench.sh > speed.log" command. It may take up to 10 minutes to run all the tests. To make sure nothing goes wrong, it is a good idea to run runspeedbench.sh without the redirection first and check that everything compiles and runs properly.
When the simulation finishes, run "genreport.sh speed.log" to get a tabulated report for all conditions. From this, you can reproduce the data used for Fig. 7 in Fang2009. In the output, the first column is the execution time (in ms), the 2nd column is the total number of photons simulated, the 3rd column is the specified number of photon moves per thread, and the last column is the thread number.
For some simulations with more threads, you may encounter the "kernel launch timed-out" error. Please read doc/FAQ.txt to find out why. If you have configured a dedicated GPU, you can edit the runspeedbench.sh script and uncomment the test settings for the dedicated cases.
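As a rough illustration of how the four columns described above can be used, the following Python sketch parses one report row in that column order and derives the number of photons simulated per millisecond. The whitespace-separated layout is an assumption about the genreport.sh output, the names BenchRow, parse_row and photons_per_ms are hypothetical, and the sample row is a made-up placeholder, not a measured result.

```python
from typing import NamedTuple


class BenchRow(NamedTuple):
    time_ms: float          # column 1: execution time in ms
    photons: int            # column 2: total number of photons simulated
    moves_per_thread: int   # column 3: specified photon moves per thread
    threads: int            # last column: thread number


def parse_row(line: str) -> BenchRow:
    """Parse one whitespace-separated report row (layout is assumed)."""
    fields = line.split()
    return BenchRow(float(fields[0]), int(fields[1]),
                    int(fields[2]), int(fields[-1]))


def photons_per_ms(row: BenchRow) -> float:
    """Throughput derived from columns 1 and 2."""
    return row.photons / row.time_ms


# Placeholder row for illustration only, NOT real benchmark data.
sample = "500.0 1000000 1000 1024"
row = parse_row(sample)
print(row.threads, photons_per_ms(row))
```

Comparing this throughput across rows is one way to see how the thread number and the compilation options affect speed.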
In this example, we validate the MCX code with a homogeneous medium defined by a 60x60x60 uniform grid. The medium has mua=0.005/mm, musp=1/mm, anisotropy g=0.01 and refractive index n=1 (without reflection) or n=1.37 (with reflection).
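For orientation, the parameters above can be turned into two derived quantities with textbook relations: the scattering coefficient mus = musp/(1-g), and the effective attenuation coefficient of the diffusion approximation, mu_eff = sqrt(3*mua*(mua+musp)). The Python sketch below only does this arithmetic. The function names are hypothetical, and whether the validation scripts use these exact quantities is an assumption, not something stated here.

```python
import math

mua = 0.005   # absorption coefficient in 1/mm (from the text)
musp = 1.0    # reduced scattering coefficient in 1/mm (from the text)
g = 0.01      # anisotropy (from the text)


def scattering_coefficient(musp: float, g: float) -> float:
    """mus from the reduced scattering coefficient: musp = mus * (1 - g)."""
    return musp / (1.0 - g)


def effective_attenuation(mua: float, musp: float) -> float:
    """mu_eff of the standard diffusion approximation (textbook formula)."""
    return math.sqrt(3.0 * mua * (mua + musp))


mus = scattering_coefficient(musp, g)
mueff = effective_attenuation(mua, musp)
print(f"mus = {mus:.4f}/mm, mu_eff = {mueff:.4f}/mm")
```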
The detailed report can be found in the paper Fang2009 Section 3.2 and Fig. 5.
To run this example, you first need to compile the mcx binary, and then run the run_validation.sh and run_validation_b.sh scripts to run simulations for n=1 and n=1.37, respectively. Then start GNU Octave or Matlab and run the plotsimudata.m script to reproduce Fig. 5 in Fang2009.
In this example, we qualitatively assess the severity of the race condition in the non-atomic version of MCX.
To do this test, we use a register variable in each thread to count the events that the thread tries to write to the global memory. The sum of these values is the total number of "true" write events. At the same time, we add 1 to the global memory whenever a write event happens, so summing these counts over the output global memory space gives the number of "actually recorded" write events. The difference between the two sums is due to write conflicts caused by race conditions.
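The counting scheme can be mimicked on the CPU with a toy model. The Python sketch below is not the MCX kernel: it assumes that in each round every thread writes to one randomly chosen voxel, and that when several threads hit the same voxel in the same round only one increment survives. The function simulate_miscount and all of its parameters are hypothetical.

```python
import random


def simulate_miscount(threads: int, voxels: int, rounds: int, seed: int = 0):
    """Return (true_events, recorded_events) for a toy non-atomic write model."""
    rng = random.Random(seed)
    true_events = 0       # sum of the per-thread "register" counters
    recorded_events = 0   # sum of the counts left in "global memory"
    for _ in range(rounds):
        # Each thread picks the voxel it tries to write to in this round.
        targets = [rng.randrange(voxels) for _ in range(threads)]
        true_events += len(targets)
        # Assumed collision model: writes that land on the same voxel in the
        # same round overwrite each other, leaving a single increment.
        recorded_events += len(set(targets))
    return true_events, recorded_events


true_n, rec_n = simulate_miscount(threads=128, voxels=1000, rounds=200)
print("missing ratio:", (true_n - rec_n) / true_n)
```

In this model the missing ratio grows as more threads compete for fewer voxels, which is the kind of trend the test is designed to expose.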
The detailed report can be found in the paper Fang2009 Section 3.2 and Fig. 4.
To run this example, you need to compile the mcx binary first with "make racing", and then run the "runmiscount.sh > miscnt.log" command. This tests the missing-event ratios for a range of scattering coefficients at different thread numbers (note that the absorption coefficient does not influence the race condition). If you redirect the output of runmiscount.sh to a text file, you can use the genreport.sh script to get a tabulated report from the log file.
Although we have shown that for biological tissues the error due to non-atomic memory writes is negligible, for cases where accuracy is critically important we also proposed an approach to further reduce the impact of race conditions in the non-atomic version of MCX (see paragraph 4 in Section 4 of Fang2009). Here we call this approach the "bubble mode".
In the "bubble mode", the user specifies a radius using the -R flag. When a photon is within this radius of the source, we accumulate the absorbed energy in a register variable rather than writing it to the global memory. This example shows how to use this flag and what kind of improvement to expect.
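The routing decision described above can be sketched in a few lines of Python. This is a simplified CPU illustration under stated assumptions, not the MCX CUDA code: the function deposit, the dictionary standing in for the global memory, and the use of a straight-line distance test are all hypothetical.

```python
import math


def deposit(pos, src, weight, radius, grid, register):
    """Route one absorption event and return the updated register value.

    Inside the bubble (distance to the source < radius) the energy goes to
    the per-thread register; outside, it is written to the shared grid.
    """
    if math.dist(pos, src) < radius:
        return register + weight
    voxel = tuple(int(c) for c in pos)   # voxel index of the photon position
    grid[voxel] = grid.get(voxel, 0.0) + weight
    return register


source = (30.0, 30.0, 0.0)   # hypothetical source position, in grid units
volume = {}                  # stands in for the global memory
reg = 0.0                    # stands in for the register variable
reg = deposit((30.0, 30.0, 1.0), source, 0.5, 3.0, volume, reg)    # inside
reg = deposit((30.0, 30.0, 10.0), source, 0.25, 3.0, volume, reg)  # outside
print(reg, volume)
```

Setting the radius to 0 sends every event to the shared grid, which corresponds to running without a bubble.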
To run this example, you first need to compile the mcx binary, and then run the run_bubble.sh script to produce simulation results for bubble sizes of 0 mm, 3 mm and 5 mm, respectively. Then start GNU Octave or Matlab, run the mcxskiptest.m script and compare the solutions. Note that setting the bubble size above 3 mm effectively avoids over 99% of the race conditions.
Because the atomic/non-atomic difference is so tiny, you have to zoom in at high magnification to see it. Generally, a bigger bubble gives a slightly smaller solution, because avoiding the race conditions makes the denominator in Eq. (1) of Fang2009 slightly bigger.
In this example, we test the speed performance of the two RNGs, MT and LL5, used in MCX.
To start, run the runbench.sh script; the speed information will be printed.
[Fang2009] Qianqian Fang and David A. Boas, "Monte Carlo Simulation of Photon Migration in 3D Turbid Media Accelerated by Graphics Processing Units," Opt. Express, vol. 17, issue 22, pp. 20178-20190 (2009)