Release Notes for Monte Carlo eXtreme - OpenCL v2023 (1.0)

code name: Eternity, released on September 28, 2023

Click this link to download MCX-CL/MCXLAB-CL v2023

Acknowledgement: This software release is made possible by funding support from the NIH/NIGMS under grant R01-GM114365.

1. What's New
2. Introduction
3. System requirements
4. Reference

1. What's New

The last official MCX-CL release was v2020, nearly 3 years ago. Many new features have been implemented in MCX/MCX-CL since then. Some of the key updates in the v2023 release of MCX-CL are listed below.

MCX-CL v2023 is significantly faster than previous releases due to two major updates. First, on NVIDIA GPUs, a native PTX-based atomic floating-point addition is used, yielding over 30% acceleration on NVIDIA hardware. Second, the highly efficient DDA (Digital Differential Analyzer) ray-marching algorithm developed for MCX was ported to MCX-CL, bringing up to 40% speedup in certain benchmarks such as cube60. Moreover, MCX-CL v2023 provides an official Python module (pmcxcl) to run streamlined MCX-CL simulations in Python, offering an intuitive mcxlab-like interface. During the past year, a large effort was devoted to building automated continuous integration (CI) pipelines using GitHub Actions, allowing us to automatically create, test, and distribute portable packages across Linux, Windows, and macOS, and for the MATLAB, GNU Octave, and Python environments.
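
For example, a minimal simulation can be launched directly from Python. The sketch below is only illustrative: it assumes that pmcxcl mirrors pmcx's run() keyword interface and the standard mcxlab field names (nphoton, vol, prop, srcpos, etc.); please consult the pmcxcl documentation for the exact API.

  # minimal sketch of a pmcxcl simulation (assuming pmcxcl mirrors pmcx's run() interface)
  import numpy as np
  import pmcxcl

  res = pmcxcl.run(
      nphoton=1000000,                           # number of photons to launch
      vol=np.ones([60, 60, 60], dtype='uint8'),  # 60x60x60 homogeneous volume labeled 1
      tstart=0, tend=5e-9, tstep=5e-9,           # a single 5 ns time gate
      srcpos=[30, 30, 0], srcdir=[0, 0, 1],      # pencil beam entering the bottom face
      prop=[[0, 0, 1, 1],                        # label 0: background [mua, mus, g, n]
            [0.005, 1, 0.01, 1.37]])             # label 1: tissue-like medium
  print(res['flux'].shape)                       # fluence volume, one frame per time gate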

Starting in MCX-CL v2023, we have completed the migration from MCX-specific binary output formats (.mc2/.mch) to the human-readable, extensible and future-proof JSON-based portable data formats defined by the NeuroJSON project. The NeuroJSON project aims to simplify scientific data exchange using portable data formats that are readable, searchable, shareable, and can be readily validated and served on the web and in the cloud. NeuroJSON is also led by MCX's author, Dr. Qianqian Fang, and is funded by the US NIH grant U24-NS124027.

As a result of this migration, the MCX-CL executable's default output formats are now .jnii for volumetric output data, and .jdat for detected photon/trajectory data. Both data formats are JSON compatible. Details on how to read/write these data files can be found below.
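
For illustration, since both formats are JSON-based, they can be parsed with NeuroJSON's jdata Python module (pip install jdata bjdata numpy). The sketch below is an assumption-laden example: the file names are hypothetical, and it assumes that jdata.load() decodes the JData annotations (including packed/compressed arrays) and that JNIfTI volumes are stored under the 'NIFTIData' key.

  # hedged sketch: reading MCX-CL .jnii/.jdat outputs with the jdata module;
  # file names below are hypothetical examples
  import jdata as jd

  vol = jd.load('cube60.jnii')          # a .jnii file is JSON (JNIfTI); returns a dict
  print(vol['NIFTIData'].shape)         # assumed key holding the fluence array
  detp = jd.load('cube60_detp.jdat')    # a .jdat file stores detected photon data (JData)
  print(list(detp.keys()))              # inspect the decoded records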

In summary, v2023 is packed with exciting updates, including:

  • pmcxcl (https://pypi.org/project/pmcxcl/) - a Python interface to mcxcl
  • New continuous integration (CI) and testing system based on GitHub Actions
  • CMake-based build environment
  • Use NVIDIA PTX-based float atomicadd to gain >30% speedup
  • Efficient DDA (Digital Differential Analyzer) ray-marching algorithm, gaining up to 40% speedup
  • Fixed loss of accuracy near the source (fangq/mcx#41)
  • Trajectory-only output with debuglevel=T
  • Adopted standardized NeuroJSON JNIfTI and JData formats to ease data exchange

The detailed updates can be found in the change log below:

  • 2023-09-13 [0722c09] support ASCII escape code in Windows terminals
  • 2023-08-27 [080855a] fix pmcxcl gpu hanging bug, import utils from pmcx, v0.0.12
  • 2023-08-27 [e342756] port negative pattern support from mcx
  • 2023-08-27 [7a7b456] update debuglevel=R for RNG testing
  • 2023-08-25 [b8095a8] add mcxlab('version'), use MCX_VERSION macro, update msg, port many bug fixes from mcx
  • 2023-08-04 [9b4d76f] fix boundary condition handling, port from mcx
  • 2023-07-30 [50940de] update zmatlib to 0.9.9, use miniz, drop zlib
  • 2023-07-30 [370128a] support compileropt and kernelfile in mcxlab/pmcxcl, fix omp
  • 2023-07-30 [1b28a6d] fix windows gomp multiple linking bug
  • 2023-07-29 [4916939] automatically build and upload Python module via github action
  • 2023-07-28 [09c61b7] bump pmcxcl version, fix windows pypi version check
  • 2023-07-25 [4b5606e] port python module action scripts from mcx
  • 2023-07-25 [f92425e] add initial draft of pmcxcl for Python, add cmake
  • 2023-07-25 [3c2c735] update missing output structs
  • 2023-07-23 [87f3c0e] allow early termination if -d 3 or cfg.issavedet=3 is set
  • 2023-07-23 [c8ccc04] support outputtype=length/l for saving total path lengths per voxel
  • 2023-07-23 [57c3b9b] fix incorrect comment regarding gaussian src, fangq/mcx#165
  • 2023-07-23 [7d5bd16] update mcxplotphoton to mcx
  • 2023-07-23 [1cafd3e] allow to get fluence in non-absorbing medium, fangq/mcx#174
  • 2023-07-23 [8dbc397] update neurojson repo paths
  • 2023-07-23 [0d780bd] support trajectory only output with debuglevel=T
  • 2023-07-23 [e4ade36] fix replay test result matching
  • 2023-07-03 [b57b157] fix macos error
  • 2023-07-02 [99a4486] port zmat ci changes to mcxcl
  • 2023-06-03 [980cc9f] enable doxygen documentation via make doc
  • 2023-05-17 [a25f302] allow device query to handle open-source AMD ocl runtime, fix #44
  • 2023-03-12 [c9697a9] update action from mmc to mcxcl
  • 2023-03-12 [11938a3] copy mmc's merged action script
  • 2023-03-07 [ee7e940] add github action
  • 2022-10-08 [ae7f6e3] update version to 1.0
  • 2022-10-03 [695d2f3] run test on all platforms
  • 2022-10-03 [85beae7] revert debugging information, fix cyclic bc for mac
  • 2022-10-02 [53ec9e7] attempt to fix cyclic bc
  • 2022-10-02 [263abb2] test cyclic bc
  • 2022-10-02 [6c588fa] debug cyclic bc
  • 2022-10-02 [fc481ba] debug cyclic test on the mac
  • 2022-10-02 [8bdc33e] disable zmat and file IO functions in mex/oct targets
  • 2022-10-02 [c6e280a] fix CI error after using voxel index dda
  • 2022-10-01 [24bf948] allow disabling PTX based f32 atomicadd
  • 2022-10-01 [2277f7f] using nvidia native atomicadd for float add 30% speedup
  • 2022-09-29 [f0d0bad] update skipvoid
  • 2022-09-29 [b3d94d2] update to match mcx flipdir use
  • 2022-09-23 [1931489] adopt voxel-index based dda, like fangq/mcx b873f90c6
  • 2022-09-21 [d9e5eaa] add jammy to ci
  • 2022-09-21 [3e71eac] making double-buffer finally work to solve fangq/mcx#41, thanks to @ShijieYan
  • 2022-09-21 [2216686] sync mcxcl's json2mcx with the latest version from mcx
  • 2022-05-21 [39913fc] complete reformat of source code using astyle with 'make pretty'
  • 2022-05-21 [f7d69d5] sync mcx2json from mcx repo
  • 2022-01-27 [2559135] sync mcxdetphoton.m with mcx, move location
  • 2021-10-29 [867314a] Update README.md
  • 2021-06-23 [818f3a1] set maximum characters to read for fscanf, fix #41
  • 2021-06-23 [38d56a6] handle empty detector array in json2mcx
  • 2021-05-26 [4c18305] fix a few minor memory leaks based on valgrind output, still leaks on nvidia GPUs
  • 2021-05-15 [bbee39e] save volume in jdata format by default
  • 2021-02-26 [8eba2cd] add MATLAB_MEX_FILE in the makefile
  • 2021-02-24 [8f793a0] use memcpy to avoid strncpy warning from gcc 10
  • 2021-02-24 [89b46a9] update windows compilation commands
  • 2021-02-24 [49c6217] allow compiling GNU Octave mex on windows
  • 2021-02-07 [e9d2ce7] following Debian script suffix rule
  • 2020-09-06 [a39f271] update numeral version number
  • 2020-09-06 [6ea10b2] add back wiki versions of the README file for easy website update
  • 2020-09-04 [de59205] patch mcxcl for fangq/mcx#103 and fangq/mcx#104
  • 2020-09-01 [9b5431e] sync with mcx, add cubesph60b to match example/benchmark2
  • 2020-08-31 [7e7eb06] flush output for mcxlabcl
  • 2020-08-31 [6079b17] fix pattern3d demo script bug
  • 2020-08-31 [7b36ee8] fix photon sharing mcxlab crash
  • 2020-08-30 [f498e29] fix typo
  • 2020-08-29 [b001786] update mcxlabcl, update ChangeLog

2. Introduction

Monte Carlo eXtreme (MCX) is a fast physically-accurate photon simulation software for 3D heterogeneous complex media. By taking advantage of the massively parallel threads and extremely low memory latency in a modern graphics processing unit (GPU), this program is able to perform Monte Carlo (MC) simulations at a blazing speed, typically hundreds to a thousand times faster than a single-threaded CPU-based MC implementation.

MCX-CL is the OpenCL implementation of the MCX algorithm. Unlike MCX, which can only be executed on NVIDIA GPUs, MCX-CL is written in OpenCL, the Open Computing Language, and can be executed on most modern CPUs and GPUs available today, including those from Intel and AMD. MCX-CL is highly portable, highly scalable, and feature-rich, just like MCX.

MCX-CL shares nearly identical command-line options and input file formats with MCX. Simulation settings designed for MCX can be used for MCX-CL simulations without major modification. As of v2020, MCX-CL contains almost all features currently supported in MCX (with additional support for AMD/Intel CPUs and GPUs, as well as JIT compilation and the -J flag).

Similar to MCXLAB, MCXLAB-CL is the MATLAB MEX version of the MCX-CL software. It can be called directly inside MATLAB and GNU Octave. It also uses the same input structure settings as MCXLAB, making the two packages highly compatible. One can even define USE_MCXCL=1 in the MATLAB command window, and all subsequent MCXLAB calls will automatically be redirected to MCXLAB-CL.

3. System requirements

By default, MCX-CL uses OpenCL-based simulations to utilize all GPUs and CPUs installed in your system. If you have an NVIDIA, AMD, or Intel GPU, OpenCL support is typically already present once the latest graphics driver has been correctly installed. Please verify that the OpenCL library (libOpenCL.so* on Linux, OpenCL.dll on Windows, or /System/Library/Frameworks/OpenCL.framework/Versions/A/OpenCL on the Mac) exists on your system.
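
As a quick sanity check, the sketch below uses only the Python standard library to ask the dynamic loader whether an OpenCL runtime library is visible; it does not confirm that a usable OpenCL device is actually present.

  # quick check for an OpenCL runtime library using only the Python standard library
  import ctypes.util

  libname = ctypes.util.find_library('OpenCL')   # libOpenCL.so*, OpenCL.dll or the macOS framework
  if libname:
      print('OpenCL runtime found:', libname)
  else:
      print('No OpenCL runtime found - (re)install the GPU driver or pocl')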

Generally speaking, high-end dedicated AMD and NVIDIA GPUs perform the best, about 20-60x faster than a multi-core CPU; Intel's integrated GPUs are about 3-4x faster than a multi-core CPU.

In addition, MCX-CL has been fully tested with the open-source OpenCL runtime pocl (http://portablecl.org/) on the CPU. To install pocl on an Ubuntu/Debian system, please run

  sudo apt-get install pocl-opencl-icd

A step-by-step installation guide can be found at this link.

4. Reference
