GPU-Accelerated Polyenergetic DRR Generation Based On Data Parallelism and Task Parallelism with a Dispatcher Using OpenCL: Effect of the Numbers of Tasks and Energies
L Zhou*, K Chao, J Chang, New York Weill Cornell Medical Ctr, New York, AASU-E-J-2 Sunday 3:00PM - 6:00PM Room: Exhibit Hall
To improve the performance of parallel processing of multi-energetic digitally reconstructed radiograph (DRR) generation using a task-overlap strategy on heterogeneous platforms.
A segmented 512x512x223 head-and-neck phantom was used to generate 512x512 polyenergetic DRRs based on the Mohan4 and Mohan6 spectra, which contain 16 and 24 energy bins, respectively. DRR formation for each energy bin comprises three steps: (1) phantom conversion, (2) line integration, and (3) exponentiation and weighting of the projection. The parallel computing ecosystem consisted of one 8-core CPU and one general-purpose graphics processing unit (GPGPU). We used Open Computing Language (OpenCL) to decompose the low-degree parallel and serial workloads into multiple tasks on the CPU using task parallelism, and to partition the high-degree parallel workloads on the GPGPU using data parallelism. Two sequential task partitions for the first DRR formation were tested: (A) Step 1 as Task 1 on the CPU, Step 2 as Task 2 on the GPU, and Step 3 as Task 3 on the CPU; and (B) Step 1 as Task 1 on the CPU, and Steps 2 and 3 together as Task 2 on the GPU. Subsequent DRR generation does not need Step 1, so Task 1 was excluded from each partition. A task-overlap method driven by a dispatcher was also implemented in a regular single-threaded host program to further improve performance.
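The three per-energy steps can be illustrated with a minimal Python sketch on a toy phantom. It is not the authors' OpenCL implementation. The attenuation table `MU_TABLE`, the spectrum weights `WEIGHTS`, the voxel size, and the parallel-ray geometry along the first axis are all hypothetical values and simplifications chosen for illustration, and the function names are made up for this example.

```python
import math

# Hypothetical attenuation table: material label -> mu (1/cm) for each energy bin.
# The values are made up for illustration only.
MU_TABLE = {
    0: [0.0, 0.0],  # air
    1: [0.2, 0.1],  # soft tissue
    2: [0.5, 0.3],  # bone
}
WEIGHTS = [0.6, 0.4]  # hypothetical spectrum weights (sum to 1)
VOXEL_CM = 1.0        # hypothetical voxel edge length in cm


def convert_phantom(labels, energy_index):
    """Step 1: map segmented labels to attenuation coefficients for one energy bin."""
    return [[[MU_TABLE[v][energy_index] for v in row] for row in plane]
            for plane in labels]


def line_integrals(mu, voxel_cm):
    """Step 2: integrate mu along the first axis (parallel rays, an assumption)."""
    depth, rows, cols = len(mu), len(mu[0]), len(mu[0][0])
    return [[sum(mu[z][y][x] for z in range(depth)) * voxel_cm
             for x in range(cols)]
            for y in range(rows)]


def polyenergetic_drr(labels, weights, voxel_cm=VOXEL_CM):
    """Step 3: exponentiate each projection and accumulate the weighted sum."""
    rows, cols = len(labels[0]), len(labels[0][0])
    drr = [[0.0] * cols for _ in range(rows)]
    for e, w in enumerate(weights):
        path = line_integrals(convert_phantom(labels, e), voxel_cm)
        for y in range(rows):
            for x in range(cols):
                drr[y][x] += w * math.exp(-path[y][x])
    return drr


# Toy phantom indexed as labels[z][y][x]: two slices, one row, two columns.
toy_labels = [[[0, 1]], [[0, 2]]]
toy_drr = polyenergetic_drr(toy_labels, WEIGHTS)
```

In this sketch the line integral in Step 2 is the part with independent work per detector pixel, which is the kind of workload the abstract assigns to the GPGPU through data parallelism.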
For the first DRR formation with Partition A, the task-overlap strategy was 5.8 and 6.5 times faster than the sequential method for the 16-energy-bin Mohan4 and 24-energy-bin Mohan6 spectra, respectively. For Partition B, the speedups of the task-overlap strategy were 5.2 and 5.5 times for the Mohan4 and Mohan6 spectra, respectively. For subsequent DRR formations, the speedups in the two-task scenario were 1.16 and 1.165 times for the Mohan4 and Mohan6 spectra, respectively.
The task-overlap strategy significantly improves the performance for parallel processing of multi-energetic DRR generation. The parallelism is increased when more energies and tasks are driven by the dispatcher.
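Why overlap tends to pay off more as the number of energy bins grows can be illustrated with a simple pipeline timing model. This is a sketch under stated assumptions, not the dispatcher described in the abstract: each task is assumed to have its own execution resource, energy bins are processed in order, and the stage durations in `stage_times` are hypothetical numbers, not measurements. It is not intended to reproduce the reported speedups.

```python
def sequential_time(stage_times, n_items):
    """Total time when every task of every energy bin runs back to back."""
    return n_items * sum(stage_times)


def overlapped_time(stage_times, n_items):
    """Makespan when each stage has its own resource and stages overlap.

    A stage starts an item once the previous stage has finished that item
    and the stage itself has finished the preceding item.
    """
    finish = [0.0] * len(stage_times)  # finish time of the latest item per stage
    for _ in range(n_items):
        prev_done = 0.0
        for s, t in enumerate(stage_times):
            start = max(prev_done, finish[s])
            finish[s] = start + t
            prev_done = finish[s]
    return finish[-1]


def speedup(stage_times, n_items):
    """Ratio of sequential time to overlapped time."""
    return sequential_time(stage_times, n_items) / overlapped_time(stage_times, n_items)


# Hypothetical durations (arbitrary units) for three tasks per energy bin.
stage_times = [1.0, 2.0, 1.0]
speedup_16 = speedup(stage_times, 16)  # 16 energy bins, as in Mohan4
speedup_24 = speedup(stage_times, 24)  # 24 energy bins, as in Mohan6
```

Under these assumptions the overlapped time is dominated by the slowest stage once the pipeline is full, so the modeled speedup rises with the number of energy bins and approaches the ratio of the summed stage time to the slowest stage time.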