Fast Iterative Cone Beam CT Reconstruction On GPGPU Using OpenCL
L Zhou*, J Chang, Cornell Medical College, New York, NY
TH-C-BRA-3 Thursday 10:30:00 AM - 12:30:00 PM Room: Ballroom A
Algebraic reconstruction technique (ART) type algorithms produce better image quality for CBCT and CT reconstructions than the popular filtered-back-projection based approaches, but they are too slow for real-time clinical applications. The purpose of this study is to use the emerging OpenCL architecture to accelerate simultaneous ART (SART) by parallelizing its most time-consuming steps, the forward- and back-projections, on a general-purpose graphics processing unit (GPGPU).
In each iteration, SART sequentially performs three ray-driven projections (one forward-projection and two back-projections) for each acquired projection image. To accelerate SART reconstruction, both the forward-projection and back-projection kernels were scheduled on the GPGPU using data parallelism to take full advantage of its compute units. The parallelization mechanism was the single-work-item-for-single-ray technique, in which each OpenCL work item processes one ray. We conducted numerical experiments to test the OpenCL-based implementation on a Dell Precision T7500 workstation with two quad-core CPUs and one Nvidia Tesla C2050 GPGPU. Poly-energetic projection data (512x512) for the Mohan 4 MV energy spectrum were simulated at one-degree intervals over 360 gantry angles for a head-and-neck digital phantom and were fed into the SART algorithm for CBCT reconstruction of a 256x256x256 volume. To accelerate the poly-energetic projection computation, we partitioned the workloads using task parallelism and data parallelism and scheduled them across the CPU and the GPGPU using OpenCL only.
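The per-view structure described above (one forward-projection followed by two back-projections) can be sketched on a toy problem. This is a minimal illustration, not the study's OpenCL code. It assumes the common SART formulation in which the first back-projection carries the ray-normalized residual and the second back-projects unit values to obtain per-voxel normalization weights; the abstract does not state what the two back-projections are. The 2x2 image, the dictionary-based ray representation, the relaxation factor, and all function names are hypothetical.

```python
# Toy SART on a 2x2 image. Everything here is illustrative and assumed,
# not taken from the study's OpenCL implementation.
# A ray is a dict {voxel index: intersection weight}; a view is a list of rays.

def forward_project(view, image):
    """One line integral per ray. Rays are independent, so on a GPU each
    ray could be handled by its own work item."""
    return [sum(w * image[j] for j, w in ray.items()) for ray in view]

def back_project(view, ray_values, n_voxels):
    """Ray-driven back-projection: spread each ray's value over its voxels."""
    out = [0.0] * n_voxels
    for ray, value in zip(view, ray_values):
        for j, w in ray.items():
            out[j] += w * value
    return out

def sart_iteration(views, measured, image, relax=0.5):
    """One SART iteration: update the image once per view."""
    n = len(image)
    for view, p in zip(views, measured):
        estimate = forward_project(view, image)            # forward-projection
        ray_len = [sum(ray.values()) for ray in view]      # sum of weights per ray
        residual = [(pi - ei) / li
                    for pi, ei, li in zip(p, estimate, ray_len)]
        correction = back_project(view, residual, n)       # back-projection 1
        weight = back_project(view, [1.0] * len(view), n)  # back-projection 2 (assumed)
        image = [x + relax * c / w if w > 0 else x
                 for x, c, w in zip(image, correction, weight)]
    return image

# Three "views" of a 2x2 image with voxels indexed 0 1 / 2 3.
views = [
    [{0: 1.0, 1: 1.0}, {2: 1.0, 3: 1.0}],  # row sums
    [{0: 1.0, 2: 1.0}, {1: 1.0, 3: 1.0}],  # column sums
    [{0: 1.0, 3: 1.0}, {1: 1.0, 2: 1.0}],  # diagonal sums
]
truth = [1.0, 2.0, 3.0, 4.0]
measured = [forward_project(v, truth) for v in views]

image = [0.0] * 4
for _ in range(30):
    image = sart_iteration(views, measured, image)
```

In this sketch the loops over rays stand in for the work items: no ray's forward-projection depends on another ray's result. A real ray-driven back-projection kernel would also have to handle several rays writing to the same voxel concurrently, which the sequential loop here sidesteps.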
The GPGPU computation time, including the kernel launch time, kernel running time and data transfer time, was 42 ms for forward-projection and 95 ms for back-projection. Each SART iteration took 101 s on the GPGPU, compared with 7195 s on a single-threaded CPU, an approximately 71-fold speedup. The relative difference between the images reconstructed by the CPU-based and OpenCL/GPGPU-based implementations was on the order of 0.00001, and the images were virtually indistinguishable.
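The reported figures can be cross-checked with a little arithmetic. The sketch below assumes that the 42 ms and 95 ms timings are per projection image and that every one of the 360 views incurs one forward-projection and two back-projections per iteration; the abstract does not state either point explicitly, and the variable names are illustrative.

```python
# Timings quoted in the abstract.
FORWARD_MS = 42.0         # forward-projection time
BACK_MS = 95.0            # back-projection time
GPU_ITERATION_S = 101.0   # one SART iteration on the GPGPU
CPU_ITERATION_S = 7195.0  # one SART iteration on a single-threaded CPU
N_VIEWS = 360             # one projection image per degree

# Assumption: one forward-projection and two back-projections per view.
projection_s_per_iteration = N_VIEWS * (FORWARD_MS + 2 * BACK_MS) / 1000.0
speedup = CPU_ITERATION_S / GPU_ITERATION_S

print(f"projection time per iteration: {projection_s_per_iteration:.1f} s")  # 83.5 s
print(f"speedup: {speedup:.1f}x")                                            # 71.2x
```

Under these assumptions the projections account for about 83.5 s of the 101 s iteration. The abstract does not itemize the remaining time.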
We have successfully implemented the SART algorithm on a GPGPU using OpenCL and significantly reduced the reconstruction time to a level that is almost suitable for real-time clinical applications.