Fast Iterative Cone Beam CT Reconstruction On GPGPU Using OpenCL


L Zhou

L Zhou*, J Chang, Cornell Medical College, New York, NY

TH-C-BRA-3 Thursday 10:30:00 AM - 12:30:00 PM Room: Ballroom A

Purpose:
Algebraic reconstruction technique (ART)-type algorithms produce image quality superior to that of the popular filtered back-projection approaches for CBCT and CT reconstruction, but they are too slow for real-time clinical applications. The purpose of this study is to use the emerging OpenCL framework to accelerate simultaneous ART (SART) by parallelizing its most time-consuming steps, forward- and back-projection, on a general-purpose graphics processing unit (GPGPU).

Methods:
For each iteration, SART sequentially performs three ray-driven projections (one forward-projection and two back-projections) for each acquired projection image. To accelerate SART reconstruction, both the forward-projection and back-projection kernels were scheduled on the GPGPU using data parallelism to take full advantage of its compute units. The single-work-item-for-single-ray technique was employed as the parallelization mechanism. We conducted numerical experiments to test the OpenCL-based implementation on a Dell Precision T7500 workstation with two quad-core CPUs and one Nvidia Tesla C2050 GPGPU. Poly-energetic projection data (512x512) for the Mohan 4 MV energy spectrum were simulated at one-degree intervals over 360 gantry angles for a head-and-neck digital phantom and were fed into the SART algorithm for CBCT reconstruction of a 256x256x256 volume. To accelerate the poly-energetic projection computation, we partitioned the workloads using task parallelism and data parallelism and scheduled them in a parallel computing ecosystem consisting of CPU and GPGPU using OpenCL only.
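The SART update described above can be illustrated with a minimal pure-Python sketch. It is not the authors' OpenCL code: the tiny 2x2 "volume", the explicit system matrix `A`, and the names `forward_project_ray` and `sart_update` are assumptions made for illustration. The sketch also assumes one common reading of "one forward- and two back-projections": a forward projection of the current estimate, a back-projection of the ray-normalized residual, and a back-projection of the ray weights used for normalization. Each call to `forward_project_ray` is independent of the others, which is what makes a one-work-item-per-ray mapping possible.

```python
def forward_project_ray(row, x):
    """Work of one hypothetical work-item: the line integral along one ray."""
    return sum(a * v for a, v in zip(row, x))


def sart_update(A, p, x, relax=1.0):
    """One SART update for a set of rays (illustrative, dense matrix A)."""
    n_rays, n_vox = len(A), len(x)
    # Forward projection: one independent task per ray.
    q = [forward_project_ray(A[i], x) for i in range(n_rays)]
    # Residual of each ray, normalized by that ray's total weight.
    row_sums = [sum(A[i]) for i in range(n_rays)]
    r = [(p[i] - q[i]) / row_sums[i] if row_sums[i] > 0 else 0.0
         for i in range(n_rays)]
    # Two back-projections: the residual (num) and the weights (den).
    num = [0.0] * n_vox
    den = [0.0] * n_vox
    for i in range(n_rays):
        for j in range(n_vox):
            num[j] += A[i][j] * r[i]
            den[j] += A[i][j]
    return [x[j] + relax * num[j] / den[j] if den[j] > 0 else x[j]
            for j in range(n_vox)]


# Toy example: a 2x2 image with two horizontal and two vertical rays.
A = [[1, 1, 0, 0],   # ray through the top row
     [0, 0, 1, 1],   # ray through the bottom row
     [1, 0, 1, 0],   # ray through the left column
     [0, 1, 0, 1]]   # ray through the right column
truth = [1.0, 2.0, 3.0, 4.0]
p = [forward_project_ray(row, truth) for row in A]  # simulated measurements

x = [0.0] * 4
for _ in range(50):
    x = sart_update(A, p, x)
```

In a GPU implementation the dense matrix would not be stored; each work-item would instead trace its ray through the volume on the fly, and the back-projection accumulation would need atomic or otherwise race-free writes.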

Results:
The GPGPU computation time, including the kernel launch time, kernel running time, and data transfer time, was 42 ms for forward-projection and 95 ms for back-projection. Each SART iteration took 101 s on the GPGPU, compared with 7195 s on a single-threaded CPU, an approximately 71-fold speedup. The relative difference between the images reconstructed by the CPU-based and OpenCL/GPGPU-based implementations was on the order of 0.00001, making them virtually indistinguishable.
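The reported figures can be cross-checked with a short calculation. The speedup follows directly from the two per-iteration times. The second estimate rests on an assumption not stated in the abstract: that the 42 ms and 95 ms timings are per projection view, so that one iteration over 360 views costs one forward-projection and two back-projections per view.

```python
cpu_s = 7195.0   # reported single-threaded CPU time per SART iteration
gpu_s = 101.0    # reported GPGPU time per SART iteration
speedup = cpu_s / gpu_s          # about 71.2

# Assumption: the kernel timings are per projection view.
n_views = 360
fwd_ms, back_ms = 42.0, 95.0
projection_s = n_views * (fwd_ms + 2 * back_ms) / 1000.0   # 83.52 s

print(f"speedup: {speedup:.1f}x")
print(f"estimated projection time per iteration: {projection_s:.2f} s")
```

Under that assumption, the projections account for roughly 84 s of the reported 101 s per iteration; the abstract does not say how the remaining time is spent.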

Conclusions:
We have successfully implemented the SART algorithm on GPGPU using OpenCL and significantly reduced the reconstruction time to a level that is almost suitable for real-time clinical applications.
