Accelerating Collapsed Cone Convolution/Superposition Dose Calculation On GPU Using Spatial Decomposition
K Xiao1*, B Zhou2, X. S Hu1, D. Z Chen1, (1) University of Notre Dame, Notre Dame, IN, (2) Altera Corp., San Jose, CA
WE-C-108-6 Wednesday 10:30AM - 12:30PM Room: 108
The performance of all known Collapsed Cone Convolution/Superposition (CCCS) dose calculation methods on Graphics Processing Units (GPUs) is bounded by memory throughput. One major cause of this bottleneck is the traditional uniform grid representation of the density volume of the human body, which generally contains many homogeneous areas. Within a homogeneous area of a uniform grid, the common CCCS ray traversal process repeatedly reads the same density value voxel by voxel, which increases the memory access workload. We propose a new spatial-decomposition-based data structure, called Shell, to improve CCCS performance on GPUs. Our idea is to group voxels in homogeneous areas into rectangular regions so that the density of each such region is acquired with a single read. We also develop an efficient method for traversing rays across regions, which reduces the number of memory accesses.
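The effect of grouping on memory traffic can be illustrated with a toy count of density reads along a single ray. This is only a one-dimensional sketch under stated assumptions, not the authors' GPU implementation: the function names, the tolerance parameter `tol`, and the density values are hypothetical, and the homogeneous runs are assumed to have been identified ahead of time.

```python
def reads_uniform(densities):
    """Reads needed when the ray fetches the density of every voxel it visits."""
    return len(densities)


def reads_grouped(densities, tol=0.0):
    """Reads needed when each run of consecutive voxels whose density stays
    within `tol` of the run's first voxel counts as one region (one read).
    This models the read count only; it assumes the runs were precomputed."""
    reads = 0
    run_density = None
    for d in densities:
        if run_density is None or abs(d - run_density) > tol:
            reads += 1          # the ray enters a new region: one density read
            run_density = d
    return reads


# Hypothetical densities along one ray: air, soft tissue, lung, soft tissue.
ray = [0.0] * 10 + [1.0] * 30 + [0.3] * 8 + [1.0] * 12

print(reads_uniform(ray), reads_grouped(ray))
```

For this ray the voxel-by-voxel traversal issues 60 reads, while the grouped traversal issues 4, one per homogeneous run. Raising `tol` merges more voxels into each region at the cost of a coarser density approximation.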
We use a spatial decomposition algorithm (e.g., an octree) to adaptively partition a uniform grid density volume into a set of approximately homogeneous regions. Our Shell data structure consists of these regions, each storing its average density and pointers to its neighboring regions. The memory addresses of these pointers are associated with the voxels lying on the region boundaries, which allows any ray traversing a region to find the next region to traverse efficiently using table lookup schemes.
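A minimal sketch of such a decomposition follows, assuming a cubic volume whose side is a power of two and a homogeneity test based on the density range within a box. The names `decompose`, `next_region`, and `tol` are hypothetical. As a simplification, the sketch stores a region label for every voxel rather than attaching neighbor pointers only to boundary voxels as the abstract describes, and it does not show how a ray skips to the exit face of a region.

```python
def decompose(vol, tol):
    """Octree-split a cubic volume (nested lists, side a power of two) into
    boxes whose density range is at most `tol`.
    Returns (regions, labels), where labels[z][y][x] is a region index."""
    n = len(vol)
    regions = []  # each entry: origin, edge length, average density
    labels = [[[-1] * n for _ in range(n)] for _ in range(n)]

    def visit(z0, y0, x0, s):
        cells = [(z, y, x)
                 for z in range(z0, z0 + s)
                 for y in range(y0, y0 + s)
                 for x in range(x0, x0 + s)]
        vals = [vol[z][y][x] for z, y, x in cells]
        if s == 1 or max(vals) - min(vals) <= tol:
            # Approximately homogeneous: store one average density for the box.
            rid = len(regions)
            regions.append({"origin": (z0, y0, x0), "size": s,
                            "density": sum(vals) / len(vals)})
            for z, y, x in cells:
                labels[z][y][x] = rid
            return
        h = s // 2  # otherwise split into eight octants and recurse
        for dz in (0, h):
            for dy in (0, h):
                for dx in (0, h):
                    visit(z0 + dz, y0 + dy, x0 + dx, h)

    visit(0, 0, 0, n)
    return regions, labels


def next_region(labels, voxel, step):
    """Table lookup of the region entered when stepping from `voxel` by `step`.
    Returns None when the step leaves the volume."""
    n = len(labels)
    z, y, x = (v + d for v, d in zip(voxel, step))
    if not (0 <= z < n and 0 <= y < n and 0 <= x < n):
        return None
    return labels[z][y][x]


# Hypothetical 4x4x4 volume: unit density with one zero-density voxel.
vol = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
vol[0][0][0] = 0.0
regions, labels = decompose(vol, tol=0.0)
print(len(regions), next_region(labels, (0, 0, 1), (0, 0, 1)))
```

In this example the 64 voxels collapse into 15 regions: the octant containing the outlier voxel is split down to single voxels, and the other seven octants each become one region. The `tol` parameter is where the trade-off between region count and density accuracy enters.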
We integrated our method with a CCCS program and tested it on various clinical cases on an NVIDIA GTX 570 GPU. The program using Shell runs 1.32X to 1.44X faster than the one using a uniform grid, with less than 2% variance in the dose calculation results.
The results show that our proposed method effectively reduces the memory bottleneck and improves CCCS performance on GPUs. There is a trade-off between speedup and dose accuracy, which is determined by the spatial decomposition method.
Funding Support, Disclosures, and Conflict of Interest: The research of D.Z. Chen was supported in part by NSF under Grants CCF-0916606 and CCF-1217906.