Model-based image reconstruction (MBIR) techniques have the potential to generate high quality images from noisy measurements and a small number of projections which can reduce the x-ray dose in patients. These MBIR techniques rely on projection and backprojection to refine an image estimate. One of the widely used projectors for these modern MBIR based technique is called branchless distance driven (DD) projection and backprojection. While this method produces superior quality images, the computational cost of iterative updates keeps it from being ubiquitous in clinical applications. In this paper, we provide several new parallelization ideas for concurrent execution of the DD projectors in multi-GPU systems using CUDA programming tools. We have introduced some novel schemes for dividing the projection data and image voxels over multiple GPUs to avoid runtime overhead and inter-device synchronization issues. We have also reduced the complexity of overlap calculation of the algorithm by eliminating the common projection plane and directly projecting the detector boundaries onto image voxel boundaries. To reduce the time required for calculating the overlap between the detector edges and image voxel boundaries, we have proposed a pre-accumulation technique to accumulate image intensities in perpendicular 2D image slabs (from a 3D image) before projection and after backprojection to ensure our DD kernels run faster in parallel GPU threads. For the implementation of our iterative MBIR technique we use a parallel multi-GPU version of the alternating minimization (AM) algorithm with penalized likelihood update. The time performance using our proposed reconstruction method with Siemens Sensation 16 patient scan data shows an average of 24 times speedup using a single TITAN X GPU and 74 times speedup using 3 TITAN X GPUs in parallel for combined projection and backprojection.
|Journal||Journal of Imaging Science and Technology|
|State||Published - Jan 1 2017|