Kirchhoff 3-D prestack depth migration is a widely used, computationally intensive and highly parallelizable seismic imaging method. Parallelism exists both in the input seismic data and in the output image. At this workshop in 2008, we presented an algorithm with parallelism from the level of many serial Grid Engine tasks down to individual CUDA threads running on Nvidia GPUs. The performance of this algorithm gave clusters with GPUs a significant price-to-performance advantage over standard CPU clusters.
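The two levels of parallelism can be illustrated with a deliberately simplified sketch: a 2-D, constant-velocity Kirchhoff summation in which each input trace is spread along its travel-time curve into every output image point. It assumes straight rays, linear interpolation in time, and no amplitude weights, antialiasing or aperture limits, and the function name `kirchhoff_migrate_2d` is hypothetical; it is not the authors' 3-D production kernel. The outer loop over traces is the input parallelism, and the inner loops over image points are the output parallelism (in a GPU version these might map to separate tasks and to one thread per image point, respectively).

```python
import math


def kirchhoff_migrate_2d(traces, dt, velocity, xs_grid, zs_grid):
    """Hypothetical 2-D constant-velocity Kirchhoff depth migration sketch.

    traces   -- list of (source_x, receiver_x, samples) tuples
    dt       -- time sampling interval in seconds
    velocity -- constant medium velocity
    Returns image[iz][ix] on the grid defined by xs_grid and zs_grid.
    """
    image = [[0.0 for _ in xs_grid] for _ in zs_grid]
    # Input parallelism: every trace contributes independently.
    for sx, rx, samples in traces:
        nt = len(samples)
        # Output parallelism: every image point is updated independently.
        for iz, z in enumerate(zs_grid):
            for ix, x in enumerate(xs_grid):
                # Straight-ray two-way time: source -> image point -> receiver.
                t = (math.hypot(x - sx, z) + math.hypot(x - rx, z)) / velocity
                it = t / dt
                i0 = int(it)
                if i0 + 1 >= nt:
                    continue  # travel time falls outside the recorded trace
                w = it - i0
                # Linear interpolation of the trace at time t, summed into the image.
                image[iz][ix] += (1.0 - w) * samples[i0] + w * samples[i0 + 1]
    return image
```

For example, a single zero-offset trace with a spike at sample 25 (dt = 0.0078125 s, velocity 1024 m/s) images with unit amplitude at (x = 0, z = 100) and contributes nothing at points whose travel time misses the spike.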
We recently revisited our Kirchhoff code with two computational goals, hardware portability and increased performance, as well as the geophysical goal of generalized gather creation. To evaluate hardware portability, we first tested the OCCA unified parallel programming model by re-coding our CUDA computational kernels in OCCA. This let our kernels run on Nvidia or AMD GPUs and on multi-core CPUs using CUDA, OpenCL or OpenMP. Because the OCCA and CUDA programming models are similar, we were able to port and benchmark the code in less than a month without losing any production performance. In fact, OCCA's run-time compilation generally produced better-optimized kernels, because the compiler knows the user parameters and the execution hardware at compile time, effectively tailoring the kernels to each production run.
Having achieved portability, we next explored two dimensions of the algorithm design aimed at increasing migration performance on newer GPU architectures: (1) the fraction of GPU memory used for input seismic data (as opposed to output image samples) and (2) the trade-off between computation and memory usage. We found significant improvements in performance when we shifted the balance towards more input data and less output data in memory, and switched to a more compute-intensive approach.
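The first design dimension is essentially a memory-budget split. The sketch below, with the hypothetical helper `plan_gpu_batches`, shows the arithmetic: given a GPU memory size and the fraction reserved for input traces, it reports how many traces and how many image samples fit per batch, and how many batches of each are needed to cover a survey. It assumes 4-byte samples and ignores travel-time tables, headers and other working memory; which split runs fastest is an empirical question, as the measurements above indicate.

```python
import math


def plan_gpu_batches(gpu_bytes, input_fraction, samples_per_trace,
                     n_traces, n_image_points, bytes_per_sample=4):
    """Hypothetical split of a GPU memory budget between input and output.

    Returns (traces_per_batch, image_points_per_batch,
             n_trace_batches, n_image_batches).
    Assumes only trace samples and image samples occupy GPU memory.
    """
    input_bytes = int(gpu_bytes * input_fraction)
    output_bytes = gpu_bytes - input_bytes
    # Whole traces that fit in the input share of memory.
    traces_per_batch = input_bytes // (samples_per_trace * bytes_per_sample)
    # Image samples that fit in the output share of memory.
    points_per_batch = output_bytes // bytes_per_sample
    if traces_per_batch == 0 or points_per_batch == 0:
        raise ValueError("input_fraction leaves no room for input or output")
    n_trace_batches = math.ceil(n_traces / traces_per_batch)
    n_image_batches = math.ceil(n_image_points / points_per_batch)
    return traces_per_batch, points_per_batch, n_trace_batches, n_image_batches
```

With 1,000,000 bytes, 75% for input, 1000-sample traces, 1000 traces and 100,000 image points, the helper returns 187 traces and 62,500 image points per batch, requiring 6 trace batches and 2 image batches.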
We have coded the new algorithm in OCCA and added the ability to generate generalized image gathers (e.g., offset gathers, offset vector gathers and reflection angle gathers). In this talk, we will discuss our porting and design experiences, and compare the advantages and performance of the old and new algorithms.
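Generating gathers amounts to deciding, for each Kirchhoff contribution, which gather bin it is summed into instead of a single stacked image. The sketch below shows two such bin computations under simplifying assumptions (2-D, constant velocity, straight rays): an absolute-offset bin and a reflection angle taken as half the opening angle between the rays from the image point to the source and to the receiver. The functions `offset_bin`, `reflection_angle_deg` and `accumulate_angle_gather` are hypothetical illustrations, not the authors' implementation.

```python
import math


def offset_bin(sx, rx, bin_width, n_bins):
    """Absolute source-receiver offset bin index, or None if out of range."""
    b = int(abs(rx - sx) // bin_width)
    return b if b < n_bins else None


def reflection_angle_deg(sx, rx, x, z):
    """Half the opening angle (degrees) between straight rays from image
    point (x, z) to the source (sx, 0) and to the receiver (rx, 0)."""
    ax, az = sx - x, -z
    bx, bz = rx - x, -z
    cos_open = (ax * bx + az * bz) / (math.hypot(ax, az) * math.hypot(bx, bz))
    cos_open = max(-1.0, min(1.0, cos_open))  # guard against rounding
    return math.degrees(math.acos(cos_open)) / 2.0


def accumulate_angle_gather(gather, contribution, angle_deg, bin_width_deg):
    """Sum one migrated contribution into its reflection-angle bin."""
    b = int(angle_deg // bin_width_deg)
    if 0 <= b < len(gather):
        gather[b] += contribution
```

For a source at x = -100 and a receiver at x = 100 imaging the point (0, 100), the two rays are perpendicular, so the reflection angle is 45 degrees.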