Wednesday, March 4 • 16:25 - 16:45
Coarse-grained Seismic Algorithms: “Using Performance Tools to Analyze the Performance of an MPI+OpenMP Reverse Time Migration Code,” Sri Raj Paul, Rice University; Mauricio Araya-Polo, Shell; John Mellor-Crummey, Rice University

Applications that analyze seismic data or simulate oil reservoirs routinely employ scalable parallel systems to produce timely results. Today, applications for such systems typically use the MPI-everywhere model. With increasing core counts on multicore and manycore chips, this programming model is becoming less viable because the memory available per core is decreasing. To fully exploit emerging processor architectures, programs will need to employ threaded parallelism within a node in addition to MPI. OpenMP is a mature threaded programming model that is widely available today, with implementations in both vendor and open-source compilers. For that reason, MPI+OpenMP is the recommended programming model for systems such as the forthcoming supercomputer at NERSC, whose manycore compute nodes are based on Intel’s Xeon Phi Knights Landing (KNL) chips.

Changing to a hybrid MPI+OpenMP programming model is a challenge for programmers. Tuning MPI+OpenMP programs for high performance is difficult without performance analysis tools to identify bottlenecks and uncover opportunities for improvement. In this talk, we describe our experience applying performance tools to gain insight into an MPI+OpenMP code that performs reverse time migration (RTM) on a cluster of multicore processors. Our goals for this analysis were (1) to identify any performance bottlenecks present in the code and opportunities for improvement, and (2) to assess the capabilities of available tools for analyzing the performance of such applications. We looked to the tools for help in evaluating three things: the domain decomposition strategy used to partition the work among nodes, the use of threaded parallelism on a node, and the functional unit utilization of individual cores.

In this work, we identified two opportunities to improve the application: reducing the load imbalance that results from the domain decomposition, and reducing idleness among OpenMP worker threads during MPI communication. The combination of Rice’s HPCToolkit and TACC’s PerfExpert enabled us to gain insight into distributed-memory parallel performance, threading performance, and bottlenecks within a core.

Speakers
Mauricio Araya

Senior Researcher Computer Science, Shell Intl. E&P Inc.

John Mellor-Crummey

Professor, Computer Science, Rice University
John Mellor-Crummey’s research focuses on software for high performance and parallel computing, including compilers, tools, and runtime libraries for multicore processors and scalable parallel systems. His current work is principally focused on performance tools for scalable parallel...


Wednesday March 4, 2015 16:25 - 16:45 CST
BioScience Research Collaborative, 6500 Main Street, Houston, TX 77005
