Stencil computation is the core of most wave-based seismic imaging applications. Depending on the governing equation, the stencil solver alone can represent up to 90\% of the overall elapsed time, and the efficient use of the memory hierarchy is a major concern~\cite{Araya-Polo2008}. Therefore, analyzing source code and developing optimizations that take full advantage of modern architectures are crucial. Performance models can assist with these tasks: they help expose bottlenecks and predict suitable tuning parameters to boost stencil performance.
To do so, two architectural aspects must be modeled accurately: the shared multi-level caches of multi- and many-core processors, and the prefetching engine.
In this work, we introduce our published performance model~\cite{delacruz2014_2}, which focuses on these architectural characteristics, and show how it can help improve stencil computation performance. We also propose a new metric that estimates the efficiency of thread-parallelized stencil code.