
Wednesday, March 4
 

08:00 CST

Tutorial Registration

Wednesday March 4, 2015 08:00 - 08:30 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

08:30 CST

Introduction to HDF5 for High-Performance Computing Environments, Quincey Koziol, The HDF Group
DOWNLOAD PRESENTATION

This tutorial provides an introduction to using HDF5 with HPC applications. Beginning with the HDF5 data model and progressing through serial application development with HDF5 to using HDF5 for parallel I/O, it offers a fast-paced overview of writing application data with HDF5 in high-performance environments.
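As a flavor of the serial API covered in the first part of the tutorial, the minimal C sketch below (not taken from the tutorial materials; the file and dataset names are illustrative) creates a file, defines a 2-D dataspace, and writes one double-precision dataset:

    #include "hdf5.h"

    int main(void) {
        double data[4][6];
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 6; j++)
                data[i][j] = i * 6 + j;              /* fill a small array */

        hsize_t dims[2] = {4, 6};

        /* Create a new file, a 2-D dataspace, and a double-precision dataset. */
        hid_t file  = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        hid_t space = H5Screate_simple(2, dims, NULL);
        hid_t dset  = H5Dcreate2(file, "/wavefield", H5T_NATIVE_DOUBLE, space,
                                 H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        /* Write the whole array in one call, then release resources. */
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);
        H5Dclose(dset);
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }

The parallel-I/O part of the tutorial builds on the same object model, with the file opened through an MPI-aware access property list.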

Speakers

Quincey Koziol

Director of Core Software and HPC, The HDF Group
Mr. Koziol has been the principal software architect for the HDF5 software project from its inception. HDF5 represents a revolutionary approach to providing fast, portable serial and parallel I/O, data sharability and application interoperability. In this effort, Mr. Koziol has designed... Read More →


Wednesday March 4, 2015 08:30 - 10:00 CST
Room 282 BioScience Research Collaborative

08:30 CST

OCCA: Portability Layer for Many-core Thread Programming, Tim Warburton, Rice University; David Medina, Rice University
The OCCA API enables an experienced programmer who is comfortable with programming in OpenCL, CUDA, pThreads, or OpenMP to write a single implementation of their compute kernels that can be treated at run time as any of these four threading approaches. In this way the best performing threading model can be chosen at run time for almost all modern mainstream many-core processors. See http://libocca.org for further background on OCCA.

Instructors: Tim Warburton and David Medina

Speakers

Tim Warburton

Rice University
Over the last decade Tim has developed and analyzed discontinuous Galerkin methods for the time-domain Maxwell’s equations. He has recently extended this research agenda to include the development of high order, local artificial radiation boundary conditions to provide closure for... Read More →


Wednesday March 4, 2015 08:30 - 10:00 CST
Room 280 BioScience Research Collaborative

10:00 CST

Networking & Break
Wednesday March 4, 2015 10:00 - 10:30 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

10:30 CST

OpenMP Tutorial, Barbara Chapman, University of Houston
DOWNLOAD PRESENTATION

OpenMP has emerged as a popular directive-based approach for shared memory parallel programming. For some applications, a parallel version of an existing sequential code can be created via the insertion of just a few OpenMP directives. Recently, OpenMP has been extended to enable its use on accelerators attached to a host system. In this tutorial we give an introduction to the features and scope of OpenMP.
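As a minimal illustration of the "just a few directives" point (a generic sketch, not an excerpt from the tutorial), the sequential loop below becomes parallel with a single pragma; compiling with an OpenMP flag such as -fopenmp enables it:

    #include <omp.h>

    /* Sequential loop turned parallel by one directive:
       iterations are divided among the available threads. */
    void axpy(int n, double a, const double *x, double *y) {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
    }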

Speakers

Barbara Chapman

Professor, Computer Science, University of Houston
Dr. Chapman is a Professor of Computer Science at the University of Houston, TX, USA, where she also directs the Center for Advanced Computing and Data Systems. Chapman has performed research on parallel programming languages and the related implementation technology for over 15... Read More →


Wednesday March 4, 2015 10:30 - 12:00 CST
Room 282 BioScience Research Collaborative

10:30 CST

Performance Analysis of MPI+OpenMP Programs with HPCToolkit, John Mellor-Crummey, Rice University
DOWNLOAD PRESENTATION

The number of hardware threads per processor on multicore and manycore processors is growing rapidly. Fully exploiting emerging scalable parallel systems will require programs to use threaded programming models at the node level. OpenMP is the leading model for multithreaded programming. This tutorial will give a hands-on introduction to using Rice University's open-source HPCToolkit performance tools to analyze the performance of programs that employ MPI + OpenMP to harness the power of scalable parallel systems. See http://hpctoolkit.org for more information about HPCToolkit.

Speakers

John Mellor-Crummey

Professor, Computer Science, Rice University
John Mellor-Crummey’s research focuses on software for high performance and parallel computing, including compilers, tools, and runtime libraries for multicore processors and scalable parallel systems. His current work is principally focused on performance tools for scalable parallel... Read More →


Wednesday March 4, 2015 10:30 - 12:00 CST
Room 280 BioScience Research Collaborative

12:00 CST

Registration & Networking

Wednesday March 4, 2015 12:00 - 13:00 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

13:00 CST

Opening & Welcome

Moderators

Jan E. Odegard

Executive Director Ken Kennedy Institute/ Associate Vice President Research Computing, Rice University
Jan E. Odegard Executive Director, Ken Kennedy Institute for Information Technology and Associate Vice President, Research Computing & Cyberinfrastructure at Rice University. Dr. Odegard joined Rice University in 2002, and has over 15 years of experience supporting and enabling research... Read More →

Speakers

Klara Jelinkova

Vice President for Information Technology and Chief Information Officer, Rice University
Klara Jelinkova is the Vice President for IT and Chief Information Officer at Rice University. As CIO Klara is responsible for strategic technology issues ranging from governance, policy and resource allocation to protocol and organization. She represents the University's information... Read More →


Wednesday March 4, 2015 13:00 - 13:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

13:15 CST

Plenary: 'Preparing the Broad Department of Energy, Office of Science User Community for Advanced Manycore Architectures,' Katie Antypas, NERSC

Speakers

Katie Antypas

NERSC Services Department Head; NERSC-8 Project Manager, NERSC, Lawrence Berkeley National Laboratory
Katie is the NERSC Services Department Head, with oversight of the Advanced Technologies, Data and Analytics Services, and User Services Groups. The Services Department works to increase the impact of science research conducted by NERSC users by providing exceptional services and... Read More →


Wednesday March 4, 2015 13:15 - 14:00 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

14:00 CST

Keynote: 'OpenPower Innovation is Redefining High Performance Computing,' Bradley McCredie, IBM
DOWNLOAD PRESENTATION

WATCH VIDEO

IBM has come together with over 85 technology companies, including founding partners Google, Nvidia, Mellanox, and Tyan, to launch the OpenPOWER Foundation. The innovation coming from this group goes beyond semiconductor manufacturing alone (with its increasing challenges) to extend Moore's law at the system and application level. This talk will demonstrate numerous innovations from the OpenPOWER group and show how they are having a profound impact on the High Performance Computing community, as demonstrated by the recently announced $325M, 200 PFLOP US Dept. of Energy win at Supercomputing 2014. Innovation examples include recently announced technologies such as the massive-bandwidth, low-latency NVLink connection between future POWER processors and Nvidia GPUs, the industry's first coherent PCIe-attached high-performance network adapter from Mellanox, and another industry first that leverages coherent PCIe (CAPI) to attach 40 TB of flash to a POWER8 system as "slow memory" rather than high-speed storage.

Speakers

Bradley McCredie

IBM
Dr. Bradley McCredie is an IBM Fellow, Vice President of IBM Power Systems Development, and President of the OpenPOWER Foundation. He is also a member of IBM’s Technology Team, a senior executive group that sets IBM’s technical strategy.  In his current position, he oversees... Read More →


Wednesday March 4, 2015 14:00 - 14:45 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

14:45 CST

Disruptive Technology (2 minutes): “Automata Processing: A Massively Parallel Computing Solution,” Dan Skinner, Micron
DOWNLOAD PRESENTATION

WATCH VIDEO

Many of today’s most challenging computer science problems—such as those involving very large data structures, unstructured data, random access or real-time performance requirements—require highly parallel solutions. The current implementation of parallelism can be cumbersome and complex, challenging the capabilities of traditional CPU and memory system architectures and often requiring significant effort on the part of programmers and system designers.

For the past seven years, Micron Technology has been developing a hardware co-processor technology that can directly implement large-scale Non-deterministic Finite Automata (NFA) for efficient parallel execution. This new non-von Neumann processor, currently in fabrication, borrows from the architecture of memory systems to achieve massive data parallelism, addressing complex problems in an efficient, manageable manner.

Mr. Skinner’s talk will provide an overview on this revolutionary new technology, the growing ecosystem, as well as potential applications in the area of oil and gas exploration and data analytics.


Wednesday March 4, 2015 14:45 - 15:05 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

14:45 CST

Disruptive Technology (2 minutes): “HPC Anywhere: Submersion and Directed Flow Cooling Technology for Oil & Gas,” Herb Zien, LiquidCool Solutions; Jay Ford, LiquidCool Solutions
DOWNLOAD PRESENTATION

WATCH VIDEO

More than two-thirds of the world’s oil and gas production comes from existing, mature fields, and Oil & Gas has been challenged with finding reserves under increasingly difficult conditions. HPC technology has been critical to pinpointing the location of these hard-to-locate fields through the development and use of cutting-edge hardware and software. A common method is to deploy arrays of sensors that record seismic reflections, creating vast amounts of raw data from high-resolution sensing systems. Processing this data requires High Performance Compute platforms utilizing new code written for CPUs, GPGPUs and math coprocessors, together with highly efficient data collection and timely transport to an HPC datacenter. These HPC datacenters routinely manage petabytes of data, and the incoming data stream is growing, so Oil & Gas teams need a swift, economical and rugged way to reduce data latency.

LiquidCool Solutions makes it possible to bring the datacenter to the field while realizing the highest levels of uptime and reliability, with rugged compute systems made specifically for offshore drilling rigs and shale platforms, watercraft, desert and tropical environments. The traditional method of housing, protecting and cooling electronics circulates air around and through components, which wastes energy and exposes computing equipment to corrosion, salt air, dust, dirt and airborne pollutants. LiquidCool Solutions’ disruptive innovation is an environmentally sealed HPC system that can be deployed in less than half the space and in one quarter of the time, using half the power, radically transforming how data flows. This concept, known as “HPC Anywhere,” is available in several embodiments: the Rugged Tier from LiquidCool Solutions combines sealed enclosures, weather-tight field cabinets and ruggedized mini modular datacenters.

Energy savings is an added benefit: the LCS-cooled system requires 40% less power than standard air-cooled compute systems, and 75% less space. Service is a snap, as one can swap servers in less than two minutes. Compared with legacy air systems, or other forms of liquid cooling for that matter, LiquidCool Solutions technology saves energy, saves space, enhances reliability, operates silently, and is easy to maintain in the field.

Speakers

Jay Ford

VP Sales & Marketing, LiquidCool Solutions
Harsh operating conditions create profound computing challenges in the Oil & Gas industry. We are at Rice to demonstrate how LiquidCool Solutions fully sealed server technology can help identify and take advantage of opportunities earlier, and mitigate potentially damaging even... Read More →

Herb Zien

CEO, LiquidCool Solutions
I have over 30 years of experience in project development, engineering management, power generation and energy conservation. Prior to assuming the position as CEO of LiquidCool Solutions I was cofounder of ThermalSource, LLC, a firm that grew to become the largest owner and operator... Read More →


Wednesday March 4, 2015 14:45 - 15:05 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

14:45 CST

Disruptive Technology (2 minutes): “Improving the Design of Subsea Riser Systems,” William Calver II, Altair; Geert Wenes, Cray; Bert Beals, Cray
DOWNLOAD PRESENTATION

WATCH VIDEO

Ultra deepwater subsea CFD riser analysis requires an accurate and scalable approach to capture the complex effects of flow separation and the resulting Vortex-Induced Vibration (VIV) in order to appropriately assess riser fatigue life. Simulating the multi-physical phenomena of VIV requires a highly scalable solver with fully-coupled Fluid/Structure Interaction (FSI) capability to analyze and adequately describe the nonlinear response due to marine currents, riser shapes, and the possible influence of multiple subsea structures. We present a state-of-the-art solver approach as well as an optimized HPC implementation to do exactly that. We discuss the impact of both solver and HPC technologies on performance, accuracy, and scalability.

Speakers

Geert Wenes

Sr. Practice Leader/Architect, Cray, Inc.


Wednesday March 4, 2015 14:45 - 15:05 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

14:45 CST

Disruptive Technology (2 minutes): 'Keep Agility, Reduce Cost! HPC and Compute-on-Demand with AWS,' Dougal Ballantyne, Amazon Web Services
In this session, Dougal Ballantyne from Amazon Web Services will talk about how on-demand scalable compute resources can be easily and quickly leveraged, specifically for Oil & Gas HPC. The session will give a quick overview of the Amazon Web Services platform and then dive into specific architectural patterns, including customer examples. This session is ideal for those who want to learn more about how other Oil & Gas customers and ISVs are leveraging the Amazon Web Services platform for increased performance and scalability in upstream and downstream applications.

Speakers

Dougal Ballantyne

Principal Product Manager, Amazon Web Services
Dougal Ballantyne is a HPC Solutions Architect at Amazon Web Services. He works closely with customers wishing to build new HPC solutions and migrate existing HPC solutions onto the AWS platform. He is the Americas lead and supports customers from all over the world. Prior to joining... Read More →


Wednesday March 4, 2015 14:45 - 15:05 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

14:45 CST

Disruptive Technology (2 minutes): “RTM Using Hadoop: Is There a Case for Migration?,” Ian Lumb, York University & Bright Computing, Inc.
DOWNLOAD PRESENTATION

WATCH VIDEO

Reverse-Time Migration (RTM) is a compute-intensive step in processing seismic data for the purpose of petroleum exploration. Because complex geologies (e.g., folds, faults, domes) introduce unwanted signal (aka. noise) into recorded seismic traces, RTM is also an essential step in all upstream-processing workflows. The need to apply numerically intensive algorithms for wave-equation migration against extremely large volumes of seismic data is a well-established industry requirement. Not surprisingly then, providers of processing services for seismic data continue to make algorithm development an ongoing area of emphasis. With implementations making use of the Message Passing Interface (MPI), and variously CUDA for programming GPUs, RTM algorithms routinely exploit the processing of large volumes of seismic data in parallel. Given its innate ability to topologically align data with compute, through the combination of a parallel, distributed high volume filesystem (HDFS or Lustre) and workload manager (YARN), RTM algorithms could make use of Hadoop. Given the current level of convergence between High Performance Computing (HPC) and Big Data Analytics, the barrier for entry has never been lower. At the outset then, this presentation reviews the opportunities and challenges for Hadoop’izing RTM. Because recontextualizing RTM for Big Data Analytics will be a significant undertaking for organizations of any size, the analytics upside of using Hadoop applications as well as Apache Spark will be also considered. Although the notion of Hadoop’izing RTM is at the earliest of stages, the platform provided by Big Data Analytics has already delivered impressive results in processing large-scale seismic event data via waveform cross correlation (e.g., Addair et al., 2014, http://dx.doi.org/10.1016/j.cageo.2014.01.014).

Wednesday March 4, 2015 14:45 - 15:05 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

14:45 CST

Disruptive Technology (2 minutes): “REX Neo Architecture: A Path to Exascale,” Paul Sebexen, REX Computing; Thomas Sohmers, REX Computing
DOWNLOAD PRESENTATION

WATCH VIDEO

REX Computing is developing an energy efficient, high performance computing (HPC) architecture that aims to accelerate the arrival of exascale computing systems. The REX Neo architecture leverages the world’s first ultra-low-power HPC processor, as well as a system-level node design to deliver greater energy efficiency. Compared to the efficiency offered by existing systems, typically a maximum of 10 single precision GFLOPs per watt, the Neo architecture aims to achieve at least 100 GFLOPs per watt. Early prototypes of the REX Neo processor and compute node will be available for customer evaluation in 2016. 

The Neo architecture is a new approach to HPC that aims to overcome barriers of scaling existing systems by designing each component, from the processor chip to the node, with power efficiency in mind. Inspired by ideas in system design, from early systolic arrays to today’s commercially-available manycore platforms, and advances in compiler design and programming models, the guiding philosophy behind Neo centers on removing complexity from hardware in favor of performing optimizations ahead-of-time. Just as many pieces of complex software, including geospatial data processing applications, can often be reduced to compact reusable pieces of code, underneath the software, interconnects, and microarchitecture of Neo are simple compute cores. This structural isomorphism between workload demands and hardware design is the key to high scalability and particularly beneficial in simplifying the process of mapping parallel control and data flow into targeted executables. 


Wednesday March 4, 2015 14:45 - 15:05 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Networking & Break
Wednesday March 4, 2015 15:15 - 15:45 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:45 CST

Coarse-grained Seismic Algorithms: “RTM - Asynchronous Constraint Execution for Scalability and Heterogeneity on Shot Level,” Daniel Grünewald, Fraunhofer ITWM; Franz-Josef Pfreundt, Fraunhofer ITWM
DOWNLOAD PRESENTATION

WATCH VIDEO

Reverse time migration (RTM) is the method of first choice in seismic imaging. It fully respects the two-way wave equation and provides high quality imaging even in regions with complex geological structures, e.g. regions composed of steep flanks and/or high velocity contrasts. This comes at a price: RTM places high demands on the underlying compute resources. Long of purely academic interest, RTM has become feasible at industrial scale thanks to progress in hardware development. Complex physical modeling, large target output domains, large migration apertures and/or high frequency content still require efficient parallelization on the algorithmic side, where the memory footprint may be too large to compute shot results on a single device. RTM benefits from high-throughput accelerators such as GPUs. To deal with heterogeneity at the hardware level, RTM needs a high degree of parallelism and improved load balancing in order to fully exploit the underlying hardware resources. Furthermore, with ever increasing floating point throughput, I/O should be avoided as much as possible to preserve scalability.

These requirements also arise in the context of interactive velocity model building based on RTM, which is now coming within the realm of possibility. Here, time to solution matters, and efficient internode parallelization is required to achieve good scalability.

We propose to introduce the concept of asynchronous constraint execution, complemented by random velocity perturbation boundary conditions, to achieve good scalability. We show the parallel efficiency achieved for forward propagation in an acoustic isotropic medium in a strong scalability setup on a 1024-cube grid. For 64 processes, each running on an Intel(R) Xeon(R) CPU E5-2680 and using 10 cores, the parallel efficiency is 97%.
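For reference, the strong-scaling parallel efficiency quoted here is the usual ratio (a standard definition, not one specific to this work): the observed speedup over a baseline run divided by the increase in resources,

    E(p) = \frac{T(p_{\mathrm{ref}})\, p_{\mathrm{ref}}}{T(p)\, p},

so an efficiency of 97% means the 64-process (640-core) run achieves 97% of ideal linear scaling relative to the reference configuration.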

Speakers

Daniel Grünewald

Competence Center High Performance Computing, Fraunhofer ITWM

Franz-Josef Pfreundt

Division Director, Competence Center for HPC
I studied Mathematics, Physics and Computer Science, got a PhD in Mathematical Physics, and helped to found the Fraunhofer ITWM in Kaiserslautern, Germany. Today I am leading the Competence Center for HPC. In my group we developed the GPI programming model and the BeeGFS... Read More →


Wednesday March 4, 2015 15:45 - 16:05 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:45 CST

Facilities, Infrastructure & Data I: “Hosting DLC HPC System and Beyond,” Diego Klahr, Total
DOWNLOAD PRESENTATION

WATCH VIDEO

Due to the extraordinary challenges the Oil and Gas industry faces, we have ever more data to process, with algorithms taking more and more physics into account.
High Performance Computing is a key component of this evolution. We are now able to use, and take advantage of, hundreds of thousands of cores. But are our facilities (data center building, electrical power, cooling system) ready to host such a "monster"?
To meet the petaflops challenge using our 15-year-old datacenter, we chose an ambient-temperature direct water cooling approach. We now have three years of production experience with such a system, and the numbers prove that this solution is mature, efficient and stable.
But keeping in mind the evolution curve of computing-power needs, we will have to face a new challenge in the next ten years: hosting an exascale system. How much energy will be necessary to run those machines and to ensure a secure power and cooling supply at an acceptable cost?
The life cycle of our facilities is quite different from that of supercomputers. When we design a new datacenter, the target is to ensure the capability of hosting about five generations of computing systems. So we will have to take into account not only what providers have in their labs today but also imagine the step beyond, and for this we need to change our classical approach to data center design.


Wednesday March 4, 2015 15:45 - 16:05 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

16:05 CST

Coarse-grained Seismic Algorithms: “Portable Task-based Programming for Elastodynamics,” Lionel Boillot, INRIA Magique3D team; George Bosilca, Innovative Computing Laboratory - University of Tennessee; Emmanuel Agullo, INRIA Hiepacs team; Henri Calandra, Total
DOWNLOAD PRESENTATION

WATCH VIDEO

Most seismic imaging techniques are based on wave propagation simulations. 3D anisotropic elastodynamics is a generally accepted, accurate candidate for modeling the subsurface. A Discontinuous Galerkin space discretization, associated with a Leap-Frog time scheme, leads to a quasi-explicit matrix linear system involving only local computations (i.e. cell by cell) at every half-timestep. The original message-passing parallel implementation uses a domain decomposition with one domain per process, and leads to an uneven work balance between processes. As a result, optimization for each architecture is time consuming and the solution is never portable. In a previous work, we successfully overcame this problem on shared memory architectures (a ccNUMA node and the Intel Xeon Phi accelerator) by changing the programming paradigm to a task-based approach and using the PaRSEC runtime system. The two key features for efficient task scheduling are a granularity finer than one subdomain per core and work-stealing driven by data locality. The results showed very good parallel efficiency and were portable across these machines. We now plan to address distributed memory architectures in order to target clusters of hybrid multicore nodes and coprocessors.
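Schematically, a velocity-stress leap-frog update in a DG setting has the form below (a generic sketch, not the authors' exact formulation); the point is that each half-step involves only the local mass matrix M_K of cell K, which makes the update cell-local and well suited to task-based scheduling:

    v_K^{n+1/2} = v_K^{n-1/2} + \Delta t\, M_K^{-1} R_K(\sigma^{n}), \qquad
    \sigma_K^{n+1} = \sigma_K^{n} + \Delta t\, M_K^{-1} S_K(v^{n+1/2}),

where R_K and S_K collect the volume and face (flux) contributions from cell K and its immediate neighbors.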

Speakers

Lionel Boillot

Expert Engineer, Inria

Henri Calandra

Total
Henri Calandra obtained his M.Sc. in mathematics in 1984 and a Ph.D. in mathematics in 1987 from the Universite des Pays de l’Adour in Pau, France. He joined Cray Research France in 1987 and worked on seismic applications. In 1989 he joined the applied mathematics department of... Read More →


Wednesday March 4, 2015 16:05 - 16:25 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

16:05 CST

Facilities, Infrastructure & Data I: “BP Center for High Performance Computing Facility - Year 1 Review,” Stefan Garrard, BP; Keith Gray, BP; Gary Kuzma, BP; Saadeddine Dimachkieh, HOK
DOWNLOAD PRESENTATION

WATCH VIDEO

In July, 2011, HOK was commissioned to design a High Performance Computing facility for BP. The BP HPC project team consisting of HOK (Architects and Engineers), Anslow-Bryant (Contractor), BP Westlake Property Management and Computing Teams worked collaboratively to design and build a facility uniquely capable of supporting the needs of research computing.

The facility is designed to provide a 30+ year life expectancy with capability to support the largest forecasted computing systems. Energy efficiency, operational flexibility and safety were primary design and construction drivers. Building systems are designed to provide an aggressive PUE target of 1.35.
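For context, PUE (power usage effectiveness) is the ratio of total facility energy to the energy delivered to the IT equipment,

    \mathrm{PUE} = \frac{E_{\text{total facility}}}{E_{\text{IT equipment}}},

so a target of 1.35 means at most 0.35 W of cooling, power-distribution and other overhead per watt of computing.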

Construction began June, 2012, and the BP HPC Team started computing in early September, 2013. The facility has been operating successfully for over a year. We will report on how the energy efficiency and operational goals have been achieved.

Speakers

Saad Dimachkieh

SVP/Firmwide Director of Electrical Engineering, HOK


Wednesday March 4, 2015 16:05 - 16:25 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

16:25 CST

Coarse-grained Seismic Algorithms: “Using Performance Tools to Analyze the Performance of an MPI+OpenMP Reverse Time Migration Code,” Sri Raj Paul, Rice University; Mauricio Araya-Polo, Shell; John Mellor-Crummey, Rice University
Applications to analyze seismic data or simulate oil reservoirs routinely employ scalable parallel systems to produce timely results. Today, applications for such systems typically use the MPI everywhere model. With increasing core counts on multicore and manycore chips, this programming model is becoming less viable because of the decreasing memory per core. To fully exploit emerging processor architectures, programs will need to employ threaded parallelism within a node in addition to MPI. OpenMP is a mature threaded programming model that is widely available today with implementations in both vendor and open source compilers. For that reason, MPI+OpenMP is the recommended programming model for systems such as the forthcoming supercomputer at NERSC with manycore compute nodes based on Intel’s Phi Knights Landing (KNL) chips.

Changing to a hybrid MPI+OpenMP programming model is a challenge for programmers. Tuning MPI+OpenMP programs for high performance is difficult without performance analysis tools to identify bottlenecks and uncover opportunities for improvement. In this talk, we describe our experience applying performance tools to gain insight into an MPI+OpenMP code that performs reverse time migration (RTM) on a cluster of multicore processors. Our goals for this analysis were (1) to identify any performance bottlenecks present in the code and opportunities for improvement, and (2) to assess the capabilities of available tools for analyzing the performance of such applications. We were looking to the tools for help evaluating the domain decomposition strategy used to partition the work among nodes, evaluating the use of threaded parallelism on a node, and evaluating functional unit utilization of individual cores.

In this work we identified opportunities for improvements of the application by reducing load imbalance resulting from the domain decomposition, and idleness among OpenMP worker threads during MPI communication. The combination of Rice’s HPCToolkit and TACC’s PerfExpert enabled us to gain insight into the distributed memory parallel performance, threading performance, as well as bottlenecks within a core.
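For readers unfamiliar with the hybrid model analyzed here, the generic C sketch below (not the RTM code itself) shows its typical structure: one MPI rank per node, OpenMP threads inside each rank, and a threading level requested at MPI initialization.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        /* Ask for an MPI library that tolerates threads; FUNNELED means
           only the master thread makes MPI calls. */
        int provided, rank;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Node-level work is spread over OpenMP threads. */
        #pragma omp parallel
        {
            #pragma omp master
            printf("rank %d runs %d threads\n", rank, omp_get_num_threads());
            /* ... threaded stencil / wave-propagation work here ... */
        }

        /* Halo exchanges with neighboring ranks would happen here,
           outside the parallel region or from the master thread. */
        MPI_Finalize();
        return 0;
    }

Tools such as HPCToolkit attribute time to both levels of this structure, which is what makes the load-imbalance and thread-idleness findings above visible.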

Speakers

Mauricio Araya

Senior Researcher Computer Science, Shell Intl. E&P Inc.

John Mellor-Crummey

Professor, Computer Science, Rice University
John Mellor-Crummey’s research focuses on software for high performance and parallel computing, including compilers, tools, and runtime libraries for multicore processors and scalable parallel systems. His current work is principally focused on performance tools for scalable parallel... Read More →


Wednesday March 4, 2015 16:25 - 16:45 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

16:45 CST

Coarse-grained Seismic Algorithms: “Using Modeling to Develop Stencil Codes,” Raul de La Cruz, Barcelona Supercomputing Center; Mauricio Araya-Polo, Shell
DOWNLOAD PRESENTATION

WATCH VIDEO

Stencil computation is the core of most wave-based seismic imaging applications. The stencil solver alone (depending on the governing equation) can represent up to 90% of the overall elapsed time, and the efficient use of the memory hierarchy is a major concern (Araya-Polo et al., 2008). Therefore, source code analysis and the development of improvements that can fully take advantage of modern architectures are crucial. These tasks can be assisted by performance models, which help expose bottlenecks and predict suitable tuning parameters in order to boost stencil performance.

To achieve that, the following aspects need to be accurately modeled: shared multi-level caches in multi/many cores, and the prefetching engine mechanism.

In this work, we introduce our published performance model (de la Cruz et al., 2014) focusing on these architectural characteristics, and then we show how it can help improve stencil computation performance. We also propose a new metric that estimates the efficiency of thread-parallelized stencil code.
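As a generic point of reference (a simple roofline-style bound, not the cache- and prefetch-aware model introduced above): a constant-coefficient 3D 7-point stencil performs roughly 8 flops per grid point and, in the best caching case, streams about one 8-byte read and one 8-byte write per point, giving an arithmetic intensity of

    I \approx \frac{8\ \text{flops}}{16\ \text{bytes}} = 0.5\ \text{flops/byte},
    \qquad P \lesssim \min\!\left(P_{\text{peak}},\ I \cdot BW_{\text{mem}}\right),

which is why stencil kernels are usually bandwidth bound and why the cache and prefetching behavior targeted by such models dominates the achievable performance.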

Speakers

Mauricio Araya

Senior Researcher Computer Science, Shell Intl. E&P Inc.

Raul de La Cruz

Researcher, Phd. Student, Barcelona Supercomputing Center (BSC-CNS)


Wednesday March 4, 2015 16:45 - 17:05 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

17:05 CST

Coarse-grained Seismic Algorithms: “Democratization of HPC in the Oil & Gas Industry Through Automatic Parallelization with Parallware,” Manuel Arenaz, Appentra Solutions S.L. & University of A Coruña; J.M. Dominguez, EPHYSLAB Environmental Physics Laboratory
DOWNLOAD PRESENTATION

WATCH VIDEO

The Oil & Gas industry is an extremely competitive business sector where the return on investment of High Performance Computing (HPC) is well understood. The development of HPC programs is a complex, error-prone, tedious undertaking that negatively impacts the productivity of HPC developers. In the years to come, due to the retirement of expert engineers, there will be a growing urgency in finding experts skilled in both geosciences and computer science. In modern computing systems, parallelism is the primary source of performance gain. Automatic parallelization is the ideal approach to address the productivity gap, as it decouples software development from the underlying parallel hardware. This paper shows that the use of modern parallelizing compilers is a step forward in the democratization of HPC, as the productivity of HPC developers is improved significantly: the completion time of numerous tedious, complex and error-prone tasks of the HPC workflow is significantly shortened. The experiments demonstrate that Parallware advances the state of the art as the first production-state parallelizing compiler that succeeds with full-scale real programs. The HPC code DualSPHysics, which simulates the impact of a wave on a petroleum plant, is used as a case study.
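Purely as an illustration of the kind of transformation a parallelizing compiler automates (this is not Parallware's actual output), a scalar-reduction loop such as the one below can be annotated mechanically once the tool proves the reduction pattern:

    /* Input: sequential reduction over particle energies. */
    double total_energy(int n, const double *e) {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += e[i];
        return sum;
    }

    /* What an automatically parallelized version can look like:
       the tool detects the reduction and inserts the directive. */
    double total_energy_par(int n, const double *e) {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += e[i];
        return sum;
    }

The value of automation lies in proving, for full-scale codes, that such transformations preserve the sequential semantics.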

Speakers

Dr. Manuel Arenaz

CEO, Appentra Solutions
Dr. Manuel Arenaz is the CEO of APPENTRA Solutions and professor at the University of A Coruña (Spain). He holds a PhD in Computer Science from the University of A Coruña (2003) on advanced compiler techniques for parallelisation of scientific codes. He is passionate about technology... Read More →


Wednesday March 4, 2015 17:05 - 17:25 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

17:05 CST

Facilities, Infrastructure & Data I: 'Best Practices in HPC Systems Administration,' Shawn Hall, Numerical Algorithms Group

Speakers

Shawn Hall

Infrastructure Engineer, Numerical Algorithms Group
Shawn's experience is in large scale system administration, having worked with HPC clusters in industry and academia. He has worked on many aspects of large scale systems and his interests include parallel file systems, configuration management, performance analysis, and security. Shawn... Read More →


Wednesday March 4, 2015 17:05 - 17:25 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

17:30 CST

Networking Reception
Wednesday March 4, 2015 17:30 - 19:30 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005
 
Thursday, March 5
 

07:30 CST

Registration & Networking
Thursday March 5, 2015 07:30 - 08:30 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

08:30 CST

Plenary: Message from Organizers

Speakers

Jan E. Odegard

Executive Director Ken Kennedy Institute/ Associate Vice President Research Computing, Rice University
Jan E. Odegard Executive Director, Ken Kennedy Institute for Information Technology and Associate Vice President, Research Computing & Cyberinfrastructure at Rice University. Dr. Odegard joined Rice University in 2002, and has over 15 years of experience supporting and enabling research... Read More →


Thursday March 5, 2015 08:30 - 08:45 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

08:45 CST

Keynote: 'Current Trends in Parallel Numerical Computing and Challenges for the Future,' Jack Dongarra, American University Distinguished Professor of Computer Science in the Electrical Engineering and Computer Science Department, University of Tennessee
DOWNLOAD PRESENTATION

WATCH VIDEO

In this talk we examine how high performance computing has changed over the last 10 years and look toward the future in terms of trends. These changes have had, and will continue to have, a major impact on our numerical scientific software. A new generation of software libraries and algorithms is needed for the effective and reliable use of (wide area) dynamic, distributed and parallel environments. Some of the software and algorithm challenges have already been encountered, such as management of communication and memory hierarchies through a combination of compile-time and run-time techniques, but the increased scale of computation, depth of memory hierarchies, range of latencies, and increased run-time environment variability will make these problems much harder.


Thursday March 5, 2015 08:45 - 09:30 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

09:30 CST

Keynote: 'The Green Data Center and Energy Efficient Computing,' Steven Hammond, Computational Science Center Director, NREL
DOWNLOAD PRESENTATION

In this talk we present the holistic approach to data center energy efficiency in practice at the National Renewable Energy Laboratory (NREL). NREL completed construction of its warm-water, liquid cooled data center in late 2012. This data center is demonstrating an annualized average PUE of 1.06. It uses evaporative cooling rather than mechanical chillers and the waste heat generated by the HPC system is captured and used as the primary heat source for the building office space and laboratories. The new data center cost less to build than comparable data centers and will cost less to operate. We will share our experiences to date with liquid cooling, lessons learned along the way, and some thoughts about the future.

Speakers

Steven W. Hammond

Director of Computational Science Center, National Renewable Energy Laboratory
Steve Hammond is the Director of the Computational Science Center at the National Renewable Energy Laboratory in Golden, CO. Steve leads laboratory efforts in high performance computing and energy efficient data centers. Prior to joining NREL, Steve spent 10 years at the National Center... Read More →


Thursday March 5, 2015 09:30 - 10:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

10:15 CST

Networking & Break
Thursday March 5, 2015 10:15 - 10:55 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

10:45 CST

Facilities, Infrastructure & Data II: “Petascale Data Management at ExxonMobil,” Alan Wild, ExxonMobil
It’s not surprising that when one discusses “High Performance Computing” the focus tends to fall on the “computing” aspects (algorithms, computational capability). However, when HPC is applied to seismic imaging, it’s important to realize that computing can’t happen without data and the storage subsystem. ExxonMobil HPC has been operating multi-petabyte file systems for over five years and has developed various practices for managing data at this scale. In this talk we discuss the processes used to manage a petascale storage facility consisting of more than 500 million files, along with the best practices we have developed. We will also describe a tool that allows us to move data efficiently across file systems when disks need to be balanced or upgraded.

Speakers

Alan Wild

High Performance Computing Systems Technical Specialist, ExxonMobil Technical Computing Company


Thursday March 5, 2015 10:45 - 11:05 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

10:45 CST

Fine-grained Seismic Algorithms: “OKL: A Unified Kernel Language for Parallel Architectures,” David Medina, Rice University; Amik St-Cyr, Shell; Tim Warburton, Rice University: Lucas Wilcox, Naval Postgraduate School
DOWNLOAD PRESENTATION

WATCH VIDEO

The inability to predict lasting languages and architectures led us to develop OCCA, a library focused on host-device interaction. OCCA is made up of a portable API, natively available in C, C++, C#, Fortran, Python, Julia and MATLAB, and the device kernel language. The unified kernel language in OCCA is based on macro expansions exposing parallelism and expanding to OpenMP, OpenCL, CUDA, Pthreads and COI.

However, rather than coding in the OCCA intermediate representation, we introduce two native languages: OKL and OFL. The OCCA Kernel Language (OKL) is based on C and extends the language by exposing parallel loops by labeling them. The OCCA Fortran Language (OFL) is the Fortran language equivalent of OKL.

OCCA is an open-source project and can be found at [https://github.com/tcew/OCCA2]; simple examples are included in [https://github.com/tcew/OCCA2/examples].
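As a flavor of the kernel language, here is a sketch in the style of the addVectors examples that ship with OCCA (a hedged illustration; exact OKL syntax may differ between versions). The outer0/inner0 labels on the C-based for statements mark the parallel loops that OCCA maps to OpenMP threads, CUDA blocks/threads, or OpenCL work-groups/work-items:

    // OKL sketch: the fourth clause of each for statement labels the parallel loop.
    kernel void addVectors(const int entries,
                           const float *a,
                           const float *b,
                           float *ab) {
      for (int group = 0; group < (entries + 15) / 16; ++group; outer0) {
        for (int item = 0; item < 16; ++item; inner0) {
          const int n = group * 16 + item;
          if (n < entries)
            ab[n] = a[n] + b[n];   // one work-item per entry
        }
      }
    }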

Speakers

Amik St-Cyr

Senior researcher, Shell
Amik St-Cyr recently joined the Royal Dutch Shell company as a senior researcher in computation & modeling. Amik came to the industry from the NSF funded National Center for Atmospheric Research (NCAR). His work consisted in the discovery of novel numerical methods for geophysical... Read More →

Tim Warburton

Rice University
Over the last decade Tim has developed and analyzed discontinuous Galerkin methods for the time-domain Maxwell’s equations. He has recently extended this research agenda to include the development of high order, local artificial radiation boundary conditions to provide closure for... Read More →


Thursday March 5, 2015 10:45 - 11:05 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

10:45 CST

Reservoir/Production: “Vectorization of Equation of State Calculations in Compositional Reservoir Simulation,” Shaji Chempath, Kjetil Haugen and Bret Beckner, ExxonMobil
Reservoir simulations are computationally intensive, and next generation simulators will use all available modern high performance computing tools to achieve maximum performance. ExxonMobil’s next generation simulator is designed to run in parallel on a large number of processors. Vectorization provides another way to further improve speedups by accelerating the calculations within a single CPU core. A significant fraction of a compositional reservoir simulation is spent on equation of state (EoS) calculations within each cell. We will present results of applying vectorization to EoS calculations within ExxonMobil’s next generation massively parallel reservoir simulator.

Vectorization in the traditional sense deals with a SIMD model where the SSE/AVX registers are used for parallelizing simple operations such as additions and multiplications within a loop. We will present a different way of vectorizing an entire algorithm - the equation of state calculation - on SSE and AVX registers, with a resulting speedup of 4-8x on a single CPU core. Ideal speedups are 4x on 128-bit SSE registers for 32-bit floating point operations (8x for 256-bit AVX registers). Rather than relying on individual loops being vectorized, we run the entire EoS calculation using vectorized variables, allowing 4 or 8 EoS calculations to be done simultaneously. To realize ideal speedups, we exploit the fact that, while there may be different branches within the EoS algorithm, the branching is very similar for spatially contiguous points in a reservoir simulation. For example, 8 adjacent cells in a reservoir simulation will have similar phase behavior and a similar number of iterations during a phase-split calculation. EoS calculations within a compositional reservoir simulation have been sped up by a factor of 4x-8x using this approach.
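A minimal sketch of the "vectorize across adjacent cells" idea described above (not ExxonMobil's code; the update shown is a placeholder for one step of an EoS iteration): eight cells are packed into a single AVX register and advanced together, so a branch taken uniformly by the whole batch costs nothing extra.

    #include <immintrin.h>

    /* Advance one placeholder fixed-point-style update for 8 cells at once:
       x <- x - 0.5f * (x*x - k), batched in a single 256-bit register. */
    void update_batch8(float *x, const float *k) {
        __m256 vx   = _mm256_loadu_ps(x);
        __m256 vk   = _mm256_loadu_ps(k);
        __m256 half = _mm256_set1_ps(0.5f);

        __m256 resid = _mm256_sub_ps(_mm256_mul_ps(vx, vx), vk);
        vx = _mm256_sub_ps(vx, _mm256_mul_ps(half, resid));

        _mm256_storeu_ps(x, vx);
    }

A real EoS kernel iterates many such steps per batch, but the structure (load 8 cells, update in lockstep, store 8 cells) is the same.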


Thursday March 5, 2015 10:45 - 11:05 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

11:05 CST

Facilities, Infrastructure & Data II: “HDF5 - Where We're At and Where We're Going,” Quincey Koziol, HDF Group
DOWNLOAD PRESENTATION

WATCH VIDEO

This short talk provides an overview of the HDF5 technology, including its history, state of current practice, and plans for the future.  Focusing on high-performance computing and high-speed data acquisition, this talk gives an overview of HDF5 and a roadmap for future development.

Speakers

Quincey Koziol

Director of Core Software and HPC, The HDF Group
Mr. Koziol has been the principal software architect for the HDF5 software project from its inception. HDF5 represents a revolutionary approach to providing fast, portable serial and parallel I/O, data sharability and application interoperability. In this effort, Mr. Koziol has designed... Read More →


Thursday March 5, 2015 11:05 - 11:25 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

11:05 CST

Fine-grained Seismic Algorithms: “Revisiting Kirchhoff Migration on GPUs,” Rajesh Gandham, Rice University; Thomas Cullison, Hess; and Scott Morton, Hess
DOWNLOAD PRESENTATION

WATCH VIDEO

Kirchhoff 3-D prestack depth migration is a widely used and computationally intensive seismic imaging approach that is highly parallelizable. There is parallelism both in the input seismic data and in the output image. At this workshop in 2008, we presented an algorithm with parallelism from the level of many serial Grid Engine tasks down to individual CUDA threads running on Nvidia GPUs. The performance of this algorithm gave clusters with GPUs a significant price-to-performance advantage over standard CPU clusters.

We recently revisited our Kirchhoff code with two computational goals, hardware portability and increased performance, as well as the geophysical goal of generalized gather creation. To address the possibility of hardware portability, we first tested the OCCA unified parallel programming model by re-coding our CUDA computational kernels in OCCA. This enabled our kernels to run on Nvidia or AMD GPUs and multi-core CPUs using CUDA, OpenCL or OpenMP. Because of the similarities between the OCCA and CUDA programming models, we were able to port and benchmark the code in less than a month without losing any production performance. In fact, the run-time compilation feature of OCCA generally resulted in better optimized kernels, since the compilers have knowledge of user parameters and the execution hardware, effectively tailoring the kernels for each production run.

With this portability success, we next explored two dimensions in the design of the algorithm to address the possibility of increasing the migration performance on newer GPU architectures. We explored (1) the fraction of the GPU memory used for input seismic data (as opposed to the output image samples) and (2) the trade-off of computation vs memory usage. We found significant improvements in performance when we shifted the balance towards more input data and less output data in memory, and switched to a more compute intensive approach.

We have coded the new algorithm in OCCA and added the ability to generate generalized image gathers (e.g., offset gathers, offset vector gathers and reflection angle gathers). In this talk, we will discuss our porting, design experiences, and compare the advantages and performance of the old and new algorithms.
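For readers less familiar with the method, the core of Kirchhoff migration is a travel-time-indexed summation along the lines of the schematic C loop below (function and table names are hypothetical placeholders; production kernels add weights, anti-aliasing and interpolation). The two loop levels are exactly the input-trace and output-image parallelism discussed above.

    /* Schematic Kirchhoff summation: for each input trace, spread its
       samples onto every output image point at the total travel time.
       t_src and t_rcv are hypothetical precomputed travel-time tables. */
    void kirchhoff_migrate(int ntraces, int nsamp, float dt,
                           const float *traces,      /* [ntraces][nsamp] */
                           int nimg, float *image,   /* [nimg]           */
                           const float *t_src,       /* [ntraces][nimg]  */
                           const float *t_rcv) {     /* [ntraces][nimg]  */
        for (int tr = 0; tr < ntraces; tr++) {        /* input parallelism  */
            for (int p = 0; p < nimg; p++) {          /* output parallelism */
                float t  = t_src[tr * nimg + p] + t_rcv[tr * nimg + p];
                int   it = (int)(t / dt);
                if (it >= 0 && it < nsamp)
                    image[p] += traces[tr * nsamp + it];
            }
        }
    }

Choosing how much of each array to keep resident on the device is the memory-balance trade-off explored in the talk.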

Speakers

Thomas Cullison

Solutions Architect Oil & Gas, NVIDIA
Thomas is a Solutions Architect at NVIDIA with a focus on the Oil & Gas industry, and he has a background in computational geophysics, algorithms, and development of HPC and DSP applications for seismic imaging and seismic data processing. As an undergraduate at Colorado School of... Read More →

Rajesh Gandham

PhD Candidate, Rice University
Rajesh Gandham is a PhD candidate in the department of Computational and Applied Mathematics at Rice University. He is working with Prof. Tim Warburton on high performance high order numerical methods for ocean fluid flow problems. His research interests are in developing and implementing... Read More →

Scott Morton

Manager and Global Geophysical Advisor, Hess Corporation
Scott Morton has 25 years of experience in computational and theoretical physics distributed between academia, the computer industry and the petroleum industry. Although originally trained as an astrophysicist, he switched to geophysics when he joined Shell in 1991 to do research... Read More →


Thursday March 5, 2015 11:05 - 11:25 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

11:05 CST

Reservoir/Production: “High-performance Parallel Preconditioning for Large-scale Reservoir Simulation Through Multiscale, Architecture-aware Techniques,” Jason Sewall, Intel; Abdulrahman Manea and Hamdi Tchelepi, Stanford
DOWNLOAD PRESENTATION

WATCH VIDEO

We propose architecture-aware improvements to the Algebraic Multiscale (AMS) class of preconditioners to accelerate reservoir simulation problems on modern parallel computers, and we present results comparing this augmented parallel AMS with the state of the art.


Thursday March 5, 2015 11:05 - 11:25 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

11:25 CST

11:25 CST

Fine-grained Seismic Algorithms: “RiDG: A Portable High-Performance Simulation Tool for Seismic Imaging,” Axel Modave and David Medina, Rice University; Amik St-Cyr, Shell; Tim Warburton, Rice University
DOWNLOAD PRESENTATION

WATCH VIDEO

Improving both the accuracy and computational performance of simulation tools for seismic imaging is a major challenge for the Oil and Gas industry. The current generation of compute clusters consists of many-core CPUs and, optionally, massively parallel graphics processing units or side-car accelerators to provide a performance boost. However, accelerator-aided clusters require specialized algorithms and simulation tools to make full use of the hardware (e.g. [1, 4, 7]).

RiDG is the result of a collaboration between research teams at Rice University and Shell. It was conceived as a high-performance tool for seismic migration that can be run on several hardware architectures. It includes reverse time migration (RTM) capabilities, and multiple wave models on both heterogeneous and anisotropic media.

The model solver is based on a nodal discontinuous Galerkin time-domain (DGTD) method with high-order basis functions. The weak element-to-element coupling of DGTD methods makes it a suitable scheme for efficient computations on modern hardware architectures [2, 3]. Unstructured meshes and multi-rate time-stepping efficiently deal with multi-scale solutions [2, 6]. We adopted the MPI+X approach for distributed programming together with OCCA [5], a unified framework to make use of major multi-threading languages (e.g. OpenMP, OpenCL and CUDA), offering a flexible approach to handling the multi-threading X. The load balancing of our implementation reduces both device–host data movement and MPI node-to-node communication.

While the RTM procedure generally requires massive data storage with slow I/O, the thin halo regions inherent in DGTD discretizations eliminate the need to frequently checkpoint volumetric field data. The low storage requirements for DGTD boundary data allow halo trace data to be stored in memory rather than relying on disk-based check-pointing. Similarly, MPI communications in the reverse time phase can be reduced by retaining outgoing boundary trace data from the forward time calculation and replaying them during the reverse time calculation.

In this talk, we present the main features of the schemes used in RiDG, as well as the choices made for an efficient, accelerated, parallel implementation. Numerical results are presented to evaluate and illustrate the computational performance for forward simulations and reverse time migration calculations.
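To see why retaining halo traces is so much cheaper than volumetric checkpointing, consider a generic surface-to-volume estimate (illustrative only, not figures from RiDG): for a subdomain of n x n x n elements, with N_p volume nodes and N_{fp} face nodes per element and per field,

    \frac{\text{boundary-trace storage per step}}{\text{volumetric snapshot}}
      \;\approx\; \frac{6\, n^{2} N_{fp}}{n^{3} N_{p}}
      \;=\; \frac{6\, N_{fp}}{n\, N_{p}} \;=\; O\!\left(\tfrac{1}{n}\right),

so the relative cost of keeping boundary data every time step shrinks as the subdomain grows, which is what allows the traces to be held in memory and replayed during the reverse-time pass.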

Speakers

Amik St-Cyr

Senior researcher, Shell
Amik St-Cyr recently joined the Royal Dutch Shell company as a senior researcher in computation & modeling. Amik came to the industry from the NSF funded National Center for Atmospheric Research (NCAR). His work consisted in the discovery of novel numerical methods for geophysical... Read More →

Tim Warburton

Rice University
Over the last decade Tim has developed and analyzed discontinuous Galerkin methods for the time-domain Maxwell’s equations. He has recently extended this research agenda to include the development of high order, local artificial radiation boundary conditions to provide closure for... Read More →


Thursday March 5, 2015 11:25 - 11:45 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

11:25 CST

Reservoir/Production: “Heterogeneous HPC Framework for Agile Formation Evaluation and Well Placement Workflows in High-Angle and Horizontal Wells,” Valery Polyakov, Ray Kocian, Dzevat Omeragic and Tarek Habashy, Schlumberger
We present a heterogeneous, platform-independent high-performance computing framework as an enabling technology in formation evaluation and well placement workflows for complex 3D scenarios, such as lateral changes in dips and azimuths of layers and faults, cross-bedding, fracture orientation, etc. The modeling and inversion algorithms needed to interpret the well-log data in high-angle and horizontal (HA/HZ) wells are run on demand as a service on the Grid computing infrastructure and fully integrated with reservoir geo-models.

We show, as an example, how this HPC framework, serving as the backbone for the log simulation library of codes integrated as a Web service into a Petrel plugin, has been deployed in a giant carbonate field study with a national oil company to interpret hundreds of horizontal wells with varying extents and encompassing data from various wireline and LWD tools. This study is a part of a full field review in which static reservoir modeling was used to integrate the relevant data, spanning from core to seismic domains, into a new reservoir description. The latter will serve as a deterministic basis for reservoir history matching, infill well planning, and reservoir management optimization; the eventual goal is to increase production from the field by 25%. The reservoir model had been built using seismic and vertical well data. The results of log interpretation were in the end propagated as a change back to the pillar grid model. Petrophysicists were able to process data from more than 100 wells in less than two months, a productivity improvement of about one order of magnitude compared to conventional workflows; instead of the typical three days needed for each horizontal well analysis, three wells per day were processed.

This HPC-based approach to log modeling as a service is essential for new modeling- and inversion-based workflows used to interpret measurements in HA/HZ wells. It is especially important for the interpretation of new-generation deep directional resistivity measurements, which are sensitive to structure on the reservoir scale, where inversion-derived structure and property information are incorporated into the geomodels. The system has proven its efficiency in formation evaluation and the geomodel refinement process in a field deployment. Platform independence, dynamic expansion/contraction of the compute resources, and the ability to describe the job in a declarative fashion are key aspects of the framework.


Thursday March 5, 2015 11:25 - 11:45 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

11:45 CST

Facilities, Infrastructure & Data II: “Large-Data Software Defined Visualization on CPUs,” Gregory S. Johnson, Gregory P. Johnson and Ingo Wald, Intel
DOWNLOAD PRESENTATION

WATCH VIDEO

The computational capability of CPUs is rapidly advancing, with increasing core counts and increasing computational resources per core. At the same time, the capacity of high-speed memory accessible to the CPU (both on-package and near-package) is growing exponentially over time. In this talk, we discuss how these trends strongly favor memory-intensive workloads with a high degree of parallelism, such as large-data scientific visualization. We describe a low-level, open source software layer for high performance visualization directly on CPUs (without the need for high-end graphics processors), and show how this library can be used in new and existing applications through a straightforward API. Using this library, it is feasible to render large-scale data with high image quality, and with performance comparable to (and in some cases greater than) that of top-of-the-line GPUs. We also briefly discuss the scalability, resource flexibility, and potential cost advantages of such an approach.

Speakers

Gregory S. Johnson

Graphics Software Engineer, Intel Corporation


Thursday March 5, 2015 11:45 - 12:05 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

11:45 CST

Fine-grained Seismic Algorithms: “Performance Comparison Between HDG and Nodal DG Methods for Elastic Waves Simulation in Harmonic Domain,” Marie Bonnasse-Gahot, INRIA Bordeaux-Sud-Ouest; Henri Calandra, Total; Julien Diaz, INRIA Bordeaux-Sud-Ouest and St
DOWNLOAD PRESENTATION

WATCH VIDEO

In the most widely used methods for seismic imaging, we have to solve 2N wave equations at each iteration of the selected process if N sources are used. N is usually large and the efficiency of the whole simulation algorithm is directly related to the efficiency of the numerical method used to solve the wave equations.
Seismic imaging can be performed in the time domain, but there is an advantage in considering the frequency domain: it is not necessary to store the solution at each time step of the forward simulation. The main drawback then lies in solving large linear systems, which is a challenging task for realistic 3D elastic media despite recent advances in high-performance numerical solvers. In this context, the goal of our study is to develop new solvers based on reduced-size matrices without hampering the accuracy of the numerical solution.
We consider Discontinuous Galerkin methods (DGm), which are more convenient than finite difference methods for handling the topography of the subsurface. DGm and classical Finite Element methods (FEm) differ mainly in their discrete functions, which are only piecewise continuous in the case of DGm. DGm are therefore well suited to hp-adaptivity, a great advantage that makes them fully adapted to calculations in highly heterogeneous media. The main drawback of classical DGm is that they are more expensive, in terms of the number of unknowns, than classical FEm.
In this work we consider a hybridizable DG method (HDGm). Its principle consists in introducing a Lagrange multiplier representing the trace of the numerical solution on each face of the mesh cells. This new variable exists only on the faces, and the unknowns of the problem depend on it, which allows us to reduce the number of unknowns of the global linear system. The solution to the initial problem is then recovered through independent element-wise calculations. The parallelization of the HDG formulation does not introduce any additional difficulty compared with classical DGm.
We have compared the performance of the HDG method with that of nodal DGm for 2D elastic wave propagation in the harmonic domain. Preliminary results show that HDGm outperforms DGm in both computational time and matrix storage. We are performing strong and weak scalability tests in order to study performance portability and HPC efficiency.
This is preliminary work before considering the more general 3D case.
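As a schematic of the hybridization idea described above (our own generic static-condensation notation, not the authors' formulation), the element-local unknowns are eliminated in favor of a trace variable that lives only on the mesh faces:

```latex
\[
\begin{aligned}
A_K\,u_K + B_K\,\lambda_{\partial K} &= f_K
   && \text{(independent local problem on each element } K\text{)},\\
\sum_K C_K\,u_K \;+\; D\,\lambda &= g
   && \text{(global coupling through the face unknowns only)}.
\end{aligned}
\]
```

Eliminating $u_K = A_K^{-1}(f_K - B_K\,\lambda_{\partial K})$ element by element leaves a reduced global system $\widehat{S}\,\lambda = \widehat{g}$ whose size scales with the number of mesh faces rather than with the volume unknowns, which is the source of the smaller matrices mentioned above.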

Speakers
MB

Marie Bonnasse-Gahot

PhD Student, INRIA Bordeaux-Sud Ouest
avatar for Henri Calandra

Henri Calandra

Total
Henri Calandra obtained his M.Sc. in mathematics in 1984 and a Ph.D. in mathematics in 1987 from the Universite des Pays de l’Adour in Pau, France. He joined Cray Research France in 1987 and worked on seismic applications. In 1989 he joined the applied mathematics department of... Read More →


Thursday March 5, 2015 11:45 - 12:05 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

11:45 CST

Reservoir/Production: “Strongly Scalable High Order Algorithm for Miscible Flooding on Massively Parallel Architecture,” Jizhou Li and Beatrice Riviere, Rice University
DOWNLOAD PRESENTATION

WATCH VIDEO

Introduction
The miscible displacement problem models the displacement of the mixture of two miscible fluids in porous media. The problem models an important process in enhanced oil recovery.
Our high order discretization based on Discontinuous Galerkin (DG) method for both Darcy flow and fluid transport is mass-conservative and provides high fidelity simulation results for the miscible flooding even under highly heterogeneous, anisotropic permeability and severe grid distortion.
The DG discretizations result in larger and more ill-conditioned linear systems than the ones from commonly used lower order methods. To address this issue, we apply algebraic multigrid (AMG) and domain decomposition (DD) to construct parallel preconditioners.
Using the Distributed and Unified Numerics Environment (DUNE) with a carefully designed implementation, we are able to achieve scalability and efficiency for our miscible flow simulator.

Scalable Solver and Preconditioner
We use a preconditioned Krylov subspace iterative method as our solver.
The transport system can be preconditioned by overlapping domain decomposition with SSOR or ILU preconditioners.
The Darcy system is preconditioned using overlapping domain decomposition and an aggregation-based algebraic multigrid (AMG) method. The preconditioners yield good convergence even for problems with permeability varying in magnitude from 10^{-10} to 10^{-18} m^2, such as in the SPE10 model.
The solvers and preconditioners are scalable on massively parallel computing architectures, as we illustrate with results in which pressure and concentration are approximated by piecewise quadratic elements over 1,122,000 cells using up to 512 processes on an IBM iDataPlex cluster. The AMG solver, which is the most time-consuming component of the simulation, is strongly scalable, as we also demonstrate.
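As a language-neutral illustration of the preconditioned Krylov setup described above (not the authors' DUNE-based code; the matrix, preconditioner, and sizes here are stand-ins), a sparse system can be solved with GMRES preconditioned by an incomplete factorization as follows:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Hypothetical ill-conditioned sparse system standing in for a DG discretization
# with strongly varying permeability-like coefficients.
n = 2000
rng = np.random.default_rng(0)
diag = 2.0 + rng.uniform(0.0, 1e4, n)
A = sp.diags([diag, -np.ones(n - 1), -np.ones(n - 1)], [0, -1, 1], format="csc")
b = rng.standard_normal(n)

# Incomplete LU factorization used as a preconditioner
# (a simple stand-in for the DD/AMG preconditioners described above).
ilu = spla.spilu(A)
M = spla.LinearOperator((n, n), matvec=ilu.solve)

# Preconditioned GMRES, a Krylov subspace iterative method.
x, info = spla.gmres(A, b, M=M)
print("converged" if info == 0 else f"GMRES stopped with info={info}")
```

In the actual simulator the role played here by the ILU factorization would be taken by the overlapping domain decomposition and AMG preconditioners described above.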

Speakers
JL

Jizhou Li

Graduate Student, Rice University
I am a PhD student in Computational and Applied Mathematics at Rice University. I am interested in developing efficient and accurate solutions to porous media flow and transport problems, while maintaining a solid theoretical base.
avatar for Beatrice Riviere

Beatrice Riviere

Professor, Rice University
Expert in numerical methods for solving partial differential equations modeling processes in porous media, occurring at reservoir scales and pore scales. Develops efficient discontinuous Galerkin methods of arbitrary order.


Thursday March 5, 2015 11:45 - 12:05 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

12:05 CST

Fine-grained Seismic Algorithms: “Towards an Explicit RTM Stencil Computation Framework on KALRAY TURBOCARD2,” Yann Kalemkarian, Kalray Inc.
DOWNLOAD PRESENTATION

WATCH VIDEO

RTM seismic migration algorithms are very I/O- and compute-intensive and can benefit from accelerators such as GPGPUs or manycore processors.
One of the main concerns with the use of accelerators is the integration of legacy code and the ability to leverage the standard programming models the industry has relied upon: code written in FORTRAN and parallelized with MPI and OpenMP 3 is not always easy to port to accelerators.

Most of the time, the most computationally intensive parts of algorithms such as RTM can be reduced to small portions of the code, mostly consisting of loop nests, called kernels. The problem then becomes how to move (or offload) those kernels to the accelerators and how to integrate the offloaded kernels with the rest of the application.
To achieve this, some approaches, such as OpenACC and OpenMP 4, extend well-known programming models such as OpenMP 3 to support accelerator offloading.

One of the shortcomings of this "#pragma based" approach is that it does not always allow extracting most of the performance of the accelerators, because it is too high-level.
For example, explicit RTM schemes using stencil computation models are mostly I/O-bound on current accelerator architectures and could benefit from strategies such as cache-blocking or time-skewing, but those optimization strategies need to be described explicitly for a specific architecture.
These optimizations can be hard to implement and defeat the main goal of those approaches, which is to be accelerator-independent.

An alternative approach is to provide domain-specific languages or libraries that allow scientists to concentrate on the model itself while letting experts optimize for each platform.
Several tools and libraries exist for stencil computation, such as ArrayLib, Pluto, or Pochoir, but they do not target accelerators and can be intrusive.

In order to facilitate explicit stencil computations, such as explicit RTM schemes, on the new TURBOCARD2 accelerator, we decided to concentrate our efforts on a stencil library that abstracts and optimizes the domain decomposition and the data distribution on the accelerators directly from the host, while letting programmers implement their kernels using their usual tools and models.
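To make the cache-blocking idea concrete, here is a plain NumPy sketch of a tiled 2D 5-point stencil sweep (illustrative only; this is not the TURBOCARD2 stencil library, whose API is not described in the abstract):

```python
import numpy as np

def blocked_stencil_sweep(u, block=64):
    """One Jacobi-style sweep of a 2D 5-point stencil, processed tile by tile.

    Blocking the loop nest is the (simplified) idea behind cache-blocking:
    each tile is small enough to stay in fast memory while it is updated.
    """
    ny, nx = u.shape
    out = u.copy()
    for j0 in range(1, ny - 1, block):
        for i0 in range(1, nx - 1, block):
            j1 = min(j0 + block, ny - 1)
            i1 = min(i0 + block, nx - 1)
            out[j0:j1, i0:i1] = 0.25 * (
                u[j0 - 1:j1 - 1, i0:i1] + u[j0 + 1:j1 + 1, i0:i1] +
                u[j0:j1, i0 - 1:i1 - 1] + u[j0:j1, i0 + 1:i1 + 1]
            )
    return out

u = np.random.rand(512, 512)
u = blocked_stencil_sweep(u)
```

Time-skewing extends the same idea by tiling across several time steps at once; both require the explicit, architecture-aware control that a library layer can hide from the kernel author.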

Speakers

Thursday March 5, 2015 12:05 - 12:25 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

12:05 CST

Reservoir/Production: “Rapid Simulation of Hydraulic Fracturing Using a Planar 3D Model,” Bjoern Nordmoen, Aron Anderson, Sergey Nenakhov and Olga Kresse, Schlumberger
DOWNLOAD PRESENTATION

WATCH VIDEO

Hydraulic fracturing simulators are widely used in the petroleum industry to design pumping schedules, monitor hydraulic fracturing treatments in real time, and to evaluate pumped treatments. To accurately quantify uncertainties and be able to efficiently optimize fracture treatments, the simulators have to be run multiple times—preferably, “as many times as possible”. Naturally, many independent runs can simply be executed in parallel, but in practice, it is also necessary to have a reasonably short execution time for each individual run.
The fracturing design simulator based on a planar 3D model is designed to produce highly accurate results for planar fractures in layered reservoirs. Unlike many other simulators, this simulator takes into account the properties of each individual reservoir layer. The elastic properties and widths of these layers can vary by orders of magnitude without compromising the integrity of the planar 3D model. Not surprisingly, this simulator feature greatly increases the computational demands compared with other less complex models, especially when there are numerous thin layers in the reservoir. In particular, the nonlinear equations governing the coupling between the fracture width and the pressure have a fully dense Jacobian matrix. To be able to rapidly solve the resulting linear systems, a Fourier transform (FFT)-based method was developed to quickly perform matrix-vector multiplications with the Jacobian matrix, allowing for rapid solutions using iterative methods. This multiplication kernel was heavily tuned (using single-instruction-multiple-data (SIMD) vectorization) and parallelized. Furthermore, the Jacobian matrix can be rather accurately approximated by a sparse matrix, which can be factorized and used as a preconditioner. By carefully assembling this sparse Jacobian matrix and using highly optimized routines for performing the (approximate) factorization, it is possible to achieve a near-constant iteration count with respect to the mesh size, and simultaneously limit the cost of factorizing the preconditioner.
As a result of these efforts, typical problems can be simulated in minutes and even the most challenging simulations can now be completed in a few hours on consumer-grade hardware. These results greatly improve the general usability of the fracturing design simulator based on a planar 3D model and will enable it to be used much more easily for optimization and uncertainty analysis.
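The FFT-based matrix-vector product can be illustrated in a simplified setting (our own sketch, assuming a Toeplitz-structured operator as a stand-in for the simulator's dense Jacobian, which is not specified in the abstract):

```python
import numpy as np

def toeplitz_matvec_fft(c, r, x):
    """Multiply a Toeplitz matrix (first column c, first row r) by x via FFT.

    The Toeplitz matrix is embedded in a circulant matrix of size 2n, whose
    action is a circular convolution computed with the FFT in O(n log n).
    """
    n = len(c)
    # First column of the circulant embedding: [c, 0, r[n-1], ..., r[1]].
    col = np.concatenate([c, [0.0], r[:0:-1]])
    x_pad = np.concatenate([x, np.zeros(n)])
    y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(x_pad))
    return y[:n].real

# Quick check against a dense multiply.
n = 256
c = np.random.rand(n)                                  # first column
r = np.concatenate([[c[0]], np.random.rand(n - 1)])    # first row (r[0] == c[0])
x = np.random.rand(n)
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)] for i in range(n)])
assert np.allclose(T @ x, toeplitz_matvec_fft(c, r, x))
```

The same principle, applied to the structured part of the width-pressure Jacobian, is what allows iterative solvers to avoid ever forming the dense matrix.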


Thursday March 5, 2015 12:05 - 12:25 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

12:30 CST

Networking & Lunch Break
Thursday March 5, 2015 12:30 - 13:30 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

13:30 CST

Plenary: 'Future of High Performance Networking,' Anshul Sadana, Senior Vice President, Customer Engineering, Arista
WATCH VIDEO

In this talk, we will cover high-speed Ethernet (10/40/100G), new standards such as 25G and 50G, and the upcoming 400G standard. In addition to raw speeds, Ethernet networking has come a long way in enabling deeper buffers and lossless interconnect, operating at mega-scale in 100 MW+ datacenters. Lastly, we will look at the evolution of optics and cabling choices that are emerging to enable interconnects at lower cost.

Speakers
avatar for Anshul Sadana

Anshul Sadana

Anshul Sadana has over a decade of experience in engineering management, design & software development in the networking industry. Anshul joined Arista in 2007 and is responsible for product definition and development priorities of Arista’s next generation products. He also leads... Read More →


Thursday March 5, 2015 13:30 - 14:00 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

14:00 CST

Plenary: 'Future Systems and Seismic Computing,' Kent Winchell, Director, Cray
WATCH VIDEO

The memory, storage, interconnect, and computing architectures in future systems will be very different from those of the past. There are implications for algorithm design and the management of data within the system. We will discuss the key system features that will appear in new systems.

Speakers
avatar for Kent Winchell

Kent Winchell

CTO Office, Cray
I enjoy solving hard computing problems. My educational background is compiler theory, and I have worked on systems architecture, I/O performance, algorithm design, and almost everything related to HPC.


Thursday March 5, 2015 14:00 - 14:30 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

14:30 CST

Keynote: 'Seismic Imaging and HPC partnership – Innovation at BP,' Eric Green, VP, Seismic Imaging, BP
Seismic Imaging is critical to the success of BP's Exploration, Development and Production activities. We will describe some of the breakthroughs delivered through imaging research over the last 15 years, and how these breakthroughs generate new opportunities and value.  We will discuss the contributions that High Performance Computing has made to enable imaging success, both in terms of growing compute power and the collaboration between imaging researchers and computational scientists. Many of the imaging technologies only became practical as computing caught up with our requirements. We will discuss the future directions for imaging, and the computing challenges required to deliver our future.

Speakers
avatar for Eric Green

Eric Green

Vice President, Advanced Seismic Imaging, BP
Eric Green is Vice President for Advanced Seismic Imaging at BP and leads a global organization of highly specialized and experienced geophysicists challenged with solving BP’s most pressing subsurface imaging challenges.  Eric has 33 years of experience in all aspects of the Oil... Read More →


Thursday March 5, 2015 14:30 - 15:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster Session & Closing Reception
Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'A High Performance JPEG-XR Image Compression Library,' Lai Wei, Rice University
JPEG-XR, short for JPEG extended range, is a still image compression standard based on HD Photo, originally developed by Microsoft. JPEG-XR supports various image compression features, including both lossless and lossy compression, a high compression ratio, and a tile-based design.     

In a collaboration between Shell and Rice, we developed a high-performance JPEG-XR image compression library based on an open source reference implementation to meet Shell's image compression needs. We implemented a thread-safe JPEG-XR encoder and decoder so that they can be used to process multiple images in parallel. Using OpenMP to apply our thread-safe JPEG-XR encoder/decoder to a set of image planes, we achieve nearly optimal speedup on Intel Westmere processors.
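The per-plane parallelization pattern can be sketched as follows (a hypothetical Python analogue using zlib as a stand-in codec; the actual library is a C implementation of JPEG-XR parallelized with OpenMP and is not shown here):

```python
import zlib
from concurrent.futures import ProcessPoolExecutor

import numpy as np

def compress_plane(plane):
    """Stand-in for a thread-safe JPEG-XR encode of a single image plane."""
    return zlib.compress(plane.tobytes(), 6)

if __name__ == "__main__":
    # A stack of hypothetical image planes (e.g., slices of a seismic volume).
    planes = [np.random.randint(0, 255, (1024, 1024), dtype=np.uint8)
              for _ in range(16)]

    # Because each plane is encoded independently, the work parallelizes
    # trivially, mirroring the OpenMP loop over planes described above.
    with ProcessPoolExecutor() as pool:
        encoded = list(pool.map(compress_plane, planes))

    print(sum(len(e) for e in encoded), "bytes after compression")
```

Thread safety of the encoder is what makes this embarrassingly parallel structure possible: no codec state is shared between concurrently processed planes.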

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'A Meshless Approach to Modeling Fluid-Filled Fracture Propagation in a Porous Medium,' Javier Villarreal, Rice University
Induced hydraulic fracturing is the process whereby rock is fractured with pressurized fluids to allow easier extraction of natural resources, such as natural gas or petroleum, from the ground. Simulation of hydraulic fracturing can be complicated because of the inherent discontinuities in the geometry when simulating crack propagation. One method of simplifying the simulation of fluid-driven crack propagation is to use phase field models. With phase field models, rather than treating each phase (i.e., the cracked material and the medium) separately and applying boundary conditions at the crack interfaces, a continuous phase variable is applied throughout the domain. Because the phase variable is continuous, it can be modeled with a differential equation. An adaptive meshless method using collocation of radial basis functions can be used to approximate a phase field model. An advantage of the meshless method is that there is no mesh that can potentially influence the results of a simulation. In addition, conventional adaptation schemes can be used to add evaluation points to the domain in areas with high derivatives, which reduces both the computational overhead and the user interaction required, in contrast with having to create a new mesh. This is particularly useful near crack interfaces, where the phase variable derivatives are high and a finer grid resolution is desired.


Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'A Spark-based Seismic Data Analytics Cloud,' Yuzhong Yan, Prairie View A&M University
Cloud computing, as a disruptive technology, provides a dynamic, elastic, and promising computing climate for tackling the challenges of big data processing and analytics. Hadoop and MapReduce form a widely used open-source framework in cloud computing for storing and processing big data in a scalable fashion. Spark is a newer parallel computing engine that works together with Hadoop and exceeds MapReduce performance through its in-memory computing and high-level programming features. In this work, we have created a seismic data analytics cloud platform on top of Hadoop and Spark and evaluated its productivity and performance with a few basic but representative seismic data processing algorithms. We created a variety of seismic processing templates that simplify the programming effort of implementing scalable seismic data processing algorithms by hiding the complexity of parallelism. The cloud platform generates a complete Spark application based on the user's program and configurations, and allocates resources to meet the program's requirements.
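A minimal sketch of the kind of Spark job such a template might expand into (hypothetical code assuming PySpark; the platform's actual templates, APIs, and data formats are not shown in the abstract):

```python
import numpy as np
from pyspark import SparkContext

def automatic_gain_control(trace, window=25):
    """Simple trace-wise AGC: normalize each sample by a local RMS amplitude."""
    trace = np.asarray(trace, dtype=float)
    rms = np.sqrt(np.convolve(trace ** 2, np.ones(window) / window, mode="same"))
    return (trace / np.maximum(rms, 1e-12)).tolist()

if __name__ == "__main__":
    sc = SparkContext(appName="seismic-agc-sketch")
    # Hypothetical stand-in for traces that would normally be read from HDFS.
    traces = [np.random.randn(2000).tolist() for _ in range(1000)]
    # The trace-wise operation parallelizes as a simple map over the RDD.
    processed = sc.parallelize(traces).map(automatic_gain_control).collect()
    print(len(processed), "traces processed")
    sc.stop()
```

The point of the templates described above is that a user supplies only the per-trace (or per-gather) function, while the platform generates and configures the surrounding Spark application.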

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Accelerating Extended Least Squares Migration with an approximate inverse to the extended Born modeling operator,' Jie Hou, Rice University
Least Squares Migration (LSM) iteratively achieves a mean square best fit to seismic reflection data, provided that a kinematically accurate velocity model is supplied. The subsurface offset extension adds an extra degree of freedom to the model, thereby allowing LSM to fit the data even in the event of significant velocity error. This type of extension also implies additional expense per iteration from cross-correlating source and receiver wavefields over the subsurface offset, and therefore places a premium on rapid convergence. We accelerate the convergence of Extended Least Squares Migration (ELSM), by combining a modified Conjugate Gradient algorithm with a suitable preconditioner. The preconditioner uses an approximate inverse to the extended Born modeling operator. Numerical examples demonstrate that the proposed algorithm dramatically reduces the number of iterations required to achieve a given level of fit or gradient reduction, compared to ELSM by unpreconditioned conjugate gradient iteration.
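In rough symbols (our own shorthand rather than the authors' notation), the acceleration amounts to preconditioning the normal equations of the extended least squares problem with an approximate inverse of the extended Born operator $F$:

```latex
\[
\min_{\tilde m}\ \tfrac12\bigl\|F\tilde m - d\bigr\|_2^2
\;\;\Longleftrightarrow\;\;
F^{*}F\,\tilde m = F^{*}d,
\qquad
M \approx \bigl(F^{*}F\bigr)^{-1}\ \text{built from an approximate inverse } F^{\dagger},
\]
```

so that the preconditioned conjugate gradient iteration converges in far fewer applications of the expensive extended Born operator and its adjoint than the unpreconditioned iteration.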

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Adaptive hierarchical sparse-grid integration for uncertainty quantification,' Timur Takhtaganov, Rice University
I present adaptive hierarchical sparse-grid quadrature methods for the efficient evaluation of high-dimensional integrals arising in uncertainty quantification or in optimization under uncertainty. High-dimensional integration is needed when the uncertainty in the quantities of interest (such as oil production) of systems subject to random variables (such as reservoir equations with uncertainties in the geological parameters) is quantified via statistical moments (e.g., expected value, variance) or risk measures (e.g., conditional value at risk (CVaR)). I compare sparse-grid methods based on tensor products of 1D integration and interpolation rules using global or local polynomials. Sparse-grid methods achieve the desired level of accuracy using far fewer function evaluations than full tensor product rules, thus circumventing, to a certain extent, the "curse of dimensionality": an exponential growth in the number of sampling points as the number of random variables increases. This becomes especially important when each function evaluation requires solving a PDE. Particular emphasis is put on problems where the quantity of interest is non-smooth with respect to the random variables. I present recent developments in adaptive sparse-grid methods utilizing hierarchical basis functions, and compare the performance of these methods on a model problem governed by a transport equation with uncertain inputs.
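For orientation, a standard form of the sparse-grid (Smolyak) quadrature construction that these methods build on (textbook notation, not specific to the adaptive hierarchical variant presented in the poster):

```latex
\[
Q_\ell^{(d)} f \;=\; \sum_{|\mathbf{i}|_1 \,\le\, \ell + d - 1}
\bigl(\Delta_{i_1}\otimes\cdots\otimes\Delta_{i_d}\bigr) f,
\qquad
\Delta_{i} := Q_{i} - Q_{i-1},\quad Q_{0} := 0,
\]
```

where the $Q_i$ are nested one-dimensional quadrature rules; the number of nodes grows roughly like $O\!\bigl(2^{\ell}\,\ell^{\,d-1}\bigr)$ instead of the $O\!\bigl(2^{\ell d}\bigr)$ of the full tensor product, which is the sense in which the curse of dimensionality is mitigated.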


Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Adding vectors and matrix support to SimSQL, an analytic relational database system,' Shangyu Luo, Rice University
SimSQL is a scalable, parallel, analytic distributed database system with several modifications that make it useful as a platform for very large scale machine learning. For my research, I added native support for vector and matrix data types to SimSQL; that is, rows in a database table can contain vector or matrix data. This has several advantages compared to the natural, sparse representation of vectors and matrices in the relational model. First, it can save storage space, since a dense matrix (for example) can be stored more compactly in a contiguous block of memory as opposed to a large number of (row, column, val) triples in a relational table. Second, it can provide significant performance improvements for many algorithms, especially machine learning computations that naturally map to vector and matrix operations.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Analysis of Separation Mechanisms in Molecular transport through channels,' Shaghayegh Agah, Rice University
Separation is one of the most important steps in numerous technological, industrial, and medical processes. Although current experimental methods provide a variety of ways to separate chemical mixtures, they are frequently inefficient and expensive. Transport through channels and pores has recently been suggested for separating molecular mixtures, borrowing the idea from biological systems where efficient, fast, and robust separations are achieved. In order to use and scale up this method, the selectivity mechanism should be well understood.

Molecular transport through a nanochannel is a complex process that involves various interactions between the molecules and the channel. These interactions lead to highly efficient and selective transport. In order to understand this complex phenomenon, we study the effect of molecule-pore interactions on selectivity. We analyze channels with N binding sites employed for separating mixtures of two types of molecules. Molecular transport is treated as transitions between different states in the channel, using discrete-state stochastic models that consider all chemical transitions. More specifically, master equations in the stationary state are written for all sites and solved numerically. It is found that the strength and spatial distribution of the molecule-channel interaction have a significant effect on selectivity.
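A generic form of the stationary master equations for such a discrete-state channel model (schematic notation of ours; the actual rates are site- and species-dependent):

```latex
\[
\frac{dP_n}{dt} \;=\; u_{n-1}P_{n-1} \;+\; w_{n+1}P_{n+1} \;-\; \bigl(u_n + w_n\bigr)P_n \;=\; 0,
\qquad n = 1,\dots,N,
\]
```

where $P_n$ is the probability that a molecule occupies binding site $n$ and $u_n$, $w_n$ are forward and backward transition rates; solving these equations for each species gives the stationary fluxes whose ratio quantifies the selectivity of the channel.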

Based on our analysis, the best separation occurs when the site with the specific interaction is located close to the entrance of the channel. Large entrance rates into the channel also increase the selectivity. In addition, it has been demonstrated that repulsive interactions yield higher selectivity than attractive ones. From an application point of view, this idea has the potential to be used in special membranes with different interaction sites to separate species.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Bottom-up Insight into the Morphology, Spectroscopy, and Photophysics of Thiophene Based Polymers,' Ryan Haws, Rice University
The performance of polythiophene-based polymers as donors in donor:acceptor organic photovoltaic devices is highly dependent on their chemical architecture, molecular conformation, and nanoscale morphology, as the optical and electronic properties are highly structurally sensitive. Here we use classical methods to sample structure and mixed quantum-classical methods to analyze photophysical properties and excited-state dynamics. Insight is gained into the role of regioregularity, side chains, and aggregation in the morphology and spectroscopic properties from single polymer chains to nanosized aggregates and bulk films.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Calculation of Solubility Parameters of Crude Oil Systems as a Function of Pressure, Temperature and Composition using Experiments, Thermodynamic Modeling and Molecular Simulations,' Mohan Boggara, Rice University
Asphaltenes are the heaviest and most complex fraction of crude oil. They are polydisperse and are defined as a solubility class that is soluble in aromatic solvents and insoluble in n-alkanes. Asphaltenes can precipitate and deposit in wellbores due to changes in temperature, pressure, and composition. Such asphaltene deposition problems reduce oil production and cost millions of dollars in mitigation and remediation efforts. Our group focuses on the development and implementation of advanced thermodynamic models and experiments to address the thermo-physical characterization and phase behavior of crude oils in general and asphaltene precipitation in particular. The solubility parameter significantly influences asphaltene precipitation behavior and is therefore a critical parameter for understanding flow assurance issues related to asphaltenes. This work focuses on exploring the solubility behavior of asphaltenes in various model crude oil samples at a molecular level by calculating solubility parameters of the mixtures over a wide range of temperatures and pressures. Based on previous work, the solubility parameter can be correlated with the density and the refractive index at ambient conditions. By measuring density and refractive index data, experiments can be performed to evaluate the solubility parameter at ambient conditions. Using this information and advanced equation-of-state modeling, solubility parameters over a wide range of temperatures and pressures (covering reservoir/wellbore conditions) are calculated and will be presented. Molecular dynamics simulations for predicting the solubility parameter values of solvent mixtures and asphaltene molecular models under reservoir conditions will also be presented.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Communication Avoiding Algorithms: Analysis and Code Generation for Parallel Systems,' Karthik Murthy, Rice University
The computing power of a processor has increased a thousandfold over the last 20 years. As computing power continues to increase following the predictions of Moore's law, communication is increasingly becoming the principal power and performance bottleneck. The "ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems" opines that it is easier to solve the power problem for intra-node computation than it is to solve the problem of inter-processor communication. The class of communication-avoiding .5D algorithms was developed to address the problem of inter-processor communication. These algorithms reduce communication and provide strong scaling in both time and energy.

As a first step towards automating the development of communication-avoiding libraries, we developed the Maunam compiler. Maunam generates efficient parallel code from a high-level, global-view sketch of .5D algorithms that are expressed using symbolic data sizes and numbers of processors. It supports the expression of data movement and communication through high-level global operations such as TILT and CSHIFT as well as through element-wise copy operations. Wrap-around communication patterns can also be achieved using subscripts based on modulo operations. Maunam employs polyhedral analysis to reason about the communication and computation present in the input .5D algorithm. It partitions data and computation and then inserts point-to-point and collective communication as needed. Maunam also analyzes data dependence patterns and data layouts to identify reductions over processor subsets. Maunam-generated Fortran+MPI code for 2.5D matrix multiplication running on 4096 cores of a Cray XC30 supercomputer achieves 59 TFlops/s (76% of the machine peak).

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Computational Science Undergraduate Research Experiences (CSURE) in National Institute for Computational Sciences (NICS),' Kwai Wong, University of Tennessee
The Computational Science Undergraduate Research Experiences (CSURE) is a Research Experiences for Undergraduates (REU) program supported by the National Science Foundation (NSF). This poster shows the list of projects performed by the students for the last two years.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Depth-oriented Extended Full Waveform Inversion with an Unknown Source Function,' Lei Fu, Rice University
This study applies the variable projection method to estimate the source time function in extended full waveform inversion (EFWI), which makes it feasible to reconstruct the subsurface structure robustly and accurately. In practice, without extremely low-frequency data or a good initial model of the earth that contains the long-scale structure information, seismic full waveform inversion is very likely to be trapped in one of many stationary points apart from its global minimum and fail to recover the subsurface structure correctly. EFWI overcomes this obstacle by adding additional degrees of freedom. Introduced by Symes, the extended modeling concept combines the global convergence of migration velocity analysis with the physical fidelity of waveform inversion. On the other hand, the recorded seismic data are influenced by both the sources and the physical properties of the subsurface. As a consequence, in order to reconstruct the earth properties correctly, the seismic source either needs to be known or must be estimated along with the earth properties. Although source inversion is an indispensable component of seismic inversion, it is still unexplored in extended full waveform inversion. The proposed method is expected not only to correctly reconstruct the physical properties of the subsurface from conventional primary reflection data, but also to accurately estimate the source wavelet.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Distributive Interoperable Executive Library (DIEL) for Multi-disciplinary System-Wide Scientific Simulations,' Kwai Wong, University of Tennessee
In this poster, we present a novel integrative software platform, the Distributive Interoperable Executive Library (DIEL), to facilitate the collaboration, exploration, and execution of multiphysics modeling projects suited to a diversified research community on emergent large-scale parallel computing platforms. It does so by providing a managing executive, a layer of numerical libraries, a number of commonly used physics modules, and two sets of native communication protocols. DIEL allows users to plug in their individual modules, prescribe the interactions between those modules, and schedule communications between them. The DIEL framework is designed to be applicable to preliminary concept design, sensitivity prototyping, and productive simulation of a complex system.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Efficient implementation of cactus stack in work-stealing runtimes,' Chaoran Yang, Rice University
Multithreaded concurrency platforms that employ a work-stealing scheduler often require support for a "cactus stack", wherein multiple child functions of a common calling ancestry can exist and operate in parallel. Unfortunately, existing concurrency platforms have failed to support a cactus stack without making at least one of the following sacrifices:

- unable to interoperate with legacy or third-party serial binaries that have been compiled to use an ordinary linear stack,
- a weak time bound that cannot provide near-perfect linear speedup on some applications even with sufficient parallelism,
- unbounded or extravagant use of memory for the cactus stack, or
- a requirement for special support from the operating system kernel.

We have addressed this cactus-stack problem by carefully managing the memory used for cactus stacks while maintaining the near-perfect linear speedup time bound. Even though our work-stealing runtime may consume up to DPS of virtual address space, the physical memory it uses is bounded by P(S + D), where D is the depth of the stack, P is the number of processors, and S is the stack size. Benchmark results show that our implementation incurs very low overhead compared to previous approaches and achieves up to 2x speedup over the state-of-the-art CilkPlus runtime library.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Estimation of Relative Permeabilities in Porous Media Flow,' Caleb Magruder, Rice University
I investigate the estimation of relative permeability curves through core flooding experiments. This estimation problem is formulated as a nonlinear least squares problem governed by a coupled system of partial differential equations (PDEs) modeling the flow of water and oil through the porous medium. This complex PDE-constrained least squares problem is solved using a Gauss-Newton method. The sensitivity information required by this method is also used to derive statistical bounds for the parameter estimation error based on the Fisher information matrix.

The introduction of statistical confidence intervals for maximum likelihood estimators allows practitioners to quantify uncertainty due to noise in measurements for parameter estimation problems. In future steps, this approach can be extended to compute optimal experimental designs to further improve the quality of the computed parameters. This research has immediate applications to many pertinent problems in the oil and gas community including reservoir simulation, experiment design of core flooding, and history matching.
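In compact form (standard notation added here for illustration), with $r(p)$ the residual between simulated and measured core-flood data and $J = \partial r/\partial p$ the sensitivity:

```latex
\[
\min_{p}\ \tfrac12\,\|r(p)\|_2^2,
\qquad
p_{k+1} = p_k - \bigl(J_k^{\top}J_k\bigr)^{-1} J_k^{\top}\, r(p_k),
\qquad
\operatorname{Cov}(\hat p) \;\approx\; \sigma^{2}\bigl(J^{\top}J\bigr)^{-1},
\]
```

where $\sigma^{2}$ is the measurement noise variance; the last expression is the Fisher-information-based covariance from which confidence intervals for the estimated relative-permeability parameters are derived.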

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Extending FWI into the Plane Wave Domain,' Papia Nandi-Dimitrova, Rice University
This research seeks to reduce the uncertainties in estimating acoustic velocities, a property of the subsurface that plays a key role in the quality of seismic images. Our approach is a variant of Full Waveform Inversion (FWI). This technique iteratively modifies a velocity model towards one that best fits the data. It is effective in predicting short-wavelength reflectivity in seismic data, but often fails to update the long-wavelength components. The resulting model often represents a local minimum in the solution space and does not correctly describe Earth structure. We employ extended models, with additional parameters beyond location in the subsurface, which serve to relax the FWI problem and make its global minimum easier to attain. Previous research has suggested that this extended FWI (EFWI) can recover the long-wavelength components of a velocity model. This research extends earlier work on synthetic data to field OBS node and streamer data in the plane wave domain. The methods under development use existing data acquisition modes and public domain software to provide a global solution for FWI, thus adding to the growing number of approaches for tackling one of the greatest challenges facing velocity estimation today.


Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Fast Step Transition and State Identification (STaSI) for Discrete Single-Molecule Data Analysis,' Bo Shuang, Rice University
Interpretation of noisy signals is critical for early disaster detection, fast dynamic exploration, weak signal transmission, and other applications. Even though denoising techniques based on different models are well developed, model selection is still subjective and experience-based. To solve this problem, we introduce an objective model selection algorithm for piecewise constant signals based on Occam's razor: "the fewer assumptions that are made, the better". According to the minimum description length principle (one formalization of Occam's razor), signal interpretations based on different models require different amounts of storage space in a computer, and the model with the minimum storage space should be the closest approximation to the mechanism behind the signal. Our algorithm provides comprehensive, objective analysis of multiple data sets, requires few user inputs about the underlying physical models, and is faster and more precise in determining the number of states than other established and cutting-edge methods; it can thus be applied to a broad range of signals.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Graphene-like sheets acting as selective sieves or impermeable membranes,' Gustavo Brunetto, Rice University
Recently, it was proposed that graphene membranes could act as atomically thin structures impermeable to standard gases. For some other applications a higher level of porosity is needed, and the so-called Porous Graphene (PG) and Biphenylene Carbon (BPC) membranes are good candidates to work effectively as selective sieves. In this work we used classical molecular dynamics simulations to study the dynamics of membrane permeation by He and Ar atoms and possible selectivity effects. For the graphene membranes we did not observe any leakage through the membrane and/or the membrane/substrate interface up to a critical pressure limit, beyond which a sudden membrane detachment occurs. PG and BPC membranes are not impermeable like graphene ones, but there are significant energy barriers to diffusion depending on the atom type. Our results show that these porous membranes can be used effectively as selective sieves for pure gases and gas mixtures.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Graphyne membranes functionalization: A Fully Atomistic Molecular Dynamics Investigation,' Pedro Autreto, State University of Campinas/Rice University
Graphyne is a broad name for a family of 2D carbon allotropes in which acetylenic groups connect benzenoid rings, with the coexistence of sp- and sp2-hybridized carbon atoms. In this work we have carried out, through fully atomistic reactive molecular dynamics simulations (ReaxFF), a study of the dynamics and structural changes involved in the functionalization of graphdiyne and of α, β, and γ graphyne. Our results show that the existence of different sites for hydrogen bonding, related to single and triple bonds, makes the process of incorporating hydrogen atoms into graphyne membranes much more complex than for graphene. Our results also show that functionalization reactions are strongly site-dependent and that the sp-hybridized carbon atoms are the preferential sites for chemical attack. In our cases, the effectiveness of the hydrogenation (estimated from the number of hydrogen atoms covalently bonded to carbon atoms) follows the α, β, γ-graphyne structure ordering.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'High Surface Area Activated Asphalt for CO2 Capture,' Almaz Jalilov, Rice University
Research activity toward the development of new sorbents for carbon dioxide (CO2) capture has been increasing quickly. Despite the variety of existing materials with high surface areas and high CO2 uptake performance, the cost of the materials remains a dominant factor in slowing their industrial application. Here we report the preparation and CO2 uptake performance of microporous carbon materials synthesized from asphalt, a very inexpensive carbon source. Carbonization of asphalt with potassium hydroxide (KOH) at high temperatures (>600 °C) yields porous carbon materials (A-PC) with surface areas of up to 2780 m2/g and a CO2 uptake of 21 mmol/g, or 93 wt %, at 30 bar and 25 °C. Furthermore, nitrogen doping and reduction with hydrogen yield active N-doped materials (A-NPC and A-rNPC) containing up to 9.3% nitrogen, making them nucleophilic porous carbons with Brunauer-Emmett-Teller (BET) surface areas of up to 2860 m2/g for A-NPC and a CO2 uptake of 26 mmol/g, or 114 wt %, at 30 bar and 25 °C for A-rNPC. This is the highest reported CO2 uptake among the family of activated porous carbonaceous materials. Thus, the porous carbon materials from asphalt have excellent properties for reversibly capturing CO2 at the wellhead during the extraction of natural gas, a naturally occurring high-pressure source of CO2. Through a pressure-swing sorption process, when the asphalt-derived material is returned to 1 bar, the CO2 is released, thereby rendering a reversible capture medium that is highly efficient yet very inexpensive.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Hybrid agent-based-simulation framework for understanding self-organization in bacteria,' Rajesh Balagam, Rice University
Many pathogenic bacteria evade antibiotic treatments and host immune responses by forming biofilms, in which bacterial cells embed themselves inside a layer of self-produced gel material and attach to host surfaces. Inside biofilms, cells self-organize into complex three-dimensional multicellular structures. This process requires cell communication and cell movement coordinated with a number of chemical and mechanical interactions. Genetic studies have uncovered many of the biochemical signals involved; however, the role of mechanical interactions is poorly understood. We investigate the role of mechanical interactions in the self-organization of a model bacterium, Myxococcus xanthus, through agent-based computer simulations.

To this end, we have developed a hybrid agent-based simulation framework that can simulate interactions among thousands of cells. Each agent in this framework is based on a detailed biophysical model of a single M. xanthus cell. The framework thus allows accurate modeling of individual cell behavior and yet is able to capture the emergent self-organization behavior of large numbers of cells. Using this framework we have investigated the mechanism of individual cell movement and the self-organization behavior of cell groups (~10^2 to 10^3 cells).

The mechanism of individual cell movement in M. xanthus is still not completely understood. By simulating pairwise cell collisions with our model, we are able to discriminate between two competing hypotheses of force generation responsible for M. xanthus movement. Comparison of our results with experimentally observed cell behavior predicts that strong adhesive attachments between cell and substrate are required for M. xanthus movement. This prediction was verified in further experimental studies. We then extended our model to investigate the self-organization of M. xanthus cells into clusters during the initial phase of biofilm formation. Our simulations demonstrate that this cell clustering is an emergent behavior resulting from the interplay of various mechanical processes among M. xanthus cells. Furthermore, our framework is able to reproduce the distinct dynamic cell clustering behavior observed for different M. xanthus motility mutants.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Macro-Dataflow Programming for Mapping on Heterogeneous Architectures,' Alina Sbirlea, Rice University
In this poster we describe DFGR, a new intermediate graph representation for macro-dataflow programs, which offers a high-level view of applications for easy programmability while allowing the expression of complex applications using dataflow principles. DFGR makes it possible to write applications in a manner that is oblivious to the underlying parallel runtime and can easily be targeted by both programming systems and domain experts. Coupled with a parallel runtime, DFGR offers good load balancing and performant, energy-efficient execution on heterogeneous architectures. In addition, DFGR supports further optimizations in the form of graph transformations, enabling efficient task composition and assignment for improved scalability.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Mixing of passive scalars advected by incompressible enstrophy-constrained flows,' Xiaoqian Xu, Rice University
Consider a diffusion-free passive scalar theta mixed by an incompressible flow u satisfying periodic boundary conditions. Our aim is to study how well this scalar can be mixed under an enstrophy constraint on the advecting velocity field. Our main result shows that the mix-norm (the H^{-1} norm) is bounded below by an exponential function of time. We also perform numerical simulations and confirm that the numerically observed decay rate scales similarly to the rigorous lower bound, at least for a significant initial period of time.
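In symbols (our paraphrase of the statement above, with unspecified constants), for a mean-zero scalar advected by a divergence-free field under an enstrophy budget:

```latex
\[
\partial_t \theta + u\cdot\nabla\theta = 0,
\quad \nabla\cdot u = 0,
\quad \|\nabla u(\cdot,t)\|_{L^2} \le E
\;\;\Longrightarrow\;\;
\|\theta(\cdot,t)\|_{H^{-1}} \;\ge\; C\,e^{-c\,t}\,\|\theta_0\|_{H^{-1}},
\]
```

with constants $C$, $c$ depending on the enstrophy budget and the initial data, so the mix-norm can decay at most exponentially fast.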

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Model Order Reduction in Porous Media Flow Optimizations,' Mohammadreza Ghasemi, Texas A&M University
The numerical simulation of high-fidelity oil and gas reservoir models is a challenging task in computationally intensive frameworks such as history matching, optimization, and uncertainty quantification. There are many ways to alleviate this issue, such as the use of high performance computing (HPC) or approximations using proxy and surrogate models, to name a few. Here, we take a more rigorous mathematical approximation approach using concepts from system theory, in which the large-scale model is replaced by a reduced-complexity model of much lower dimension and reasonable accuracy. The main benefit of this approximation is that the computational cost of simulating the complex large-scale model can be reduced by several orders of magnitude. In this poster, I give an overview of model order reduction (MOR) and discuss recent and ongoing efforts to develop new techniques for applying it to porous media flow simulation and optimization. These methods are based on proper orthogonal decomposition. In particular, to reduce the computational complexity of the underlying nonlinear partial differential equations, one needs a way to reduce the dimension of the nonlinear terms. Here I specifically consider discrete empirical interpolation methods, trajectory piecewise linearization, and bilinear quadratic formulations. These approaches achieve a computational cost that is independent of the fine-grid dimension. In addition, we discuss how MOR can accelerate production optimization workflows. We demonstrate these ideas by applying MOR techniques to a benchmark model of an offshore reservoir with a large number of producers, injectors, and control variables that need to be adjusted during the optimization process. The results are compared with the outputs of a high-fidelity model in terms of the number of iterations, computational time, and the improvement in NPV. Coupling MOR and HPC is also suggested for improving the results.
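Schematically (standard POD/DEIM notation, included for orientation rather than taken from the poster), for a fine-scale state $x \in \mathbb{R}^{N}$ with dynamics $\dot x = f(x,u)$:

```latex
\[
x \approx \Phi\,x_r,
\qquad
\dot x_r = \Phi^{\top} f\bigl(\Phi x_r, u\bigr),
\qquad
f(\cdot) \approx U\bigl(P^{\top}U\bigr)^{-1} P^{\top} f(\cdot),
\]
```

where $\Phi \in \mathbb{R}^{N\times k}$ collects the POD basis vectors ($k \ll N$) and the DEIM factors $U$, $P$ allow the nonlinear term to be evaluated at only a few interpolation points, which is what makes the reduced model's cost independent of the fine-grid dimension.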

Speakers
RG

Reza Ghasemi

Reservoir Engineer, Stone Ridge Technology


Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Model-Based Information Acquisition and Retrieval via Compressive Sensing,' Yun Li, Rice University
Compressive sensing can recover information from a limited number of measurements. Much as in seismic inversion, once a small number of observations are gathered, we reformulate this ill-conditioned inverse problem as an optimization problem with additional regularization. As the volume of data we face increases rapidly, our research becomes a very promising answer to the data deluge by controlling the size of the information when we sense it. The use of low-dimensional data acquisition equipment also reduces not only the investment in tools but also the time and energy spent on acquisition and processing. Applications have been realized in infrared cameras, biological microscopy, and medical imaging, and the approach has high potential in geophysical exploration. Because there is a high and urgent demand for high-quality seismic imaging, an important problem is the design of seismic surveys, and my research may help to design more efficient surveys. Another application scenario is well logging. Terabytes of data are acquired, transmitted, and later processed remotely every day for a single inference: what kind of layer it is. If the inference can be made efficiently during acquisition, a huge amount of redundant data can be avoided.
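The prototypical recovery problem alluded to above can be written in standard compressive-sensing form as:

```latex
\[
\min_{x}\ \|x\|_{1}
\quad\text{subject to}\quad
\|A\,x - y\|_{2} \le \epsilon,
\]
```

where $y$ is the small set of measurements, $A$ the sensing operator, and the $\ell_1$ term is the added regularization that makes the ill-conditioned inverse problem tractable when the signal is sparse in a suitable basis.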

Speakers
avatar for Yun Li

Yun Li

PhD student at Rice University, Rice University


Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'NIR and MIR Charge Transfer Plasmons in Wire-Bridged Antennas,' Yue Zhang, Rice University
We investigate the optical properties of wire-bridged plasmonic nanoantennas. We find two spectral features: a dipolar plasmon in the visible and a Charge Transfer Plasmon (CTP) in the infrared. The CTP depends sensitively on the conductance of the junction wire, offering a controllable way to tune the plasmon resonance to the desired wavelength regime via the junction geometry. We use single-particle dark-field spectroscopy from the UV and visible to the IR to identify the plasmonic modes in different spectral regimes. Simulations using the finite-difference time-domain (FDTD) method are in good agreement with experiment: increasing the junction wire width, and concurrently the junction conductance, blue-shifts the resonance positions and simultaneously modifies the scattering strengths and the linewidths of the CTP and dipolar plasmon. We observe that the CTP lies in a much longer wavelength regime while preserving a narrow linewidth, an important implication for designing IR plasmons with a high quality factor for enhanced spectroscopy and sensing applications. We also extend the CTP further into the IR by increasing the wire length while keeping the linewidth of the resonance. Our work offers an alternative way to study charge transfer in plasmonic nanostructures: it not only adds another degree of understanding of charge transfer properties in plasmonic nanostructures but also offers an optical platform for studying molecular transport at optical frequencies and related applications.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'OCCA Accelerated Lattice Boltzmann Method in Core Sample Analysis,' Zheng Wang, Rice University
The lattice Boltzmann method (LBM) is a relatively new technique in computational fluid dynamics. Its capability for coping with complicated boundary conditions arouses great interest from the oil and gas industry. In this study, the LBM was used to simulate fluid flow in order to analyze the transport properties of a core sample. In addition, combining high-performance computing with the LBM makes it feasible to obtain high-resolution results. To this end, the OCCA portability library, a unified approach to multi-threading programming, was employed in the provided implementation, which enables the LBM code to utilize several different application programming interfaces (APIs), including the open computing language (OpenCL), the compute unified device architecture (CUDA), and open multi-processing (OpenMP). As a result, the OCCA-accelerated code on a GPU achieved substantial speedups compared with an unoptimized serial code on a CPU and met the efficiency requirements of industrial production.
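For concreteness, a minimal NumPy sketch of one D2Q9 lattice Boltzmann (BGK) collide-and-stream step is shown below (illustrative only; the study's implementation uses OCCA kernels on GPUs and is not reproduced here):

```python
import numpy as np

# D2Q9 lattice: discrete velocities e_i and weights w_i.
e = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)
tau = 0.8            # relaxation time (sets the fluid viscosity)
nx, ny = 128, 128

def equilibrium(rho, ux, uy):
    """Standard second-order BGK equilibrium distribution."""
    eu = e[:, 0, None, None] * ux + e[:, 1, None, None] * uy
    usq = ux**2 + uy**2
    return w[:, None, None] * rho * (1 + 3*eu + 4.5*eu**2 - 1.5*usq)

# Start from rest with uniform density.
f = equilibrium(np.ones((nx, ny)), np.zeros((nx, ny)), np.zeros((nx, ny)))

for step in range(100):
    # Macroscopic moments.
    rho = f.sum(axis=0)
    ux = (f * e[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * e[:, 1, None, None]).sum(axis=0) / rho
    # Collision: BGK relaxation toward local equilibrium.
    f += (equilibrium(rho, ux, uy) - f) / tau
    # Streaming: shift each population along its lattice velocity
    # (periodic boundaries here; pore-scale solids would add bounce-back rules).
    for i in range(9):
        f[i] = np.roll(np.roll(f[i], e[i, 0], axis=0), e[i, 1], axis=1)

print("mean density:", rho.mean())
```

The collide-and-stream structure above is what maps naturally onto OCCA kernels, since each lattice site can be updated independently within a time step.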

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'On Magneto-Optical Magnetic Flux Leakage Sensing,' David Trevino-Garcia, Rice University
We investigate the feasibility of using magneto-optic films (MO-films) as sensors for the Magnetic Flux Leakage (MFL) inspection technique, as well as the accuracy of existing analytical MO-film models used for interpretation. We illustrate the accuracy of MO-film sensors and of the analytical models introduced by Shamonin et al. [1] through a comparison between MO-film experimental responses and results computed with the MO analytical models. The mathematical models are fed accurate data from the RiSYS testbed's experimental MFL surface scans (Hall-effect sensor) and from simulations made with an analytical method known as the Magnetic Dipole Model (MDM) [2], widely used in the MFL field. The MFL signal is produced by an electromagnetically saturated ferromagnetic specimen in the presence of a surface-breaking defect; the components of the MFL signal (axial, radial, and tangential) are measured by a Hall-effect sensor through a line scan (sampling measurements along a line path). The MO-film sensor implemented in this work is an iron garnet film grown on [111]-oriented substrates, used as a magneto-optic indicator film for the visualization of the magnetic field. Our analysis shows good agreement between experimentally measured data and analytical simulations with the MO-film and its analytical model.


Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Optimal Control of Miscible Displacement Equations using Discontinuous Galerkin Methods,' Brianna Lynn, Rice University
In the energy industry, reservoir simulators enable oil companies to optimize oil and gas production. I analyze the accuracy of the discontinuous Galerkin method when solving an optimal control problem for the miscible displacement equations, which model a tertiary oil recovery process. Miscible displacement is a process in a porous medium in which one fluid is injected into a well to displace another fluid. An optimal control problem seeks a solution that satisfies a given constraint while optimizing some control function. For the miscible displacement problem, the constraint is a partial differential equation that models the miscible displacement, where the state variables are the fluid mixture pressure and velocity as well as the concentration of the injected fluid. The control variables are the flow rates at the injection wells, which are the variables we want to optimize. To approximate the PDE, I use two numerical methods, a discontinuous Galerkin method and a finite element method, and then compare the results. Discontinuous Galerkin and finite element methods are numerical methods for solving PDEs using weak derivatives. In the finite element method, we assume the approximation is continuous over the domain, while in the discontinuous Galerkin method, we allow the approximation to be discontinuous over the domain. In our problem, the domain is time and space. Since this PDE is time-dependent, I also use a time-integration method when solving the problem. I combine these methods with optimization theory to solve the PDE while simultaneously solving the optimal control problem.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Performance Analysis and Configuration Selection for Applications in the Cloud,' Ruiqi Liu, Rice University
Cloud computing is becoming increasingly popular and widely used in both industry and academia. Making the best use of cloud computing resources is critically important. Default resource configurations provided by cloud platforms are often not tailored to applications. Hardware heterogeneity in clouds such as Amazon EC2 leads to wide variation in performance, which provides an avenue for research into saving cost and improving performance by exploiting the heterogeneity.

I conduct exhaustive measurement studies on Amazon EC2 cloud platforms. I characterize the heterogeneity of resources and analyze the suitability of different resource configurations for various applications. Measurement results show significant performance diversity across resource configurations of different virtual machine sizes and with different processor types. Diversity in resource capacity is not the only reason for performance diversity; diagnostic measurements reveal that the influence from the cloud provider’s scheduling policy is also an important factor.

Furthermore, I propose a nearest-neighbor shortlisting algorithm that selects a configuration leading to superior performance for an application by matching the characteristics of the application with those of known benchmark programs. My experimental evaluations show that nearest neighbor greatly reduces the testing overhead, since only the shortlisted top configurations rather than all configurations need to be tested; the method achieves high accuracy because the target application chooses the configuration for itself via testing. Even without any testing, nearest neighbor obtains a configuration with less than 5% performance loss for 80% of applications.
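
A minimal sketch of the shortlisting idea, assuming each benchmark and the target application are summarized by numeric feature vectors and the best-performing configuration of each benchmark is already known; the feature set, distance metric, and data layout here are illustrative placeholders, not the poster's:

// Sketch: shortlist cloud configurations for an application by finding the
// benchmarks whose feature vectors are nearest to the application's.
#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

struct Benchmark {
    std::string bestConfig;          // best-performing configuration observed for this benchmark
    std::vector<double> features;    // e.g. CPU, memory, and I/O intensity metrics (placeholder)
};

double distance(const std::vector<double>& a, const std::vector<double>& b) {
    double d = 0.0;
    for (size_t i = 0; i < a.size(); ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(d);
}

// Return the configurations of the k nearest benchmarks; only these need testing.
std::vector<std::string> shortlist(const std::vector<double>& appFeatures,
                                   std::vector<Benchmark> benchmarks, size_t k) {
    std::sort(benchmarks.begin(), benchmarks.end(),
              [&](const Benchmark& x, const Benchmark& y) {
                  return distance(appFeatures, x.features) <
                         distance(appFeatures, y.features);
              });
    std::vector<std::string> configs;
    for (size_t i = 0; i < benchmarks.size() && configs.size() < k; ++i) {
        // Avoid duplicate configurations in the shortlist.
        if (std::find(configs.begin(), configs.end(), benchmarks[i].bestConfig) == configs.end())
            configs.push_back(benchmarks[i].bestConfig);
    }
    return configs;
}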

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Performance Characterization of Applications across different Programming Models Using Similarity Analysis,' Md Abdullah Shahneous Bari, University of Houston
We propose a methodology to predict the performance behaviors (speedup, scalability, power consumption) of a given application across different programming models. Our technique predicts how a given application is going to perform using a target programming model (e.g., OpenMP) based on its existing implementation (e.g., a serial version). Our methodology uses similarity analysis, and we introduce new dynamic features based on hardware counters. We tested our approach using the NAS Parallel Benchmarks and achieved a prediction accuracy of around 85%.
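
As a minimal sketch of the similarity-analysis idea (the actual feature set and prediction model are not given in the abstract, so the cosine-similarity, nearest-benchmark scheme below is only an assumption for illustration):

// Sketch: predict an application's speedup under a target programming model
// from the most similar benchmark, where similarity is cosine similarity over
// hardware-counter features collected from the existing implementation.
#include <cmath>
#include <vector>

double cosineSimilarity(const std::vector<double>& a, const std::vector<double>& b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb));
}

// counters: one hardware-counter feature vector per benchmark;
// speedups: the measured speedup of each benchmark under the target model.
double predictSpeedup(const std::vector<double>& appCounters,
                      const std::vector<std::vector<double>>& counters,
                      const std::vector<double>& speedups) {
    size_t best = 0;
    double bestSim = -1.0;
    for (size_t i = 0; i < counters.size(); ++i) {
        double s = cosineSimilarity(appCounters, counters[i]);
        if (s > bestSim) { bestSim = s; best = i; }
    }
    return speedups[best];   // prediction borrowed from the most similar benchmark
}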


Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Portable Programming Models for Heterogeneous Platforms,' Deepak Majeti, Rice University
Heterogeneous architectures have become mainstream today and are found in a range of systems from mobile devices to supercomputers. However, these architectures, with their diverse architectural features, pose several programmability challenges, including handling data coherence, managing computation and data communication, and mapping tasks and data distributions. Consequently, application programmers have to deal with new low-level programming languages that involve non-trivial learning and training. In our poster, we will present two programming models that tackle some of the aforementioned challenges. The first is the "Concord" programming model, which provides an interface similar to the widely used Intel Threading Building Blocks (TBB) and targets integrated CPU+GPU architectures without shared virtual memory. This model also supports a wide set of C++ language features. Concord is now an open source project from Intel. The second is "Heterogeneous Habanero C (H2C)", an implementation of the Habanero execution model for modern heterogeneous architectures. The novel features of H2C include high-level language constructs that support automatic data layout, task mapping, and data distribution, and a unified event framework. We will include evaluations of the productivity, portability, and performance of Concord and H2C on applications, including popular oil and gas workloads such as lattice Boltzmann method simulation.
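
For readers unfamiliar with the TBB style that Concord resembles, here is a plain TBB parallel_for; this is standard Intel TBB, not Concord's or H2C's actual API:

// A standard TBB parallel_for: the loop body is expressed as a C++ lambda over
// a blocked range, and the runtime decides how to partition and schedule it.
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <vector>

void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    tbb::parallel_for(tbb::blocked_range<size_t>(0, x.size()),
                      [&](const tbb::blocked_range<size_t>& r) {
                          for (size_t i = r.begin(); i != r.end(); ++i)
                              y[i] = a * x[i] + y[i];
                      });
}

The appeal of this style is that the loop body remains ordinary C++ while the runtime handles partitioning and scheduling, which is the kind of interface the poster's models extend to CPU+GPU targets.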

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Providing vectors and matrices support to SimSQL, an analytic relational database system,' Shangyu Luo, Rice University
SimSQL is a scalable, parallel, analytic distributed database system. Its distinguishing feature is support for stochastic analysis of simulated data: uncertain data that are not actually stored in the database but are produced by calls to statistical distributions. Such data are commonplace in real life, due to measurement errors or prediction needs. SimSQL is also well suited to large-scale Bayesian machine learning. For my research, I added native vector and matrix support to SimSQL. Originally, data in SimSQL were stored as an integer, a double, or a string. Now data can have two new types: vectors and matrices. With these additional types, data can be stored in a more efficient and meaningful way. For example, suppose we want to store a student's grades on 10 tests. In the old SimSQL, we have to create a table with 10 rows, each holding the student's name and one grade, so the name is duplicated 10 times. In the new SimSQL, we just need a single row containing the student's name as a string and the 10 grades as a vector, eliminating 9 rows. Apart from saving storage space, the new data representation also accelerates many large-scale machine learning algorithms, especially those with natural vector/matrix representations such as GMM and Lasso. Previously, we might need to scan thousands of rows to read in a 100x100 matrix, and vector-matrix and matrix-matrix calculations were tedious. Now a single entry stores a 100x100 matrix and those calculations are much more succinct. The performance improvement of the new SimSQL has been confirmed by preliminary experiments.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Scalable Data Mining Using the STAPL C++ library,' Robert Metzger, Texas A&M University
STAPL is a parallel programming library for C++. STAPL has adopted the generic programming philosophy of the C++ Standard Template Library (STL). It provides a collection of parallel algorithms (pAlgorithms) that process data stored in parallel containers (pContainers) defined by the library. STAPL provides abstract data types defined as parallel views (pViews), which are analogous to the concept of iterators in the STL.

Data mining is becoming a common activity in a variety of industrial settings. We have implemented three very different data mining algorithms from a top-10 list of data mining algorithms using STAPL. Data mining algorithms require a variety of data structures, and STAPL gives the programmer a complete set of the parallel data structures they need:
1) K-means Clustering uses arrays.
2) Page Rank uses graphs.
3) Frequent Itemset Mining uses maps and vectors.

STAPL provides nested parallelism with a uniform programming model across parallel delivery mechanisms. Data mining algorithms are an active research area. STAPL gives the programmer a robust implementation of nested parallelism, which can be used to develop new data mining algorithms for advanced architectures in a platform-independent manner. 
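
As a point of reference for the first algorithm listed above, here is a plain sequential k-means written against STL containers, the kind of code that STAPL's pContainers, pViews, and pAlgorithms parallelize; this is ordinary STL, not STAPL code:

// Sequential k-means over d-dimensional points stored in std::vector, as a
// plain-STL reference for the parallel STAPL version described in the poster.
// Assumes pts is non-empty and centers holds the initial cluster centers.
#include <cmath>
#include <limits>
#include <vector>

using Point = std::vector<double>;

double sqDist(const Point& a, const Point& b) {
    double d = 0.0;
    for (size_t i = 0; i < a.size(); ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
}

void kmeans(const std::vector<Point>& pts, std::vector<Point>& centers, int iters) {
    std::vector<size_t> label(pts.size(), 0);
    for (int it = 0; it < iters; ++it) {
        // Assignment step: attach each point to its nearest center.
        for (size_t p = 0; p < pts.size(); ++p) {
            double best = std::numeric_limits<double>::max();
            for (size_t c = 0; c < centers.size(); ++c) {
                double d = sqDist(pts[p], centers[c]);
                if (d < best) { best = d; label[p] = c; }
            }
        }
        // Update step: move each center to the mean of its assigned points.
        std::vector<Point> sum(centers.size(), Point(pts[0].size(), 0.0));
        std::vector<size_t> count(centers.size(), 0);
        for (size_t p = 0; p < pts.size(); ++p) {
            for (size_t i = 0; i < pts[p].size(); ++i) sum[label[p]][i] += pts[p][i];
            ++count[label[p]];
        }
        for (size_t c = 0; c < centers.size(); ++c)
            if (count[c] > 0)
                for (size_t i = 0; i < sum[c].size(); ++i)
                    centers[c][i] = sum[c][i] / count[c];
    }
}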

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Software Support for Efficient Use of Modern Parallel Systems,' Milind Chabbi, Rice University
Achieving top performance on modern many-core, accelerated, multi-node systems is a daunting challenge. Production software systems suffer from performance losses at all layers of the software stack, which can be broadly attributed to three main causes: resource idleness, wasteful resource consumption, and insufficient tailoring to architectural characteristics. Effective software tools, adaptive runtime systems, and efficient algorithms developed as part of my doctoral dissertation address instances of performance problems due to each of these causes. The tools and techniques I developed have demonstrated their effectiveness by identifying performance losses arising from developers' inattention to performance, inappropriate choices of data structures, inefficient algorithms, use of heavyweight abstractions, and ineffective compiler optimizations. This work has had practical impact through collaborations with national laboratories (LBNL, LLNL, ORNL, and PNNL) and with industrial and academic partners. An adaptive runtime developed in collaboration with LBNL eliminated more than 60% of redundant barriers in production runs of NWChem, a flagship DOE computational chemistry code. A fine-grained instruction monitoring framework pinpointed inefficiencies, which helped us substantially improve the performance of several important codes. Idleness analysis for heterogeneous architectures, developed as part of Rice University's HPCToolkit performance tools, helped us diagnose and correct both hardware and software causes of performance losses for important codes running on accelerated supercomputers. Novel, architecture-aware synchronization algorithms deliver high throughput, high fairness, and superior scaling of highly contended locks on deep NUMA architectures.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Speeding up Big Data Jobs through better traffic flow management in file systems,' Simbarashe Dzinamarira, Rice University
Distributed file systems that underlie big data processing frameworks often employ data replication to guard against node failures. A common practice in these file systems is to couple replication with primary-copy writes. Unfortunately, under real workloads, the replication flows end up contending arbitrarily with each other and with time-sensitive data flows for limited disk bandwidth. As a result, task execution times increase significantly and cluster-wide resource utilization becomes inefficient.

We present FC-FS (Flow-Controlled File System), a departure from conventional design that decouples replication from primary-copy writes and performs job-aware flow control on file system traffic. Just as you would not back up your computer while it is busy, FC-FS pauses replication when computers in a cluster are busy.

FC-FS employs established flow control concepts such as credit-based flow control and weighted fair queuing in novel ways to allow the realization of a variety of policy objectives, including speeding up executing tasks, rapidly allocating under-utilized disk bandwidth to replication writes, reducing the time to reach failure resilience for important jobs, and progressively achieving increasing levels of failure resilience.
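
As background for the queuing machinery mentioned above, here is a minimal sketch of weighted fair queuing using per-flow virtual finish times; it is a textbook simplification, not FC-FS's scheduler, and it omits the global virtual-time bookkeeping a full implementation needs when flows go idle:

// Minimal weighted fair queuing sketch: each flow's requests are stamped with a
// virtual finish time (previous finish + size / weight), and the scheduler
// always dispatches the pending request with the smallest finish time.
#include <deque>
#include <limits>
#include <vector>

struct Request { double size; };          // e.g. bytes of a replication write (illustrative)

struct Flow {
    double weight;                        // larger weight => larger bandwidth share
    double lastFinish = 0.0;              // virtual finish time of this flow's last request
    std::deque<Request> pending;
};

// Pick the flow whose head request has the smallest virtual finish time and
// dispatch it; returns the index of the chosen flow, or -1 if all queues are empty.
int dispatchNext(std::vector<Flow>& flows) {
    int chosen = -1;
    double bestFinish = std::numeric_limits<double>::max();
    for (size_t i = 0; i < flows.size(); ++i) {
        if (flows[i].pending.empty()) continue;
        double finish = flows[i].lastFinish + flows[i].pending.front().size / flows[i].weight;
        if (finish < bestFinish) { bestFinish = finish; chosen = static_cast<int>(i); }
    }
    if (chosen >= 0) {
        flows[chosen].lastFinish = bestFinish;   // advance the chosen flow's virtual clock
        flows[chosen].pending.pop_front();       // hand the request to the disk
    }
    return chosen;
}

Raising a flow's weight (for example, for time-sensitive task writes) gives it a larger share of disk bandwidth without starving replication entirely, which is the kind of policy knob the paragraph above describes.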

We have augmented HDFS to realize FC-FS. Extensive evaluation on a 30-node Hadoop cluster shows that FC-FS can halve the duration of a job’s write phase while reducing average job runtime by up to 20%.


Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Speeding Up Your Data Center with Optical Circuit Switching,' Xin Huang, Rice University
Data centers driven by optical circuit switching networks, or optical data centers, are emerging as an alternative to traditional data centers, where the electrical packet switching network is already overwhelmed by bulk data transfers. Optical data centers promise high bandwidth, but this comes at the cost of circuit reconfiguration delays, which makes circuit management non-trivial. Optical circuit scheduling must effectively manage traffic over various architectures, traffic patterns, and data center sizes. Unfortunately, state-of-the-art circuit scheduling algorithms do not provide satisfactory solutions: they either suffer from poor scalability and bad scheduling choices under certain traffic patterns, or simply lack the ability to adapt to different architectures. We propose Decomp, a circuit scheduling algorithm for optical data centers. In contrast to previous algorithms, Decomp remains robust across architectures, traffic patterns, and scales. It promotes sharing of the optical resources among traffic flows, achieving responsive performance while preserving network utilization at low computational cost. Compared with existing algorithms, Decomp is a more adaptive, robust, and scalable circuit management solution for next-generation optical data center networks.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Strongly Scalable High Order Algorithm for Miscible Flooding on Massively Parallel Architecture,' Jizhou Li, Rice University
Introduction
The miscible displacement problem models the displacement of a mixture of two miscible fluids in porous media, an important process in enhanced oil recovery.
Our high-order discretization, based on the discontinuous Galerkin (DG) method for both Darcy flow and fluid transport, is mass conservative and provides high-fidelity simulation results for miscible flooding even under highly heterogeneous, anisotropic permeability and severe grid distortion.
The DG discretizations result in larger and more ill-conditioned linear systems than those arising from commonly used lower-order methods. To address this issue, we apply algebraic multigrid (AMG) and domain decomposition (DD) to construct parallel preconditioners.
With a carefully designed implementation built on the Distributed and Unified Numerics Environment (DUNE), we are able to achieve scalability and efficiency in our miscible flow simulator.

Scalable Solver and Preconditioner
We use a preconditioned Krylov subspace iterative method as our solver.
The transport system is preconditioned by overlapping domain decomposition with SSOR or ILU preconditioners.
The Darcy system is preconditioned using overlapping domain decomposition and an aggregation-based algebraic multigrid (AMG) method. The preconditioners yield good convergence even for problems whose permeability varies in magnitude from 10^{-10} to 10^{-18} m^2, such as the SPE10 model.
The solvers and preconditioners are scalable on massively parallel computing architectures, as we illustrate with results in which the pressure and concentration are approximated by piecewise quadratic elements over 1,122,000 cells using up to 512 processes on an IBM iDataPlex cluster. The AMG solver, the most time-consuming component of the simulation, is strongly scalable, as we also demonstrate.
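
The solver stack described above follows the usual preconditioned Krylov pattern. Below is a generic preconditioned conjugate gradient sketch in which a simple Jacobi (diagonal) preconditioner stands in for the AMG and domain decomposition preconditioners, written against plain std::vector rather than DUNE's interfaces:

// Generic preconditioned conjugate gradient for a sparse SPD system A x = b,
// with a Jacobi preconditioner standing in for AMG / domain decomposition.
// A is stored in CSR form; this is an illustrative sketch, not the poster's code.
#include <cmath>
#include <vector>

struct CSRMatrix {
    std::vector<int> rowPtr, col;
    std::vector<double> val;
    std::vector<double> apply(const std::vector<double>& x) const {
        std::vector<double> y(rowPtr.size() - 1, 0.0);
        for (size_t r = 0; r + 1 < rowPtr.size(); ++r)
            for (int k = rowPtr[r]; k < rowPtr[r + 1]; ++k)
                y[r] += val[k] * x[col[k]];
        return y;
    }
    std::vector<double> diagonal() const {
        std::vector<double> d(rowPtr.size() - 1, 1.0);
        for (size_t r = 0; r + 1 < rowPtr.size(); ++r)
            for (int k = rowPtr[r]; k < rowPtr[r + 1]; ++k)
                if (col[k] == static_cast<int>(r)) d[r] = val[k];
        return d;
    }
};

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

std::vector<double> pcg(const CSRMatrix& A, const std::vector<double>& b,
                        double tol, int maxIter) {
    std::vector<double> x(b.size(), 0.0), r = b, diag = A.diagonal();
    std::vector<double> z(b.size()), p(b.size());
    for (size_t i = 0; i < b.size(); ++i) z[i] = r[i] / diag[i];   // z = M^{-1} r
    p = z;
    double rz = dot(r, z);
    for (int it = 0; it < maxIter && std::sqrt(dot(r, r)) > tol; ++it) {
        std::vector<double> Ap = A.apply(p);
        double alpha = rz / dot(p, Ap);
        for (size_t i = 0; i < x.size(); ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        for (size_t i = 0; i < z.size(); ++i) z[i] = r[i] / diag[i];
        double rzNew = dot(r, z);
        double beta = rzNew / rz;
        rz = rzNew;
        for (size_t i = 0; i < p.size(); ++i) p[i] = z[i] + beta * p[i];
    }
    return x;
}

In the poster's setting, the DG transport and Darcy systems take the place of A, and the Jacobi step is replaced by an AMG V-cycle or an overlapping Schwarz application, which is where the parallel scalability comes from.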

Speakers
JL

Jizhou Li

Graduate Student, Rice University
I am a PhD student in Computational and Applied Mathematics at Rice University. I am interested in developing efficient and accurate solutions to porous media flow and transport problems, while maintaining a solid theoretical base.


Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Unraveling the Sinuous Grain Boundaries in Graphene,' Yang Yang, Rice University
Graphene exhibits diverse potential applications in oil exploration. For example, fluid losses to the surrounding rock are decreased by adding platelets of graphene oxide to a common water-based drilling fluid, since microscopic flakes of graphene can form a thinner and lighter filter cake. Grain boundaries (GBs) in graphene are stable strings of pentagon-heptagon dislocations. GBs have been believed to favor an alignment of dislocations, but an increasing number of experiments reveal diversely sinuous GB structures whose origins have long been elusive. Based on dislocation theory and first-principles calculations, we conduct an extensive analysis of graphene GBs and reveal that the sinuous GB structures, albeit longer than the straight forms, can be energetically optimal when the global GB line cannot bisect the tilt angle. The established atomic structures closely resemble recent experimental images of typical GBs. In contrast to previously used models, the sinuous GBs show improved mechanical properties and are distinguished by a sizable electronic transport gap, which may open potential applications for polycrystalline graphene and is technologically promising for enabling graphene devices with logic operation.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Upper Cambrian Microbial Reefs, Mason, Texas: The making of Virtual Outcrops using Photogrammetry,' Pankaj Khanna, Rice University
The discovery of hydrocarbon reservoirs in pre-salt microbial accumulations offshore Brazil and Angola, in addition to a significant microbial component in some of the world's largest carbonate reservoirs in the Pri-Caspian Basin, has renewed interest in microbial deposits. Spectacular outcrops of Upper Cambrian microbial reefs in Mason County, Texas, offer unique opportunities to assess varying scales of their spatial variation and potentially serve as subsurface analogs to improve reservoir correlation and modeling. The outcrops are located on the private Shepard and Zesch ranches, along the Llano and James rivers and Mill Creek, and these ranches have recently become accessible for geological field work.

The heterogeneities and scales within the microbial unit can best be understood by measuring its size and spatial characteristics. Photogrammetry, the science of taking measurements from photographs, is utilized in this study. An aerial survey was conducted to collect digital photographs over ten outcrops (three pavements and seven cliffs). Camerawings, an aerial photography company, was hired to conduct the survey. The drone, chosen for its long flight time so that a larger area could be covered in a single flight, carried a Sony NEX-7 24.3 MP camera (the best-resolution camera available at the time of data acquisition), a gimbal to keep the camera horizontal during flight, and a GPS to record the position at which each photograph was taken. The collected data were processed using Agisoft 1.0, which yielded three major products: an orthophotograph, a digital elevation model, and a KMZ file (a Google Earth-readable format). Initial results from these products include the identification of various scales of microbial bioherms, their external morphology, and their spatial continuity. Future studies include understanding the architecture and statistically analyzing the different scales, with the final goal of producing a geocellular geological model of the microbial unit.

The Rice/Trinity Industry Microbial Research Consortium is funded by Chevron, ConocoPhillips, Shell, and Statoil.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Variable Projection Method for Extended Born Waveform Inversion with Shot Coordinate Model Extension,' Yin Huang, Rice University
The earth model is separated into a smooth, long-scale background model and a short-scale model. Model extension allows the short-scale model to depend on the shot coordinate. Extended Born modeling is linear in the short-scale model and nonlinear in the background model. The objective function of extended Born waveform inversion is the least-squares misfit of extended Born modeling plus a differential semblance term. Local optimization methods applied directly to this objective function suffer from the cycle-skipping problem, as classic full waveform inversion does.

In this poster, we use a variable projection method to eliminate the short-scale model and obtain a smooth objective function over the background model, called the reduced problem. The same optimization method applied to this reduced problem converges faster than on the original problem and does not suffer from cycle skipping.
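
In symbols (schematic, since the abstract does not fix notation): write F[m] for the extended Born modeling operator, which is linear in the shot-dependent short-scale model \delta m and nonlinear in the background m, d for the observed data, and \partial_s for differentiation in the shot coordinate. Then

\[
J[m,\,\delta m] = \tfrac{1}{2}\,\bigl\lVert F[m]\,\delta m - d \bigr\rVert^{2}
  + \tfrac{\alpha}{2}\,\bigl\lVert \partial_{s}\,\delta m \bigr\rVert^{2},
\qquad
\widetilde{J}[m] = \min_{\delta m}\, J[m,\,\delta m],
\]

and the variable projection method minimizes the reduced objective \widetilde{J}[m] over the background model alone, solving the inner linear least-squares problem for \delta m at each evaluation.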

Inversion examples using the Marmousi model with acoustic constant-density modeling illustrate that the reduced problem converges to a reasonably good background velocity, while the original objective function does not appear to converge after the same amount of computation, which suggests that variable projection has the ability to overcome the cycle-skipping problem.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Wall Thickness Measurement System based on Magnetic Flux Leakage for External Robotic Inspection of Oil and Gas Pipelines,' Issam Ben Moallem, Rice University
Pipelines transport valuable energy resources such as oil and natural gas, keeping essential utilities in service. A continuous supply is necessary, while the integrity of the piping system must be ensured for process safety. Over time, steel pipes are prone to defects such as corrosion, which attacks the wall and can have severe economic and environmental impacts. Hence, pipeline inspection is of great importance to prevent catastrophic failures. The diagnosis of pipelines can be performed using nondestructive testing techniques that evaluate the properties of a material without causing damage. Magnetic flux leakage (MFL) is one such method; it uses a powerful magnetic field, which can be generated by portable permanent magnets, to locally magnetize and saturate a portion of the pipe under examination. Large wall-thinning defects are serious problems threatening thousands of kilometers of aging pipelines around the world. This work follows a numerical approach to measuring the wall thickness of a pipe specimen, resulting in a calibration curve that serves as a reference for estimating the pipe wall thickness with good precision in real applications. The proposed technique is generic and can be applied systematically to a pipe of a specific size and material property. It represents an enhancement over current practice by avoiding multiple physical experiments. The method is implemented and simulated using the finite element package ANSYS. The MFL sensing system is designed for scanning external pipes, crawling smoothly along the outer surface on a robotic platform and carrying magnets that generate a strong axial magnetic field. At areas where there is metal loss, the magnetic flux flowing in the pipe leaks from the steel. The leaking flux is then captured by a Hall-effect sensor and the wall thickness is estimated by referring to the calibration curve.


Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005

15:15 CST

Poster: 'Well Rate Optimization of Oil Reservoirs,' Xiaodi Deng, Rice University
I develop efficient numerical methods for reservoir well rate optimization in problems that involve a large number of wells and that are modeled by complex reservoir simulations.

In the secondary production stage of an oil reservoir, water is injected into the reservoir in order to drive oil to production wells. Among hundreds of subsurface well bores, liquid flows through the heterogeneous porous media. By setting well injection and production rates, reservoir operators attempt to control the subsurface flow pattern to improve water-flooding sweep efficiency and the produced water-oil ratio.

In this initial phase of my research, the simulation uses a two-phase (water-oil) immiscible, incompressible model with the SPE10 data set, which has highly heterogeneous reservoir porosity and permeability. After a finite volume discretization of the 3D time-dependent system of partial differential equations (PDEs), I have millions of unknowns. To solve the large-scale PDEs efficiently, I use the Trilinos framework and MPI to assemble the PDE matrices, construct preconditioners, and solve linear systems in parallel.

The reservoir optimization problem involves a large number of optimization variables subject to bound constraints. The number of optimization variables is proportional to the number of wells multiplied by the number of time steps. Each objective function evaluation requires an expensive reservoir simulation. To compute the gradient of the objective function, I use the adjoint equation method. The implementation separates the optimization algorithm from the simulation, allowing independent development of both.
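
In the standard notation for adjoint-based gradients (schematic, not the poster's exact equations): let u denote the well rates, y the simulated state, c(y,u) = 0 the discretized reservoir equations, and J(y,u) the objective, so the reduced objective is \hat{J}(u) = J(y(u),u). The adjoint state \lambda and the gradient are obtained from

\[
c_y(y,u)^{*}\,\lambda = -\,\nabla_{y} J(y,u),
\qquad
\nabla \hat{J}(u) = \nabla_{u} J(y,u) + c_u(y,u)^{*}\,\lambda,
\]

so each gradient costs one additional adjoint solve, independent of the number of well-rate variables, which is what makes the adjoint method attractive when the number of controls is large.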

I present numerical results for the SPE10 model using gradient-projection-based optimization methods.

In the next stage of this project, I will incorporate a more complicated reservoir model, improve scalability, investigate other optimization algorithms, and consider model reduction as a way to reduce the computational load.

Speakers

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative 6500 Main Street, Houston, Tx 77005