SC18: International Conference for High Performance Computing, Networking, Storage and Analysis (2018)
Dallas, Texas, USA
Nov 11, 2018 to Nov 16, 2018
ISBN: 978-1-5386-8384-2
TABLE OF CONTENTS

SP-Cache: Load-Balanced, Redundancy-Free Cluster Caching with Selective Partition (PDF)

Yinghao Yu , Hong Kong University of Science and Technology
Renfei Huang , Hong Kong University of Science and Technology
Wei Wang , Hong Kong University of Science and Technology
Jun Zhang , Hong Kong University of Science and Technology
Khaled Ben Letaief , Hong Kong University of Science and Technology
pp. 1-13

BESPOKV: Application Tailored Scale-Out Key-Value Stores (PDF)

Ali Anwar , IBM Research-Almaden
Yue Cheng , George Mason University
Hai Huang , IBM Research-T.J. Watson
Jingoo Han , Virginia Tech
Hyogi Sim , Oak Ridge National Laboratory
Dongyoon Lee , Virginia Tech
Fred Douglis , Perspecta Labs
Ali R. Butt , Virginia Tech
pp. 14-29

Scaling Embedded In-Situ Indexing with DeltaFS (PDF)

Qing Zheng , Carnegie Mellon University
Charles D. Cranor , Carnegie Mellon University
Danhao Guo , Carnegie Mellon University
Gregory R. Ganger , Carnegie Mellon University
George Amvrosiadis , Carnegie Mellon University
Garth A. Gibson , Carnegie Mellon University
Bradley W. Settlemyer , Los Alamos National Laboratory
Gary Grider , Los Alamos National Laboratory
Fan Guo , Los Alamos National Laboratory
pp. 30-44

Fine-Grained, Multi-Domain Network Resource Abstraction as a Fundamental Primitive to Enable High-Performance, Collaborative Data Sciences (PDF)

Qiao Xiang , Tongji University
J. Jensen Zhang , Tongji University
X. Tony Wang , Tongji University
Y. Jace Liu , Tongji University
Chin Guok , Lawrence Berkeley National Laboratory
Franck Le , IBM T.J. Watson Research Center
John MacAuley , Lawrence Berkeley National Laboratory
Harvey Newman , California Institute of Technology
Y. Richard Yang , Tongji University
pp. 58-70

Light-Weight Protocols for Wire-Speed Ordering (PDF)

Hans Eberle , NVIDIA Research
Larry Dennison , NVIDIA Research
pp. 71-82

GPU Age-Aware Scheduling to Improve the Reliability of Leadership Jobs on Titan (PDF)

Christopher Zimmer , Oak Ridge National Laboratory
Don Maxwell , Oak Ridge National Laboratory
Stephen McNally , Oak Ridge National Laboratory
Scott Atchley , Oak Ridge National Laboratory
Sudharshan S. Vazhkudai , Oak Ridge National Laboratory
pp. 83-93

FlipTracker: Understanding Natural Error Resilience in HPC Applications (PDF)

Luanzheng Guo , EECS, UC Merced
Dong Li , EECS, UC Merced
Ignacio Laguna , Lawrence Livermore National Laboratory
Martin Schulz , Technical University of Munich
pp. 94-107

Doomsday: Predicting Which Node Will Fail When on Supercomputers (PDF)

Anwesha Das , North Carolina State University
Frank Mueller , North Carolina State University
Paul Hargrove , Lawrence Berkeley National Laboratory
Eric Roman , Lawrence Berkeley National Laboratory
Scott Baden , Lawrence Berkeley National Laboratory
pp. 108-121

Extreme Scale De Novo Metagenome Assembly (PDF)

Evangelos Georganas , Parallel Computing Lab, Intel Corp
Rob Egan , Lawrence Berkeley National Laboratory
Steven Hofmeyr , Computational Research Division
Eugene Goltsman , Lawrence Berkeley National Laboratory
Bill Arndt , National Energy Research Scientific Computing Center
Andrew Tritt , National Energy Research Scientific Computing Center
Aydin Buluç , Computational Research Division
Leonid Oliker , Computational Research Division
Katherine Yelick , Computational Research Division
pp. 122-134

Optimizing High Performance Distributed Memory Parallel Hash Tables for DNA k-mer Counting (PDF)

Tony C. Pan , Georgia Institute of Technology
Sanchit Misra , Intel Corporation
Srinivas Aluru , Georgia Institute of Technology
pp. 135-147

Redesigning LAMMPS for Peta-Scale and Hundred-Billion-Atom Simulation on Sunway TaihuLight (PDF)

Xiaohui Duan , Shandong University
Ping Gao , Shandong University
Tingjian Zhang , Shandong University
Meng Zhang , Shandong University
Weiguo Liu , Shandong University
Wusheng Zhang , Tsinghua University
Wei Xue , Tsinghua University
Haohuan Fu , Tsinghua University
Lin Gan , Tsinghua University
Dexun Chen , Tsinghua University
Xiangxu Meng , Shandong University
Guangwen Yang , Tsinghua University
pp. 148-159

Large-Scale Hierarchical k-means for Heterogeneous Many-Core Supercomputers (PDF)

Liandeng Li , Tsinghua University
Teng Yu , University of St Andrews
Wenlai Zhao , Tsinghua University
Haohuan Fu , Tsinghua University
Chenyu Wang , University of St Andrews
Li Tan , Beijing Technology and Business University
Guangwen Yang , Tsinghua University
John Thomson , University of St Andrews
pp. 160-170

TriCore: Parallel Triangle Counting on GPUs (PDF)

Yang Hu , The George Washington University
Hang Liu , University of Massachusetts Lowell
H. Howie Huang , The George Washington University
pp. 171-182

Distributed-Memory Hierarchical Compression of Dense SPD Matrices (PDF)

Chenhan D. Yu , The University of Texas at Austin
Severin Reiz , Technical University of Munich
George Biros , The University of Texas at Austin
pp. 183-197

A Parallelism Profiler with What-If Analyses for OpenMP Programs (PDF)

Nader Boushehrinejadmoradi , Rutgers University
Adarsh Yoga , Rutgers University
Santosh Nagarakatte , Rutgers University
pp. 198-211

Energy Efficiency Modeling of Parallel Applications (PDF)

Mark Endrei , The University of Queensland
Chao Jin , The University of Queensland
Minh Ngoc Dinh , The University of Queensland
David Abramson , The University of Queensland
Heidi Poxon , Cray Inc.
Luiz DeRose , Cray Inc.
Bronis R. de Supinski , Lawrence Livermore National Laboratory
pp. 212-224

HiCOO: Hierarchical Storage of Sparse Tensors (PDF)

Jiajia Li , Georgia Institute of Technology
Jimeng Sun , Georgia Institute of Technology
Richard Vuduc , Georgia Institute of Technology
pp. 238-252

Distributed Memory Sparse Inverse Covariance Matrix Estimation on High-Performance Computing Architectures (PDF)

Aryan Eftekhari , Università della Svizzera italiana
Matthias Bollhöfer , TU Braunschweig
Olaf Schenk , Università della Svizzera italiana
pp. 253-264

PruneJuice: Pruning Trillion-edge Graphs to a Precise Pattern-Matching Solution (PDF)

Tahsin Reza , Lawrence Livermore National Laboratory
Matei Ripeanu , University of British Columbia
Nicolas Tripoul , University of British Columbia
Geoffrey Sanders , Lawrence Livermore National Laboratory
Roger Pearce , Lawrence Livermore National Laboratory
pp. 265-281

Many-Core Graph Workload Analysis (PDF)

Stijn Eyerman , Intel Corporation
Wim Heirman , Intel Corporation
Kristof Du Bois , Intel Corporation
Joshua B. Fryman , Intel Corporation
Ibrahim Hur , Intel Corporation
pp. 282-292

Lessons Learned from Analyzing Dynamic Promotion for User-Level Threading (PDF)

Shintaro Iwasaki , The University of Tokyo
Abdelhalim Amer , Argonne National Laboratory
Kenjiro Taura , The University of Tokyo
Pavan Balaji , Argonne National Laboratory
pp. 293-304

Topology-Aware Space-Shared Co-Analysis of Large-Scale Molecular Dynamics Simulations (PDF)

Preeti Malakar , Argonne National Laboratory
Todd Munson , Argonne National Laboratory
Christopher Knight , Argonne National Laboratory
Venkatram Vishwanath , Argonne National Laboratory
Michael E. Papka , Argonne National Laboratory
pp. 305-319

Evaluation of an Interference-free Node Allocation Policy on Fat-tree Clusters (PDF)

Samuel D. Pollard , University of Oregon
Nikhil Jain , Lawrence Livermore National Laboratory
Stephen Herbein , Lawrence Livermore National Laboratory
Abhinav Bhatele , Lawrence Livermore National Laboratory
pp. 333-345

Mitigating Inter-Job Interference Using Adaptive Flow-Aware Routing (PDF)

Staci A. Smith , University of Arizona
Clara E. Cromey , University of Arizona
David K. Lowenthal , University of Arizona
Jens Domke , Tokyo Institute of Technology
Nikhil Jain , Lawrence Livermore National Laboratory
Jayaraman J. Thiagarajan , Lawrence Livermore National Laboratory
Abhinav Bhatele , Lawrence Livermore National Laboratory
pp. 346-360

Cooperative Rendezvous Protocols for Improved Performance and Overlap (PDF)

S. Chakraborty , The Ohio State University
M. Bayatpour , The Ohio State University
J. Hashmi , The Ohio State University
H. Subramoni , The Ohio State University
D. K. Panda , The Ohio State University
pp. 361-373

Framework for Scalable Intra-Node Collective Operations using Shared Memory (PDF)

Surabhi Jain , Intel Corporation
Rashid Kaleem , Intel Corporation
Marc Gamell Balmana , Intel Corporation
Akhil Langer , Intel Corporation
Dmitry Durnov , Intel Corporation
Alexander Sannikov , Intel Corporation
Maria Garzaran , Intel Corporation
pp. 374-385

Characterization of MPI Usage on a Production Supercomputer (PDF)

Sudheer Chunduri , Argonne National Laboratory
Scott Parker , Argonne National Laboratory
Pavan Balaji , Argonne National Laboratory
Kevin Harms , Argonne National Laboratory
Kalyan Kumaran , Argonne National Laboratory
pp. 386-400

Runtime Data Management on Non-Volatile Memory-based Heterogeneous Memory for Task-Parallel Programs (PDF)

Kai Wu , University of California, Merced
Jie Ren , University of California, Merced
Dong Li , University of California, Merced
pp. 401-413

DRAGON: Breaking GPU Memory Capacity Limits with Direct NVM Access (PDF)

Pak Markthub , Tokyo Institute of Technology
Mehmet E. Belviranli , Oak Ridge National Laboratory
Seyong Lee , Oak Ridge National Laboratory
Jeffrey S. Vetter , Oak Ridge National Laboratory
Satoshi Matsuoka , Oak Ridge National Laboratory
pp. 414-426

Siena: Exploring the Design Space of Heterogeneous Memory Systems (PDF)

Ivy B. Peng , Oak Ridge National Laboratory
Jeffrey S. Vetter , Oak Ridge National Laboratory
pp. 427-440

Dynamic Tracing: Memoization of Task Graphs for Dynamic Task-Based Runtimes (PDF)

Wonchan Lee , Stanford University
Elliott Slaughter , SLAC National Accelerator Laboratory
Michael Bauer , NVIDIA
Todd Warszawski , Stanford University
Alex Aiken , Stanford University
pp. 441-453

Runtime-Assisted Cache Coherence Deactivation in Task Parallel Programs (PDF)

Paul Caheny , Barcelona Supercomputing Center
Lluc Alvarez , Barcelona Supercomputing Center
Mateo Valero , Barcelona Supercomputing Center
Miquel Moretó , Barcelona Supercomputing Center
Marc Casas , Barcelona Supercomputing Center
pp. 454-465

A Divide and Conquer Algorithm for DAG Scheduling under Power Constraints (PDF)

Gökalp Demirci , University of Chicago
Ivana Marincic , University of Chicago
Henry Hoffmann , University of Chicago
pp. 466-477

A Reference Architecture for Datacenter Scheduling: Design, Validation, and Experiments (PDF)

Georgios Andreadis , Delft University of Technology
Laurens Versluis , Vrije Universiteit Amsterdam
Fabian Mastenbroek , Delft University of Technology
Alexandru Iosup , Delft University of Technology
pp. 478-492

Dynamically Negotiating Capacity Between On-demand and Batch Clusters (PDF)

Feng Liu , University of Minnesota
Kate Keahey , Argonne National Laboratory
Pierre Riteau , University of Chicago
Jon Weissman , University of Minnesota
pp. 493-503

A Lightweight Model for Right-Sizing Master-Worker Applications (PDF)

Nathaniel Kremer-Herman , University of Notre Dame
Benjamin Tovar , University of Notre Dame
Douglas Thain , University of Notre Dame
pp. 504-516

Simulating the Wenchuan Earthquake with Accurate Surface Topography on Sunway TaihuLight (PDF)

Bingwei Chen , Tsinghua University
Haohuan Fu , Tsinghua University
Yanwen Wei , Tsinghua University
Conghui He , Tsinghua University
Wenqiang Zhang , University of Science and Technology of China
Yuxuan Li , Tsinghua University
Wubin Wan , National Supercomputing Center in Wuxi
Wei Zhang , National Supercomputing Center in Wuxi
Lin Gan , Tsinghua University
Wei Zhang , Southern University of Science and Technology
Zhenguo Zhang , Southern University of Science and Technology
Guangwen Yang , Tsinghua University
Xiaofei Chen , Southern University of Science and Technology
pp. 517-528

Accelerating Quantum Chemistry with Vectorized and Batched Integrals (PDF)

Hua Huang , Georgia Institute of Technology
Edmond Chow , Georgia Institute of Technology
pp. 529-542

High-Performance Dense Tucker Decomposition on GPU Clusters (PDF)

Jee Choi , IBM T. J. Watson Research Center
Xing Liu , IBM T. J. Watson Research Center
Venkatesan Chakaravarthy , IBM India Research Lab
pp. 543-553

Lessons Learned from Memory Errors Observed Over the Lifetime of Cielo (PDF)

Scott Levy , Sandia National Laboratories
Kurt B. Ferreira , Sandia National Laboratories
Nathan DeBardeleben , Los Alamos National Laboratory
Taniya Siddiqua , Advanced Micro Devices, Inc.
Vilas Sridharan , Advanced Micro Devices, Inc.
Elisabeth Baseman , Los Alamos National Laboratory
pp. 554-565

Partial Redundancy in HPC Systems with Non-Uniform Node Reliabilities (PDF)

Zaeem Hussain , University of Pittsburgh
Taieb Znati , University of Pittsburgh
Rami Melhem , University of Pittsburgh
pp. 566-576

Evaluating and Accelerating High-Fidelity Error Injection for HPC (PDF)

Chun-Kai Chang , University of Texas at Austin
Sangkug Lym , University of Texas at Austin
Nicholas Kelly , University of Texas at Austin
Michael B. Sullivan , University of Texas at Austin
Mattan Erez , University of Texas at Austin
pp. 577-589

Associative Instruction Reordering to Alleviate Register Pressure (PDF)

Prashant Singh Rawat , The Ohio State University
Aravind Sukumaran-Rajam , The Ohio State University
Atanas Rountev , The Ohio State University
Fabrice Rastello , University Grenoble Alpes
Louis-Noël Pouchet , Colorado State University
P. Sadayappan , The Ohio State University
pp. 590-602

Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers (PDF)

Azzam Haidar , University of Tennessee
Stanimire Tomov , University of Tennessee
Jack Dongarra , University of Tennessee
Nicholas J. Higham , University of Manchester
pp. 603-613

ADAPT: Algorithmic Differentiation Applied to Floating-Point Precision Tuning (PDF)

Harshitha Menon , Lawrence Livermore National Laboratory
Michael O. Lam , James Madison University
Daniel Osei-Kuffuor , Lawrence Livermore National Laboratory
Markus Schordan , Lawrence Livermore National Laboratory
Scott Lloyd , Lawrence Livermore National Laboratory
Kathryn Mohror , Lawrence Livermore National Laboratory
Jeffrey Hittinger , Lawrence Livermore National Laboratory
pp. 614-626

A Fast Scalable Implicit Solver for Nonlinear Time-Evolution Earthquake City Problem on Low-Ordered Unstructured Finite Elements with Artificial Intelligence and Transprecision Computing (PDF)

Kohei Fujita , The University of Tokyo
Takuma Yamaguchi , The University of Tokyo
Akira Naruse , NVIDIA Corporation
Jack C. Wells , Oak Ridge National Laboratory
Thomas C. Schulthess , Swiss National Supercomputing Centre
Tjerk P. Straatsma , Oak Ridge National Laboratory
Christopher J. Zimmer , Oak Ridge National Laboratory
Maxime Martinasso , Swiss National Supercomputing Centre
Kengo Nakajima , The University of Tokyo
Muneo Hori , The University of Tokyo
Lalith Maddegedara , The University of Tokyo
pp. 627-637

167-PFlops Deep Learning for Electron Microscopy: From Learning Physics to Atomic Manipulation (PDF)

Robert M. Patton , Oak Ridge National Laboratory
J. Travis Johnston , Oak Ridge National Laboratory
Steven R. Young , Oak Ridge National Laboratory
Catherine D. Schuman , Oak Ridge National Laboratory
Don D. March , Oak Ridge National Laboratory
Thomas E. Potok , Oak Ridge National Laboratory
Derek C. Rose , Oak Ridge National Laboratory
Seung-Hwan Lim , Oak Ridge National Laboratory
Thomas P. Karnowski , Oak Ridge National Laboratory
Maxim A. Ziatdinov , Oak Ridge National Laboratory
Sergei V. Kalinin , Oak Ridge National Laboratory
pp. 638-648

Exascale Deep Learning for Climate Analytics (PDF)

Thorsten Kurth , Lawrence Berkeley National Laboratory
Joshua Romero , NVIDIA
Mayur Mudigonda , Lawrence Berkeley National Laboratory
Nathan Luehr , NVIDIA
Ankur Mahesh , Lawrence Berkeley National Laboratory
Michael Matheson , Oak Ridge National Laboratory
Jack Deslippe , Lawrence Berkeley National Laboratory
Prabhat , Lawrence Berkeley National Laboratory
pp. 649-660

The Design, Deployment, and Evaluation of the CORAL Pre-Exascale Systems (PDF)

Sudharshan S. Vazhkudai , Oak Ridge National Laboratory
Bronis R. de Supinski , Lawrence Livermore National Laboratory
Arthur S. Bland , Oak Ridge National Laboratory
Al Geist , Oak Ridge National Laboratory
Jim Kahle , IBM
Christopher J. Zimmer , Oak Ridge National Laboratory
Scott Atchley , Oak Ridge National Laboratory
Sarp Oral , Oak Ridge National Laboratory
Don E. Maxwell , Oak Ridge National Laboratory
Veronica G. Vergara Larrea , Oak Ridge National Laboratory
Adam Bertsch , Lawrence Livermore National Laboratory
Robin Goldstone , Lawrence Livermore National Laboratory
Wayne Joubert , Oak Ridge National Laboratory
Chris Chambreau , Lawrence Livermore National Laboratory
Ben Casses , Lawrence Livermore National Laboratory
Matthew A. Ezell , Oak Ridge National Laboratory
Elsa Gonsiorowski , Lawrence Livermore National Laboratory
Ian Karlin , Lawrence Livermore National Laboratory
Matthew L. Leininger , Lawrence Livermore National Laboratory
Dustin Leverman , Oak Ridge National Laboratory
Adam Moody , Lawrence Livermore National Laboratory
Ramesh Pankajakshan , Lawrence Livermore National Laboratory
James H. Rogers , Oak Ridge National Laboratory
Drew Schmidt , Oak Ridge National Laboratory
Mallikarjun Shankar , Oak Ridge National Laboratory
Feiyi Wang , Oak Ridge National Laboratory
Py Watson , Lawrence Livermore National Laboratory
Lance D. Weems , Lawrence Livermore National Laboratory
Junqi Yin , Oak Ridge National Laboratory
pp. 661-672

Best Practices and Lessons from Deploying and Operating a Sustained-Petascale System: The Blue Waters Experience (PDF)

Gregory H. Bauer , University of Illinois
Brett Bode , University of Illinois
Jeremy Enos , University of Illinois
William T. Kramer , University of Illinois
Scott Lathrop , University of Illinois
Celso L. Mendes , University of Illinois
Roberto R. Sisneros , University of Illinois
pp. 673-684

Performance Evaluation of a Vector Supercomputer SX-Aurora TSUBASA (PDF)

Kazuhiko Komatsu , Tohoku University
Shintaro Momose , Tohoku University, NEC Corporation
Yoko Isobe , Tohoku University, NEC Corporation
Osamu Watanabe , Tohoku University, NEC Corporation
Akihiro Musa , Tohoku University, NEC Corporation
Mitsuo Yokokawa , Kobe University
Toshikazu Aoyama , NEC Corporation
Masayuki Sato , Tohoku University
Hiroaki Kobayashi , Tohoku University
pp. 685-696

Simulating the Weak Death of the Neutron in a Femtoscale Universe with Near-Exascale Computing (PDF)

Evan Berkowitz , Institut für Kernphysik and Institute for Advanced Simulation
M.A. Clark , NVIDIA Corporation
Arjun Gambhir , Lawrence Livermore National Laboratory; Lawrence Berkeley National Laboratory; University of California, Berkeley
Ken McElvain , University of California, Berkeley; Lawrence Berkeley National Laboratory
Amy Nicholson , University of North Carolina
Enrico Rinaldi , Brookhaven National Laboratory; Lawrence Berkeley National Laboratory
Pavlos Vranas , Lawrence Livermore National Laboratory; Lawrence Berkeley National Laboratory
André Walker-Loud , Lawrence Berkeley National Laboratory; Lawrence Livermore National Laboratory
Chia Cheng Chang , Lawrence Berkeley National Laboratory
Bálint Joó , Thomas Jefferson National Accelerator Facility
Thorsten Kurth , Lawrence Berkeley National Laboratory
Kostas Orginos , The College of William & Mary
pp. 697-705

ShenTu: Processing Multi-Trillion Edge Graphs on Millions of Cores in Seconds (PDF)

Heng Lin , Tsinghua University; Fma Technology
Xiaowei Zhu , Tsinghua University; Qatar Computing Research Institute
Bowen Yu , Tsinghua University
Xiongchao Tang , Tsinghua University; Qatar Computing Research Institute
Wei Xue , Tsinghua University
Wenguang Chen , Tsinghua University
Lufei Zhang , State Key Laboratory of Mathematical Engineering and Advanced Computing
Torsten Hoefler , ETH Zurich
Xiaosong Ma , Qatar Computing Research Institute
Xin Liu , National Research Centre of Parallel Computer Engineering and Technology
Weimin Zheng , Tsinghua University
Jingfang Xu , Beijing Sogou Technology Development Co., Ltd.
pp. 706-716

Attacking the Opioid Epidemic: Determining the Epistatic and Pleiotropic Genetic Architectures for Chronic Pain and Opioid Addiction (PDF)

Wayne Joubert , Oak Ridge National Laboratory
Deborah Weighill , Oak Ridge National Laboratory; University of Tennessee
David Kainer , Oak Ridge National Laboratory
Sharlee Climer , University of Missouri-St. Louis
Amy Justice , Yale University/Department of Veterans Affairs
Kjiersten Fagnan , DOE Joint Genome Institute
Daniel Jacobson , Oak Ridge National Laboratory
pp. 717-730

iSpan: Parallel Identification of Strongly Connected Components with Spanning Trees (PDF)

Yuede Ji , George Washington University
Hang Liu , University of Massachusetts, Lowell
H. Howie Huang , George Washington University
pp. 731-742

Adaptive Anonymization of Data using b-Edge Cover (PDF)

Arif Khan , Pacific Northwest National Laboratory
Krzysztof Choromanski , Google Brain Robotics, New York
Alex Pothen , Purdue University
S. M. Ferdous , Purdue University
Mahantesh Halappanavar , Pacific Northwest National Laboratory
Antonino Tumeo , Pacific Northwest National Laboratory
pp. 743-753

faimGraph: High Performance Management of Fully-Dynamic Graphs Under Tight Memory Constraints on the GPU (PDF)

Martin Winter , Graz University of Technology, Austria
Daniel Mlakar , Graz University of Technology, Austria
Rhaleb Zayer , Max Planck Institute for Informatics
Hans-Peter Seidel , Max Planck Institute for Informatics
Markus Steinberger , Max Planck Institute for Informatics
pp. 754-766

Dynamic Data Race Detection for OpenMP Programs (PDF)

Yizi Gu , Rice University
John Mellor-Crummey , Rice University
pp. 767-778

ParSy: Inspection and Transformation of Sparse Matrix Computations for Parallelism (PDF)

Kazem Cheshmi , University of Toronto
Shoaib Kamil , Adobe Research
Michelle Mills Strout , University of Arizona
Maryam Mehri Dehnavi , University of Toronto
pp. 779-793

Detecting MPI Usage Anomalies via Partial Program Symbolic Execution (PDF)

Fangke Ye , Georgia Institute of Technology
Jisheng Zhao , Georgia Institute of Technology
Vivek Sarkar , Georgia Institute of Technology
pp. 794-806

Exploring Flexible Communications for Streamlining DNN Ensemble Training Pipelines (PDF)

Randall Pittman , North Carolina State University
Hui Guan , North Carolina State University
Xipeng Shen , North Carolina State University
Seung-Hwan Lim , Oak Ridge National Laboratory
Robert M. Patton , Oak Ridge National Laboratory
pp. 807-818

CosmoFlow: Using Deep Learning to Learn the Universe at Scale (PDF)

Amrita Mathuriya , Intel Corporation
Deborah Bard , Lawrence Berkeley National Laboratory
Peter Mendygral , Cray Inc.
Lawrence Meadows , Intel Corporation
James Arnemann , U.C. Berkeley
Lei Shao , Intel Corporation
Siyu He , Flatiron Institute
Tuomas Kärnä , Intel Corporation
Diana Moise , Cray Inc.
Simon J. Pennycook , Intel Corporation
Kristyn Maschhoff , Cray Inc.
Jason Sewall , Intel Corporation
Nalini Kumar , Intel Corporation
Shirley Ho , Flatiron Institute
Prabhat , Lawrence Berkeley National Laboratory
Victor Lee , Intel Corporation
pp. 819-829

Anatomy of High-Performance Deep Learning Convolutions on SIMD Architectures (PDF)

Evangelos Georganas , Intel Corporation
Sasikanth Avancha , Intel Corporation
Kunal Banerjee , Intel Corporation
Dhiraj Kalamkar , Intel Corporation
Greg Henry , Intel Corporation
Hans Pabst , Intel Corporation
Alexander Heinecke , Intel Corporation
pp. 830-841

Fault Tolerant One-sided Matrix Decompositions on Heterogeneous Systems with GPUs (PDF)

Jieyang Chen , University of California, Riverside
Hongbo Li , University of California, Riverside
Sihuan Li , University of California, Riverside
Xin Liang , University of California, Riverside
Panruo Wu , University of Houston
Dingwen Tao , University of Alabama
Kaiming Ouyang , University of California, Riverside
Yuanlai Liu , University of California, Riverside
Kai Zhao , University of California, Riverside
Qiang Guan , Kent State University
Zizhong Chen , University of California, Riverside
pp. 854-865

PRISM: Predicting Resilience of GPU Applications Using Statistical Methods (PDF)

Charu Kalra , Northeastern University
Fritz Previlon , Northeastern University
Xiangyu Li , Northeastern University
Norman Rubin , NVIDIA
David Kaeli , Northeastern University
pp. 866-879

Phase Asynchronous AMR Execution for Productive and Performant Astrophysical Flows (PDF)

Muhammad Nufail Farooqi , Koç University
Tan Nguyen , Lawrence Berkeley National Laboratory
Weiqun Zhang , Lawrence Berkeley National Laboratory
Ann S. Almgren , Lawrence Berkeley National Laboratory
John Shalf , Lawrence Berkeley National Laboratory
Didem Unat , Koç University
pp. 880-893

Computing Planetary Interior Normal Modes with a Highly Parallel Polynomial Filtering Eigensolver (PDF)

Jia Shi , Rice University
Ruipeng Li , Lawrence Livermore National Laboratory
Yuanzhe Xi , Emory University
Yousef Saad , University of Minnesota
Maarten V. de Hoop , Rice University
pp. 894-906

Dac-Man: Data Change Management for Scientific Datasets on HPC systems (PDF)

Devarshi Ghoshal , Lawrence Berkeley National Laboratory
Lavanya Ramakrishnan , Lawrence Berkeley National Laboratory
Deborah Agarwal , Lawrence Berkeley National Laboratory
pp. 907-919

Stacker: An Autonomic Data Movement Engine for Extreme-Scale Data Staging-Based In-Situ Workflows (PDF)

Pradeep Subedi , Rutgers University
Philip Davis , Rutgers University
Shaohua Duan , Rutgers University
Scott Klasky , Oak Ridge National Laboratory
Hemanth Kolla , Sandia National Laboratories
Manish Parashar , Rutgers University
pp. 920-930

A Year in the Life of a Parallel File System (PDF)

Glenn K. Lockwood , Lawrence Berkeley National Laboratory
Shane Snyder , Argonne National Laboratory
Teng Wang , Lawrence Berkeley National Laboratory
Suren Byna , Lawrence Berkeley National Laboratory
Philip Carns , Argonne National Laboratory
Nicholas J. Wright , Lawrence Berkeley National Laboratory
pp. 931-943

Author Index (PDF)

pp. 945