Wednesday September 16
2015 IEEE High Performance
Extreme Computing Conference
(HPEC ’15)
Nineteenth Annual HPEC Conference
15 - 17 September 2015
Westin Hotel, Waltham, MA USA
Plenary Session
9:00-10:00 in Eden Vale B
Chair: Bob Bond / MIT Lincoln Laboratory
Keynote Speaker:
Delivering an Exascale Ecosystem for Science
Dr. Jeff Nichols (Associate Director - Oak Ridge National Laboratory)
Break
10:00 - 10:20
Advanced ASIC & FPGA Technologies
10:20-12:00 in Eden Vale A1-A2
Chair: David Cousins / BBN
[Best Student Paper Finalist]
Hardware-Efficient Compressed Sensing Encoder Designs for ECG
Jiayi Sheng, Chen Yang, Martin C. Herbordt, Boston University
Coarse Grain Reconfigurable ASIC through Multiplexer Based Switches
Karen Gettings, Marc Burke, Jeremy Muldavin, Michael Vai, MIT Lincoln Laboratory
Performance and Productivity Evaluation of Hybrid-Threading HLS versus HDLs
Gongyu Wang, Herman Lam, Alan George, University of Florida; Glen Edwards, Convey Computer Corporation
Aparapi-UCores: A High Level Programming Framework for Unconventional Cores
Oren Segal, Philip Colangelo, Nasibeh Nasiri, Zhuo Qian, Martin Margala, University of Massachusetts Lowell
High Performance User Space Sockets on Low Power System on a Chip Platforms
Catherine H. Crawford, Piotr Padkowski, Tomasz Baranski, Angela Czubak, Łukasz Raszka, IBM Research
Intel Science & Technology Center on Big Data
10:20-12:00 in Eden Vale C1 - C2
Chair: Tim Mattson / Intel
Invited Talk: The Future of Big Data: Polystore in BigDAWG
Dr. Tim Mattson (Principal Engineer - Intel)
Invited Talk: Analytics on Small High Performance Clusters (SHPC)
Prof. Tim Kraska (Dept of Computer Science - Brown)
Invited Talk: Sparse Matrix Multiply with Julia and TileDB
Dr. Stavros Papadopoulos (Senior Research Scientist - Intel)
Invited Talk: ForeCache: Dynamic Prefetching of Data Tiles for Interactive Visualization
Ms. Leilani Battle (MIT CSAIL)
Invited Talk: Integrating Query Processing with Parallel Languages
Mr. Brandon Myers (Dept of Computer Science - University of Washington)
Lunch; View Posters and Demos
12:00-1:00 in Emerson
Posters
An Evaluation of CUDA Unified Memory Access on NVIDIA Tegra K1
John Joseph, Boston University, Kurt Keville, MIT
A Tag Based Vector Reduction Circuit
Ming Wei, Yi-hua Huang, Sun Yat-sen University
Accelerating Laue Depth Reconstruction Algorithm With CUDA
Yue Ke, Nicholas Schwarz, Jonathan Z. Tischler, Argonne National Lab
Debugger for Multi-level Hybrid Parallel Programs on Heterogeneous Accelerator Cluster Architectures – Survey and Challenges
Shamjith K V, Mangala N, Prahlada Rao BB, Sarat Chandra Babu N, Centre for Development of Advanced Computing
Program Fracture and Recombination for Efficient Automatic Code Reuse
Peter Amidon, Eli Davis, Stelios Sidiroglou-Douskos, Martin Rinard, MIT Computer Science & AI Laboratory
Using Deep Convolutional Networks for Occlusion Edge Detection in RGB-D Frames
Soumik Sarkar, Iowa State University, Vivek Venugopalan, Kishore Reddy, Michael Giering, Julian Ryde, UTRC, Navdeep Jaitly, Google
Accelerating the Distributed Simulations of Agent-Based Models using Community Detection
Antoniya Petkova, Sumit Jha, Narsingh Deo, Charles Hughes, University of Central Florida, Martin Dimitrov, Intel Corporation
Manycore Computing 1
1:00-2:40 in Eden Vale A1 - A2
Chair: Patrick Dreher / MIT
[Best Paper Finalist]
Boosting Irregular Array Reductions through In-lined Block-ordering on Fast Processors
Jan Ciesko, Sergi Mateo, Xavier Teruel, Vicenc Beltran, Xavier Martorell, Jesus Labarta, Barcelona Supercomputing Center
[Best Paper Finalist]
MAGMA Embedded: Towards a Dense Linear Algebra Library for Energy Efficient Extreme Computing
Azzam Haidar, Stanimire Tomov, Piotr Luszczek, Jack Dongarra, University of Tennessee Knoxville
[Best Paper Finalist]
Optimizing Space Time Adaptive Processing Through Accelerating Memory-bounded Operations
Tze Meng Low, Qi Guo, Franz Franchetti, Carnegie Mellon University
[Best Student Paper Finalist]
A Near-Real-Time, Parallel and Distributed Adaptive Object Detection and Re-training Framework based on AdaBoost Algorithm
Munther Abualkibash, Ausif Mahmood, Saeid Moslehpour, University of Bridgeport
Implementing Image Processing Algorithms for the Epiphany Many-Core Coprocessor with Threaded MPI
James Ross, U.S. Army Research Laboratory, David Richie, Brown Deer Technology, Song Park, U.S. Army Research Laboratory, Dale Shires, U.S. Army Research Laboratory
Extreme Form Factors
1:00-2:40 in Eden Vale A3
Chair: Ken Gregson / MIT Lincoln Laboratory
Invited Talk: SpaceVPX Embedded Computing to Meet the Demands of Space
Dr. Charles Patrick Collier, Air Force Research Laboratory
Invited Talk: TBD
Prof. Sertac Karaman, Dept of Aero/Astro - MIT
Invited Talk: The IEEE Rebooting Computing Initiative
Prof. Tom Conte, Georgia Tech - President IEEE Computer Society
Invited Talk: Standards to Facilitate Open Architecture and a COTS Ecosystem
Mr. Greg Rocco / MIT Lincoln Laboratory, Dave Tremper / Office of Naval Research
Agile Condor: A Scalable High Performance Embedded Computing Architecture
Mark Barnell, Courtney Raymond, Air Force Research Lab, Christopher Capraro, Darrek Isereau, SRC
Graph & Sparse Data 1
1:00-2:40 in Eden Vale C1 - C2
Chair: Richard Lethin / Reservoir
Invited Talk: Faster Parallel Graph BLAS Kernels and New Graph Algorithms in Matrix Algebra
Dr. Aydin Buluc, Research Scientist - Lawrence Berkeley National Lab
[Best Student Paper Finalist]
Graphulo Implementation of Server-Side Sparse Matrix Multiply in the Accumulo Database
Dylan Hutchison, University of Washington, Jeremy Kepner, Vijay Gadepally, MIT Lincoln Laboratory, Adam Fuchs, Sqrrl
An Accelerated Procedure for Hypergraph Coarsening on the GPU
Lin Cheng, Hyunsu Cho, Peter Yoon, Trinity College
[Best Paper Finalist]
A Task-Based Linear Algebra Building Blocks Approach for Scalable Graph Analytics
Michael M. Wolf, Jonathan W. Berry, Dylan T. Stark, Sandia
Sampling Large Graphs for Anticipatory Analytics
Lauren Edwards, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller, MIT Lincoln Laboratory
Break
2:40-3:00
Manycore Computing 2
3:00-4:40 in Eden Vale A1 - A2
Chair: David Cousins / BBN
Heterogeneous Work-stealing across CPU and DSP Cores
Vivek Kumar, Alina Sbîrlea, Zoran Budimlic, Deepak Majeti, Vivek Sarkar, Rice University
Achieving Low Latency, Reduced Memory Footprint and Low Power Consumption with Data Streaming
Olivier Bockenbach, ContextVision, Murtaza Ali, Texas Instruments, Ian Wainwright, High Performance Consulting, Mark Nadeski, Texas Instruments
Embedded Second-Order Cone Programming with Radar Applications
Paul Mountcastle, Tom Henretty, Aale Naqvi, Richard Lethin, Reservoir Labs
Efficient Parallelization of Path Planning Workload on Single-chip Shared-memory Multicores
Masab Ahmad, Omer Khan, University of Connecticut
Monte Carlo Simulations on Intel Xeon Phi: Offload and Native Mode
Bryar M. Shareef, Elise de Doncker, Western Michigan University
Graph & Sparse Data 2
3:00-4:40 in Eden Vale C1 - C2
Chair: Michael Wolf / Sandia
Invited Talk: Graph Programming Interface
Dr. José Moreira, IBM Thomas J. Watson Research Center
Improving the Performance of Graph Analysis Through Partitioning with Sampling
Michael M. Wolf, Sandia, Benjamin A. Miller, MIT Lincoln Laboratory
Optimization of Symmetric Tensor Computations
Jonathon Cai, Yale, Muthu Baskaran, Benoît Meister, Richard Lethin, Reservoir Labs
Using a Power Law Distribution to Describe Big Data
Vijay Gadepally, Jeremy Kepner, MIT Lincoln Laboratory
Invited Talk: Photonically-Optimized Graph Processors
Dr. Jag Shah (Senior Scientist - IDA)
Best Student Paper Award Presentation
4:40 in Eden Vale B
Chair: Brian Sroka / MITRE
Best Paper Award Presentation
4:50 in Eden Vale B
Chair: Jeremy Kepner / MIT Lincoln Laboratory
Reception; View Posters and Demos; Attend BoFs
5:00-8:00 in Emerson & Foyer & Eden Vale
BoFs
6:00 - 7:00
High Performance Storage
6:00-7:00 in Eden Vale A1
Chair: Torben Petersen / Seagate
Abstract: “Disk drives - a thing of the past? Will flash take over from spinning drives? Is HPC storage dead or with a strong
future? Are parallel file-systems being replaced by cloud? Data reliability? Archiving? Many of these questions are being asked
and few answered. This session will straighten out some and provide Seagate’s perspective on the future of storage solutions.”
Bio: Torben Kling Petersen, Ph.D. has worked in high performance computing since 1994 and is currently the Principal Architect
for Seagate HPC Storage Solutions. At Sun and subsequently Oracle, he has worked in a number of capacities including lead
architect for enterprise datacenter infrastructure for several telecommunication OEMs, technical lead for IPTV, product specialist
for high-end visualization, and global architect for the Lustre BdM team as well as the Cloud Computing infrastructure architect.
Torben has worked on many large HPC systems around the world including Sandia Red Sky in the US, ANU/BOM in Australia,
CHPC in South Africa, ETH Zürich and many others in Europe. In addition to compute and storage systems, Torben has worked
on data center design including the new CSCS datacenter in Lugano, Switzerland.
Dr. Petersen majored in Marine Biology and Biochemistry with his doctorate from Göteborg University, Sweden.
MGHPCC BoF
6:00-7:00 in Eden Vale A2
Chair: Chris Hill / MIT EAPS
System-on-Chip
6:00-7:00 in Eden Vale A3
Chair: Kurt Keville / MIT ISN
Faster parallel Graph BLAS kernels and new graph algorithms in matrix algebra BoF
6:00-7:00 in Eden Vale C1
Chair: Aydin Buluc / LBL
Abstract: I will give an overview of recent research in minimizing communication and hence improving the performance and
scalability of one of the most important Graph BLAS functions: the sparse matrix-matrix product. Our new implementation, which
does not asymptotically increase the memory requirements for sufficiently sparse matrices, relies on a particular interpretation of
the 3D algorithmic paradigm. It also takes advantage of in-node multithreading capabilities using a scalable multithreaded
sub-matrix multiplication algorithm. Multithreading is often helpful for reducing network contention.
I will then describe how to map two relatively complex graph algorithms into the language of matrices. The first algorithm,
triangle counting, relies on a masked version of sparse matrix-matrix multiplication. The second class of algorithms uses sparse
matrix-sparse vector multiplication and semiring specialization to implement various maximal-cardinality matching algorithms on
bipartite graphs, showing impressive results on distributed-memory architectures.
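As a rough illustration of the triangle-counting idea mentioned in the abstract, the following is a minimal single-node sketch, not the speaker's implementation. It assumes an undirected graph given as an edge list, takes L to be the strictly lower-triangular part of the adjacency matrix, and computes the sum of the entries of (L x L) masked by L. The function names (lower_triangle, masked_spgemm, count_triangles) and the dict-of-sets storage are hypothetical choices made only for this example.

```python
def lower_triangle(edges):
    """Return the strictly lower-triangular adjacency pattern as {row: set(cols)}."""
    rows = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        hi, lo = max(u, v), min(u, v)
        rows.setdefault(hi, set()).add(lo)
    return rows


def masked_spgemm(a, b, mask):
    """Compute C = A x B restricted to the nonzero pattern of `mask`.

    Matrices are sparse 0/1 patterns stored as {row: set(cols)}.
    C is returned as {(i, j): value}; products that fall outside
    the mask are discarded instead of being stored.
    """
    c = {}
    for i, mask_cols in mask.items():
        for k in a.get(i, ()):
            for j in b.get(k, ()):
                if j in mask_cols:
                    c[(i, j)] = c.get((i, j), 0) + 1
    return c


def count_triangles(edges):
    """Count triangles as the sum of the entries of (L x L) masked by L."""
    low = lower_triangle(edges)
    return sum(masked_spgemm(low, low, low).values())


if __name__ == "__main__":
    # Complete graph on four vertices
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(count_triangles(k4))
```

Using only the lower-triangular part means each triangle with vertices i > k > j is found exactly once, through the path i to k to j closed by the masked entry (i, j). The mask keeps the output no denser than L itself, which is the point of a masked product.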
Bio: Aydın Buluç is a computational research scientist at the Lawrence Berkeley National Laboratory (LBNL). His research
interests include parallel computing, combinatorial scientific computing, high performance graph analysis, sparse matrix
computations, computational genomics and neuroscience. Previously, he was a Luis W. Alvarez postdoctoral fellow at LBNL and
a visiting scientist at the Simons Institute for the Theory of Computing. He received his PhD in Computer Science from the
University of California, Santa Barbara in 2010 and his BS in Computer Science and Engineering from Sabanci University,
Turkey in 2005. Dr. Buluç is the recipient of a DOE Early Career Award in 2013. He is also a founding associate editor of the
ACM Transactions on Parallel Computing.
SciDB BoF
6:00-7:00 in Eden Vale C2
Chair: Marilyn Matz / Paradigm4
Accumulo BoF (tentative)
6:00-7:00 in Eden Vale C3
Chair: Adam Fuchs / Sqrrl
Demos:
AHA Products Group
Annapolis
BittWare
Curtiss-Wright
Cyntony
Dynatem
FuturePlus
SpiralGen
X3-C