2018 IEEE High Performance Extreme Computing Conference (HPEC ‘18) Twenty-second Annual HPEC Conference 25 - 27 September 2018 Westin Hotel, Waltham, MA USA
Hyperscaling Internet Graph Analysis with D4M on the MIT SuperCloud Vijay Gadepally (MIT Lincoln Laboratory)* Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this challenge can be amplified by many orders of magnitude. Development of novel computer network traffic analytics requires: high level programming environments, massive amount of packet capture (PCAP) data, and diverse data products for “at scale” algorithm pipeline development.  D4M (Dynamic Distributed Dimensional Data Model) combines the power of sparse linear algebra, associative arrays, parallel processing, and distributed databases (such as SciDB and Apache Accumulo) to provide a scalable data and computation system that addresses the big data problems associated with network analytics development.  Combining D4M with the MIT SuperCloud manycore processors and parallel storage system enables network analysts to interactively process massive amounts of data in minutes.  To demonstrate these capabilities, we have implemented a representative analytics pipeline in D4M and benchmarked it on 96 hours of Gigabit PCAP data with MIT SuperCloud.  The entire pipeline from uncompressing the raw files to database ingest was implemented in 135 lines of D4M code and achieved speedups of over 20,000. Performance of Graph Analytics Applications on Many-Core Processors Manoj Kumar (IBM)* Attaining good performance on graph analytics applications on modern day many-core processors is challenging, because these processors have complex pipelines to manage out of order execution of hundreds of instructions in flight. These pipelines  have been optimized for high performance computing (HPC) applications, not for graph analytics.  It is preferable to leave the task of attaining good performance to the system developers, and to separate the performance concern from the application programmer's concerns.  In this paper, we show that the linear algebra formulation of graph-analytics effectively handles the aforementioned separation of concerns. This formulation is a better fit for many-core processors as the many-core processors are optimized for HPC applications which have a substantial linear algebra component.  We show that on POWER8, a many- core processor, an eight-fold performance advantage can be attained on the Graph500 benchmark by adopting the linear algebra formulation.  We also present the CPI stack analysis of three graph analytics kernels, and show that the linear algebra implementations of these kernels make efficient use of the POWER8 core.  Inhibitors to still better performance are discussed. Towards Triangle Counting on GPU using Stable Radix binning Nishith Tirpankar (University of Utah, School of Computing)*; Hari Sundar (University of Utah) The pattern of computations of graph algorithms makes them difficult to parallelize. They suffer from erratic data access patterns. We propose a set of algorithmic patterns that enable users to take advantage of fine grained parallelism provided by modern CPU and GPU architectures. This allows accesses to be regularized which improves hierarchical cache access. We also propose a parallel stable binning algorithm that can be used for computing set intersection. This is illustrated through its application to triangle counting in large graphs. Chapel HyperGraph Library (CHGL) Louis Jenkins (Pacific Northwest National Laboratory); Tanveer Bhuiyan (Mississippi State University); Sarah Harun (Mississippi State University); Christopher Lightsey (Mississippi State University); David Mentgen (Mississippi State University); Sinan Aksoy (Pacific Northwest National Laboratory); Timothy Stavenger (Pacific Northwest National Laboratory); Marcin Zalewski (PNNL)*; Hugh Medal ("Assistant Professor, Industrial and Systems Engineering, Mississippi State"); Cliff Joslyn (Pacific Northwest National Laboratory) We present the Chapel Hpergraph Library (CHGL), a library for hypergraph computation in the emerging Chapel language.  Hypergraphs generalize graphs, where a hypergraph edge can connect any number of vertices. Thus, hypergraphs capture high- order, high-dimensional interactions between multiple entities that are not directly expressible in graphs.  CHGL is designed to provide HPC-class computation with high-level abstractions and modern language support for parallel computing on shared memory and distributed memory systems.  In this paper we describe the design of CHGL, including first principles, data structures, and algorithms, and we present preliminary performance results based on a graph generation use case. We also discuss ongoing work of codesign with Chapel, which is currently centered on improving performance. Performance Effects of Dynamic Graph Data Structures in Community Detection Algorithms Rohit Varkey Thankachan (Georgia Institute of Technology)*; Brian Swenson (Georgia Tech Research Institute); James Fairbanks (Georgia Tech Research Institute) Community detection algorithms create a dynamic graph as an internal data structure for tracking agglomerative merges. This community (block) graph is modified heavily through operations derived from moving vertices between candidate communities. We study the problem of choosing the optimal graph representation for this data structure and analyze the performance implications theoretically and empirically. These costs are analyzed in the context of Peixoto’s Markov Chain Monte Carlo algorithm for stochastic block model inference, but apply to agglomerative hierarchical community detection algorithms more broadly. This cost model allows for evaluating data structures for implementing this algorithms and we identify inherent properties of the algorithm that exclude certain optimizations.
Wednesday September 26, 2018
Graphs & Sparse Data 2 1:00-2:40 in Eden Vale A3 Chair: Scott McMillan / CMU
Hyperscaling Internet Graph Analysis with D4M on the MIT SuperCloud Vijay Gadepally (MIT Lincoln Laboratory)* Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this challenge can be amplified by many orders of magnitude. Development of novel computer network traffic analytics requires: high level programming environments, massive amount of packet capture (PCAP) data, and diverse data products for “at scale” algorithm pipeline development.  D4M (Dynamic Distributed Dimensional Data Model) combines the power of sparse linear algebra, associative arrays, parallel processing, and distributed databases (such as SciDB and Apache Accumulo) to provide a scalable data and computation system that addresses the big data problems associated with network analytics development.  Combining D4M with the MIT SuperCloud manycore processors and parallel storage system enables network analysts to interactively process massive amounts of data in minutes.  To demonstrate these capabilities, we have implemented a representative analytics pipeline in D4M and benchmarked it on 96 hours of Gigabit PCAP data with MIT SuperCloud.  The entire pipeline from uncompressing the raw files to database ingest was implemented in 135 lines of D4M code and achieved speedups of over 20,000. Performance of Graph Analytics Applications on Many-Core Processors Manoj Kumar (IBM)* Attaining good performance on graph analytics applications on modern day many-core processors is challenging, because these processors have complex pipelines to manage out of order execution of hundreds of instructions in flight. These pipelines  have been optimized for high performance computing (HPC) applications, not for graph analytics.  It is preferable to leave the task of attaining good performance to the system developers, and to separate the performance concern from the application programmer's concerns.  In this paper, we show that the linear algebra formulation of graph- analytics effectively handles the aforementioned separation of concerns. This formulation is a better fit for many-core processors as the many-core processors are optimized for HPC applications which have a substantial linear algebra component.  We show that on POWER8, a many-core processor, an eight-fold performance advantage can be attained on the Graph500 benchmark by adopting the linear algebra formulation.  We also present the CPI stack analysis of three graph analytics kernels, and show that the linear algebra implementations of these kernels make efficient use of the POWER8 core.  Inhibitors to still better performance are discussed. Towards Triangle Counting on GPU using Stable Radix binning Nishith Tirpankar (University of Utah, School of Computing)*; Hari Sundar (University of Utah) The pattern of computations of graph algorithms makes them difficult to parallelize. They suffer from erratic data access patterns. We propose a set of algorithmic patterns that enable users to take advantage of fine grained parallelism provided by modern CPU and GPU architectures. This allows accesses to be regularized which improves hierarchical cache access. We also propose a parallel stable binning algorithm that can be used for computing set intersection. This is illustrated through its application to triangle counting in large graphs. Chapel HyperGraph Library (CHGL) Louis Jenkins (Pacific Northwest National Laboratory); Tanveer Bhuiyan (Mississippi State University); Sarah Harun (Mississippi State University); Christopher Lightsey (Mississippi State University); David Mentgen (Mississippi State University); Sinan Aksoy (Pacific Northwest National Laboratory); Timothy Stavenger (Pacific Northwest National Laboratory); Marcin Zalewski (PNNL)*; Hugh Medal ("Assistant Professor, Industrial and Systems Engineering, Mississippi State"); Cliff Joslyn (Pacific Northwest National Laboratory) We present the Chapel Hpergraph Library (CHGL), a library for hypergraph computation in the emerging Chapel language.  Hypergraphs generalize graphs, where a hypergraph edge can connect any number of vertices. Thus, hypergraphs capture high- order, high-dimensional interactions between multiple entities that are not directly expressible in graphs.  CHGL is designed to provide HPC-class computation with high-level abstractions and modern language support for parallel computing on shared memory and distributed memory systems.  In this paper we describe the design of CHGL, including first principles, data structures, and algorithms, and we present preliminary performance results based on a graph generation use case. We also discuss ongoing work of codesign with Chapel, which is currently centered on improving performance. Performance Effects of Dynamic Graph Data Structures in Community Detection Algorithms Rohit Varkey Thankachan (Georgia Institute of Technology)*; Brian Swenson (Georgia Tech Research Institute); James Fairbanks (Georgia Tech Research Institute) Community detection algorithms create a dynamic graph as an internal data structure for tracking agglomerative merges. This community (block) graph is modified heavily through operations derived from moving vertices between candidate communities. We study the problem of choosing the optimal graph representation for this data structure and analyze the performance implications theoretically and empirically. These costs are analyzed in the context of Peixoto’s Markov Chain Monte Carlo algorithm for stochastic block model inference, but apply to agglomerative hierarchical community detection algorithms more broadly. This cost model allows for evaluating data structures for implementing this algorithms and we identify inherent properties of the algorithm that exclude certain optimizations.
Wednesday September 26, 2018
Graphs & Sparse Data 2 1:00-2:40 in Eden Vale A3 Chair: Scott McMillan / CMU
HPEC 2018 25 - 27 September 2018 Westin Hotel, Waltham, MA USA