2018 IEEE High Performance
Extreme Computing Conference
(HPEC ‘18)
Twenty-second Annual HPEC Conference
25 - 27 September 2018
Westin Hotel, Waltham, MA USA
Hyperscaling Internet Graph Analysis with D4M on the MIT SuperCloud
Vijay Gadepally (MIT Lincoln Laboratory)*
Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For
example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this
challenge can be amplified by many orders of magnitude. Development of novel computer network traffic analytics requires: high
level programming environments, massive amount of packet capture (PCAP) data, and diverse data products for “at scale”
algorithm pipeline development. D4M (Dynamic Distributed Dimensional Data Model) combines the power of sparse linear
algebra, associative arrays, parallel processing, and distributed databases (such as SciDB and Apache Accumulo) to provide a
scalable data and computation system that addresses the big data problems associated with network analytics development.
Combining D4M with the MIT SuperCloud manycore processors and parallel storage system enables network analysts to
interactively process massive amounts of data in minutes. To demonstrate these capabilities, we have implemented a
representative analytics pipeline in D4M and benchmarked it on 96 hours of Gigabit PCAP data with MIT SuperCloud. The entire
pipeline from uncompressing the raw files to database ingest was implemented in 135 lines of D4M code and achieved speedups
of over 20,000.
Performance of Graph Analytics Applications on Many-Core Processors
Manoj Kumar (IBM)*
Attaining good performance on graph analytics applications on modern day many-core processors is challenging, because these
processors have complex pipelines to manage out of order execution of hundreds of instructions in flight. These pipelines have
been optimized for high performance computing (HPC) applications, not for graph analytics. It is preferable to leave the task of
attaining good performance to the system developers, and to separate the performance concern from the application
programmer's concerns. In this paper, we show that the linear algebra formulation of graph-analytics effectively handles the
aforementioned separation of concerns. This formulation is a better fit for many-core processors as the many-core processors
are optimized for HPC applications which have a substantial linear algebra component. We show that on POWER8, a many-
core processor, an eight-fold performance advantage can be attained on the Graph500 benchmark by adopting the linear algebra
formulation. We also present the CPI stack analysis of three graph analytics kernels, and show that the linear algebra
implementations of these kernels make efficient use of the POWER8 core. Inhibitors to still better performance are discussed.
Towards Triangle Counting on GPU using Stable Radix binning
Nishith Tirpankar (University of Utah, School of Computing)*; Hari Sundar (University of Utah)
The pattern of computations of graph algorithms makes them difficult to parallelize. They suffer from erratic data access patterns.
We propose a set of algorithmic patterns that enable users to take advantage of fine grained parallelism provided by modern
CPU and GPU architectures. This allows accesses to be regularized which improves hierarchical cache access. We also
propose a parallel stable binning algorithm that can be used for computing set intersection. This is illustrated through its
application to triangle counting in large graphs.
Chapel HyperGraph Library (CHGL)
Louis Jenkins (Pacific Northwest National Laboratory); Tanveer Bhuiyan (Mississippi State University); Sarah Harun (Mississippi
State University); Christopher Lightsey (Mississippi State University); David Mentgen (Mississippi State University); Sinan Aksoy
(Pacific Northwest National Laboratory); Timothy Stavenger (Pacific Northwest National Laboratory); Marcin Zalewski (PNNL)*;
Hugh Medal ("Assistant Professor, Industrial and Systems Engineering, Mississippi State"); Cliff Joslyn (Pacific Northwest
National Laboratory)
We present the Chapel Hpergraph Library (CHGL), a library for hypergraph computation in the emerging Chapel language.
Hypergraphs generalize graphs, where a hypergraph edge can connect any number of vertices. Thus, hypergraphs capture high-
order, high-dimensional interactions between multiple entities that are not directly expressible in graphs. CHGL is designed to
provide HPC-class computation with high-level abstractions and modern language support for parallel computing on shared
memory and distributed memory systems. In this paper we describe the design of CHGL, including first principles, data
structures, and algorithms, and we present preliminary performance results based on a graph generation use case. We also
discuss ongoing work of codesign with Chapel, which is currently centered on improving performance.
Performance Effects of Dynamic Graph Data Structures in Community Detection Algorithms
Rohit Varkey Thankachan (Georgia Institute of Technology)*; Brian Swenson (Georgia Tech Research Institute); James
Fairbanks (Georgia Tech Research Institute)
Community detection algorithms create a dynamic graph as an internal data structure for tracking agglomerative merges. This
community (block) graph is modified heavily through operations derived from moving vertices between candidate communities.
We study the problem of choosing the optimal graph representation for this data structure and analyze the performance
implications theoretically and empirically. These costs are analyzed in the context of Peixoto’s Markov Chain Monte Carlo
algorithm for stochastic block model inference, but apply to agglomerative hierarchical community detection algorithms more
broadly. This cost model allows for evaluating data structures for implementing this algorithms and we identify inherent properties
of the algorithm that exclude certain optimizations.
Wednesday September 26, 2018
Graphs & Sparse Data 2
1:00-2:40 in Eden Vale A3
Chair: Scott McMillan / CMU