2019 IEEE High Performance
Extreme Computing Conference
(HPEC ‘19)
Twenty-third Annual HPEC Conference
24 - 26 September 2019
Westin Hotel, Waltham, MA USA
Graphs, Networks & Sparse Data 1
10:20-12:00 in Eden Vale C1
Chair: Michael Wolf / Sandia
[Best Paper Finalist] Efficient implementation of sparse matrix-sparse vector multiplication for large scale graph analytics
Mauricio Serrano (IBM)
We developed a parallel algorithm to improve the cache behavior and overall performance for multiplication of sparse matrices with
sparse vectors (SpMSpV), an operation used increasingly in large graph analytics, particularly dynamic graphs in social networks and
homeland security applications. The proposed algorithm builds upon the two-phase approach of partitioning the multiplication into a
scaling phase and an aggregation phase, to achieve more cache-friendly access patterns individually in each phase [6], [3]. However, to
handle dynamic graphs and achieve better load balancing for parallel implementation, we use a combination of private and shared bins,
with synchronized access to shared bins to exchange the product terms between the two phases. The new algorithm accumulates
product terms in private bins for each thread. The algorithm then performs a bulk transfer between a private bin and a shared bin, when
the private bin becomes full. Then results are aggregated from the shared bins. In addition, we employ heuristics to decide the best
algorithm for SpMSpV based on the number of nonzeros involved in the operation. When the number of nonzeros is large, it may be
better to perform the operation as SpMV (sparse matrix times dense vector) despite the added conversion cost. Also, if the number of
nonzeros is low it is advantageous to use a simplified algorithm. We compared our algorithm with existing algorithms for SpMSpV, and
our evaluation shows that execution time is reduced by several times when large graphs are considered.
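The two-phase scheme described above can be sketched in a few lines. The following is an illustrative sequential model, not the paper's code: a scaling phase scales each matrix column by the corresponding vector nonzero and deposits product terms into private bins, bulk-flushing a private bin into its shared bin whenever it fills, and an aggregation phase then sums the terms in each shared bin. The bin layout, capacity, and all names are our own assumptions.

```python
# Hypothetical sketch of two-phase SpMSpV with private/shared bins.
from collections import defaultdict

BIN_CAPACITY = 4  # deliberately tiny; a real implementation sizes bins to cache


def spmspv(csc_cols, x_sparse, n_rows, n_buckets=2):
    """y = A @ x, with A in CSC-like form: csc_cols[j] = list of (row, value)."""
    rows_per_bucket = (n_rows + n_buckets - 1) // n_buckets
    shared_bins = [[] for _ in range(n_buckets)]
    private_bins = [[] for _ in range(n_buckets)]

    def flush(b):
        shared_bins[b].extend(private_bins[b])  # bulk transfer to shared bin
        private_bins[b].clear()

    # Scaling phase: scale column j of A by x_j, binning product terms by row range.
    for j, xj in x_sparse.items():
        for i, aij in csc_cols.get(j, ()):
            b = i // rows_per_bucket
            private_bins[b].append((i, aij * xj))
            if len(private_bins[b]) >= BIN_CAPACITY:
                flush(b)
    for b in range(n_buckets):
        flush(b)  # drain any remaining private-bin contents

    # Aggregation phase: each bucket's terms can be summed independently.
    y = defaultdict(float)
    for bin_terms in shared_bins:
        for i, v in bin_terms:
            y[i] += v
    return dict(y)
```

Because each shared bin covers a disjoint row range, the aggregation phase touches each result range with cache-friendly locality, which is the point of the two-phase split.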
Combinatorial Multigrid: Advanced Preconditioners For Ill-Conditioned Linear Systems
M. Harper Langston, Pierre-David Letourneau, Richard Lethin, James Ezick, Mitchell Harris (Reservoir Labs)
The Combinatorial Multigrid (CMG) technique is a practical and adaptable solver and combinatorial preconditioner for solving certain
classes of large, sparse systems of linear equations. CMG is similar to Algebraic Multigrid (AMG) but replaces large groupings of fine-
level variables with a single coarse-level one, resulting in simple and fast interpolation schemes. These schemes further provide control
over the refinement strategies at different levels of the solver hierarchy depending on the condition number of the system being solved.
While many pre-existing solvers may be able to solve large, sparse systems with relatively low complexity, inversion may require
$O(n^2)$ space; whereas, if we know that a linear operator has $\tilde{n} = O(n)$ nonzero elements, we desire to use $O(n)$ space in
order to reduce communication as much as possible. Being able to invert sparse linear systems of equations, asymptotically as fast as
the values can be read from memory, has been identified by DARPA and the Department of Energy as increasingly necessary for
scalable solvers and energy-efficient algorithms in scientific computing. Further, as industry and government agencies move towards
exascale, fast solvers and communication avoidance will become ever more necessary. In this paper, we present an optimized
implementation of the Combinatorial Multigrid in C using PETSc and analyze the solution of various systems using the CMG approach
as a preconditioner on much larger problems than have been presented thus far. We compare the number of iterations, setup times,
and solution times against other popular preconditioners for such systems, including Incomplete Cholesky and a multigrid approach in
PETSc, on common problems, further exhibiting the superior performance of CMG.
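To make the role of a preconditioner concrete: an iterative solver such as conjugate gradient applies an approximate inverse M⁻¹ at each step to cut the effective condition number. The sketch below is not CMG; it uses a simple Jacobi (diagonal) preconditioner as a stand-in for the hierarchy CMG would build, in pure Python with dense list-of-lists matrices. All names are illustrative.

```python
# Illustrative preconditioned conjugate gradient (Jacobi preconditioner, not CMG).
def pcg(A, b, n, tol=1e-10, max_iter=200):
    """Solve A x = b for a symmetric positive-definite A (list of lists)."""
    def matvec(v):
        return [sum(A[i][k] * v[k] for k in range(n)) for i in range(n)]

    minv = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner M^{-1}
    x = [0.0] * n
    r = b[:]                                   # residual r = b - A*0
    z = [minv[i] * r[i] for i in range(n)]     # preconditioned residual
    p = z[:]
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = [minv[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x
```

Swapping `minv` for a multigrid V-cycle is exactly where a CMG hierarchy would plug in; the iteration-count comparisons in the paper measure the benefit of that swap.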
Concurrent Katz Centrality for Streaming Graphs
Chunxing Yin, Jason Riedy (Georgia Institute of Technology)
Most current frameworks for streaming graph analysis “stop the world” and halt ingesting data while updating analysis results. Others
add overhead for different forms of version control. In both approaches, each additional analysis kernel adds further overhead to the
entire system. A new formal model of concurrent analysis lets some algorithms, those valid for the model, update results concurrently
with data ingest without synchronization. Additional kernels incur very little overhead. Here we present the first experimental results for
the new model, considering the performance and result latency of updating Katz centrality on a low-power edge platform. The Katz
centrality values remain close to those of the synchronous algorithm while latency is reduced by factors ranging from 12.8× to 179×.
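For reference, the synchronous baseline the streaming results are compared against is the standard Katz iteration c ← αAᵀc + 1, which converges when α is below the reciprocal of the largest eigenvalue of A. The sketch below shows that baseline on an adjacency-list graph; the graph layout and names are ours, not the paper's.

```python
# Hedged sketch: synchronous Katz-centrality iteration c <- alpha * A^T c + 1.
def katz(adj, n, alpha=0.1, tol=1e-9, max_iter=1000):
    """adj[u] = list of out-neighbors v for edges u->v; returns Katz scores."""
    c = [1.0] * n
    for _ in range(max_iter):
        nxt = [1.0] * n
        for u in range(n):
            for v in adj.get(u, ()):
                nxt[v] += alpha * c[u]   # contribution arriving along edge u->v
        if max(abs(nxt[i] - c[i]) for i in range(n)) < tol:
            return nxt
        c = nxt
    return c
```

A concurrent streaming variant updates entries of `c` as edges arrive instead of sweeping the whole graph each round; the paper's result is that, for valid algorithms, doing so without synchronization still lands near this fixed point.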
Graph Algorithms in PGAS: Chapel and UPC++
Jesun Sahariar Firoz, Louis Jenkins, Marcin Zalewski, Cliff Joslyn, Mark Raugas (PNNL)
The Partitioned Global Address Space (PGAS) programming model can be implemented either with programming language features or
with runtime library APIs, each implementation favoring different aspects (e.g., productivity, abstraction, flexibility, or performance).
Certain language and runtime features, such as collectives, explicit and asynchronous communication primitives, and constructs
facilitating overlap of communication and computation (such as futures and conjoined futures) can enable better performance and
scaling for irregular applications, in particular for distributed graph analytics. We compare graph algorithms in one of each of these
environments: the Chapel PGAS programming language and the UPC++ PGAS runtime library. We implement algorithms for
breadth-first search and triangle counting graph kernels in both environments. We discuss the code in each of the environments, and
compile performance data on a Cray Aries and an Infiniband platform. Our results show that the library-based approach of UPC++
currently provides strong performance, while Chapel provides a high-level abstraction that, though harder to optimize, still achieves
comparable performance.
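For concreteness, one of the two kernels compared above, triangle counting, reduces to intersecting the neighbor sets at the endpoints of each edge. The sequential Python sketch below shows that common formulation; it is illustrative only and not necessarily the variant implemented in Chapel or UPC++.

```python
# Illustrative sequential triangle counting via neighbor-set intersection.
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in an undirected graph given as an edge list."""
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    count = 0
    # Deduplicate edges; each triangle is seen once per incident edge.
    for u, v in {tuple(sorted(e)) for e in edges if e[0] != e[1]}:
        count += len(nbrs[u] & nbrs[v])
    return count // 3  # each triangle was counted by all three of its edges
```

In a PGAS setting the interesting part is distributing `nbrs` across the address space and overlapping the remote set fetches with local intersections, which is where the futures and asynchronous primitives discussed above matter.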
Wednesday, September 25, 2019