2015 IEEE High Performance
Extreme Computing Conference
(HPEC '15)
Nineteenth Annual HPEC Conference
15 - 17 September 2015
Westin Hotel, Waltham, MA USA
Graphs & Sparse Data 2
3:00-4:40 in Eden Vale C1 - C2
Chair: Michael Wolf / Sandia
Invited Talk: Graph Programming Interface
Dr. José Moreira, IBM Thomas J. Watson Research Center
Improving the Performance of Graph Analysis Through Partitioning with Sampling
Michael M. Wolf, Sandia, Benjamin A. Miller, MIT Lincoln Laboratory
Numerous applications focus on the analysis of entities and the connections between them, and such data are naturally
represented as graphs. In particular, the detection of a small subset of vertices with anomalous coordinated connectivity is of
broad interest, for problems such as detecting strange traffic in a computer network or unknown communities in a social
network. Eigenspace analysis of large-scale graphs is useful for dimensionality reduction of these large, noisy data sets into a
more tractable analysis problem. When performing this sort of analysis across many parallel processes, the data partitioning
scheme may have a significant impact on the overall running time. Previous work demonstrated that partitioning based on a
sampled subset of edges still yields a substantial improvement in running time. In this work, we study this further, exploring how
different sampling strategies, graph community structure, and the vertex degree distribution affect the partitioning quality. We
show that sampling is an effective technique when partitioning for data analytics problems with community-like structure.
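The idea of partitioning from a sampled subset of edges can be illustrated with a minimal sketch. It is not the authors' method: the BFS-based bisection is a toy stand-in for a real graph partitioner, and the function names (`sample_edges`, `bfs_bisect`, `edge_cut`, `two_cliques`) and the uniform sampling strategy are assumptions made for illustration only. The sketch computes a partition from half of the edges and then measures the edge cut on the full graph, which is the quantity a good partition keeps small.

```python
import random
from collections import defaultdict, deque


def sample_edges(edges, fraction, rng):
    """Keep a uniformly random subset of the edges (without replacement)."""
    k = max(1, round(fraction * len(edges)))
    return rng.sample(edges, k)


def bfs_bisect(num_vertices, edges, start=0):
    """Toy partitioner (assumption): grow part 0 by BFS until it holds half the vertices."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    order, seen, queue = [], {start}, deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    # Vertices the (possibly disconnected) sampled graph never reached
    # are appended in index order.
    order += [v for v in range(num_vertices) if v not in seen]
    half = num_vertices // 2
    return {v: (0 if i < half else 1) for i, v in enumerate(order)}


def edge_cut(edges, part):
    """Number of edges whose endpoints fall in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])


def two_cliques(size):
    """Synthetic community structure: two cliques joined by one bridge edge."""
    edges = []
    for base in (0, size):
        for i in range(size):
            for j in range(i + 1, size):
                edges.append((base + i, base + j))
    edges.append((size - 1, size))
    return edges


edges = two_cliques(4)
rng = random.Random(0)
sampled = sample_edges(edges, 0.5, rng)

part_full = bfs_bisect(8, edges)        # partition computed from all edges
part_sampled = bfs_bisect(8, sampled)   # partition computed from the sample only

# Both partitions are scored against the FULL edge list.
cut_full = edge_cut(edges, part_full)
cut_sampled = edge_cut(edges, part_sampled)
print(cut_full, cut_sampled)
```

On a graph with clear community structure, the sampled partition often recovers a cut close to the full one, but the outcome depends on the sample and on the partitioner used.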
Optimization of Symmetric Tensor Computations
Jonathon Cai, Yale, Muthu Baskaran, Benoît Meister, Richard Lethin, Reservoir Labs
For applications that deal with large amounts of high dimensional multi-aspect data, it is natural to represent such data as
tensors or multi-way arrays. A singularly important class of tensors is the symmetric tensor, which shows up in real-world
applications such as higher-order statistics, signal processing, and data analysis. Tensor computations, such as tensor
decompositions, are used to analyze such data. In this paper, we describe novel optimizations that exploit the symmetry in
tensors in order to parallelize and reduce redundant computations and storage in operations involving symmetric tensors.
Specifically, we apply our optimizations to the matricized tensor times Khatri-Rao product (MTTKRP) operation, a key operation in
tensor decomposition algorithms such as INDSCAL (individual differences in scaling) for partially symmetric tensors. We
demonstrate improved performance for both sequential and parallel execution using our techniques on both synthetic and real
data sets.
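As a rough illustration of why symmetry removes redundant work in MTTKRP, the sketch below computes the operation for a small, dense, fully symmetric 3-way tensor in plain Python. It is not the optimization described in the paper: the function names (`mttkrp`, `mttkrp_symmetric`), the dense nested-list storage, and the restriction to a fully symmetric cubic tensor with a single shared factor matrix `A` are all assumptions chosen to keep the example short.

```python
def mttkrp(X, B, C, mode):
    """Dense 3-way MTTKRP for a cubic tensor.

    For mode 0: M[i][r] = sum_{j,k} X[i][j][k] * B[j][r] * C[k][r];
    the other modes use the remaining two indices analogously.
    """
    n, rank = len(X), len(B[0])
    M = [[0] * rank for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                idx = (i, j, k)
                row = idx[mode]
                p, q = [t for m, t in enumerate(idx) if m != mode]
                for r in range(rank):
                    M[row][r] += X[i][j][k] * B[p][r] * C[q][r]
    return M


def mttkrp_symmetric(X, A):
    """Mode-0 MTTKRP with B = C = A, visiting only index pairs j <= k.

    Since X[i][j][k] == X[i][k][j], each off-diagonal pair is counted twice.
    """
    n, rank = len(X), len(A[0])
    M = [[0] * rank for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(j, n):
                weight = 1 if j == k else 2
                for r in range(rank):
                    M[i][r] += weight * X[i][j][k] * A[j][r] * A[k][r]
    return M


# Build a symmetric tensor as a sum of rank-1 terms a_r (x) a_r (x) a_r.
A = [[1, 2], [0, 1], [3, -1]]
n, rank = len(A), len(A[0])
X = [[[sum(A[i][r] * A[j][r] * A[k][r] for r in range(rank))
       for k in range(n)] for j in range(n)] for i in range(n)]

full = mttkrp(X, A, A, mode=0)
reduced = mttkrp_symmetric(X, A)
print(full == reduced)
```

For a fully symmetric tensor with one shared factor matrix, the MTTKRP results for all three modes coincide, so a single computation can serve every mode; the reduced loop additionally skips roughly half of the inner index pairs.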
Using a Power Law Distribution to Describe Big Data
Vijay Gadepally, Jeremy Kepner, MIT Lincoln Laboratory
The gap between data production and user ability to access, compute and produce meaningful results calls for tools that
address the challenges associated with big data volume, velocity and variety. One of the key hurdles is the inability to
methodically remove expected or uninteresting elements from large data sets. This difficulty often wastes valuable researcher
and computational time by expending resources on uninteresting parts of the data. Social sensors, that is, sensors which produce data
based on human activity, such as Wikipedia, Twitter, and Facebook, have an underlying structure which can be thought of as
following a power law distribution. Such a distribution implies that a few nodes generate large amounts of data. In this article, we
propose a technique to take an arbitrary dataset and compute a power law distributed background model that bases its
parameters on observed statistics. This model can be used to determine whether a power law is a suitable description or to automatically
identify high-degree nodes for filtering, and it can be scaled to work with big data.
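A minimal sketch of fitting a power law background to observed degrees and flagging high-degree nodes might look as follows. It is not the model proposed in the abstract: the exponent is estimated with a standard approximate maximum-likelihood formula for discrete power laws, and the function names (`fit_power_law_exponent`, `background_counts`, `high_degree_nodes`), the toy degree data, and the fixed `cutoff` are all assumptions for illustration.

```python
import math


def fit_power_law_exponent(degrees, d_min=1):
    """Approximate discrete MLE of alpha for p(d) ~ d**(-alpha), d >= d_min."""
    tail = [d for d in degrees if d >= d_min]
    log_sum = sum(math.log(d / (d_min - 0.5)) for d in tail)
    return 1.0 + len(tail) / log_sum


def background_counts(alpha, d_max, total):
    """Expected number of nodes at each degree 1..d_max under the fitted model."""
    weights = [d ** (-alpha) for d in range(1, d_max + 1)]
    norm = sum(weights)
    return {d: total * w / norm for d, w in zip(range(1, d_max + 1), weights)}


def high_degree_nodes(degree_by_node, cutoff):
    """Nodes whose degree is at or above a chosen cutoff (hypothetical filter rule)."""
    return sorted(node for node, d in degree_by_node.items() if d >= cutoff)


# Toy observation: many low-degree nodes and a single hub.
degrees = [1] * 20 + [2] * 5 + [4] * 2 + [30]
degree_by_node = {f"n{i}": d for i, d in enumerate(degrees)}

alpha = fit_power_law_exponent(degrees)
model = background_counts(alpha, d_max=max(degrees), total=len(degrees))
hubs = high_degree_nodes(degree_by_node, cutoff=10)
print(round(alpha, 2), hubs)
```

Comparing the observed degree histogram against `model` gives a rough indication of whether a power law is a reasonable description; nodes in `hubs` are candidates for filtering.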
Invited Talk: Photonically-Optimized Graph Processors
Dr. Jag Shah (Senior Scientist - IDA)
Wednesday, September 16