2015 IEEE High Performance Extreme Computing Conference (HPEC ‘15) Nineteenth Annual HPEC Conference 15 - 17 September 2015 Westin Hotel, Waltham, MA USA
Graphs & Sparse Data 2
3:00-4:40 in Eden Vale C1 - C2
Chair: Michael Wolf / Sandia

Invited Talk: Graph Programming Interface
Dr. José Moreira, IBM Thomas J. Watson Research Center

Improving the Performance of Graph Analysis Through Partitioning with Sampling
Michael M. Wolf, Sandia, Benjamin A. Miller, MIT Lincoln Laboratory
Numerous applications focus on the analysis of entities and the connections between them, and such data are naturally represented as graphs. In particular, the detection of a small subset of vertices with anomalous coordinated connectivity is of broad interest, for problems such as detecting strange traffic in a computer network or unknown communities in a social network. Eigenspace analysis of large-scale graphs is useful for dimensionality reduction of these large, noisy data sets into a more tractable analysis problem. When performing this sort of analysis across many parallel processes, the data partitioning scheme may have a significant impact on the overall running time. Previous work demonstrated that partitioning based on a sampled subset of edges still yields a substantial improvement in running time. In this work, we study this further, exploring how different sampling strategies, graph community structure, and the vertex degree distribution affect the partitioning quality. We show that sampling is an effective technique when partitioning for data analytics problems with community-like structure.

Optimization of Symmetric Tensor Computations
Jonathon Cai, Yale, Muthu Baskaran, Benoît Meister, Richard Lethin, Reservoir Labs
For applications that deal with large amounts of high dimensional multi-aspect data, it is natural to represent such data as tensors or multi-way arrays. A singularly important class of tensors is the symmetric tensor, which shows up in real-world applications such as higher-order statistics, signal processing, and data analysis. Tensor computations, such as tensor decompositions, are used to analyze such data. In this paper, we describe novel optimizations that exploit the symmetry in tensors in order to parallelize and reduce redundant computations and storage in operations involving symmetric tensors. Specifically, we apply our optimizations on matricized tensor times Khatri Rao product (mttkrp) operation, a key operation in tensor decomposition algorithms such as INDSCAL (individual differences in scaling) for partially symmetric tensors. We demonstrate improved performance for both sequential and parallel execution using our techniques on both synthetic and real data sets.

Using a Power Law Distribution to Describe Big Data
Vijay Gadepally, Jeremy Kepner, MIT Lincoln Laboratory
The gap between data production and user ability to access, compute and produce meaningful results calls for tools that address the challenges associated with big data volume, velocity and variety. One of the key hurdles is the inability to methodically remove expected or uninteresting elements from large data sets. This difficulty often wastes valuable researcher and computational time by expending resources on uninteresting parts of data. Social sensors, or sensors which produce data based on human activity, such as Wikipedia, Twitter, and Facebook have an underlying structure which can be thought of as having a Power Law distribution. Such a distribution implies that few nodes generate large amounts of data. In this article, we propose a technique to take an arbitrary dataset and compute a power law distributed background model that bases its parameters on observed statistics. This model can be used to determine the suitability of using a power law or automatically identify high degree nodes for filtering and can be scaled to work with big data.

Invited Talk: Photonically-Optimized Graph Processors
Dr. Jag Shah (Senior Scientist - IDA)
Wednesday September 16