2019 IEEE High Performance Extreme Computing Conference (HPEC ’19)
Twenty-third Annual HPEC Conference
24 - 26 September 2019
Westin Hotel, Waltham, MA USA
Thursday, September 26, 2019
High Performance Data Analysis 1
10:20-12:00 in Eden Vale C3
Chair: Nikos Pitsianis / Aristotle

Auxiliary Maximum Likelihood Estimation for Noisy Point Cloud Registration
Cole Campton, Xiaobai Sun (Duke)
We first establish a theoretical foundation for the use of the Gromov-Hausdorff (GH) distance for point set registration with homeomorphic deformation maps perturbed by Gaussian noise. We then present a probabilistic, deformable registration framework. At the core of the framework is a highly efficient iterative algorithm with guaranteed convergence to a local minimum of the GH-based objective function. The framework has two other key components: a multi-scale stochastic shape descriptor and a data compression scheme. We also present an experimental comparison between our method and two influential existing methods on non-rigid motion between digital anthropomorphic phantoms extracted from physical data of multiple individuals.

Multi-spectral Reuse Distance: Divining Spatial Information from Temporal Data
Anthony M. Cabrera (Washington Univ. St. Louis, Arm Research), Roger D. Chamberlain (Washington Univ. St. Louis), and Jonathan C. Beard (Arm Research)
The problem of efficiently feeding processing elements and finding ways to reduce data movement is pervasive in computing. Efficient modeling of both temporal and spatial locality of memory references is invaluable in identifying superfluous data movement in a given application. To this end, we present a new way to infer both spatial and temporal locality using reuse distance analysis. This is accomplished by performing reuse distance analysis at different data block granularities: specifically, 64B, 4KiB, and 2MiB. This process of simultaneously observing reuse distance at multiple granularities is called multi-spectral reuse distance. The approach allows a qualitative analysis of spatial locality by observing how mass shifts in an application’s reuse signature across granularities, and the shift of mass is measured empirically by calculating the Earth Mover’s Distance between reuse signatures of an application. From this characterization, it is possible to determine how spatially dense an application’s memory references are, based on how far the mass has shifted (or not) and how close the Earth Mover’s Distance is to zero as the data block granularity increases. It is also possible to determine an appropriate page size and whether a given page is being fully utilized. From the applications profiled, we observe that not all applications benefit from a larger page size. Additionally, larger data block granularities subsuming smaller ones suggest that larger pages allow more spatial locality to be exploited, but examining the memory footprint shows whether those larger pages are actually fully utilized.
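As a concrete illustration of the multi-spectral idea, a minimal Python sketch (not the authors' tooling) might look as follows; it assumes the memory trace is available as a list of integer addresses, and the function names, log2 binning, and example trace are illustrative choices. Addresses are mapped to blocks at each of the three granularities named in the abstract, a reuse signature (histogram of reuse distances) is built per granularity, and signatures are compared with a one-dimensional Earth Mover's Distance.

    # Sketch of multi-spectral reuse distance analysis (illustrative, not the
    # authors' implementation): the same address trace is analyzed at several
    # block granularities and the resulting reuse signatures are compared with
    # a 1-D Earth Mover's Distance.
    import numpy as np

    GRANULARITIES = {"64B": 64, "4KiB": 4096, "2MiB": 2 * 1024 * 1024}

    def reuse_signature(addresses, block_size, num_bins=32):
        """Log2-binned histogram of reuse distances, with addresses mapped to blocks."""
        trace = [a // block_size for a in addresses]
        last_seen = {}                      # block -> index of its previous access
        distances = []
        for i, blk in enumerate(trace):
            if blk in last_seen:
                # naive O(n^2) count of distinct blocks since the last access
                distances.append(len(set(trace[last_seen[blk] + 1:i])))
            last_seen[blk] = i
        if not distances:
            return np.zeros(num_bins)
        bins = np.minimum(np.log2(np.array(distances) + 1).astype(int), num_bins - 1)
        sig = np.bincount(bins, minlength=num_bins).astype(float)
        return sig / sig.sum()

    def emd_1d(sig_a, sig_b):
        """Earth Mover's Distance between two distributions on the same 1-D bins."""
        return float(np.abs(np.cumsum(sig_a - sig_b)).sum())

    # A strided trace: mass shifts toward small reuse distances as the block
    # granularity grows, so the EMD relative to the 64B signature is nonzero.
    trace = [i * 256 for i in range(4096)] * 2
    sigs = {name: reuse_signature(trace, size) for name, size in GRANULARITIES.items()}
    for name in ("4KiB", "2MiB"):
        print(name, "EMD vs 64B:", emd_1d(sigs["64B"], sigs[name]))

A signature whose mass collapses toward small distances at coarser granularities indicates spatially dense references; an EMD near zero across granularities indicates the opposite.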
Many-target, Many-sensor Ship Tracking and Classification
Leonard Kosta (Boston Univ.), John Irvine (Draper), Laura Seaman (Draper), Hongwei Xi (Boston Univ.)
Government agencies such as DARPA wish to know the numbers, locations, tracks, and types of vessels moving through strategically important regions of the ocean. We implement a multiple hypothesis testing algorithm to simultaneously track dozens of ships from longitude and latitude data reported by many sensors, then use a combination of behavioral fingerprinting and deep learning techniques to classify each vessel by type. The number of targets is unknown a priori. We achieve both high track purity and high classification accuracy on several datasets.

Streaming 1.9 Billion Hypersparse Network Updates per Second with D4M
Jeremy Kepner, Vijay Gadepally, Lauren Milechin, Siddharth Samsi, William Arcand, David Bestor, William Bergeron, Chansup Byun, Matthew Hubbell, Michael Houle, Michael Jones, Anne Klein, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Albert Reuther (MIT)
The Dynamic Distributed Dimensional Data Model (D4M) library implements associative arrays in a variety of languages (Python, Julia, and Matlab/Octave) and provides a lightweight in-memory database implementation of hypersparse arrays that is ideal for analyzing many types of network data. D4M relies on associative arrays, which combine properties of spreadsheets, databases, matrices, graphs, and networks while providing rigorous mathematical guarantees, such as linearity. Streaming updates of D4M associative arrays put enormous pressure on the memory hierarchy. This work describes the design and performance optimization of an implementation of hierarchical associative arrays that reduces memory pressure and dramatically increases the update rate into an associative array. The parameters of hierarchical associative arrays control the number of entries held at each level of the hierarchy before an update is cascaded, and they are easily tuned to achieve optimal performance for a variety of applications. A single instance of a hierarchical array achieves over 40,000 updates per second. Scaling to 34,000 instances of hierarchical D4M associative arrays on 1,100 server nodes of the MIT SuperCloud achieved a sustained update rate of 1,900,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets.
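As a concrete illustration of the cascading update scheme described in the abstract, the following minimal Python sketch shows one plausible level-by-level design; it is not the D4M implementation or its API, and the class name, cutoff values, and example keys are hypothetical. Each level holds at most a fixed number of entries, updates always land in the small top level, and an overflowing level is merged into the next, larger one, so most streaming updates touch only a small, cache-friendly structure.

    # Sketch of a hierarchical associative array with cascading updates
    # (illustrative, not the D4M library). Values are summed on merge, which
    # is valid because summation is associative and order-independent.
    from collections import defaultdict

    class HierarchicalAssociativeArray:
        def __init__(self, cutoffs=(1 << 10, 1 << 14, 1 << 18)):
            # cutoffs[i] = maximum number of entries held at level i before cascading
            self.cutoffs = cutoffs
            self.levels = [defaultdict(float) for _ in cutoffs]

        def update(self, key, value):
            """Add value at key; updates always land in the smallest (fastest) level."""
            top = self.levels[0]
            top[key] += value
            if len(top) > self.cutoffs[0]:
                self._cascade(0)

        def _cascade(self, i):
            """Merge an overflowing level into the next, larger level."""
            if i + 1 == len(self.levels):
                return                      # last level simply grows in this sketch
            for key, value in self.levels[i].items():
                self.levels[i + 1][key] += value
            self.levels[i].clear()
            if len(self.levels[i + 1]) > self.cutoffs[i + 1]:
                self._cascade(i + 1)

        def lookup(self, key):
            """Total value for key across all levels."""
            return sum(level.get(key, 0.0) for level in self.levels)

    # Streaming updates (e.g. network events keyed by source/destination pairs)
    # mostly hit the small top level, reducing pressure on the memory hierarchy.
    a = HierarchicalAssociativeArray()
    for i in range(100_000):
        a.update(("src%d" % (i % 500), "dst%d" % (i % 700)), 1.0)
    print(a.lookup(("src0", "dst0")))

Tuning the cutoffs trades the cost of more frequent cascades against the working-set size of the top level, which is the knob the abstract describes as easily adjustable per application.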