2018 IEEE High Performance Extreme Computing Conference (HPEC ‘18) Twenty-second Annual HPEC Conference 25 - 27 September 2018 Westin Hotel, Waltham, MA USA
Thursday, September 27, 2018
High Performance Data Analysis 2 1:00-2:40 in Eden Vale C1/C2 Chair: Sid Samsi / MIT
Database Operations in D4M.jl
Lauren Milechin (MIT EAPS)*; Vijay Gadepally (MIT Lincoln Laboratory); Jeremy Kepner (MIT Lincoln Laboratory)
Each step in the data analytics pipeline is important, including database ingest and query. The D4M-Accumulo database connector has allowed analysts to quickly and easily ingest into and query from Apache Accumulo using MATLAB®/GNU Octave syntax. D4M.jl, a Julia implementation of D4M, provides much of the functionality of the original D4M implementation to the Julia community. In this work, we extend D4M.jl to include many of the same database capabilities that the MATLAB®/GNU Octave implementation provides. Here we describe the D4M.jl database connector, demonstrate how it can be used, and show that its performance is comparable to or better than that of the original MATLAB®/GNU Octave implementation.

High-Performance Embedded Computing (HPEC) and Machine Learning Demonstrated in Flight Using Agile Condor®
Mark Barnell (Air Force Research Laboratory)*
For the first time, advanced machine learning (ML) compute architectures, techniques, and methods were demonstrated in flight (in June-August 2017 and May 2018) on the recently invented high-performance embedded computing (HPEC) architecture called Agile Condor (U.S. Patent Pending #5497944). The Air Force Research Laboratory (AFRL) Information Directorate Advanced Computing and Communications Division continues to develop and demonstrate new computing architectures designed to provide HPEC ground and airborne (pod-based) solutions that meet operational and tactical real-time processing needs for intelligence, surveillance, and reconnaissance (ISR) missions. Agile Condor is a scalable system based on open industry standards that continues to demonstrate the ability to increase computational capability far beyond the current state of the art within the restrictive size, weight, and power (SWaP) constraints of unmanned aircraft systems’ external “pod” payloads. This system enables the exploration and development of innovative system solutions to meet future Air Force real-time HPEC needs, e.g., multi-mission and multi-function ISR processing and exploitation. The Agile Condor system innovations include: (1) a cost-effective and flexible compute architecture, (2) support for multiple missions, (3) realistic, repeatable experimentation, and (4) operational exploitation of a wide range of information products. During the recent demonstration and data collection efforts, information was processed simultaneously using two distinct ML approaches in parallel, enabling real-time trade-space analyses and the ability to immediately compare and contrast the approaches. The data processing also included the exploitation of data from multiple sensors, such as optical, full-motion video (FMV), and radar. In this way, Agile Condor’s heterogeneous computing architecture continues to accelerate the development of the extreme computing technologies and ML algorithms necessary to exploit data on a neuromorphic compute platform upstream, closer to the sensors. The ML techniques that can be utilized include, but are not limited to, specialized deep neural networks (DNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs) that support sequential/temporal data products and applications for exploitation, pattern recognition, and autonomous operation.
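For readers unfamiliar with the D4M model referenced in the first abstract of this session, the following minimal Python sketch illustrates the associative-array ingest/query pattern that the D4M-Accumulo connector (and now D4M.jl) exposes. The class and function names are hypothetical stand-ins for illustration, not the actual D4M.jl or D4M API, and the backing table is a plain in-memory dict rather than Apache Accumulo.

    from collections import defaultdict

    class Assoc:
        """Toy associative array: a bag of (row, col, val) triples.
        A stand-in for illustration only, not the D4M.jl Assoc type."""
        def __init__(self, triples):
            self.triples = list(triples)

    def put(table, assoc):
        # Ingest: store each triple under its row key, mimicking how the
        # D4M connector writes associative arrays into an Accumulo table.
        for row, col, val in assoc.triples:
            table[row][col] = val

    def query(table, row):
        # Query: return all (col, val) entries for one row key as a new
        # Assoc, mimicking the T(row, :) row-range query in D4M syntax.
        return Assoc((row, col, val) for col, val in table.get(row, {}).items())

    # Usage: "T" stands in for a database table binding; the "col|val"
    # column naming follows the exploded-schema convention used with D4M.
    T = defaultdict(dict)
    put(T, Assoc([("alice", "color|red", 1), ("bob", "color|blue", 1)]))
    print(query(T, "alice").triples)  # [('alice', 'color|red', 1)]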
All-at-once Decomposition of Coupled Billion-scale Tensors in Apache Spark
Aditya Gudibanda (Reservoir Labs)*; Thomas Henretty (Reservoir Labs); Muthu M Baskaran (Reservoir Labs); James Ezick (Reservoir Labs); Richard Lethin (Reservoir Labs)
As the scale of unlabeled data rises, it becomes increasingly valuable to perform scalable, unsupervised data analysis. Tensor decompositions, which have been empirically successful at finding meaningful cross-dimensional patterns in multidimensional data, are a natural candidate to test for scalability and meaningful pattern discovery in these massive real-world datasets. Furthermore, the production of big data of different types necessitates the ability to mine patterns across disparate sources. The coupled tensor decomposition framework captures this idea by supporting the joint decomposition of several tensors from different data sources. We present a scalable implementation of coupled tensor decomposition on Apache Spark. We introduce nonnegativity and sparsity constraints and perform all-at-once quasi-Newton optimization of all factor matrix parameters. We present results showing the billion-scale scalability of this novel implementation and demonstrate the high level of interpretability in the components produced, suggesting that coupled, all-at-once tensor decompositions on Apache Spark are a promising framework for large-scale, unsupervised pattern discovery.

Interactive Launch of 16,000 Microsoft Windows Instances on a Supercomputer
Michael S Jones (MIT Lincoln Laboratory)*; Jeremy Kepner (MIT Lincoln Laboratory)
Simulation, machine learning, and data analysis require a wide range of software, which can be dependent upon specific operating systems such as Microsoft Windows. Running this software interactively on massively parallel supercomputers can present many challenges. Traditional methods of scaling Microsoft Windows applications to run on thousands of processors have typically relied on heavyweight virtual machines that can be inefficient and slow to launch on modern manycore processors. This paper describes a unique approach that uses the Lincoln Laboratory LLMapReduce technology in combination with the Wine Windows compatibility layer to rapidly and simultaneously launch and run Microsoft Windows applications on thousands of cores on a supercomputer. Specifically, this work demonstrates launching 16,000 Microsoft Windows applications in 5 minutes, running on 16,000 processor cores. This capability significantly broadens the range of applications that can be run at large scale on a supercomputer.
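As a concrete illustration of the coupled, all-at-once decomposition described in the Gudibanda et al. abstract above, here is a minimal single-node NumPy sketch. It is not Reservoir Labs' Spark implementation: it substitutes plain projected gradient descent for the paper's quasi-Newton optimizer, enforces nonnegativity by clipping, omits the sparsity constraint, and all names and hyperparameters are illustrative.

    import numpy as np

    def kr(U, V):
        # Khatri-Rao (column-wise Kronecker) product: (I,R) x (J,R) -> (I*J,R).
        return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

    def coupled_cp(X, Y, R, steps=500, lr=1e-3, seed=0):
        # Fit X ~ [[A, B, C]] (CP model) and Y ~ A @ D.T with the first-mode
        # factor A shared between both data sources (the "coupling").
        I, J, K = X.shape
        M = Y.shape[1]
        rng = np.random.default_rng(seed)
        A, B, C, D = (rng.random((n, R)) for n in (I, J, K, M))
        X0 = X.reshape(I, J * K)                     # mode-0 unfolding
        X1 = np.moveaxis(X, 1, 0).reshape(J, I * K)  # mode-1 unfolding
        X2 = np.moveaxis(X, 2, 0).reshape(K, I * J)  # mode-2 unfolding
        for _ in range(steps):
            Rx = X0 - A @ kr(B, C).T                 # tensor residual
            Ry = Y - A @ D.T                         # matrix residual
            # Gradients of ||Rx||^2 + ||Ry||^2; A's gradient sees both terms.
            gA = -2 * Rx @ kr(B, C) - 2 * Ry @ D
            gB = -2 * (X1 - B @ kr(A, C).T) @ kr(A, C)
            gC = -2 * (X2 - C @ kr(A, B).T) @ kr(A, B)
            gD = -2 * Ry.T @ A
            # All-at-once: every factor steps simultaneously from the same
            # iterate; clipping at zero enforces nonnegativity.
            A = np.maximum(A - lr * gA, 0.0)
            B = np.maximum(B - lr * gB, 0.0)
            C = np.maximum(C - lr * gC, 0.0)
            D = np.maximum(D - lr * gD, 0.0)
        return A, B, C, D

    # Usage on a small synthetic problem with a shared ground-truth factor.
    rng = np.random.default_rng(1)
    At, Bt, Ct, Dt = (rng.random((n, 3)) for n in (20, 15, 10, 8))
    X = np.einsum('ir,jr,kr->ijk', At, Bt, Ct)
    Y = At @ Dt.T
    A, B, C, D = coupled_cp(X, Y, R=3)
    err = np.linalg.norm(X - np.einsum('ir,jr,kr->ijk', A, B, C))
    print(err / np.linalg.norm(X))  # relative reconstruction error

The key contrast with the more common alternating least squares approach is that all factor gradients are computed from the same iterate and every factor is updated in one simultaneous step, which is what "all-at-once" optimization refers to in the abstract.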