2019 IEEE High Performance Extreme Computing Conference (HPEC ’19)
Twenty-third Annual HPEC Conference
24-26 September 2019
Westin Hotel, Waltham, MA USA
Wednesday, September 25, 2019

BRAIDS: Boosting Resilience through Artificial Intelligence and Decision Support 1
1:00-2:40 in Eden Vale A1
Chairs: Alexia Schulz / MIT-LL, Pierre Trepagnier / MIT-LL, Igor Linkov / ACE, Matthew Bates / ACE

Proactive Cyber Situation Awareness via High Performance Computing
Allan Wollaber, Jaime Peña, Benjamin Blease, Leslie Shing, Kenneth Alperin, Serge Vilvovsky, Pierre Trepagnier (MIT-LL), Neal Wagner (STR), Leslie Leonard (U.S. Army ERDC)
Cyber situation awareness technologies have largely focused on present-state conditions, with limited ability to forward-project nominal conditions in a contested environment. We demonstrate an approach that uses data-driven, high performance computing (HPC) simulations of attacker/defender activities in a logically connected network environment to enable this capability for interactive, operational decision making in real time. Our contributions are threefold: (1) we link live cyber data to inform the parameters of a cybersecurity model, (2) we perform HPC simulations and optimizations with a genetic algorithm to evaluate and recommend risk remediation strategies that inhibit attacker lateral movement, and (3) we provide a prototype platform that allows cyber defenders to assess the value of their own alternative risk reduction strategies on a relevant timeline. We present an overview of the data and software architectures, together with results that demonstrate operational utility with HPC-enabled runtimes.

Hypersparse Neural Network Analysis of Large-Scale Internet Traffic
Jeremy Kepner (MIT LLSC), Kenjiro Cho (Internet Initiative Japan), KC Claffy (UCSD), Vijay Gadepally (MIT LLSC), Peter Michaleas (MIT LLSC), Lauren Milechin (MIT EAPS)
The Internet is transforming our society, necessitating a quantitative understanding of Internet traffic. Our team collects and curates the largest publicly available Internet traffic dataset, containing 50 billion packets.
A novel hypersparse neural network analysis of “video” streams of this traffic, using 10,000 processors in the MIT SuperCloud, reveals a new phenomenon: the importance of otherwise unseen leaf nodes and isolated links in Internet traffic. Our neural network approach further shows that a two-parameter modified Zipf-Mandelbrot distribution accurately describes a wide variety of source/destination statistics on moving sample windows ranging from 100,000 to 100,000,000 packets, over collections that span years and continents. The inferred model parameters distinguish different network streams, and the model leaf parameter strongly correlates with the fraction of the traffic in different underlying network topologies. The hypersparse neural network pipeline is highly adaptable: different network statistics and training models can be incorporated with simple changes to the image filter functions.

Hardware IP Classification through Weighted Characteristics
Brendan McGeehan (University of Arkansas); Flora Smith (University of Arkansas); Thao Le (University of Arkansas); Hunter Nauman (University of Arkansas); Jia Di (University of Arkansas)*
Today’s business model for hardware designs frequently incorporates third-party Intellectual Property (IP), mainly for economic reasons. However, allowing third-party involvement also increases the possibility of malicious attacks such as hardware Trojan insertion, a particularly dangerous security threat because functional testing often leaves the Trojan undetected. This research improves on a Trojan detection method and tool known as Structural Checking, which analyzes Register-Transfer Level (RTL) soft IPs. Given an unknown IP, the tool breaks down the design and labels ports and signals with assets. Analyzing the asset patterns reveals how the IP is structured and provides information about its overall functionality.
The tool incorporates a library of known designs referred to as the Golden Reference Library (GRL). All entries in the library, grouped into known-clean and known-infested, are analyzed in the same manner. A weighted percent match against the unknown IP is calculated for each library entry, and a report is generated detailing all mismatched locations that users need to examine more closely. Due to the structural variability of soft IP designs, it is vital to choose weights that match the unknown IP to the most similar library entry. This paper provides a statistical approach to finding the weights that optimize the tool’s matching algorithm.

Cyber Baselining: Statistical properties of cyber time series and the search for stability
Alexia Schulz (MIT Lincoln Laboratory)*; Pierre Trepagnier (MIT Lincoln Laboratory); Allan Wollaber (MIT Lincoln Laboratory); Ethan Aubin (MIT Lincoln Laboratory)
Many predictive cyber analytics assume, implicitly or explicitly, that the underlying statistical processes they treat have simple properties. Statistics predicated on Wiener processes are often used, but even when they are not, assumptions of statistical stationarity, ergodicity, and memorylessness are often present. We present empirical observations of several common network time series and demonstrate that these assumptions are false: the series are non-stationary, non-ergodic, and possess complicated correlation structures. We compute several statistical tests, borrowed from other disciplines, for evaluating network time series. We discuss the implications of these results for the larger goal of constructing a meaningful cyber baseline of a network or host, intended to establish the bounds of “normal” behavior. For many common network observables used in defensive cyber operations, it may prove unrealistic to establish such a baseline or to detect significant deviations from it.
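The abstract does not name the specific tests used. As a rough illustration of the kind of property at stake, here is a minimal Python sketch of a crude windowed-moments heuristic for weak stationarity; it is not one of the paper's tests, and the function names `window_stats` and `mean_drift` are hypothetical. It splits a series into non-overlapping windows and measures how far the window means wander relative to the typical within-window spread.

```python
import itertools
import math
import random
import statistics


def window_stats(series, n_windows):
    """Split `series` into `n_windows` equal, non-overlapping windows
    (dropping any remainder) and return (mean, sample variance) per window."""
    size = len(series) // n_windows
    if size < 2:
        raise ValueError("need at least 2 samples per window")
    result = []
    for k in range(n_windows):
        window = series[k * size:(k + 1) * size]
        result.append((statistics.fmean(window), statistics.variance(window)))
    return result


def mean_drift(series, n_windows=10):
    """Hypothetical heuristic: range of the window means divided by the
    average within-window standard deviation. Small values are consistent
    with a stable mean; large values mean the level drifts over time."""
    stats = window_stats(series, n_windows)
    means = [m for m, _ in stats]
    avg_sd = statistics.fmean(math.sqrt(v) for _, v in stats)
    if avg_sd == 0.0:
        # Constant windows: there is no spread to normalize by.
        return 0.0 if max(means) == min(means) else math.inf
    return (max(means) - min(means)) / avg_sd


# Example: i.i.d. Gaussian noise (stationary) versus its cumulative sum,
# a random walk (non-stationary). Seeded so the run is reproducible.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
walk = list(itertools.accumulate(noise))
print(f"noise: {mean_drift(noise):.2f}, walk: {mean_drift(walk):.2f}")
```

The white-noise score stays small, while the random walk built from the same samples scores far higher because its level wanders. A careful analysis would use established tests (for example, the augmented Dickey-Fuller or KPSS tests); this heuristic only flags obvious drift in the mean.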
Combining Tensor Decompositions and Graph Analytics to Provide Cyber Situational Awareness at HPC Scale
James Ezick, Tom Henretty, Muthu Baskaran, Richard Lethin (Reservoir Labs), John Feo (PNNL), Tai-Ching Tuan, Christopher Coley (Univ. Maryland), Leslie Leonard, Rajeev Agrawal, William Glodek, Ben Parsons (U.S. Army ERDC)
This paper describes MADHAT (Multidimensional Anomaly Detection fusing HPC, Analytics, and Tensors), an integrated workflow that demonstrates the applicability of HPC resources to the problem of maintaining cyber situational awareness. MADHAT combines two high-performance packages: ENSIGN for large-scale sparse tensor decompositions and HAGGLE for graph analytics. Tensor decompositions isolate coherent patterns of network behavior in ways that common clustering methods based on distance metrics cannot. Parallelized graph analysis then uses directed queries on a representation that combines the elements of identified patterns with other available information (such as additional log fields, domain knowledge, network topology, whitelists and blacklists, prior feedback, and published alerts) to confirm or reject a threat hypothesis, collect context, and raise alerts. MADHAT was developed using the collaborative HPC Architecture for Cyber Situational Awareness (HACSAW) research environment and evaluated on structured network sensor logs collected from Defense Research and Engineering Network (DREN) sites using HPC resources at the U.S. Army Engineer Research and Development Center DoD Supercomputing Resource Center (ERDC DSRC). To date, MADHAT has analyzed logs with over 650 million entries.

[Best Student Paper Finalist] A Survey of Attacks and Defenses of Edge-Deployed Neural Networks
Mihailo Isakov (Boston Univ.), Vijay Gadepally (MIT-LL), Karen M. Gettings (MIT-LL), Michel A. Kinsy (Boston Univ.)
Deep Neural Network (DNN) workloads are quickly moving from datacenters onto edge devices for latency, privacy, or energy reasons.
While datacenter networks can be protected using conventional cybersecurity measures, edge neural networks bring a host of new security challenges. Unlike classic IoT applications, edge neural networks are typically very compute- and memory-intensive, their execution is data-independent, and they are robust to noise and faults. Neural network models may be very expensive to develop and can potentially reveal information about the private data they were trained on, requiring special care in distribution. The hidden states and outputs of the network can also be used to reconstruct user inputs, potentially violating users' privacy. Furthermore, neural networks are vulnerable to adversarial attacks, which may cause misclassifications and violate the integrity of the output. These properties add challenges to securing edge-deployed DNNs, requiring new considerations, threat models, priorities, and approaches for deploying DNNs to the edge securely and privately. In this work, we cover the landscape of attacks on, and defenses of, neural networks deployed in edge devices and provide a taxonomy of attacks and defenses targeting edge DNNs.
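The survey notes that adversarial attacks can cause misclassifications. To make that concrete, the sketch below applies the fast gradient sign method (FGSM), one well-known adversarial attack, to a toy logistic-regression classifier. The weights, input, and step size `eps` are hypothetical values chosen for illustration, and the example is not taken from the survey.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """P(y = 1 | x) under a logistic-regression model with weights w, bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Fast gradient sign method: shift every feature by eps in the direction
    that increases the cross-entropy loss for the true label y (0 or 1).
    For logistic regression the input gradient is (p - y) * w."""
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical "trained" model and a correctly classified input (p ~ 0.75).
w, b = [1.0, -2.0, 0.5], 0.0
x, y = [0.5, -0.2, 0.4], 1
x_adv = fgsm(w, b, x, y, eps=0.4)
print(predict_proba(w, b, x) > 0.5, predict_proba(w, b, x_adv) > 0.5)  # True False
```

Each feature moves by at most `eps`, yet the logit drops by `eps` times the L1 norm of `w` (0.4 × 3.5 = 1.4, from 1.1 to -0.3), which is enough to flip the prediction. For a deep network the input gradient would come from backpropagation rather than this closed form.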