2015 IEEE High Performance Extreme Computing Conference (HPEC '15) Nineteenth Annual HPEC Conference 15 - 17 September 2015 Westin Hotel, Waltham, MA USA
Resilient/Secure/Parallel Computing 3
3:00-4:40 in Eden Vale A1 - A2
Chair: Patrick Dreher / MIT

Automatic Cluster Parallelization and Minimizing Communication via Selective Data Replication
Sanket Tavarageri, Benoit Meister, Muthu Baskaran, Benoit Pradelle, Tom Henretty, Athanasios Konstantinidis, Ann Johnson, Richard Lethin, Reservoir Labs

Technology scaling has initiated two distinct trends that are likely to continue into the future: first, increased parallelism in hardware, and second, the rising performance and energy cost of communication relative to computation. Both trends call for compiler and runtime systems that automatically parallelize programs and reduce communication in parallel computations, achieving high performance in an energy-efficient fashion. Parallelization and the orchestration of efficient data movement are more complicated in the context of clusters because of the lack of shared memory. In this paper, we propose the design of an integrated compiler and runtime system that auto-parallelizes loop nests for clusters, together with a novel communication avoidance method that reduces data movement between processors. Communication is minimized via data replication: data is replicated so that a larger share of the whole data set may be mapped to each processor, reducing non-local memory accesses. The runtime performs data replication in a resource-aware, application-characteristics-aware fashion and maintains data coherence. Experiments on a number of benchmarks show the effectiveness of the approach.
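The abstract above does not give the replication algorithm; as a conceptual sketch only (the function name, greedy strategy, and block model are invented for illustration, not Reservoir Labs' actual method), selective replication can be pictured as each processor copying its most heavily remote-accessed data blocks into local memory under a budget:

```python
# Conceptual sketch: greedily replicate the blocks a processor accesses
# remotely most often, subject to a local memory budget, so that those
# accesses become local. Not the paper's actual algorithm.

def plan_replication(remote_accesses, block_size, budget):
    """remote_accesses: {block_id: remote-access count for this processor}.
    Returns (blocks to replicate locally, remote accesses that remain)."""
    replicated, used = set(), 0
    # Hottest blocks first: highest remote-access count.
    for block, count in sorted(remote_accesses.items(),
                               key=lambda kv: kv[1], reverse=True):
        if used + block_size <= budget:
            replicated.add(block)
            used += block_size
    remaining = sum(c for b, c in remote_accesses.items()
                    if b not in replicated)
    return replicated, remaining

# Example: four remotely-held blocks of 1 unit each, budget of 2 units.
accesses = {"A": 900, "B": 500, "C": 40, "D": 10}
rep, remaining = plan_replication(accesses, block_size=1, budget=2)
print(sorted(rep), remaining)  # ['A', 'B'] 50
```

A real runtime would additionally weigh the coherence traffic that replicated writable data incurs, which the abstract notes the system manages.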
Enabling On-Demand Database Computing with MIT SuperCloud Database Management System
Andrew Prout, Jeremy Kepner, Peter Michaleas, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Lauren Edwards, Vijay Gadepally, Matthew Hubbell, Julie Mullen, Antonio Rosa, Charles Yee, Albert Reuther, MIT Lincoln Laboratory

The MIT SuperCloud database management system allows rapid creation and flexible execution of a variety of the latest scientific databases, including Apache Accumulo and SciDB. It is designed to permit these databases to run on a High Performance Computing Cluster (HPCC) platform as seamlessly as any other HPCC job. It ensures seamless migration of the databases to the resources assigned by the HPCC scheduler and centralized storage of the database files when they are not running. It also permits snapshotting of databases, allowing researchers to experiment and push the limits of the technology without concern for data or productivity loss if a database becomes unstable.

FIDES: Enhancing Trust in Reconfigurable-Based Hardware Systems

Extensive use of third-party IP cores (e.g., HDL, netlist) and open-source tools in the FPGA application design and development process, in conjunction with inadequate bitstream protection measures, has raised crucial security concerns for reconfigurable hardware systems. The design of high-fidelity, secure methodologies for FPGAs is still in its infancy; in particular, there are almost no concrete methods or techniques that can ensure trust in FPGA applications not entirely designed and/or developed in a trusted environment.
This work strongly suggests the need for an anomaly detection capability within FPGAs that can continuously monitor the behavior of the underlying FPGA IP cores and the communication activities of IP cores with other IP cores or peripherals for any abnormalities. To address this need, we propose FIDelity Enhancing Security (FIDES), a methodology for FPGAs that uses a combination of access control policies and behavior-learning techniques for anomaly detection. FIDES comprises two components: (i) Trusted Wrappers, a layer of monitors with sensing capabilities distributed across the FPGA fabric; these wrappers embed the output of each IP core i with a tag τ_i according to the predefined security policy Π, and also verify the tags on each input to the IP core to detect any policy violation. Tagging and tracking enable us to capture the generalized interactions of each IP core with its environment (e.g., other IP cores, memory, OS, or I/O ports). Trusted Wrappers also monitor the statistical properties exhibited by each IP core during execution, such as power consumption, number of clock cycles, and timing variations, to detect any anomalous operations; (ii) a Trusted Anchor that monitors the communication between the IP cores and the peripherals with regard to the centralized security policies Ψ and the statistical properties produced by the peripherals. We target the FIDES architecture on a Xilinx Zynq 7020 device for a red-black system comprising sensitive and non-sensitive IP cores. Our FIDES implementation incurs only 1-2% overhead in terms of logic resources and latency per wrapper.
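The tag-and-verify scheme of the Trusted Wrappers can be illustrated with a small software model (the names `POLICY`, `TAGS`, and `wrapper_forward` are invented for this sketch; the real wrappers are hardware monitors on the FPGA fabric, not Python):

```python
# Illustrative software model of FIDES-style tag checking in a red-black
# system: every core's output carries a tag, and a wrapper verifies each
# delivery against the destination core's allowed tags (policy Π).

# Policy Π: which source tags each IP core may consume.
POLICY = {
    "crypto_core": {"red"},    # sensitive core accepts red data only
    "io_core":     {"black"},  # I/O core accepts black (non-sensitive) data
}
TAGS = {"crypto_core": "red", "io_core": "black"}  # tag on each core's output

def wrapper_forward(src, dst, payload):
    """Model of a Trusted Wrapper: tag the source's output, then verify
    the tag against the destination's policy before delivering it."""
    tag = TAGS[src]
    if tag not in POLICY[dst]:
        return ("violation", src, dst, tag)  # anomaly reported
    return ("ok", payload, tag)

print(wrapper_forward("crypto_core", "crypto_core", b"key")[0])  # ok
print(wrapper_forward("crypto_core", "io_core", b"key")[0])      # violation
```

Distributing this check across per-core wrappers, rather than routing every transfer to one central checker, is what the latency comparison in the abstract quantifies.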
Furthermore, we observe a 1.5X latency, measured in clock cycles, compared to the baseline implementation when all communications are routed to the Trusted Anchor for centralized policy checking and verification; this clearly demonstrates the advantage of using distributed wrappers within the system as opposed to centralized policy checking.

Effective Parallelization Strategies for Scalable, High Performance Radio Frequency Ray Tracing
Christiaan Gribble*, SURVICE Engineering Company, USA; Jefferson Amstutz, SURVICE Engineering Company, USA

We present StingRay, an interactive environment for combined RF simulation and visualization based on ray tracing. StingRay is explicitly designed to support scalable, high-performance simulation and visualization of RF energy propagation in complex urban environments using modern, highly parallel computer architectures. We explore three strategies for exploiting parallelism in StingRay and evaluate their scalability and performance on a modern workstation-class system. Results show that a more scalable, higher-performing version of StingRay is possible with careful attention to the expression of task-level parallelism in OpenMP.
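Task-level parallelism works for ray tracing because each ray (or ray bundle) is an independent unit of work. StingRay expresses this with OpenMP tasks; as a language-neutral sketch only, the same idea can be shown with a Python thread pool standing in for the OpenMP runtime (the `trace_ray` function and its loss figure are placeholders, not StingRay's propagation model):

```python
# Sketch of task-level ray parallelism: each ray is an independent task
# handed to a worker pool, analogous to OpenMP tasks in StingRay itself.
from concurrent.futures import ThreadPoolExecutor

def trace_ray(ray_id):
    """Stand-in for tracing one RF propagation path; returns a
    (ray_id, path_loss) pair. The loss value is a dummy placeholder."""
    path_loss_db = 20 + ray_id  # placeholder, not a physical model
    return ray_id, path_loss_db

rays = range(8)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(trace_ray, rays))  # one task per ray

print(results[0], results[7])  # 20 27
```

Because rays share no mutable state, no synchronization is needed inside the loop; how the tasks are grouped and scheduled is exactly the kind of choice the three strategies in the paper compare.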
Thursday, September 17