2015 IEEE High Performance
Extreme Computing Conference
(HPEC '15)
Nineteenth Annual HPEC Conference
15 - 17 September 2015
Westin Hotel, Waltham, MA USA
Resilient/Secure/Parallel Computing 3
3:00-4:40 in Eden Vale A1 - A2
Chair: Patrick Dreher / MIT
Automatic Cluster Parallelization and Minimizing
Communication via Selective Data Replication
Sanket Tavarageri, Benoit Meister, Muthu Baskaran, Benoit
Pradelle, Tom Henretty, Athanasios Konstantinidis, Ann
Johnson, Richard Lethin
Reservoir Labs
Technology scaling has initiated two distinct trends that are likely to continue into the future: first, increased parallelism in hardware, and second, the increasing performance and energy cost of communication relative to computation. Both trends call for the development of compiler and runtime systems that automatically parallelize programs and reduce communication in parallel computations, so as to achieve high performance in an energy-efficient fashion. The tasks of parallelization and orchestrating efficient data movement are more complicated in the context of clusters because of the lack of shared memory. In this paper, we propose the design of an integrated compiler and runtime system that auto-parallelizes loop nests for clusters, together with a novel communication avoidance method that reduces data movement between processors. Communication minimization is achieved via data replication: data is replicated so that a larger share of the whole data set may be mapped to each processor and hence non-local memory accesses are reduced. The runtime performs data replication in a resource-aware, application-characteristics-aware fashion and maintains data coherence. Experiments on a number of benchmarks show the effectiveness of the approach.
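To make the replication trade-off concrete, the following minimal C/MPI sketch (illustrative only; the function and variable names are ours, not output of Reservoir's compiler) shows the classic pattern: a distributed matrix-vector product in which the read-only vector x is replicated on every rank, so the compute loop performs no remote accesses, at the cost of one broadcast and extra memory per processor.

    /* Hypothetical sketch of communication avoidance via data replication:
     * distributed mat-vec y = A*x. Rows of A are block-distributed; the
     * read-only vector x is replicated on every rank, so the compute loop
     * makes no remote accesses. Illustrative only. */
    #include <mpi.h>
    #include <stddef.h>

    void matvec_replicated(const double *A_local, /* my_rows x n block */
                           double *x,             /* full n-vector, replicated */
                           double *y_local,       /* my_rows results */
                           int my_rows, int n, MPI_Comm comm)
    {
        /* One-time coherence step: the root broadcasts the read-only
         * operand, after which every rank owns a full copy of x. */
        MPI_Bcast(x, n, MPI_DOUBLE, 0, comm);

        /* Compute phase: purely local -- no per-element remote fetches,
         * trading replicated memory for eliminated communication. */
        for (int i = 0; i < my_rows; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++)
                sum += A_local[(size_t)i * n + j] * x[j];
            y_local[i] = sum;
        }
    }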
Enabling On-Demand Database Computing with
MIT SuperCloud Database Management System
Andrew Prout, Jeremy Kepner, Peter Michaleas, William
Arcand, David Bestor, Bill Bergeron, Chansup Byun, Lauren
Edwards, Vijay Gadepally, Matthew Hubbell, Julie Mullen,
Antonio Rosa, Charles Yee, Albert Reuther, MIT Lincoln
Laboratory
The MIT SuperCloud database management
system allows for rapid creation and flexible
execution of a variety of the latest scientific
databases, including Apache Accumulo and SciDB.
It is designed to permit these databases to run on
a High Performance Computing Cluster (HPCC)
platform as seamlessly as any other HPCC job. It manages the migration of the databases to the resources assigned by the HPCC scheduler and the centralized storage of the database files when they are not running. It also permits snapshotting of databases to allow researchers to experiment and push the limits of the technology without concern for data or productivity loss if the database becomes unstable.
FIDES: Enhancing Trust in Reconfigurable Based Hardware Systems
Devu Manikantan Shila, Vivek Venugopalan, United Technologies Research Center; Cameron D. Patterson, Virginia Tech
Extensive use of third-party IP cores (e.g., HDL, netlists) and open-source tools in the FPGA application design and development process, in conjunction with inadequate bitstream protection measures, has raised serious security concerns for reconfigurable hardware systems. The design of high-fidelity and secure methodologies for FPGAs is still in its infancy, and in particular there are almost no concrete methods or techniques that can ensure trust in FPGA applications not entirely designed and/or developed in a trusted environment. This work strongly suggests the need for an anomaly detection capability within FPGAs that can continuously monitor the behavior of the underlying FPGA IP cores, and the communication activities of IP cores with other IP cores or peripherals, for any abnormalities. To address this need, we propose the FIDelity Enhancing Security (FIDES) methodology for FPGAs, which uses a combination of access control policies and behavior learning techniques for anomaly detection. FIDES comprises two components: (i) Trusted Wrappers, a layer of monitors with sensing capabilities distributed across the FPGA fabric; these wrappers embed the output of each IP core i with a tag τ_i according to a pre-defined security policy Π, and verify the embeddings of each input to an IP core to detect any policy violations. Tagging and tracking enable us to capture the generalized interactions of each IP core with its environment (e.g., other IP cores, memory, the OS, or I/O ports). The Trusted Wrappers also monitor statistical properties exhibited by each IP core during execution, such as power consumption, number of clock cycles, and timing variations, to detect anomalous operations; (ii) a Trusted Anchor, which monitors the communication between the IP cores and the peripherals with regard to the centralized security policies Ψ and the statistical properties produced by the peripherals. We target the FIDES architecture to a Xilinx Zynq 7020 device for a red-black system comprising sensitive and non-sensitive IP cores. Our FIDES implementation incurs only 1-2% overhead in logic resources and latency per wrapper. Furthermore, we observe a 1.5X latency increase, measured in clock cycles, compared to the baseline implementation when all communications are routed to the Trusted Anchor for centralized policy checking and verification; this clearly demonstrates the advantage of using distributed wrappers within the system rather than centralized policy checking.
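As a software model of the tagging scheme (illustrative only; the paper implements the wrappers in FPGA fabric, and all names below are hypothetical), a Trusted Wrapper can be viewed as embedding a producer tag on every output word and checking each consumer's inputs against a policy table:

    /* Hypothetical software model of a FIDES-style Trusted Wrapper: each
     * IP core's output word carries a tag identifying its producer, and a
     * policy table records which producer tags each consumer may accept.
     * Illustrative sketch only, not the paper's hardware design. */
    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_CORES 4

    typedef struct {
        uint32_t payload;
        uint8_t  tag;      /* producer core id, embedded by its wrapper */
    } tagged_word_t;

    /* Pre-defined security policy: policy[consumer][producer] == true
     * means data tagged by `producer` may flow into `consumer`. */
    static const bool policy[NUM_CORES][NUM_CORES] = {
        [1][0] = true,     /* core 0 may feed core 1 */
        [3][1] = true,     /* core 1 may feed core 3 */
        /* all other flows, e.g. a sensitive (red) core into a
         * non-sensitive (black) core, default to false */
    };

    /* Producer side: the wrapper embeds the core's tag in its output. */
    tagged_word_t wrap_output(uint8_t core_id, uint32_t payload) {
        return (tagged_word_t){ .payload = payload, .tag = core_id };
    }

    /* Consumer side: the wrapper verifies the embedded tag against the
     * policy before delivering data; a violation is flagged as an anomaly. */
    bool check_input(uint8_t consumer_id, tagged_word_t in, uint32_t *out) {
        if (!policy[consumer_id][in.tag])
            return false;   /* policy violation: raise an alert */
        *out = in.payload;
        return true;
    }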
Effective Parallelization Strategies for Scalable,
High Performance Radio Frequency Ray Tracing
Christiaan Gribble*, SURVICE Engineering Company,
USA; Jefferson Amstutz, SURVICE Engineering
Company, USA
We present StingRay, an interactive environment
for combined RF simulation and visualization
based on ray tracing. StingRay is explicitly
designed to support scalable, high performance
simulation and visualization of RF energy
propagation in complex urban environments using
modern, highly parallel computer architectures. We
explore three strategies for exploiting parallelism in
StingRay and provide evaluations of their
scalability and performance on a modern
workstation-class system. Results show that a more scalable, higher-performing version of StingRay is possible with careful attention to the expression of task-level parallelism in OpenMP.
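As a minimal sketch of the kind of task-level parallelism the abstract points to (illustrative only; this is not StingRay source code, and trace_ray is a hypothetical per-pixel kernel), an OpenMP renderer can split the image into tiles and spawn each tile as a task, letting idle threads pick up work from expensive regions:

    /* Hypothetical sketch of task-level parallelism in OpenMP for ray
     * tracing: the image is split into tiles, each traced as a task so
     * that threads load-balance across regions of uneven cost. */
    #include <omp.h>

    #define WIDTH  1024
    #define HEIGHT 1024
    #define TILE   64

    extern float trace_ray(int x, int y);  /* assumed per-pixel RF trace */

    void render(float image[HEIGHT][WIDTH])
    {
        #pragma omp parallel
        #pragma omp single          /* one thread spawns all tile tasks */
        for (int ty = 0; ty < HEIGHT; ty += TILE)
            for (int tx = 0; tx < WIDTH; tx += TILE) {
                #pragma omp task firstprivate(tx, ty)
                for (int y = ty; y < ty + TILE; y++)
                    for (int x = tx; x < tx + TILE; x++)
                        image[y][x] = trace_ray(x, y);
            }
        /* tasks complete at the implicit barrier ending the region */
    }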
Thursday, September 17