2015 IEEE High Performance
Extreme Computing Conference
(HPEC ’15)
Nineteenth Annual HPEC Conference
15 - 17 September 2015
Westin Hotel, Waltham, MA USA
Resilient/Secure/Parallel Computing 1
10:20-12:00 in Eden Vale A1 - A2
Chair: Franz Franchetti / CMU
Invited Talk
Internet-of-Things Security and Forensics - A New Frontier of Risk and Reward
Prof. Tony Skjellum, Director - Auburn University Cyber Research Center
Enabling Application Resilience through Programming Model based Fault Amelioration
Saurabh Hukerikar, Pedro C. Diniz, Robert F. Lucas, University of Southern California
High Performance Computing applications running on future exascale-class systems will encounter accelerated rates of faults
and errors. Therefore, resilience is a key challenge for HPC applications that will run on these large-scale systems. The most
widely used resiliency approach today, based on Checkpoint and Rollback (C/R) recovery, is not expected to remain viable in
the presence of frequent errors and failures. In this paper, we present a framework for enabling application recovery from error
states through fault amelioration. This is accomplished through programming model extensions which enable algorithmic fault
amelioration knowledge to be expressed as intrinsic features of the programming environment. Our approach is based on a set
of language extensions which are supported by a compiler infrastructure and a runtime system. We experimentally
demonstrate that the framework enables recovery from errors in the program state with low overhead to the application
performance.
Secure Architecture for Embedded Systems
Michael Vai, Ben Nahill, Josh Kramer, Michael Geis, Dan Utin, David Whelihan, Roger Khazan, MIT Lincoln Laboratory
Devices connected to the internet are increasingly the targets of deliberate and sophisticated attacks [1]. Embedded system
engineers tend to focus on well-defined functional capabilities rather than “obscure” security and resilience. However, “after-the-
fact” system hardening could be prohibitively expensive or even impossible. The co-design of security and resilience with
functionality has to overcome a major challenge: rarely can the security and resilience requirements be accurately identified
when the design begins. This paper describes an embedded system architecture that decouples secure and functional design
aspects.
DDR Memory Errors caused by Row Hammer
Barbara Aichinger, FuturePlus Systems Corporation
DDR3 memory is at the heart of almost all cloud computing servers today. A recently publicized failure mechanism in DDR3
memory, dubbed Row Hammer, has been shown to be not only a reliability issue but also a security risk. No industry standards
group, government agency or trade association has signed up to address this issue. Data centers and end users are on their
own. This paper briefly discusses the problem, mitigation strategies and a unique testing tool to determine which applications
have the potential to create these types of failures. A live demonstration can be shown in the demo area.
Invited Talk: Kickstarting Parallel Computing for the Masses
Mr. Andreas Olofsson, CEO Adapteva
Thursday, September 17