2015 IEEE High Performance Extreme Computing Conference (HPEC ‘15) Nineteenth Annual HPEC Conference 15 - 17 September 2015 Westin Hotel, Waltham, MA USA
Resilient/Secure/Parallel Computing 1
10:20-12:00 in Eden Vale A1 - A2
Chair: Franz Franchetti / CMU

Invited Talk: Internet-of-Things Security and Forensics - A New Frontier of Risk and Reward
Prof. Tony Skjellum, Director - Auburn University Cyber Research Center

Enabling Application Resilience through Programming Model based Fault Amelioration
Saurabh Hukerikar, Pedro C. Diniz, Robert F. Lucas, University of Southern California
High Performance Computing applications running on future exascale-class systems will encounter accelerated rates of faults and errors. Therefore, resilience is a key challenge for HPC applications that will run on these large-scale systems. The most widely used resiliency approach today, based on Checkpoint and Rollback (C/R) recovery, is not expected to remain viable in the presence of frequent errors and failures. In this paper, we present a framework for enabling application recovery from error states through fault amelioration. This is accomplished through programming model extensions which enable algorithmic fault amelioration knowledge to be expressed as intrinsic features of the programming environment. Our approach is based on a set of language extensions which are supported by a compiler infrastructure and a runtime system. We experimentally demonstrate that the framework enables recovery from errors in the program state with low overhead to the application performance.

Secure Architecture for Embedded Systems
Michael Vai, Ben Nahill, Josh Kramer, Michael Geis, Dan Utin, David Whelihan, Roger Khazan, MIT Lincoln Laboratory
Devices connected to the internet are increasingly the targets of deliberate and sophisticated attacks [1]. Embedded system engineers tend to focus on well-defined functional capabilities rather than “obscure” security and resilience. However, “after-the-fact” system hardening could be prohibitively expensive or even impossible. The co-design of security and resilience with functionality has to overcome a major challenge: rarely can the security and resilience requirements be accurately identified when the design begins. This paper describes an embedded system architecture that decouples secure and functional design aspects.

DDR Memory Errors caused by Row Hammer
Barbara Aichinger, FuturePlus Systems Corporation
DDR3 memory is at the heart of almost all cloud computing servers today. A recently publicized failure mechanism in DDR3 memory, coined Row Hammer, has been shown to be not only a reliability issue but also a security risk. No industry standards group, government agency or trade association has signed up to address this issue. Data centers and end users are on their own. This paper briefly discusses the problem, mitigation strategies, and a unique testing tool to determine which applications have the potential to create these types of failures. A live demonstration can be shown in the demo area.

Invited Talk: Kickstarting Parallel Computing for the Masses
Mr. Andreas Olofsson, CEO Adapteva
Thursday, September 17