2018 IEEE High Performance Extreme Computing Conference (HPEC ’18), Twenty-second Annual HPEC Conference, 25-27 September 2018, Westin Hotel, Waltham, MA USA
Comprehensive EDA Tools Addressing the Intersection of Classical and Quantum Computing
Jamil Kawa (Synopsys, Inc.)*; Antun Domic (Synopsys); Robert Freeman (Synopsys); Kishore Singhal (Synopsys)

The technical challenges and the high economic cost associated with scaling CMOS beyond 5 nm have heated the debate about which technology will lead the post-CMOS era. Will Josephson junction (JJ) based superconducting electronics (SCE), operated at cryogenic temperatures, be the natural candidate? Is SCE embarking on its own Moore's law, promising to reach a level of VLSI realization? How about quantum computing (QC): can QC play a leading role in advancing the computational needs currently fulfilled by the most powerful supercomputers? Is QC capable of efficiently executing algorithms handled by classical computers? While this debate has been taking place, the news from the QC front has been very encouraging. Chips with 49, 50, and 72 qubits have been implemented. Talk of "quantum supremacy" is creating heated discussion and excitement. New QC technologies with longer decoherence times and improved error rates are equally encouraging. However, a reality is emerging that an efficient QC system is one that encompasses tight interaction with CMOS-based classical computers and memory stacks. Cold CMOS will likely have a significant role to play. New high-speed, high-throughput, and energy-efficient communication links will be required to handle these interactions, and optical communication links have a role to play there. Further advancement in this area will require the design community to leverage mature, robust, and tightly integrated Electronic Design Automation (EDA) software tools for the realization of every component of such a system, be it CMOS, JJ-based SCE, or QC, as well as the integrated SoC and system architecture and environment.
We at Synopsys have maintained leadership in EDA tools for CMOS with a set of tools that encompasses the whole flow, from device and process simulation (TCAD) through full system verification. Synopsys EDA tools are stand-alone modules designed to interact seamlessly with each other and with industry standards, culminating in best-in-class point tools that enable a comprehensive flow for system realization. It is through this modularization that some tools in the CMOS flow can be leveraged immediately, with some enhancement, for such a comprehensive flow; others must be created. It is also no secret that Synopsys is involved in a five-year Superconducting Electronics (SCE) project funded by IARPA to develop add-ons to our EDA tools and to develop a full flow for SCE circuit design. This enhancement effort touches practically all of our EDA tools, from TCAD through device SPICE modeling, simulation, extraction, library development, place and route, timing, and verification. As we progress in executing this project, we see many parallels to the kinds of challenges that we, and the EDA and CMOS industry, faced 30 years ago, especially in analog and mixed-signal design. For a future consideration of QC and of links between QC and classical computing, we will look for as much reuse as possible, but we may also find the need to create and insert completely new tools. In this talk I will cover our CMOS EDA tools and flow, relate our experience in developing EDA tools for SCE and cold CMOS, describe our EDA efforts in integrating CMOS and photonics, and discuss what we see as the challenges faced by the EDA and design communities in realizing an automated ecosystem that integrates QC, cold CMOS, CMOS, and photonics.
A Parallel Implementation of FANO using OpenMP and MPI
Plamen Krastev (MIT Lincoln Laboratory)*; Michael Chrisp (MIT Lincoln Laboratory); Albert Reuther (MIT Lincoln Laboratory); Chansup Byun (MIT Lincoln Laboratory)

We present a parallel implementation of the Fast Accurate NURBS Optimization (FANO) program using OpenMP and MPI. The software is used for designing imaging freeform optical systems composed of NURBS surfaces. An important step in the design process is the optimization of the shape and position of the optical surfaces within the optical system. FANO uses the Levenberg-Marquardt (LM) algorithm to minimize the merit function. The parallelization of the code is achieved without modifying readily available commercial or open-source implementations of the LM algorithm. Instead, MPI is used to distribute the computation of the Jacobian over multiple nodes, each of which performs the computationally intensive task of raytracing. The results of the raytracing are collected on the master node and used to calculate the values of the variable parameters for the next iteration. Speedups of ~100x and more are possible when running on the cluster of the MIT Lincoln Laboratory Supercomputing Center (LLSC).

Towards Energy-Proportional Anomaly Detection in the Smart Grid
Spencer Drakontaidis (United States Military Academy); Michael Stanchi (United States Military Academy); Gabriel Glazer (United States Military Academy); Jason Hussey (United States Military Academy); Aaron St. Leger (United States Military Academy); Suzanne J. Matthews (United States Military Academy)*

Phasor Measurement Unit (PMU) deployment is increasing throughout national power grids in an effort to improve operator situational awareness of rapid oscillations and other fluctuations that could indicate a future disruption of service.
However, the quantity of data produced by PMU deployment makes real-time analysis extremely challenging, causing grid designers to invest in large centralized analysis systems that consume significant amounts of energy. In this paper, we argue for a more energy-proportional approach to anomaly detection, and advocate a decentralized, heterogeneous architecture that keeps the computational load at acceptable levels for lower-energy chipsets. Our results demonstrate that anomalies can be detected at real-time speeds using single-board computers for on-line analysis, and in minutes for off-line historical analysis using a multicore server running Apache Spark.

Soft-Core, Multiple-Lane, FPGA-based ADCs for a Liquid Helium Environment
Zikun Xiang (University of Science and Technology of China); Tianqi Wang (University of Science and Technology of China)*; Tong Geng (Boston University); Tian Xiang (University of Science and Technology of China); Xi Jin (University of Science and Technology of China); Martin Herbordt (Boston University)

Collecting analog signals and controlling the system are fundamental tasks in many applications, such as those found in the automotive, communication, and sensor-network domains. In many of these applications, latency is critical, making FPGA-based systems an attractive alternative. Traditionally, external ADC (analog-to-digital converter) chips are used to convert analog signals to digital and transfer the digital signals to the FPGA. In large-scale quantum system experiments, the implementation of this classic control infrastructure is a challenge: in particular, the FPGA-based control system must work in a liquid helium environment. Also, to improve the system's reliability, multiple lanes of soft-core ADCs must be integrated into the FPGA. In this paper, we propose a method of building high-speed ADCs with time-to-digital converters (TDCs).
The experimental results show that the ADC can achieve a sampling rate of 100 MSa/s with 6-bit resolution for signals ranging from 0 to 3 V. In our design the ADC primarily uses the ISERDES logic of the Xilinx FPGA plus a small number of CLBs, so 24 lanes of soft-core ADCs can be integrated into a Xilinx XC7A100T-2CSG324 FPGA.
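The figures quoted in the ADC abstract pin down the converter's transfer function: 6 bits over a 0-3 V range gives 64 output codes of about 46.9 mV each. A minimal behavioral model of such a quantizer is sketched below; it is purely illustrative and models only the input-to-code mapping, not the TDC/ISERDES implementation described in the paper.

```python
FULL_SCALE = 3.0            # input range 0..3 V, per the abstract
BITS = 6                    # 6-bit resolution -> 64 codes
LEVELS = 2 ** BITS
LSB = FULL_SCALE / LEVELS   # one code step, about 46.9 mV

def quantize(volts):
    """Map an input voltage to a 6-bit output code, clamping to the range."""
    volts = min(max(volts, 0.0), FULL_SCALE)
    return min(int(volts / LSB), LEVELS - 1)

def code_to_volts(code):
    """Reconstruct the mid-step voltage represented by an output code."""
    return (code + 0.5) * LSB
```

By construction, the reconstruction error of `code_to_volts(quantize(v))` is bounded by one LSB for any in-range input.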
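The Jacobian-distribution scheme described in the FANO abstract can be sketched in miniature. Everything below is illustrative rather than FANO code: a toy linear model stands in for the raytraced merit function, and Python worker threads stand in for MPI ranks, with each worker producing one finite-difference column of the Jacobian before the master combines them into a damped Levenberg-Marquardt update.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data roughly following y = 2x + 1; in FANO the "model" would be
# the raytraced merit function of the optical system.
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [1.0, 3.1, 4.9, 7.2, 8.9]

def model(params, x):
    a, b = params
    return a * x + b

def residuals(params):
    # In FANO, each residual evaluation is an expensive raytrace.
    return [model(params, x) - y for x, y in zip(XS, YS)]

def jacobian_column(params, j, h=1e-6):
    # One worker computes one finite-difference Jacobian column,
    # standing in for one MPI rank in the distributed scheme.
    bumped = list(params)
    bumped[j] += h
    r0, r1 = residuals(params), residuals(bumped)
    return [(rb - ra) / h for ra, rb in zip(r0, r1)]

def lm_step(params, lam=1e-3):
    # One damped step: solve (J^T J + lam*I) d = -J^T r for the update d.
    with ThreadPoolExecutor() as pool:
        cols = list(pool.map(lambda j: jacobian_column(params, j),
                             range(len(params))))
    r = residuals(params)
    # Normal equations for the two-parameter case, solved by hand.
    a11 = sum(c * c for c in cols[0]) + lam
    a22 = sum(c * c for c in cols[1]) + lam
    a12 = sum(u * v for u, v in zip(cols[0], cols[1]))
    g1 = -sum(c * ri for c, ri in zip(cols[0], r))
    g2 = -sum(c * ri for c, ri in zip(cols[1], r))
    det = a11 * a22 - a12 * a12
    return [params[0] + (g1 * a22 - g2 * a12) / det,
            params[1] + (g2 * a11 - g1 * a12) / det]

params = [1.0, 0.0]
for _ in range(5):
    params = lm_step(params)
# params converges toward the least-squares fit, about [1.99, 1.04]
```

In the real implementation each column owner would run raytraces for its parameter perturbation and the columns would be gathered on the master with an MPI collective; the threads here only mimic that division of labor.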
Thursday, September 27, 2018
New Frontiers & Quantum Computing 10:20-12:00 in Eden Vale A1/A2 Chair: J. Cortese / MIT