2015 IEEE High Performance
Extreme Computing Conference
(HPEC ’15)
Nineteenth Annual HPEC Conference
15 - 17 September 2015
Westin Hotel, Waltham, MA USA
Advanced ASIC & FPGA Technologies
10:20-12:00 in Eden Vale A1-A2
Chair: David Cousins / BBN
[Best Student Paper Finalist]
Hardware-Efficient Compressed Sensing Encoder
Designs for ECG
Jiayi Sheng, Chen Yang, Martin C. Herbordt, Boston
University
Implanted sensors, as might be used with wireless
body sensor networks (WBSNs), must have minimal
size and power consumption. In this work we examine
digital compressed sensing encoders for
WBSN-enabled ECG monitoring, an area that has
received much recent attention. We make two major
contributions. The first is using a random binary
Toeplitz matrix rather than a Bernoulli matrix. The
second is reducing the number of accumulators,
thereby saving area at the cost of a higher
operating frequency. Compared with previous
implementations, our new design consumes one to two
orders of magnitude less area and power while still
meeting timing constraints and achieving comparable
recovery quality.
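The two contributions can be illustrated with a small behavioral model. This is a minimal sketch, not the authors' design: it assumes the Toeplitz matrix has {0, 1} entries generated from a single random bit sequence (the abstract does not say whether the paper uses {0, 1} or ±1 entries), and it models "fewer accumulators" as `num_acc` shared adders time-multiplexed over the m measurements. The helper names `toeplitz_bits`, `phi`, and `encode` are hypothetical.

```python
import math
import random


def toeplitz_bits(m, n, seed=0):
    """Hypothetical seed sequence: m + n - 1 random bits fully define an
    m x n binary Toeplitz matrix (a Bernoulli matrix would need m * n bits)."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(m + n - 1)]


def phi(bits, i, j, n):
    """Entry (i, j) of the Toeplitz matrix: constant along each diagonal i - j."""
    return bits[i - j + n - 1]


def encode(x, m, bits, num_acc):
    """Compute y = Phi x for one window x, sharing num_acc adders among the
    m measurements (an assumed time-multiplexing scheme).
    Returns (y, cycles_per_sample)."""
    n = len(x)
    y = [0] * m                                 # m partial-sum registers
    cycles_per_sample = math.ceil(m / num_acc)  # clock cycles needed per input sample
    for j, sample in enumerate(x):              # one ECG sample per sample period
        for cycle in range(cycles_per_sample):
            for a in range(num_acc):            # in hardware these adders run in parallel
                i = cycle * num_acc + a
                if i < m and phi(bits, i, j, n):
                    y[i] += sample              # 0/1 entry: add or skip, no multiplier
    return y, cycles_per_sample
```

For example, with m = 4 measurements, `encode(x, 4, bits, 4)` finishes each sample in one cycle, while `encode(x, 4, bits, 1)` produces the same y but needs four cycles per sample, so the clock must run at least ceil(m / num_acc) times the ECG sample rate. That is the area-for-frequency tradeoff in miniature.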
Coarse Grain Reconfigurable ASIC through
Multiplexer Based Switches
Karen Gettings, Marc Burke, Jeremy Muldavin, Michael
Vai, MIT Lincoln Laboratory
We present an ASIC architecture with coarse-grain
reconfigurability that uses accelerators to improve
performance over fine-grain reconfigurable
architectures. A reconfigurable FFT ASIC was built
as a proof of concept, and it successfully
demonstrated valid switch operation for
reconfiguration.
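The abstract does not describe the switch internals, so the following is only a generic behavioral model of a multiplexer-based switch: each output port is an N:1 multiplexer whose select value lives in a configuration register, and reconfiguring the datapath means rewriting those select values while the accelerators themselves stay fixed. The class `MuxSwitch` and its methods are hypothetical names for illustration.

```python
from typing import List, Sequence


class MuxSwitch:
    """Hypothetical N-input, M-output switch: each output is an N:1 multiplexer."""

    def __init__(self, num_inputs: int, num_outputs: int) -> None:
        self.num_inputs = num_inputs
        # Configuration register: one select field per output port.
        self.select: List[int] = [0] * num_outputs

    def configure(self, select: Sequence[int]) -> None:
        """Reconfigure by writing new select values (validated before use)."""
        if len(select) != len(self.select):
            raise ValueError("one select value per output is required")
        if any(not 0 <= s < self.num_inputs for s in select):
            raise ValueError("select value out of range")
        self.select = list(select)

    def route(self, inputs: Sequence[object]) -> List[object]:
        """Each output forwards whichever input its multiplexer selects."""
        return [inputs[s] for s in self.select]
```

For instance, `MuxSwitch(3, 2)` configured with `[2, 0]` forwards input 2 to output 0 and input 0 to output 1; rejecting out-of-range selects mirrors the idea that only valid switch settings should be loaded during reconfiguration.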
Performance and Productivity Evaluation of Hybrid-
Threading HLS versus HDLs
Gongyu Wang, Herman Lam, Alan George, University of
Florida; Glen Edwards, Convey Computer Corporation
FPGA-based reconfigurable computing is finding its
way into a wide range of application areas in which
high performance and low power consumption are
paramount. However, FPGA-application
development using hardware-description languages
(HDLs) faces many productivity challenges that limit
its wide adoption, including a steep learning curve
and lengthy compilation. High-level synthesis (HLS)
languages and tools aim to overcome these
challenges by providing familiar high-level languages
and tools for FPGA-application development. In
using HLS, however, an important consideration is
the cost-benefit tradeoff for performance and
productivity. Hybrid-threading (HT) is a new
open-source HLS toolset from Convey Computer Corp.
that features a programming language based on
C/C++ and a set of tools for efficient compilation,
verification, and implementation. In this paper, we
present a performance and productivity tradeoff
study of HT HLS versus HDLs using three kernels
amenable to reconfigurable computing (RC), each
chosen for its distinctive computational
requirements. Our results show that for all three
kernels, HT achieved over 80% of the performance of
the corresponding optimized HDL-based designs in a
fraction of the development time.
Aparapi-UCores: A High Level Programming
Framework for Unconventional Cores
Oren Segal, Philip Colangelo, Nasibeh Nasiri, Zhuo Qian,
Martin Margala, University of Massachusetts Lowell
Combining several types of devices and
architectures is at the heart of heterogeneous
computing's power efficiency advantage, but the
strength of heterogeneous systems is also their
Achilles' heel: the diversity of the devices, and of
the ecosystems needed to maintain them, presents
major technological challenges. Some of the biggest
challenges are in the realm of system programming.
We believe that for heterogeneous computing to
become a mainstream system design choice,
high-level, standard system design flows need to be
adopted in order to achieve transparency when
dealing with diverse devices and architectures.
In this paper we present an open-source, high-level
framework and design flow that allows working with
any type of device that supports OpenCL. In
addition, we test our design flow and framework on
an N-body simulation across multiple device types
and show how such a high-level framework and
heterogeneous system design can deliver a more
power-efficient solution than either a single
general-purpose device or a dual CPU+GPU approach.
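The N-body benchmark maps naturally onto the data-parallel kernel model that OpenCL devices execute: one work item per body. The sketch below mirrors that style in Python for illustration only; it is not the Aparapi-UCores API. `nbody_accel_kernel`, `execute`, and `step` are hypothetical names, `execute` is a sequential stand-in for a device launch, and the gravitational constant G = 1 and the softening term are assumptions.

```python
import math


def nbody_accel_kernel(gid, pos, mass, acc, softening=1e-3):
    """Work-item body: compute the acceleration of body `gid` (G = 1 units)."""
    xi, yi, zi = pos[gid]
    ax = ay = az = 0.0
    for (xj, yj, zj), mj in zip(pos, mass):
        dx, dy, dz = xj - xi, yj - yi, zj - zi
        r2 = dx * dx + dy * dy + dz * dz + softening * softening
        s = mj / (r2 * math.sqrt(r2))  # the self term adds 0 because dx = dy = dz = 0
        ax += dx * s
        ay += dy * s
        az += dz * s
    acc[gid] = (ax, ay, az)


def execute(kernel, global_size, *args):
    """Sequential stand-in for a device launch; on a GPU or FPGA each gid runs in parallel."""
    for gid in range(global_size):
        kernel(gid, *args)


def step(pos, vel, mass, dt):
    """One semi-implicit Euler step; returns the new (pos, vel) lists."""
    n = len(pos)
    acc = [None] * n
    execute(nbody_accel_kernel, n, pos, mass, acc)
    vel = [tuple(v + a * dt for v, a in zip(vi, ai)) for vi, ai in zip(vel, acc)]
    pos = [tuple(p + v * dt for p, v in zip(pi, vi)) for pi, vi in zip(pos, vel)]
    return pos, vel
```

Because each work item reads the shared positions and writes only its own `acc[gid]`, the kernel has no write conflicts, which is what lets a framework dispatch it unchanged to a CPU, GPU, or FPGA back end.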
High Performance User Space Sockets on Low
Power System on a Chip Platforms
Catherine H. Crawford, Piotr Padkowski, Tomasz Baranski,
Angela Czubak, Łukasz Raszka, IBM Research
With the introduction of low-power System on a Chip
(SoC) processor architectures in enterprise server
configurations, there is a growing need to develop
the software that will support the scale-out,
data-intensive cloud applications deployed in data
centers today. In this paper, we describe the design
and implementation of a low-latency, fully compliant
user-space TCP/IP socket stack on a low-power SoC
architecture and demonstrate that this library can
become the basis for “Big Data” applications that
require both high throughput and low latency on a
power-optimized system platform. We specifically
target cloud applications that are developed on
runtimes seeing strong growth in programmer
communities and enterprise deployment, and whose
I/O bottlenecks outweigh their compute requirements,
e.g., memcached. On low-power embedded-class SoC
servers, these I/O bottlenecks can be prohibitively
expensive for the performance and scaling
requirements of such applications, even when CPU
efficiency and memory bandwidth are adequate. Our
approach removes this bottleneck by leveraging the
SoC's integrated Network Interface Cards (NICs)
together with user-space communication, thereby
shortening the path length to data and saving the
CPU cycles otherwise spent on context switching.
Our experiments show that we can achieve sub-5 μs
ping-pong latency for 8-byte packets. We also
substantially improve memslap benchmark results,
not only compared with memcached running on the
T4240 with the kernel stack (3.5 times better for
16-byte SETs) but also compared with a standard
x86_64 server with ConnectX 10GbE adapters when
power-based metrics are used (close to a factor of 2
improvement with power-normalized metrics).
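A ping-pong latency figure like the one quoted above comes from bouncing a small message between two endpoints and timing the round trips. The sketch below shows that measurement pattern over the ordinary kernel TCP stack on loopback, so it illustrates the metric, not the user-space stack itself. It assumes the common convention that one-way latency is half the average round-trip time (the paper may define it differently), and `ping_pong_latency_us` and the other helpers are hypothetical names.

```python
import socket
import threading
import time

MSG = b"\x00" * 8  # 8-byte payload, matching the packet size quoted above


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def echo(sock, iterations):
    """Pong side: send every message straight back."""
    for _ in range(iterations):
        sock.sendall(recv_exact(sock, len(MSG)))


def ping_pong_latency_us(iterations=10000):
    """Average one-way latency in microseconds over TCP loopback."""
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    client = socket.create_connection(("127.0.0.1", port))
    server, _ = listener.accept()
    for s in (client, server):
        # Disable Nagle's algorithm so tiny messages are sent immediately.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    pong = threading.Thread(target=echo, args=(server, iterations))
    pong.start()
    start = time.perf_counter()
    for _ in range(iterations):
        client.sendall(MSG)
        recv_exact(client, len(MSG))
    elapsed = time.perf_counter() - start
    pong.join()
    for s in (client, server, listener):
        s.close()
    return elapsed / iterations / 2 * 1e6  # half the round trip, in microseconds
```

Every round trip here crosses the kernel boundary several times; a user-space stack on an SoC-integrated NIC avoids those system calls and context switches, which is where its latency advantage comes from.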
Wednesday September 16