2018 IEEE High Performance
Extreme Computing Conference
(HPEC '18)
Twenty-second Annual HPEC Conference
25 - 27 September 2018
Westin Hotel, Waltham, MA USA
Too Many Secants: A Hierarchical Approach to Secant-based Dimensionality Reduction on Large Data Sets
Henry Kvinge (Colorado State University)*; Elin R Farnell (Colorado State University); Michael Kirby (Colorado State
University); Chris Peterson (Colorado State University)
A fundamental question in many data analysis settings is the problem of discerning the "natural" dimension of a data set. That
is, when a data set is drawn from a manifold (possibly with noise), a meaningful aspect of the data is the dimension of that
manifold. Various approaches exist for estimating this dimension, such as the method of Secant-Avoidance Projection (SAP).
Intuitively, the SAP algorithm seeks a projection that best preserves the lengths of all secants between points in a data set;
by applying the algorithm to find the best projections to vector spaces of various dimensions, one may infer the dimension of
the manifold of origination. In other words, one may learn the dimension at which it is possible to construct a diffeomorphic
copy of the data in a lower-dimensional Euclidean space. Using Whitney's embedding theorem, we can relate this information to
the natural dimension of the data. A drawback of the SAP algorithm is that a data set with $T$ points has $O(T^2)$ secants,
making the computation and storage of all secants infeasible for very large data sets. In this paper, we propose a novel
algorithm that generalizes SAP with an emphasis on addressing this issue: a hierarchical secant-based dimensionality-reduction
method that can be employed on data sets for which explicitly calculating all secants is not feasible.
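To make the secant-preservation objective concrete, here is a minimal NumPy sketch, not the authors' implementation: it enumerates all normalized secants (note the $O(T^2)$ count that motivates the hierarchical method) and scores how well candidate projections preserve their lengths. The noisy-circle data and the coordinate projections are illustrative stand-ins for the projections SAP would actually optimize.

```python
import numpy as np

def normalized_secants(X):
    """All pairwise unit secants (x_i - x_j)/||x_i - x_j|| for T points.

    A data set with T points yields T(T-1)/2 secants, i.e. O(T^2)."""
    T = X.shape[0]
    secants = []
    for i in range(T):
        for j in range(i + 1, T):
            d = X[i] - X[j]
            n = np.linalg.norm(d)
            if n > 0:
                secants.append(d / n)
    return np.array(secants)

def worst_secant_shrinkage(secants, P):
    """Length of the most-compressed unit secant under projection P
    (rows of P orthonormal). A good projection keeps this near 1."""
    return np.linalg.norm(secants @ P.T, axis=1).min()

# Toy data: a noisy 1-D circle embedded in R^5.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
X = np.zeros((200, 5))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
X += 0.01 * rng.standard_normal(X.shape)

S = normalized_secants(X)
for k in (1, 2, 3):
    # Candidate projection onto the first k coordinates (SAP would search
    # for the best k-dimensional projection instead of fixing one).
    P = np.eye(5)[:k]
    print(k, worst_secant_shrinkage(S, P))
```

For this data, projecting to one dimension collapses some secants nearly to zero, while two or more dimensions preserve all secant lengths well, which is the signal one reads off to infer the natural dimension.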
Regression Based WCET Analysis For Sampling Based Motion Planning
Hao Wen (Virginia Commonwealth University); Wei Zhang (Virginia Commonwealth University)*
Motion planning is one of the most critical tasks in a self-driving vehicle system. Sampling-based motion planning has gained
popularity due to its capability of providing quick and effective answers to planning queries. Since motion planning is a safety-
critical piece of software, it is important to know the Worst-Case Execution Time (WCET) of this task in the system. Traditional
static WCET analysis techniques do not consider the dynamic behavior of the interaction between the sampling algorithm and
the environment. Measurement-based WCET estimation focuses on an individual task and therefore has no predictive
capability when the start and goal positions change. We propose regression models to predict a safe upper bound on the WCET of
the Rapidly-Exploring Random Tree (RRT) algorithm, a widely used sampling-based motion planner.
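The following is a minimal sketch of the general idea, not the paper's model: fit a regression on measured planner execution times and inflate the prediction so it upper-bounds all observations. The query features (start-goal distance, obstacle density), the synthetic timing data, and the safety factor are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical training data: per-query features and measured RRT execution
# times in ms (in practice these would come from profiling the planner).
rng = np.random.default_rng(1)
n = 500
features = np.column_stack([
    rng.uniform(1, 50, n),    # assumed feature: start-to-goal distance
    rng.uniform(0, 1, n),     # assumed feature: obstacle density
])
times = 5 + 2.0 * features[:, 0] + 30 * features[:, 1] + rng.gamma(2, 3, n)

# Least-squares fit of a linear model: time ~ w . [1, features].
A = np.column_stack([np.ones(n), features])
w, *_ = np.linalg.lstsq(A, times, rcond=None)

# Shift the regression prediction up by the largest observed residual plus
# a margin, so the estimate upper-bounds every training measurement.
residuals = times - A @ w
margin = residuals.max() * 1.1  # safety factor; choice is application-specific

def wcet_upper_bound(x):
    """Predicted safe upper bound on execution time for a new query x."""
    return np.dot(np.concatenate(([1.0], x)), w) + margin

print(wcet_upper_bound(np.array([25.0, 0.5])))
```

The key property, unlike plain measurement-based estimation, is that the bound generalizes across queries: it varies with the start and goal configuration rather than reporting a single observed maximum.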
A Novel 1D-Convolution Accelerator for Low-Power Real-time CNN Processing on the Edge
Justin Sanchez (UNCC)*; Nasim Soltani (The University of North Carolina at Charlotte); Ramachandra Vikas Chamarthi (The
University of North Carolina at Charlotte); Adarsh Sawant (The University of North Carolina at Charlotte); Hamed Tabkhi (The
University of North Carolina at Charlotte)
With the rise of deep learning, the demand for real-time edge intelligence is greater than ever. Current algorithm and hardware
realizations often focus on the cloud paradigm and assume that an entire frame's data is available in large batches. As a
result, real-time AI inference at the edge has been a difficult goal to reach, due to tight latency constraints as well as the
streaming nature of the data. There is an inherent need for novel architectures that can realize latency-aware, agile deep
learning algorithms at the edge. This paper introduces a novel joint algorithm-architecture approach to enable real-time, low-
power Convolutional Neural Network (CNN) processing on edge devices. The core of the proposed approach is the use of 1D
convolution together with an architecture that can truly benefit from the algorithm optimization. On the algorithm side, we
present novel training and inference based on 1D convolution. On the architecture side, we present a novel dataflow
architecture capable of performing on-the-fly 1D convolution over the pixel stream. Our results on a Xilinx Zynq-7000
FPGA running SqueezeNet demonstrate only a 2% loss in accuracy while maintaining real-time processing at 60 frames per second
with only 1.73 W of power consumption. Dynamic power consumption is 7.3X lower than that of a regular 2D-convolution CNN at
the same frame rate, and total power is 4.3X lower than that of an Nvidia Jetson TX2 delivering only 30 frames per second.
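A minimal Python sketch of the on-the-fly idea follows, assuming a simple row-wise pixel stream; it is not the paper's dataflow architecture, and the edge-detection kernel is illustrative. The point it shows is that a 1D convolution needs only a sliding window of kernel-length pixels, never a buffered frame.

```python
import numpy as np

def stream_conv1d(pixel_stream, kernel):
    """Apply a 1D convolution on the fly as pixels arrive, keeping only a
    sliding window of len(kernel) pixels -- no full-frame buffering."""
    k = len(kernel)
    window = np.zeros(k)
    out = []
    for i, px in enumerate(pixel_stream):
        window = np.roll(window, -1)   # shift the window left by one pixel
        window[-1] = px                # append the newly arrived pixel
        if i >= k - 1:                 # window is full: emit one output
            out.append(float(np.dot(window, kernel[::-1])))
    return np.array(out)

# Illustrative kernel applied to one image row streamed pixel by pixel.
row = np.array([0, 0, 0, 10, 10, 10, 0, 0], dtype=float)
kernel = np.array([-1.0, 0.0, 1.0])
print(stream_conv1d(row, kernel))  # equals np.convolve(row, kernel, 'valid')
```

In hardware, that constant-size window becomes a short shift register fed by the pixel stream, which is what allows latency and buffering to drop relative to batch-oriented 2D convolution.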
Energy-Efficient DNN Computing on GPUs Through Register File Management
Xin Wang (Virginia Commonwealth University); Wei Zhang (Virginia Commonwealth University)*
Deep Neural Networks (DNNs) are state-of-the-art approaches for drawing knowledge from huge amounts of data with
remarkable accuracy. Currently, the size of real-world data is increasing from gigabytes to terabytes and even
petabytes, leading to high computational complexity for training DNNs, which can take days to weeks. Because current DNNs
involve a mass of matrix multiplications and other similar operations, they can be effectively parallelized and thus accelerated
by GPUs. However, energy consumption is still a major concern for DNNs and can limit the scalability of performance gains. In this
paper, instead of pruning the complexity of DNN models, we propose to exploit the specific micro-architecture of GPUs and the
characteristics of DNN applications to improve energy efficiency. A huge register file (RF) is often necessary for modern GPUs to
hold the contexts of thousands of concurrent threads. Consequently, the GPU RF, which is constructed with high-leakage
transistors, contributes significantly to the GPU's total energy consumption, and smart RF management strategies can help GPUs
reduce energy consumption when scaling up hardware resources for higher performance. First, based on the
observation that a large fraction of operands in DNNs are narrow-width, we propose a GPU register-packing
scheme to use the RF more efficiently. Second, we introduce a drowsy RF with a simple policy to decrease leakage
energy consumption. Finally, we further improve RF energy efficiency by combining the drowsy-RF and
register-packing techniques. We evaluate the effectiveness of our GPU RF management schemes for energy
reduction using AlexNet, a state-of-the-art DNN model. The experimental results show that the combination of
register packing and drowsy techniques achieves the greatest reduction in total GPU energy consumption: up to 11%, and 10.3%
on average.
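As a rough illustration of register packing (the actual scheme operates inside the GPU register file in hardware; the 16-bit narrow-width threshold here is an assumption), the sketch below shows two narrow operands sharing one 32-bit register slot, halving RF pressure for such value pairs.

```python
NARROW_BITS = 16               # assumed width threshold for "narrow" operands
MASK = (1 << NARROW_BITS) - 1

def pack(lo, hi):
    """Store two narrow-width operands in one 32-bit register slot."""
    assert 0 <= lo <= MASK and 0 <= hi <= MASK, "operands must be narrow"
    return (hi << NARROW_BITS) | lo

def unpack(reg):
    """Recover both operands from the shared slot."""
    return reg & MASK, (reg >> NARROW_BITS) & MASK

# Two DNN operands that each fit in 16 bits occupy a single 32-bit entry.
reg = pack(300, 42)
print(unpack(reg))  # (300, 42)
```

The drowsy-RF technique is complementary: packing shrinks the number of live register entries, and the unused entries can then be held in a low-voltage drowsy state to cut leakage.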
Thursday, September 27, 2018
Machine Learning 2
1:00-2:40 in Eden Vale A1/2
Chair: Sadasivan Shankar (Harvard)