ASP-DAC 2014 Technical Program

The 19th Asia and South Pacific Design Automation Conference

Session 4C Emerging Applications
Time: 10:10 - 12:15 Wednesday, January 22, 2014
Location: Room 303
Chairs: Yu Wang (Tsinghua University, China), Dajiang Zhou (Waseda University, Japan)

4C-1 (Time: 10:10 - 10:35)

Title	STD-TLB: A STT-RAM-Based Dynamically-Configurable Translation Lookaside Buffer for GPU Architectures
Author	Xiaoxiao Liu, Yong Li, Yaojun Zhang, Alex K. Jones, *Yiran Chen (University of Pittsburgh, U.S.A.)
Page	pp. 355 - 360
Keyword	TLB, GPU, STT-RAM
Abstract	Translation lookaside buffers (TLBs) were recently introduced into modern graphics processing unit (GPU) architectures to support virtual memory addressing. Compared to CPUs, GPU performance is more sensitive to TLB capacity because of heavier memory accesses. However, the large area of SRAM cells greatly limits the implementable capacity of conventional SRAM-based TLBs. In this work, we propose using STT-RAM to construct TLBs in light of the unique memory access pattern in GPUs, i.e., infrequent data updates. An STT-RAM TLB can replace a same-area SRAM counterpart with greater capacity, similar read performance and lower energy consumption. As an optimization of the STT-RAM TLB, we further propose an STT-RAM-based dynamically-configurable TLB (STD-TLB) that leverages a differential sensing technique. STD-TLB can switch between a high-capacity mode and a high-performance mode on the fly based on real-time application needs. Our experiments show that, compared to an SRAM TLB, a standard STT-RAM TLB improves the performance and energy-delay product of GPU address translation by 32% and 75%, respectively, while STD-TLB achieves an additional 15% and 13% improvement over the standard STT-RAM TLB.
Slides
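To make the capacity/performance trade-off concrete, here is a minimal Python sketch of a fully associative TLB with LRU replacement whose capacity can be changed on the fly. The class name ConfigurableTLB, the 4 KiB page size, and the assumption that the high-performance mode exposes only half the entries (as if two cells were paired per bit for differential sensing) are all hypothetical; the abstract does not specify the STD-TLB organization.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 KiB pages (hypothetical)


class ConfigurableTLB:
    """Fully associative TLB with LRU replacement and a resizable capacity.

    Hypothetical modes: 'capacity' uses every entry; 'performance' is assumed
    to pair cells for differential sensing and so exposes half the entries.
    """

    def __init__(self, entries):
        self.max_entries = entries
        self.capacity = entries
        self.table = OrderedDict()  # virtual page number -> physical frame number
        self.hits = 0
        self.misses = 0

    def set_mode(self, mode):
        if mode == "capacity":
            self.capacity = self.max_entries
        elif mode == "performance":
            self.capacity = self.max_entries // 2
        else:
            raise ValueError(f"unknown mode: {mode}")
        # Evict least-recently-used entries that no longer fit.
        while len(self.table) > self.capacity:
            self.table.popitem(last=False)

    def translate(self, vaddr, page_table):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.table:
            self.hits += 1
            self.table.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            if len(self.table) >= self.capacity:
                self.table.popitem(last=False)  # evict the LRU entry
            self.table[vpn] = page_table[vpn]  # page walk, modeled as a dict lookup
        return self.table[vpn] * PAGE_SIZE + offset
```

A cyclic access pattern over six pages shows the effect: with eight entries every access after the first pass hits, while in the assumed four-entry mode LRU replacement makes every access miss.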

4C-2 (Time: 10:35 - 11:00)

Title	Training Itself: Mixed-Signal Training Acceleration for Memristor-Based Neural Network
Author	*Boxun Li, Yuzhi Wang, Yu Wang (Tsinghua University, China), Yiran Chen (University of Pittsburgh, U.S.A.), Huazhong Yang (Tsinghua University, China)
Page	pp. 361 - 366
Keyword	Neural Network, Training, Memristor
Abstract	The artificial neural network (ANN) is among the most widely used methods in data processing applications, and memristor-based neural networks offer a power-efficient hardware realization of ANNs. Training is a critical operation for memristor-based neural networks. However, the traditional training method for memristor-based neural networks is time-consuming and energy-inefficient: users must first compute the memristor parameters on a digital computing system and then tune each memristor to the corresponding state. In this work, we introduce a mixed-signal training acceleration framework that realizes the self-training of memristor-based neural networks. We first modify the original stochastic gradient descent algorithm by approximating calculations and designing an alternative computing method. We then propose a mixed-signal acceleration architecture for the modified training algorithm by equipping the original memristor-based neural network architecture with the copy crossbar technique, weight update units, sign calculation units and other assistant units. The experiment on the MNIST database demonstrates that our mixed-signal acceleration is 3 orders of magnitude faster and 4 orders of magnitude more energy efficient than its CPU implementation counterpart, at the cost of a slight decrease in recognition accuracy (<5%).
Slides
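The abstract does not spell out the modified stochastic gradient descent. As a hypothetical illustration of why sign information is convenient in hardware, the Python sketch below trains a single linear neuron with a sign-only (Manhattan-rule style) update: each weight moves by a fixed step eta against the sign of its gradient, which maps naturally onto fixed-amplitude programming pulses. The function names, step size, and training data are assumptions, not the authors' algorithm.

```python
def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def train_sign_sgd(samples, n_weights, eta=0.01, epochs=200):
    """Train y = w . x with sign-only SGD updates (hypothetical illustration).

    The squared-error gradient w.r.t. w_i is (y - t) * x_i; only its sign is
    used, so every update changes each weight by exactly -eta, 0 or +eta.
    """
    w = [0.0] * n_weights
    for _ in range(epochs):
        for x, target in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))
            err = y - target
            w = [wi - eta * sign(err * xi) for wi, xi in zip(w, x)]
    return w
```

With inputs x in {-1, +1} plus a constant bias input and target 0.5x - 0.25, the weights move toward (0.5, -0.25) and then oscillate within a couple of steps of it; the fixed step size bounds how precisely the weights can settle.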

4C-3 (Time: 11:00 - 11:25)

Title	HDTV1080p HEVC Intra Encoder with Source Texture Based CU/PU Mode Pre-decision
Author	*Jia Zhu, Zhenyu Liu, Dongsheng Wang (Tsinghua University, China), Qingrui Han, Yang Song (Huawei Technologies Co., Ltd., China)
Page	pp. 367 - 372
Keyword	HEVC, Intra, VLSI, RDO, CU-Depth
Abstract	Compared with H.264/AVC, HEVC doubles the coding efficiency at more than 4x the coding complexity. To reduce the burden on the intra encoder, we estimate the rate-distortion (RD) cost from the source image textures and dynamically select two promising CU/PU mode candidates for exhaustive rate-distortion optimization (RDO). Integrated into our hardwired encoder, this pre-decision saves 61.7% of the computation on average at the cost of a 4.53% rate increase. Implemented in TSMC 90nm technology, the real-time HDTV1080p encoder runs at 44fps with 2269k gates at a 357MHz operating frequency.
Slides
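The pre-decision idea can be sketched as follows: estimate a cost for each candidate CU size from the source pixels alone, then keep only the two cheapest sizes for exhaustive RDO. This is a hypothetical Python sketch; the texture measure (sum of absolute neighboring-pixel differences), the per-CU overhead weight lam, and the 16x16 toy block with sizes 16/8/4 (instead of a 64x64 CTU) are assumptions, not the authors' estimator.

```python
def texture(block):
    """Sum of absolute horizontal and vertical pixel differences (assumed measure)."""
    h = sum(abs(row[x + 1] - row[x]) for row in block for x in range(len(row) - 1))
    v = sum(abs(block[y + 1][x] - block[y][x])
            for y in range(len(block) - 1) for x in range(len(block[0])))
    return h + v


def sub_blocks(block, size):
    """Yield the size x size sub-blocks of a square block in raster order."""
    n = len(block)
    for y in range(0, n, size):
        for x in range(0, n, size):
            yield [row[x:x + size] for row in block[y:y + size]]


def pre_decide(block, sizes=(16, 8, 4), lam=100, keep=2):
    """Return the `keep` CU sizes with the lowest estimated cost.

    Estimated cost = texture inside each CU + lam per CU, where lam stands in
    for mode/header signaling overhead. Only these candidates would then go
    through full RDO.
    """
    cost = {s: sum(texture(b) + lam for b in sub_blocks(block, s)) for s in sizes}
    return sorted(sizes, key=lambda s: cost[s])[:keep]
```

A flat block yields [16, 8] because larger CUs carry less overhead, while a block made of four flat 8x8 quadrants of alternating intensity yields [8, 4], because splitting at 8x8 removes the costly internal edges.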

4C-4 (Time: 11:25 - 11:50)

Title	Fast Large-Scale Optimal Power Flow Analysis for Smart Grid through Network Reduction
Author	*Yi Liang, Deming Chen (University of Illinois at Urbana-Champaign, U.S.A.)
Page	pp. 373 - 378
Keyword	smart grid, power system, optimal power flow, network reduction, congestion
Abstract	Optimal power flow (OPF) plays an important role in power system operation. The emerging smart grid aims to create an automated energy delivery system that enables two-way flows of electricity and information. As a result, it is desirable to solve OPF in real time to enable time-sensitive applications such as real-time pricing. In this paper, we present a novel algorithm that accelerates the computation of alternating current optimal power flow (ACOPF) through power system network reduction (NR). We formulate the OPF problem on an equivalent reduced system and interpret its solution; the detailed optimal dispatch for the original power system is then obtained using a distributed algorithm. We compare our results with two widely used methods: full ACOPF, and the linearized OPF that assumes DC power flow and a lossless network, the so-called DCOPF. Experimental results show that for a large power system, our method achieves a 7.01x speedup over ACOPF with only 1.72% error, and is 75.7% more accurate than the DCOPF solution. Our method is even 10% faster than DCOPF. These results demonstrate the unique strength of the proposed technique for fast, scalable, and accurate OPF computation. We also show that our method is effective for smaller benchmarks.
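Network reduction replaces a large network with a smaller electrical equivalent. One standard technique is Kron reduction of the nodal admittance matrix, Y_red = Y_bb - Y_be Y_ee^-1 Y_eb, which eliminates buses that have no current injection; the abstract does not say which reduction the authors use. The Python sketch below eliminates buses one at a time, and the function name kron_reduce is hypothetical.

```python
def kron_reduce(Y, eliminate):
    """Eliminate the buses listed in `eliminate` from nodal admittance matrix Y.

    Each elimination applies Y[i][j] -= Y[i][k] * Y[k][j] / Y[k][k], valid when
    no current is injected at bus k. Entries may be real or complex.
    Returns (reduced matrix, list of original indices of the kept buses).
    """
    Y = [row[:] for row in Y]  # work on a copy
    keep = list(range(len(Y)))
    for k in eliminate:
        pk = keep.index(k)  # position of bus k in the current matrix
        ykk = Y[pk][pk]
        Y = [[Y[i][j] - Y[i][pk] * Y[pk][j] / ykk
              for j in range(len(Y)) if j != pk]
             for i in range(len(Y)) if i != pk]
        keep.pop(pk)
    return Y, keep
```

For two 1 S branches in series (buses 0-1-2), eliminating bus 1 leaves a 0.5 S equivalent branch between buses 0 and 2, as expected for two equal conductances in series.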

4C-5 (Time: 11:50 - 12:15)

Title	Storage-Less and Converter-Less Maximum Power Point Tracking of Photovoltaic Cells for a Nonvolatile Microprocessor
Author	*Cong Wang (Tsinghua University, China), Naehyuck Chang, Younghyun Kim, Sangyoung Park (Seoul National University, Republic of Korea), Yongpan Liu (Tsinghua University, China), Hyung Gyu Lee (Daegu University, Republic of Korea), Rong Luo, Huazhong Yang (Tsinghua University, China)
Page	pp. 379 - 384
Keyword	Storage-less, Converter-less, MPPT, Nonvolatile Processor
Abstract	This paper pioneers the maximum power point tracking (MPPT) of photovoltaic (PV) cells that directly supply power to a microprocessor without either an energy storage element (a battery or a large capacitor) or power converters. MPPT is conventionally performed by an MPPT charger that stores the harvested energy in the energy storage element, while a voltage regulator (typically a DC-DC converter) produces a proper voltage level for the microprocessor. The energy storage element acts as an energy buffer, which makes it possible to perform MPPT of the PV cells and power management of the microprocessor independently. However, the energy storage element, MPPT charger and DC-DC converter cause severely limited lifetime (when a typical battery is adopted), significant energy loss (typically over 20%), increased weight and volume, and high cost. The proposed method enables extremely fine-grained dynamic power management (DPM) every few hundred microseconds and performs MPPT without an MPPT charger, a DC-DC converter, or an energy storage element. Using the proposed setup, we achieve 84.5% energy harvesting efficiency with a large reduction in cost, weight and volume, and an extended lifetime, benefits that are not even numerically comparable with conventional MPPT methods.
Slides
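The abstract does not detail the tracking algorithm itself. For reference, the Python sketch below shows perturb-and-observe (P&O), a widely used conventional MPPT algorithm, on a toy PV power curve; it is not the proposed storage-less, converter-less method. The curve P(V) = V * Isc * (1 - (V/Voc)^3) with Isc = 0.5 A and Voc = 5 V is an assumed stand-in for a real PV model, and its maximum lies at V = Voc / 4^(1/3), about 3.15 V.

```python
def pv_power(v, isc=0.5, voc=5.0):
    """Toy PV power curve (assumed model, not a physical diode equation)."""
    return v * isc * (1 - (v / voc) ** 3)


def perturb_and_observe(power, v0, step, iterations):
    """Classic P&O: keep perturbing the operating voltage in the same direction
    while power rises, and reverse the direction whenever power drops."""
    v, p = v0, power(v0)
    direction = 1
    for _ in range(iterations):
        v_new = v + direction * step
        p_new = power(v_new)
        if p_new < p:
            direction = -direction  # power dropped: reverse the perturbation
        v, p = v_new, p_new
    return v
```

Starting at 1 V with a 0.05 V step, the operating point climbs the curve and then oscillates within about two steps of the maximum power point; this steady-state oscillation is the usual trade-off between step size and tracking speed in P&O.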