ASP-DAC 2006 Archives

2B-1

Title	SAVS: A Self-Adaptive Variable Supply-Voltage Technique for Process -Tolerant and Power-Efficient Multi-issue Superscalar Processor Design
Author	Hai Li (Qualcomm Inc., United States), Yiran Chen (Synopsys Inc., United States), *Kaushik Roy, Cheng-Kok Koh (Purdue University, United States)
Abstract	Technology scaling and sub-wavelength optical lithography is associated with significant process variations. We propose a self-adaptive variable supply-voltage scaling (SAVS) technique for multi-issue out-of-order pipeline to improve parametric yield with minimal power dissipation. Our error-correction circuitry and recovery mechanism allow the proposed fault-tolerant pipeline to work at a dynamically tuned supply voltage with a very low error rate. Experiments on an 8-issue, out-of-order superscalar processor show that SAVS can achieve 93.3% yield with 8.66% total power reduction under a scaled VDD, compared to the same yield achieved by conventional microarchitecture. The increased execution time is negligible (0.014%).
Slides (pdf file)	2B-1

2B-2

Title	The Design and Implementation of a Low-Latency On-Chip Network
Author	*Robert Mullins, Andrew West, Simon Moore (University of Cambridge, Great Britain)
Abstract	Many of the issues that will be faced by the designers of multi-billion transistor chips may be alleviated by the presence of a flexible global communication infrastructure. In the short term, such a network will provide scalable chip-wide communication and ease the complexity of handling multi-cycle communications. In the long term, the network will become a primary tool for optimising power and data transfers and for scheduling computations. This paper details the design and implementation of a low-latency on-chip network. The network's speculative routers are in the best case able to route flits in a single clock cycle, helping to minimise on-chip communication latencies and maximise the effectiveness of buffering resources. Results from our 180nm test chip demonstrate an inter-router data transfer rate in excess of 16Gbit/s for each link. In the best case each router hop adds just 1 clock cycle to the final communication latency.
Slides (pdf file)	2B-2

2B-3

Title	A Near Optimal Deblocking Filter for H.264 Advanced Video Coding
Author	Shen-Yu Shih, Cheng-Ru Chang, *Youn-Long Lin (National Tsing Hua University, Taiwan)
Abstract	We propose a near optimal hardware architecture for deblocking filter in H.264/MPEG-4 AVC. We propose a novel filtering order and data reuse strategy that results in significant saving in filtering time, local memory usage, and memory traffic. Every 16x16 macroblock requires 192 filtering operations. After a few initialization cycles, our 5-stage pipelined architecture is able to perform one filtering operation per cycle. Compared with some state-of-the-art designs, our architecture delivers the fastest level of performance while using much less gate count and memory. We have implemented and integrated the proposed deblocking filter into an H.264 main profile decoder and verified it with an FPGA prototype.
Slides (pdf file)	2B-3

2B-4

Title	Image Segmentation and Pattern Matching Based FPGA/ASIC Implementation Architecture of Real-Time Object Tracking
Author	*Kousuke Yamaoka, Takashi Morimoto, Hidekazu Adachi, Tetsushi Koide, Hans Juergen Mattausch (Research Center for Nanodevices and Systems, Hiroshima University, Japan)
Abstract	A novel algorithm for object tracking in video pictures, based on image segmentation and pattern matching, as well as its FPGA/ASIC implementation architecture are presented. With image segmentation, we can detect all objects in the images no matter whether they are moving or not. Using image segmentation results of successive frames, we exploit pattern matching in a simple object feature space for tracking of objects. The proposed algorithm can be applied to multiple moving and still objects even in the case of a moving camera. The FPGA/ASIC implementation architecture is verified to enable real time tracking of up to 220 objects, when realized with modern FPGA hardware.
Slides (pdf file)	No slides

2B-5

Title	Prefetching-Aware Cache Line Turnoff for Saving Leakage Energy
Author	*Ismail Kadayif (Canakkale Onsekiz Mart University, Turkey), Mahmut Kandemir, Feihui Li (Pennsylvania State University, United States)
Abstract	While numerous prior studies focused on performance and energy optimizations for caches, their interactions have received much less attention. This paper studies this interaction and demonstrates how performance and energy optimizations can affect each other. More importantly, we propose three optimization schemes that turn off cache lines in a prefetching-sensitive manner. These schemes treat prefetched cache lines differently from the lines brought to the cache in a normal way (i.e., through a load operation) in turning off the cache lines. Our experiments with applications from the SPEC2000 suite indicate that the proposed approaches save significant leakage energy with very small degradation on performance.
Slides (pdf file)	2B-5