The 11th Asia and South Pacific Design Automation Conference

Wednesday January 25, 2006

Session 2B Application Examples with Leading Edge Design Methodology (13:30 - 15:35)
Location: Room 413
Chair(s): In-Cheol Park (KAIST, Republic of Korea), Hideharu Amano (Keio University, Japan)

2B-1 (Time: 13:30 - 13:55)
Title: SAVS: A Self-Adaptive Variable Supply-Voltage Technique for Process-Tolerant and Power-Efficient Multi-issue Superscalar Processor Design
Author: Hai Li (Qualcomm Inc., United States), Yiran Chen (Synopsys Inc., United States), *Kaushik Roy, Cheng-Kok Koh (Purdue University, United States)
Page: pp. 158 - 163
Keyword: Variable Supply-Voltage, Power Efficient
Abstract: Technology scaling and sub-wavelength optical lithography are associated with significant process variations. We propose a self-adaptive variable supply-voltage scaling (SAVS) technique for multi-issue out-of-order pipelines to improve parametric yield with minimal power dissipation. Our error-correction circuitry and recovery mechanism allow the proposed fault-tolerant pipeline to work at a dynamically tuned supply voltage with a very low error rate. Experiments on an 8-issue, out-of-order superscalar processor show that SAVS can achieve 93.3% yield with an 8.66% total power reduction under a scaled VDD, relative to a conventional microarchitecture achieving the same yield. The increase in execution time is negligible (0.014%).

2B-2 (Time: 13:55 - 14:20)
Title: The Design and Implementation of a Low-Latency On-Chip Network
Author: *Robert Mullins, Andrew West, Simon Moore (University of Cambridge, Great Britain)
Page: pp. 164 - 169
Keyword: on-chip network
Abstract: Many of the issues that will be faced by the designers of multi-billion transistor chips may be alleviated by the presence of a flexible global communication infrastructure. In the short term, such a network will provide scalable chip-wide communication and ease the complexity of handling multi-cycle communications. In the long term, the network will become a primary tool for optimising power and data transfers and for scheduling computations. This paper details the design and implementation of a low-latency on-chip network. The network's speculative routers are in the best case able to route flits in a single clock cycle, helping to minimise on-chip communication latencies and maximise the effectiveness of buffering resources. Results from our 180 nm test chip demonstrate an inter-router data transfer rate in excess of 16 Gbit/s for each link. In the best case each router hop adds just 1 clock cycle to the final communication latency.

2B-3 (Time: 14:20 - 14:45)
Title: A Near Optimal Deblocking Filter for H.264 Advanced Video Coding
Author: Shen-Yu Shih, Cheng-Ru Chang, *Youn-Long Lin (National Tsing Hua University, Taiwan)
Page: pp. 170 - 175
Keyword: Deblocking Filter, H.264, MPEG-4 AVC
Abstract: We propose a near optimal hardware architecture for the deblocking filter in H.264/MPEG-4 AVC. A novel filtering order and data reuse strategy yields significant savings in filtering time, local memory usage, and memory traffic. Every 16x16 macroblock requires 192 filtering operations. After a few initialization cycles, our 5-stage pipelined architecture is able to perform one filtering operation per cycle. Compared with some state-of-the-art designs, our architecture delivers the fastest level of performance while using far fewer gates and much less memory. We have implemented and integrated the proposed deblocking filter into an H.264 main profile decoder and verified it with an FPGA prototype.
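
The figure of 192 filtering operations per macroblock can be reproduced with a short calculation. The sketch below is one plausible accounting, not taken from the paper: it assumes 4:2:0 chroma sampling, luma edges on a 4x4 grid, chroma edges on a 2x2 grid within each 8x8 block, and one "filtering operation" per line of pixels crossing an edge. The function name `edge_filter_ops` is hypothetical.

```python
def edge_filter_ops(block_size, edges_per_direction):
    """Count line-of-pixels filterings for one square block.

    Each edge spans `block_size` pixel lines, and edges run in two
    directions (vertical and horizontal).
    """
    return 2 * edges_per_direction * block_size


# Assumption: 16x16 luma with 4 vertical + 4 horizontal edges.
luma = edge_filter_ops(16, 4)

# Assumption: 4:2:0 sampling, so Cb and Cr are each 8x8 with 2 + 2 edges.
chroma = 2 * edge_filter_ops(8, 2)

total = luma + chroma
print(luma, chroma, total)
```

Under these assumptions the total matches the abstract's 192, so at one operation per cycle a macroblock would take roughly 192 cycles plus the initialization cycles.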

2B-4 (Time: 14:45 - 15:10)
Title: Image Segmentation and Pattern Matching Based FPGA/ASIC Implementation Architecture of Real-Time Object Tracking
Author: *Kousuke Yamaoka, Takashi Morimoto, Hidekazu Adachi, Tetsushi Koide, Hans Juergen Mattausch (Research Center for Nanodevices and Systems, Hiroshima University, Japan)
Page: pp. 176 - 181
Keyword: Object Tracking, Real-Time, Image Segmentation, Pattern Matching, Pipeline Processing
Abstract: A novel algorithm for object tracking in video pictures, based on image segmentation and pattern matching, is presented together with its FPGA/ASIC implementation architecture. With image segmentation, we can detect all objects in the images regardless of whether they are moving. Using the image segmentation results of successive frames, we exploit pattern matching in a simple object feature space to track objects. The proposed algorithm can be applied to multiple moving and still objects, even in the case of a moving camera. The FPGA/ASIC implementation architecture is verified to enable real-time tracking of up to 220 objects when realized with modern FPGA hardware.
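
The idea of tracking by pattern matching in a feature space can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not the authors' algorithm: the features (centroid x, centroid y, area), the unscaled distance measure, the greedy matching order, and the names `feature_distance` and `match_objects`.

```python
from math import hypot


def feature_distance(a, b):
    """Distance between two hypothetical feature tuples (x, y, area).

    Position and area are simply added without scaling; a real system
    would have to weight them.
    """
    return hypot(a[0] - b[0], a[1] - b[1]) + abs(a[2] - b[2])


def match_objects(prev, curr):
    """Greedily map each tracked object to its nearest unclaimed segment.

    prev: dict of object id -> features from the previous frame
    curr: list of segment features from the current frame
    Returns a dict of object id -> index into curr.
    """
    matches = {}
    unclaimed = set(range(len(curr)))
    for obj_id, feat in prev.items():
        if not unclaimed:
            break  # more tracked objects than segments in this frame
        best = min(unclaimed, key=lambda i: feature_distance(feat, curr[i]))
        matches[obj_id] = best
        unclaimed.remove(best)
    return matches


# Two objects in the previous frame; the segments arrive in a different order.
prev = {"a": (10, 10, 50), "b": (100, 40, 200)}
curr = [(102, 41, 198), (12, 11, 52)]
matches = match_objects(prev, curr)
print(matches)
```

Because matching relies on segment features and not on motion detection, a stationary object is matched in the same way as a moving one, which is consistent with the abstract's claim about still objects.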

2B-5 (Time: 15:10 - 15:35)
Title: Prefetching-Aware Cache Line Turnoff for Saving Leakage Energy
Author: *Ismail Kadayif (Canakkale Onsekiz Mart University, Turkey), Mahmut Kandemir, Feihui Li (Pennsylvania State University, United States)
Page: pp. 182 - 187
Keyword: Leakage energy, Cache, Prefetching, Cache line turnoff, Dead block
Abstract: While numerous prior studies have focused on performance and energy optimizations for caches, the interactions between the two have received much less attention. This paper studies these interactions and demonstrates how performance and energy optimizations can affect each other. More importantly, we propose three optimization schemes that turn off cache lines in a prefetching-sensitive manner. When turning off cache lines, these schemes treat prefetched lines differently from lines brought into the cache in the normal way (i.e., through a load operation). Our experiments with applications from the SPEC2000 suite indicate that the proposed approaches save significant leakage energy with very little performance degradation.
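
To make "prefetching-sensitive" turnoff concrete, here is a minimal sketch of one possible policy. The abstract does not describe the three proposed schemes, so this is not one of them: the decay-counter mechanism, both threshold values, and the names `CacheLine`, `touch`, and `tick` are hypothetical. The sketch gives unused prefetched lines a longer grace period than demand-loaded lines; the opposite choice would be equally easy to express.

```python
from dataclasses import dataclass

# Hypothetical idle limits, in cycles, before a line is powered off.
DEMAND_DECAY = 1000
PREFETCH_DECAY = 4000


@dataclass
class CacheLine:
    prefetched: bool       # brought in by a prefetch, not by a demand load
    idle_cycles: int = 0
    powered: bool = True


def touch(line):
    """Model an access (or refill): reset the counter and power the line."""
    line.idle_cycles = 0
    line.powered = True
    line.prefetched = False  # once referenced, treat it as a normal line


def tick(line, cycles=1):
    """Advance time and turn the line off once it exceeds its idle limit."""
    if not line.powered:
        return
    line.idle_cycles += cycles
    limit = PREFETCH_DECAY if line.prefetched else DEMAND_DECAY
    if line.idle_cycles >= limit:
        line.powered = False


demand = CacheLine(prefetched=False)
pref = CacheLine(prefetched=True)

tick(demand, 1000)
tick(pref, 1000)
state_after_1000 = (demand.powered, pref.powered)

tick(pref, 3000)
print(state_after_1000, pref.powered)
```

In this sketch the demand-loaded line is turned off after 1000 idle cycles, while the prefetched line stays powered until it has been idle for 4000 cycles.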