
The 16th Asia and South Pacific Design Automation Conference

Session 5B: Resilient and Thermal-Aware NoC Design
Time: 13:40 - 15:40 Thursday, January 27, 2011
Location: Room 413
Chairs: Michihiro Koibuchi (National Institute of Informatics, Japan), Pao-Ann Hsiung (National Chung Cheng University, Taiwan)

5B-1 (Time: 13:40 - 14:10)
Title: On the Design and Analysis of Fault Tolerant NoC Architecture Using Spare Routers
Authors: *Yung-Chang Chang (Industrial Technology Research Institute, Taiwan), Ching-Te Chiu (National Tsing Hua University, Taiwan), Shih-Yin Lin, Chung-Kai Liu (Industrial Technology Research Institute, Taiwan)
Pages: 431-436
Keywords: fault tolerance, network-on-chip, router-level redundancy
Abstract: The aggressive advance of VLSI manufacturing technology has had a dramatic impact on the dependability of devices and interconnects. In modern manycore systems, mesh-based Networks-on-Chip (NoCs) are widely adopted as the on-chip communication infrastructure, so it is critical to provide an effective fault tolerance scheme for mesh-based NoCs. A faulty router or broken link can isolate an otherwise functional processing element (PE), and a set of faulty routers can form faulty regions that may break down the whole design. To address these issues, we propose a router-level fault tolerance scheme with spare routers, in contrast to the traditional microarchitecture-level approach. The spare routers not only provide redundancy but also diversify the connection paths between adjacent routers. To exploit these resources for fault tolerance, we present two configuration algorithms: shift-and-replace-allocation (SARA) and defect-awareness-path-allocation (DAPA), the latter of which takes advantage of the path diversity in our architecture. The proposed design is transparent to any routing algorithm, since the output topology is consistent with the original mesh. Experimental results show that our scheme markedly improves fault tolerance metrics, including reliability, mean time to failure (MTTF), and yield. In addition, the performance of the spare-router scheme improves as the NoC grows, while the relative connection cost decreases at the same time. This rare characteristic makes our solution suitable for large-scale NoC design.
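To make the idea of spare-router replacement concrete, here is a minimal sketch of a generic shift-based reconfiguration. It is not the authors' SARA or DAPA algorithm, whose details are not given here; it assumes, hypothetically, that each mesh row has `num_spares` spare routers at its end and that a fault shifts later logical positions toward the spares. The function names `shift_and_replace_row` and `reconfigure_mesh` are illustrative only.

```python
from typing import List, Optional, Set


def shift_and_replace_row(num_logical: int, num_spares: int,
                          faulty: Set[int]) -> Optional[List[int]]:
    """Map logical positions 0..num_logical-1 of one mesh row onto physical
    routers 0..num_logical+num_spares-1 (spares assumed at the row's end).

    Each logical position takes the next fault-free physical router, so a
    fault shifts every later position one step toward the spares.
    Returns None if the row has too few fault-free routers.
    """
    healthy = [p for p in range(num_logical + num_spares) if p not in faulty]
    if len(healthy) < num_logical:
        return None
    return healthy[:num_logical]


def reconfigure_mesh(cols: int, spares_per_row: int,
                     faulty_by_row: List[Set[int]]) -> Optional[List[List[int]]]:
    """Apply the row-wise shift to every row; fail if any row cannot be repaired."""
    mapping = []
    for faulty in faulty_by_row:
        row = shift_and_replace_row(cols, spares_per_row, faulty)
        if row is None:
            return None
        mapping.append(row)
    return mapping
```

For example, `reconfigure_mesh(4, 1, [set(), {1}, set()])` returns `[[0, 1, 2, 3], [0, 2, 3, 4], [0, 1, 2, 3]]`: the middle row routes around physical router 1 by using its spare. The sketch ignores inter-row links, which a real reconfiguration must keep consistent with the original mesh topology.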

5B-2 (Time: 14:10 - 14:40)
Title: A Resilient On-chip Router Design Through Data Path Salvaging
Authors: *Cheng Liu, Lei Zhang, Yinhe Han, Xiaowei Li (Institute of Computing Technology, Chinese Academy of Sciences, China)
Pages: 437-442
Keywords: Network-on-chip, fault tolerance, data path, salvaging, slicing
Abstract: Very large scale integrated circuits typically employ a Network-on-Chip (NoC) as the backbone for on-chip communication. As technology advances into the nanometer regime, NoCs become increasingly susceptible to permanent faults, such as manufacturing defects and device wear-out, which hinder the correct operation of the entire system. Effective fault-tolerant techniques are therefore essential to improve NoC reliability. Prior work mainly focuses on introducing redundancy, which cannot achieve satisfactory reliability and also incurs large hardware overhead, especially for data path components. In this paper, instead of introducing redundancy, we propose fine-grained data path salvaging techniques that split the data path components, namely links, input buffers, and the crossbar, into slices. As long as each component has at least one fault-free slice, the router remains functional. Experimental results show that the proposed solution achieves high reliability with graceful performance degradation, even under high fault rates.
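The functional condition stated above (at least one fault-free slice per component) can be sketched as a simple check. The throughput model here is a hypothetical assumption, not the paper's: all components have the same number of slices, a full-width flit is serialized over the healthy slices, and the most damaged component sets the usable width. The component names and functions are illustrative only.

```python
import math
from typing import Dict, List


def router_functional(slices: Dict[str, List[bool]]) -> bool:
    """True if every data path component keeps at least one fault-free slice."""
    return all(any(ok) for ok in slices.values())


def cycles_per_flit(slices: Dict[str, List[bool]]) -> int:
    """Cycles needed to push one full-width flit through the salvaged data path.

    Assumes every component has the same number of slices, that a flit is
    serialized over the healthy slices, and that the most damaged component
    sets the usable width.
    """
    if not router_functional(slices):
        raise ValueError("some component has no fault-free slice")
    width = len(next(iter(slices.values())))
    usable = min(sum(ok) for ok in slices.values())
    return math.ceil(width / usable)


# Example: 4 slices per component; the crossbar has lost two slices.
damaged = {
    "link": [True, False, True, True],
    "input_buffer": [True, True, True, True],
    "crossbar": [False, True, True, False],
}
```

Under these assumptions, `cycles_per_flit(damaged)` is 2: the router keeps working at half bandwidth rather than failing outright, which illustrates the idea of graceful degradation.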

5B-3 (Time: 14:40 - 15:10)
Title: NS-FTR: A Fault Tolerant Routing Scheme for Networks on Chip with Permanent and Runtime Intermittent Faults
Authors: *Sudeep Pasricha, Yong Zou (Colorado State University, U.S.A.)
Pages: 443-448
Keywords: fault tolerant routing, NoC, turn model, permanent faults, intermittent faults
Abstract: In sub-65nm CMOS technologies, networks-on-chip (NoCs) will be increasingly susceptible to design-time permanent faults and runtime intermittent faults, which can cause system failure. To overcome these faults, NoC routing schemes can be enhanced with fault tolerance capabilities so that they adapt communication flows to follow fault-free paths. Most existing fault tolerant routing algorithms are based on the turn model approach because of its simplicity and inherent freedom from deadlock. However, these turn-model-based algorithms are either too restrictive in the choice of paths that flits can traverse, or are tailored to work efficiently only on very specific fault distribution patterns. In this paper, we propose a novel fault tolerant routing scheme (NS-FTR) for NoC architectures that combines the North-last and South-last turn models to create a robust hybrid NoC routing scheme. The proposed scheme is shown to have low implementation overhead and to adapt to design-time and runtime faults better than existing turn-model-based, stochastic random walk, and dual-virtual-channel-based routing schemes.
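For readers unfamiliar with the two base turn models mentioned above, the sketch below encodes their prohibited turns in a 2D mesh: under North-last, a packet may not turn once it starts travelling north (north-to-east and north-to-west are forbidden); South-last is the mirror image. How NS-FTR combines the two models is not described here, so the sketch only checks a hop sequence against one base model at a time. The names `turn_allowed` and `route_legal` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Turn = Tuple[str, str]  # (current travel direction, next travel direction)

# Turns forbidden by each base turn model in a 2D mesh.
PROHIBITED: Dict[str, FrozenSet[Turn]] = {
    "north_last": frozenset({("N", "E"), ("N", "W")}),
    "south_last": frozenset({("S", "E"), ("S", "W")}),
}
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


def turn_allowed(model: str, current: str, nxt: str) -> bool:
    """Check a single change of direction (or a straight hop) under a turn model."""
    if nxt == OPPOSITE[current]:
        return False  # 180-degree reversals are never allowed
    return (current, nxt) not in PROHIBITED[model]


def route_legal(model: str, hops: List[str]) -> bool:
    """Check every consecutive pair of hop directions along a route."""
    return all(turn_allowed(model, a, b) for a, b in zip(hops, hops[1:]))
```

For instance, the route `["E", "E", "N", "N"]` is legal under North-last because the northward hops come last, while `["N", "E"]` is not. That same route is legal under South-last, which suggests why combining both models can widen the set of usable paths around faults.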

5B-4 (Time: 15:10 - 15:40)
Title: A Thermal-aware Application Specific Routing Algorithm for Network-on-Chip Design
Authors: *Zhiliang Qian, Chi-Ying Tsui (The Hong Kong University of Science and Technology, Hong Kong)
Pages: 449-454
Keywords: Network-on-Chip, Thermal-aware, Application specific, Routing algorithm
Abstract: In this work, we propose an application specific routing algorithm to reduce the hot-spot temperature of Networks-on-Chip (NoCs). Using the traffic information of the application, we develop a routing scheme that achieves higher adaptivity than generic routing algorithms while distributing the traffic more uniformly. To reduce the hot-spot temperature, we find the optimal distribution ratio of the communication traffic among the set of candidate paths. The problem of finding this optimal distribution ratio is formulated as a linear programming (LP) problem and is solved offline. A router microarchitecture that supports our ratio-based selection policy is also proposed. Simulation results show that the peak energy reduction can be as high as 16.6% for synthetic traffic and real benchmarks.
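One way such a traffic-splitting problem can be written as an LP is the min-max load formulation below. It is an illustrative sketch, not the paper's formulation: it assumes, hypothetically, that peak router traffic load is a proxy for hot-spot temperature. Here $F$ is the set of application flows, $v_f$ the volume of flow $f$, $P_f$ its set of candidate paths, $R$ the set of routers, $x_{f,p}$ the fraction of flow $f$ sent on path $p$, and $T$ the peak router load to be minimized.

```latex
\begin{aligned}
\min_{x,\,T}\quad & T \\
\text{s.t.}\quad & \sum_{p \in P_f} x_{f,p} = 1, && \forall f \in F \\
& \sum_{f \in F} \; \sum_{p \in P_f :\, r \in p} v_f \, x_{f,p} \le T, && \forall r \in R \\
& x_{f,p} \ge 0, && \forall f \in F,\ p \in P_f
\end{aligned}
```

All constraints and the objective are linear in $x_{f,p}$ and $T$, so the ratios can be computed offline by any LP solver and then enforced at run time by a ratio-based path selection policy in the router.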