ASP-DAC 2009 Technical Program

The 14th Asia and South Pacific Design Automation Conference

Session 1A On-Chip Communication Architectures
Time: 10:15 - 12:20 Tuesday, January 20, 2009
Location: Room 411+412
Chair: Sri Parameswaran (University of New South Wales, Australia)

1A-1 (Time: 10:15 - 10:40)

Title	Adaptive Inter-router Links for Low-Power, Area-Efficient and Reliable Network-on-Chip (NoC) Architectures
Author	Avinash Karanth Kodi (Ohio University, United States), Ashwini Sarathy, Ahmed Louri, *Janet Wang (University of Arizona, United States)
Page	pp. 1 - 6
Keyword	network-on-chip, low-power architecture
Abstract	The increasing wire delay constraints in deep sub-micron VLSI designs have led to the emergence of scalable and modular Network-on-Chip (NoC) architectures. As the power consumption, area overhead and performance of the entire NoC are influenced by the router buffers, research efforts have targeted optimized router buffer design. In this paper, we propose iDEAL - inter-router, dual-function energy and area-efficient links capable of data transmission as well as data storage when required. iDEAL enables a reduction in the router buffer size by controlling the repeaters along the links to adaptively function as link buffers during congestion, thereby achieving nearly 30% savings in overall network power and a 35% reduction in area with only a marginal 1-3% drop in performance. In addition, aggressive speculative flow control further improves the performance of iDEAL. Moreover, the significant reduction in power consumption and area provides sufficient headroom for monitoring Negative Bias Temperature Instability (NBTI) effects in order to improve circuit reliability at reduced feature sizes.

1A-2 (Time: 10:40 - 11:05)

Title	Analysis of Communication Delay Bounds for Network on Chips
Author	*Yue Qian (National University of Defense Technology, China), Zhonghai Lu (Royal Institute of Technology, Sweden), Wenhua Dou (National University of Defense Technology, China)
Page	pp. 7 - 12
Keyword	Network-on-chip, network calculus, delay bound
Abstract	In network-on-chip, computing the worst-case delay bound for packet delivery is crucial for designing predictable systems, yet it remains an intractable problem due to complicated resource contention scenarios. In this paper, we present an analysis technique to derive the communication delay bound for individual flows. Based on a network contention model, this technique, which is topology independent, employs network calculus theory to first compute the equivalent service curve for individual flows and then calculate their packet delay bound. To exemplify our method, we also present the derivation of a closed-form formula to calculate the delay bound for all-to-one gather communication. Our experimental results demonstrate that the theoretical bounds are correct and tight.
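Note	The two steps named in the abstract (an equivalent service curve per flow, then a delay bound) can be illustrated with standard textbook network calculus results. The sketch below is not the paper's contention model or its closed-form gather formula. It assumes token-bucket arrival curves (burst sigma, rate rho), a rate-latency server (rate R, latency T), and blind multiplexing with a single cross flow. All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBucket:
    """Arrival curve alpha(t) = burst + rate * t (assumed traffic model)."""
    burst: float  # sigma, e.g. in flits
    rate: float   # rho, e.g. in flits per cycle


@dataclass(frozen=True)
class RateLatency:
    """Service curve beta(t) = rate * max(0, t - latency) (assumed server model)."""
    rate: float     # R
    latency: float  # T


def leftover_service(server: RateLatency, cross: TokenBucket) -> RateLatency:
    """Service left for a flow of interest after one cross flow takes its share.

    Textbook blind-multiplexing result: beta - alpha_cross is again a
    rate-latency curve with rate R - rho and latency (sigma + R*T) / (R - rho).
    """
    residual = server.rate - cross.rate
    if residual <= 0:
        raise ValueError("cross traffic saturates the server; no finite bound")
    latency = (cross.burst + server.rate * server.latency) / residual
    return RateLatency(rate=residual, latency=latency)


def delay_bound(flow: TokenBucket, service: RateLatency) -> float:
    """Worst-case delay T + sigma / R, valid only when rho <= R."""
    if flow.rate > service.rate:
        raise ValueError("flow rate exceeds service rate; delay is unbounded")
    return service.latency + flow.burst / service.rate


# Illustrative numbers only (not taken from the paper).
router = RateLatency(rate=1.0, latency=2.0)
cross_flow = TokenBucket(burst=4.0, rate=0.25)
tagged_flow = TokenBucket(burst=3.0, rate=0.5)

equivalent = leftover_service(router, cross_flow)
bound = delay_bound(tagged_flow, equivalent)
print(equivalent, bound)
```

With these illustrative numbers the equivalent service curve has rate 0.75 and latency 8.0, giving a delay bound of 12.0 cycles for the tagged flow.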

1A-3 (Time: 11:05 - 11:30)

Title	Frequent Value Compression in Packet-based NoC Architectures
Author	Ping Zhou, Bo Zhao, Yu Du, Yi Xu, Youtao Zhang, *Jun Yang (University of Pittsburgh, United States), Li Zhao (Intel, United States)
Page	pp. 13 - 18
Keyword	compression, NoC, performance, power
Abstract	The proliferation of Chip Multiprocessors (CMPs) has led to the integration of large on-chip caches. For scalability reasons, a large on-chip cache is often divided into smaller banks that are interconnected through a packet-based Network-on-Chip (NoC). With an increasing number of cores and cache banks integrated on a single die, the on-chip network introduces significant communication latency and power consumption. In this paper, we propose a novel scheme that exploits Frequent Value compression to optimize the power and performance of NoC. Our experimental results show that the proposed scheme reduces the router power by up to 16.7%, with a CPI reduction of as much as 23.5% in our setting. Compared with the recent zero pattern compression scheme, the frequent value scheme saves up to 11.0% more router power and achieves up to 14.5% more CPI reduction. Hardware design of the FV table and its overhead are also presented.
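Note	The general idea of frequent value compression can be sketched in software as a small table of common data words, with each matching word in a packet replaced by a short table index. This is a generic illustration only, not the paper's hardware FV table design. The table size, the 32-bit word width, the one-bit flag per word and all function names are assumptions made for the example.

```python
from collections import Counter


def build_fv_table(words, size=8):
    """Pick the `size` most frequent words as the frequent value table."""
    return [value for value, _ in Counter(words).most_common(size)]


def compress(words, table):
    """Replace words found in the table by ("idx", position); keep others raw."""
    index = {value: i for i, value in enumerate(table)}
    return [("idx", index[w]) if w in index else ("raw", w) for w in words]


def decompress(encoded, table):
    """Invert compress() using the same table on the receiving side."""
    return [table[payload] if tag == "idx" else payload for tag, payload in encoded]


def encoded_bits(encoded, table_size, word_bits=32):
    """Payload size in bits: one flag bit per word plus an index or a raw word."""
    idx_bits = max(1, (table_size - 1).bit_length())
    return sum(1 + (idx_bits if tag == "idx" else word_bits) for tag, _ in encoded)


# Illustrative packet payload (not data from the paper).
payload = [0, 0, 0xFFFFFFFF, 7, 0, 0xFFFFFFFF, 12345, 0]
table = build_fv_table(payload, size=2)
packed = compress(payload, table)

assert decompress(packed, table) == payload
print(encoded_bits(packed, table_size=2), "bits instead of", 32 * len(payload))
```

In this toy payload six of the eight words hit the two-entry table, so the encoded payload needs 78 bits instead of 256. Real savings depend on the value locality of the traffic.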

1A-4 (Time: 11:30 - 11:55)

Title	Simultaneous Data Transfer Routing and Scheduling for Interconnect Minimization in Multicycle Communication Architecture
Author	Yu-Ju Hong (Purdue University, United States), Ya-Shih Huang, *Juinn-Dar Huang (National Chiao Tung University, Taiwan)
Page	pp. 19 - 24
Keyword	multicycle communication, architectural synthesis, interconnect minimization, resource allocation and sharing, scheduling
Abstract	In deep submicron technology, wire delay is no longer negligible and is gradually becoming a dominant factor of system performance. Several state-of-the-art architectural synthesis flows have already adopted the distributed register architecture to cope with the increasing wire delay by allowing multicycle communication. In this paper, we formulate channel and register allocation within a refined regular distributed register architecture, named RDR-GRS, as a problem of simultaneous data transfer routing and scheduling for minimizing global interconnect resources. We also present an innovative algorithm with both spatial and temporal considerations. It features a concentration-oriented path router that gathers wire-sharable data transfers and a channel-based time scheduler that resolves contentions for wires in a channel, operating in the spatial and temporal domains, respectively. The experimental results show that the proposed algorithm can significantly outperform existing related works.

1A-5 (Time: 11:55 - 12:20)

Title	Dynamically Reconfigurable On-Chip Communication Architectures for Multi Use-Case Chip Multiprocessor Applications
Author	Sudeep Pasricha, *Nikil Dutt, Fadi Kurdahi (University of California, Irvine, United States)
Page	pp. 25 - 30
Keyword	crossbar, on-chip communication, synthesis, low power
Abstract	The phenomenon of digital convergence and increasing application complexity today is motivating the design of chip multiprocessor (CMP) applications with multiple use cases. Most traditional on-chip communication architecture design techniques perform synthesis and optimization only for a single use case, which may lead to sub-optimal design decisions for multi-use-case applications. In this paper we present a framework to generate a dynamically reconfigurable crossbar-based on-chip communication architecture that can support multiple use-case bandwidth and latency constraints. Our framework generates on-chip communication architectures with low cost, low power dissipation, and minimal reconfiguration overhead. Results of applying our framework on several networking CMP applications show that our approach is able to generate a crossbar solution with significantly lower cost (2.4× to 3.8×) and lower power dissipation (1.5× to 3.1×), compared to the best previously proposed approach.