The 20th Asia and South Pacific Design Automation Conference

Session 2A  NoCS II (Power and Emerging Technology)
Time: 13:50 - 15:30 Tuesday, January 20, 2015
Location: Room 102
Chairs: Mehdi Tahoori (Karlsruhe Institute of Technology, Germany), Tomoya Horiguchi (Toshiba)

2A-1 (Time: 13:50 - 14:15)
Title: ShuttleNoC: Boosting On-Chip Communication Efficiency by Enabling Localized Power Adaptation
Author: Hang Lu (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences/University of Chinese Academy of Sciences, China), *Guihai Yan, Yinhe Han, Ying Wang (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, China), Xiaowei Li (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences/University of Chinese Academy of Sciences, China)
Page: pp. 142 - 147
Keyword: Networks-on-Chip (NoC), Power Adaptation, Bandwidth Scaling, Power Gating, Traffic Heterogeneity
Abstract: The Network-on-Chip (NoC) is gradually becoming a main contributor to chip-level power consumption. Because on-chip traffic is heterogeneous in both time and space, existing power management approaches cannot adapt NoC power consumption to traffic intensity, which leads to suboptimal power efficiency. They resort either to over-provisioned NoC designs that suit only the spatial distribution of traffic, or to coarse-grained power gating that serves only its temporal variation. In this paper, we propose a novel NoC architecture called Shuttle Networks-on-Chip (ShuttleNoC). By permitting packets to shuttle between multiple subnetworks, localized power adaptation can be achieved. Experimental results show that ShuttleNoC could achieve optimal power efficiency, with up to 23.5% power savings and a 22.3% performance boost in comparison with traditional heterogeneity-agnostic NoC designs.

2A-2 (Time: 14:15 - 14:40)
Title: Energy-Efficient Optical Crossbars on Chip with Multi-Layer Deposited Silicon
Author: Hui LI, *Sébastien Le Beux (Lyon Institute of Nanotechnology, France), Gabriela Nicolescu (Ecole Polytechnique de Montréal, Canada), Ian O'Connor (Lyon Institute of Nanotechnology, France)
Page: pp. 148 - 153
Keyword: Optical Network on Chip, crossbar, optical loss
Abstract: The many-core design research community has shown high interest in on-chip optical crossbars for more than a decade. Key properties of optical crossbars, namely a) contention-free data routing, b) low-latency communication and c) potential for high bandwidth through the use of WDM, have motivated several implementations. These implementations demonstrate very different scalability and power efficiency depending on three key design factors: a) the network topology, b) the considered layout and c) the insertion losses induced by the fabrication process. The emerging design technique relying on multi-layer deposited silicon makes it possible to reduce optical losses, which may lead to a significant reduction in power consumption. In this paper, multi-layer deposited silicon based crossbars are proposed and compared. The results indicate that the proposed ring-based network exhibits, on average, 22% and 51.4% improvement in worst-case and average losses, respectively, compared to the most power-efficient related crossbars.

2A-3 (Time: 14:40 - 15:05)
Title: Two-Phase Protocol Converters for 3D Asynchronous 1-of-n Data Links
Author: Julian Hilgemberg Pontes, *Pascal Vivet, Yvain Thonnart (CEA/LETI, France)
Page: pp. 154 - 159
Keyword: NoC, Asynchronous Circuits, Two-phase Handshake, Delay Insensitive Encoding, 3D
Abstract: The design of fully synchronous Systems on Chip is becoming a challenging task. It is even more difficult in advanced nodes and 3D designs, where local and global variability can make timing closure an overwhelming task. In this context, the use of asynchronous circuits for long-link and 3D-link communication can provide better robustness to both local and inter-die variability and achieve faster timing closure by extending the Globally Asynchronous Locally Synchronous style to 3D architectures. However, while the 4-phase protocol is very well adapted to on-chip delay-insensitive (DI) communication, it cannot be adapted to off-chip and 3D interface communication because of potentially large interface delays. In this paper, we propose to use a simple 2-phase DI protocol based on transitions for 1-of-n codes, and we propose new 4-phase / 2-phase data converters. The proposed circuit reduces dynamic power by 20% and increases throughput by 31% for long-link communications.

2A-4 (Time: 15:05 - 15:30)
Title: Fine-Grained Runtime Power Budgeting for Networks-on-Chip
Author: *Xiaohang Wang, Tengfei Wang (Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences, China), Terrence Mak (Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences/The Chinese University of Hong Kong, China), Mei Yang, Yingtao Jiang (University of Nevada, Las Vegas, U.S.A.), Masoud Daneshtalab (Royal Institute of Technology, Sweden/University of Turku, Finland)
Page: pp. 160 - 165
Keyword: Networks-on-chip, power budgeting, dynamic programming network, latency
Abstract: Power budgeting for NoCs must meet a limited power budget while assuring the best possible overall system performance. For simplicity and ease of implementation, existing NoC power budgeting schemes treat all routers identically when allocating power to them, even though the packet arrival rates of different NoC routers may vary significantly. Such homogeneous power allocation may provide excess power to routers with low packet arrival rates and insufficient power to those with high arrival rates. In this paper, we formulate the NoC power budgeting problem as optimizing network performance under a power budget through per-router frequency scaling, taking into account the heterogeneous packet arrival rates across routers imposed by runtime traffic dynamics. Correspondingly, we propose a fine-grained solution using an agile dynamic programming network with linear time complexity. In essence, the frequency of each router is set individually according to its contribution to the average network latency while meeting the power budget. Experimental results confirm that, with fairly low runtime and hardware overhead, the proposed scheme can save up to 50% of application execution time compared with the best existing methods.