ASP-DAC 2014 Technical Program

The 19th Asia and South Pacific Design Automation Conference

Session 4B Emerging Techniques for Future NoC
Time: 10:10 - 12:15 Wednesday, January 22, 2014
Location: Room 301
Chairs: Paul Bogdan (University of Southern California, U.S.A.), Wei Zhang (HKUST, Hong Kong)

4B-1 (Time: 10:10 - 10:35)

Title	A Comprehensive and Accurate Latency Model for Network-on-Chip Performance Analysis
Author	*Zhiliang Qian (The Hong Kong University of Science and Technology, Hong Kong), Da-cheng Juan (Carnegie Mellon University, U.S.A.), Paul Bogdan (University of Southern California, U.S.A.), Chi-Ying Tsui (The Hong Kong University of Science and Technology, Hong Kong), Diana Marculescu, Radu Marculescu (Carnegie Mellon University, U.S.A.)
Page	pp. 323 - 328
Keyword	Queuing model, Analytical model, Network on Chip, Latency
Abstract	In this work, we propose a new, accurate, and comprehensive analytical model for Network-on-Chip (NoC) performance analysis. Given the application communication graph, the NoC architecture, the task mapping, and the routing algorithm, the proposed framework analyzes the link dependencies and then determines the order of the queuing analysis for accurate performance modeling. Toward this end, the channel waiting times in the links are estimated using a generalized G/G/1/K queuing model, which can handle bursty traffic and dependent arrival times with general service time distributions. The proposed model is general and can be used to analyze various traffic scenarios for NoC platforms with arbitrary buffer and packet lengths. Experimental results on both synthetic and real applications demonstrate the accuracy and scalability of the proposed model.

4B-2 (Time: 10:35 - 11:00)

Title	A Low-Latency Asynchronous Interconnection Network with Early Arbitration Resolution
Author	Georgios Faldamis (Cavium, Inc., U.S.A.), *Weiwei Jiang (Columbia University, U.S.A.), Gennette Gill (D.E. Shaw Research, U.S.A.), Steven M. Nowick (Columbia University, U.S.A.)
Page	pp. 329 - 336
Keyword	asynchronous, network-on-chip, low-latency, arbitration, mesh-of-trees
Abstract	A new asynchronous arbitration node is introduced for use as a building block in an asynchronous interconnection network. The target network topology is a variant Mesh-of-Trees (MoT), combining a binary fan-out (i.e., routing) network and a binary fan-in (i.e., arbitration) network, which is becoming widely used for multi-core shared-memory interfaces. The two key features are: (i) each fan-in node can resolve its arbitration and pre-allocate the corresponding input channel before the actual data arrives; and (ii) a lightweight shadow monitoring network fast-forwards information as soon as data enters the network, in continuous time, without synchronization to a fixed-rate clock, notifying each fan-in node on its path to enable the early arbitration. The router nodes were designed in IBM 90nm technology using an ARM standard cell library. SPECTRE simulations indicate that the new arbitration node provided significant reductions in latency of up to 54.4% over prior designs, while maintaining roughly comparable throughput. Network-level simulations were then performed on eight diverse synthetic benchmarks, comparing the new approach ("early arbitration") with two earlier alternative asynchronous MoT networks ("baseline" and "predictive"), using a mix of random and deterministic traffic. Considerable improvements in system latency were obtained on all benchmarks, ranging from 13.0% to 38.7%. The early arbitration strategy also showed direct benefits for the two most adversarial benchmarks, "uniform random traffic" and "hotspot8".

4B-3 (Time: 11:00 - 11:25)

Title	A Vertically Integrated and Interoperable Multi-Vendor Synthesis Flow for Predictable NoC Design in Nanoscale Technologies
Author	*Alberto Ghiribaldi, Herve Tatenguem Fankem (University of Ferrara, Italy), Federico Angiolini (iNoCs, Switzerland), Mikkel Stensgaard, Tobias Bjerregaard (Teklatech, Denmark), Davide Bertozzi (University of Ferrara, Italy)
Page	pp. 337 - 342
Keyword	Network-on-Chip, Design Flow, EDA tool, Embedded Systems
Abstract	We deliver a design flow for the synthesis and convergence of application-specific networks-on-chip. The flow comes with novel features that can better address nanoscale design challenges: front-end driven floorplanning, dynamic IR-drop minimization, fast and accurate system-level power grid models, and predictable link design. Above all, such features are addressed by different prototype engines, even from different vendors, that can be smoothly integrated into the flow by means of a common specification format, the Communication Exchange Format (CEF), which enables unprecedented tool interactions. This flow is validated through an extensive demonstration framework.

4B-4 (Time: 11:25 - 11:50)

Title	Fuzzy Flow Regulation for Network-on-Chip Based Chip Multiprocessors Systems
Author	*Yuan Yao, Zhonghai Lu (KTH Royal Institute of Technology, Sweden)
Page	pp. 343 - 348
Keyword	Network-on-Chip, Chip Multiprocessor, Flow regulation, Fuzzy logic
Abstract	Flow regulation is a traffic shaping technique that can be used to improve communication performance with better utilization of network resources in chip multi-processors (CMPs). This paper presents fuzzy flow regulation. Unlike the static flow regulation policy, our system makes regulation decisions fully dynamically according to traffic dynamism and the state of the interconnection network. The central idea is to use fuzzy logic to mimic the behavior of an expert that can recognize the network status and then intelligently control the admission of input flows. As the experimental results show, the maximum improvement in average delay reaches 53.0% against static regulation and 37.4% against no regulation. The maximum improvement in average throughput reaches 37.5% against static regulation and 23.8% against no regulation.

4B-5 (Time: 11:50 - 12:15)

Title	Adjustable Contiguity of Run-Time Task Allocation in Networked Many-Core Systems
Author	*Mohammad Fattah, Pasi Liljeberg, Juha Plosila, Hannu Tenhunen (University of Turku, Finland)
Page	pp. 349 - 354
Keyword	Run-Time Application Mapping, Dynamic Many-Core Systems
Abstract	In this paper, we propose a run-time mapping algorithm, CASqA, for networked many-core systems. In this algorithm, the level of contiguousness of the allocated processors (α) can be adjusted in a fine-grained fashion. A strictly contiguous allocation (α = 0) decreases the latency and power dissipation of the network and improves the applications' execution time. However, it limits the achievable throughput and increases the turnaround time of the applications. As a result, recent works consider non-contiguous allocation (α = 1) to improve the throughput, traded off against the applications' execution time and network metrics. Experimental results show that relentlessly allowing non-contiguous allocation not only cripples the network performance, but also degrades the achievable throughput compared to moderated cases (0 < α < 1). More precisely, a drop of up to 35% in the network costs can be gained by adjusting the level of contiguity compared to non-contiguous cases, while the achieved throughput is kept constant. Moreover, CASqA provides at least 32% energy saving in the network compared to other works.