Session 3B: Interconnect, NoCs, and MPSoCs

3B-1 (Time: 15:50 - 16:15)

Title Interconnect Modeling for Improved System-Level Design Optimization
Author Luca Carloni (Columbia Univ., USA), Andrew B. Kahng, Swamy Muddu (Univ. of California, San Diego, USA), Alessandro Pinto (Univ. of California, Berkeley, USA), *Kambiz Samadi, Puneet Sharma (Univ. of California, San Diego, USA)
Abstract Accurate modeling of delay, power, and area of interconnections early in the design phase is crucial for efficient system-level optimization. Models presently used in system-level optimizations, such as network-on-chip (NoC) synthesis, are inaccurate in the presence of deep-submicron effects. In this paper, we propose new, highly accurate models for delay and power in buffered interconnects; these models are usable by system-level designers for existing and future technologies. We present a general and transferable methodology to construct our models from a wide variety of reliable sources (Liberty, LEF/ITF, ITRS, PTM, etc.). The modeling infrastructure, and a number of characterized technologies, are available as open source. Our models comprehend key interconnect circuit and layout design styles, and a power-efficient buffering technique that overcomes unrealities of previous delay-driven buffering techniques. We show that our models are significantly more accurate than previous models for global and intermediate buffered interconnects in 90nm and 65nm foundry processes - essentially matching signoff analyses. We also integrate our models in an automatic NoC topology synthesis tool and show that the more accurate modeling significantly affects optimal/achievable architectures that are synthesized by the tool. The increased accuracy afforded by our models enables system-level designers to obtain better assessments of the achievable performance/power/area tradeoffs for (communication-centric aspects of) system design, with negligible setup and overhead burdens.
No Slides

3B-2 (Time: 16:15 - 16:40)

Title NoCOUT: NoC Topology Generation with Mixed Packet-Switched and Point-to-Point Networks
Author Jeremy Chan, *Sri Parameswaran (Univ. of New South Wales, Australia)
Abstract Networks-on-Chip (NoC) have been widely proposed as the future communication paradigm for use in next-generation System-on-Chip. In this paper, we present NoCOUT, a methodology for generating an energy-optimized, application-specific NoC topology which supports both point-to-point and packet-switched networks. The algorithm uses a prohibitive greedy iterative improvement strategy to explore the design space efficiently. A system-level floorplanner is used to evaluate the iterative design improvements and provide feedback on the effects of the topology on wire length. The algorithm is integrated within a NoC synthesis framework with characterized NoC power and area models to allow accurate exploration for a NoC router library. We apply the topology generation algorithm to several test cases including real-world and synthetic communication graphs with both regular and irregular traffic patterns, and varying core sizes. Since the method is iterative, it is possible to start with a known design to search for improvements. Experimental results show that many different applications benefit from a mix of "on chip networks" and "point-to-point networks". With such a hybrid network, we achieve approximately 25% lower energy consumption (with a maximum of 37%) than a state-of-the-art min-cut-partition-based topology generator for a variety of benchmarks. In addition, the average hop count is reduced by 0.75 hops, which would significantly reduce the network latency.

3B-3 (Time: 16:40 - 17:05)

Title Automatic Generation of Hardware dependent Software for MPSoCs from Abstract System Specifications
Author *Gunar Schirner, Andreas Gerstlauer, Rainer Domer (Univ. of California, Irvine, USA)
Abstract Increasing software content in embedded systems and SoCs drives the demand to automatically synthesize software binaries from abstract models. This is especially critical for Hardware dependent Software (HdS) due to its tight coupling with the underlying hardware. In this paper, we present our approach to automatically synthesize HdS from an abstract system model. We synthesize driver code, interrupt handlers, and startup code. Furthermore, we automatically adjust the application to use RTOS services. We target traditional RTOS-based multi-tasking solutions, as well as a pure interrupt-based implementation (without any RTOS). Our experimental results show the automatic generation of final binary images for six real-life target applications and demonstrate significant productivity gains due to automation. Our HdS synthesis is an enabler for efficient MPSoC development and rapid design space exploration.

3B-4 (Time: 17:05 - 17:30)

Title Application-Specific Network-on-Chip Architecture Synthesis Based on Set Partitions and Steiner Trees
Author *Shan Yan, Bill Lin (Univ. of California, San Diego, USA)
Abstract This paper considers the problem of synthesizing application-specific Network-on-Chip (NoC) architectures. We propose two heuristic algorithms called CLUSTER and DECOMPOSE that can systematically examine different set partitions of communication flows, and we propose Rectilinear-Steiner-Tree (RST) based algorithms for generating an efficient network topology for each group in the partition. Different evaluation functions, chosen to fit the implementation backend and the corresponding implementation technology, can be incorporated into our solution framework to evaluate the implementation cost of the set partitions and RST topologies generated. In particular, we experimented with an implementation cost model based on the power consumption parameters of a 70nm process technology where leakage power is a major source of energy consumption. Experimental results on a variety of NoC benchmarks showed that our synthesis results can on average achieve a 6.92X reduction in power consumption over the best standard mesh implementation. To further gauge the effectiveness of our heuristic algorithms, we also implemented an exact algorithm that enumerates all distinct set partitions. For the benchmarks where exact results could be obtained, our CLUSTER and DECOMPOSE algorithms on average can achieve results within 1% and 2% of exact results, respectively, with execution times all under 1 second, whereas the exact algorithm took as much as 4.5 hours.
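Editorial illustration (not from the paper): the exact baseline mentioned in the abstract enumerates every set partition of the communication flows, and the number of partitions grows as the Bell numbers, which may help explain why it can take hours. The sketch below is a generic textbook enumeration, not the authors' implementation; the function name `set_partitions` and the flow labels are hypothetical.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def set_partitions(items: Sequence[T]) -> Iterator[List[List[T]]]:
    """Yield every partition of items into non-empty, unordered groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing group in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a group of its own.
        yield [[first]] + partition


# Hypothetical flow labels, used only for illustration.
flows = ["f1", "f2", "f3", "f4"]
partitions = list(set_partitions(flows))
print(len(partitions))  # 15, the Bell number B(4)
```

The count is 15 for 4 flows and 115975 for 10, so exhaustive search is only practical for small flow sets; heuristics such as those named in the abstract avoid visiting every partition.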
No Slides
Last Updated on: January 31, 2008