ASP-DAC 2010 Technical Program

The 15th Asia and South Pacific Design Automation Conference

Session 3C System-level Modelling and Analysis
Time: 8:30 - 10:10 Wednesday, January 20, 2010
Location: Room 101C
Chairs: Soonhoi Ha (Seoul National University, Republic of Korea), Nagisa Ishiura (Kwansei Gakuin University, Japan)

3C-1 (Time: 8:30 - 8:55)

Title	Constrained Global Scheduling of Streaming Applications on MPSoCs
Author	*Jun Zhu, Ingo Sander, Axel Jantsch (Royal Institute of Technology, Sweden)
Page	pp. 223 - 228
Keyword	synchronous data flow, scheduling, buffer minimization, streaming applications, MPSoCs
Abstract	We present a global scheduling framework for synchronous data flow (SDF) streaming applications on MPSoCs, based on optimized computation and contention-free routing. The global scheduling of processor computation and communication transactions is formulated as a constraint-based problem, to avoid the scheduling overhead in TDMA-like heuristic schemes. A public domain constraint solver is exploited to solve the NP-complete scheduling problem efficiently, together with problem-specific constraint modeling techniques. Experimental results show that the proposed framework can achieve a high predictable application throughput with minimized buffer cost. For instance, for applications in the communication domain, higher throughput (up to 87%) has been observed with less buffer cost, compared to scenarios considering the heuristic scheduling overhead.

3C-2 (Time: 8:55 - 9:20)

Title	Analyzing Impact of Multiple ABB and AVS Domains on Throughput of Power and Thermal-Constrained Multi-Core Processors
Author	Jungseob Lee, Shi-Ting Zhou, *Nam Sung Kim (University of Wisconsin-Madison, U.S.A.)
Page	pp. 229 - 234
Keyword	Multicore, AVS, ABB
Abstract	Recently, semiconductor industries have integrated more cores in a single die, which substantially improves the throughput of the processors running highly-parallel applications. However, many existing applications do not have high enough parallelism to exploit multiple cores in a die, slowing the transition to many-core processors with smaller and more cores that benefit future applications with high parallelism. In this paper, we analyze the impact of multiple adaptive voltage scaling (AVS) and adaptive body biasing (ABB) domains on the throughput of power and thermal-constrained multi-core processors when they are combined with per-core power-gating (PCPG). Both AVS and ABB can be effectively used to either increase frequency (thus throughput) or decrease power consumption of the processors. Meanwhile, PCPG can provide extra power and thermal headroom when an application’s parallelism is limited. First, we analyze the throughput impact of applying AVS, ABB, and PCPG for power and thermal-constrained multi-core processors. Second, we investigate the impact of multiple AVS and ABB domains on the throughput, and recommend the most cost-effective number of domains for AVS and ABB in 16- and 8-core processors. Our analysis using the 32nm predictive technology model considering within-die variations suggests that the most cost-effective number of domains for AVS and/or ABB should be one for each when they are combined with PCPG in both 16- and 8-core processors. Since within-die core-to-core variations provide many choices in terms of core frequency and power consumption for limited-parallelism applications, one AVS or ABB domain can lead to a throughput improvement of 1.77~2.49×; more than one AVS and/or ABB domain improves the throughput only marginally.

3C-3 (Time: 9:20 - 9:45)

Title	Source-Level Timing Annotation for Fast and Accurate TLM Computation Model Generation
Author	Kai-Li Lin, *Chen-Kang Lo, Ren-Song Tsay (National Tsing Hua University, Taiwan)
Page	pp. 235 - 240
Keyword	TLM, timing annotation
Abstract	This paper proposes a source-level timing annotation method for generating accurate transaction level models of software computation modules. While the Transaction Level Modeling (TLM) approach is now widely adopted for system modeling and simulation speed improvement, timing estimation accuracy is often compromised. To obtain reliable and accurate estimation results at the system level, we propose a timing annotation method for accurate TLM computation model generation that considers processor architectures with pipeline and cache structures, which are challenging but critical to accurate timing estimation. The experiments show that our results are within 2% of cycle-accurate results and that the approach is three orders of magnitude faster than conventional ISS approaches.

3C-4 (Time: 9:45 - 10:10)

Title	Improved On-Chip Router Analytical Power and Area Modeling
Author	Andrew B. Kahng, Bill Lin, *Kambiz Samadi (UC San Diego, U.S.A.)
Page	pp. 241 - 246
Keyword	Network-on-Chip, System-level, On-Chip Router
Abstract	Over the course of this decade, uniprocessor chips have given way to multi-core chips, which have become the primary building blocks of today’s computer systems. The presence of multiple cores on a chip shifts the focus from computation to communication as a key bottleneck to achieving performance improvements. As industry moves towards many-core chips, networks-on-chip (NoCs) are emerging as the scalable fabric for interconnecting the cores. With power now the first-order design constraint, early-stage estimation of NoC power has become crucially important. Existing power models (e.g., ORION 2.0 [12], Xpipes [7], etc.) are based on a specific router microarchitecture and circuit implementation. Therefore, when validated against different NoC prototypes (i.e., different router implementations), we saw significant deviation (up to 40% on average) that can lead to erroneous NoC design choices. This has prompted our development of a new, accurate, architecture- and circuit-implementation-independent router power and area modeling methodology with complete portability across existing NoC component libraries. Validation against a range of implemented router designs confirms substantial improvement in accuracy over existing models.