ASP-DAC 2012 Technical Program

The 17th Asia and South Pacific Design Automation Conference

Session 2A System-Level Optimization Techniques for Multi-Core Architectures
Time: 16:10 - 17:50 Tuesday, January 31, 2012
Location: Room 204B
Chairs: Kiyoung Choi (Seoul National University, Republic of Korea), Yuko Hara-Azumi (Ritsumeikan University, Japan)

2A-1 (Time: 16:10 - 16:35)

Title	Learning-Based Power Management for Multi-Core Processors via Idle Period Manipulation
Author	Rong Ye, *Qiang Xu (The Chinese University of Hong Kong, Hong Kong)
Page	pp. 115 - 120
Keyword	Power management, Multicore processor, Machine learning
Abstract	Learning-based dynamic power management (DPM) techniques, which can adapt to varying system conditions and workloads, have attracted considerable research attention recently. To the best of our knowledge, however, none of the existing learning-based DPM solutions is dedicated to power reduction in multi-core processors, although they can be applied by treating each processor core as a standalone entity and conducting DPM for each core separately. In this work, by including task allocation in our learning-based DPM framework for multi-core processors, we are able to manipulate idle periods on processor cores to achieve a better tradeoff between power consumption and system performance. Experimental results show that the proposed solution significantly outperforms existing DPM techniques.

2A-2 (Time: 16:35 - 17:00)

Title	Memory Access Aware Power Gating for MPSoCs
Author	*Ye-Jyun Lin, Chia-Lin Yang, Jiao-Wei Huang (National Taiwan University, Taiwan), Naehyuck Chang (Seoul National University, Republic of Korea)
Page	pp. 121 - 126
Keyword	MPSoC, low power, power gating
Abstract	As technology continues to scale, reducing leakage is critical to achieving energy efficiency. Power gating can potentially save a significant part of leakage, but it incurs both energy and performance penalties. Therefore, power gating decisions need to be made carefully. In current low-power SoC designs, an IP core is power gated when it is not operating. In this paper, we exploit the IP idle time due to memory accesses for further leakage reduction. In MPSoCs, due to contention among concurrent memory accesses from different IP cores, memory stall cycles vary significantly, ranging from 10 to 600 cycles according to our experiments. We propose a run-time mechanism that predicts the memory stall cycles of an individual IP and makes the power gating decision based on the predicted memory latency and the IP's break-even time. With the predicted memory latency, a power-gated IP can be woken up in advance to avoid performance degradation. The experimental results show that our power management mechanism can achieve 25.3% leakage energy saving with a performance penalty within 4%.
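
Note	The gating rule described in the abstract above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name, its parameters, and the break-even and wake-up values are assumptions chosen for the example. Only the 10 and 600 cycle stall figures come from the abstract.

```python
def gating_decision(predicted_stall, break_even, wakeup_latency):
    """Decide whether to power gate an idle IP core for one memory stall.

    All arguments are cycle counts. This helper is hypothetical: it gates
    only when the predicted stall exceeds the break-even time, and it
    schedules the wake-up early so the core is ready when data returns.
    Returns (gate, wake_at), where wake_at is the cycle offset at which
    the wake-up should start, or None when the core is not gated.
    """
    if predicted_stall <= break_even:
        # Gating would cost more energy than the leakage it saves.
        return (False, None)
    # Start waking up ahead of the predicted end of the stall.
    wake_at = max(0, predicted_stall - wakeup_latency)
    return (True, wake_at)


# Assumed break-even time of 100 cycles and wake-up latency of 20 cycles.
long_stall = gating_decision(600, 100, 20)
short_stall = gating_decision(10, 100, 20)
```

Under these assumed parameters, the 600-cycle stall is gated and the 10-cycle stall is not. The accuracy of the stall prediction determines both the energy saved and whether the early wake-up actually hides the wake-up latency.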

2A-3 (Time: 17:00 - 17:25)

Title	Buffer Minimization in Pipelined SDF Scheduling on Multi-Core Platforms
Author	Yuankai Chen, *Hai Zhou (Northwestern University, U.S.A.)
Page	pp. 127 - 132
Keyword	Buffer-size minimization, Multi-core, SDF, Scheduling, Pipeline
Abstract	With the increasing number of cores available on modern processors, it is imperative to solve the problem of mapping and scheduling a synchronous dataflow graph onto a multi-core platform. Such a solution should not only meet the performance constraint, but also minimize resource usage. In this paper, we consider the pipeline scheduling problem for an acyclic synchronous dataflow graph on a given number of cores, with the goal of minimizing the total buffer size while meeting the throughput constraint. We propose a two-level heuristic algorithm for this problem. The inner level finds the optimal buffer size for a given topological order of the input task graph; the outer level explores the space of topological orders by perturbing the current order to improve buffer size. We compared our proposed algorithm to an enumeration algorithm, which is able to generate optimal solutions for small graphs, and to a greedy algorithm, which is able to run on large graphs. The experimental results show that our two-level heuristic algorithm achieves near-optimal solutions compared to the enumeration algorithm, with only a 0.8% increase in buffer size on average but with much shorter runtime, and achieves 38.8% less buffer usage on average compared to the greedy algorithm.

2A-4 (Time: 17:25 - 17:50)

Title	A Hierarchical C2RTL Framework for FIFO-Connected Stream Applications
Author	*Shuangchen Li, Yongpan Liu, Daming Zhang, Xinyu He (TNList, EE Dept., Tsinghua University, China), Pei Zhang (Y Explorations Inc., U.S.A.), Huazhong Yang (TNList, EE Dept., Tsinghua University, China)
Page	pp. 133 - 138
Keyword	C2RTL, Hierarchical synthesis, FIFO sizing
Abstract	In modern embedded systems, C2RTL (high-level synthesis) technology helps the designer greatly reduce time-to-market while satisfying performance and cost constraints. To address the performance challenges in complex designs, we propose a FIFO-connected hierarchical approach to replace the traditional flattened one in stream applications. Furthermore, we develop an analytical algorithm to find the optimal FIFO capacity to connect multiple modules efficiently. Finally, we demonstrate the advantages of the proposed method and the feasibility of our algorithm in seven real applications. Experimental results show that the hierarchical approach can achieve a speedup of up to 10.43 times compared to the flattened design, while our analytical FIFO sizing algorithm shrinks design time from hours to seconds with the same accuracy as the simulation-based approach.