The 17th Asia and South Pacific Design Automation Conference

Session 2A  System-Level Optimization Techniques for Multi-Core Architectures
Time: 16:10 - 17:50 Tuesday, January 31, 2012
Location: Room 204B
Chairs: Kiyoung Choi (Seoul National University, Republic of Korea), Yuko Hara-Azumi (Ritsumeikan University, Japan)

2A-1 (Time: 16:10 - 16:35)
Title: Learning-Based Power Management for Multi-Core Processors via Idle Period Manipulation
Author: Rong Ye, *Qiang Xu (The Chinese University of Hong Kong, Hong Kong)
Page: pp. 115 - 120
Keyword: Power management, Multicore processor, Machine learning
Abstract: Learning-based dynamic power management (DPM) techniques, being able to adapt to varying system conditions and workloads, have attracted lots of research attention recently. To the best of our knowledge, however, none of the existing learning-based DPM solutions are dedicated to power reduction in multi-core processors, although they can be utilized by treating each processor core as a standalone entity and conducting DPM for them separately. In this work, by including task allocation into our learning-based DPM framework for multi-core processors, we are able to manipulate idle periods on processor cores to achieve a better tradeoff between power consumption and system performance. Experimental results show that the proposed solution significantly outperforms existing DPM techniques.

2A-2 (Time: 16:35 - 17:00)
Title: Memory Access Aware Power Gating for MPSoCs
Author: *Ye-Jyun Lin, Chia-Lin Yang, Jiao-Wei Huang (National Taiwan University, Taiwan), Naehyuck Chang (Seoul National University, Republic of Korea)
Page: pp. 121 - 126
Keyword: MPSoC, Low power, Power gating
Abstract: As technology continues to scale, reducing leakage is critical to achieving energy efficiency. Power gating can potentially save a significant part of leakage, but it incurs both energy and performance penalties. Therefore, power gating decisions need to be made carefully. In current low-power SoC designs, an IP core is power gated when it is not operating. In this paper, we explore the IP idle time due to memory accesses for further leakage reduction. In MPSoCs, due to contention among concurrent memory accesses from different IP cores, memory stall cycles vary significantly, ranging from 10 to 600 cycles according to our experiments. We propose a run-time mechanism that predicts the memory stall cycles of an individual IP and makes the power gating decision based on the predicted memory latency and its break-even time. With the predicted memory latency, a power-gated IP can be woken up in advance to avoid performance degradation. The experimental results show that our power management mechanism can achieve 25.3% leakage energy saving within a 4% performance penalty.
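
The gating rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' mechanism: the abstract does not say how stall cycles are predicted, so the prediction is simply passed in, and the names `decide_power_gating`, `break_even`, and `wakeup_latency`, as well as the example values 100 and 20, are assumptions made for illustration. Only the 10 and 600 cycle stall lengths come from the abstract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GatingDecision:
    gate: bool                    # True if the IP should be power gated
    wake_at_cycle: Optional[int]  # cycles after stall start to begin wake-up


def decide_power_gating(predicted_stall: int,
                        break_even: int,
                        wakeup_latency: int) -> GatingDecision:
    """Gate only when the gated window exceeds the break-even time.

    All three parameters are in cycles and are hypothetical inputs;
    the predictor itself is outside the scope of this sketch.
    """
    # Reserve time at the end of the stall so the IP is awake again
    # when the memory access is predicted to return.
    gated_window = predicted_stall - wakeup_latency
    if gated_window > break_even:
        return GatingDecision(gate=True, wake_at_cycle=gated_window)
    return GatingDecision(gate=False, wake_at_cycle=None)


# A long stall is worth gating; a short one is not.
long_stall = decide_power_gating(600, break_even=100, wakeup_latency=20)
short_stall = decide_power_gating(10, break_even=100, wakeup_latency=20)
```

Starting the wake-up `wakeup_latency` cycles before the predicted end of the stall is one simple way to model waking the IP "in advance"; a real design would also have to handle mispredicted stall lengths.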

2A-3 (Time: 17:00 - 17:25)
Title: Buffer Minimization in Pipelined SDF Scheduling on Multi-Core Platforms
Author: Yuankai Chen, *Hai Zhou (Northwestern University, U.S.A.)
Page: pp. 127 - 132
Keyword: Buffer-size minimization, Multi-core, SDF, Scheduling, Pipeline
Abstract: With the increasing number of cores available on modern processors, it is imperative to solve the problem of mapping and scheduling a synchronous dataflow graph onto a multi-core platform. Such a solution should not only meet the performance constraint, but also minimize resource usage. In this paper, we consider the pipeline scheduling problem for an acyclic synchronous dataflow graph on a given number of cores, minimizing the total buffer size while meeting the throughput constraint. We propose a two-level heuristic algorithm for this problem. The inner level finds the optimal buffer size for a given topological order of the input task graph; the outer level explores the space of topological orders by perturbing the current order to reduce the buffer size. We compared our proposed algorithm to an enumeration algorithm, which is able to generate optimal solutions for small graphs, and a greedy algorithm, which is able to run on large graphs. The experimental results show that our two-level heuristic algorithm achieves near-optimal solutions compared to the enumeration algorithm, with only a 0.8% increase in buffer size on average but a much shorter runtime, and achieves 38.8% less buffer usage on average compared to the greedy algorithm.
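
The two-level structure described in this abstract can be sketched as follows. This is only an illustration of the outer perturbation loop: the paper's inner-level buffer computation is not given in the abstract, so `proxy_buffer_cost` is a stand-in (token count times distance in the order), and the adjacent-swap hill climb, the function names, and the example graph are all assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def is_topological(order: Sequence[str], edges: Sequence[Edge]) -> bool:
    """Check that every edge (u, v) has u placed before v."""
    pos = {node: i for i, node in enumerate(order)}
    return all(pos[u] < pos[v] for u, v in edges)


def proxy_buffer_cost(order: Sequence[str], tokens: Dict[Edge, int]) -> int:
    """Stand-in for the inner level (NOT the paper's method):
    tokens on an edge weighted by how far apart its endpoints sit."""
    pos = {node: i for i, node in enumerate(order)}
    return sum(t * (pos[v] - pos[u]) for (u, v), t in tokens.items())


def perturb_search(order: Sequence[str], tokens: Dict[Edge, int],
                   iterations: int = 200, seed: int = 0
                   ) -> Tuple[List[str], int]:
    """Outer level: swap adjacent nodes, keep valid improving orders."""
    rng = random.Random(seed)
    edges = list(tokens)
    best = list(order)
    best_cost = proxy_buffer_cost(best, tokens)
    for _ in range(iterations):
        i = rng.randrange(len(best) - 1)
        candidate = best[:]
        candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]
        if not is_topological(candidate, edges):
            continue  # the swap violated a precedence constraint
        cost = proxy_buffer_cost(candidate, tokens)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


# Hypothetical four-actor graph; edge values are token counts.
tokens = {("a", "c"): 4, ("b", "d"): 1, ("c", "d"): 1}
initial = ["a", "b", "c", "d"]
best_order, best_cost = perturb_search(initial, tokens)
```

Because the loop only accepts improving swaps, it can stop at a local minimum; the paper's perturbation strategy may differ.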

2A-4 (Time: 17:25 - 17:50)
Title: A Hierarchical C2RTL Framework for FIFO-Connected Stream Applications
Author: *Shuangchen Li, Yongpan Liu, Daming Zhang, Xinyu He (TNList, EE Dept., Tsinghua University, China), Pei Zhang (Y Explorations Inc., U.S.A.), Huazhong Yang (TNList, EE Dept., Tsinghua University, China)
Page: pp. 133 - 138
Keyword: C2RTL, Hierarchical synthesis, FIFO sizing
Abstract: In modern embedded systems, C2RTL (high-level synthesis) technology helps the designer greatly reduce time-to-market while satisfying performance and cost constraints. To attack the performance challenges in complex designs, we propose a FIFO-connected hierarchical approach to replace the traditional flattened one in stream applications. Furthermore, we develop an analytical algorithm to find the optimal FIFO capacity to connect multiple modules efficiently. Finally, we demonstrate the advantages of the proposed method and the feasibility of our algorithm in seven real applications. Experimental results show that the hierarchical approach can achieve up to a 10.43 times speedup compared to the flattened design, while our analytical FIFO sizing algorithm shrinks design time from hours to seconds with the same accuracy as the simulation-based approach.