
The 11th Asia and South Pacific Design Automation Conference

Friday January 27, 2006

Session 9B Modeling, Compilation and Optimization of Embedded Architectures (16:00 - 18:05)
Location: Room 413
Chair(s): Hiroyuki Tomiyama (Nagoya University, Japan), Lovic Gauthier (FLEETS, Japan)

9B-1 (Time: 16:00 - 16:25)
Title: Workload Prediction and Dynamic Voltage Scaling for MPEG Decoding
Author: Ying Tan, Parth Malani, Qinru Qiu, *Qing Wu (State University of New York at Binghamton, United States)
Page: pp. 911 - 916
Keyword: low power, dynamic voltage scheduling, MPEG decoding, workload prediction
Abstract: In this paper we present three efficient DVS techniques for an MPEG decoder. Their energy reduction is comparable to that of the optimal solution. A workload prediction model is also developed based on the block-level statistics of each MPEG frame. Compared with previous work, the new model exhibits a remarkable improvement in prediction accuracy. The experimental results show that, with the new prediction model, the presented DVS techniques achieve more energy reduction than previous work while delivering the same Quality of Service (QoS).

9B-2 (Time: 16:25 - 16:50)
Title: Lazy BTB: Reduce BTB Energy Consumption Using Dynamic Profiling
Author: *Yen-Jen Chang (Department of Computer Science, National Chung-Hsing University, Taiwan)
Page: pp. 917 - 922
Keyword: BTB, low-power, dynamic profiling
Abstract: In this paper, we propose an alternative BTB design, called lazy BTB, to reduce BTB energy consumption by filtering out redundant lookups. The most distinctive feature of the lazy BTB is that it dynamically profiles the taken traces during program execution. Unlike the traditional design, in which the BTB has to be looked up on every instruction fetch, our design introduces an additional field to record the trace information and thereby achieves one BTB lookup per taken trace. The experimental results show that, with negligible performance degradation, the lazy BTB can reduce BTB energy consumption by about 77% on average for the MediaBench applications.

9B-3 (Time: 16:50 - 17:15)
Title: Cache Size Selection for Performance, Energy and Reliability of Time-Constrained Systems
Author: Yuan Cai (University of Iowa, United States), Marcus T. Schmitz, Alireza Ejlali, Bashir M. Al-Hashimi (University of Southampton, Great Britain), *Sudhakar M. Reddy (University of Iowa, United States)
Page: pp. 923 - 928
Keyword: cache size, energy, performability, reliability, performance
Abstract: Improving performance, reducing energy consumption and enhancing reliability are three important objectives for embedded computing systems design. In this paper, we study the joint impact of cache size selection on these three objectives. For this purpose, we conduct extensive fault injection experiments on five benchmark examples using a cycle-accurate processor simulator. Performance and reliability are analyzed using the performability metric. Overall, our experiments demonstrate the importance of a careful cache size selection when designing energy-efficient and reliable systems. Furthermore, the experimental results show the existence of optimal or Pareto-optimal cache size selection to optimize the three design objectives.

9B-4 (Time: 17:15 - 17:40)
Title: Reducing Dynamic Compilation Overhead by Overlapping Compilation and Execution
Author: Priya Unnikrishnan (IBM Toronto, Canada), Mahmut Kandemir, *Feihui Li (Pennsylvania State University, United States)
Page: pp. 929 - 934
Keyword: embedded Java, dynamic compilation, performance optimization
Abstract: An important problem in executing applications in energy-sensitive embedded environments is to tune their behavior based on dynamic variations in energy constraints. One option for achieving this is dynamic compilation, that is, compiling code fragments on the fly to adapt to changing energy demands. While dynamic compilation can be very beneficial in many embedded environments where multiple criteria need to be satisfied during execution, it can also incur a significant performance overhead since compilation takes place at runtime. The goal of this work is to reduce this performance overhead by overlapping dynamic compilation with application execution. Specifically, provided that hardware resources are available to perform dynamic compilation concurrently with application execution, our approach compiles the next code fragment to be executed while the current code fragment is executing. The experimental results from our implementation indicate significant savings in execution time. Our experimental results also indicate that the proposed strategy performs consistently well under different parameters.
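Note: The overlap described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The names compile_fragment and run_overlapped are hypothetical, the "compiler" is a stand-in that just returns a callable, and a Python worker thread stands in for the spare hardware resource the abstract assumes.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_fragment(name):
    # Hypothetical stand-in for a dynamic compiler: "compiling" a fragment
    # simply produces a callable that reports which fragment ran.
    return lambda: f"ran {name}"


def run_overlapped(fragments):
    # Execute fragments in order while the next one is compiled in the
    # background (assumption: one spare worker is available for compilation).
    results = []
    if not fragments:
        return results
    with ThreadPoolExecutor(max_workers=1) as compiler:
        # Compile the first fragment up front; nothing can overlap with it.
        pending = compiler.submit(compile_fragment, fragments[0])
        for i in range(len(fragments)):
            compiled = pending.result()  # wait for the current fragment
            if i + 1 < len(fragments):
                # Start compiling the next fragment before running this one.
                pending = compiler.submit(compile_fragment, fragments[i + 1])
            results.append(compiled())  # runs while the next one compiles
    return results


if __name__ == "__main__":
    print(run_overlapped(["init", "decode", "render"]))
```

In this sketch the execution order is fixed and known in advance, so the "next" fragment is simply the next list element. How the next fragment is chosen in the paper is not stated in the abstract.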

9B-5 (Time: 17:40 - 18:05)
Title: Functional Modeling Techniques for Efficient SW Code Generation of Video Codec Application
Author: *Sang-Il Han (TIMA Laboratory, France), Soo-Ik Chae (Seoul National University, Republic of Korea), Ahmed Amine Jerraya (TIMA Laboratory, France)
Page: pp. 935 - 940
Keyword: Functional model, video codec, software generation, clocked synchronous model, abstract clock
Abstract: Architectures with multiple programmable cores are becoming more attractive for video codec applications because they can provide highly concurrent computation, support multiple video standards, and shorten time-to-market. To find efficient SW code for a multiple-core architecture running a video codec application, it is very important to be able to explore the design space easily by generating SW code automatically from the application's functional model. We introduce the Abstract Clock Synchronous Model (ACSM) for functional modeling of video codec applications. The ACSM can easily represent both parallelism and conditionals, which are common in video codec applications. By applying ACSM to an H.264 baseline decoder on a single-core architecture, we reduced the execution time and the number of external memory accesses by 32% and 46%, respectively, compared to a traditional dataflow model.