The 11th Asia and South Pacific Design Automation Conference Technical Program

Friday January 27, 2006

Session 9B Modeling, Compilation and Optimization of Embedded Architectures (16:00 - 18:05)
Location: Room 413
Chair(s): Hiroyuki Tomiyama (Nagoya University, Japan), Lovic Gauthier (FLEETS, Japan)

9B-1 (Time: 16:00 - 16:25)

Title	Workload Prediction and Dynamic Voltage Scaling for MPEG Decoding
Author	Ying Tan, Parth Malani, Qinru Qiu, *Qing Wu (State University of New York at Binghamton, United States)
Page	pp. 911 - 916
Keyword	low power, dynamic voltage scheduling, MPEG decoding, workload prediction
Abstract	In this paper we present three efficient DVS techniques for an MPEG decoder. Their energy reduction is comparable to that of the optimal solution. A workload prediction model is also developed based on the block-level statistics of each MPEG frame. Compared with previous work, the new model exhibits a remarkable improvement in prediction accuracy. The experimental results show that, with the new prediction model, the presented DVS techniques achieve more energy reduction than previous work while delivering the same Quality of Service (QoS).

9B-2 (Time: 16:25 - 16:50)

Title	Lazy BTB: Reduce BTB Energy Consumption Using Dynamic Profiling
Author	*Yen-Jen Chang (Department of Computer Science, National Chung-Hsing University, Taiwan)
Page	pp. 917 - 922
Keyword	BTB, low-power, dynamic profiling
Abstract	In this paper, we propose an alternative BTB design, called lazy BTB, that reduces BTB energy consumption by filtering out redundant lookups. The most distinctive feature of the lazy BTB is that it dynamically profiles the taken traces during program execution. Unlike the traditional design, in which the BTB must be looked up on every instruction fetch, our design adds a field that records trace information and thereby achieves one BTB lookup per taken trace. The experimental results show that, with negligible performance degradation, the lazy BTB reduces BTB energy consumption by about 77% on average for the MediaBench applications.

9B-3 (Time: 16:50 - 17:15)

Title	Cache Size Selection for Performance, Energy and Reliability of Time-Constrained Systems
Author	Yuan Cai (University of Iowa, United States), Marcus T. Schmitz, Alireza Ejlali, Bashir M. Al-Hashimi (University of Southampton, Great Britain), *Sudhakar M. Reddy (University of Iowa, United States)
Page	pp. 923 - 928
Keyword	cache size, energy, performability, reliability, performance
Abstract	Improving performance, reducing energy consumption and enhancing reliability are three important objectives in embedded computing systems design. In this paper, we study the joint impact of cache size selection on these three objectives. For this purpose, we conduct extensive fault injection experiments on five benchmark examples using a cycle-accurate processor simulator. Performance and reliability are analyzed using the performability metric. Overall, our experiments demonstrate the importance of careful cache size selection when designing energy-efficient and reliable systems. Furthermore, the experimental results show the existence of optimal or Pareto-optimal cache size selections for the three design objectives.

9B-4 (Time: 17:15 - 17:40)

Title	Reducing Dynamic Compilation Overhead by Overlapping Compilation and Execution
Author	Priya Unnikrishnan (IBM Toronto, Canada), Mahmut Kandemir, *Feihui Li (Pennsylvania State University, United States)
Page	pp. 929 - 934
Keyword	embedded Java, dynamic compilation, performance optimization
Abstract	An important problem in executing applications in energy-sensitive embedded environments is to tune their behavior based on dynamic variations in energy constraints. One option for achieving this is dynamic compilation, that is, compiling code fragments on the fly to adapt to changing energy demands. While dynamic compilation can be very beneficial in many embedded environments where multiple criteria need to be satisfied during execution, it can also incur a significant performance overhead since compilation takes place at runtime. The goal of this work is to reduce the performance overhead of dynamic compilation by overlapping it with application execution. Specifically, provided that hardware resources are available to perform dynamic compilation concurrently with application execution, our approach compiles the next code fragment to be executed while the current code fragment is executing. The experimental results from our implementation indicate significant savings in execution time and show that the proposed strategy performs consistently well under different parameters.

9B-5 (Time: 17:40 - 18:05)

Title	Functional Modeling Techniques for Efficient SW Code Generation of Video Codec Application
Author	*Sang-Il Han (TIMA Laboratory, France), Soo-Ik Chae (Seoul National University, Republic of Korea), Ahmed Amine Jerraya (TIMA Laboratory, France)
Page	pp. 935 - 940
Keyword	Functional model, video codec, software generation, clocked synchronous model, abstract clock
Abstract	Architectures with multiple programmable cores are becoming more attractive for video codec applications because they can provide highly concurrent computation and support multiple video standards and a shorter time-to-market. To find efficient SW code for a multi-core architecture running a video codec application, it is very important to be able to explore the design space easily by generating SW code automatically from the application's functional model. We introduce the Abstract Clock Synchronous Model (ACSM) for functional modeling of video codec applications. The ACSM can easily represent both parallelism and conditionals, which are common in video codec applications. By applying ACSM to an H.264 baseline decoder on a single-core architecture, we reduced the execution time and the number of external memory accesses by 32% and 46%, respectively, compared to a traditional dataflow model.