The 12th Asia and South Pacific Design Automation Conference Technical Program

Session 5B Optimization Issues in Embedded Systems
Time: 13:30 - 15:35 Thursday, January 25, 2007
Location: Room 413
Chairs: Pai Chou (Univ. of California, Irvine, United States), Maziar Goudarzi (Kyushu Univ., Japan)

5B-1 (Time: 13:30 - 13:55)

Title	Retiming for Synchronous Data Flow Graphs
Author	Nikolaos Liveris, Chuan Lin, Jia Wang, *Hai Zhou (Northwestern University, United States), Prithviraj Banerjee (University of Illinois, Chicago, United States)
Page	pp. 480 - 485
Keyword	SDF, retiming, high-level synthesis
Abstract	In this paper we present a new algorithm for retiming Synchronous Dataflow (SDF) graphs. The retiming aims at minimizing the cycle length of an SDF. The algorithm is provably optimal and its execution time is improved compared to previous approaches.

5B-2 (Time: 13:55 - 14:20)

Title	Signal-to-Memory Mapping Analysis for Multimedia Signal Processing
Author	Ilie I. Luican, Hongwei Zhu, *Florin Balasa (University of Illinois at Chicago, United States)
Page	pp. 486 - 491
Keyword	memory management, signal-to-memory mapping, intra-array mapping
Abstract	The storage requirements in data-dominant signal processing systems, whose behavior is described by array-based, loop-organized algorithmic specifications, have an important impact on the overall energy consumption, data access latency, and chip area. Finding the optimal storage of the usually large arrays from these behavioral specifications is an important step during memory allocation. This paper proposes more efficient algorithms for two intra-array mapping-to-memory models (of De Greef and Troncon), resulting in implementations several times faster than the original ones.

5B-3 (Time: 14:20 - 14:45)

Title	MODLEX: A Multi Objective Data Layout EXploration Framework for Embedded Systems-on-Chip
Author	*Rajesh Kumar T. S. (Texas Instruments India, India), Ravikumar C. P. (Texas Instruments, India), Govindarajan R. (Indian Institute of Science, India)
Page	pp. 492 - 497
Keyword	Memory Architecture, Data Layout, Power-performance Trade-off, Genetic Algorithm
Abstract	The memory subsystem is a major contributor to the performance, power, and area of complex SoCs used in feature-rich multimedia products. Hence, the memory architecture of the embedded DSP is complex and usually custom designed, with multiple banks of single-ported or dual-ported on-chip scratch pad memory and multiple banks of off-chip memory. Building software for such large, complex memories, with many of the software components being individually optimized software IPs, is a big challenge. In order to obtain good performance and a reduction in memory stalls, the data buffers of the application need to be placed carefully in different types of memory. In this paper we present a unified framework (MODLEX) that combines different data layout optimizations to address the complex DSP memory architectures. Our method models the data layout problem as a multi-objective Genetic Algorithm (GA) with performance and power being the objectives, and presents a set of solution points, which is attractive from a platform design viewpoint. While most of the work in the literature assumes that performance and power are non-conflicting objectives, our work demonstrates that a significant trade-off (up to 70%) is possible between power and performance.

5B-4 (Time: 14:45 - 15:10)

Title	A Run-Time Memory Protection Methodology
Author	*Udaya Seshua (Philips Semiconductors, India), Nagaraju Bussa (Philips Research, India), Bart Vermeulen (Philips Research, Netherlands)
Page	pp. 498 - 503
Keyword	memory protection, software debug, Hardware/Software co-design
Abstract	In this paper we present a novel methodology, which aids in debugging memory corruption errors during application development. This methodology is based on the analysis of the memory access behavior of a set of benchmark applications. The analysis result is used to strike an optimal balance between hardware and software instrumentation to make our approach low-cost both from a performance penalty and hardware area point-of-view. Experimental results show that our innovative approach typically requires less than 2% of CPU silicon area for less than 1% run-time performance overhead, making it applicable in time-constrained embedded systems.

5B-5 (Time: 15:10 - 15:35)

Title	Short-Circuit Compiler Transformation: Optimizing Conditional Blocks
Author	*Mohammad Ali Ghodrat, Tony Givargis, Alex Nicolau (University of California, Irvine, United States)
Page	pp. 504 - 510
Keyword	Short circuit evaluation, lazy evaluation, compiler transformation, domain space partitioning
Abstract	We present the short-circuit code transformation technique, intended for embedded compilers. The transformation technique optimizes conditional blocks in high-level programs. Specifically, the transformation takes advantage of the fact that the Boolean value of the conditional expression, determining the true/false paths, can be statically analyzed to determine cases when one or the other of the true/false paths is guaranteed to execute. In such cases, code is generated to bypass the evaluation of the conditional expression. In instances when the bypass code is faster to evaluate than the conditional expression, a net performance gain is obtained. Our experiments with the Mediabench applications show that the short-circuit transformation yields an average of 35.1% improvement in execution time for SPARC and an average of 36.3% improvement in execution time for ARM. We also measured an average of 36.4% reduction in power consumption for ARM.
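
Illustration (not from the paper): the bypass idea described in the abstract above can be sketched on a toy conditional. The condition `x*x + y*y < 100`, the function names, and the chosen bypass regions are all hypothetical and hand-derived here; the paper's actual analysis and code generation are not reproduced. The sketch only shows the general shape: cheap range checks that are guaranteed to imply one path, with a fallback to the original expression everywhere else.

```python
def original(x: int, y: int) -> str:
    """Original conditional block: the full expression is always evaluated."""
    if x * x + y * y < 100:
        return "true-path"
    return "false-path"


def transformed(x: int, y: int) -> str:
    """Same behavior, with hand-derived (hypothetical) bypass checks first."""
    # Bypass 1: inside the box |x| <= 7 and |y| <= 7 the sum is at most
    # 49 + 49 = 98 < 100, so the true path is guaranteed.
    if -7 <= x <= 7 and -7 <= y <= 7:
        return "true-path"
    # Bypass 2: if |x| >= 10 or |y| >= 10, one square alone is >= 100,
    # so the false path is guaranteed.
    if x >= 10 or x <= -10 or y >= 10 or y <= -10:
        return "false-path"
    # Remaining region: nothing is guaranteed, so evaluate the original
    # conditional expression.
    return original(x, y)


if __name__ == "__main__":
    for point in [(0, 0), (8, 5), (8, 8), (12, 0)]:
        print(point, transformed(*point))
```

As the abstract notes, a gain is obtained only when the bypass checks are cheaper than the conditional expression; in this toy case that depends on the target, and the sketch makes no performance claim.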