5B: Optimization Issues in Embedded Systems


5B-1
Title Retiming for Synchronous Data Flow Graphs
Author Nikolaos Liveris, Chuan Lin, Jia Wang, *Hai Zhou (Northwestern University, United States), Prithviraj Banerjee (University of Illinois, Chicago, United States)
Abstract In this paper, we present a new algorithm for retiming Synchronous Dataflow (SDF) graphs. The retiming aims at minimizing the cycle length of an SDF graph. The algorithm is provably optimal, and its execution time improves on that of previous approaches.
Slides (pdf file) 5B-1

5B-2
Title Signal-to-Memory Mapping Analysis for Multimedia Signal Processing
Author Ilie I. Luican, Hongwei Zhu, *Florin Balasa (University of Illinois at Chicago, United States)
Abstract The storage requirements in data-dominant signal processing systems, whose behavior is described by array-based, loop-organized algorithmic specifications, have an important impact on the overall energy consumption, data access latency, and chip area. Finding the optimal storage of the usually large arrays from these behavioral specifications is an important step during memory allocation. This paper proposes more efficient algorithms for two intra-array mapping-to-memory models (of De Greef and Troncon), resulting in implementations several times faster than the original ones.
Slides (pdf file) 5B-2

5B-3
Title MODLEX: A Multi Objective Data Layout EXploration Framework for Embedded Systems-on-Chip
Author *Rajesh Kumar T. S. (Texas Instruments India, India), Ravikumar C. P. (Texas Instruments, India), Govindarajan R. (Indian Institute of Science, India)
Abstract The memory subsystem is a major contributor to the performance, power, and area of complex SoCs used in feature-rich multimedia products. Hence, the memory architecture of the embedded DSP is complex and usually custom designed, with multiple banks of single-ported or dual-ported on-chip scratch-pad memory and multiple banks of off-chip memory. Building software for such large, complex memories, with many of the software components being individually optimized software IPs, is a big challenge. To obtain good performance and reduce memory stalls, the data buffers of the application need to be placed carefully in different types of memory. In this paper we present a unified framework (MODLEX) that combines different data layout optimizations to address complex DSP memory architectures. Our method models the data layout problem as a multi-objective Genetic Algorithm (GA) problem, with performance and power as the objectives, and presents a set of solution points, which is attractive from a platform design viewpoint. While most of the work in the literature assumes that performance and power are non-conflicting objectives, our work demonstrates that a significant trade-off (up to 70%) is possible between power and performance.
Slides (pdf file) 5B-3

5B-4
Title A Run-Time Memory Protection Methodology
Author *Udaya Seshua (Philips Semiconductors, India), Nagaraju Bussa (Philips Research, India), Bart Vermeulen (Philips Research, Netherlands)
Abstract In this paper we present a novel methodology that aids in debugging memory corruption errors during application development. This methodology is based on an analysis of the memory access behavior of a set of benchmark applications. The analysis result is used to strike an optimal balance between hardware and software instrumentation, making our approach low-cost from both a performance-penalty and a hardware-area point of view. Experimental results show that our approach typically requires less than 2% of CPU silicon area for less than 1% run-time performance overhead, making it applicable in time-constrained embedded systems.
Slides (pdf file) 5B-4

5B-5
Title Short-Circuit Compiler Transformation: Optimizing Conditional Blocks
Author *Mohammad Ali Ghodrat, Tony Givargis, Alex Nicolau (University of California, Irvine, United States)
Abstract We present the short-circuit code transformation technique, intended for embedded compilers. The transformation technique optimizes conditional blocks in high-level programs. Specifically, the transformation takes advantage of the fact that the Boolean value of the conditional expression, which determines the true/false paths, can be statically analyzed to determine cases in which one or the other of the true/false paths is guaranteed to execute. In such cases, code is generated to bypass the evaluation of the conditional expression. In instances when the bypass code is faster to evaluate than the conditional expression, a net performance gain is obtained. Our experiments with the Mediabench applications show that the short-circuit transformation yields an average of 35.1% improvement in execution time for SPARC and an average of 36.3% improvement in execution time for ARM. We also measured an average of 36.4% reduction in power consumption for ARM.
Slides (pdf file) 5B-5
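
Illustration for 5B-5: the bypass idea in the abstract can be sketched in a few lines of Python. This is only a hand-written illustration under stated assumptions, not the authors' compiler pass: the predicate `expensive_condition`, the guard thresholds, and the function names are all hypothetical, and the guards are derived by hand here rather than by static analysis.

```python
def expensive_condition(x: int) -> bool:
    """Hypothetical costly predicate standing in for a conditional expression.

    The polynomial factors as (x - 1)(x - 2)(x - 3).
    """
    return x**3 - 6 * x**2 + 11 * x - 6 > 0


def original(x: int) -> str:
    # Untransformed conditional block: the predicate is always evaluated.
    if expensive_condition(x):
        return "true-path"
    return "false-path"


def transformed(x: int) -> str:
    # Cheap guards (assumed to be what an analysis of the predicate yields):
    # for x > 3 all three factors are positive, so the true path is certain;
    # for x < 1 all three factors are negative, so the false path is certain.
    if x > 3:
        return "true-path"
    if x < 1:
        return "false-path"
    # Undecided region (1 <= x <= 3): fall back to the full predicate.
    if expensive_condition(x):
        return "true-path"
    return "false-path"
```

The transformed block pays off only when the guards are cheaper than the predicate and are hit often enough, which matches the condition stated in the abstract.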
Last Updated on: January 29, 2007