ASP-DAC 2009 Technical Program

The 14th Asia and South Pacific Design Automation Conference

Session 9A Memory Systems Simulation and Optimization
Time: 15:55 - 18:00 Thursday, January 22, 2009
Location: Room 411+412
Chair: Zonghua Gu (Hong Kong University of Science and Technology, Hong Kong)

9A-1 (Time: 15:55 - 16:20)

Title	Soft Lists: A Native Index Structure for NOR-Flash-Based Embedded Devices
Author	*Li-Pin Chang, Chen-Hui Hsu (National Chiao Tung University, Taiwan)
Page	pp. 799 - 804
Keyword	flash memory, embedded system, storage systems, data structure
Abstract	Efficient data indexing is important to embedded devices, because both CPU cycles and energy are very precious resources. Soft lists, a new index structure for embedded devices with NOR flash, are proposed. The challenge of data indexing over NOR flash is that data updates and pointer updates may recursively trigger each other. Our approach is to allow a bounded number of probes when a pointer is de-referenced. In this way, update and garbage collection are greatly simplified, because data can be moved between physical locations without invalidating any pointers. Even better, search with soft lists is very fast, because the probes provide opportunities for forward random skips. Soft lists are evaluated and compared against a tree-based index, and are shown to be simple yet efficient.

9A-2 (Time: 16:20 - 16:45)

Title	Energy-aware Register File Re-Partitioning for Clustered VLIW Architectures
Author	*Chun Jason Xue, Minming Li, Yingchao Zhao, Bessie Hu (City University of Hong Kong, Hong Kong)
Page	pp. 805 - 810
Keyword	register file, partition, energy
Abstract	VLIW architectures have gained acceptance in embedded systems. A traditional monolithic register file is not suitable for VLIW architectures with a large number of functional units. A clustered VLIW architecture is often applied instead, where the register file is partitioned into a number of smaller register files. Register files account for a substantial portion of the energy consumption in modern processors, and this portion is growing rapidly with wider instruction width. Most of the known clustered VLIW architectures partition the register file evenly among clusters. In this paper, we study the effect of register file re-partitioning on energy consumption in clustered VLIW architectures, where register files are not necessarily partitioned evenly. We present algorithms to compute energy-efficient re-partitions of register files under different conditions. The impact of different intercluster communication models, as well as the impact of program behavior, on register file re-partitioning is analyzed. Experimental results show that energy savings can be achieved using the proposed techniques.

9A-3 (Time: 16:45 - 17:10)

Title	Memory Subsystem Simulation in Software TLM/T Models
Author	*Eric Cheung, Harry Hsieh (University of California, Riverside, United States), Felice Balarin (Cadence Design Systems, United States)
Page	pp. 811 - 816
Keyword	Multiprocessor Simulation, Memory Subsystem Simulation, TLM/T
Abstract	Design of Multiprocessor System-on-a-Chips requires efficient and accurate simulation of every component. Since the memory subsystem accounts for up to 50% of the performance and energy expenditures, it has to be considered in system-level design space exploration. In this paper, we present a novel technique to simulate memory accesses in software TLM/T models. We use a compiler to automatically expose all memory accesses in software and annotate them onto efficient TLM/T models. A reverse address map provides target memory addresses for accurate cache and memory simulation. Simulating at more than 10 MHz, our models allow realistic architectural design space explorations on memory subsystems. We demonstrate our approach with a design exploration case study of an industrial-strength MPEG-2 decoder.

9A-4 (Time: 17:10 - 17:35)

Title	Exact and Fast L1 Cache Simulation for Embedded Systems
Author	*Nobuaki Tojo, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki (Waseda University, Japan)
Page	pp. 817 - 822
Keyword	cache, design space exploration, cache simulation, cache optimization
Abstract	In recent years, the gap between processor cycle time and memory access time has been increasing. One solution to this problem is to use a cache, but simply using a large cache may not reduce the total memory access time. An optimal cache configuration that minimizes overall memory access time can be found by varying three cache parameters: the cache set size, the line size, and the associativity. In this paper, we propose two exact cache simulation algorithms, CRCB1 and CRCB2, based on the cache inclusion property. They preserve exact simulation results while increasing simulation speed dramatically. With our approach, the number of cache hit/miss judgments required to simulate all the cache configurations is reduced to between 31.4% and 93.6% compared to conventional approaches. As a result, our approach runs on average 1.8 times faster, and at most 3.3 times faster, than the fastest approach proposed so far. Our exact cache simulation approach achieves the world's fastest L1 cache simulation.
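
The inclusion property mentioned in this abstract can be illustrated with a short sketch. This is not CRCB1 or CRCB2, whose details are not given here; it is the classic single-pass LRU stack-distance idea, assuming LRU replacement and a fixed number of sets and line size. Under those assumptions a cache with w ways holds exactly the w most recently used blocks of each set, so one pass over the trace settles hit or miss for every associativity at once. The function name and parameters are hypothetical.

```python
from collections import defaultdict

def misses_by_associativity(addresses, num_sets, line_size, max_ways):
    """Count LRU misses for associativities 1..max_ways in a single pass."""
    stacks = defaultdict(list)      # per-set LRU stack, most recent block first
    misses = [0] * (max_ways + 1)   # misses[w] is for a w-way cache; index 0 unused
    for addr in addresses:
        block = addr // line_size
        stack = stacks[block % num_sets]
        if block in stack:
            depth = stack.index(block) + 1   # 1-based LRU stack distance
            stack.remove(block)
        else:
            depth = max_ways + 1             # not held even by the largest cache
        # Inclusion: a w-way cache holds the top w stack entries,
        # so it hits exactly when depth <= w and misses when w < depth.
        for ways in range(1, min(depth, max_ways + 1)):
            misses[ways] += 1
        stack.insert(0, block)
        del stack[max_ways:]                 # drop what no simulated cache keeps
    return misses[1:]

# One set, 16-byte lines, block trace A B A C B A
print(misses_by_associativity([0, 16, 0, 32, 16, 0], 1, 16, 2))  # [6, 5]
```

Because each access is judged once per set rather than once per configuration, the number of hit/miss judgments shrinks, which is the kind of saving an inclusion-based simulator exploits.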

9A-5 (Time: 17:35 - 18:00)

Title	Accuracy-Aware SRAM: A Reconfigurable Low Power SRAM Architecture for Mobile Multimedia Applications
Author	Minki Cho (Georgia Institute of Technology, United States), Jason Schlessman (Princeton University, United States), *Wayne Wolf, Saibal Mukhopadhyay (Georgia Institute of Technology, United States)
Page	pp. 823 - 828
Keyword	Memory, Power, Variation, Multimedia, SRAM
Abstract	We propose a dynamically reconfigurable SRAM architecture for low-power mobile multimedia applications. Parametric failures due to manufacturing variations limit the opportunities for power saving in SRAM. We show that, using a lower voltage for cells storing low-order bits and a nominal voltage for cells storing higher order bits, ~45% savings in memory power can be achieved with a marginal (~10%) reduction in image quality. A reconfigurable array structure is developed to dynamically reconfigure the number of bits in different voltage domains.
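
To make the idea of mixed voltage domains concrete, the following sketch models what happens to 8-bit pixel values when only the low-order bits are exposed to read failures. The failure model (independent bit flips with a fixed probability) and all names and parameters are assumptions for illustration; the abstract does not specify them, and the sketch does not model power at all.

```python
import random

def read_back(pixels, low_bits, flip_prob, rng):
    """Return pixels as read from a hypothetical mixed-voltage array.

    The `low_bits` least significant bits of each pixel sit in low-voltage
    cells and flip independently with probability `flip_prob` (assumed model).
    Higher-order bits sit in nominal-voltage cells and are read exactly.
    """
    out = []
    for value in pixels:
        for bit in range(low_bits):
            if rng.random() < flip_prob:
                value ^= 1 << bit
        out.append(value)
    return out

def worst_case_error(low_bits):
    """Largest possible per-pixel error when only low_bits bits can flip."""
    return (1 << low_bits) - 1

rng = random.Random(0)              # fixed seed so runs are repeatable
original = list(range(0, 256, 17))
noisy = read_back(original, low_bits=3, flip_prob=0.1, rng=rng)
# Errors never exceed worst_case_error(3) == 7 because the top bits are protected.
print(max(abs(a - b) for a, b in zip(original, noisy)) <= worst_case_error(3))
```

Changing `low_bits` at run time corresponds to moving the boundary between the two voltage domains: more low-voltage bits means a larger worst-case pixel error.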