Title | Soft Lists: A Native Index Structure for NOR-Flash-Based Embedded Devices |
Author | *Li-Pin Chang, Chen-Hui Hsu (National Chiao Tung University, Taiwan) |
Page | pp. 799 - 804 |
Keyword | flash memory, embedded system, storage systems, data structure |
Abstract | Efficient data indexing is important for embedded devices, because both CPU cycles and energy are precious resources. Soft lists, a new index structure for embedded devices with NOR flash, are proposed. The challenge of data indexing over NOR flash is that data updates and pointer updates may recursively trigger each other. Our approach is to allow a bounded number of probes when a pointer is dereferenced. In this way, updates and garbage collection are greatly simplified, because data can be moved between physical locations without invalidating any pointers. Even better, search with soft lists is very fast, because the probes provide opportunities for forward random skips. Soft lists are evaluated against a tree-based index and are shown to be simple but efficient. |
Title | Energy-aware Register File Re-Partitioning for Clustered VLIW Architectures |
Author | *Chun Jason Xue, Minming Li, Yingchao Zhao, Bessie Hu (City University of Hong Kong, Hong Kong) |
Page | pp. 805 - 810 |
Keyword | register file, partition, energy |
Abstract | VLIW architectures have gained acceptance in embedded systems. A traditional monolithic register file is not suitable for VLIW architectures with a large number of functional units. A clustered VLIW architecture is often used instead, in which the register file is partitioned into a number of smaller register files. Register files account for a substantial portion of the energy consumption in modern processors, and it grows rapidly with wider instruction width. Most known clustered VLIW architectures partition the register file evenly among clusters. In this paper, we study the effect on energy consumption of register file re-partitioning on clustered VLIW architectures, where register files are not necessarily partitioned evenly. We present algorithms to compute energy-efficient re-partitions of register files under different conditions. The impact of different intercluster communication models and of program behavior on register file re-partitioning is also analyzed. Experimental results show that energy savings can be achieved using the proposed techniques. |
Title | Memory Subsystem Simulation in Software TLM/T Models |
Author | *Eric Cheung, Harry Hsieh (University of California, Riverside, United States), Felice Balarin (Cadence Design Systems, United States) |
Page | pp. 811 - 816 |
Keyword | Multiprocessor Simulation, Memory Subsystem Simulation, TLM/T |
Abstract | Design of Multiprocessor Systems-on-a-Chip requires efficient and accurate simulation of every component. Since the memory subsystem accounts for up to 50% of the performance and energy expenditures, it has to be considered in system-level design space exploration. In this paper, we present a novel technique to simulate memory accesses in software TLM/T models. We use a compiler to automatically expose all memory accesses in software and annotate them onto efficient TLM/T models. A reverse address map provides target memory addresses for accurate cache and memory simulation. Simulating at more than 10 MHz, our models allow realistic architectural design space explorations on memory subsystems. We demonstrate our approach with a design exploration case study of an industrial-strength MPEG-2 decoder. |
Title | Exact and Fast L1 Cache Simulation for Embedded Systems |
Author | *Nobuaki Tojo, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki (Waseda University, Japan) |
Page | pp. 817 - 822 |
Keyword | cache, design space exploration, cache simulation, cache optimization |
Abstract | In recent years, the gap between processor cycle time and memory access time has been increasing. One solution to this problem is to use a cache, but simply using a large cache may not reduce the total memory access time. An optimal cache configuration, which minimizes overall memory access time, can be found by varying three cache parameters: the cache set size, the line size, and the associativity. In this paper, we propose two exact cache simulation algorithms, CRCB1 and CRCB2, based on the cache inclusion property. They perform exact cache simulation while increasing simulation speed dramatically. With our approach, the number of cache hit/miss judgments required for simulating all the cache configurations is reduced to 31.4% to 93.6% of that of conventional approaches. As a result, our proposed approach runs an average of 1.8 times faster and a maximum of 3.3 times faster than the fastest approach proposed so far. Our proposed exact cache simulation approach achieves the world's fastest L1 cache simulation. |
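The inclusion property mentioned in the abstract above can be illustrated with a small sketch. This is not CRCB1 or CRCB2, whose details are not given here; it is the classic single-pass LRU stack-distance idea, assuming LRU replacement and a fixed number of sets and line size. Under those assumptions, a block found at depth d of a set's LRU stack is a hit for every associativity greater than d, so one pass over the trace yields hit counts for all associativities. The function names and the trace are hypothetical.

```python
from collections import defaultdict


def simulate_all_associativities(addresses, num_sets, line_size, max_assoc):
    """One pass over the trace; returns {assoc: hit_count} for 1..max_assoc ways (LRU assumed)."""
    stacks = defaultdict(list)        # set index -> LRU stack, most recently used first
    hits_at_depth = [0] * max_assoc   # hits_at_depth[d]: accesses found at stack depth d
    for addr in addresses:
        block = addr // line_size
        stack = stacks[block % num_sets]
        if block in stack:
            depth = stack.index(block)
            stack.pop(depth)
            hits_at_depth[depth] += 1
        stack.insert(0, block)        # the accessed block becomes most recently used
        del stack[max_assoc:]         # keep only what the largest cache could hold
    # Inclusion: a hit at depth d is a hit for every associativity > d.
    result, running = {}, 0
    for assoc in range(1, max_assoc + 1):
        running += hits_at_depth[assoc - 1]
        result[assoc] = running
    return result


def simulate_one(addresses, num_sets, line_size, assoc):
    """Reference: simulate a single LRU configuration directly and return its hit count."""
    sets = defaultdict(list)
    hits = 0
    for addr in addresses:
        block = addr // line_size
        ways = sets[block % num_sets]
        if block in ways:
            hits += 1
            ways.remove(block)
        ways.insert(0, block)
        del ways[assoc:]
    return hits


# Hypothetical trace of byte addresses, 64-byte lines, a single set.
trace = [0, 64, 0, 128, 64, 0, 192, 0]
all_hits = simulate_all_associativities(trace, num_sets=1, line_size=64, max_assoc=4)
```

Here `all_hits` holds the hit count for each associativity from 1 to 4, and each entry agrees with a separate run of `simulate_one` for that associativity. The saving comes from replacing several per-configuration simulations with one pass over the trace.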
Title | Accuracy-Aware SRAM: A Reconfigurable Low Power SRAM Architecture for Mobile Multimedia Applications |
Author | Minki Cho (Georgia Institute of Technology, United States), Jason Schlessman (Princeton University, United States), *Wayne Wolf, Saibal Mukhopadhyay (Georgia Institute of Technology, United States) |
Page | pp. 823 - 828 |
Keyword | Memory, Power, Variation, Multimedia, SRAM |
Abstract | We propose a dynamically reconfigurable SRAM architecture for low-power mobile multimedia applications. Parametric failures due to manufacturing variations limit the opportunities for power saving in SRAM. We show that, by using a lower voltage for cells storing low-order bits and a nominal voltage for cells storing higher-order bits, ~45% savings in memory power can be achieved with a marginal (~10%) reduction in image quality. A reconfigurable array structure is developed to dynamically reconfigure the number of bits in different voltage domains. |