ASP-DAC 2013 Technical Program

The 18th Asia and South Pacific Design Automation Conference

Session 8B Revisiting Latency and Reliability in Memory Architectures
Time: 13:40 - 15:40 Friday, January 25, 2013
Chairs: Luca Carloni (Columbia University, U.S.A.), Fabien Clermidy (CEA-LETI, France)

8B-1 (Time: 13:40 - 14:10)

Title	Reevaluating the Latency Claims of 3D Stacked Memories
Author	*Daniel W. Chang (University of Wisconsin, Madison, U.S.A.), Gyungsu Byun (West Virginia University, U.S.A.), Hoyoung Kim, Minwook Ahn, Soojung Ryu (Samsung Electronics Co., Ltd., Republic of Korea), Nam S. Kim, Michael Schulte (University of Wisconsin, Madison, U.S.A.)
Page	pp. 657 - 662
Keyword	DRAM, 3D main memory, 3D memory latency, digital signal processor, embedded systems
Abstract	In recent years, 3D technology has been a popular area of study that has allowed researchers to explore a number of novel computer architectures. One of the more popular topics is that of integrating 3D main memory dies below the computing die and connecting them with through-silicon vias (TSVs). This is assumed to reduce off-chip main memory access latencies by roughly 45% to 60%. Our detailed circuit-level models, however, demonstrate that this latency reduction from the TSVs is significantly less. In this paper, we present these models, compare 2D and 3D main memory latencies, and show that the reduction in latency from using 3D main memory is no more than 2.4 ns. We also show that although the wider I/O bus width enabled by using TSVs increases performance, it may do so with an increase in power consumption. Although TSVs consume less power per bit transfer than off-chip metal interconnects (11.2 times less power per bit transfer), TSVs typically use considerably more bits and may result in a net increase in power due to the large number of bits in the memory I/O bus. Our analysis shows that although a 3D memory hierarchy exploiting a wider memory bus can increase performance, this performance increase may not justify the net increase in power consumption.
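The power trade-off described in this abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not the authors' circuit-level model: it assumes that I/O power scales linearly with the number of bus bits and that every line transfers at the same rate. Only the 11.2x per-bit figure comes from the abstract; the function name and the bus widths are hypothetical.

```python
# Per-bit power advantage of a TSV over an off-chip interconnect (from the abstract).
TSV_POWER_ADVANTAGE = 11.2


def relative_io_power(width_2d_bits, width_3d_bits, advantage=TSV_POWER_ADVANTAGE):
    """Return (3D I/O power) / (2D I/O power) under a linear-scaling assumption.

    Assumes every bus line transfers at the same rate, so total power is
    proportional to bus width times per-bit power. This is a simplification.
    """
    return (width_3d_bits / width_2d_bits) / advantage


if __name__ == "__main__":
    # Hypothetical widths: a 64-bit off-chip bus versus wider TSV buses.
    for width_3d in (512, 1024):
        ratio = relative_io_power(64, width_3d)
        print(f"64-bit 2D bus vs {width_3d}-bit 3D bus: {ratio:.2f}x I/O power")
```

Under these assumptions the break-even point is a TSV bus 11.2 times wider than the off-chip bus; any wider bus yields a net increase in I/O power.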

8B-2 (Time: 14:10 - 14:40)

Title	Heterogeneous Memory Management for 3D-DRAM and External DRAM with QoS
Author	*Le-Nguyen Tran (University of California, Irvine, U.S.A.), Houman Homayoun (George Mason University, U.S.A.), Fadi J. Kurdahi, Ahmed M. Eltawil (University of California, Irvine, U.S.A.)
Page	pp. 663 - 668
Keyword	3D-DRAM, Memory management, QoS, Heterogeneous memory system, computer architecture
Abstract	This paper presents an innovative memory management approach to utilize both 3D-DRAM and external DRAM (ex-DRAM). Our approach dynamically allocates and relocates memory blocks between the 3D-DRAM and the ex-DRAM to exploit the high memory bandwidth and the low memory latency of the 3D-DRAM as well as the high capacity and the low cost of the ex-DRAM. Our simulation shows that in workloads that are not memory intensive, our memory management technique transfers all active memory blocks to the 3D-DRAM, which runs faster than the ex-DRAM. In memory-intensive workloads, our memory management technique utilizes both the 3D-DRAM and the ex-DRAM to increase the memory bandwidth and alleviate bandwidth congestion. Our approach also supports Quality of Service (QoS) for “latency sensitive”, “bandwidth sensitive”, and “insensitive” applications. To improve the performance and satisfy a certain level of QoS, memory blocks of different application types are allocated differently. Compared to the scratchpad memory management mechanism, the average memory access latency of our approach decreases by 19% and 23%, and the performance improves by up to 5% and 12%, in single-threaded and multi-threaded benchmarks, respectively. Moreover, using our approach, applications do not need to manage the memory explicitly as in the scratchpad case. Our memory block relocation comes with negligible performance overhead, particularly for applications that have high spatial memory locality.

8B-3 (Time: 14:40 - 15:10)

Title	Line Sharing Cache: Exploring Cache Capacity with Frequent Line Value Locality
Author	*Keitarou Oka (Graduate School of Information Science and Electrical Engineering, Kyushu University, Japan), Hiroshi Sasaki, Koji Inoue (Faculty of Information Science and Electrical Engineering, Kyushu University, Japan)
Page	pp. 669 - 674
Keyword	Cache Memory, Frequent Value Locality, Compression
Abstract	This paper proposes a new last-level cache (LLC) architecture called line sharing cache (LSC), which reduces the number of misses without increasing the size of the cache memory. LSC stores lines that have identical values in a single line entry, which allows a greater number of lines to be stored. Evaluation results show performance improvements of up to 35% across a set of SPEC CPU2000 benchmarks.
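The line-sharing idea can be sketched as a toy software model. This is an illustration under stated assumptions, not the LSC hardware design: capacity is counted in distinct line values, replacement is LRU over addresses, and accesses are read-only. The class name, method names, and replacement policy are all hypothetical.

```python
from collections import OrderedDict


class LineSharingCacheSketch:
    """Toy model: addresses holding identical line values share one data entry."""

    def __init__(self, data_entries):
        # Assumes data_entries >= 1; capacity counts distinct line values.
        self.data_entries = data_entries
        self.tag_to_value = OrderedDict()  # address -> line value, in LRU order
        self.refcount = {}                 # line value -> number of sharing addresses

    def access(self, address, line_value):
        """Return True on a hit, False on a miss (read-only accesses assumed)."""
        if address in self.tag_to_value:
            self.tag_to_value.move_to_end(address)
            return True
        # A new data entry is needed only if this value is not already stored.
        while (line_value not in self.refcount
               and len(self.refcount) >= self.data_entries):
            self._evict_lru()
        self.tag_to_value[address] = line_value
        self.refcount[line_value] = self.refcount.get(line_value, 0) + 1
        return False

    def _evict_lru(self):
        # Drop the least recently used address; free its data entry if unshared.
        _, value = self.tag_to_value.popitem(last=False)
        self.refcount[value] -= 1
        if self.refcount[value] == 0:
            del self.refcount[value]


if __name__ == "__main__":
    cache = LineSharingCacheSketch(data_entries=2)
    zero_line = bytes(64)
    for addr in range(4):
        cache.access(addr, zero_line)       # four all-zero lines, one data entry
    print(len(cache.tag_to_value), "addresses share", len(cache.refcount), "data entry")
```

In this model, four addresses with the same contents occupy a single data entry, whereas a conventional two-entry cache could hold only two of them.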

8B-4 (Time: 15:10 - 15:40)

Title	ShieldUS: A Novel Design of Dynamic Shielding for Eliminating 3D TSV Crosstalk Coupling Noise
Author	*Yuan-Ying Chang, Yoshi Shih-Chieh Huang (National Tsing Hua University, Taiwan), Vijaykrishnan Narayanan (Pennsylvania State University, U.S.A.), Chung-Ta King (National Tsing Hua University, Taiwan)
Page	pp. 675 - 680
Keyword	TSV, crosstalk
Abstract	3D IC is a promising technology to meet the demands of high throughput, high scalability, and low power consumption for future generation integrated circuits. One way to implement the 3D IC is to interconnect layers of two-dimensional (2D) ICs with Through-Silicon Vias (TSVs), which shorten the signal lengths. Unfortunately, when TSVs are bundled together as a cluster, the crosstalk coupling noise may lead to transmission errors. As a result, the working frequency of TSVs has to be lowered to avoid the errors, which narrows the bandwidth that TSVs can provide. In this paper, we first derive the crosstalk noise model from the perspective of a 3D chip and then propose ShieldUS, a runtime data-to-TSVs remapping strategy. With ShieldUS, the transition patterns of data over TSVs are observed at runtime, and relatively stable bits are mapped to the TSVs that act as shields to protect the other bits, which fluctuate more. We evaluate the performance of ShieldUS with address lines from real benchmark traces and data lines of different self-similarities. The results show that ShieldUS is accurate and flexible. We further study dynamic shielding; our design of the Interval Equilibration Unit (IEU) can intelligently select suitable parameters for dynamic shielding, which makes dynamic shielding practical without the need to predefine parameters. This also improves the practicability of ShieldUS.
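The remapping idea in this abstract (observe per-bit transitions, then route the most stable bits to shield positions) can be sketched as follows. This is a simplified illustration, not the ShieldUS design: the observation window, the choice of shield TSV positions, and all function names are assumptions, and the crosstalk model and the IEU are not modeled.

```python
def count_transitions(words, width):
    """Count, per bit position, how often the bit toggles between consecutive words."""
    counts = [0] * width
    for prev, cur in zip(words, words[1:]):
        diff = prev ^ cur
        for bit in range(width):
            counts[bit] += (diff >> bit) & 1
    return counts


def remap_bits(words, width, shield_tsvs):
    """Return a dict {data bit -> TSV index} that puts the most stable bits on shields.

    shield_tsvs is a hypothetical list of TSV indices chosen to act as shields;
    the remaining TSVs carry the more active bits.
    """
    counts = count_transitions(words, width)
    # Most stable bits first; ties are broken by bit index for determinism.
    by_stability = sorted(range(width), key=lambda b: (counts[b], b))
    shields = list(shield_tsvs)
    shield_set = set(shields)
    others = [t for t in range(width) if t not in shield_set]
    return dict(zip(by_stability, shields + others))


if __name__ == "__main__":
    # Hypothetical 4-bit window: bits 0 and 1 toggle, bits 2 and 3 stay constant.
    window = [0b0001, 0b0010, 0b0001, 0b0010]
    print(remap_bits(window, width=4, shield_tsvs=[1, 3]))
```

Here the two constant bits land on the designated shield TSVs, while the two toggling bits take the remaining positions.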