ASP-DAC 2011 Technical Program

The 16th Asia and South Pacific Design Automation Conference

Session 2B Memory Architecture and Buffer Optimization
Time: 13:40 - 15:40 Wednesday, January 26, 2011
Location: Room 413
Chairs: Yu Wang (Tsinghua University, China), Yinhe Han (Chinese Academy of Sciences, China)

2B-1 (Time: 13:40 - 14:10)

Title	Template-based Memory Access Engine for Accelerators in SoCs
Author	*Bin Li, Zhen Fang, Ravi Iyer (Intel Corporation, U.S.A.)
Page	pp. 147 - 153
Keyword	SoC, Accelerators, Memory systems
Abstract	With the rapid progress in semiconductor technologies, more and more accelerators can be integrated onto a single SoC chip. In SoCs, accelerators often require deterministic data access. However, as more and more applications run simultaneously, latency can vary significantly due to contention. To address this problem, we propose a template-based memory access engine (MAE) for accelerators in SoCs. The proposed MAE can handle several common memory access patterns observed for near-future accelerators. Our evaluation results show that the proposed MAE can significantly reduce memory access latency and jitter, and is thus very effective for accelerators in SoCs.

2B-2 (Time: 14:10 - 14:40)

Title	Realization and Performance Comparison of Sequential and Weak Memory Consistency Models in Network-on-Chip based Multi-core Systems
Author	*Abdul Naeem, Xiaowen Chen, Zhonghai Lu, Axel Jantsch (Royal Institute of Technology, Sweden)
Page	pp. 154 - 159
Keyword	Memory consistency, Distributed shared memory
Abstract	This paper studies the realization and performance comparison of the sequential and weak consistency models in network-on-chip (NoC) based distributed shared memory (DSM) multi-core systems. Memory consistency constrains the order of shared memory operations to ensure the expected behavior of multi-core systems. Both consistency models are realized in NoC based multi-core systems. The performance of the two consistency models is compared for networks of various sizes using regular mesh topologies and a deflection routing algorithm. The results show that weak consistency improves performance by 46.17% and 33.76% on average in the code and consistency latencies over the sequential consistency model, due to the program order relaxation, as the system grows from a single core to 64 cores.

2B-3 (Time: 14:40 - 15:10)

Title	Network-on-Chip Router Design with Buffer-Stealing
Author	Wan-Ting Su, *Jih-Sheng Shen, Pao-Ann Hsiung (National Chung Cheng University, Taiwan)
Page	pp. 160 - 164
Keyword	NoC, Buffer Design
Abstract	A Buffer-Stealing (BS) mechanism is proposed, which enables the input channels in NoC routers that have insufficient buffer space to utilize at runtime the unused input buffers from other input channels. Implementation results of the proposed BS design for a 64-bit 5-input-buffer router show a reduction of the average packet transmission latency by up to 10.17% and an increase of the average throughput by up to 23.47%, at an overhead of 22% more hardware resources.

2B-4 (Time: 15:10 - 15:40)

Title	Minimizing Buffer Requirements for Throughput Constrained Parallel Execution of Synchronous Dataflow Graph
Author	Tae-ho Shin (Seoul National University, Republic of Korea), Hyunok Oh (Hanyang University, Republic of Korea), *Soonhoi Ha (Seoul National University, Republic of Korea)
Page	pp. 165 - 170
Keyword	Synchronous Dataflow Graph, Static mapping, Dynamic scheduling, Buffer size minimization
Abstract	This paper concerns throughput-constrained parallel execution of synchronous dataflow (SDF) graphs. It assumes static mapping and dynamic scheduling of the nodes, which has several benefits over static scheduling approaches. We determine the buffer size of all arcs to minimize the total buffer size while satisfying a throughput constraint. By unfolding the given SDF graph, dynamic scheduling is able to achieve throughput similar to that of static scheduling. A key issue of dynamic scheduling is how to assign the priority to each node invocation, which is also discussed in this paper. Since the problem is NP-hard, we present a heuristic based on a genetic algorithm. The experimental results confirm the viability of the proposed technique.