Title | Template-based Memory Access Engine for Accelerators in SoCs |
Author | *Bin Li, Zhen Fang, Ravi Iyer (Intel Corporation, U.S.A.) |
Page | pp. 147 - 153 |
Keyword | SoC, Accelerators, Memory systems |
Abstract | With the rapid progress in semiconductor technologies, more and more accelerators can be integrated onto a single SoC chip. In SoCs, accelerators often require deterministic data access. However, as more and more applications run simultaneously, latency can vary significantly due to contention. To address this problem, we propose a template-based memory access engine (MAE) for accelerators in SoCs. The proposed MAE can handle several common memory access patterns observed for near-future accelerators. Our evaluation results show that the proposed MAE can significantly reduce memory access latency and jitter, making it very effective for accelerators in SoCs. |
Slides |
Title | Realization and Performance Comparison of Sequential and Weak Memory Consistency Models in Network-on-Chip based Multi-core Systems |
Author | *Abdul Naeem, Xiaowen Chen, Zhonghai Lu, Axel Jantsch (Royal Institute of Technology, Sweden) |
Page | pp. 154 - 159 |
Keyword | Memory consistency, Distributed shared memory |
Abstract | This paper studies the realization and performance comparison of the sequential and weak consistency models in network-on-chip (NoC) based distributed shared memory (DSM) multi-core systems. Memory consistency constrains the order of shared memory operations to ensure the expected behavior of multi-core systems. Both consistency models are realized in NoC based multi-core systems. The performance of the two consistency models is compared for networks of various sizes using regular mesh topologies and a deflection routing algorithm. The results show that weak consistency improves performance by 46.17% and 33.76% on average in the code and consistency latencies over the sequential consistency model, due to the program order relaxation, as the system grows from a single core to 64 cores. |
Slides |
Title | Minimizing Buffer Requirements for Throughput Constrained Parallel Execution of Synchronous Dataflow Graph |
Author | Tae-ho Shin (Seoul National University, Republic of Korea), Hyunok Oh (Hanyang University, Republic of Korea), *Soonhoi Ha (Seoul National University, Republic of Korea) |
Page | pp. 165 - 170 |
Keyword | Synchronous Dataflow Graph, Static mapping, Dynamic scheduling, Buffer size minimization |
Abstract | This paper concerns throughput-constrained parallel execution of synchronous dataflow (SDF) graphs. It assumes static mapping and dynamic scheduling of the nodes, which has several benefits over static scheduling approaches. We determine the buffer size of all arcs to minimize the total buffer size while satisfying a throughput constraint. By unfolding the given SDF graph, dynamic scheduling can achieve throughput similar to that of static scheduling. A key issue of dynamic scheduling is how to assign a priority to each node invocation, which is also discussed in this paper. Since the problem is NP-hard, we present a heuristic based on a genetic algorithm. The experimental results confirm the viability of the proposed technique. |
Slides |