Title | Performance-Driven Register Write Inhibition in High-Level Synthesis under Strict Maximum-Permissible Clock Latency Range |
Author | *Keisuke Inoue, Mineo Kaneko (Japan Advanced Institute of Science and Technology, Japan) |
Page | pp. 239 - 244 |
Keyword | high-level synthesis, register write inhibition, FU binding, register binding |
Abstract | Clock skew scheduling is the process of assigning intentional clock skews to registers to improve circuit performance and reliability. Because of the growing effect of process variations, it is becoming increasingly difficult to reliably implement a large set of arbitrary clock latencies. Consequently, the optimization potential of clock skew scheduling is severely limited. This paper points out that circuit performance can be improved further by removing some register writes while preserving reliability. It is the first work on a clock skew-aware high-level synthesis framework that considers register write inhibition to minimize the clock period. A network flow-based heuristic algorithm to obtain the minimum clock period is presented and evaluated by experiments, which support the effectiveness of the approach. |
Title | Clock Period Minimization with Minimum Area Overhead in High-Level Synthesis of Nonzero Clock Skew Circuits |
Author | *Wen-Pin Tu, Shih-Hsu Huang, Chun-Hua Cheng (Chung Yuan Christian University, Taiwan) |
Page | pp. 245 - 250 |
Keyword | Clock Skew Optimization, Clock Period Minimization, High-Level Synthesis, Resource Binding, Area Minimization |
Abstract | Although clock skew can be utilized to reduce the clock period, its use also limits the sharing of resources (including registers and functional units). Previous works have considered the influence of clock arrival times on register sharing, but they pay no attention to the influence of clock arrival times on functional unit sharing. As a result, extra functional units are often required during functional unit binding. Based on that observation, in this paper we perform register binding and functional unit binding simultaneously for the high-level synthesis of nonzero clock skew circuits. Our objective is to minimize the circuit area while operating at the lower bound of the clock period. Benchmark data show that, compared with previous works, our approach achieves the lower bound of the clock period with a smaller area overhead. |
Title | Clock-Constrained Simultaneous Allocation and Binding for Multiplexer Optimization in High-Level Synthesis |
Author | *Yuko Hara-Azumi, Hiroyuki Tomiyama (Ritsumeikan University, Japan) |
Page | pp. 251 - 256 |
Keyword | High-level synthesis, Allocation, Binding, Multiplexer, Clock constraint |
Abstract | This paper proposes a novel simultaneous allocation and binding method in high-level synthesis that minimizes the circuit area, including multiplexers (MUXs), under a clock constraint. Most existing works on binding minimize MUXs under a given allocation by minimizing the number of interconnections, but do not consider where the MUXs would be inserted in the circuit. As a result, they cannot guarantee the required clock frequency and often violate the clock constraint. In contrast, our work globally optimizes binding and allocation for FUs and registers while meeting the clock constraint by considering where MUXs would be inserted. The problem is formulated as an integer linear programming (ILP) problem. An effective ILP-based heuristic for designs that are not small is also presented. Experimental results demonstrate that our work satisfies the clock constraint with the minimum circuit area. |
Title | An Integrated and Automated Memory Optimization Flow for FPGA Behavioral Synthesis |
Author | *Yuxin Wang (Computer Science Department, Peking University and UCLA/PKU Joint Research Institute in Science and Engineering, China), Peng Zhang (Computer Science Department, University of California, Los Angeles, U.S.A.), Xu Cheng (Computer Science Department, Peking University, China), Jason Cong (Computer Science Department, University of California, Los Angeles and UCLA/PKU Joint Research Institute in Science and Engineering, U.S.A.) |
Page | pp. 257 - 262 |
Keyword | Behavioral Synthesis, Memory Partitioning, Memory Merging |
Abstract | Behavioral synthesis tools have made significant progress in compiling high-level programs into register-transfer level (RTL) specifications. However, manual code rewriting is still necessary to obtain better quality of results in memory system optimization. In recent years, various automated memory optimization techniques, such as data reuse and memory partitioning, have been proposed and implemented, but integrating these techniques into an applicable flow that delivers better performance remains a challenge. In this paper we integrate data reuse, loop pipelining, memory partitioning, and memory merging into an automated optimization flow (AMO) for FPGA behavioral synthesis. We develop memory padding to help in the memory partitioning of indices with modulo operations. Experimental results on Xilinx Virtex-6 FPGAs show that our integrated approach achieves an average 5.8x throughput improvement and 4.55x latency improvement compared to the approach without memory partitioning. Moreover, memory merging saves up to 44.32% of block RAM (BRAM). |