ASP-DAC 2012 Technical Program

The 17th Asia and South Pacific Design Automation Conference

Session 3B High-Level Synthesis
Time: 10:40 - 12:20 Wednesday, February 1, 2012
Location: Room 203
Chairs: Nagisa Ishiura (Kwansei Gakuin University, Japan), Shigeru Yamashita (Ritsumeikan University, Japan)

3B-1 (Time: 10:40 - 11:05)

Title	Performance-Driven Register Write Inhibition in High-Level Synthesis under Strict Maximum-Permissible Clock Latency Range
Author	*Keisuke Inoue, Mineo Kaneko (Japan Advanced Institute of Science and Technology, Japan)
Page	pp. 239 - 244
Keyword	high-level synthesis, register write inhibition, FU binding, register binding
Abstract	Clock skew scheduling is the process of assigning intentional clock skews to registers to improve circuit performance and reliability. Because of the growing impact of process variations, it is becoming increasingly difficult to reliably implement a large set of arbitrary clock latencies. Consequently, the optimization potential of clock skew scheduling is severely limited. This paper points out that circuit performance can be improved further by removing some register writes while preserving reliability. It presents the first clock skew-aware high-level synthesis framework that considers register write inhibition to minimize the clock period. A network flow-based heuristic algorithm for obtaining the minimum clock period is presented and evaluated experimentally, and the results support the effectiveness of the approach.
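
As background for the abstract above, the following is a minimal sketch of classical clock skew scheduling under a bounded latency range. It is not the paper's network flow-based heuristic and does not model register write inhibition. The names `feasible` and `min_period`, the integer delays, and the neglect of setup and hold margins are all assumptions made for illustration. Each register-to-register path contributes a setup and a hold constraint on the clock latencies, and a candidate period is accepted when the resulting difference constraints contain no negative cycle.

```python
def feasible(num_regs, paths, period, latency_range):
    """Check whether clock latencies s[r] in [0, latency_range] exist.

    paths: list of (src, dst, max_delay, min_delay) between registers.
    Illustrative model only: setup and hold margins are ignored.
    """
    ref = num_regs  # virtual reference node whose latency is 0
    edges = []      # (u, v, w) encodes the constraint s[v] - s[u] <= w
    for src, dst, dmax, dmin in paths:
        edges.append((dst, src, period - dmax))  # setup: s[src] + dmax <= s[dst] + T
        edges.append((src, dst, dmin))           # hold:  s[dst] <= s[src] + dmin
    for r in range(num_regs):
        edges.append((ref, r, latency_range))    # s[r] <= latency_range
        edges.append((r, ref, 0))                # s[r] >= 0
    # Bellman-Ford from an implicit source connected to every node.
    dist = [0] * (num_regs + 1)
    for _ in range(num_regs + 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return True   # converged, so no negative cycle exists
    return False          # still relaxing, so a negative cycle exists


def min_period(num_regs, paths, latency_range):
    """Smallest integer period that is feasible (assumes min delays >= 0)."""
    upper = max(dmax for _, _, dmax, _ in paths)
    for period in range(upper + 1):
        if feasible(num_regs, paths, period, latency_range):
            return period
    return None


# Hypothetical two-register loop: path 0->1 is slow, path 1->0 is fast.
paths = [(0, 1, 6, 2), (1, 0, 2, 1)]
print(min_period(2, paths, 2))  # 4
print(min_period(2, paths, 1))  # 5
print(min_period(2, paths, 0))  # 6 (zero skew)
```

In this toy instance, narrowing the permissible latency range from 2 to 0 raises the achievable period from 4 to 6, which illustrates why a strict range limits the benefit of skew scheduling.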

3B-2 (Time: 11:05 - 11:30)

Title	Clock Period Minimization with Minimum Area Overhead in High-Level Synthesis of Nonzero Clock Skew Circuits
Author	*Wen-Pin Tu, Shih-Hsu Huang, Chun-Hua Cheng (Chung Yuan Christian University, Taiwan)
Page	pp. 245 - 250
Keyword	Clock Skew Optimization, Clock Period Minimization, High-Level Synthesis, Resource Binding, Area Minimization
Abstract	Although clock skew can be utilized to reduce the clock period, it also limits the sharing of resources (including registers and functional units). Previous works have considered the influence of clock arrival times on register sharing, but they pay no attention to the influence of clock arrival times on functional unit sharing. As a result, extra functional units are often required during functional unit binding. Based on that observation, in this paper we perform register binding and functional unit binding simultaneously for the high-level synthesis of nonzero clock skew circuits. Our objective is to minimize the circuit area while operating at the lower bound of the clock period. Benchmark data show that, compared with previous works, our approach achieves the lower bound of the clock period with a smaller area overhead.

3B-3 (Time: 11:30 - 11:55)

Title	Clock-Constrained Simultaneous Allocation and Binding for Multiplexer Optimization in High-Level Synthesis
Author	*Yuko Hara-Azumi, Hiroyuki Tomiyama (Ritsumeikan University, Japan)
Page	pp. 251 - 256
Keyword	High-level synthesis, Allocation, Binding, Multiplexer, Clock constraint
Abstract	This paper proposes a novel simultaneous allocation and binding method in high-level synthesis, which minimizes the circuit area including multiplexers (MUXs) under a clock constraint. Most existing works on binding minimize MUXs under a given allocation by minimizing the number of interconnections, but do not consider where the MUXs would be inserted in a circuit. As a result, they cannot guarantee the required clock frequency and often violate the clock constraint. In contrast, our work globally optimizes binding and allocation for FUs and registers while meeting the clock constraint by considering where MUXs would be inserted. Our work is formulated as an ILP problem. An effective ILP-based heuristic for larger designs is also presented. Experimental results demonstrate that our work satisfies the clock constraint with the minimum circuit area.
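
To make the interconnection-based MUX metric mentioned above concrete, here is a small sketch that counts MUX inputs for a given FU binding. It is not the paper's ILP formulation and ignores the clock constraint entirely. The function names, the operation table, and the register names are hypothetical.

```python
from collections import defaultdict


def port_sources(op_inputs, fu_binding):
    """Map each (fu, port) to the set of registers that feed it."""
    sources = defaultdict(set)
    for op, regs in op_inputs.items():
        fu = fu_binding[op]
        for port, reg in enumerate(regs):
            sources[(fu, port)].add(reg)
    return sources


def mux_input_count(op_inputs, fu_binding):
    """Total MUX inputs: a port needs a MUX only if it has 2+ sources."""
    per_port = port_sources(op_inputs, fu_binding).values()
    return sum(len(s) for s in per_port if len(s) > 1)


# Hypothetical operations with their (left, right) source registers.
op_inputs = {
    "add1": ("r1", "r2"),
    "add2": ("r3", "r2"),
    "add3": ("r1", "r4"),
}

shared = {"add1": "FU0", "add2": "FU0", "add3": "FU0"}  # one adder
split = {"add1": "FU0", "add2": "FU1", "add3": "FU0"}   # two adders

print(mux_input_count(op_inputs, shared))  # 4
print(mux_input_count(op_inputs, split))   # 2
```

The two bindings show the trade-off that motivates treating allocation and binding together: allocating a second adder removes one MUX, so neither FU count nor MUX count alone determines the total area.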

3B-4 (Time: 11:55 - 12:20)

Title	An Integrated and Automated Memory Optimization Flow for FPGA Behavioral Synthesis
Author	*Yuxin Wang (Computer Science Department, Peking University and UCLA/PKU Joint Research Institute in Science and Engineering, China), Peng Zhang (Computer Science Department, University of California, Los Angeles, U.S.A.), Xu Cheng (Computer Science Department, Peking University, China), Jason Cong (Computer Science Department, University of California, Los Angeles and UCLA/PKU Joint Research Institute in Science and Engineering, U.S.A.)
Page	pp. 257 - 262
Keyword	Behavioral Synthesis, Memory Partitioning, Memory Merging
Abstract	Behavioral synthesis tools have made significant progress in compiling high-level programs into register-transfer level (RTL) specifications. However, manual code rewriting is still necessary to obtain better quality of results in memory system optimization. In recent years, different automated memory optimization techniques have been proposed and implemented, such as data reuse and memory partitioning, but integrating these techniques into an applicable flow to obtain better performance has become a challenge. In this paper we integrate data reuse, loop pipelining, memory partitioning, and memory merging into an automated memory optimization flow (AMO) for FPGA behavioral synthesis. We develop memory padding to help in the memory partitioning of indices with modulo operations. Experimental results on Xilinx Virtex-6 FPGAs show that our integrated approach achieves average improvements of 5.8x in throughput and 4.55x in latency compared to the approach without memory partitioning. Moreover, memory merging saves up to 44.32% of block RAM (BRAM).
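
As a generic illustration of why padding can help memory partitioning, the sketch below uses cyclic partitioning, where the bank is the linear address modulo the number of banks. It is not the AMO flow or the paper's padding scheme. The helper names, the two-bank setup, and the row widths are assumptions. Two accesses can be served in the same cycle only if they fall in different banks.

```python
def conflict_free(addresses, num_banks):
    """True if cyclic partitioning puts every address in a distinct bank."""
    banks = [addr % num_banks for addr in addresses]
    return len(set(banks)) == len(banks)


def column_pair(row, col, row_width):
    """Linear addresses of a[row][col] and a[row + 1][col] (row-major)."""
    return [row * row_width + col, (row + 1) * row_width + col]


NUM_BANKS = 2

# Row width 4: vertically adjacent elements are 4 apart, so both
# accesses land in the same bank and must be serialized.
print(conflict_free(column_pair(0, 1, 4), NUM_BANKS))  # False

# Padding each row to width 5 makes the distance odd, so the two
# accesses always fall in different banks.
print(conflict_free(column_pair(0, 1, 5), NUM_BANKS))  # True
```

The padded layout spends one unused element per row in exchange for conflict-free parallel access, which is the kind of storage versus throughput trade-off a partitioning flow has to weigh.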