Session 1A: New Challenges in High Level Synthesis

1A-1 (Time: 10:15 - 10:40)

Title Variability-Driven Module Selection with Joint Design Time Optimization and Post-Silicon Tuning
Author Feng Wang, *Xiaoxia Wu, Yuan Xie (Pennsylvania State Univ., USA)
Abstract Increasing delay and power variations pose significant challenges to designers as technology scales into the deep sub-micron (DSM) regime. Traditional module selection techniques in high level synthesis use worst-case delay/power information to perform the optimization, and therefore may be too pessimistic. In this paper, we propose a module selection algorithm that combines design-time optimization with post-silicon tuning (using adaptive body biasing) to maximize design yield. A fast and efficient method for computing performance and power yield gradients is developed. The post-silicon optimization is formulated as an efficient sequential conic program to determine the optimal body bias distribution, which in turn affects design-time module selection. To the best of our knowledge, this is the first variability-driven high level synthesis technique that considers post-silicon tuning during design-time optimization.

1A-2 (Time: 10:40 - 11:05)

Title Behavioral Synthesis with Activating Unused Flip-Flops for Reducing Glitch Power in FPGA
Author *Cheng-Tao Hsieh (Nat'l Tsing Hua Univ., Taiwan), Jason Cong, Zhiru Zhang (Univ. of California, Los Angeles, USA), Shih-Chieh Chang (Nat'l Tsing Hua Univ., Taiwan)
Abstract In this paper we discuss optimizing the interconnect power of designs implemented on FPGA platforms. In particular, we reduce the glitch power on interconnects associated with the outputs of functional units in a design. The idea is to activate unused flip-flops to block the propagation of glitches, which takes advantage of the abundant flip-flops in modern FPGA structures. Since the activation of additional flip-flops may cause data hazard problems, we develop several effective behavioral synthesis techniques to prevent such data hazards. We also study the optimality of our techniques. The experimental results show that, on average, our methods lead to a 28% reduction in dynamic power on the Xilinx Virtex-II platform.

1A-3 (Time: 11:05 - 11:30)

Title A Multicycle Communication Architecture and Synthesis Flow for Global Interconnect Resource Sharing
Author Wei-Sheng Huang, Yu-Ru Hong, Juinn-Dar Huang, *Ya-Shih Huang (Nat’l Chiao Tung Univ., Taiwan)
Abstract In deep submicron technology, wire delay is no longer negligible and is gradually dominating the system latency. Some state-of-the-art architectural synthesis flows adopt the distributed register (DR) architecture to cope with this increasing latency. The DR architecture, though it allows multicycle communication, introduces extra overhead on interconnect resources. In this paper, we propose the Regular Distributed Register - Global Resource Sharing (RDR-GRS) architecture to enable global sharing of interconnects and registers. Based on the RDR-GRS architecture, we further define the channel and register allocation problem as a path scheduling problem of data transfers. A formal and flexible formulation of this problem is then presented and optimally solved by Integer Linear Programming (ILP). Experimental results show that RDR-GRS/ILP reduces wires by 58% and registers by 35% on average compared to the previous work.

1A-4 (Time: 11:30 - 11:55)

Title Scheduling with Integer Time Budgeting for Low-Power Optimization
Author Wei Jiang, Zhiru Zhang, Miodrag Potkonjak, *Jason Cong (Univ. of California, Los Angeles, USA)
Abstract In this paper we present a mathematical programming formulation of the integer time budgeting problem for directed acyclic graphs. In particular, we formally prove that our constraint matrix has a special property that enables a polynomial-time algorithm to solve the problem optimally with a guaranteed integral solution. Our theory can be directly applied to solve a scheduling problem in behavioral synthesis with the objective of minimizing the system power consumption. Given a set of scheduling constraints and a collection of convex power-delay tradeoff curves for each type of operation, our scheduler can intelligently schedule the operations to appropriate clock cycles and simultaneously select the module implementations that lead to low-power solutions. Experiments demonstrate that our proposed technique can produce near-optimal results (within 6% of the optimum by the ILP formulation), but with a 40x+ speedup.

1A-5 (Time: 11:55 - 12:08)

Title REWIRED - Register Write Inhibition by Resource Dedication
Author *Pushkar Tripathi, Rohan Jain (Indian Inst. of Tech. Delhi, India), Srikanth Kurra (Oracle, India), Preeti Ranjan Panda (Indian Inst. of Tech. Delhi, India)
Abstract We propose REWIRED (REgister Write Inhibition by REsource Dedication), a technique for reducing power during high level synthesis (HLS) by selectively inhibiting the storage of function unit (FU) output data into registers. Registers are generally inferred in HLS when data produced in one clock cycle is used in a later cycle. However, when it can be established that the input registers to an FU do not change value during a certain period, the outputs during this period can be read directly off the FU output pins without needing to store them in registers. When the lifetimes of such data are short, it may be possible to completely eliminate the register storage operation, thereby reducing power. We present a genetic algorithm formulation and a heuristic for maximizing the number of register stores that can be inhibited in a scheduled data flow graph (DFG) during behavioral synthesis.

1A-6 (Time: 12:08 - 12:21)

Title An Efficient Performance Improvement Method Utilizing Specialized Functional Units in Behavioral Synthesis
Author *Tsuyoshi Sadakata, Yusuke Matsunaga (Kyushu Univ., Japan)
Abstract This paper proposes a novel behavioral synthesis method that improves the performance of synthesized circuits by utilizing specialized functional units efficiently. Almost all conventional methods cannot utilize specialized functional units efficiently under a total area constraint because of their limited flexibility for resource sharing. With the proposed method, module selection, scheduling, and allocation problems under a total area constraint with specialized functional units can be solved in practical time. Experimental results show that the proposed method achieves up to a 35% and on average a 14% reduction in the number of cycles in practical time.
Last Updated on: January 31, 2008