Title | An Efficient Hybrid Engine to Perform Range Analysis and Allocate Integer Bit-widths for Arithmetic Circuits |
Author | *Yu Pang (Chongqing University of Posts and Telecommunications, China), Katarzyna Radecka, Zeljko Zilic (McGill University, Canada) |
Page | pp. 455 - 460 |
Keyword | arithmetic circuits, range analysis, SMT, arithmetic transform, fixed-point synthesis |
Abstract | Range analysis is an important task in obtaining correct, yet fast and inexpensive, arithmetic circuits. The traditional methods, whether simulation-based or static, suffer from low efficiency and coarse bounds, which may lead to unnecessary bits. In this paper, we propose a new method that combines several techniques to perform fixed-point range analysis in a datapath and obtain much tighter ranges efficiently. We show that the range and the bit-width allocation can be obtained with better results than past methods, and in significantly shorter time. |
Slides |
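To make the "coarse bounds" and "unnecessary bits" in the abstract above concrete, here is a minimal sketch of plain interval arithmetic, one of the traditional static methods, compared against exhaustive enumeration. It is not the authors' hybrid engine; the `Interval` class, the `integer_bits` helper, and the example expression x*x - x are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed integer interval [lo, hi]."""
    lo: int
    hi: int

    def __sub__(self, other):
        # Worst case: smallest minus largest, largest minus smallest.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # The extremes of a product lie at the corner combinations.
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))


def integer_bits(r):
    """Smallest integer width that holds every value in r.

    Unsigned if r.lo >= 0, otherwise two's complement.
    """
    if r.lo >= 0:
        return max(1, r.hi.bit_length())
    # Signed: need n with -2**(n-1) <= lo and hi <= 2**(n-1) - 1.
    return max((-r.lo - 1).bit_length(), r.hi.bit_length()) + 1


# Hypothetical datapath: y = x*x - x with x in [-8, 7].
x = Interval(-8, 7)

# Interval arithmetic treats each occurrence of x as independent.
coarse = x * x - x

# Exhaustive enumeration gives the true range (feasible only for tiny inputs).
values = [v * v - v for v in range(x.lo, x.hi + 1)]
exact = Interval(min(values), max(values))

print("interval arithmetic:", coarse, integer_bits(coarse), "bits")
print("exact enumeration:  ", exact, integer_bits(exact), "bits")
```

In this example interval arithmetic reports [-63, 72], which needs 8 signed bits, while the true range is [0, 72], which fits in 7 unsigned bits. The extra bit comes from ignoring the correlation between the two occurrences of x.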
Title | Register Pressure Aware Scheduling for High Level Synthesis |
Author | *Rami Beidas, Wai Sum Mong, Jianwen Zhu (University of Toronto, Canada) |
Page | pp. 461 - 466 |
Keyword | Phase Coupling, Scheduling, Register Pressure, Area Optimization |
Abstract | Variations of list scheduling became the de facto standard for scheduling straight-line code in software compilers, a trend faithfully inherited by high-level synthesis solutions. By its nature, list scheduling is oblivious to the tightly coupled problem of register pressure, a fundamental open problem that the compiler community has attacked for decades and that, in high-level synthesis, results in excessive instantiation of registers and accompanying steering logic.
To alleviate this problem, we propose a synthesis framework called "soft scheduling", which acts as a resource-unconstrained pre-scheduling stage that restricts subsequent scheduling so as to minimize register pressure. This optimization objective is formulated as a live range minimization problem, a measure shown to be proportional to register pressure, and is solved optimally in polynomial time using a minimum-cost network flow formulation. Unlike past solutions in the compiler community, which try to reduce register pressure by local serialization of subject instructions, the proposed solution operates on the entire basic block or hyperblock and systematically handles instruction chaining subject to the same objective.
The application of the proposed solution to a set of real-life benchmarks results in a register pressure reduction ranging, on average, between 11% and 41%, depending on the compilation and synthesis configurations, with a minor 2% to 4% increase in schedule latency. |
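The link between live ranges and register pressure mentioned in the abstract above can be illustrated with a small sketch. It only measures the two quantities for a given schedule; it does not implement the authors' minimum-cost flow formulation. The function names, the six-operation dataflow graph, and the two schedules are hypothetical.

```python
def live_ranges(schedule, uses):
    """Map each consumed value to (definition cycle, last-use cycle).

    schedule: operation -> cycle in which it executes
    uses:     operation -> list of operations that consume its result
    """
    ranges = {}
    for op, consumers in uses.items():
        if consumers:
            ranges[op] = (schedule[op], max(schedule[c] for c in consumers))
    return ranges


def total_live_range(ranges):
    """Sum of live range lengths, in cycles."""
    return sum(end - start for start, end in ranges.values())


def register_pressure(ranges):
    """Maximum number of values held across any cycle boundary."""
    if not ranges:
        return 0
    last = max(end for _, end in ranges.values())
    # A value occupies a register after cycle t when start <= t < end.
    return max(
        sum(1 for start, end in ranges.values() if start <= t < end)
        for t in range(last)
    )


# Hypothetical dataflow: y1 = f(x); r = y1 + p; s = r + q
uses = {"x": ["y1"], "y1": ["r"], "p": ["r"], "q": ["s"], "r": ["s"], "s": []}

# Same latency in both schedules; only the producers p and q move.
early = {"x": 0, "y1": 1, "r": 2, "s": 3, "p": 0, "q": 0}
late = {"x": 0, "y1": 1, "r": 2, "s": 3, "p": 1, "q": 2}

for name, sched in (("early", early), ("late", late)):
    lr = live_ranges(sched, uses)
    print(name, "total live range:", total_live_range(lr),
          "register pressure:", register_pressure(lr))
```

Scheduling p and q as late as their consumers allow shortens the total live range from 8 to 5 cycles and lowers the peak register count from 3 to 2, without changing the schedule length.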
Title | Parallel Cross-Layer Optimization of High-Level Synthesis and Physical Design |
Author | *James Williamson (ECEE Dept., University of Colorado at Boulder, U.S.A.), Yinghai Lu (EECS Dept., Northwestern University, U.S.A.), Li Shang (ECEE Dept., University of Colorado at Boulder, U.S.A.), Hai Zhou (EECS Dept., Northwestern University, U.S.A.), Xuan Zeng (State Key Lab of ASIC & System, Microelectronics Dept., Fudan University, China) |
Page | pp. 467 - 472 |
Keyword | cross-layer optimization, parallel CAD, GPGPU, heterogeneous architectures, parallelization |
Abstract | Integrated circuit (IC) design automation has traditionally followed a hierarchical approach. Modern IC design flow is divided into sequentially addressed design and optimization layers, each successively finer in design detail and data granularity while increasing in computational complexity. Eventual agreement across the design layers signals design closure. Obtaining design closure is a continual problem, as lack of awareness and interaction between layers often results in multiple design flow iterations. In this work, we propose parallel cross-layer optimization, in which the boundaries between design layers are broken, allowing for a more informed and efficient exploration of the design space. We leverage the heterogeneous parallel computational power in current and upcoming multi-core/many-core computation platforms to suit the heterogeneous characteristics of multiple design layers. Specifically, we unify the high-level and physical synthesis design layers for parallel cross-layer IC design optimization. In addition, we introduce a massively parallel GPU floorplanner with local and global convergence tests as the proposed physical synthesis design layer. Our results show an average 11X speed-up over the state of the art. |
Slides |
Title | Network Flow-based Simultaneous Retiming and Slack Budgeting for Low Power Design |
Author | Bei Yu, Sheqin Dong, *Yuchun Ma, Tao Lin, Yu Wang (Tsinghua University, China), Song Chen, Satoshi GOTO (Waseda University, Japan) |
Page | pp. 473 - 478 |
Keyword | Retiming, Slack Budgeting, Network Flow, Low Power |
Abstract | Low power design has become one of the most significant requirements since CMOS technology entered the nanometer era. Therefore, timing budgeting is often performed to slow down as many components as possible so that timing slacks can be applied to reduce the power consumption while maintaining the performance of the whole design. Retiming is a procedure that involves the relocation of flip-flops (FFs) across logic gates to achieve faster clocking speed. In this paper we show that the retiming and slack budgeting problem can be formulated as a convex cost dual network flow problem. Both the theoretical analysis and experimental results show the efficiency of our approach, which not only reduces power consumption but also runs faster than previous work. |
Slides |
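The retiming step described in the abstract above, relocating flip-flops across gates, is conventionally modeled by assigning an integer label r(v) to each gate, so that an edge from u to v carrying w registers ends up with w + r(v) - r(u) registers. The sketch below applies such a labeling and recomputes the clock period. It shows only this basic transformation, not the authors' convex cost dual network flow formulation or slack budgeting; the three-gate ring, its delays, and the function names are assumptions.

```python
from functools import lru_cache


def retime(edges, r):
    """Apply retiming labels: w_r(u, v) = w(u, v) + r[v] - r[u].

    edges: (u, v) -> register count on that edge (no parallel edges assumed)
    """
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}


def is_legal(edges):
    """A retiming is legal only if no edge has a negative register count."""
    return all(w >= 0 for w in edges.values())


def clock_period(edges, delay):
    """Longest combinational path delay, following register-free edges only.

    Assumes every cycle carries at least one register, so the
    register-free subgraph is acyclic.
    """
    succ = {node: [] for node in delay}
    for (u, v), w in edges.items():
        if w == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(node):
        return delay[node] + max(
            (longest_from(nxt) for nxt in succ[node]), default=0
        )

    return max(longest_from(node) for node in delay)


# Hypothetical ring a -> b -> c -> a with both registers on the edge c -> a.
delay = {"a": 1, "b": 2, "c": 3}
edges = {("a", "b"): 0, ("b", "c"): 0, ("c", "a"): 2}

# Labels that move one register from c -> a onto b -> c.
labels = {"a": 0, "b": 0, "c": 1}
retimed = retime(edges, labels)

print("before:", edges, "period", clock_period(edges, delay))
print("after: ", retimed, "legal", is_legal(retimed),
      "period", clock_period(retimed, delay))
```

Here the period drops from 6 to 3 while the number of registers around the ring stays at 2, which is the invariant that makes the retimed circuit functionally equivalent.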