# Latency Constraint Guided Buffer Sizing and Layer Assignment for Clock Trees with Useful Skew

Necati Uysal<sup>\*</sup>, Wen-Hao Liu<sup>+</sup>, Rickard Ewetz<sup>\*</sup>

<sup>\*</sup>University of Central Florida, <sup>+</sup>Cadence Design Systems

## Outline

- Introduction
- Preliminaries
- Proposed Techniques
- Experimental Results





#### Timing Constraints

### Clock Tree Synthesis Steps



## Traditional Clock Tree Optimization

• Steps:

1.

2.



## LP Formulation & Predicted Timing Quality [2,3]



[2] Jianchao Lu and Baris Taskin, "Post-CTS Clock Skew Scheduling with Limited Delay Buffering", Proceedings of the IEEE International Conference on Midwest Circuits and Systems (MWSCAS), August 2009, pp. 224--227.

[3] Rickard Ewetz. 2017. A Clock Tree Optimization Framework with Predictable Timing Quality (DAC'17). 13–18.

## Buffer Sizing and Layer Assignment: Van Ginneken's Algorithm [4]

- **Objective:** Satisfy the maximum latency constraint with minimum cost
- Buffer options, layer options, and pruning
- Candidates: c<sub>k</sub> = (latency, capacitance, cost)
- At sink nodes: c<sub>sink</sub> = (0, load capacitance, load capacitance)

[4] L. P. P. van Ginneken. 1990. Buffer placement in distributed RC-tree network for minimal Elmore delay. In Proc. IEEE Int. Symp. Circuits Syst. 865–868.

[5] J. Lillis, Chung-Kuan Cheng, and T. T. Y. Lin. 1996. Optimal wire sizing and buffer insertion for low power and a generalized delay model. IEEE Journal of Solid-State Circuits 31, 3 (1996).
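The candidate propagation described on this slide can be sketched as follows. This is a minimal illustration on a single source-to-sink path, not the authors' implementation. It assumes an Elmore delay model, a hypothetical `(resistance, capacitance)` tuple per layer option, a hypothetical `(intrinsic delay, output resistance, input capacitance)` tuple per buffer option, and a cost equal to accumulated capacitance. All function names are illustrative.

```python
from typing import List, NamedTuple, Optional, Tuple


class Cand(NamedTuple):
    latency: float  # Elmore delay from this node down to the sink
    cap: float      # downstream capacitance seen at this node
    cost: float     # accumulated capacitance (assumed cost metric)


def sink(load_cap: float) -> Cand:
    # Matches c_sink = (0, load capacitance, load capacitance).
    return Cand(0.0, load_cap, load_cap)


def add_wire(c: Cand, r: float, cw: float) -> Cand:
    # Wire segment on one layer: resistance r, capacitance cw.
    return Cand(c.latency + r * (cw / 2 + c.cap), c.cap + cw, c.cost + cw)


def add_buffer(c: Cand, d0: float, rout: float, cin: float) -> Cand:
    # Buffer: intrinsic delay d0, output resistance rout, input cap cin.
    return Cand(c.latency + d0 + rout * c.cap, cin, c.cost + cin)


def prune(cands: List[Cand]) -> List[Cand]:
    # Drop every candidate that is no better in all three fields
    # than some candidate already kept.
    kept: List[Cand] = []
    for c in sorted(set(cands)):
        dominated = any(
            k.latency <= c.latency and k.cap <= c.cap and k.cost <= c.cost
            for k in kept
        )
        if not dominated:
            kept.append(c)
    return kept


def size_path(
    load_cap: float,
    segments: int,
    layers: List[Tuple[float, float]],
    buffers: List[Tuple[float, float, float]],
    max_latency: float,
) -> Optional[Cand]:
    # Bottom-up: start at the sink, add one wire segment per step on
    # every layer option, optionally insert each buffer, then prune.
    cands = [sink(load_cap)]
    for _ in range(segments):
        wired = [add_wire(c, r, cw) for c in cands for (r, cw) in layers]
        buffered = [add_buffer(c, *b) for c in wired for b in buffers]
        cands = prune(wired + buffered)
    # Pick the cheapest candidate that meets the latency constraint.
    feasible = [c for c in cands if c.latency <= max_latency]
    return min(feasible, key=lambda c: c.cost) if feasible else None
```

Loosening `max_latency` can only enlarge the feasible set, so the returned cost never increases as the constraint is relaxed.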


#### Van Ginneken's Algorithm (cont.)



#### Previous Works

| Reference | Discrete/Continuous<br>Buffer Sizes | Skew Type |
|-----------|-------------------------------------|-----------|
| [6]       | Continuous                          | Zero      |
| [7]       | Discrete                            | Zero      |
| [8]       | Discrete                            | Bounded   |
| [9]       | Continuous                          | Useful    |
| [10]      | Continuous                          | Useful    |
| This work | Discrete                            | Useful    |

[6] Krit Athikulwongse, Xin Zhao, and Sung Kyu Lim. 2010. Buffered Clock Tree Sizing for Skew Minimization Under Power and Thermal Budgets (ASP-DAC'10).

[7] Jeng-Liang Tsai, Tsung-Hao Chen, and C. C. P. Chen. 2004. Zero skew clock-tree optimization with buffer insertion/sizing and wire sizing. TCAD 23, 4 (2004).

[8] Logan Rakai et al. 2013. Buffer Sizing for Clock Networks Using Robust Geometric Programming Considering Variations in Buffer Sizes (ISPD'13).

[9] Matthew R. Guthaus, Dennis Sylvester, and Richard B. Brown. 2006. Clock Buffer and Wire Sizing Using Sequential Programming (DAC'06).

[10] Kai Wang and Malgorzata Marek-Sadowska. 2004. Buffer Sizing for Clock Power Minimization Subject to General Skew Constraints (DAC'04).

## Problem Statement

- Clock Tree Optimization
  - Objective: Minimize timing violations, the amount of delay insertion, and total capacitance
- Inputs
  - A constructed clock tree
  - Discrete buffers and layers
- Constraints:
  - Skew (timing) constraints
  - Slew
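As a rough illustration of how timing violations against skew constraints might be measured, the sketch below assumes each constraint bounds the skew between two sinks as `lo <= latency[i] - latency[j] <= hi`. That form, the function name, and the definitions of TNS as the sum of violations and WNS as the largest single violation are assumptions made for this example. The slides do not spell them out.

```python
from typing import Dict, List, Tuple

# Hypothetical constraint form: (sink_i, sink_j, lo, hi) meaning
# lo <= latency[sink_i] - latency[sink_j] <= hi.
SkewConstraint = Tuple[str, str, float, float]


def timing_violations(
    latency: Dict[str, float], constraints: List[SkewConstraint]
) -> Tuple[float, float]:
    """Return (TNS, WNS) as non-negative violation amounts."""
    tns = 0.0
    wns = 0.0
    for i, j, lo, hi in constraints:
        skew = latency[i] - latency[j]
        # Amount by which the skew falls outside [lo, hi], if any.
        violation = max(lo - skew, skew - hi, 0.0)
        tns += violation
        wns = max(wns, violation)
    return tns, wns


latency = {"a": 100.0, "b": 90.0, "c": 120.0}
constraints = [
    ("a", "b", -5.0, 5.0),   # skew 10 exceeds hi by 5
    ("b", "c", -40.0, 0.0),  # skew -30 is inside the bounds
    ("c", "a", 0.0, 10.0),   # skew 20 exceeds hi by 10
]
print(timing_violations(latency, constraints))  # (15.0, 10.0)
```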



#### Proposed Solution: BLU Framework

- Method: Delay adjustments are translated into latency constraints so that van Ginneken's algorithm can realize them.
- Feature I: Further reduce total capacitance
- Feature II: Realize negative delay adjustments
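One way to picture the translation from a delay adjustment to a latency constraint is sketched below. It is an assumption-laden illustration, not the framework's actual procedure: the target latency is taken to be the current subtree latency plus the delay adjustment, a point constraint is modeled as a narrow window around that target, and a range constraint as a wider one. The names `latency_window`, `pick_candidate`, and `margin` are hypothetical.

```python
from typing import List, Optional, Tuple

Candidate = Tuple[float, float]  # (latency, cost), e.g. from candidate pruning


def latency_window(
    current_latency: float, delay_adjustment: float, margin: float
) -> Tuple[float, float]:
    # Assumed translation: target latency = current latency + adjustment.
    # A small margin mimics a point constraint, a larger one a range
    # constraint. A negative adjustment simply lowers the target.
    target = current_latency + delay_adjustment
    return target - margin, target + margin


def pick_candidate(
    cands: List[Candidate], lo: float, hi: float
) -> Optional[Candidate]:
    # Cheapest candidate whose latency falls inside the window.
    feasible = [c for c in cands if lo <= c[0] <= hi]
    return min(feasible, key=lambda c: c[1]) if feasible else None


cands = [(100.0, 5.0), (104.0, 3.0), (110.0, 2.0)]

point = pick_candidate(cands, *latency_window(90.0, 10.0, margin=1.0))
wide = pick_candidate(cands, *latency_window(90.0, 10.0, margin=5.0))
negative = pick_candidate(cands, *latency_window(120.0, -10.0, margin=1.0))
```

In this toy data the wider window admits a cheaper candidate than the narrow one, which is the intuition behind relaxing point constraints into ranges to save capacitance.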

### Baseline of the BLU Framework





## Feature II: Improving predicted timing quality



## Methodology



[11] Rickard Ewetz and Cheng-Kok Koh. 2018. Scalable Construction of Clock Trees with Useful Skew and High Timing Quality. TCAD (2018).

[12] S. Held et al. 2003. Clock scheduling and clock tree construction for high performance ASICs (ICCAD'03). 232–239.

## Experimental Setup

- Benchmarks are synthesized with Synopsys DC and ICC
- The CTS engine in [11] is used to synthesize the initial USTs
- Buffer & Wire Library
- On-chip variations
- Evaluations in timing
  - NGSPICE simulations
- Evaluations in capacitance
  - Total capacitance


| Circuit       | #Sinks | #Skew<br>Constraints |
|---------------|--------|----------------------|
| scaled s1423  | 74     | 78                   |
| scaled s5378  | 179    | 175                  |
| scaled s15850 | 597    | 318                  |
| msp           | 683    | 44990                |
| fpu           | 715    | 16263                |
| usbf          | 1765   | 33438                |
| dma           | 2092   | 132834               |
| pci bridge32  | 3578   | 141074               |
| ecg           | 7674   | 63440                |
| des peft      | 8808   | 17152                |
| eht           | 10544  | 450762               |
| aes           | 13216  | 53382                |

## Evaluated Tree Structures

- UST: tree structure after CTS (useful skew tree synthesis)
- UST-CTO: tree structure after CTO is applied to UST
- UST-P: tree structure after the BLU framework with point constraints is applied to UST
- UST-P-CTO: tree structure after CTO is applied to UST-P
- UST-R: tree structure after the BLU framework with range constraints is applied to UST
- UST-R-CTO: tree structure after CTO is applied to UST-R
- UST-RT: UST-R structure combined with realizing negative delay adjustments
- UST-RT-CTO: tree structure after CTO is applied to UST-RT

## Results (only non-negative delay adjustments)

| Circuit | Cap (pF)<br>UST | Cap (pF)<br>UST-CTO | Cap (pF)<br>UST-P | Cap (pF)<br>UST-P-CTO | Cap (pF)<br>UST-R | Cap (pF)<br>UST-R-CTO | Run-time (min)<br>UST | Run-time (min)<br>UST-CTO | Run-time (min)<br>UST-P | Run-time (min)<br>UST-P-CTO | Run-time (min)<br>UST-R | Run-time (min)<br>UST-R-CTO |
|---------|------|------|------|------|------|------|------|------|------|------|------|------|
| msp     | 1.41     | 1.41        | 1.35  | 1.35          | 1.20          | 1.20          | 0.0  | 0.0         | 0.5   | 0.0           | 3.7       | 0.1           |
| fpu     | 1.60     | 1.60        | 1.52  | 1.52          | 1.35          | 1.35          | 0.0  | 0.2         | 0.7   | 0.0           | 1.2       | 0.0           |
| usbf    | 4.55     | 4.55        | 4.14  | 4.14          | 4.07          | 4.07          | 1.0  | 0.2         | 0.4   | 0.2           | 2.4       | 0.2           |
| dma     | 5.06     | 5.17        | 4.49  | 4.65          | 4.44          | 4.56          | 1.0  | 2.1         | 1.1   | 2.5           | 10.3      | 2.1           |
| ecg     | 23.44    | 23.66       | 20.39 | 20.96         | 20.54         | 20.84         | 8.0  | 11.3        | 1.7   | 15.5          | 5.0       | 12.8          |
| s15850  | 18.09    | 18.85       | 15.84 | 16.86         | 15.77         | 16.62         | 0.2  | 2.3         | 0.3   | 14.7          | 0.2       | 2.0           |
| Norm.   | 0.99     | 1.00        | 0.89  | 0.90          | 0.87          | 0.87          | 0.30 | 1.00        | 0.60  | 1.10          | 1.20      | 1.70          |

UST: Useful Skew Tree. UST-P: BLU structure with point constraints. UST-R: BLU structure with range constraints.

## Results (with strict timing constraints)

| Circuit<br>(name) | Structure<br>(name) | TNS<br>(ps) | WNS<br>(ps) | P <sub>tns</sub><br>(ps) | P <sub>wns</sub><br>(ps) | Cap<br>(pF) | Run-<br>time<br>(min) |
|-------------------|---------------------|-------------|-------------|--------------------------|--------------------------|-------------|-----------------------|
| aes               | UST                 | 16041       | 32          | 7095                     | 14                       | 112.1       | 24.9                  |
|                   | UST-CTO             | 8448        | 18          | 7950                     | 15                       | 121.6       | 139.0                 |
|                   | UST-R               | 36367       | 47          | 4478                     | 13                       | 97.8        | 9.2                   |
|                   | UST-R-CTO           | 5697        | 19          | 5186                     | 15                       | 111.8       | 73.5                  |
|                   | UST-RT              | 15685       | 36          | 2636                     | 11                       | 102.2       | 56.5                  |
|                   | UST-RT-CTO          | 3569        | 16          | 3330                     | 14                       | 113.4       | 189.2                 |
| Norm.             | UST                 | 3.56        | 3.04        | 1.00                     | 1.00                     | 0.91        | 0.13                  |
|                   | UST-CTO             | 1.00        | 1.00        | 4.39                     | 1.12                     | 1.00        | 1.00                  |
|                   | UST-R               | 6.33        | 3.84        | 0.77                     | 0.88                     | 0.79        | 0.26                  |
|                   | UST-R-CTO           | 0.71        | 1.13        | 1.17                     | 1.25                     | 0.90        | 0.95                  |
|                   | UST-RT              | 4.24        | 3.19        | 0.44                     | 0.63                     | 0.85        | 0.60                  |
|                   | UST-RT-CTO          | 0.42        | 0.80        | 0.53                     | 0.73                     | 0.95        | 1.21                  |

UST: Useful Skew Tree. UST-R: BLU structure with range constraints. UST-RT: UST-R & negative delay adjustments.

## Summary

- BLU Framework
  - handling discrete buffer sizes
  - layer assignments
  - utilizing useful skew
  - reducing capacitive cost

## Questions?

• Thank you!