# OCV Guided Clock Tree Topology Reconstruction

ASP-DAC 2018

Necati Uysal and Rickard Ewetz

Department of Electrical and Computer Engineering, University of Central Florida



#### Overview

- Preliminaries
- Previous studies
- Proposed techniques
- Experimental results

#### Timing constraints and timing slack

$$setup\_slack_{ij} = T - t_i^{CQ} - t_{ij}^{max} - t_j^{S} + t_j - t_i - \delta_j - \delta_i$$
$$hold\_slack_{ij} = t_i^{CQ} + t_{ij}^{min} - t_j^{H} + t_i - t_j - \delta_j - \delta_i$$

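Delay variations introduced by OCV:

$$\delta_i = c_{OCV} \cdot t_{CCA(i,j),i}, \qquad \delta_j = c_{OCV} \cdot t_{CCA(i,j),j}$$

Here $t_{CCA(i,j),i}$ is the delay from the closest common ancestor (CCA) of $i$ and $j$ in the clock tree to $i$.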
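The slack and OCV formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the tree encoding (a child-to-parent map), the `arrival` dictionary of clock arrival times, and all numbers are assumptions made for the example, and CCA is read as the closest common ancestor.

```python
def cca(parent, a, b):
    """Closest common ancestor of a and b in a tree given as a child -> parent map."""
    ancestors = set()
    node = a
    while node is not None:
        ancestors.add(node)
        node = parent.get(node)  # the root has no entry, so this yields None
    node = b
    while node not in ancestors:
        node = parent[node]
    return node


def ocv_deltas(parent, arrival, i, j, c_ocv):
    """Return (delta_i, delta_j) = c_ocv * delay from CCA(i, j) to i and to j."""
    anc = cca(parent, i, j)
    return (c_ocv * (arrival[i] - arrival[anc]),
            c_ocv * (arrival[j] - arrival[anc]))


def setup_slack(T, t_cq, t_max, t_s, t_i, t_j, d_i, d_j):
    """Setup slack with both OCV terms subtracted, as in the formula above."""
    return T - t_cq - t_max - t_s + t_j - t_i - d_j - d_i


def hold_slack(t_cq, t_min, t_h, t_i, t_j, d_i, d_j):
    """Hold slack with both OCV terms subtracted, as in the formula above."""
    return t_cq + t_min - t_h + t_i - t_j - d_j - d_i


# Hypothetical tree (times in ps): root -> a -> {i, j} and root -> b.
parent = {"a": "root", "b": "root", "i": "a", "j": "a"}
arrival = {"root": 0, "a": 100, "b": 100, "i": 200, "j": 220}

d_i, d_j = ocv_deltas(parent, arrival, "i", "j", c_ocv=0.10)
slack = setup_slack(T=1000, t_cq=50, t_max=700, t_s=30,
                    t_i=arrival["i"], t_j=arrival["j"], d_i=d_i, d_j=d_j)

# Same sinks, but j hangs under b, so the CCA moves up to the root.
far_parent = dict(parent, j="b")
far_d_i, far_d_j = ocv_deltas(far_parent, arrival, "i", "j", c_ocv=0.10)
```

In this toy tree the two sinks share the ancestor `a`, so only the last 100 ps and 120 ps of their clock paths contribute to the OCV terms. Moving `j` under `b` pushes the CCA to the root and doubles both terms, which is the effect the topology reconstruction aims to exploit in the opposite direction.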



#### Leaf buffer slack graph (LB-SG)

- **SG:** $w_{ij} = setup\_slack_{ij}$, $w_{ji} = hold\_slack_{ij}$

(Figure: two flip-flops connected through combinational logic, shown with the SG and the LB-SG on vertices a and b.)



[10] J. Lu and B. Taskin. Post-CTS clock skew scheduling with limited delay buffering. In Intr. Midwest Sym. on Cir. and Sys., pages 224–227, 2009.

#### Handle multiple scenarios using compression

[11] V. Ramachandran. Construction of minimal functional skew clock trees. ISPD'12, pages 119–120, 2012.

[6] R. Ewetz and C.-K. Koh. MCMM clock tree optimization based on slack redistribution using a reduced slack graph. ASP-DAC'16, pages 366–371, 2016.

#### LP Formulation



$$\begin{split} \min \quad & \sum_{k \in V} c_{in} \Delta_k + c_{wns} \, pWNS + c_{tns} \, pTNS \\ \text{s.t.} \quad & (1 + c_{ocv}) \Delta_i - (1 - c_{ocv}) \Delta_j - s_{ij} \leq w_{ij}, \\ & s_{ij} \leq pWNS, \\ & \sum_{(i,j) \in E} s_{ij} = pTNS, \\ & s_{ij} \geq 0 \end{split}$$

- Timing violation: $s_{ij} \geq 0$
- Predicted WNS: $pWNS$
- Predicted TNS: $pTNS$
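To make the roles of $s_{ij}$, $pWNS$ and $pTNS$ concrete, the sketch below evaluates the formulation for a fixed set of delay adjustments instead of solving the LP. It assumes the edge constraint reads $(1 + c_{ocv}) \Delta_i - (1 - c_{ocv}) \Delta_j - s_{ij} \leq w_{ij}$ with $s_{ij} \geq 0$. For fixed $\Delta$ the smallest feasible $s_{ij}$ has a closed form, $pWNS$ is the largest $s_{ij}$, and $pTNS$ is their sum. The function names and the two-edge graph are hypothetical.

```python
def predicted_violations(edges, delta, c_ocv):
    """edges: dict (i, j) -> w_ij; delta: dict node -> delay adjustment.

    Returns (s, pWNS, pTNS) using the smallest s_ij >= 0 that satisfies
    (1 + c_ocv) * delta_i - (1 - c_ocv) * delta_j - s_ij <= w_ij.
    """
    s = {}
    for (i, j), w in edges.items():
        lhs = (1 + c_ocv) * delta[i] - (1 - c_ocv) * delta[j]
        s[(i, j)] = max(0.0, lhs - w)
    p_wns = max(s.values(), default=0.0)  # tightest bound with s_ij <= pWNS
    p_tns = sum(s.values())
    return s, p_wns, p_tns


def objective(delta, p_wns, p_tns, c_in, c_wns, c_tns):
    """Objective value of the formulation for a fixed assignment."""
    return c_in * sum(delta.values()) + c_wns * p_wns + c_tns * p_tns


# Hypothetical two-vertex cycle (weights in ps): a -> b is violated by 20 ps.
edges = {("a", "b"): -20.0, ("b", "a"): 5.0}

no_adjust = predicted_violations(edges, {"a": 0.0, "b": 0.0}, c_ocv=0.10)
adjusted = predicted_violations(edges, {"a": 0.0, "b": 10.0}, c_ocv=0.10)
```

In this example, delaying `b` by 10 ps lowers pWNS from 20 ps to 11 ps, but it also creates a 6 ps violation on the reverse edge. The two edges form a cycle with negative total weight, so no choice of adjustments removes both violations.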

[11] V. Ramachandran. Construction of minimal functional skew clock trees. ISPD'12, pages 119–120, 2012.

[3] R. Ewetz. A clock tree optimization framework with predictable timing quality. DAC'17, pages 13–18, 2017.

#### Summary of previous work

**TNS** reduction



[3] R. Ewetz. A clock tree optimization framework with predictable timing quality. DAC'17, pages 13–18, 2017.

[10] J. Lu and B. Taskin. Post-CTS clock skew scheduling with limited delay buffering. In Intr. Midwest Sym. on Cir. and Sys., pages 224–227, 2009.

[11] V. Ramachandran. Construction of minimal functional skew clock trees. ISPD'12, pages 119–120, 2012.

[12] S. Roy, P. M. Mattheakis, L. Masse-Navette, and D. Z. Pan. Clock tree resynthesis for multi-corner multi-mode timing closure. IEEE TCAD, pages 589–602, 2015.

#### How to reduce pWNS and pTNS?

- Tree topology reconstruction to realize negative delay adjustments [12].
- (this presentation): OCV Guided Clock Tree Topology Reconstruction.



[12] S. Roy, P. M. Mattheakis, L. Masse-Navette, and D. Z. Pan. Clock tree resynthesis for multi-corner multi-mode timing closure. IEEE TCAD, pages 589–602, 2015 (best paper at ISPD 2013).

#### Three types of topology changes



OCV Guided tree topology construction!

| Type | Distance in topology | OCV impact |
|------|----------------------|------------|
| 1    | Closer               | Reduced    |
| 2    | Further              | Increased  |
| 3    | Same                 | Unchanged  |

#### Predicted leaf buffer slack graph (pLB-SG)



pWNS is bounded by a strongly connected component (SCC) in the pLB-SG!

- S: constraints in the SCC
- L: LBs connected in the SCC (red circle in the figure, L = {3,4})

#### Improving pWNS!

- Candidates $(b_p, b_c)$
- Reduce the delay variations in S (potential to improve pWNS)
- $\Delta\delta$ is the change of the delay variations in S





#### Identify LBs to be placed closer in the topology

1. Remove edges whose weight is larger than pWNS
2. Detect the SCCs using two depth-first searches (DFS) [2]
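The two steps above can be sketched as follows. The example uses Kosaraju's two-pass algorithm as one standard way to find SCCs with two DFS traversals; the slides do not say which variant is used. The `threshold` parameter stands in for the pWNS-based cut-off, and the small graph and its weights are made up for illustration.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm: DFS for finish order, then DFS on the reversed graph."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)

    # Pass 1: iterative DFS on the forward graph, recording finish order.
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(fwd[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:  # all successors explored: node is finished
                order.append(node)
                stack.pop()

    # Pass 2: traverse the reversed graph in decreasing finish order.
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, todo = [], [root]
        while todo:
            node = todo.pop()
            component.append(node)
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(sorted(component))
    return components


def critical_sccs(nodes, weighted_edges, threshold):
    """Drop edges with weight above the threshold, keep SCCs with more than one LB."""
    kept = [edge for edge, w in weighted_edges.items() if w <= threshold]
    return [c for c in strongly_connected_components(nodes, kept) if len(c) > 1]


# Hypothetical pLB-SG over five leaf buffers (weights in ps).
lbs = [1, 2, 3, 4, 5]
weights = {(3, 4): -10, (4, 3): -12, (1, 2): 50, (2, 3): -5, (4, 5): -3}
result = critical_sccs(lbs, weights, threshold=0)
```

With these made-up weights only leaf buffers 3 and 4 remain mutually reachable after the filtering step, so the set L consists of those two buffers.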



[2] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. McGraw-Hill Higher Education, 2001.

#### Enumeration of candidates

Apply pairwise to the buffers in L:

- (i) Generate pairs $(b_p, b_c)$ from $P_i$ and $P_j$.
- (ii) Check the timing requirement (condition): $t_{b_c} = t_{b_p} + t_{b_p b_c}$, where $t_{b_p b_c}$ is estimated using a linear delay model.
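One possible reading of the enumeration is sketched below, under several assumptions that the slides do not confirm: $P_i$ and $P_j$ are taken to be lists of candidate buffers, $b_p$ is drawn from the first list and $b_c$ from the second, the linear delay model is proportional to Manhattan distance, and the timing requirement is accepted within a tolerance `tol`. All names, coordinates and times are hypothetical.

```python
from itertools import product


def linear_delay(pos_a, pos_b, delay_per_unit):
    """Assumed linear delay model: delay proportional to Manhattan distance."""
    distance = abs(pos_a[0] - pos_b[0]) + abs(pos_a[1] - pos_b[1])
    return delay_per_unit * distance


def enumerate_candidates(p_i, p_j, arrival, position, delay_per_unit, tol):
    """Return pairs (b_p, b_c) with t_bc within tol of t_bp + estimated t_bp_bc."""
    candidates = []
    for b_p, b_c in product(p_i, p_j):
        if b_p == b_c:
            continue
        t_link = linear_delay(position[b_p], position[b_c], delay_per_unit)
        if abs(arrival[b_c] - (arrival[b_p] + t_link)) <= tol:
            candidates.append((b_p, b_c))
    return candidates


# Hypothetical buffers: positions in layout units, arrival times in ps.
position = {"p1": (0, 0), "p2": (0, 10), "c1": (10, 0), "c2": (10, 10)}
arrival = {"p1": 100, "p2": 150, "c1": 120, "c2": 160}

tight = enumerate_candidates(["p1", "p2"], ["c1", "c2"], arrival, position,
                             delay_per_unit=2, tol=5)
loose = enumerate_candidates(["p1", "p2"], ["c1", "c2"], arrival, position,
                             delay_per_unit=2, tol=10)
```

A pair that passes this check is only a candidate. As the slide notes, pWNS is not guaranteed to improve, so each candidate still needs the careful evaluation described next.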



pWNS is not guaranteed to be improved. Careful evaluation is required!

#### Evaluation of candidates

- Accurate evaluation of  $m=(b_p, b_c)$ 
  - Make topology change
  - Update timing
  - Find pWNS and pTNS by solving LP
  - Evaluate:

$$cost(m) = c_{cap} \, r(m) + c_{in} \, pCost(m) + c_{wns} \, pWNS(m) + c_{tns} \, pTNS(m)$$

Cost of connecting  $b_p$  to  $b_c$ 

Only drawback is long run-time!

#### Two-phase evaluation

- Rank all using fast metric
- Evaluate top-k candidates with accurate metric
- Fast evaluation
  - $cost(m) = t_{b_c}^{pre} - t_{b_p}^{post}$
  - (i) Short buffer chain
  - (ii) Closer in topology
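The ranking scheme can be sketched generically: a cheap metric shortlists the top-k candidates and only those are scored with the expensive metric (the weighted cost(m) from the evaluation step, which requires an LP solve per candidate). The sketch assumes that lower is better for both metrics. The function names and the toy cost tables are hypothetical stand-ins for the real metrics.

```python
import heapq


def weighted_cost(r, p_cost, p_wns, p_tns, c_cap, c_in, c_wns, c_tns):
    """Accurate metric: weighted sum of the four cost terms of cost(m)."""
    return c_cap * r + c_in * p_cost + c_wns * p_wns + c_tns * p_tns


def two_phase_select(candidates, fast_cost, accurate_cost, k):
    """Rank all candidates with fast_cost, then pick the best of the top k
    according to accurate_cost. Returns None if there is nothing to evaluate."""
    shortlist = heapq.nsmallest(k, candidates, key=fast_cost)
    if not shortlist:
        return None
    return min(shortlist, key=accurate_cost)


# Toy metrics: table lookups stand in for the timing-based fast cost and
# for the LP-based accurate cost.
fast = {"m1": 5, "m2": 1, "m3": 2, "m4": 9}
accurate = {"m1": 0.5, "m2": 3.0, "m3": 2.0, "m4": 0.1}

best_top2 = two_phase_select(fast, fast.get, accurate.get, k=2)
best_all = two_phase_select(fast, fast.get, accurate.get, k=4)
```

With k = 2 the accurate metric is evaluated twice instead of four times, at the price of missing `m4`, which the fast metric ranks last. The choice of k therefore trades run-time against the risk of discarding a good candidate.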





#### Experimental setup

- OpenCores Verilog specifications synthesized using the Synopsys tool chain
- Clock trees obtained after CTS
- Evaluation in terms of TNS and WNS
  - Nominal timing computed in each scenario (NGSPICE simulations)
  - OCV applied with  $c_{OCV} = 0.10$

| Name       | Scenarios<br>(num) | Modes<br>(num) | Corners<br>(num) | Sinks<br>(num) | Skew<br>constraints<br>(num) |
|------------|--------------------|----------------|------------------|----------------|------------------------------|
| fpu        | 9                  | 7              | 3                | 715            | 213225                       |
| pci_bridge | 9                  | 7              | 3                | 3582           | 1113894                      |
| ecg        | 9                  | 7              | 3                | 7674           | 798082                       |
| des3       | 9                  | 7              | 3                | 8808           | 154364                       |
| aes        | 9                  | 7              | 3                | 13216          | 637936                       |

[4] R. Ewetz, S. Janarthanan, and C.-K. Koh. Benchmark circuits for clock scheduling and synthesis. https://purr.purdue.edu/publications/1759, 2015.

#### Evaluated clock tree structures

- Pre-CTO: initial clock tree
- CTO-P [6]: clock tree after CTO in [6]
- CTO-R [3]: clock tree after CTO in [3]
- OGR: after pTNS and pWNS optimization
- OGR-CTO: OGR after CTO [3]

#### On circuit aes



#### Results

| Circuit<br>(name) | Method    | TNS<br>(ps) | WNS<br>(ps) | pTNS<br>(ps) | pWNS<br>(ps) | Cap<br>(pF) | Run-time<br>(min) |
|-------------------|-----------|-------------|-------------|--------------|--------------|-------------|-------------------|
| fpu               | Pre-CTO   | 791         | 44          | 0            | 0            | 3.23        | 4                 |
|                   | CTO-P [6] | 0           | 0           |              |              | 3.64        | 8                 |
|                   | CTO-R [3] | 0           | 0           |              |              | 3.57        | 8                 |
|                   | OGR       | n/a         | n/a         | n/a          | n/a          | n/a         | n/a               |
|                   | OGR-CTO   | 0           | 0           |              |              | 3.57        | 4                 |
| pci_bridge        | Pre-CTO   | 719         | 41          | 52           | 11           | 10.42       | 9                 |
|                   | CTO-P [6] | 93          | 15          |              |              | 11.10       | 28                |
|                   | CTO-R [3] | 75          | 11          |              |              | 11.06       | 40                |
|                   | OGR       | 20178       | 34          | 0            | 0            | 10.26       | 17                |
|                   | OGR-CTO   | 0           | 0           |              |              | 11.03       | 22                |

#### Results cont.

| Circuit<br>(name) | Method    | TNS<br>(ps) | WNS<br>(ps) | pTNS<br>(ps) | pWNS<br>(ps) | Cap<br>(pF) | Run-time<br>(min) |
|-------------------|-----------|-------------|-------------|--------------|--------------|-------------|-------------------|
| ecg               | Pre-CTO   | 2603        | 44          | 0            | 0            | 16.76       | 33                |
|                   | CTO-P [6] | 6           | 2           |              |              | 17.59       | 27                |
|                   | CTO-R [3] | 0           | 0           |              |              | 17.69       | 14                |
|                   | OGR       | n/a         | n/a         | n/a          | n/a          | n/a         | n/a               |
|                   | OGR-CTO   | 0           | 0           |              |              | 17.69       | 15                |
| des3              | Pre-CTO   | 28511       | 99          | 19535        | 43           | 81.79       | 32                |
|                   | CTO-P [6] | 29761       | 79          |              |              | 98.74       | 254               |
|                   | CTO-R [3] | 20282       | 47          |              |              | 88.71       | 173               |
|                   | OGR       | 24658       | 100         | 3250         | 32           | 81.97       | 30                |
|                   | OGR-CTO   | 17281       | 35          |              |              | 90.87       | 193               |

#### Results cont.

| Circuit<br>(name) | Method    | TNS<br>(ps) | WNS<br>(ps) | pTNS<br>(ps) | pWNS<br>(ps) | Cap<br>(pF) | Run-time<br>(min) |
|-------------------|-----------|-------------|-------------|--------------|--------------|-------------|-------------------|
| aes               | Pre-CTO   | 7895        | 52          | 5036         | 31           | 32.50       | 30                |
|                   | CTO-P [6] | 6716        | 41          |              |              | 36.58       | 94                |
|                   | CTO-R [3] | 4950        | 33          |              |              | 34.57       | 74                |
|                   | OGR       | 10246       | 83          | 2294         | 15           | 33.49       | 46                |
|                   | OGR-CTO   | 2747        | 20          |              |              | 36.49       | 78                |
| Norm.             | Pre-CTO   | 0%          | 0%          | 0%           | 0%           | 1.00        |                   |
|                   | CTO-P [6] | 59%         | 59%         |              |              | 1.12        |                   |
|                   | CTO-R [3] | 74%         | 71%         |              |              | 1.07        |                   |
|                   | OGR       | -73%        | -14%        | 79%          | 59%          | 1.01        |                   |
|                   | OGR-CTO   | 80%         | 84%         |              |              | 1.09        |                   |

#### Summary

- Improve pWNS and pTNS using OCV guided clock tree topology reconstruction
- Better and faster topology changes?