# Clustering of Flip-Flops for Useful-Skew Clock Tree Synthesis

- Chuan Yean Tan Rickard Ewetz
- Cheng-Kok Koh

- Purdue University
- University of Central Florida
- Purdue University



### Outline

- MBFF Introduction
- Previous MBFF/Clustering Solutions
- Bounded Arrival Time Useful Skew Tree
- Clustering of Flip-Flops Mechanics
- Proposed Methodology
- Evaluation & Experimental Results
- Summary

### Outline

#### MBFF Introduction

- Previous MBFF/Clustering Solutions
- Bounded Arrival Time Useful Skew Tree
- Clustering of Flip-Flops Mechanics
- Proposed Methodology
- Evaluation & Experimental Results
- Summary

### Multi-Bit Flip Flop (MBFF)



| No. of Bits   | 1-Bit FF | 2-Bit FF | 4-Bit FF |
|---------------|----------|----------|----------|
| Power Per Bit | 1.00     | o.86     | 0.78     |
| Area Per Bit  | 1.00     | 0.96     | 0.71     |

### Benefits of utilizing MBFF

- Reduce power consumption
- Reduce routing complexity and clock tree complexity, less elements to connect leads to less routing complexity

| No. of Bits                    | 1-Bit FF | 2-Bit FF | 4-Bit FF |  |
|--------------------------------|----------|----------|----------|--|
| Power Per Bit                  | 1.00     | o.86     | 0.78     |  |
| Area Per Bit                   | 1.00     | 0.96     | 0.71     |  |
| MBFF Power And Area Savings[3] |          |          |          |  |

[3] L.Chen et al. Using MBFF for clock power saving by design compiler. – Proceedings of Synopsys User Group, 2010 <sup>5</sup>

### Outline

- MBFF Introduction
- Previous MBFF/Clustering Solutions
- Bounded Arrival Time Useful Skew Tree
- Clustering of Flip-Flops Mechanics
- Proposed Methodology
- Evaluation & Experimental Results
- Summary

#### Previous Works Summary

#### Key Methodologies of Previous Works:

- Timing-Driven Intersecting of Feasible Regions[14]
- Maximum Independent Set based on heuristic approach [2]
- x- and y- interval graphs to determine maximum clique [9]
- K-means Algorithm [15,10]

[2] Y.T.Chang et al. Post-placement power optimization with multi-bit flip-flops - ICCAD 2010
[9] I.H.R. Jiang et al. INTEGRA: Fast multi-bit flip-flop cluster for clock power saving. - IEEE Transactions CAD of I.C.S. 2015
[10] A.B.Khang et al. Improve flop tray-based design implementation for power reduction. - ICCAD 2016
[14] S.H.Wang et al. Power-driver flip-flop merging and relocation. - IEEE Transactions CAD of I.C.S 2012
[15] G.Wu et al. Flip-Flop clustering by weighted k-means algorithm. - DAC 2016



Timing-Driven Clustering of Flip-Flops



#### Analysis of Previous Works

- Previous work cluster flip-flop without consideration of clock tree synthesis (CTS)
- Timing slacks are obtained pre-CTS, timing does not correlate as accurately after CTS
- Most modern designs adopt useful-skew tree to reduce area and power consumption
- However, once the UST has been built, updating or displacing FF's placement is expensive!

### Outline

- MBFF Introduction
- Previous MBFF/Clustering Solutions
- Bounded Arrival Time Useful Skew Tree
- Clustering of Flip-Flops Mechanics
- Proposed Methodology
- Evaluation & Experimental Results
- Summary



Setup Constraints:  $t_i + t_i^{CQ} + t_{ij}^{max} + t_j^{S} \le t_j + T$ Hold Constraints:  $t_i + t_i^{CQ} + t_{ij}^{min} \ge t_j + t_j^{H}$  Setup Constraints: $t_i + t_i^{CQ} + t_{ij}^{max} + t_j^{S} \le t_j + T$ Hold Constraints: $t_i + t_i^{CQ} + t_{ij}^{min} \ge t_j + t_j^{H}$ 

**Constraints Formulation:** 

$$u_{ij} = T - t_i^{CQ} - t_{ij}^{max} - t_j^{Q}$$
$$I_{ij} = t_j^{H} - t_i^{CQ} - t_{ij}^{min}$$

**Skew Constraints:** 

$$I_{ij} \le t_i - t_j \le u_{ij}$$
$$I_{ij} \le skew_{ij} \le u_{ij}$$

#### Skew Constraint Graph (SCG)



Weighted Edges:

 $w_{12} = u_{12}$  $w_{21} = -l_{12}$ 

From the skew constraints:  $t_1 - t_2 \le u_{12}$  $t_2 - t_1 \le -l_{12}$ 

Generalize weighted constraint:  $t_i - t_i \le w_{ii}$ 

#### Useful-Skew Tree based on Arrival Time Constraints [5]



### Useful-Skew Tree based on Arrival Time Constraints [5]



#### **Arrival Time Ranges**

**Clock Tree Construction** 

### Outline

- MBFF Introduction
- Previous MBFF/Clustering Solutions
- Bounded Arrival Time Useful Skew Tree
- Clustering of Flip-Flops Mechanics
- Proposed Methodology
- Evaluation & Experimental Results
- Summary

### Clustering of FFs Mechanics



#### Displacing FFs



#### Worst Case: FF's displacement introduces increase in wirelength

#### Displacing FFs



Moving each FF will **require** update to SCG to avoid "out-dated" skew constraints!!

### Displacing FFs



Furthermore, displacing FF<sub>i</sub> does not guarantee that all skew constraints will be met after update<sub>1</sub>

## Timing Delays



#### Let $\Delta k$ = timing delay for increase in wirelength

# Timing Delays



 $\Sigma \Delta k_i$  = total timing delay increased from displacing FFs

#### Comparing FF's Arrival Time Ranges



Example of arrival time ranges derived from skew constraints

#### Merging Flip-Flop Candidates



#### **Overlapping Arrival Time Ranges Between FFs**



#### Updating New Arrival Time Range



**Fast computation** to update Arrival Time Ranges after merging FFs

### Outline

- MBFF Introduction
- Bounded Arrival Time Useful Skew Tree
- Previous MBFF/Clustering Solutions
- Clustering of Flip-Flops Mechanics
- Proposed Methodology
- Evaluation & Experimental Results
- Summary

### Proposed Methodology



#### Cluster Flip-Flops Algorithm



#### Cluster Flip-Flops Algorithm



#### Nearest Neighbor FF Selection Heuristic



#### Cluster Flip-Flops Algorithm



#### **Compute Clustering Location**



**Cluster Location** 

#### **Cluster Location Algorithm:**

Weighted FF candidate based on FF's fanout Higher fanout — Higher capacitive load

#### **Compute Clustering Location**



**Cluster Location** 

# Displacing FF with higher capacitive load will incur higher delay change

#### Cluster Flip-Flops Algorithm



#### Arrival Time Range Overlaps



#### Updating Scaled Arrival Time Ranges



Scaled Arrival Time Range update after FF has been moved

#### Cluster Flip-Flops Algorithm



## Outline

- MBFF Introduction
- Previous MBFF/Clustering Solutions
- Bounded Arrival Time Useful Skew Tree
- Clustering of Flip-Flops Mechanics
- Proposed Methodology
- Evaluation & Experimental Results
- Summary

## **Evaluation Framework**

#### Monte Carlo Framework:

- Process Variation (10%)
- Voltage Variation (15%)
- Temperature Variation (30%)

#### Tree Structure:

- Zero-Skew Tree (ZST)
- Useful-Skew Tree (UST) [5]





## Clustering FF Results

| Design <sub>[1]</sub> | No. of FF | Total Capacitance (pF) |               |                    |               |
|-----------------------|-----------|------------------------|---------------|--------------------|---------------|
|                       |           | ZST <sub>[2]</sub>     | Cluster + ZST | UST <sub>[3]</sub> | Cluster + UST |
| USB Fast              | 1752      | 5.62                   | 5.08          | 3.53               | 3.2           |
| DMA                   | 2121      | 6.94                   | 5.56          | 4.31               | 3.72          |
| openMSP430            | 686       | 2.66                   | 1.98          | 1.44               | 1.23          |
| Ethernet              | 10544     | 31.67                  | 26.23         | 22.54              | 18.91         |
| AES256                | 13216     | 40.47                  | 33.75         | 25.09              | 22.72         |
| PCI Bridge            | 3582      | 10.55                  | 9.15          | 7.48               | 6.52          |
| VGA enchanced         | 17071     | 49.89                  | 41.08         | 34.83              | 30.03         |

Benchmark from IWLS 2005 [8]

8 – 14% decrease in total capacitance

[8] International Workshop for Logic Synthesis – <u>http://irwls.org/iwls2005/benchmarks.html</u> - 2005 <sup>42</sup>

## Clustering FF + MBFF Transformation Results

| Destar                |           | Total Capacitance (pF) |                      |  |
|-----------------------|-----------|------------------------|----------------------|--|
| Design <sub>[1]</sub> | No. of FF | UST <sub>[3]</sub>     | Cluster + MBFF + UST |  |
| USB Fast              | 1752      | 3.53                   | 2.48                 |  |
| DMA                   | 2121      | 4.31                   | 2.85                 |  |
| openMSP430            | 686       | 1.44                   | 1.10                 |  |
| Ethernet              | 10544     | 22.54                  | 14.25                |  |
| AES256                | 13216     | 25.09                  | 17.8                 |  |
| PCI Bridge            | 3582      | 7.48                   | 5.00                 |  |
| VGA enchanced         | 17071     | 34.83                  | 22.92                |  |

20% – 34% decrease in total capacitance

#### Data & Clock Path Cap Reduction

Assuming worst case datapath wirelength

increase

| Design <sub>[1]</sub> | No. of FF | Data Path Wire<br>Cap Reduction (%) | Clock Path Wire<br>Cap Reduction (%) |
|-----------------------|-----------|-------------------------------------|--------------------------------------|
| USB Fast              | 1752      | -4.17                               | 13.25                                |
| DMA                   | 2121      | -9.86                               | 19.47                                |
| openMSP430            | 686       | -7.29                               | 19.81                                |
| Ethernet              | 10544     | -6.02                               | 22.25                                |
| AES256                | 13216     | -1.8                                | 13.03                                |
| PCI Bridge            | 3582      | -4.81                               | 18.05                                |
| VGA enchanced         | 17071     | -8.98                               | 19.56                                |

Clock Path WireCap Reduction >> Data Path Wire Cap Reduction

## Outline

- MBFF Introduction
- Bounded Arrival Time Useful Skew Tree
- Previous MBFF/Clustering Solutions
- Clustering of Flip-Flops Mechanics
- Proposed Methodology
- Evaluation & Experimental Results
- Summary

## Summary

- Displacing FFs requires update to skew constraints to ensure timing is met
- Clustering of Flip-Flops based on Bounded Arrival Time Range Clock Tree Synthesis provides fast update to the skew constraints while ensuring timing correctness
- Clustering of FFs reduces routing complexity
- MBFF transformation further reduces power consumption

# Q&A

## Backup Slides

#### Bounded Arrival Time Ranges



#### Skew Constraint Graph (SCG)

