### **Clock Tree Embedding for 3D ICs**

15<sup>th</sup> Asia and South Pacific Design Automation Conference Taipei, Taiwan January 18-21, 2010

Tak-Yung Kim and Taewhan Kim

System Synthesis Laboratory School of Electrical Engineering and Computer Science Seoul National University

#### **Contents**

- 3-Dimensional Integrated Circuit
- Clock Tree Synthesis
- Clock Tree Embedding for 3D ICs
  - Zero Skew Clock Tree Embedding
  - Minimization of the number of through-silicon vias
  - Minimization of the total wirelength
- Clock Tree Synthesis Flow for 3D ICs
- Experimental Results
- Conclusions

# **3-Dimensional Integrated Circuit**

#### Chip with multiple layers of active devices

- by vertically stacking into a single chip or package.
- Currently, a true 3D IC design technology using TSVs has been actively researched.
- Advantages & Issues

| Advantages                                                                                                         | Issues                                                                                         |
|--------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|
| <ul> <li>Short global interconnection wire</li> <li>Performance/Power/Area benefit</li> </ul>                      | <ul> <li>Manufacturing cost</li> <li>Yield</li> <li>Thermal problem</li> </ul>                 |
| <ul> <li>Small form factor</li> <li>System size reduction</li> <li>Heterogeneous technology integration</li> </ul> | <ul> <li>Thermal problem</li> <li>Noise problem</li> <li>Physical design automation</li> </ul> |

# **3D Physical Design Automation**

- Various algorithms have been researched in the area of floorplanning, placement, and routing.
- Clock tree synthesis for 3D ICs
  - Minz, Zhao, and Lim developed a thermal-aware buffered clock tree synthesis solution. (BURITO; ASPDAC2008)
    - Minimizes and balances the temperature dependent skew.
    - Limited to two layered face-to-face wafer bonded 3D ICs.
    - Does not effectively consider the clock tree embedding problem in a vertical design space.

# **Clock Tree Synthesis**



CTS algorithm should consider 3-dimensional design space !!!

- Vertical die layers
- Through-silicon vias

1. Topology Generation 2. Embedding (Routing)





# Zero Skew Clock Tree Embedding



 $|e_x|$ : edge length from the parent node

# **Proposed Algorithm**

#### ZCTE-3D (Zero Skew Clock Tree Embedding in 3D ICs)

Solves the problem for a given topology in two steps.

Subproblem 1

**TSV allocation problem** 

➔ Minimize the number of TSVs

Subproblem 2

**3D clock tree embedding** problem with **TSV placement** 

→ Minimize the total wirelength

## Motivation for the TSV Allocation

Clock root s<sub>0</sub>



Number of TSV (x, s) = | layer(x) - layer(s) |

| Sink Placement<br>Layer |                       |            | Embedding<br>Layer |            |                       | TSVs           |   |  |  |
|-------------------------|-----------------------|------------|--------------------|------------|-----------------------|----------------|---|--|--|
| <b>S</b> <sub>1</sub>   | <b>S</b> <sub>2</sub> | <b>S</b> 3 | S <sub>4</sub>     | <b>X</b> 1 | <b>X</b> <sub>2</sub> | S <sub>0</sub> |   |  |  |
| 1                       | 2                     |            | 2                  | 1          | 1                     | 1              | 2 |  |  |
|                         |                       |            |                    |            |                       | 2              | 4 |  |  |
|                         |                       |            |                    | 1          | 2                     | 1              | 3 |  |  |
|                         |                       |            |                    |            |                       | 2              | 3 |  |  |
|                         |                       |            |                    | 2          | 1                     | 1              | 3 |  |  |
|                         |                       |            |                    |            |                       | 2              | 3 |  |  |
|                         |                       |            |                    | 2          | 2                     | 1              | 4 |  |  |
|                         |                       |            |                    |            |                       | 2              | 2 |  |  |

# Number of TSVs & Embedding Layers

- The number of TSVs is computed by
  - TSVs allocated in the sub-trees rooted at children
  - TSVs needed for routing to children.



The number of TSVs is decided by the embedding layers of tree nodes.





### **Embedding Layer Candidates**



- Find embedding layer candidates  $el(x) = \langle l_1, l_2 \rangle$ .
- Node x can be assigned to a layer between  $I_1$  and  $I_2$ .



# **Exact Embedding Layer Assignment**



### **Optimality of Embedding Layer Assignment**

1<sup>st</sup> Bottom-up Phase

**Embedding layer candidates construction** 

Lemma 1: Optimality of embedding layer candidates el(.)

Embedding layer candidates  $el(\cdot)$  contain an assignment of embedding layers of optimal TSV allocation.

2<sup>nd</sup> Top-down Phase

The exact embedding layer assignment

Lemma 2: Optimality of nearest embedding layer assignment

The nearest embedding layer assignment in  $el(\cdot)$  leads to an optimal TSV allocation.

# Minimization of the Total Wirelength

- Find the precise (x,y)-location with TSV placement.
- Extended the 2D clock tree embedding algorithm
  - DME (Deferred Merge Embedding) (Chao et al., TCAS1992)
  - Wirelength optimal under the linear delay model
  - Suboptimal under the Elmore delay model
  - For a prescribed clock connection topology
  - 1<sup>st</sup> bottom-up phase: Constructs a tree of merging segments, which represent loci of possible placements of tree nodes.
  - 2<sup>nd</sup> top-down phase: Decide the exact embedding locations of tree nodes.

# DME on 3D Design Space

 Construction of merging segment ms(v), with TRR (Tilted Rectangular Region)



- Edge length |e<sub>x</sub>| calculation is modified to consider the delays of TSVs as well as 2D wires.
- The (x,y)-location of TSVs is also concurrently decided based on the parasitic values of TSV and interconnect wire.

# Zero Skew Clock Routing Model



# Merging under the Linear Delay Model

Linear lumped capacitance delay model

a

- $t(a,b) = c_w \cdot d(a,b)$
- Propagation delay

$$t(v, T_a) =$$
  
$$t_a + c_w x + (l_v - l_a)c_v$$

$$t(v, T_b) =$$
  
$$t_b + c_w (L - x) + (l_b - l_v)c_v$$

#### Independent on the TSV locations.

b

#### Merging under the Linear Delay Model cont' Edge length calculation $\Box$ $|e_a|=x$ and $|e_b|=L-x$ ( $0 \le x \le L$ ) $x = \frac{(t_b - t_a) + c_w L + (l_a + l_b - 2l_v)c_v}{2c_w}$ b TSV TSI ˈms(v) ms(v) a $(I_a = I_b = I_v)$ $(I_a = 1, I_b = 2, I_v = 1)$ $(I_a = 1, I_b = 2, I_v = 2)$

# Merging under the Elmore Delay Model



$$Merging under the Elmore Delay Model-cont'$$
• Edge length calculation  

$$|e_a|=x \text{ and } |e_b|=L-x (0 \le x \le L)$$
TSVs are on the  
merging location.  

$$x = \begin{cases} \frac{(t_b - t_a) + r_w L(C_b + \frac{1}{2}c_w L) + \beta + r_v c_w (l_b - l_v) L}{r_w (C_a + C_b + c_w L) + r_v c_w (l_b - l_a)}, & (r_w c_v - r_v c_w) \ge 0, x_{v1} = x_{v2} = x \\ \frac{(t_b - t_a) + r_w L(C_b + \frac{1}{2}c_w L) + \beta + r_w c_v (l_b - l_v) L}{r_w (C_a + C_b + c_w L) + r_w c_v (l_b - l_a)}, & (r_w c_v - r_v c_w) \ge 0, x_{v1} = 0, x_{v2} = L \\ \frac{\beta = r_v ((l_b - l_v) C_b - (l_v - l_a) C_a + \frac{1}{2}c_v ((l_b - l_v)^2 - (l_v - l_a)^2))} \\ \text{TSVs are on the location of children} \end{cases}$$

#### **Proposed 3D Clock Tree Synthesis Flow**



#### Tree Topology Generation for 3D ICs (MMM-3D)

- Extended MMM (Method of Means and Medians) (Jackson et al., DAC1990): Recursively divides sinks based on the (x,y)-coordinates.
- If HPWL(subset)  $\leq \rho \cdot L$  (0 $\leq \rho \leq 1$ , L = HPWL(all sinks)),
  - Partitions the sinks according to the placement layers.
  - Controls the density of TSVs



### **Experimental Results**

- 2-Layered 3D ICs under Elmore delay model
- Total number of TSVs: 10%  $\downarrow$
- **Total wirelength:**  $4\% \downarrow$
- Max clock network delay: 2%  $\downarrow$

| benchmark | ALL_BURITO (100%) |            |               | BURITO + ZCTE-3D |            |               | MMM-3D + ZCTE-3D |            |               |
|-----------|-------------------|------------|---------------|------------------|------------|---------------|------------------|------------|---------------|
|           | TSVs              | WL<br>(um) | Delay<br>(ns) | TSVs             | WL<br>(um) | Delay<br>(ns) | TSVs             | WL<br>(um) | Delay<br>(ns) |
| r1        | 88                | 1496266    | 1.68          | 82               | 1496538    | 1.67          | 83               | 1441849    | 1.64          |
| r2        | 225               | 2994996    | 4.34          | 197              | 2996950    | 4.29          | 197              | 2831346    | 4.34          |
| r3        | 309               | 3869936    | 6.75          | 283              | 3872996    | 6.71          | 276              | 3725294    | 6.37          |
| r4        | 725               | 7784107    | 19.82         | 658              | 7789465    | 19.68         | 653              | 7424886    | 19.28         |
| r5        | 1161              | 11385430   | 35.75         | 1033             | 11393346   | 35.54         | 1052             | 10940984   | 35.2          |
| ratio     | 1                 | 1          | 1             | 0.9              | 1          | 0.99          | 0.9              | 0.96       | 0.98          |

# **Experimental Results** – cont'

#### Clock topology Generation

- Benchmark circuit: r3
- Number of die layers: 4



TSVs = 657TSVs = 359TSVs = 3Wirelength = 187cmWirelength = 224cmWirelength = 350cm

# **Conclusions**

- The zero skew clock tree embedding algorithm ZCTE-3D is proposed.
  - Allocates optimal number of TSVs. (Assign-Embedding-Layer)
  - Minimizes the total wirelength by extending 2D DME algorithm to 3D design space. (DME-3D)
    - Wirelength optimal under the linear delay model.
    - Suboptimal under the Elmore delay model.
- **3D topology generation algorithm MMM-3D is proposed.**

