## HS3DPG: Hierarchical Simulation for 3D P/G Network

Shuai Tao, Xiaoming Chen, Yu Wang, Yuchun Ma, Yiyu Shi, Hui Wang, Huazhong Yang

Presented by Shuai Tao (Tsinghua University, Beijing, China), January 24, 2013, ASP-DAC in Japan

- Background
- Motivation
- HS3DPG
  - Overview of the hierarchical simulation flow
  - Port equivalent model
  - Simplified model
- Experimental Results
- Conclusions

## P/G network simulation problems

- P/G networks have become increasingly critical in the chip design flow.
  - A 5% IR drop can cause 15% or more performance degradation [J.-S. Yim, ACM Des. Autom. Conf., 99]
- Analyzing P/G networks is computationally challenging.
  - Modern chips contain millions of P/G nodes.
  - The most common mesh-based P/G networks are difficult to partition.



## **3D P/G network**

- As feature sizes shrink, three-dimensional (3D) integration is regarded as a promising way to mitigate the bottlenecks of traditional 2D integration.
  - Interconnect delay (15% wirelength reduction for 3 tiers [J. Cong, 08]), leakage power, etc.
- The power supply system of 3D ICs.



## **Related work**

- Time-domain analysis of P/G networks falls into two categories: static IR drop analysis and transient simulation. This work focuses on the former.
- Many studies address modeling and fast analysis of 2D P/G networks, but only a few address 3D.
- 2D P/G network simulation
  - Multigrid-based approaches:
    - AMG-PCG [J. Yang, ICCAD11], CPU-GPU HMD [Z. Feng, TVLSI11], and multigrid-like techniques [J. Kozhaya, TCAD02].
  - Hierarchical analysis [M. Zhao, TCAD02]:
    - Divide and conquer

## **Related work: 3D P/G network**

- Much tougher situation in the 3D case
  - The network scale may be several times larger than that of 2D cases.
- 3D P/G network simulation
  - Compact physical model [G. Huang, EPEP07]
  - Model order reduction [H. Yu, TDAES09]
  - Both of the above treat the 3D power system as a whole.
- Standard reduced power models (SRPM) based approach. [X. Hu, 3DIC 10]

## Why hierarchical simulation in 3D?

3D P/G networks are inherently hierarchical.

- The port number problem still exists.
- Clustered TSV placement makes a hierarchical approach more suitable.
  - Two P/G TSV placement styles in 3D chips [M. B. Healy, TVLSI11]



Moreover, the "locality" property helps solve the port number problem and simplifies the simulation in 3D.

## Hierarchical simulation flow for 3D P/G network

- P/G network analysis based on modified nodal analysis (MNA)
  - G is sparse and symmetric positive definite (SPD); this work solves the system with CHOLMOD [Y. Chen, 08]

$$Gx = I, G \in \mathbb{R}^{N \times N}, x \in \mathbb{R}^{N \times 1}, I \in \mathbb{R}^{N \times 1}$$
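As a rough illustration of the static analysis $Gx = I$, the sketch below stamps a tiny resistive mesh and solves it with a dense Cholesky factorization, which relies on $G$ being SPD. The mesh size, the conductance values, the single pad at node 0, and the helper names `build_mesh` and `cholesky_solve` are all assumptions made for this example. The work itself uses the sparse solver CHOLMOD on networks with millions of nodes.

```python
import math


def build_mesh(n, g_wire, g_pad, vdd, i_load):
    """Stamp an n x n resistive mesh into G x = I (nodal analysis).

    Hypothetical setup: every node draws i_load, and node 0 is tied to
    the ideal supply vdd through a pad conductance g_pad.
    """
    size = n * n
    G = [[0.0] * size for _ in range(size)]
    I = [-i_load] * size

    def stamp(a, b, g):
        # Conductance g between nodes a and b.
        G[a][a] += g
        G[b][b] += g
        G[a][b] -= g
        G[b][a] -= g

    for r in range(n):
        for c in range(n):
            k = r * n + c
            if c + 1 < n:
                stamp(k, k + 1, g_wire)
            if r + 1 < n:
                stamp(k, k + n, g_wire)
    G[0][0] += g_pad         # pad conductance to the supply
    I[0] += g_pad * vdd      # Norton equivalent of the supply
    return G, I


def cholesky_solve(G, I):
    """Solve G x = I for SPD G via G = L L^T (dense, small systems only)."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(G[i][i] - s)
            else:
                L[i][j] = (G[i][j] - s) / L[j][j]
    y = [0.0] * n
    for i in range(n):                      # forward substitution: L y = I
        y[i] = (I[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):            # back substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


G, I = build_mesh(n=3, g_wire=5.0, g_pad=10.0, vdd=1.8, i_load=0.01)
x = cholesky_solve(G, I)
ir_drop = [1.8 - v for v in x]
print("worst-case IR drop (V):", max(ir_drop))
```

The dense factorization is only meant to make the SPD structure visible; it does not scale to realistic network sizes.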

#### Objectives

- Static IR drop analysis: obtain the voltage of every node in the network
- Handle each tier separately to benefit from parallelism
- **Approach:** Extract a port equivalent model for each tier

### Hierarchical simulation flow for 3D P/G network



## Port equivalent model: definition

Definition of the Port, V and I



The port equivalent model should maintain the same port characteristics as the original network.

$$I = J * V + S = \begin{bmatrix} \frac{\partial I_1}{\partial V_1} & \cdots & \frac{\partial I_1}{\partial V_M} \\ \vdots & \ddots & \vdots \\ \frac{\partial I_M}{\partial V_1} & \cdots & \frac{\partial I_M}{\partial V_M} \end{bmatrix} \begin{bmatrix} V_1 \\ \vdots \\ V_M \end{bmatrix} + \begin{bmatrix} S_1 \\ \vdots \\ S_M \end{bmatrix}$$

## **Port equivalent model: computation**

We take a two-port tier as an example to show how the port equivalent model is computed.
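The figure for this slide is not reproduced here, so the following is only one plausible way to obtain $J$ and $S$ for a linear resistive tier, consistent with the definition $I = JV + S$: column $j$ of $J$ is the port current response to a unit voltage on port $j$ with all loads removed, and $S$ is the port current with all port voltages at zero and the loads active. The four-node chain, the load values, and the names `solve` and `port_model` are hypothetical.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def port_model(G, s, ports):
    """Return (J, S) with I_port = J V_port + S for one tier.

    G: conductance matrix of the tier, s: injected node currents
    (loads are negative), ports: indices of the port nodes.
    I_port is the current flowing into the tier at each port.
    """
    n = len(G)
    inner = [k for k in range(n) if k not in ports]
    G_ii = [[G[a][b] for b in inner] for a in inner]

    def port_currents(v_port, src):
        # Solve the internal nodes with the port voltages forced to v_port.
        rhs = [src[a] - sum(G[a][p] * vp for p, vp in zip(ports, v_port))
               for a in inner]
        v_inner = solve(G_ii, rhs)
        v = [0.0] * n
        for p, vp in zip(ports, v_port):
            v[p] = vp
        for a, va in zip(inner, v_inner):
            v[a] = va
        # KCL at each port node gives the current entering through the port.
        return [sum(G[p][k] * v[k] for k in range(n)) - src[p] for p in ports]

    m = len(ports)
    S = port_currents([0.0] * m, s)
    cols = [port_currents([1.0 if k == j else 0.0 for k in range(m)], [0.0] * n)
            for j in range(m)]
    J = [[cols[j][i] for j in range(m)] for i in range(m)]
    return J, S


# Hypothetical tier: chain 0-1-2-3 of unit conductances, ports at nodes 0 and 3,
# loads of 0.3 A and 0.6 A at the two internal nodes.
G_tier = [[1.0, -1.0, 0.0, 0.0],
          [-1.0, 2.0, -1.0, 0.0],
          [0.0, -1.0, 2.0, -1.0],
          [0.0, 0.0, -1.0, 1.0]]
J, S = port_model(G_tier, [0.0, -0.3, -0.6, 0.0], ports=[0, 3])
print(J, S)
```

Because each tier only needs its own `G` and load vector, calls like `port_model` for different tiers are independent of each other and could run in parallel.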



## Port equivalent model: circuit representation

Circuit representation of one tier after applying the port equivalent model.



#### Benefits

- The port equivalent models of the tiers can be computed in parallel.
- The internal details of each tier's P/G network are hidden, which avoids the conflict between data sharing and chip protection in 3D ICs.
- The model could potentially also be used in transient simulation.
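Once every tier is replaced by its port equivalent model, the global network contains only port nodes. The sketch below assembles and solves such a reduced system under an assumed topology that the slides do not spell out: all tiers have the same number of ports, port $m$ of each tier sits directly above port $m$ of the tier below and is joined to it by a TSV conductance `g_tsv`, and the ports of tier 0 reach the supply through `g_pkg`. The function names and all numeric values are hypothetical.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def solve_global(models, g_tsv, g_pkg, vdd):
    """Solve the port voltages of a tier stack (assumed topology, see above).

    models: list of (J, S) per tier, tier 0 being closest to the package.
    Returns one list of port voltages per tier.
    """
    tiers = len(models)
    m = len(models[0][1])
    size = tiers * m
    A = [[0.0] * size for _ in range(size)]
    b = [0.0] * size
    for k, (J, S) in enumerate(models):
        for i in range(m):
            row = k * m + i
            for j in range(m):
                A[row][k * m + j] += J[i][j]   # current into tier k: J V + S
            b[row] -= S[i]
            if k == 0:
                A[row][row] += g_pkg           # package connection to vdd
                b[row] += g_pkg * vdd
            else:
                lower = row - m                # same port, one tier below
                A[row][row] += g_tsv
                A[lower][lower] += g_tsv
                A[row][lower] -= g_tsv
                A[lower][row] -= g_tsv
    v = solve(A, b)
    return [v[k * m:(k + 1) * m] for k in range(tiers)]


# Two identical hypothetical tiers, each described only by its (J, S).
J = [[1.0 / 3.0, -1.0 / 3.0], [-1.0 / 3.0, 1.0 / 3.0]]
S = [0.4, 0.5]
port_v = solve_global([(J, S), (J, S)], g_tsv=20.0, g_pkg=10.0, vdd=1.8)
print(port_v)
```

The internal node voltages of each tier would then follow from one more solve per tier with the port voltages fixed, which again can be done tier by tier.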

## "Locality" property and simplified model

"Locality" property in the flip-chip packaging.



With "locality", the Jacobian matrix J in the port equivalent model can be **quite sparse**. Accordingly, the number of VCCSs in the equivalent circuit can be **reduced dramatically**.
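A minimal sketch of how "locality" could thin out $J$: entries that couple two ports farther apart than some radius are dropped. The slides do not state how the affected area of a port is chosen, so the Chebyshev distance on an $n \times n$ cluster grid, the `radius` parameter, and both function names are assumptions for illustration only.

```python
def simplify_jacobian(J, n, radius):
    """Zero the entries of J that couple ports more than `radius` clusters apart.

    Port k is assumed to sit at row k // n, column k % n of an n x n
    TSV cluster grid; distance is measured as Chebyshev distance.
    """
    size = n * n
    out = [[0.0] * size for _ in range(size)]
    for a in range(size):
        ra, ca = divmod(a, n)
        for b in range(size):
            rb, cb = divmod(b, n)
            if max(abs(ra - rb), abs(ca - cb)) <= radius:
                out[a][b] = J[a][b]
    return out


def nonzero_fraction(J):
    """Fraction of entries of J that are non-zero."""
    total = sum(len(row) for row in J)
    return sum(1 for row in J for value in row if value != 0.0) / total


# A fully coupled (dense) 9-port model on a 3 x 3 cluster grid.
dense = [[1.0] * 9 for _ in range(9)]
local = simplify_jacobian(dense, n=3, radius=1)
print(nonzero_fraction(dense), nonzero_fraction(local))
```

Each remaining non-zero off-diagonal entry corresponds to one VCCS in the equivalent circuit, so a smaller radius means fewer VCCSs at the cost of accuracy.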

## Sparsity of the Jacobian matrix in the simplified models

The Jacobian matrix becomes sparser in the simplified models once the "locality" property is taken into account.

| TSV cluster array in each tier | Sparsity (non-zero fraction) in full port equivalent models | Sparsity (non-zero fraction) in simplified models |
|--------------------------------|-------------------------------------------------------------|---------------------------------------------------|
| 10x10                          | 1.00                                                        | 0.49                                              |
| 13x13                          | 1.00                                                        | 0.33                                              |
| 20x20                          | 1.00                                                        | 0.16                                              |
| 25x25                          | 0.9997                                                      | 0.1076                                            |
| 48x48                          | 0.62                                                        | 0.032                                             |

With the "locality" effect taken into account, the number of non-zero elements of the Jacobian matrix in the simplified model can be only about 5% of that in the full port equivalent model (48x48 TSV cluster array).

## Verification of the proposed approach

Results on a 3D P/G benchmark from industrial design

3D\_µP: VDD = 1.8 V, 831184 P/G nodes, 2 tiers, 2x2 TSV clusters

| 3D_µP                          | Time (s): compute equivalent models | Time (s): simulate the global network | Total time (s) | Memory (KB): compute equivalent models | Memory (KB): simulate the global network | Peak memory (KB) |
|--------------------------------|-------------------------------------|---------------------------------------|----------------|----------------------------------------|------------------------------------------|------------------|
| Direct full network simulation | 0                                   | 1.420                                 | 1.420          | 0                                      | 143956                                   | 143956           |
| Hierarchical approach          | 0.747                               | 0.005                                 | 0.752          | 72050                                  | 8500                                     | 72050            |

- 1.9x speedup (1.420 s vs. 0.752 s) and nearly 2x lower peak memory (143956 KB vs. 72050 KB)
- The accuracy of the hierarchical simulation is well maintained (maximum absolute error around $10^{-12}$)

## Scalability with the number of tiers

- Simulation results as the number of tiers increases
  - Each tier: 1M P/G nodes and 10x10 TSV clusters.



HS3DPG scales well as the number of tiers increases, and its advantage grows with the tier count (6.5 times faster at 9 tiers).

## Performance comparison when the TSV cluster number increases

Figure panels: peak memory and total time versus the number of TSV clusters.



The simplified models contain far fewer VCCSs because most port dependencies are omitted, which greatly reduces memory usage.

## Accuracy analysis of the simplified model

The maximum relative error of the port voltages varies with the affected area of one port.



Users should balance accuracy against simulation complexity.

## Voltage Distribution of a clustered TSV based 3D P/G network



Voltage range: 0.77 V to 0.8 V

- The IR drop in the vertical direction along TSVs is small in the clustered architecture. Nodes connected to TSV clusters always have the highest voltage in each tier.
- Power delivery to the top tiers needs particular attention in 3D chip design.

## Conclusions

- We propose HS3DPG, a hierarchical simulation method suited to 3D P/G networks.
  - The method first separates the tiers from the global network and then extracts their port equivalent models in parallel.
  - To further simplify the port equivalent model, we introduce the "locality" property into 3D P/G network simulation.
  - Experimental results demonstrate the accuracy and scalability of the method.
- We use HS3DPG to analyze the voltage distribution of a clustered-TSV-based 3D P/G network and summarize some of its features.

## Thanks! Q&A