

# **Session 4C**

# Power Optimization for Application-Specific 3D Network-on-Chip with Multiple Supply Voltages



Kan Wang, Sheqin Dong Tsinghua University, Beijing, P.R.China E-mail: wangkan09@mails.thu.edu.cn





#### Introduction



## MSV-driven Layer Assignment



#### **3D NoC Synthesis**











#### Introduction



## MSV-driven Layer Assignment



#### **3D NoC Synthesis**









#### Challenge in Moore's law







# Moore's law is facing huge challenges



# **Power Consumption**

- Waste power
- Thermal problem



- Three-dimensional (3D) integration
  - Decreasing wire delay, increasing integration density and improving performance
  - Faced with heat dissipation and temperature problem











- Network-on-chip (NoC)
  - Brings networks into chip
  - Greatly reduce communication Cost



#### **Power Consumption**







- Multiple Supple Voltage (MSV)
  - Partitions the circuit into domains of different voltage levels (Voltage Island)
  - Reducing power by assigning lower voltage to some blocks on chip

<u>– Many MSV-dr</u>iv

There is still no work on MSV-driven applicationspecific 3D NoC design







- Communication power can be optimized through a good NoC synthesis
- Many designers proposed 3D NoC synthesis methods[ICCD'08, TCAD'10,ASPDAC'10]
- However, none of them considered the layer assignment problem, which can greatly affects the vertical communication power
- Besides, few of previous work thought of the usage of MSV technology on core power optimization







- In this paper, a MSV-driven framework is proposed for application-specific 3D NoC
  - A unified ILP modeling method is proposed for taking into account both layer assignment and voltage level assignment
  - 2. 3D NoC synthesis is proposed with consideration of inter-layer communication optimization
  - 3. 3D NoC floorplanning with network component assignment considered is presented for communication power improvement.





#### Introduction



## **MSV-driven Layer Assignment**



#### **3D NoC Synthesis**







#### **Overall Design Flow**







- Given:
  - (1) N cores with information of width and height
  - (2) communication graph
  - (3) m legal voltage levels
  - (4) the lowest allowable voltage table of each core and the corresponding power consumptions at different voltage levels
  - (5) the number of layers n The output is a generated
- The objective is:

legal 3D layout

 Assigns N cores to n layers with a voltage assignment for each core, assigns network components on each layer, and determines the physical position of each core and component









- How to solve MLA (m, n)?
  - -m = n: Single voltage level in each layer
  - -m > n: Single voltage level in multi-layers
  - -m < n: Multi-voltage levels in each layer



#### The case of *m*= *n*













- m > n: more than one voltage island on each layer
  - Which voltages on each layer?
  - Voltage assignment on 2D NoC
    - Mapping 3D cores onto 2D NoC
    - Do *m* voltage assignment on 2D NoC
    - Compare total cost of different combination
  - Modified Core communication graph
    - Communication cost and connection cost to separate the cores in the same voltage island
  - Area balanced partition



- As a result of area-balanced constraints, the ILP formulation is exact but timeconsuming
  - It takes thousands of seconds to solve even small cases
- An ILP variable pruning method is proposed inspired by that
  - If core *i* connect few other cores or the communication cost generated by core *i* is small, then core *i* can be assigned to the lowest allowable voltage level

#### Variable Pruning for ILP









#### Introduction



#### MSV-driven Layer Assignment



#### **3D NoC Synthesis**









- 3D NoC is a 3D network including innerlayer networks and inter-layer interconnection
  - Inner-layer NoC synthesis [DAC'09]
  - Inter-layer TSV planning





 Network components are integrated into floorplanning and the positions can be improved through floorplanning process





 A post-floorplan dead-space re-allocation and LP based optimization algorithm is proposed to further improve comp Line-scan method

 $comCost_{i,k} = \sum_{j=0}^{n} Com_{kj} * length_{ij} + \sum_{t=0}^{n} Com_{kt} * length_{it}$ 

Function DS\_Allocation: Method : orderDummyCoresByComs(); //order Switches by communication amount For (i between 1 and Ns){ //Search for all dead-space for each switch orderDS(i); //order all dead-space by weighted wire length For (j between 1 and Nds) { if (enoughSpace(i, DSj)==True){//There is enough space for Switch i allocate(i, j); //Allocate core I to dead-space j updateSpace(j)://update the space by reduce the size of switch i }} Network flow based algorithm is used for NI





Post-Floorplan process

**Dead-space** Generation

Output Floorplan





#### Introduction



## MSV-driven Layer Assignment



#### **3D NoC Synthesis**











#### Table 1: Results of MSV-driven Layer Assignment Algorithm

| Benchmark L# V# E#                                              | ILP<br>Inter-layer Run                                                                                | Area Balanced M<br>Core Inter-layer | Method<br>Run                                       | Heuristic ILP<br>Core Inter Ayer Run |                |                      |  |
|-----------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|-------------------------------------|-----------------------------------------------------|--------------------------------------|----------------|----------------------|--|
| Pure ILP solving<br>without any<br>strategy                     | Com timo(s)<br>44<br>79<br>78<br>78<br>78<br>76<br>76<br>76<br>76<br>76<br>76<br>76<br>76<br>76<br>76 | alanced<br>for only<br>cost         | The proposed<br>method with ILP<br>variable pruning |                                      |                |                      |  |
| $D_{-76}$ $\begin{array}{c ccccccccccccccccccccccccccccccccccc$ |                                                                                                       |                                     | 0.01                                                | 94.4<br>104.1                        | 81.4<br>138.6  | 1.75<br>63.0         |  |
|                                                                 | >3000<br>1 1                                                                                          | 149.7 151.6                         | 0.01                                                | 101.8<br>0.998                       | 144.9<br>1.062 | <u>95.4</u><br>0.018 |  |
|                                                                 |                                                                                                       | 1 1                                 | 1                                                   | 0.664                                | 1.098          | -                    |  |

- Compared to ILP formulation, the proposed algorithm can save time greatly by 98.2% with only 6% more power
- Compared to the partition method, the proposed method can improve core power by 33.6% with only 9.8% increase on communication power, which achieves good trade-off between core power and communication power.





- Inter-layer communication can improve by about 35.8% with only 3.8% increase on inner-layer communication power, total power can be improved by 16.9%
- Global optimization can improve inter-layer communication by 60.8% and total power by 29.1%



|              |                                          |      | Tat | ole 3: E  | nect of | IVISV- | 3DNot         | )     |       |          |  |
|--------------|------------------------------------------|------|-----|-----------|---------|--------|---------------|-------|-------|----------|--|
| Benchmark    | L#                                       | V#   | E#  | MSV-2DNoC |         |        | MSV-PDNoC     |       |       |          |  |
|              |                                          |      |     | Core      | Com     | Dead   | Core          | Com   | rad   | Run-time |  |
|              |                                          |      |     | Power     | Power   | Space  | Power         | Power | e     | (s)      |  |
|              | 2                                        | 38   | 46  | 46.2      | 95.96   | 14.39  | 47.3          | 54.12 |       | 44.7     |  |
| $D_38_tvopd$ | 3                                        | - 38 | 46  |           | 95.96   | 14.39  | 51.7          | 48.77 | 1     | 37.7     |  |
| 96 14.39     |                                          |      |     |           |         |        |               |       |       |          |  |
|              | MSV-2D NOC In .4 11.99                   |      |     |           |         |        | I he proposed |       |       |          |  |
| D_36         | .4 11.99                                 |      |     |           |         |        |               |       |       |          |  |
|              | $GLSVLSI'12$ $\frac{11.99}{3}$ $3D-N_0C$ |      |     |           |         |        |               |       |       |          |  |
|              |                                          |      |     |           | .8      | 19.56  |               |       |       |          |  |
| D_52         | 5.8 19.56                                |      |     |           |         |        |               |       |       |          |  |
|              | 4                                        | 52   | 61  | 77.8      | 135.8   | 19.56  |               |       |       |          |  |
| D_76         | 2                                        | 76   | 92  | 92.4      | 264.1   | 23.83  | 94.4          | 139.1 | 23.83 | 65.1     |  |
|              | 3                                        | 76   | 92  | 92.4      | 264.1   | 23.83  | 104.1         | 144.9 | 13.36 | 60.3     |  |
|              | 4                                        | 76   | 92  | 92.4      | 264.1   | 23.83  | 101.8         | 119.2 | 18.65 | 82.2     |  |
| Ratio        |                                          | -    | -   | 1         | 1       | 1      | 1.098         | 0.523 | 1.034 | -        |  |









#### Introduction



## MSV-driven Layer Assignment



#### **3D NoC Synthesis**



#### **Experimental Results**



# Conclusion



- In this paper, a MSV-driven framework for application-specific 3D NoC design is proposed.
- Through a unified modeling method for simultaneously layer assignment and voltage assignment, 3D NoC synthesis and 3D floorplanning algorithm, the total power can be optimized.
- Compared to MSV-driven 2D NoC, the proposed method can improve total chip power greatly



# Thanks

Email: wangkan09@mails.thu.edu.cn

35

Q&A

#### **Overall Design Flow**





#### The case of *m*= *n*





#### The case of *m*= *n*





#### Variable Pruning for ILP







 A post-floorplan dead-space re-allocation and LP based optimization algorithm is proposed to further improve comp Ns
Line-scan method

 $comCost_{i,k} = \sum_{j=0}^{1} Com_{kj} * length_{ij} + \sum_{t=0}^{1} Com_{kt} * length_{it}$ 

Function DS\_Allocation: Method : orderDummyCoresByComs(); //order Switches by communication amount For (i between 1 and Ns){ //Search for all dead-space for each switch orderDS(i); //order all dead-space by weighted wire length For (j between 1 and Nds) { if (enoughSpace(i, DSj)==True){//There is enough space for Switch i allocate(i, j); //Allocate core I to dead-space j updateSpace(j);//update the space by reduce the size of switch i }} Network flow based algorithm is used for NI

insertion





- Environment:
  - Workstation: 3.0 GHz CPU, 4GB memory
  - Tool: *hmetis* for partition *and lp\_solve* for ILP and LP solving
  - Benchmark: D\_38\_tvopd is used from [6] and D\_36, D\_52 and D\_76 are derived from D\_38\_tvopd
  - The power model of network components and communication is evaluated according to [ASPDAC'10]