POLITECNICO DI MILANO



DIPARTIMENTO DI ELETTRONICA, INFORMAZIONE E BIOINGEGNERIA



Variation-Aware Voltage Island Formation for Power Efficient Near Threshold Manycore Architectures

ASP-DAC 2014

Ioannis Stamelakos, Sotirios Xydis, Gianluca Palermo, Cristina Silvano

Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria

**Presenter: ioannis.stamelakos@polimi.it**

### Outline

- Motivation
- Problem Specification
- Proposed Solution/Framework
- Experimental Setup
- Experimental Results
- Conclusion





## **The Dark Silicon Era**

Dark Silicon:

The fraction of the transistors in a circuit that must stay switched off ("dark") because of the limited power budget
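The definition above can be made concrete with a back-of-the-envelope calculation. This is a minimal sketch that assumes every active core draws the same power and ignores uncore power. The core count, per-core power and budget are hypothetical values, not figures from this work.

```python
import math


def dark_silicon_fraction(n_cores: int, core_power_w: float, budget_w: float) -> float:
    """Fraction of cores that must stay off ("dark") under a chip power budget.

    Simplifying assumptions: every active core draws the same power and
    uncore/interconnect power is ignored.
    """
    active = min(n_cores, math.floor(budget_w / core_power_w))
    return 1.0 - active / n_cores


# Hypothetical numbers: 128 cores at 1.5 W each under a 100 W budget.
print(dark_silicon_fraction(128, 1.5, 100.0))  # 0.484375
```

With these numbers only 66 of the 128 cores fit in the budget, so about 48% of the chip stays dark.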





#### Vdd aggressively tuned close to the Vth value of the transistors

Lower frequency but larger number of cores available

Promising energy savings (10x) while sustaining performance through parallelization
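The intuition behind this trade-off can be sketched with two first-order relations. These are simplifying assumptions, not the models used in this work: dynamic energy per operation scales as C·Vdd², and throughput scales perfectly with the number of cores. All operating points below are hypothetical.

```python
import math


def dynamic_energy_ratio(vdd_stc: float, vdd_ntc: float) -> float:
    """Dynamic energy per operation at STC divided by that at NTC.

    Uses E_dyn ~ C * Vdd^2, so the ratio is (vdd_stc / vdd_ntc)^2. Leakage is
    ignored, although it takes a larger share of the energy near threshold.
    """
    return (vdd_stc / vdd_ntc) ** 2


def cores_for_same_throughput(n_stc: int, f_stc_ghz: float, f_ntc_ghz: float) -> int:
    """Cores needed at NTC to match STC throughput, assuming perfect parallel scaling."""
    return math.ceil(n_stc * f_stc_ghz / f_ntc_ghz)


# Hypothetical operating points: 1.0 V / 2.0 GHz (STC) vs 0.5 V / 0.5 GHz (NTC).
print(dynamic_energy_ratio(1.0, 0.5))           # 4.0
print(cores_for_same_throughput(16, 2.0, 0.5))  # 64
```

In this model, halving Vdd cuts dynamic energy per operation by 4x. A 4x lower clock, however, calls for 4x more cores to keep the same throughput.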






#### **Performance Degradation @ NTC**

Limited maximum achievable clock frequency

Reducing the difference between Vdd and Vth imposes a significant performance degradation

*Open Issue*: How to sustain performance when exploiting higher task parallelism at lower clock frequencies under process variability?
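One way to see why the Vdd - Vth gap matters is the classic alpha-power law, f ∝ (Vdd - Vth)^α / Vdd. The sketch below is only an illustration under assumptions: α = 1.3 and all voltages are hypothetical, and the law is a rough fit near threshold. It compares the frequency, and the sensitivity to a Vth shift, at a super-threshold supply and at a near-threshold supply.

```python
def alpha_power_freq(vdd: float, vth: float, alpha: float = 1.3) -> float:
    """Relative maximum clock frequency from the alpha-power law.

    f ~ (vdd - vth)**alpha / vdd. alpha = 1.3 is an illustrative value, and the
    model is only a rough approximation close to threshold.
    """
    if vdd <= vth:
        return 0.0
    return (vdd - vth) ** alpha / vdd


def fmax_loss(vdd: float, vth_nom: float, dvth: float) -> float:
    """Relative fmax loss of a core whose Vth is dvth above the nominal value."""
    return 1.0 - alpha_power_freq(vdd, vth_nom + dvth) / alpha_power_freq(vdd, vth_nom)


for vdd in (1.0, 0.55):
    print(f"Vdd = {vdd:.2f} V: relative f = {alpha_power_freq(vdd, 0.30):.3f}, "
          f"fmax loss for +20 mV of Vth = {fmax_loss(vdd, 0.30, 0.02):.1%}")
```

With these parameters, the relative frequency drops from 0.629 at 1.0 V to about 0.300 at 0.55 V. The same +20 mV Vth shift costs about 3.7% of fmax at 1.0 V but about 10.3% at 0.55 V, which is why variability hurts more at NTC.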

#### **The Variability Problem**




### Variability @ NTC






#### **Target Architecture**



Abstract view of the tile-based many-core architecture

Vth Variability Map 128 Cores

![](_page_11_Figure_5.jpeg)

![](_page_11_Figure_6.jpeg)

8-core Cluster

Vth color scale: 0.20 to 0.26

- SFMV (Single Frequency, Multiple Voltages) approach:
  one chip-wide frequency but many voltage domains (Voltage Islands, VIs)
- Each VI contains a number of cores, and its Vdd can be tuned individually

Adjust the Vdd of each VI according to the underlying variability to reach the desired frequency that sustains the application performance
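The SFMV rule above can be sketched in code. For a given chip-wide target frequency, each core needs a minimum Vdd that depends on its Vth, and a voltage island must run at the highest requirement among its cores. This is a minimal sketch under stated assumptions: the alpha-power law stands in for the framework's delay model, islands are formed from consecutive core indices, and `min_vdd`, `island_vdds` and all numbers are hypothetical.

```python
from typing import List

ALPHA = 1.3  # illustrative alpha-power-law exponent (assumption)


def freq(vdd: float, vth: float) -> float:
    """Relative fmax from the alpha-power law; 0 when vdd <= vth."""
    return (vdd - vth) ** ALPHA / vdd if vdd > vth else 0.0


def min_vdd(vth: float, f_target: float, v_max: float = 1.2, tol: float = 1e-6) -> float:
    """Smallest Vdd (within tol) at which a core with threshold vth reaches f_target.

    Bisection is valid because freq() increases monotonically with vdd above vth
    when ALPHA >= 1.
    """
    if freq(v_max, vth) < f_target:
        raise ValueError("f_target is unreachable below v_max")
    lo, hi = vth, v_max  # invariant: freq(hi, vth) >= f_target
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if freq(mid, vth) >= f_target:
            hi = mid
        else:
            lo = mid
    return hi


def island_vdds(vths: List[float], island_size: int, f_target: float) -> List[float]:
    """One supply per voltage island: its slowest (highest-Vth) core sets the Vdd."""
    return [
        max(min_vdd(v, f_target) for v in vths[i:i + island_size])
        for i in range(0, len(vths), island_size)
    ]


# Hypothetical per-core Vth values (V) and chip-wide target frequency (relative units).
vths = [0.30, 0.32, 0.29, 0.35]
print(island_vdds(vths, island_size=2, f_target=0.2))
```

Larger islands can only raise the Vdd of their faster cores, because each island has to follow its slowest member.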

#### **Proposed Framework (I)**

#### STC Regime: Application & Architecture Characterization

![](_page_13_Figure_2.jpeg)

![](_page_14_Figure_0.jpeg)

![](_page_15_Picture_0.jpeg)


#### **Experimental Setup**

![](_page_16_Figure_1.jpeg)

#### Splash-2 Benchmark Suite run in the Sniper simulator

| **Speedup:** | ideal     | good              | limited            |
|---------------|-----------|-------------------|--------------------|
| Benchmarks    | radiosity | barnes, water-nsq | raytrace, water-sp |

- Plus an average-case workload ("aver")
- Variability maps: VARIUS-NTV (Karpuzcu et al., DSN'12)


## **Architecture and Floorplan**

A. Tile-based many-core architecture

B. Floorplan: 128 cores

Vth Variability Map 128 Cores

Figure: tile grid (Tile<sub>11</sub>, Tile<sub>12</sub>, ...) on a 24x16 grid; floorplan with rows of cores, private caches (P\$) and last-level cache banks (LL\$), generated with ArchFP*

\* Faust et al., VLSI-SoC 2012

![](_page_18_Picture_0.jpeg)


#### **Power Reduction @ NTC**

![](_page_19_Figure_1.jpeg)

#### **Power Gain of the Variability-Aware Technique w.r.t. Overdesign**

![](_page_20_Figure_1.jpeg)
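This comparison can be quantified in a simplified way. The sketch assumes (our assumption, not necessarily the paper's definition) that overdesign means one chip-wide Vdd high enough for the worst core. It also assumes that dynamic power at a common frequency dominates, so relative power is the sum of Vdd² and leakage is ignored. The per-core requirements and `gain_vs_overdesign` are hypothetical.

```python
from typing import List


def dynamic_power(vdds: List[float]) -> float:
    """Relative dynamic power of a set of cores: sum of Vdd^2 at a common frequency."""
    return sum(v * v for v in vdds)


def gain_vs_overdesign(required_vdd: List[float], island_size: int) -> float:
    """Power gain of per-island Vdd over a single worst-case chip-wide Vdd.

    required_vdd[i] is the lowest Vdd at which core i meets the chip-wide
    frequency; every core in an island runs at the island's maximum.
    """
    overdesign = dynamic_power([max(required_vdd)] * len(required_vdd))
    aware: List[float] = []
    for i in range(0, len(required_vdd), island_size):
        group = required_vdd[i:i + island_size]
        aware += [max(group)] * len(group)
    return 1.0 - dynamic_power(aware) / overdesign


# Hypothetical per-core requirements (V) for a 4-core chip.
req = [0.50, 0.55, 0.60, 0.52]
for size in (1, 2, 4):
    print(size, round(gain_vs_overdesign(req, size), 3))
```

Here the gain is about 18% with one core per island and about 8% with two. It falls to zero when a single island spans the whole chip, matching the direction of the granularity trend.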

#### **Impact of Voltage Island Granularity** on Power Consumption

Plot legend: 128s1, 128s2, 128s4 (y-axis: Power)

![](_page_21_Figure_1.jpeg)

![](_page_21_Figure_2.jpeg)

#### Impact of Voltage Island Granularity on Power Consumption

![](_page_22_Figure_1.jpeg)

#### Impact of Voltage Regulator Resolution on power efficiency at NTC

![](_page_23_Figure_1.jpeg)

Voltage Regulator Resolution

Power overhead: the normalized difference between the power consumed with a given voltage-regulator precision and the power consumed in the ideal case

The higher the resolution, the smaller the overhead

Even an overhead of 12% can be tolerable for applications that exhibit ideal or good scaling
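The overhead defined above can be illustrated with a minimal sketch. It assumes the regulator can only deliver multiples of a fixed step, so each island's ideal Vdd is rounded up to the next level. It normalizes to the ideal power and models power as the sum of Vdd² over equal-sized islands at a fixed frequency. The function names, island supplies and step sizes are hypothetical.

```python
import math
from typing import List


def quantize_up(v: float, step: float) -> float:
    """Round a required Vdd up to the next regulator level (a multiple of step)."""
    # round() absorbs tiny floating-point errors in v / step, so that exact
    # multiples of step are not pushed up one level.
    return math.ceil(round(v / step, 9)) * step


def quantization_overhead(island_vdd: List[float], step: float) -> float:
    """(P_quantized - P_ideal) / P_ideal, with power modelled as sum of Vdd^2."""
    p_ideal = sum(v * v for v in island_vdd)
    p_quant = sum(quantize_up(v, step) ** 2 for v in island_vdd)
    return (p_quant - p_ideal) / p_ideal


# Hypothetical ideal island supplies (V) and regulator steps (V).
ideal = [0.512, 0.537, 0.561]
for step in (0.01, 0.025, 0.05):
    print(f"step = {step * 1000:.0f} mV: overhead = {quantization_overhead(ideal, step):.1%}")
```

A coarser step leaves each island further above its ideal supply, so the overhead grows as the resolution drops.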

![](_page_24_Picture_0.jpeg)


### Conclusions

A variability-aware framework for exploring the power efficiency of Near-Threshold Computing

Voltage island formation combined with operation in the near-threshold regime is proposed as an effective technique for building power-efficient many-core architectures while sustaining super-threshold performance

Promising results, depending on both workload characteristics and the underlying architectural organization:

- ~65% average power gain
  - ~15-35% extra savings with the finest VI granularity
- ~2.5-12% power overhead due to voltage regulator (VR) quantization

![](_page_26_Picture_0.jpeg)


#### Backup Slides

![](_page_27_Picture_1.jpeg)

#### Vdd Distribution at NT regime

![](_page_28_Figure_1.jpeg)


#### The DIBL Effect

![](_page_29_Figure_1.jpeg)