Asia and South Pacific Design Automation Conference 2008

## Variability-Driven Module Selection with Joint Design Time Optimization and Post-Silicon Tuning

#### Feng Wang, Xiaoxia Wu, Yuan Xie

The Pennsylvania State University Department of Computer Science & Engineering

# Outline

#### Introduction

- Process Variation and its impact on HLS
- Related work

#### Variability-Driven Module Selection

- Performance/Power yield
- Design Time Approach
- Post-silicon Tuning Approach
- The combined approach
- Experimental Results
- Conclusion

### What is the problem?

- Process variation has become a prominent concern as technology scales
- Device and interconnect process variations increase with shrinking feature sizes



(Source: K. Roy DAC05)



# Impact on High-Level Synthesis

- HLS schedules operations in different clock cycles and maps them to functional units (FUs).
- Traditionally, each FU has a fixed latency value.

(Figure: operations scheduled across clock cycles CC1 to CC4)



However, under process variation....

## **Old Solutions**

Worst-case analysis:

- much larger variation
- very pessimistic



This requires a shift in the design paradigm, from today's deterministic design to probabilistic design

# Probabilistic Design Paradigm

A holistic design paradigm shift to statistical design



# **Related work**

- High-level synthesis is a well-studied problem
  - Low power: T. Kim TVLSI03, J. Cong ASPDAC08
  - Thermal: Seda ICCAD06
- Physical information can also be integrated into HLS
  - H. Zhou DAC05
- Industry success stories:
  - HLS tool "Catapult" (Mentor Graphics)
  - BlueSpec Inc.
  - AutoESL
- Variation-aware HLS
  - W. Huang ICCAD06, T. Kim ICCAD07, S. P. Mohanty VLSID07

# Variation-aware high-level synthesis is still in its infancy

### Performance Analysis/Yield

Performance yield: the probability that the synthesized hardware can work at a particular clock rate

A functional unit:  $T_i = a0_i + a1_i \Delta V_{th} + a2_i \Delta l + a3_i V_{SB}$ 

Synthesized DFG: Sum operation and Max operation

Performance Yield of the DFG:

$$\begin{aligned} Yield_{delay}(DFG) &= \mathrm{Prob}(T_{\max} \leq T_{clock} \mid \text{constraints}) \\ Yield_{delay} &= \prod_{i=1}^{M} Yield_{delay}(b_i) \\ \Delta Yield_{delay} &= \prod_{i=1, i \neq j}^{M} Yield_{delay}(b_i) \times \Delta Yield_{delay}(b_j) \end{aligned}$$
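As a rough illustration of the product form of the performance yield, the Python sketch below assumes that each block $b_i$ has an independent Gaussian delay. That distribution is an assumption made here for illustration, not something the slides state, and the function names and numbers are hypothetical. It evaluates the overall yield and the change in yield when a single block $b_j$ is replaced.

```python
from math import erf, prod, sqrt


def block_yield(mean_delay, std_delay, t_clock):
    """Prob(delay <= t_clock) for one block, assuming a Gaussian delay."""
    z = (t_clock - mean_delay) / (std_delay * sqrt(2.0))
    return 0.5 * (1.0 + erf(z))


def dfg_yield(blocks, t_clock):
    """Product of per-block yields (assumes independent blocks)."""
    return prod(block_yield(m, s, t_clock) for m, s in blocks)


def delta_yield(blocks, j, new_block, t_clock):
    """Yield change when block j is replaced by new_block.

    Only block j changes, so the other factors are reused:
    delta = prod_{i != j} Yield(b_i) * (Yield(b_j_new) - Yield(b_j_old)).
    """
    others = prod(
        block_yield(m, s, t_clock)
        for i, (m, s) in enumerate(blocks)
        if i != j
    )
    old = block_yield(*blocks[j], t_clock)
    new = block_yield(*new_block, t_clock)
    return others * (new - old)


if __name__ == "__main__":
    # Hypothetical (mean, std) delays per block, in arbitrary time units.
    blocks = [(8.0, 1.0), (9.0, 0.5), (7.5, 0.8)]
    print(dfg_yield(blocks, t_clock=10.0))
    print(delta_yield(blocks, 1, (8.5, 0.5), t_clock=10.0))
```

Reusing the unchanged factors is what makes the incremental form cheap to evaluate inside a module selection loop.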

## Power Analysis/Yield

Power yield: the probability that the total power is less than the power limit

A functional unit:  $P_i = \exp(b0_i + b1_i \Delta V_{th} + b2_i \Delta l + b3_i V_{SB})$ 

Synthesized DFG: Sum of the random variables

Power Yield of the DFG:

$$\begin{aligned} Yield_{power}(DFG) &= \mathrm{Prob}(P_{tot} \leq P_{target} \mid \text{constraints}) \\ P_{DFG}^{new} &= P_{DFG}^{old} - P_{opt_k}^{old} + P_{opt_k}^{new} \\ \Delta Yield &= Yield(P_{DFG}^{new}) - Yield(P_{DFG}^{old}) \end{aligned}$$
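The power yield can be estimated numerically. The sketch below uses plain Monte Carlo sampling, which is a stand-in chosen here for simplicity and not necessarily the analysis method used in this work. It assumes that $\Delta V_{th}$ and $\Delta l$ are normalized standard normal variables shared by all functional units and that $V_{SB}$ is a fixed value per unit. All coefficients and names are hypothetical.

```python
import math
import random


def sample_total_power(units, rng):
    """Draw one sample of the total power of all functional units.

    Each unit is (b0, b1, b2, b3, v_sb) and follows
    P_i = exp(b0 + b1*dVth + b2*dl + b3*v_sb).
    """
    d_vth = rng.gauss(0.0, 1.0)  # assumed normalized variation
    d_l = rng.gauss(0.0, 1.0)    # assumed normalized variation
    return sum(
        math.exp(b0 + b1 * d_vth + b2 * d_l + b3 * v_sb)
        for b0, b1, b2, b3, v_sb in units
    )


def power_yield(units, p_target, n=20000, seed=0):
    """Monte Carlo estimate of Prob(P_tot <= p_target)."""
    rng = random.Random(seed)
    hits = sum(sample_total_power(units, rng) <= p_target for _ in range(n))
    return hits / n


if __name__ == "__main__":
    # Hypothetical coefficients for two functional units.
    units = [(0.0, 0.2, 0.1, -0.5, 0.0), (0.1, 0.3, 0.1, -0.5, 0.2)]
    print(power_yield(units, p_target=2.5))
```

Because each $P_i$ is the exponential of a Gaussian, it is lognormally distributed, and the total is a sum of correlated lognormal variables with no simple closed form. That is why sampling is used in this sketch.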

#### Design Time Approach- example



- Worst-case analysis: Adder 2 is faster
- CCT = T1: Adder 1 is better
- CCT = T2: Adder 2 is better
- CCT = T3: both adders have the same yield (100%)
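The adder comparison can be reproduced with made-up numbers. In the sketch below, both adders are assumed to have Gaussian delays: Adder 1 has a lower mean but a wider spread, and Adder 2 has a higher mean but a tighter spread. These distributions, the 3-sigma worst-case corner, and the three clock cycle times are all hypothetical values chosen to mirror the slide, not data from the paper.

```python
from math import erf, sqrt

# Hypothetical (mean, std) delays in arbitrary time units.
ADDER1 = (1.0, 0.3)  # faster on average, larger variation
ADDER2 = (1.2, 0.1)  # slower on average, smaller variation


def timing_yield(mean, std, cct):
    """Prob(delay <= cct) for a Gaussian delay."""
    return 0.5 * (1.0 + erf((cct - mean) / (std * sqrt(2.0))))


def worst_case(mean, std):
    """Worst-case delay taken as the 3-sigma corner (an assumption)."""
    return mean + 3.0 * std


if __name__ == "__main__":
    print("worst case:", worst_case(*ADDER1), worst_case(*ADDER2))
    for label, cct in (("T1", 1.1), ("T2", 1.4), ("T3", 3.0)):
        y1 = timing_yield(*ADDER1, cct)
        y2 = timing_yield(*ADDER2, cct)
        print(f"CCT={label}: adder1={y1:.3f} adder2={y2:.3f}")
```

With these numbers the worst-case view always prefers Adder 2, while the yield view prefers Adder 1 at the tight clock T1, Adder 2 at T2, and is indifferent at the relaxed clock T3.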

## Design Time Approach- algorithm

- **Input:** initial scheduled DFG, constraints, module library
- **Output:** a synthesized DFG with optimized power that satisfies the performance constraints



# Post Silicon Tuning

- Tune chips after manufacturing with body biasing techniques, which control the threshold voltage
  - Reverse body biasing (RBB) reduces leakage power at the expense of slowing down circuits
  - Forward body biasing (FBB) improves performance at the expense of higher leakage power



# Post Silicon tuning Approach

Decide the optimal body biasing for a module selection decision such that the power yield is maximized under the performance constraints.

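$$\begin{aligned} \text{minimize:} \quad & P_{sttot} \\ \text{subject to:} \quad & P(T_{max} \leq T_{clock} \mid \text{constraints}) \geq \alpha \end{aligned}$$

Second-order conic program:

$$\begin{aligned} \text{minimize:} \quad & (a1 + b * a2)^T s \\ \text{subject to:} \quad & b^T s + \phi^{-1}(\alpha)(s^T \Sigma s)^{1/2} \leq T_{limit} \\ & c^T(s - s_{ini}) \leq \varepsilon \end{aligned}$$

The vector $s$ is determined first; $V_{SB}$ then follows from it.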
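The timing constraint of the conic program is the deterministic form of a Gaussian chance constraint. The sketch below only evaluates that constraint for a candidate vector $s$; it does not solve the optimization, which would need a conic solver. It assumes that $\phi^{-1}$ is the inverse standard normal CDF and that the matrix in the quadratic term is a covariance matrix. Both are readings of the slide, and the helper names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def timing_constraint_lhs(b, s, cov, alpha):
    """Evaluate b^T s + phi^{-1}(alpha) * sqrt(s^T cov s)."""
    n = len(s)
    mean_term = sum(b[i] * s[i] for i in range(n))
    quad = sum(s[i] * cov[i][j] * s[j] for i in range(n) for j in range(n))
    return mean_term + NormalDist().inv_cdf(alpha) * sqrt(quad)


def meets_timing(b, s, cov, alpha, t_limit):
    """True if the candidate s satisfies the timing constraint."""
    return timing_constraint_lhs(b, s, cov, alpha) <= t_limit


if __name__ == "__main__":
    # Hypothetical two-variable instance.
    b = [1.0, 2.0]
    s = [0.5, 0.25]
    cov = [[0.04, 0.0], [0.0, 0.09]]
    print(timing_constraint_lhs(b, s, cov, alpha=0.99))
    print(meets_timing(b, s, cov, alpha=0.99, t_limit=1.5))
```

Raising the required yield $\alpha$ increases the margin term, so the same $s$ needs a larger $T_{limit}$ to remain feasible.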

# Joint optimization Approach

```
JointOpt (ISDFG, constraints, Library)
While (ΔYield > ε and constraints are met) {
    Design time module selection under current body bias;
    Sequential Conic Optimization;
}
```

- The initial body bias is zero
- Maximize the power yield under performance yield constraints
- Iterate until no improvement can be obtained
- Output a synthesized DFG with optimal body bias
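The alternating structure of the joint optimization can be sketched as a small driver loop. In the Python skeleton below, the three callables `select_modules`, `tune_body_bias`, and `power_yield` are hypothetical placeholders for the design time module selection, the conic optimization step, and the yield analysis. They are assumed to return only results that satisfy the performance constraints, so the "constraints are met" check is not modeled separately.

```python
def joint_opt(select_modules, tune_body_bias, power_yield,
              initial_bias=0.0, eps=1e-3, max_iters=50):
    """Alternate module selection and body-bias tuning until the
    power yield stops improving by more than eps.

    Returns (design, bias, yield) of the best iteration found.
    """
    best = (None, initial_bias, 0.0)
    bias = initial_bias  # the initial body bias is zero by default
    for _ in range(max_iters):
        design = select_modules(bias)      # design time step
        new_bias = tune_body_bias(design)  # post-silicon tuning step
        y = power_yield(design, new_bias)
        if y - best[2] <= eps:             # no meaningful improvement
            break
        best = (design, new_bias, y)
        bias = new_bias
    return best


if __name__ == "__main__":
    # Toy stand-ins: the yield sequence is scripted for illustration.
    yields = iter([0.5, 0.7, 0.705, 0.9])
    result = joint_opt(
        select_modules=lambda bias: ("design", bias),
        tune_body_bias=lambda design: design[1] + 0.1,
        power_yield=lambda design, bias: next(yields),
        eps=0.01,
    )
    print(result)
```

Keeping the best iteration, rather than the last one, avoids returning a design from a step that failed to improve the yield.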

## Experimental Setup

- Algorithms implemented in C++
- 90 nm technology
- Six high-level synthesis benchmarks:
  - A 16-point symmetric FIR filter (FF)
  - A 16-point elliptic wave filter (EWF)
  - An autoregressive lattice filter (ARF)
  - An algorithm for computing discrete cosine transform (DCT)
  - A differential equation solver (DES)
  - An IIR filter (IIR)

## Power Yield Gain

Design time approach vs. worst case:

- 90% performance yield constraint
- 34% power yield gain



## **Power Yield Results**

Joint approach vs. design time only:

- 99% performance yield constraint
- 38% average power yield improvement

| Name    | DT  | JTS | JTS-DT | Improvement (JTS-DT)/DT |
|---------|-----|-----|--------|-------------------------|
| AR      | 47% | 86% | 39%    | 83%                     |
| DCT     | 60% | 85% | 25%    | 42%                     |
| DES     | 76% | 90% | 14%    | 18%                     |
| EWF     | 79% | 90% | 11%    | 14%                     |
| FF      | 75% | 92% | 17%    | 23%                     |
| IIR     | 58% | 85% | 27%    | 47%                     |
| Average | 66% | 88% | 22%    | 38%                     |
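The improvement column can be recomputed from the DT and JTS columns. The short Python check below takes the yield percentages from the table and derives the absolute gain (JTS-DT) and the relative improvement (JTS-DT)/DT per benchmark. The helper name is hypothetical, and the average is assumed to be the mean of the per-benchmark improvements.

```python
# Power yield in percent per benchmark: (DT, JTS), taken from the table.
RESULTS = {
    "AR": (47, 86),
    "DCT": (60, 85),
    "DES": (76, 90),
    "EWF": (79, 90),
    "FF": (75, 92),
    "IIR": (58, 85),
}


def improvement_table(results):
    """Return {name: (absolute gain, relative improvement in %)}."""
    table = {}
    for name, (dt, jts) in results.items():
        gain = jts - dt
        table[name] = (gain, round(100.0 * gain / dt))
    return table


if __name__ == "__main__":
    table = improvement_table(RESULTS)
    for name, (gain, rel) in table.items():
        print(f"{name}: +{gain} points, {rel}% relative")
    mean_rel = sum(rel for _, rel in table.values()) / len(table)
    print(f"average relative improvement: {round(mean_rel)}%")
```

The relative figure is largest for AR because its design time yield is the lowest, so the same absolute gain counts for more.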

## Conclusion

- As technology scales, process variation has an increasing impact on performance and power variations
- Traditional synthesis techniques are design time approaches
- We propose a yield-driven module selection with joint design time optimization and post-silicon tuning



## Comparison with Previous Work

- Only consider timing variability
- Every step is still deterministic
- Design time approach