Trade-off Analysis between Timing Error Rate and Power Dissipation for Adaptive Speed Control with Timing Error Prediction

> <u>Hiroshi Fuketa</u><sup>\*†</sup>, Masanori Hashimoto<sup>\*†</sup>, Yukio Mitsuyama<sup>\*†</sup>, and Takao Onoye<sup>\*†</sup>

> \*Dept. Information Systems Engineering, Osaka University

<sup>†</sup>JST, CREST

# **Table of Contents**

- Background
- Systematic evaluation of power dissipation and timing error rate
- Experimental results
- Conclusion

# Background

- Circuit speed is becoming more sensitive to:
  - manufacturing variability
  - operating environment (supply voltage, temperature, etc.)
  - aging (NBTI, HCI, etc.)

The timing margin varies from chip to chip.

- "Worst-case design" is inefficient under large variation.
- Run-time adaptive speed control is promising.



## Adaptive speed control

Adaptive speed control with timing error prediction



[1] T. Sato et al., "A Simple Flip-Flop Circuit for Typical-Case Designs for DFM," in *Proc. ISQED*, 2007.

# Problems with adaptive speed control

- Timing errors cannot be completely eliminated.
  - If the path activation probability is extremely low, no warning signal may occur during the monitoring period.
- The circuit may be slowed down excessively.
  - → Timing errors could occur before a warning signal emerges.
- If timing errors are extremely rare, some systems can tolerate them.
  - The occurrence of timing errors needs to be estimated systematically and quantitatively.

# Timing error rate and power dissipation

- How can the timing error rate be improved?
  - Insert a larger buffer delay
    - The timing margin of the canary FF becomes much tighter than that of the main FF
      - → the circuit is sped up more than required
      - → power dissipation increases
  - Change the insertion location
  - Lengthen the monitoring period

Trade-off between timing error rate and power. Example: 32-bit ripple carry adder



Activation probability of the longest path (carry propagating through all 32 FAs):

$$\left(\frac{1}{2}\right)^{32} \approx 2.3 \times 10^{-10}$$

\* FA = Full Adder
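The $(1/2)^{32}$ figure can be sanity-checked by brute force on smaller adders. This sketch assumes a simplified sensitization model (not stated on the slide): the longest carry path is activated exactly when every bit position propagates a carry ($a_i \oplus b_i = 1$) under uniformly random inputs. `full_propagate_fraction` is a hypothetical helper.

```python
# Brute-force check of the (1/2)^n full-carry-chain activation probability.
# ASSUMPTION (simplified sensitization model): the longest path of an n-bit
# ripple carry adder is activated exactly when every bit propagates a carry,
# i.e. a_i XOR b_i == 1 for all i, with a and b uniformly random.


def full_propagate_fraction(n_bits: int) -> float:
    """Fraction of all (a, b) input pairs in which every bit position propagates."""
    mask = (1 << n_bits) - 1
    hits = 0
    for a in range(1 << n_bits):
        for b in range(1 << n_bits):
            if (a ^ b) == mask:  # all n propagate signals are 1
                hits += 1
    return hits / (1 << (2 * n_bits))


if __name__ == "__main__":
    for n in (4, 8):
        print(n, full_propagate_fraction(n), 0.5 ** n)
    p32 = 0.5 ** 32  # extrapolated; 2^64 input pairs are too many to enumerate
    print(f"(1/2)^32 = {p32:.3e}, i.e. about once every {1 / p32:.3e} cycles")
```

Enumeration is only feasible for small widths; the 32-bit value follows the same $2^{-n}$ pattern.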

# Contributions

- Propose a framework that systematically evaluates the power dissipation and the occurrence of timing errors.
  - Explore the design space of adaptive speed control with a canary FF
  - Examine the relationship between the <u>timing error rate</u> and the <u>power dissipation</u>

# Assumed system

- Only one canary FF is inserted.
- Circuit speed is controlled in discrete steps ("<u>speed level</u>").

#### Goal

Reveal the relationship between <u>power dissipation</u> and <u>timing error rate</u> in this system



- Find the optimum design parameters that satisfy the required power dissipation and timing error rate
  - Where should the canary FF be inserted?
  - How large should the buffer delay be?
  - How long should the monitoring period be?

Focus on the path activation probability

# Path activation probability

| Symbol       | Definition                                                                                                       |
|--------------|------------------------------------------------------------------------------------------------------------------|
| $P_i(t)$     | Probability that at least one path terminating at the $i$-th FF with delay larger than $t$ is activated in a cycle |
| $P_{all}(t)$ | Probability that at least one path in the circuit with delay larger than $t$ is activated in a cycle               |

- Path activation probability depends on:
  - circuit structure
  - speed level
  - operating condition (e.g., temperature)

These probabilities are therefore functions of speed level $l$ and condition $X$:

$$\begin{cases} P_i(t,l,X) \\ P_{all}(t,l,X) \end{cases}$$

#### Framework overview

Given parameters (inputs):

- Path activation probability at each speed level and operating condition
- Power dissipation at each speed level and operating condition

#### Design parameters

- Insertion location of the canary FF
- Delay time of the delay buffer
- Monitoring period

#### Outputs

- Expected power dissipation of the system
- Timing error rate of the system

# Warning and error probability



- Let $P_w(l, X)$ be the occurrence probability of a <u>warning signal</u> at speed level $l$ and condition $X$ in a cycle
  - The canary FF is inserted at the $i$-th FF
  - $D_d$ is the buffer delay in the canary FF
  - $T_c$ is the clock period

$$P_{w}(l,X) = P_{i}(T_{c} - D_{d}, l, X) - P_{i}(T_{c}, l, X)$$

- Let $P_{err}(l, X)$ be the occurrence probability of a <u>timing error</u> at speed level $l$ and condition $X$ in a cycle:

$$P_{err}(l, X) = P_{all}(T_c, l, X)$$
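A minimal sketch of the two per-cycle probabilities, assuming $P_i$ and $P_{all}$ are available as callables of $(t, l, X)$ (in the experiments they are fitted closed-form expressions). The exponential toy model `toy_P_i`/`toy_P_all` and all of its constants are hypothetical placeholders, not the paper's fitted data.

```python
import math
from typing import Callable

# P(t, l, X): probability that at least one path with delay > t is activated
# in a cycle at speed level l and condition X.
PathProb = Callable[[float, int, float], float]


def warning_prob(P_i: PathProb, Tc: float, Dd: float, l: int, X: float) -> float:
    """P_w(l, X) = P_i(Tc - Dd, l, X) - P_i(Tc, l, X)."""
    return P_i(Tc - Dd, l, X) - P_i(Tc, l, X)


def error_prob(P_all: PathProb, Tc: float, l: int, X: float) -> float:
    """P_err(l, X) = P_all(Tc, l, X)."""
    return P_all(Tc, l, X)


# HYPOTHETICAL toy model, not the paper's fitted expressions: path delays shrink
# by 10% per speed level and grow 0.2% per degree C; the probability of
# activating a path longer than t decays exponentially beyond a normalized 60 ns.
def toy_P_i(t: float, l: int, X: float) -> float:
    scale = (1.0 + 0.002 * X) * 0.9 ** l
    return min(1.0, math.exp(-(t / scale - 60.0) / 2.0))


def toy_P_all(t: float, l: int, X: float) -> float:
    p = toy_P_i(t, l, X)
    return p * (2.0 - p)  # = 1 - (1 - p)^2: two independent endpoints (toy)


if __name__ == "__main__":
    Tc, Dd = 100.0, 10.0  # ns
    for l in range(3):
        print(l, warning_prob(toy_P_i, Tc, Dd, l, 25.0), error_prob(toy_P_all, Tc, l, 25.0))
```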

## Speed level transition

Speed level: how fast or slow the circuit is operated

- A <u>higher</u> speed level means the circuit is operated <u>faster</u>.
- Once a warning signal is detected during the monitoring period, the speed level is incremented by 1.



- Speed level transitions satisfy the Markov property.
  - The next speed level depends only on the present speed level and on whether a warning signal is detected.

# Speed level transition probability



- Probability of a speed level transition
  - Let $P_d(l, X)$ be the probability that <u>at least one warning signal</u> is detected during the monitoring period of $N_{mon}$ cycles at speed level $l$ and condition $X$:

$$P_{d}(l, X) = 1 - (1 - P_{w}(l, X))^{N_{mon}}$$

where $P_w$ is the warning probability and $1 - P_w(l, X)$ is the probability that no warning is detected in a cycle.
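For the very small warning probabilities that matter here, evaluating $1 - (1 - P_w)^{N_{mon}}$ literally in floating point loses precision. A sketch (the function name is hypothetical) that computes the same $P_d$ via Python's `math.log1p` and `math.expm1`:

```python
import math


def detection_prob(p_w: float, n_mon: int) -> float:
    """P_d = 1 - (1 - p_w)**n_mon, the chance of at least one warning in n_mon cycles.

    Uses log1p/expm1 so that tiny p_w values keep their significant digits
    instead of being rounded away in (1 - p_w).
    """
    if p_w >= 1.0:
        return 1.0
    return -math.expm1(n_mon * math.log1p(-p_w))


if __name__ == "__main__":
    p_w, n_mon = 1e-12, 10**6
    print("stable :", detection_prob(p_w, n_mon))
    print("naive  :", 1.0 - (1.0 - p_w) ** n_mon)
    print("N*p_w  :", n_mon * p_w)  # first-order approximation for N*p_w << 1
```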

#### **Transition Matrix**

Transition matrix of the Markov chain: $P$

\* $l_{\rm max}$ and $l_{\rm min}$ are the maximum and minimum speed levels
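One plausible form of $P$ can be built as below, under a stated assumption: a detected warning raises the level by 1 (as described above), and a monitoring period without any warning lowers it by 1; both saturate at $l_{max}$/$l_{min}$. The decrement rule and the function name `transition_matrix` are assumptions for illustration.

```python
from typing import Dict, List


def transition_matrix(p_d: Dict[int, float], l_min: int, l_max: int) -> List[List[float]]:
    """Row-stochastic matrix of the speed-level Markov chain at one condition X.

    States are ordered [l_max, l_max - 1, ..., l_min] to match the state vector
    on the slides; p_d[l] is P_d(l, X).
    ASSUMPTION: a detected warning raises the level by 1 (saturating at l_max);
    a monitoring period without a warning lowers it by 1 (saturating at l_min).
    """
    levels = list(range(l_max, l_min - 1, -1))
    index = {l: k for k, l in enumerate(levels)}
    P = [[0.0] * len(levels) for _ in levels]
    for l in levels:
        up, down = min(l + 1, l_max), max(l - 1, l_min)
        P[index[l]][index[up]] += p_d[l]          # warning seen: speed up
        P[index[l]][index[down]] += 1.0 - p_d[l]  # no warning: slow down
    return P


if __name__ == "__main__":
    # Hypothetical detection probabilities for levels 0 (slowest) to 2 (fastest).
    for row in transition_matrix({0: 0.9, 1: 0.3, 2: 0.01}, l_min=0, l_max=2):
        print(row)
```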

## State probability



- Let $\pi(n)$ be the state probability vector at the $n$-th time step, where $P$ is the transition matrix:

$$\pi(n+1) = \pi(n) \cdot P$$

- Let $\pi_l(X)$ be the steady-state probability of being at speed level $l$ under condition $X$:

$$\pi(n \to \infty) = \left[ \begin{array}{ccc} \pi_{l_{\max}}(X) & \pi_{l_{\max}-1}(X) & \cdots & \pi_{l_{\min}}(X) \end{array} \right]$$
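The steady state can be approximated by iterating $\pi(n+1) = \pi(n) \cdot P$ until the vector stops changing (power iteration). The 3-level matrix in the example is hypothetical (states ordered $[l_{max}, \dots, l_{min}]$), not data from the slides.

```python
from typing import List

Matrix = List[List[float]]


def step(pi: List[float], P: Matrix) -> List[float]:
    """One step of pi(n+1) = pi(n) . P, with pi as a row vector."""
    n = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


def steady_state(P: Matrix, tol: float = 1e-13, max_iter: int = 1_000_000) -> List[float]:
    """Power iteration from a uniform start until successive vectors agree to tol."""
    pi = [1.0 / len(P)] * len(P)
    for _ in range(max_iter):
        nxt = step(pi, P)
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


if __name__ == "__main__":
    # Hypothetical 3-level chain, states ordered [l_max, ..., l_min].
    P = [[0.01, 0.99, 0.0],
         [0.30, 0.00, 0.7],
         [0.00, 0.90, 0.1]]
    print(steady_state(P))
```

When $P_d$ is tiny at some level the chain mixes slowly and power iteration needs many steps; solving $\pi = \pi P$ together with $\sum_l \pi_l = 1$ as a linear system is then the better choice.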


# Average number of cycles in a single stay

- The state probability $\pi_l(X)$ is not suitable for evaluating the power dissipation and the timing error rate.
  - $\pi_l(X)$ is not directly related to actual time.
    - The speed level changes immediately once a warning signal is observed.
    - The number of cycles spent at a given speed level is <u>not always the same</u>.
  - A <u>"time"-based state probability</u> is needed
  - $N_{rem}(l)$: the average number of cycles in a single stay at speed level $l$

$$N_{rem}(l) = 1 \cdot P_{w} + 2 \cdot (1 - P_{w})P_{w} + \dots + N_{mon}(1 - P_{w})^{N_{mon}-1}P_{w} + N_{mon}(1 - P_{w})^{N_{mon}}$$

where $P_{w} = P_w(l, X)$ is the warning probability.
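$N_{rem}(l)$ is the expected value of $\min(K, N_{mon})$, where $K$ is the (geometrically distributed) cycle index of the first warning, so the series collapses to $\sum_{k=0}^{N_{mon}-1}(1-P_w)^k = \bigl(1-(1-P_w)^{N_{mon}}\bigr)/P_w$. A sketch comparing the slide's series with this closed form (function names are hypothetical):

```python
import math


def n_rem_series(p_w: float, n_mon: int) -> float:
    """N_rem(l) term by term, as on the slide.

    A stay ends at the first warning (cycle k, probability (1 - p_w)**(k - 1) * p_w)
    or after n_mon warning-free cycles (probability (1 - p_w)**n_mon).
    """
    q = 1.0 - p_w
    total = sum(k * q ** (k - 1) * p_w for k in range(1, n_mon + 1))
    return total + n_mon * q ** n_mon


def n_rem_closed(p_w: float, n_mon: int) -> float:
    """Closed form E[min(K, n_mon)] = (1 - (1 - p_w)**n_mon) / p_w, computed stably."""
    if p_w == 0.0:
        return float(n_mon)  # no warning ever: every stay lasts n_mon cycles
    if p_w >= 1.0:
        return 1.0  # a warning in the very first cycle
    return -math.expm1(n_mon * math.log1p(-p_w)) / p_w


if __name__ == "__main__":
    for p_w in (0.0, 1e-3, 0.3, 1.0):
        print(p_w, n_rem_series(p_w, 100), n_rem_closed(p_w, 100))
```

The closed form costs O(1) instead of O($N_{mon}$), which matters when the monitoring period is long.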

#### Conversion to time-based state probability
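One reading of this conversion, consistent with the $N_{\rm err}(X)$ formula for the timing error rate (an assumption, not stated explicitly here), weights each level's steady-state probability by the average length of a single stay at that level:

$$P_{\text{time}}(l, X) = \frac{N_{\text{rem}}(l) \cdot \pi_l(X)}{\sum_{l'=l_{\min}}^{l_{\max}} N_{\text{rem}}(l') \cdot \pi_{l'}(X)}$$

With this weighting, $N_{\rm err}(X) = 1 \big/ \sum_{l} P_{\text{time}}(l, X) \cdot P_{\rm err}(l, X)$.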



#### Expected power and timing error rate



Expected power dissipation of the system with the canary FF: $P_{ow, avg}(X)$

- $P_{ow}(l, X)$ is the power dissipation at speed level $l$ and condition $X$ (a given parameter), and $P_{time}(l, X)$ is the time-based state probability.

$$P_{\text{ow, avg}}(X) = \sum_{l=l_{\min}}^{l_{\max}} P_{\text{ow}}(l, X) \cdot P_{\text{time}}(l, X)$$

- Timing error rate: $N_{\rm err}(X)$
  - Average interval between timing errors (in cycles)
  - Defined analogously to MTBF (Mean Time Between Failures)

$$N_{\rm err}(X) = \frac{\text{Operating Time}}{\text{Number of timing errors}} = \frac{\sum_{l} N_{\rm rem}(l) \cdot \pi_{l}(X)}{\sum_{l} N_{\rm rem}(l) \cdot \pi_{l}(X) \cdot P_{\rm err}(l, X)}$$
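A sketch that evaluates both outputs from per-level inputs at one condition $X$. It assumes the time-based weighting $P_{time}(l) \propto N_{rem}(l)\,\pi_l$, which is consistent with the $N_{\rm err}$ formula above; the function names and all numbers in the example are hypothetical placeholders.

```python
import math
from typing import Dict

PerLevel = Dict[int, float]  # speed level l -> value at a fixed condition X


def time_based_probs(pi: PerLevel, n_rem: PerLevel) -> PerLevel:
    """P_time(l) = N_rem(l) * pi_l / sum_l' N_rem(l') * pi_l' (assumed weighting)."""
    total = sum(n_rem[l] * pi[l] for l in pi)
    return {l: n_rem[l] * pi[l] / total for l in pi}


def expected_power(p_ow: PerLevel, p_time: PerLevel) -> float:
    """P_ow,avg(X) = sum_l P_ow(l, X) * P_time(l, X)."""
    return sum(p_ow[l] * p_time[l] for l in p_time)


def error_interval(pi: PerLevel, n_rem: PerLevel, p_err: PerLevel) -> float:
    """N_err(X) = sum_l N_rem(l) pi_l / sum_l N_rem(l) pi_l P_err(l), in cycles."""
    num = sum(n_rem[l] * pi[l] for l in pi)
    den = sum(n_rem[l] * pi[l] * p_err[l] for l in pi)
    return math.inf if den == 0.0 else num / den


if __name__ == "__main__":
    # Hypothetical per-level inputs (level 0 = slowest).
    pi = {0: 77 / 206, 1: 99 / 206, 2: 30 / 206}
    n_rem = {0: 1.1, 1: 3.0, 2: 900.0}
    p_ow = {0: 1.0, 1: 1.2, 2: 1.5}  # arbitrary power units
    p_err = {0: 1e-8, 1: 1e-11, 2: 1e-14}
    p_time = time_based_probs(pi, n_rem)
    print(p_time, expected_power(p_ow, p_time), error_interval(pi, n_rem, p_err))
```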

#### **Experimental setup**

- Circuit: 32-bit ripple carry adder (S[0] to S[32])
- Supply voltage: 300 mV (subthreshold operation)
- Clock period $T_c$: 100 ns (10 MHz)
- Focus on adaptive speed control against temperature variation
  - Consider a temperature variation from 0°C to 80°C
  - Sweep the temperature from 0°C to 80°C in 1°C steps and evaluate the worst-case $N_{err}$ and the average $P_{ow,avg}$
- $P_i(t)$, $P_{all}(t)$, and $P_{ow}$ are given as closed-form expressions
  - Derived by numerical fitting to circuit simulation results
- Speed control is implemented by body biasing.
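The sweep can be sketched as follows. `evaluate` is a hypothetical callback returning $(N_{err}(X), P_{ow,avg}(X))$ at temperature $X$; "worst" is taken as the smallest $N_{err}$, and "average" is assumed here to be the plain mean over the 81 sweep points.

```python
from typing import Callable, Tuple

# evaluate(X) -> (N_err(X), P_ow,avg(X)) at temperature X in degrees C (hypothetical).
Evaluator = Callable[[float], Tuple[float, float]]


def sweep_temperature(evaluate: Evaluator, t_min: int = 0, t_max: int = 80,
                      step: int = 1) -> Tuple[float, float]:
    """Return (worst N_err, mean P_ow,avg) over t_min..t_max inclusive.

    'Worst' is the smallest N_err, i.e. the shortest average interval between
    timing errors. ASSUMPTION: the power average weights every sweep point equally.
    """
    results = [evaluate(float(x)) for x in range(t_min, t_max + 1, step)]
    worst_n_err = min(n_err for n_err, _ in results)
    mean_power = sum(power for _, power in results) / len(results)
    return worst_n_err, mean_power


def toy_evaluate(x: float) -> Tuple[float, float]:
    """Hypothetical stand-in for the full Markov-chain evaluation at temperature x."""
    return 1e15 - 1e12 * x, 1.0 + 0.01 * x


if __name__ == "__main__":
    print(sweep_temperature(toy_evaluate))
```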

# Experimental results (1/4)

- Trade-off between power dissipation and timing error rate
  - The buffer delay is swept in 5 ns steps at each insertion location.



# Experimental results (2/4)

Insertion location of the canary FF and power dissipation

- Constraint: worst-case timing error rate $N_{err} > 10^{14}$ cycles
- At each location, the canary FF is inserted with the minimum buffer delay that satisfies the constraint
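For scale, at the 10 MHz clock of this setup the constraint corresponds to an average interval between timing errors of more than about four months:

$$10^{14}\ \text{cycles} \times 100\ \text{ns} = 10^{7}\ \text{s} \approx 116\ \text{days}$$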



# Experimental results (3/4)

- Monitoring period and power dissipation
  - Power can be reduced by lengthening the monitoring period.





However, too large an $N_{mon}$ slows the response to temperature changes.

# Experimental results (4/4)

- Optimal design
  - The insertion location of the canary FF is selected freely, together with the optimum buffer delay.



# Conclusion

- Proposed a framework that systematically evaluates the power dissipation and timing error rate of self-adaptive circuits with timing error prediction
- Experiments using a 32-bit ripple carry adder
  - Revealed the trade-off between the timing error rate and the power dissipation
  - Demonstrated that the trade-off depends on the design parameters and that the optimal design parameters vary with the required error rate