

# Exploring the Fidelity-Efficiency Design Space using Imprecise Arithmetic

Jiawei Huang John Lach University of Virginia 01/28/2011

#### Challenges in Nanometer CMOS Era

Power consumption



#### **Portable electronics**

- Reliability
  - PVT variation
  - Soft errors
  - Aging effects



#### State-of-the-art Solutions

- Power reduction:
  - Power/clock gating
  - DVS
  - Sub-Vt
  - Multi-Vth

- Design for manufacturability:
  - Design for the worst-case
  - Adaptive circuits with sensors
  - Locally asynchronous design
- Fault-tolerant design:
  - Error-correcting code (ECC)
  - Redundancy

#### **Overhead** !

#### Limitation of Current Solutions

• Max performance or min power set by critical path.



• Technology scaling exacerbates critical path variation.



#### What if We Sacrifice Correctness?

• Timing errors occur.



### What if We Sacrifice Correctness?

Timing errors occur.



- But applications may be able to tolerate errors.
  - **Digital signal processing** \_\_\_\_
  - Communication
  - Recognition, Mining and Synthesis \_\_\_\_
  - Numerical methods

**CDMA** 



6

### **Circuit Design that Allows Errors**

• BTWC (T. Austin et al., Michigan)

input  $\rightarrow$  imprecise component  $\rightarrow$  error corrector  $\rightarrow$  output

• ANT (B. Shim et al., UIUC)



• ERSA (S. Mitra et al., Stanford)

• Stochastic computation (N.R. Shanbhag et al., UIUC)

#### **Overview of This Work**

- Motivation
  - Lack of comparison of different imprecise circuits
  - Lack of design methodology of imprecise circuits
- Scope
  - Arithmetic circuits: adders (ACA, ETAIIM, VOS and reduced precision)
  - Errors from aggressive design (shorten critical path)
- Contributions
  - Imprecise circuits characterization
  - Comparison framework and design methodology
  - Imprecise design rules of thumb

#### Imprecise Adder (ACA)



#### Imprecise Adder (ETAIIM)



Error-tolerant Adder Type II M [N. Zhu et al.]

# Imprecise Adder (VOS)

• Voltage-overscaling [J. Sartori et al.]



 Reduced precision:
 32-bit adder
 24-bit adder
 24-bit adder
 Zero padding
 WC precision

#### **Error Characteristics of Imprecise Adder**

- ILM: Infrequent Large-magnitude error
- FSM: Frequent Small-magnitude error

ETAIIM, reduced precision

ACA, VOS









Error =  $E(K, BPB, L, W, V_{dd}, p)$ Delay =  $D(K, BPB, L, W, V_{dd}, p)$ Power =  $P(K, BPB, L, W, V_{dd}, p)$ Energy =  $D \cdot P$ Current =  $I_{nom}(K, BPB, L, W, V_{dd}, p)$ at  $Vdd_{nom}$  and  $p_{nom}$  $P = I_{nom} \cdot \frac{V_{dd}}{Vdd_{nom}} \cdot \frac{p_{nom}}{p} \cdot V_{dd}$ 



Error =  $E(K, BPB, L, W, V_{dd}, p)$ Delay =  $D(K, BPB, L, W, V_{dd}, p)$ Power =  $P(K, BPB, L, W, V_{dd}, p)$ Energy =  $D \cdot P$ Current =  $I_{nom}(K, BPB, L, W, V_{dd}, p)$ at  $Vdd_{nom}$  and  $p_{nom}$  $P = I_{nom} \cdot \frac{V_{dd}}{Vdd_{nom}} \cdot \frac{p_{nom}}{p} \cdot V_{dd}$ 

### Model Fitting

• Remove variables that have no effect.

$$I_{ACA}(K, B \not \! B , \not \! K, \not \! N, V_{dd}, p)$$

• Determine function types based on implementation knowledge.

$$I_{ACA}(K, V_{dd}, p) = (A \log_2 K + B \cdot K + C) \cdot \frac{V_{dd}^2}{p}$$

Curve-fitting



### Pareto Frontier Solver

- Pareto frontier: the collection of best possible (E,D,P) points
   I
   error delay power
- Small-sized problems: exhaustive search
  - Sweep the ranges of all variables, find the points not dominated by others
- Sensitivity balancing [D. Markovic et al., Z. Qi et al.]:

| $\partial E_{ETAIIM}(BPB, L, V_{dd})$ | $\partial D_{ETAIIM}(BPB, L, V_{dd})$ | $\partial P_{ETAIIM}(BPB, L, V_{dd})$ |     |
|---------------------------------------|---------------------------------------|---------------------------------------|-----|
| $\partial BPB$                        | $\partial BPB$                        | $\partial BPB$                        |     |
| $\partial E_{ETAIIM}(BPB, L, V_{dd})$ | $\partial D_{ETAIIM}(BPB, L, V_{dd})$ | $\partial P_{ETAIIM}(BPB, L, V_{dd})$ | - ( |
| $\partial L$                          | $\partial L$                          | $\partial L$                          | - 0 |
| $\partial E_{ETAIIM}(BPB, L, V_{dd})$ | $\partial D_{ETAIIM}(BPB, L, V_{dd})$ | $\partial P_{ETAIIM}(BPB, L, V_{dd})$ |     |
| $\partial V_{dd}$                     | $\partial V_{_{dd}}$                  | $\partial V_{dd}$                     |     |

# Case Study: CORDIC

- Goal
  - to test the effectiveness of various imprecise design techniques for implementing a real application
  - □ under the same accuracy constraint

Power-delay Pareto frontier
Energy-delay product

□ to discover rules of thumb for choosing imprecise designs

# CORDIC

- Compute any elementary function using <u>shift</u> and <u>add</u> + - × / sin cos arctan sinh ln exp  $\sqrt{-\cdots}$
- Iterative algorithm

$$X_{i+1} = X_i - Y_i \cdot d_i \cdot 2^{-i}$$

$$Y_{i+1} = Y_i + X_i \cdot d_i \cdot 2^{-i}$$
precomputed
$$Z_{i+1} = Z_i - d_i \cdot \tan^{-1}(2^{-i})$$
where  $d_i = -1$  if  $Z_i < 0$ , +1 otherwise

• To compute  $\sin \theta$  • To compute  $\sqrt{\omega}$ 

set 
$$\begin{cases} X_0 = x_0 \\ Y_0 = 0 \\ Z_0 = \theta \end{cases}$$
 get 
$$\begin{cases} X_n = K_1 x_0 \cos \theta \\ Y_n = K_1 x_0 \sin \theta \end{cases}$$

set 
$$\begin{cases} X_0 = \omega + \frac{1}{4} \\ Y_0 = \omega - \frac{1}{4} \\ Z_0 = \text{don't care} \end{cases}$$
 get 
$$\begin{cases} X_n = K_{-1} \sqrt{\omega} \\ Y_n = 0 \end{cases}$$
 20

# **Our CORDIC Architecture**



| Data<br>width                                                           | CORDIC<br>function | Candidate adders                                                                                                                                            | Fidelity<br>metric        | Technology |    |
|-------------------------------------------------------------------------|--------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------|------------|----|
| 32 bit                                                                  | sine<br>sqrt       | <u>Precise</u> : RCA-32, KSA-32<br><u>Imprecise</u> : ETAIIM-32, ACA-32<br><u>Reduced precision</u> : RCA-24, KSA-24<br><u>VOS</u> : VOS-RCA-32, VOS-KSA-32 | Mean<br>absolute<br>error | 130nm      |    |
| $\frac{\sin 0^{\circ} \sim \sin 90^{\circ}}{\sqrt{0} \sim \sqrt{0.75}}$ |                    | $V_{dd} = 0.5V \sim 1.2V$                                                                                                                                   | $\frac{\sum  O' - O }{n}$ |            | 21 |



#### Experimental Results (Pareto frontier)

$$MAE = 2^{-24}$$



23

#### **Energy/op-Delay Product Reduction**





| ETAIIM | ACA | VOS-RCA | VOS-KSA | RCA reduced precision | KSA reduced precision |
|--------|-----|---------|---------|-----------------------|-----------------------|
| 19%    | 11% | 5%      | 10%     | 70%                   | 63%                   |

#### Experimental Results (time plot)



Algorithm

25

# Conclusions

- There exists a fidelity-efficiency continuum for efficiency ۲ improvement when fidelity is relaxed.
- For CORDIC, simply lowering the precision gives the best fidelity-• efficiency tradeoff than more complex imprecise design techniques.
- Rules of thumb: ۲
  - FSM error  $\rightarrow$  low-power high-latency design
  - ILM error  $\rightarrow$  high-power low-latency design
- Future work: •
  - Need protection mechanism (e.g. input pattern detection) for iterative algorithms.
  - Error modeling. •
  - Combination of multiple imprecise techniques (e.g. VOS+ACA). 26

# Thank you!