#### A Low-Power High-Speed Accuracy-Controllable Approximate Multiplier

Tongxin Yang, Tomoaki Ukezono, Toshinori Sato

Fukuoka University, Japan Jan. 25, 2018



# Outline



# Introduction

Approximate multiplier with fixed accuracy

- Low power consumption
- High computing speed
- Why accuracy-controllable approximate multiplier? Application quality requirement may vary significantly, The above static approximate multiplier may,
  - fail to meet application quality requirement
  - waste power when high quality is not required

The objective is to control accuracy according to application requirements at runtime.



### **Approximate Tree Compressor**

#### Accurate half adder



$$c,s$$
 = a + b = 2c + s = (c + s) + c;

#### Incomplete Adder Cell (iCAC)



$$p = c + s; q = c;$$

Sum: {c,s} = p + q

4



### **Approximate Tree Compressor**

#### Structure of multi-bit iCACs





# **Approximate Tree Compressor**

Structure of an ATC with eight inputs (ATC-8)
ATC for partial product reduction (PPR)



*n* inputs can be reduced to n/2 **P**s and *one* **V**.



#### Equivalent circuit of a half adder





a half adder

equivalent circuit of a half adder



#### Carry-maskable half adder (CMHA)



> mask\_x =0(Approximate)

> mask\_x =1 (Accurate)

s = a XOR b;Cout = a AND b;



#### Carry-maskable full adder (CMFA)



Approximate condition: mask\_x =0, Cin =0 S = a OR b; Cout = 0;



#### Carry-maskable adder (CMA)

CMA for accuracy controllability



If  $m_x_0$ ,  $m_x_1$ , ..., and  $m_x_7$  are all "0", carry chain is masked to "0"



### **Accuracy-Controllable Multiplier**

#### 8-bit Accuracy-Controllable Multiplier

#### Stage1: ATC for PPR





### **Accuracy-Controllable Multiplier**





# **Experimental Setup**

#### **Experimental setup**

All multipliers are eight bits,

synthesized using the same condition

#### For power, delay, and area

- Library: NanGate 45nm
- RTL language: Verilog HDL
- Simulator: Synopsys VCS
- Synthesis: Synopsys Design Compiler
- Power Consumption: Synopsys Power Compiler
- Test pattern: 65,536
- Data changing rate: 0.5GHz

#### For accuracy

- NMED: Normalized Mean Error Distance
- RMED: Relative Mean Error Distance
- ER: Error Rate



Bits of OR gates

#### unmasked u bits for accurate result masked (7 – u) bits for approximate result

**Experiment** 



#### Experiment

### Accuracy

|       | Multipliers | NMED (%) | MRED (%) | ER (%) |
|-------|-------------|----------|----------|--------|
| Good  | m_7b        | 0.25     | 0.85     | 36.16  |
|       | m_6b        | 0.26     | 0.99     | 43.46  |
|       | m_5b        | 0.29     | 1.31     | 52.07  |
|       | m_4b        | 0.35     | 1.93     | 61.05  |
|       | m_3b        | 0.49     | 3.05     | 69.61  |
|       | m_2b        | 0.71     | 4.57     | 74.93  |
|       | m_1b        | 1.05     | 6.50     | 78.10  |
| Bad 🔱 | m_0b        | 1.64     | 9.02     | 80.02  |
|       | AMER_10b    | 0.20     | 0.62     | 31.59  |
|       | AMER_8b     | 0.24     | 1.16     | 55.44  |
|       | AMER_6b     | 0.46     | 3.23     | 71.12  |
|       | AMER_4b     | 1.20     | 7.53     | 79.54  |
|       | ACCI_M2     | 0.04     | 0.62     | 72.29  |



# **Power Consumption**

#### Power results relative to MRED



16



## **Critical Path Delay**

#### Delay results relative to MRED



17

# **Design Area**



Experiment



## **Image Processing**



# Summary

#### Objectives

- Low-power, High-speed multiplier
- Accuracy-controllable multiplier

#### Solutions

- Approximate Tree Compressor (ATC)
- Carry-maskable Adder (CMA)
- Results
  - 47.3% ~ 56.2% power reduction
  - 29.9% ~ 60.5% delay reduction
  - 19.7 ~ 52.5 PSNR controllability



# Thank you!

