# Switching Activity Driven Gate Sizing and Vth Assignment for Low Power Design

Yu-Hui Huang, **Po-Yuan Chen**, TingTing Hwang

> National Tsing Hua University, Taiwan

- Introduction and Motivation
- Related Work
- Algorithm
- Experimental Result
- Conclusion

### Introduction

- Power = per × Active Power + (1 − per) × Idle Power, where per is the fraction of time the circuit is active
- Active Power
  - Dynamic power
    - Gate sizing
  - Leakage power
    - Vth Re-assignment
- Idle Power
  - Leakage power
- Minimize total power
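
The power breakdown above can be read as a time-weighted sum of the active and idle contributions. The following is a minimal sketch, assuming `per` denotes the fraction of time spent in active mode (the name used in the penalty formula of Step 3.1). The function name and all milliwatt figures are hypothetical and chosen only for illustration.

```python
def total_power(per, dynamic_mw, leakage_active_mw, leakage_idle_mw):
    """Average power when the circuit is active for a fraction `per` of the time."""
    if not 0.0 <= per <= 1.0:
        raise ValueError("per must be in [0, 1]")
    active = dynamic_mw + leakage_active_mw  # active mode: dynamic + leakage power
    idle = leakage_idle_mw                   # idle mode: leakage power only
    return per * active + (1.0 - per) * idle


# Hypothetical numbers, for illustration only.
p_full = total_power(1.0, dynamic_mw=0.30, leakage_active_mw=0.02, leakage_idle_mw=0.02)
p_half = total_power(0.5, dynamic_mw=0.30, leakage_active_mw=0.02, leakage_idle_mw=0.02)
```

As `per` shrinks, the leakage term takes a larger share of the total, which is why leakage-oriented choices matter more for mostly idle circuits.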

### Motivation

- To improve the performance of a circuit, we can size up gates or change the Vth of gates from high to low.
  - Size up: increases dynamic power, with a small increase in leakage power
  - Change the Vth of cells from high to low: increases leakage power
- Which one is better?
  - It depends on the switching activity of the gate.

## Motivation

|      | Inverter<br>A | Inverter<br>B |
|------|---------------|---------------|
| Vth  | _             | lower         |
| Size | larger        | _             |

- Inverters A and B have the same delay and output loading
- Comparison function:

$\dfrac{dyn(A) - dyn(B)}{lea(B) - lea(A)}$



## Motivation

#### Switching Activity

- Gate Sizing
- Vth re-assignment

Ratio (%) by switching activity range:

| Switching activity  | TOP  | MAC  | AVG  | GCC  | RSA  | AES  | Average |
|---------------------|------|------|------|------|------|------|---------|
| Between 0% and 22%  | 71.0 | 48.9 | 70.9 | 55.3 | 84.5 | 60.8 | 65.3    |
| Above 22%           | 29.0 | 51.1 | 29.1 | 44.7 | 15.5 | 39.2 | 34.7    |

### **Related Work**

- Previous work focused on minimizing power on non-critical paths.
- We can minimize power on both critical and non-critical paths.
  - On critical paths:
    - We can re-assign Vth to high and up-size gates that have low switching activity.
  - On non-critical paths:
    - Slack can be used to down-size gates or re-assign Vth to high.

## Algorithm: Design Flow





## Algorithm for Critical Path (Step 3.1)



## Step 3.1: Constructing Path Balanced Graph

- Path-Balanced Graph
  - Yutaka Tamiya, "Performance Optimization Using Separator Sets", ICCAD 1999

$ds(e) = slack(head\_node(e)) - slack(tail\_node(e))$
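
The edge quantity above can be evaluated directly from per-node slacks. The sketch below is only an illustration of that formula: it assumes the usual graph convention that an edge (u, v) points from its tail u to its head v, and the node names and slack values are hypothetical.

```python
def edge_delay_slack(edges, slack):
    """Return ds(e) = slack(head_node(e)) - slack(tail_node(e)) for each edge.

    Assumption: an edge (u, v) is directed from its tail u to its head v.
    """
    return {(u, v): slack[v] - slack[u] for (u, v) in edges}


# Hypothetical node slacks and edges, for illustration only.
slack = {"a": 0.0, "b": 0.5, "c": 0.5}
edges = [("a", "b"), ("b", "c")]
ds = edge_delay_slack(edges, slack)
```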



## Step 3.1: Computing Cost

Set the cost of each node:

$cost(g) = \gamma \cdot penalty(g) + \delta \cdot delay\_reduction(g)$

$penalty(g) = \alpha \cdot p\_penalty(g) + \beta \cdot a\_penalty(g)$

$p\_penalty(g) = per \cdot \Big( \underbrace{\sum_{j \in fanin(g)} E(j) \cdot C_{inc}(g) \cdot V^2 + leak_{inc}(g)}_{\text{active mode}} \Big) + (1 - per) \cdot \underbrace{leak_{inc}(g)}_{\text{idle mode}}$

where $E(j)$ is the transition density of node $j$.
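
The cost formulas above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function and parameter names are hypothetical, the weights default to 1.0 because the slides give no values, and the two example options (an up-sized cell and a low-Vth cell) use made-up numbers in normalized units.

```python
def p_penalty(per, fanin_density, c_inc, vdd, leak_inc):
    """Power penalty of a candidate cell change, following the formula above."""
    # Active mode: dynamic increase summed over fan-in nodes, plus leakage increase.
    active = sum(e * c_inc * vdd ** 2 for e in fanin_density) + leak_inc
    # Idle mode: leakage increase only.
    idle = leak_inc
    return per * active + (1.0 - per) * idle


def cost(p_pen, a_pen, delay_reduction, alpha=1.0, beta=1.0, gamma=1.0, delta=1.0):
    """cost(g) = gamma * penalty(g) + delta * delay_reduction(g).

    The weight values are assumptions; the slides do not state them.
    """
    penalty = alpha * p_pen + beta * a_pen
    return gamma * penalty + delta * delay_reduction


# Hypothetical options for one gate (normalized units, per = 1.0, V = 1.0):
# up-sizing adds input capacitance and little leakage; low Vth adds leakage only.
def upsize_penalty(density):
    return p_penalty(1.0, [density], c_inc=1.0, vdd=1.0, leak_inc=0.05)


def low_vth_penalty(density):
    return p_penalty(1.0, [density], c_inc=0.0, vdd=1.0, leak_inc=0.25)
```

With these made-up numbers, up-sizing has the smaller power penalty at a transition density of 0.1, while the low-Vth option is cheaper at 0.4, which mirrors the switching-activity argument in the Motivation slides.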

## Step 3.1: Finding Separator Set

- Find the separator set of minimal cost in the graph
- Delay improvement: min{0.7, 0.5} = 0.5

(x, y, z) means (slack, delay-reduction, cost)
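
The search for a minimal-cost separator set can be illustrated on a toy graph. The sketch below rests on stated assumptions: a separator set is taken to be a set of nodes whose removal disconnects every input-to-output path, and the delay improvement of a set is the smallest delay reduction among its members, as in min{0.7, 0.5} = 0.5. The graph, costs, and function names are hypothetical, and the exhaustive search stands in for whatever method the authors use; it is exponential and only suitable for tiny examples.

```python
from itertools import combinations


def has_path(succ, sources, sinks, removed):
    """True if some source still reaches some sink after deleting `removed`."""
    stack = [s for s in sources if s not in removed]
    seen = set(stack)
    while stack:
        node = stack.pop()
        if node in sinks:
            return True
        for nxt in succ.get(node, ()):
            if nxt not in removed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def min_cost_separator(succ, sources, sinks, cost):
    """Exhaustively search for the cheapest node set that cuts every path."""
    nodes = sorted(cost)
    best, best_cost = None, float("inf")
    for k in range(1, len(nodes) + 1):
        for cand in combinations(nodes, k):
            total = sum(cost[n] for n in cand)
            if total < best_cost and not has_path(succ, sources, sinks, set(cand)):
                best, best_cost = set(cand), total
    return best, best_cost


# Hypothetical critical-path graph: a -> c -> e and b -> d -> e.
succ = {"a": ["c"], "b": ["d"], "c": ["e"], "d": ["e"]}
cost = {"a": 3, "b": 2, "c": 1, "d": 4, "e": 5}
delay_reduction = {"a": 0.6, "b": 0.7, "c": 0.5, "d": 0.9, "e": 0.8}

separator, total_cost = min_cost_separator(succ, {"a", "b"}, {"e"}, cost)
improvement = min(delay_reduction[n] for n in separator)
```

For realistic circuit sizes, a minimum-cost node cut would normally be computed with a max-flow formulation rather than by enumeration.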

## Step 3.2: Computing Penalty

$delay\_penalty(g) = Delay(new\_g) - Delay(g)$

$p\_saving(g) = p\_penalty(g) - p\_penalty(new\_g)$

## Step 3.2: Replacing Cell

- Two options: down-sizing the gate or re-assigning its Vth to high
- If the delay penalty of only one option is less than the available slack:
  - choose that option
- If the delay penalties of both options are less than the available slack:
  - choose the option with the larger power saving
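
The selection rule on this slide can be sketched as a small function. This is an illustration under assumptions: the function name, option labels, and numbers are hypothetical, each option carries the delay penalty and power saving defined in the previous step, and returning `None` when neither option fits the slack is an assumed behavior that the slides do not describe.

```python
def choose_replacement(available_slack, options):
    """Pick a replacement for a gate on a non-critical path.

    `options` maps an option name to (delay_penalty, p_saving).
    Only options whose delay penalty is less than the available slack qualify;
    among those, the one with the larger power saving wins.
    """
    feasible = {
        name: values
        for name, values in options.items()
        if values[0] < available_slack
    }
    if not feasible:
        return None  # assumption: keep the current cell when nothing fits
    return max(feasible, key=lambda name: feasible[name][1])


# Hypothetical (delay_penalty, p_saving) pairs, for illustration only.
options = {"down_size": (0.2, 1.5), "high_vth": (0.4, 2.0)}

only_one_fits = choose_replacement(0.3, options)
both_fit = choose_replacement(0.5, options)
none_fit = choose_replacement(0.1, options)
```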

- Introduction and Motivation
- Related Work
- Algorithm
- Experimental Result
- Conclusion

#### Benchmarks

| Circuit | Cell Count | Characteristics             |
|---------|------------|-----------------------------|
| TOP     | 463        | An Alarm Clock              |
| MAC     | 2425       | Multiplier and Accumulator  |
| AVG     | 6361       | Average Number Calculator   |
| GCC     | 8204       | Gravity Center Calculator   |
| RSA     | 14815      | Asymmetric Crypto-Processor |
| AES     | 16824      | Advanced Encryption Core    |

#### Tools

- Design Compiler
- TSMC 0.13 µm library
- PrimeTime
- PrimePower

#### Power saving

Each column group corresponds to a fraction of active time (100%, 50%, 10%). *P* is the power in mW and Red. is the power reduction.

| Circuit | *P* (mW), 100% | Red., 100% | *P* (mW), 50% | Red., 50% | *P* (mW), 10% | Red., 10% |
|---------|----------------|------------|---------------|-----------|---------------|-----------|
| TOP     | 0.363          | 11.95%     | 0.179         | 14.24%    | 0.0371        | 19.12%    |
| MAC     | 0.790          | 18.56%     | 0.397         | 21.09%    | 0.0837        | 35.97%    |
| AVG     | 1.65           | 5.75%      | 0.835         | 8.79%     | 0.211         | 14.84%    |
| GCC     | 0.753          | 6.48%      | 0.412         | 8.69%     | 0.142         | 15.46%    |
| RSA     | 2.12           | 39.20%     | 1.08          | 41.40%    | 0.239         | 53.50%    |
| AES     | 13.4           | 15.60%     | 6.70          | 16.99%    | 1.41          | 21.33%    |
| Average |                | 16.26%     |               | 18.53%    |               | 26.70%    |

#### Time Penalty

Each column group corresponds to a fraction of active time (100%, 50%, 10%).

| Circuit | Original T | T', 100% | T penalty, 100% | T', 50% | T penalty, 50% | T', 10% | T penalty, 10% |
|---------|------------|----------|-----------------|---------|----------------|---------|----------------|
| TOP     | 1.43       | 1.37     | -4.2%           | 1.39    | -2.8%          | 1.40    | -2.1%          |
| MAC     | 3.30       | 3.33     | 0.8%            | 3.33    | 0.8%           | 3.33    | 0.8%           |
| AVG     | 23.78      | 23.13    | -2.7%           | 23.46   | -1.3%          | 23.54   | -1.0%          |
| GCC     | 26.30      | 26.65    | 1.3%            | 26.73   | 1.6%           | 26.34   | 0.2%           |
| RSA     | 10.00      | 10.08    | 0.8%            | 10.03   | 0.3%           | 10.10   | 1.0%           |
| AES     | 2.29       | 2.21     | -3.5%           | 2.27    | -0.9%          | 2.27    | -0.9%          |

### Conclusion

- The switching activity of a gate plays an important role in deciding between gate sizing and Vth assignment.
- Under the timing constraint, our circuits achieve 16% and 18% power reduction compared with the original circuits when the fraction of active time is 100% and 50%, respectively.

Thank you