### Improved Clock-Gating Control Scheme for Transparent Pipeline

#### J.H. Choi<sup>1</sup>, B.G. Kim<sup>2</sup>, A. Dasgupta<sup>3</sup>, and K. Roy<sup>2</sup>

<sup>1</sup>SoC Architecture Lab, DMC R&D Center, Samsung Electronics, Korea
 <sup>2</sup>Electrical and Computer Eng., Purdue University, W. Lafayette, IN, USA
 <sup>3</sup>SoC Enabling Group, Intel Corporation, Austin, TX, USA

## Outline

- Introduction
- Previous Works
- Proposed Approach
- Simulation Results
- Conclusion

## Introduction

### What is clock-gating?

- Reduces power by blocking clock pulses to inactive logic blocks
- Used for dynamic power reduction
- Applicable at different levels of the design hierarchy, from individual registers up to the whole system
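
To make "blocking clock pulses" concrete, here is a minimal behavioral sketch of a latch-based clock-gating cell, a common glitch-free way to gate a clock. It is not taken from this work; the function name `gated_clock` and the one-sample-per-clock-phase model are assumptions for illustration.

```python
def gated_clock(clk_phases, en_phases):
    """Gated clock produced by a latch-based clock-gating cell (sketch).

    One sample per clock phase. The enable passes through a latch that is
    transparent only while clk is low, so an enable change during the high
    phase cannot truncate a pulse; the gated clock is clk AND latched enable.
    """
    latched_en = 0
    gclk = []
    for clk, en in zip(clk_phases, en_phases):
        if clk == 0:                    # latch follows EN only in the low phase
            latched_en = en
        gclk.append(clk & latched_en)   # pulse blocked when latched EN is 0
    return gclk


clk = [0, 1] * 4                 # four clock pulses
en = [1, 1, 0, 0, 1, 0, 0, 0]    # EN falls during the third pulse
print(gated_clock(clk, en))      # [0, 1, 0, 0, 0, 1, 0, 0]
```

Only the 1st and 3rd pulses reach the gated logic. The 3rd survives intact even though EN falls while it is high, because the latch samples EN only during the low phase.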



## **Stage-Level Clock-Gating**

Traditional stage-level clock-gating



# Stage-Level Clock-Gating (cont.)

Traditional stage-level clock-gating
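
A minimal sketch of the idea behind stage-level clock-gating, assuming each stage's pipeline register is clocked only when the data arriving at it is valid and that valid bits advance one stage per cycle. The function name and this simplified model are illustrative assumptions, not the reference design.

```python
def stage_level_clock_events(valid_in, depth):
    """Count pipeline-register clock events with and without stage-level gating.

    Assumed model: one valid bit enters per cycle and shifts one stage per
    cycle; a stage's data register receives a clock pulse only when the data
    arriving at its input is valid (EN = incoming valid bit). The valid bits
    themselves are treated as always clocked.
    """
    valid = [0] * depth
    gated_events = 0
    for v in valid_in:
        incoming = [v] + valid[:-1]    # valid bit at each stage's input this cycle
        gated_events += sum(incoming)  # only stages with valid input are clocked
        valid = incoming
    ungated_events = depth * len(valid_in)  # every register clocked every cycle
    return gated_events, ungated_events


# Two valid items in ten cycles through a 5-stage pipeline (utilization 0.2).
print(stage_level_clock_events([1, 0, 0, 0, 1, 0, 0, 0, 0, 0], depth=5))  # (10, 50)
```

In this model the gated clock activity tracks pipeline utilization: each valid item costs one clock event per stage it passes through.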



### **Transparent Pipeline**

[Jacobson, ISLPED'04]

#### Transparent pipeline

- Save clock power by dynamically making registers transparent
- Intermediate stages are replaced with transparent stages that can selectively bypass data.



Example: the 2<sup>nd</sup> stage bypasses data A driven by the 1<sup>st</sup> stage.
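
The bypass in this example can be sketched as a snapshot of what each stage drives: an opaque stage drives its stored value, while a transparent stage passes its input straight through. The function `stage_outputs`, the `'O'`/`'T'` mode encoding, and treating the inter-stage logic as a wire are hypothetical simplifications.

```python
def stage_outputs(modes, held, data_in):
    """Value driven at each stage's register output in one snapshot.

    modes[i]: 'O' = opaque (drives its stored value), 'T' = transparent
    (passes its input straight through). held[i]: value stored in stage i.
    Hypothetical model: the logic between stages is treated as a wire.
    """
    outputs = []
    x = data_in
    for mode, value in zip(modes, held):
        x = value if mode == "O" else x  # a transparent stage bypasses its input
        outputs.append(x)
    return outputs


print(stage_outputs("OTTTO", ["A", None, None, None, "Z"], data_in="next"))
# ['A', 'A', 'A', 'A', 'Z']
```

With the 2nd to 4th stages transparent, data A driven by the 1st stage reaches the 5th stage's input without any intermediate register being clocked.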



## **Transparent Pipeline (cont.)**



[C: clocked, U: clock-gated (hold), T: transparent (bypassing)]

# **Collapsible Pipeline**

[Shimada et al., ISLPED'03]

Adjust pipeline depth with *transparent* stages

Normal operation with freq. = F



- Clock power saving in *shallow* mode
- No power reduction in normal operation

## Contributions

#### Previous transparent pipeline [Jacobson, ISLPED'04]

- Limited to two consecutive transparent stages
- Transparent stages need two separate clock lines to control the master and slave latches of each flip-flop independently.





Proposed approach

- Improved control logic for transparent pipeline
  - Applicable to any number of pipeline stages
  - Easily extended for pipeline collapsing
- Low-overhead FF for transparent mode

### **Proposed Transparent Pipeline**



### **Pipeline Registers**

#### Normal (opaque) stage



- EN: Clock enable signal
- B: Transparent state bit
  - (0: transparent, 1: opaque)

Transparent stage
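
A behavioral sketch of how the two control signals could act on a stage register, assuming EN gates the clock edge and B selects whether the output follows the input (B = 0, transparent) or drives the stored value (B = 1, opaque). The class `PipelineRegister` and its methods are hypothetical; the mapping to the C/U/T diagram labels is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class PipelineRegister:
    """Hypothetical model: EN gates the clock edge, B selects the output path."""
    q: Any = None  # value captured at the last enabled clock edge

    def output(self, d: Any, b: int) -> Any:
        # Transparent (B = 0): the output follows the input D.
        # Opaque (B = 1): the output is the stored value.
        return d if b == 0 else self.q

    def clock_edge(self, d: Any, en: int) -> None:
        # Clock-gated when EN = 0: the stored value is held.
        if en:
            self.q = d

    @staticmethod
    def mode(en: int, b: int) -> str:
        # Diagram labels: T = transparent, C = clocked, U = clock-gated (hold).
        if b == 0:
            return "T"
        return "C" if en else "U"


reg = PipelineRegister()
print(reg.output("A", b=0))   # transparent: passes "A" through
reg.clock_edge("A", en=1)     # enabled edge captures "A"
reg.clock_edge("B", en=0)     # gated edge: "A" is held
print(reg.output("B", b=1))   # opaque: drives the held "A"
```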



### **Pipeline Registers (cont.)**

#### Schematic of static D-FF



## Pipeline Registers (cont.)

D-FF with transparent mode
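
Why a plain master-slave flip-flop needs extra control for transparency shows up in a phase-level model: the master latch is open while the clock is low and the slave while it is high, so no single clock value opens both. The sketch below assumes a positive-edge FF and a hypothetical `transparent` input that forces both latches open; it illustrates the behavior a transparent mode must provide, not the transistor-level circuit of the proposed D-FF.

```python
class MasterSlaveFF:
    """Phase-level model of a positive-edge master-slave D flip-flop.

    The master latch is open while clk = 0 and the slave while clk = 1.
    `transparent` is a hypothetical control that holds both latches open,
    so Q follows D; it models the required behavior, not a specific circuit.
    """

    def __init__(self):
        self.master = 0
        self.q = 0

    def step(self, d, clk, transparent=False):
        if transparent or clk == 0:   # master latch open
            self.master = d
        if transparent or clk == 1:   # slave latch open
            self.q = self.master
        return self.q


ff = MasterSlaveFF()
print(ff.step(1, clk=0))                    # 0: master samples D, Q unchanged
print(ff.step(0, clk=1))                    # 1: value present at the rising edge
print(ff.step(0, clk=1, transparent=True))  # 0: Q follows D
```

Since the clock alone can open only one latch at a time, transparency needs a control path separate from the clock, which is what the two clock lines of the previous scheme provide.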



## **Proposed Transparent Pipeline**



- Criterion for correct operation (to avoid race conditions): at least one opaque stage must separate two different data sets.
- Req signal: generated when a stage needs to accept a new data set, to notify a downstream stage that it should take the current data.
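
The race-avoidance criterion can be checked on a snapshot of the pipeline. This is a sketch under an assumed interpretation: for two consecutive data sets at stages `up < down`, some stage from `up + 1` through `down` (inclusive) must be opaque. The function name and the `'O'`/`'T'` string encoding are hypothetical.

```python
from typing import Iterable


def satisfies_race_criterion(modes: str, data_positions: Iterable[int]) -> bool:
    """Check 'at least one opaque stage between two different data sets'.

    modes: one character per stage, 'O' (opaque) or 'T' (transparent).
    data_positions: stage indices where distinct data sets currently reside.
    Assumed interpretation: for consecutive data sets at stages up < down,
    some stage in up+1 .. down (inclusive) must be opaque to keep them apart.
    """
    positions = sorted(data_positions)
    for up, down in zip(positions, positions[1:]):
        if "O" not in modes[up + 1 : down + 1]:
            return False  # the upstream set could race through to the downstream one
    return True


print(satisfies_race_criterion("OTTTO", [0, 4]))  # True: stage 4 is opaque
print(satisfies_race_criterion("OTTTO", [0, 3]))  # False: stages 1-3 are all transparent
```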

## **Control Logic**

Control logic for transparent stage registers



| Inputs | Outputs |
|--------|---------|
| **B_curr**: current state (0: transparent, 1: opaque) | **B_next**: next state |
| **Valid**: data valid bit | **EN**: clock enable signal |
| **Req_prev**: request from the previous stage | **Req_next**: request to the next stage |

# Control Logic (cont.)

#### Truth table and gate-level implementation

when both Valid and Req\_prev are 1.




# **Control Logic (cont.)**

#### Working example in 5-stage pipeline

The 1st and 5th stages are opaque; the 2nd, 3rd, and 4th stages are transparent. V: Valid, B-: B_curr, B+: B_next, E: EN, R+: Req_next.

| Time step | 1st V | 1st E | 1st R+ | 2nd V | 2nd B- | 2nd B+ | 2nd E | 2nd R+ | 3rd V | 3rd B- | 3rd B+ | 3rd E | 3rd R+ | 4th V | 4th B- | 4th B+ | 4th E | 4th R+ | 5th V | 5th E | 5th R+ |
|-----------|-------|-------|--------|-------|--------|--------|-------|--------|-------|--------|--------|-------|--------|-------|--------|--------|-------|--------|-------|-------|--------|
| (1) | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| (2) | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| (3) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| (4) | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| (5) | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 |
| (6) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| (7) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| (8) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
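
The working example implicitly records part of the control-logic truth table. The sketch below transcribes the transparent-stage columns and collects the observed input/output combinations, checking that no two rows disagree. It assumes V, B-, B+, E, and R+ denote Valid, B_curr, B_next, EN, and Req_next, and that each stage's Req_prev is the R+ of the stage before it in the same step; the names `ROWS` and `observed_transitions` are hypothetical.

```python
# Working-example rows: (R+ of the opaque 1st stage, [2nd, 3rd, 4th stage tuples]).
# Each transparent-stage tuple is (V, B-, B+, E, R+) copied from the table.
ROWS = [
    (1, [(0, 0, 0, 0, 1), (0, 0, 0, 0, 1), (0, 0, 0, 0, 1)]),  # step (1)
    (0, [(1, 0, 0, 0, 0), (0, 0, 0, 0, 0), (0, 0, 0, 0, 0)]),  # step (2)
    (0, [(0, 0, 0, 0, 0), (1, 0, 0, 0, 0), (0, 0, 0, 0, 0)]),  # step (3)
    (1, [(0, 0, 0, 0, 1), (0, 0, 0, 0, 1), (1, 0, 1, 1, 0)]),  # step (4)
    (0, [(1, 0, 0, 0, 0), (0, 0, 0, 0, 0), (0, 1, 1, 0, 0)]),  # step (5)
    (0, [(0, 0, 0, 0, 0), (1, 0, 0, 0, 0), (0, 1, 1, 0, 0)]),  # step (6)
    (0, [(0, 0, 0, 0, 0), (0, 0, 0, 0, 0), (1, 1, 0, 0, 1)]),  # step (7)
    (0, [(0, 0, 0, 0, 0), (0, 0, 0, 0, 0), (0, 0, 0, 0, 0)]),  # step (8)
]


def observed_transitions(rows):
    """Collect (B_curr, Valid, Req_prev) -> (B_next, EN, Req_next) pairs."""
    table = {}
    for req_first, stages in rows:
        req_prev = req_first  # the 1st stage's R+ is the 2nd stage's Req_prev
        for v, b_curr, b_next, en, req_next in stages:
            key, out = (b_curr, v, req_prev), (b_next, en, req_next)
            if table.setdefault(key, out) != out:
                raise ValueError(f"conflicting rows for inputs {key}")
            req_prev = req_next  # this stage's Req_next drives the next stage
    return table


for key, out in sorted(observed_transitions(ROWS).items()):
    print("B_curr=%d Valid=%d Req_prev=%d -> B_next=%d EN=%d Req_next=%d" % (key + out))
```

Six of the eight (B_curr, Valid, Req_prev) combinations occur and none conflict; the two with B_curr = 1 and Req_prev = 1 never appear, so they cannot be recovered from this example. The only observed row with Valid = Req_prev = 1 switches the stage to opaque (B_next = 1) with its clock enabled (EN = 1) and stops the request (Req_next = 0).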






# **Control Logic (cont.)**

#### Extension for collapsible pipeline







### **Simulation Results**

Comparison with the previous transparent pipeline technique [Jacobson, ISLPED'04]

- Vdd = 1.2 V, f = 500 MHz, IBM 90 nm technology
- Pipeline utilization varies from 0.1 to 0.9.




## Simulation Results (cont.)

Power saving over traditional stage-level clock-gating

- Various pipeline depths and widths are considered.
- The power overhead of the control logic is included.



- Greater power savings for wider and deeper pipelines
- At high utilization (0.9), the proposed approach consumes more power than traditional stage-level clock-gating because of control overhead and increased glitching.

## Simulation Results (cont.)

#### Extension to collapsible pipeline

- 64-bit 5-stage pipeline (2nd and 4th stages are collapsible)
- f<sub>normal</sub> = 500 MHz, f<sub>shallow</sub> = 250 MHz
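
A back-of-envelope estimate, not the simulated result, of why shallow mode saves clock power. It assumes that at fixed Vdd the pipeline clock power scales with (number of clocked registers) x (clock frequency), that each stage has one pipeline register with equal clock load, and that the collapsible 2nd and 4th stages are transparent in shallow mode. The function name is hypothetical.

```python
def relative_clock_power(regs, freq_mhz, base_regs, base_freq_mhz):
    """Clock power relative to a baseline pipeline.

    Back-of-envelope assumption: at fixed Vdd, pipeline clock power scales with
    (number of clocked registers) x (clock frequency), with equal clock load per
    register. Leakage, glitching, and control overhead are ignored.
    """
    return (regs * freq_mhz) / (base_regs * base_freq_mhz)


# Assumed: one pipeline register per stage; in shallow mode the collapsible
# 2nd and 4th stages are transparent, leaving 3 clocked registers.
normal = relative_clock_power(5, 500, 5, 500)
shallow = relative_clock_power(3, 250, 5, 500)
print(normal, shallow)  # 1.0 0.3
```

Under the same assumptions, throughput falls with the clock frequency, so shallow mode trades performance for lower clock energy.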



## Summary

- Clock-gating technique for dynamic power reduction
  - Compact control logic for transparent pipeline
  - Applicable to any number of pipeline stages
  - Transparent FF with low hardware overhead
  - Energy/performance trade-off through pipeline stage collapsing