High Performance Current-Mode Differential Logic

Ling Zhang, Jianhua Liu, Haikun Zhu, Chung-Kuan Cheng (Univ. of California, San Diego); Masanori Hashimoto (Osaka University)

## Outline

### Introduction

### Current-Mode Differential Logic (CMDL)

- Basic Concepts
- Structure of CMDL
- Examples
- Design Cases
  - 32-bit Multiplexer
  - 8-bit shifter
  - 16-bit adder
- Performance comparison
- Conclusions and future work

## Introduction

- Continuous scaling of semiconductor technology requires high performance circuit design:
  - Clock skew, wire delay and pipeline overhead increase:
    - Challenges on operation speed
  - Clock frequency becomes higher:
    - Challenges on power budgeting
  - A design that has lower delay\*power is desired.

## Previous works

### Voltage-mode logic:

- Cascode Voltage Switch Logic (CVSL) [Heller et al. 1984]
- Complementary Pass-transistor Logic (CPL) [Yano et al. 1990]
- Low Voltage-Swing Logic (LVS) [Deleganes et al. 2004]

### Current-mode logic:

- Dynamic Current-Mode Logic (DyCML) [Allam et al. 2001]

### Properties:

- Differential logic: common to all
- Low swing vs. full swing output
  - Low swing output: LVS, CPL
  - Full swing output: DyCML, CVSL
- Pre-charge/reset required vs. not required
  - Pre-charge/reset not required: CPL
  - Pre-charge/reset required: others

### Cascode Voltage Switch Logic [Heller et al. 1984]

- Introduced the concept of differential logic.
- Full swing outputs
- Outputs are pre-charged to low.
- Improvements:
  - SODS [Acosta et al. 1995]
  - DCSL [Somasekhar et al. 1996]
  - CSDL [Park et al. 1999]



### Complementary Pass-transistor Logic [Yano et al. 1990]



- Differential inputs and outputs.
- Drain and gate inputs
- Low swing outputs: extra inverters are needed.
- No pre-charge phase.

2008-1-17

### Low Voltage Swing logic [Deleganes et al. 2004]



- Differential inputs and outputs
- Low swing outputs: sense amplifier is needed
- Reset operation is needed.
  - Transistor count doubles.
  - Load on clock increases.
  - Need one extra stage of logic for thru-gate.
  - A reset phase is inserted into each clock cycle.

### Dynamic Current-Mode Logic [Allam et al. 2001]



- Differential inputs and outputs
- Full swing outputs
- Pre-charge is required.

## Current-mode differential logic



- Differential inputs and outputs
- Low swing outputs: sense amplifier is needed.
  - Fast operation speed
- Pre-charge/reset is not needed; only an evaluation phase is needed.
  - Fewer transistors
  - No need for thru-gate: more headroom for logic depth
  - Low power consumption
- More robust against noise


### Basic concepts of Current-mode logic

1. Fast operation speed
   - Reduced RC time constant
2. To maintain the low swing output
   - Current-mode logic inherently enables low swing operation.
   - For voltage-mode logic: reset operation is required.
3. To reduce the noise effect



### 3. Immunity of noise in current-mode logic

- Current-mode loop in steady state can be approximated by a resistor loop.
- Node a1 has voltage $V_1$, which is subjected to a noise $\Delta V$.
  - Noise in high-branch:

$$\frac{\Delta V_{rs}}{\Delta V} = \frac{r_s}{r_{total} - r_1}$$

  - Noise in low-branch:

$$\frac{\Delta V_{rs}}{\Delta V} = -\frac{r_s}{r_{total} - r_1}$$
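As a rough numerical illustration of the two ratios above, the following Python sketch evaluates them for made-up resistor values. The slides do not define the symbols at this point, so the sketch assumes that `r_s` is the shunt resistance, `r_total` is the total loop resistance and `r_1` is the resistance associated with node a1. The function name and all numbers are hypothetical.

```python
def noise_transfer(r_s: float, r_total: float, r_1: float,
                   high_branch: bool = True) -> float:
    """Return dV_rs / dV according to the two formulas on the slide.

    Assumption: r_s, r_total and r_1 are resistances in the same unit,
    with r_total > r_1 so that the denominator is positive.
    """
    if r_total <= r_1:
        raise ValueError("r_total must exceed r_1")
    ratio = r_s / (r_total - r_1)
    # The low-branch formula is the same expression with opposite sign.
    return ratio if high_branch else -ratio


# Hypothetical values in ohms, chosen only for illustration.
high = noise_transfer(r_s=1.0e3, r_total=6.0e3, r_1=1.0e3, high_branch=True)
low = noise_transfer(r_s=1.0e3, r_total=6.0e3, r_1=1.0e3, high_branch=False)
print(high, low)
```

With these values the two results have equal magnitude and opposite sign, as the formulas state.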




## Basic design blocks of CMDL



## Design rules for CMDL

- The internal and output nodes are low swing, and the differential output must be greater than 0.1 V (for Vdd = 1.0 V).
  - To guarantee the low swing outputs:
    - For any input pattern, the differential inputs must be connected through a shunt resistor or a closed transistor.
  - To guarantee the differential output is larger than 0.1 V:
    - For each pair of differential outputs, there shall be no other shunt resistors or closed transistors on the active path.
- The Diffusion Connected Network (DCN) can have multiple inputs and multiple outputs.
- Any path from the input to output has at most six stages of logic.
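The two numeric limits in these rules (at most six stages, differential output above 0.1 V at Vdd = 1.0 V) can be written as a small checker. This is only a sketch: the function name is hypothetical, and the voltages in the usage lines are invented for illustration rather than taken from the slides.

```python
MAX_STAGES = 6            # from the design rules above
MIN_DIFF_OUTPUT_V = 0.1   # from the design rules above, for Vdd = 1.0 V


def rule_violations(stages: int, diff_output_v: float) -> list[str]:
    """Return a list of violated design rules (empty if none)."""
    problems = []
    if stages > MAX_STAGES:
        problems.append(f"{stages} stages exceeds the limit of {MAX_STAGES}")
    if diff_output_v <= MIN_DIFF_OUTPUT_V:
        problems.append(
            f"differential output {diff_output_v} V is not above "
            f"{MIN_DIFF_OUTPUT_V} V"
        )
    return problems


# Hypothetical paths: (stage count, differential output in volts).
ok_path = rule_violations(stages=5, diff_output_v=0.15)
bad_path = rule_violations(stages=7, diff_output_v=0.08)
print(ok_path)
print(bad_path)
```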



#### 2-input NAND gate in CMDL






#### 2-input NOR gate in CMDL



#### 2-input XOR gate in CMDL




#### 4:1 MUX in CMDL



When $S_0=1$, $S_1=1$, a shunt transistor is needed to maintain the low swing at $b_1$, $b_1'$.



#### 2-bit adder in CMDL



A controlled shunt resistor is used to avoid multiple shunts along the $C_{in} \rightarrow C_{out}$ path.


## 32-bit Multiplexer

- Use 4:1 MUX as building block.
- Build 16:1 MUX with five 4:1 MUX.
- Build 32:1 MUX with two 16:1 MUX and one 2:1 MUX.
- Maximal logic depth is five.
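The composition described above can be sketched as a purely behavioral Python model (it says nothing about the transistor-level CMDL circuit). The function names are hypothetical. The stage count in the last comment rests on an assumption that the slides do not state explicitly: a 4:1 MUX counts as two stages of logic (one per select bit) and a 2:1 MUX as one.

```python
def mux2(d, s):
    """2:1 MUX: return d[s] for a select bit s."""
    return d[s]


def mux4(d, s1, s0):
    """4:1 MUX: return d[2*s1 + s0]."""
    return d[(s1 << 1) | s0]


def mux16(d, sel):
    """16:1 MUX built from five 4:1 MUXes; sel is a 4-bit integer."""
    s0, s1, s2, s3 = [(sel >> i) & 1 for i in range(4)]
    # Four first-level 4:1 MUXes, each picking within a group of four.
    first = [mux4(d[4 * i:4 * i + 4], s1, s0) for i in range(4)]
    # The fifth 4:1 MUX picks among the four group outputs.
    return mux4(first, s3, s2)


def mux32(d, sel):
    """32:1 MUX built from two 16:1 MUXes and one 2:1 MUX."""
    low = mux16(d[:16], sel & 0xF)
    high = mux16(d[16:], sel & 0xF)
    return mux2([low, high], (sel >> 4) & 1)


data = list(range(32))
selected = [mux32(data, s) for s in range(32)]
print(selected == data)

# Assumed stage count: 2 (4:1) + 2 (4:1) + 1 (2:1) = 5.
assumed_depth = 2 + 2 + 1
```

Under that counting assumption the model reproduces the stated maximal logic depth of five.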




## 8-bit rotator/shifter

- Adopt the barrel shifter structure proposed in [Pereira et al. 1995].
- Can left rotate or shift the operand by 0 to 7 bits.
- Maximal logic depth is 4.
- Functional correctness:
  - A proper input pattern is required.

| ls | lr | c1 | c2 | Out | action         |
|----|----|----|----|-----|----------------|
| 0  | 0  | 1  | 0  | ln1 | No shift       |
| 1  | 1  | 0  | 1  | ln2 | rotate         |
| 1  | 0  | 0  | 0  | 0   | Padding 0      |
| 0  | 1  | 1  | 1  | -   | Not allowed    |

Function table for RO/PA
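The function table can be turned into a small behavioral model. In the sketch below, the relations `c1 = NOT ls` and `c2 = lr` are read off the four table rows and are not stated on the slide; the function names are hypothetical. Two word-level reference functions for an 8-bit left shift and left rotate are included for comparison. They describe the intended result only, not the barrel-shifter structure.

```python
def ro_pa(ls: int, lr: int, ln1: int, ln2: int) -> int:
    """Behavioral model of the RO/PA function table."""
    c1 = 1 - ls   # inferred from the table rows
    c2 = lr       # inferred from the table rows
    if c1 and c2:
        raise ValueError("ls=0, lr=1 is not allowed")
    if c1:
        return ln1   # no shift
    if c2:
        return ln2   # rotate
    return 0         # padding 0


def shift_left8(x: int, n: int) -> int:
    """Left shift of an 8-bit word by n (0..7), padding with zeros."""
    return (x << n) & 0xFF


def rotate_left8(x: int, n: int) -> int:
    """Left rotate of an 8-bit word by n (0..7)."""
    x &= 0xFF
    return ((x << n) | (x >> (8 - n))) & 0xFF


print(ro_pa(0, 0, ln1=1, ln2=0))
print(ro_pa(1, 1, ln1=0, ln2=1))
print(ro_pa(1, 0, ln1=1, ln2=1))
print(bin(shift_left8(0b10000001, 1)), bin(rotate_left8(0b10000001, 1)))
```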



## 16-bit carry-skip adder

- Two kinds of cells:
  - Carry-skip cell (CS)
  - Full adder cell (FA)
- Primary inputs:
  - Carry propagation signal: Pi
  - Carry generation signal: Gi
  - Carry kill signal: Ki
  - Carry-skip control signal: Pij
- Maximal logic depth is six.
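A behavioral sketch of a 16-bit carry-skip adder using the signals listed above may help fix the roles of Pi, Gi, Ki and Pij. Several things here are assumptions, not taken from the slides: the textbook definitions Pi = a XOR b, Gi = a AND b, Ki = NOT (a OR b), a block size of four bits, and the function name. The model is logic-level only and does not reflect the CMDL cell design.

```python
def carry_skip_add(a: int, b: int, cin: int = 0,
                   width: int = 16, block: int = 4):
    """Return (sum, carry_out) of a width-bit carry-skip addition."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    total = 0
    carry = cin
    for low in range(0, width, block):
        block_cin = carry
        block_propagate = True              # Pij for this block
        for i in range(low, min(low + block, width)):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p = ai ^ bi                     # Pi: carry propagation
            g = ai & bi                     # Gi: carry generation
            k = 1 - (ai | bi)               # Ki: carry kill
            total |= (p ^ carry) << i       # full adder sum bit
            if g:
                carry = 1
            elif k:
                carry = 0
            # otherwise p is 1 and the carry passes through unchanged
            block_propagate = block_propagate and bool(p)
        if block_propagate:
            # Skip path: the block's carry-out equals its carry-in.
            # Logically this matches the rippled value; in hardware it
            # shortens the carry path.
            carry = block_cin
    return total, carry


print(carry_skip_add(0x1234, 0x0FCC))
print(carry_skip_add(0xFFFF, 0x0001))
```

The skip path never changes the logical result; it only models where the carry would bypass a block whose bits all propagate.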



## Sense amplifier

- A traditional sense amplifier from textbooks
- When the En signal is high, the sense amp is pre-charged to low.
- The cross-coupled PMOS and NMOS pairs provide a positive feedback loop for fast restoring.




## Experiment settings

- Three different logics are compared:
  - CMOS logic (standard cell)
  - LVS logic
  - CMDL
- Three design cases are compared:
  - 32-bit MUX, 8-bit shifter and 16-bit adder
- Simulation tool: HSPICE
- Library: TSMC 90 nm technology
- Inputs and outputs:
  - Inverters are used as input drivers and loads.
- Cycle time for each logic:
  - Determined by the worst-case delay.
- Sense amp outputs:
  - The high output voltage of the sense amp is greater than 0.8 V.
- Power measurements:
  - 100 randomly generated input patterns are used.

## Performance comparison

|                | 32-bit MUX |       |       | 8-bit Shifter |       |       | 16-bit Adder |       |       |
|----------------|------------|-------|-------|---------------|-------|-------|--------------|-------|-------|
|                | CMOS       | LVS   | CMDL  | CMOS          | LVS   | CMDL  | CMOS         | LVS   | CMDL  |
| Cycle time(ps) | 200        | 215   | 180   | 200           | 210   | 180   | 800          | 350   | 380   |
| Delay(ps)      | 195.6      | 153.8 | 118.7 | 165.3         | 148.4 | 120.7 | 709.6        | 251.5 | 286.6 |
| Norm. Delay    | 1.00       | 0.79  | 0.61  | 1.00          | 0.90  | 0.73  | 1.00         | 0.35  | 0.40  |

- CMDL operates faster than CMOS
  - Differential small signal
  - Diffusion Connected Network
- The speed of CMDL is comparable to LVS
  - Slower in the adder case by 9% (cycle time of 380 ps vs. 350 ps).
  - With the elimination of the reset stage, the differential output needs to be charged from the opposite voltage level instead of from zero.
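The normalized delays in the table can be reproduced by dividing each delay by the CMOS delay of the same design. The short Python check below does this with the table values. It also compares CMDL with LVS for the adder: the 9% figure appears to correspond to the cycle times (380 ps vs. 350 ps), while the measured delays differ by about 14%. The variable names are hypothetical.

```python
# Delays in ps, copied from the table above.
delay_ps = {
    "32-bit MUX":    {"CMOS": 195.6, "LVS": 153.8, "CMDL": 118.7},
    "8-bit Shifter": {"CMOS": 165.3, "LVS": 148.4, "CMDL": 120.7},
    "16-bit Adder":  {"CMOS": 709.6, "LVS": 251.5, "CMDL": 286.6},
}
# Adder cycle times in ps, copied from the table above.
adder_cycle_ps = {"LVS": 350, "CMDL": 380}

# Normalize each delay to the CMOS delay of the same design.
norm_delay = {
    design: {logic: round(d / row["CMOS"], 2) for logic, d in row.items()}
    for design, row in delay_ps.items()
}

# Relative gap between CMDL and LVS for the adder.
cycle_gap = adder_cycle_ps["CMDL"] / adder_cycle_ps["LVS"] - 1
adder = delay_ps["16-bit Adder"]
delay_gap = adder["CMDL"] / adder["LVS"] - 1

for design, row in norm_delay.items():
    print(design, row)
print(f"adder cycle-time gap: {cycle_gap:.1%}, delay gap: {delay_gap:.1%}")
```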

## Performance comparison

|                     | 32-bit MUX |           |           | 8-bit Shifter |           |           | 16-bit Adder |           |           |
|---------------------|------------|-----------|-----------|---------------|-----------|-----------|--------------|-----------|-----------|
|                     | CMOS       | LVS       | CMDL      | CMOS          | LVS       | CMDL      | CMOS         | LVS       | CMDL      |
| Avg/Peak power(mW)  | 0.38/5.69  | 0.45/3.63 | 0.38/3.10 | 0.36/2.81     | 0.48/3.23 | 0.41/2.34 | 0.26/2.37    | 0.53/6.29 | 0.32/2.70 |
| Norm. Avg Power     | 1.00       | 1.18      | 1.00      | 1.00          | 1.33      | 1.14      | 1.00         | 2.04      | 1.23      |
| Input Power(mW)     | 0.15       | 0.17      | 0.16      | 0.08          | 0.09      | 0.09      | 0.03         | 0.04      | 0.04      |
| Load Power(mW)      | 0.004      | 0.001     | 0.004     | 0.01          | 0.04      | 0.04      | 0            | 0.04      | 0.04      |
| Sense Amp Power(mW) | -          | 0         | 0.002     | -             | 0.12      | 0.09      | -            | 0.17      | 0.13      |
| Logic Power(mW)     | 0.23       | 0.28      | 0.21      | 0.27          | 0.23      | 0.20      | 0.23         | 0.28      | 0.11      |

- CMDL is more power efficient than LVS
  - Power saving: 15%, 14% and 40% for the three cases.
  - Due to the elimination of the reset network
- CMDL dissipates more power than CMOS
  - Power increase: 14% and 23% for the shifter and adder.
  - Static current: ~10 µA
  - More overhead comes from inputs, loads and sense amps.
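The percentages quoted above follow from the average power row of the table. The sketch below recomputes them; the variable names are hypothetical, and the results agree with the quoted figures only to within about one percentage point, since the table values are themselves rounded.

```python
# Average power in mW, copied from the table above.
avg_power_mw = {
    "32-bit MUX":    {"CMOS": 0.38, "LVS": 0.45, "CMDL": 0.38},
    "8-bit Shifter": {"CMOS": 0.36, "LVS": 0.48, "CMDL": 0.41},
    "16-bit Adder":  {"CMOS": 0.26, "LVS": 0.53, "CMDL": 0.32},
}

# Fraction of LVS power saved by CMDL, per design.
saving_vs_lvs = {
    design: 1 - row["CMDL"] / row["LVS"]
    for design, row in avg_power_mw.items()
}
# Fractional power increase of CMDL over CMOS, per design.
increase_vs_cmos = {
    design: row["CMDL"] / row["CMOS"] - 1
    for design, row in avg_power_mw.items()
}

for design in avg_power_mw:
    print(f"{design}: saving vs LVS {saving_vs_lvs[design]:.1%}, "
          f"increase vs CMOS {increase_vs_cmos[design]:.1%}")
```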

## Performance comparison

|                                         | 32-bit MUX |        |       | 8-bit Shifter |       |       | 16-bit Adder |        |       |
|-----------------------------------------|------------|--------|-------|---------------|-------|-------|--------------|--------|-------|
|                                         | CMOS       | LVS    | CMDL  | CMOS          | LVS   | CMDL  | CMOS         | LVS    | CMDL  |
| $Delay \times Power(fJ)$                | 74.33      | 69.21  | 45.11 | 59.51         | 71.23 | 49.49 | 184.50       | 133.30 | 91.71 |
| Norm. Delay $\times$ Power              | 1.00       | 0.93   | 0.61  | 1.00          | 1.20  | 0.83  | 1.00         | 0.72   | 0.50  |
| $Delay^2 \times Power(pJ \times ps)$    | 14.54      | 10.64  | 5.35  | 9.84          | 10.57 | 5.97  | 130.9        | 33.5   | 26.28 |
| Norm. Delay <sup>2</sup> $\times$ Power | 1.00       | 0.73   | 0.37  | 1.00          | 1.07  | 0.61  | 1.00         | 0.26   | 0.20  |
| Total Transistor Count                  | 312        | 322    | 162   | 392           | 316   | 226   | 393          | 450    | 315   |
| Transistor Overhead                     | 0          | 145.8% | 23.7% | 0             | 49.1% | 6.6%  | 0            | 50.5%  | 5.4%  |

- CMDL has the best delay\*power metric.
  - The reduction is up to 50%.
  - The delay<sup>2</sup>\*power is also reduced by up to 80%.
- CMDL has the smallest number of transistors.
  - Usage of Diffusion Connected Network
  - Elimination of the reset network
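The delay\*power and delay<sup>2</sup>\*power rows can be recomputed from the delay and average power values reported in the earlier tables, using ps × mW = fJ. The Python sketch below does this for CMOS and CMDL; the variable names are hypothetical.

```python
# Delay (ps) and average power (mW), copied from the earlier tables.
delay_ps = {
    "32-bit MUX":    {"CMOS": 195.6, "CMDL": 118.7},
    "8-bit Shifter": {"CMOS": 165.3, "CMDL": 120.7},
    "16-bit Adder":  {"CMOS": 709.6, "CMDL": 286.6},
}
power_mw = {
    "32-bit MUX":    {"CMOS": 0.38, "CMDL": 0.38},
    "8-bit Shifter": {"CMOS": 0.36, "CMDL": 0.41},
    "16-bit Adder":  {"CMOS": 0.26, "CMDL": 0.32},
}

dp_fj = {}       # delay * power in fJ (ps * mW = fJ)
d2p_pj_ps = {}   # delay^2 * power in pJ * ps
for design in delay_ps:
    dp_fj[design] = {}
    d2p_pj_ps[design] = {}
    for logic, d in delay_ps[design].items():
        p = power_mw[design][logic]
        dp_fj[design][logic] = d * p
        d2p_pj_ps[design][logic] = d * d * p / 1000.0   # fJ*ps -> pJ*ps

for design in delay_ps:
    dp_ratio = dp_fj[design]["CMDL"] / dp_fj[design]["CMOS"]
    d2p_ratio = d2p_pj_ps[design]["CMDL"] / d2p_pj_ps[design]["CMOS"]
    print(f"{design}: norm. delay*power {dp_ratio:.2f}, "
          f"norm. delay^2*power {d2p_ratio:.2f}")
```

For the adder the normalized values come out at about 0.50 and 0.20, which is where the "up to 50%" and "up to 80%" reductions come from.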

## Waveforms of different logics



### Conclusions and future work

- The effectiveness of CMDL is demonstrated by three design cases.
- Simulation results show that:
  - CMDL can achieve much better delay\*power and delay<sup>2</sup>\*power.

### Next steps:

- Detailed experiments of energy overhead of CMDL on small circuit
- Noise test of CMDL
- Technology scaling
- Other possible alternative architectures

## Thank you