A Software Technique to Improve Yield of Processor Chips in Presence of Ultra-Leaky SRAM Cells Caused by Process Variation

Maziar Goudarzi, Tohru Ishihara, Hiroto Yasuura

System LSI Research Center, Kyushu University, Fukuoka, Japan

### Outline

#### Background

- Process variation
- Power in nanometer embedded processor-based systems
- Our work<sup>1</sup>
  - Motivation
  - Approach
  - Experiments

#### Summary and Future work

<sup>1</sup> This is part of the CREST "Ultra Low Power Design Projects" sponsored by Japan Science and Technology Corporation (JST), http://www.slrc.kyushu-u.ac.jp/~ishihara/CREST/e\_kenkyu.html



#### Both inter-die and intra-die variations become increasingly important!

\* Source: X. Li, J. Le, L. Pileggi, "Projection-Based Statistical Analysis of Full-Chip Leakage Power with Non-Log-Normal Distributions," DAC, 2006.

### Our Focus: Intra-die (Within-die) V<sub>th</sub> Variation

#### Large Intra-Die Variation Current 3-sigma = 13%V<sub>th</sub> 3-sigma = 67mV

### Variation is huge in small transistors

$$\sigma_{Vth} = \frac{q}{C_{ox}} \sqrt{\frac{N_a \cdot W_{dm}}{3 \cdot L \cdot W}}$$

- *L*, *W*: effective channel length and width
- *q*: electron charge
- $C_{ox}$: oxide capacitance
- $N_a$: substrate doping concentration
- $W_{dm}$: maximum depletion width
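As a rough numeric illustration of the formula above, the following sketch evaluates $\sigma_{Vth}$ for two transistor sizes and shows that halving both *L* and *W* doubles the spread. All device values (oxide thickness, doping, depletion width, dimensions) are illustrative assumptions rather than numbers from this work, and $C_{ox}$ is assumed to be a capacitance per unit area.

```python
import math

Q = 1.602176634e-19        # electron charge [C]
EPS_OX = 3.9 * 8.854e-12   # permittivity of SiO2 [F/m]


def sigma_vth(length_m, width_m, n_a_m3, w_dm_m, c_ox_f_m2):
    """Standard deviation of V_th [V] according to the formula above."""
    return (Q / c_ox_f_m2) * math.sqrt(
        n_a_m3 * w_dm_m / (3.0 * length_m * width_m)
    )


# Assumed, illustrative device parameters (not taken from the slides).
T_OX = 2e-9                # oxide thickness [m]
C_OX = EPS_OX / T_OX       # oxide capacitance per unit area [F/m^2]
N_A = 1e24                 # substrate doping [m^-3], i.e. 1e18 cm^-3
W_DM = 30e-9               # maximum depletion width [m]

large = sigma_vth(180e-9, 400e-9, N_A, W_DM, C_OX)
small = sigma_vth(90e-9, 200e-9, N_A, W_DM, C_OX)

print(f"sigma_Vth, 180nm x 400nm: {large * 1e3:.2f} mV")
print(f"sigma_Vth,  90nm x 200nm: {small * 1e3:.2f} mV")
print(f"ratio small/large: {small / large:.2f}")
```

Because the gate area $L \cdot W$ sits under the square root, minimum-area transistors see the largest spread for a given process.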



Eijiro Toyoda, "DFM: Device & Circuit Design Challenges", Int'l Forum on Semiconductor Technology, 2004

### Unavoidable Cause of V<sub>th</sub> Variation: Random Dopant Fluctuation (RDF)





Source: S. Borkar

Nature of variations:

- Systematic
- Random

ITRS-2005 roadmap forecast



### Our Focus: Leakage Power

#### Power consumption

- Dynamic
  - activity-based
- Static (leakage)
  - activity-independent
- Trend
  - Traditionally: Dynamic >> Static
  - Nanometer technologies: Static >> Dynamic





### Our Focus: Cache Memories

- Largest portion of chips => biggest leakage
- Minimum-area transistors => most susceptible to process variation



#### PowerPC<sup>TM</sup> 40% of core area



#### StrongARM-110<sup>™</sup> 75% of core area



### **Process Variation at 90nm**

$$I_{Subthreshold} \propto \frac{W \cdot V_T^2}{T_{ox} \cdot L} \cdot \exp\left(\frac{-V_{th}}{\alpha \cdot V_T}\right)$$

- $V_T$: thermal voltage (25 mV at room temperature)
- $\alpha$: sub-threshold factor (1.40 to 1.65)
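To make the exponential dependence concrete, the sketch below computes how much the sub-threshold leakage changes when $V_{th}$ shifts away from its nominal value and everything else in the formula stays fixed. The value of $\alpha$ is an assumed point inside the range quoted above, and the function name is hypothetical.

```python
import math

V_T = 0.025   # thermal voltage [V], as quoted above
ALPHA = 1.5   # assumed value inside the quoted 1.40-1.65 range


def leakage_ratio(delta_vth_v, alpha=ALPHA, v_t=V_T):
    """Leakage relative to nominal when V_th shifts by delta_vth_v volts."""
    return math.exp(-delta_vth_v / (alpha * v_t))


for shift_mv in (-60, -30, 0, 30, 60):
    ratio = leakage_ratio(shift_mv / 1000.0)
    print(f"V_th shift {shift_mv:+4d} mV -> leakage x{ratio:.2f}")
```

With these assumed values, a transistor whose $V_{th}$ is 60 mV below nominal leaks roughly five times more than a nominal one, which is why a small number of low-$V_{th}$ cells can dominate standby power.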

| Year | min. <i>L</i> [nm] | <sup>1</sup> V <sub>TH</sub> [V] | <sup>2</sup> V <sub>TH</sub> [V] |
|------|--------------------|----------------------------------|----------------------------------|
| 2004 | 37 (90)            | 0.32                             | 0.12                             |
| 2005 | 32 (80)            | 0.33                             | 0.09                             |
| 2006 | 28 (70)            | 0.34                             | 0.06                             |



### **Ultra-Leaky SRAM Cells Problem**

Ultra-leaky cache cells and ultra-leaky cache lines: those containing one or more ultra-leaky transistors (ULTs)

#### Problem

- Ultra-leaky cache cells dissipate lots of power
- Especially in long-standby applications, they drain the battery rapidly



### Ultra-Leaky SRAM Cells Problem (cont'd)

#### Naïve solution

- Mark as faulty, replace with spare row/column
- Disadvantages
  - Spares may be leaky themselves
  - Spares should replace slow/faulty cells as well
  - Fuse blowing is expensive and slow
  - Aging may introduce ULTs over time
  - Temperature may also introduce ULTs



### Our Fundamental Observation: Cell Leakage is Value-Dependent



### Flow of Operations



### **Offline Testing Phase**

#### Goal:

- Detect location of ULTs
- Location accuracy: cache line or cache cell

#### Idea

- $\Delta I_{DDQ}$ testing: if the leaky cell is sensitized, the quiescent current shows an abnormal change.

#### General outline

- Write all 0's, then all 1's, to every cache line and measure the leakage current
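The general outline can be sketched on a simulated cache as follows. This is a minimal illustration at cache-line accuracy, not the actual test procedure: the cache model, the leakage numbers, the threshold, and all function names are hypothetical, and the total supply current is assumed to be the only observable quantity.

```python
from typing import Dict, List, Tuple

# Hypothetical cell model: (leakage when storing 0, leakage when storing 1).
Cell = Tuple[float, float]


def total_current(cache: List[List[Cell]], contents: List[List[int]]) -> float:
    """Quiescent current of the whole array for the given stored bits."""
    return sum(
        cell[bit]
        for line, bits in zip(cache, contents)
        for cell, bit in zip(line, bits)
    )


def find_leaky_lines(cache: List[List[Cell]], threshold: float) -> Dict[int, int]:
    """Return {line index: less-leaky fill value} for lines with abnormal delta."""
    n_bits = len(cache[0])
    contents = [[0] * n_bits for _ in cache]      # write all 0's everywhere
    baseline = total_current(cache, contents)
    leaky: Dict[int, int] = {}
    for i in range(len(cache)):
        contents[i] = [1] * n_bits                # write all 1's to line i only
        delta = total_current(cache, contents) - baseline
        contents[i] = [0] * n_bits                # restore line i
        if abs(delta) > threshold:
            # Current rose with 1's: 0 is the safer value, and vice versa.
            leaky[i] = 0 if delta > 0 else 1
    return leaky


# Toy array: 4 lines x 8 cells, nominal leakage 1.0 in both states.
cache = [[(1.0, 1.0) for _ in range(8)] for _ in range(4)]
cache[1][3] = (1.0, 100.0)   # assumed ULT that leaks when the cell stores 1
cache[2][0] = (100.0, 1.0)   # assumed ULT that leaks when the cell stores 0

print(find_leaky_lines(cache, threshold=10.0))
```

A line holding ULTs of both polarities could show a small net delta and escape this line-level check, so locating individual cells would need finer-grained patterns.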

### Improvement in Leakage Yield

Leakage Yield = % of chips meeting a given leakage constraint



Nominal transistor leakage =0.345 nA

Experiments:

- Monte Carlo simulation
- 1000 chips
- 32 Kb data + 22 Kb tag
- 60 mV within-die $V_{th}$ variation
- Nominal values from a 90nm process: $V_{th} = 320$ mV
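The leakage-yield metric can be illustrated with a much smaller Monte Carlo run than the one listed above. This sketch only estimates the baseline yield under random $V_{th}$ variation and does not model the proposed technique. The number of chips and transistors per chip, the value of $\alpha$, the leakage limits, and the reading of 60 mV as a 3-sigma spread (sigma = 20 mV) are all assumptions made to keep the example short.

```python
import math
import random

V_T = 0.025            # thermal voltage [V]
ALPHA = 1.5            # assumed sub-threshold factor
I_NOMINAL_NA = 0.345   # nominal transistor leakage [nA]


def chip_leakage_na(rng, n_transistors, sigma_vth_v):
    """Total leakage [nA] of one simulated chip with Gaussian V_th shifts."""
    total = 0.0
    for _ in range(n_transistors):
        delta_vth = rng.gauss(0.0, sigma_vth_v)
        total += I_NOMINAL_NA * math.exp(-delta_vth / (ALPHA * V_T))
    return total


def leakage_yield(chip_leakages, limit_na):
    """Fraction of chips whose leakage meets the given constraint."""
    passing = sum(1 for leak in chip_leakages if leak <= limit_na)
    return passing / len(chip_leakages)


rng = random.Random(0)       # fixed seed for repeatability
N_TRANSISTORS = 512          # assumed, far smaller than a real cache
chips = [chip_leakage_na(rng, N_TRANSISTORS, 0.020) for _ in range(200)]

for factor in (1.0, 1.2, 1.5):
    limit = factor * N_TRANSISTORS * I_NOMINAL_NA
    print(f"limit = {factor:.1f} x nominal: yield = {leakage_yield(chips, limit):.1%}")
```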

### Maximum Leakage Power Saving vs. Within-die Variation



Nominal transistor leakage =0.345 nA

### **Associated Costs**

| Cost        | Why it is paid                                                         | When it is paid                   |
|-------------|------------------------------------------------------------------------|-----------------------------------|
| Power       | Running instructions to store leakage-safe values in leaky cache lines | When entering standby mode        |
| Performance | Cache contents that were invalidated but are referenced later          | After returning from standby mode |
| Area        | On-chip leakage-measurement circuitry                                  | Chip design and manufacturing     |

### Analysis of Costs

#### Energy benefit & Performance cost linearly depend on the number of leaky cells cured (*N*)

$$EnergySaving(t) = N \times \left(P_{leak} \times t - E_{lock} - E_{fetch}\right)$$
$$Perf.Penalty \le N \times (T_M - T_c)$$

- *N*: number of leaky cells cured
- *t*: time spent in standby
- $P_{leak}$: average power saved per cured cache line
- $E_{lock}$: energy for locking a leakage-safe value in the cache
- $E_{fetch}$: energy for fetching invalidated data, if needed
- $T_M$: memory access time
- $T_c$: cache access time
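The two expressions above imply a break-even standby time: the saving is positive only when $t > (E_{lock} + E_{fetch}) / P_{leak}$. The sketch below evaluates the saving, the break-even time, and the penalty bound. The power and energy values are hypothetical placeholders, while the 10 ns memory and 1 ns cache latencies are the ones quoted for the M32R results.

```python
import math


def energy_saving_j(n, t_s, p_leak_w, e_lock_j, e_fetch_j):
    """EnergySaving(t) = N * (P_leak * t - E_lock - E_fetch), in joules."""
    return n * (p_leak_w * t_s - e_lock_j - e_fetch_j)


def break_even_standby_s(p_leak_w, e_lock_j, e_fetch_j):
    """Standby duration above which the energy saving becomes positive."""
    return (e_lock_j + e_fetch_j) / p_leak_w


def max_perf_penalty_s(n, t_mem_s, t_cache_s):
    """Upper bound N * (T_M - T_c) on the performance penalty, in seconds."""
    return n * (t_mem_s - t_cache_s)


# Hypothetical values chosen only for illustration.
P_LEAK = 100e-9    # 100 nW saved per cured line
E_LOCK = 8e-9      # 8 nJ to lock a leakage-safe value
E_FETCH = 8e-9     # 8 nJ to re-fetch invalidated data
N = 10

print(f"break-even standby: {break_even_standby_s(P_LEAK, E_LOCK, E_FETCH):.3f} s")
print(f"saving after 1 s:   {energy_saving_j(N, 1.0, P_LEAK, E_LOCK, E_FETCH):.2e} J")
print(f"max penalty:        {max_perf_penalty_s(N, 10e-9, 1e-9) * 1e9:.0f} ns")
```

Since $N$ multiplies both terms, the break-even time is independent of how many lines are cured, whereas the penalty bound grows linearly with $N$.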



#### Max. Performance Penalty (ns)

Results for the M32R processor (0.18 µm process, 200 mW @ 50 MHz):

- Memory latency: 10 ns
- Cache latency: 1 ns

### Effect of the Processor Used

[Figure: leakage saving (nW), minimum standby duration (s), and maximum performance penalty (ns) versus ULT leakage (nA), for M32R and ARM920. M32R: 0.18 µm, 200 mW @ 50 MHz; ARM920: 0.18 µm, 0.8 mW/MHz.]

### Summary and Future Work

Presented a software technique to suppress, during standby mode, leakage of ultra-leaky transistors

- No major hardware/circuit change required
- Only uses already-popular cache-control instructions
- Useful even for dynamic effects such as aging and temperature

#### Results

- Reduced leakage power in standby mode
- Salvage chips containing ULTs => higher yield for long-standby low-power applications

#### Future work

Reduce leakage power, even in active mode, by matching cache contents with the less-leaky state of cache cells