# **ASP-DAC 2006**

**Session 8C-5: Inductive Issues in Power Grids and Packages** 

# **Controlling Inductive Cross-talk and Power in Off-chip Buses using CODECs**

Authors:

Brock J. LaMeres (Agilent Technologies), Kanupriya Gulati (Texas A&M University), Sunil P. Khatri (Texas A&M University)

# Motivation

#### • Power delivery is the biggest challenge facing designers entering DSM

- The IC core current continues to increase (P4 = 80 Amps).
- The package interconnect inductance limits instantaneous current delivery.
- The inductance leads to ground and power supply bounce.

#### • SSN on signal pins is the leading cause of inter-chip bus failure

- Ground/power supply bounce causes unwanted switching.
- Mutual Inductive cross-talk causes edge degradation which limits speed.
- Mutual Inductive cross-talk causes glitches which results in unwanted switching.

#### • Further, power in off-chip buses can be significant.

- Large percentage of power may be consumed in the output stages

#### • Aggressive package design helps, but is too expensive:

- Flip-Chip technology can reduce the interconnect inductance.
- Flip-Chip requires a unique package design for each ASIC.
- This leads to longer process time which equals cost.
- 90% of ASIC design starts use wire-bonding due to its low cost.
- Wire-bonding has large parasitic inductance that must be addressed.

# **Our Solution**

# "Encode Off-Chip Data to Avoid Inductive Cross-talk & Power Consumption"

- Avoid the following cases:
- 1) Excessive switching in the same direction = reduce ground/power bounce
- 2) Excessive X-talk on a signal when switching = reduce edge degradation
- 3) Excessive X-talk on a signal when static = reduce glitching
- 4) At the same time, limit the number of transitions = reduce power

# **Our Solution**

- This results in:
- 1) A subset of vectors is transmitted that avoids inductive X-talk & power.
- 2) The off-chip bus can now be run at a higher data rate.
- 3) The subset of vectors running faster can achieve a higher throughput over the original set of vectors running slower.



# Agenda

- 1) Inductive X-talk & Power
- 2) Terminology
- 3) Methodology
- 4) Experimental Results
- 5) Conclusion

#### 1) Inductive X-Talk

# **Supply Bounce**

• The instantaneous current that flows when signals switch induces a voltage across the inductance of the power supply interconnect following:

$$V_{bnc} = L \cdot \left(\frac{di}{dt}\right)$$

• When more than one signal returns current through one supply pin, the expression becomes:

$$V_{bnc} = L \cdot \sum_{j} \left( \frac{di_{j}}{dt} \right)$$

**NOTE:** Reducing the number of signals switching in the same direction at the same time will reduce the supply bounce.
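As a rough numerical illustration of the summation above, the following Python sketch evaluates the bounce for two switching patterns. The inductance and di/dt values are hypothetical placeholders, not figures from this work.

```python
def supply_bounce(inductance_h, di_dt_list):
    """Return V_bnc = L * sum_j(di_j/dt) for signals sharing one supply pin."""
    return inductance_h * sum(di_dt_list)


# Hypothetical values, chosen only for illustration.
L_SUPPLY = 2e-9   # 2 nH supply-pin inductance (assumed)
DI_DT = 1e7       # 10 MA/s per switching signal (assumed)

four_switching = supply_bounce(L_SUPPLY, [DI_DT] * 4)
two_switching = supply_bounce(L_SUPPLY, [DI_DT] * 2)
print(f"4 signals: {four_switching:.3f} V, 2 signals: {two_switching:.3f} V")
```

Halving the number of signals that switch in the same direction halves the bounce, which is the effect the encoding exploits.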

#### 1) Inductive X-Talk

# Glitching

• Mutual inductive coupling from neighboring signals that are switching causes a voltage to be induced on a victim that is static:

$$V^{i}_{glitch} = \pm M_{ik} \cdot \left(\frac{di_{k}}{dt}\right)$$

• The net coupling is the summation from all neighboring signals that are switching:

$$V_{glitch}^{i} = \sum_{k=1}^{m} \pm M_{ik} \cdot \left(\frac{di_{k}}{dt}\right) \qquad \qquad M_{ik} = K_{ik} \cdot \sqrt{L_{i} \cdot L_{k}}$$

**NOTE:** The mutual inductive coupling can be canceled out when two neighbors of equal  $K_{ik}$  switch in opposite directions. Here  $K_{ik}$  is the mutual inductive coupling coefficient.
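The cancellation described in the note can be checked numerically. The sketch below assumes equal self inductances and a single coupling coefficient for both neighbors; all numeric values are hypothetical.

```python
import math


def mutual_inductance(k_ik, l_i, l_k):
    """M_ik = K_ik * sqrt(L_i * L_k)."""
    return k_ik * math.sqrt(l_i * l_k)


def glitch_voltage(aggressors):
    """Sum M_ik * di_k/dt over all switching neighbors.

    Each aggressor is (K_ik, L_i, L_k, di_dt); the sign of di_dt encodes
    the switching direction (the +/- in the formula).
    """
    return sum(mutual_inductance(k, l_i, l_k) * di_dt
               for k, l_i, l_k, di_dt in aggressors)


# Hypothetical values, chosen only for illustration.
L_PIN = 2e-9   # self inductance of each pin (assumed)
K_ADJ = 0.5    # coupling coefficient to each adjacent pin (assumed)
DI_DT = 1e7    # magnitude of di/dt on each aggressor (assumed)

same_dir = glitch_voltage([(K_ADJ, L_PIN, L_PIN, DI_DT),
                           (K_ADJ, L_PIN, L_PIN, DI_DT)])
opposite = glitch_voltage([(K_ADJ, L_PIN, L_PIN, DI_DT),
                           (K_ADJ, L_PIN, L_PIN, -DI_DT)])
print(same_dir, opposite)
```

With both neighbors switching the same way the contributions add; with opposite directions they cancel.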

## 1) Inductive X-Talk

# **Edge Degradation**

• Mutual inductive coupling from neighboring signals that are switching causes a voltage to be induced on a victim that is also switching. This follows the same expression as glitch coupling:

$$V_{glitch}^{i} = \sum_{k=1}^{m} \pm M_{ik} \cdot \left(\frac{di_{k}}{dt}\right)$$

• The mutual inductive coupling can be manipulated to cause a positive (negative) glitch for a rising (falling) signal.

• Mutual coupling can thus be exploited so as to *help* the transition resulting in a faster rise-time or fall-time (alternately, to *not hinder* the risetime of the transition)

#### 1) Power

# **Power Consumption**

• The power consumed in the output stage is proportional to the capacitance being driven, the output voltage swing, and the switching frequency.

$$p_{pin} = C \cdot V_{DD}^2 \cdot f$$

**NOTE:** Power is proportional to the number of switching pins.



#### **Define the following:**

- *n* = width of the bus segment, where each bus segment consists of *n*-2 signals, 1 VDD, and 1 Vss.
- *j* = the segment under consideration, consisting of an *n*-bit bus.
  - *j*-1 is the segment to the immediate left.
  - *j*+1 is the segment to the immediate right.
  - Each segment has the same VDD/Vss placement.



$v_i^j$ = the transition (vector sequence) that the *i*<sup>th</sup> signal in the *j*<sup>th</sup> segment is undergoing, where

- $v_i^j = 1$: rising edge
- $v_i^j = -1$: falling edge
- $v_i^j = 0$: signal is static

This 3-valued algebra enables us to model mutual inductive coupling of any sign

January 27, 2006

## 2) Terminology

#### Define the following coding constraints:

#### **Supply Bounce**

If $v_i^j$ is a supply pin, the total bounce on this pin is bounded by $P_{bnc}$. $P_{bnc}$ is a user-defined constant.

#### **Glitching**

If $v_i^j$ is a signal pin and is static ($v_i^j = 0$), the total magnitude of the glitch from switching neighbors should be less than $P_0$. $P_0$ is a user-defined constant.

#### **Edge Degradation**

If $v_i^j$ is a signal pin and is switching ($v_i^j = 1$ or $-1$), the total magnitude of the coupling from switching neighbors should be greater than $P_1$ or $P_{-1}$, respectively. This coupling should not hurt (should aid) the transition. $P_1$ and $P_{-1}$ are user-defined constants.

## 2) Terminology - Power

**Define the following coding constraints:** 

#### **Power**

For a given segment *j*, the total power consumption on that segment is bounded by $P_{power}$. $P_{power}$ is a user-defined constant.



Also define the following:

- p = how far away to consider coupling (ex., p = 3, consider  $K_{11}$ ,  $K_{12}$ , and  $K_{13}$  on each side of the victim)
- $k_q$  = Magnitude of coupled voltage on pin *i* when its  $q^{th}$  neighbor *p* switches:

$$k_q = \left| \boldsymbol{M}_{ip} \cdot \left( \frac{di_p}{dt} \right) \right|$$



• For each pin  $v_i^j$  within segment *j*, we will write a series of constraints that will bound the inductive cross-talk magnitude.

• The constraints will differ depending on whether  $v_i^j$  is a signal or power pin.

• The coupling constraints will consider signals in adjacent segments (*j*+1, *j*-1) depending on *p*.

## 3) Methodology – Signal Pin Constraints

**<u>Glitching</u>** : coupling is bounded by  $P_{0}$ 

**Example:** 

 $v_2^{j} = 0$ , and p = 3. This means the three adjacent neighbors on either side of  $v_2^{j}$  need to be considered  $(v_4^{j-1}, v_0^{j}, v_1^{j}, v_3^{j}, v_4^{j}, v_0^{j+1})$ .

Note we use *modulo n* arithmetic (and consider adjacent segments as required).

$$v_{2}^{j} = 0 \text{ (static)}: \quad k_{3} \cdot (v_{4}^{j-1}) + k_{2} \cdot (v_{0}^{j}) + k_{1} \cdot (v_{1}^{j}) + k_{1} \cdot (v_{3}^{j}) + k_{2} \cdot (v_{4}^{j}) + k_{3} \cdot (v_{0}^{j+1}) \leq P_{0}$$

The constraint equation is tested against each possible transition and the transitions that violate the constraint are eliminated.

## 3) Methodology – Signal Pin Constraints

**<u>Edge Degradation</u>** : coupling is bounded by $P_{1}$ and $P_{-1}$

**Example:** 

 $v_2^{j} = 1$  or -1, and p = 3. This means the three adjacent neighbors on either side of  $v_2^{j}$  need to be considered  $(v_4^{j-1}, v_0^{j}, v_1^{j}, v_3^{j}, v_4^{j}, v_0^{j+1})$ .

$$v_{2}^{j} = 1 \text{ (rising)}: \quad k_{3} (v_{4}^{j-1}) + k_{2} (v_{0}^{j}) + k_{1} (v_{1}^{j}) + k_{1} (v_{3}^{j}) + k_{2} (v_{4}^{j}) + k_{3} (v_{0}^{j+1}) \ge P_{1}$$

Again, the constraint equations are tested against each possible transition and the transitions that violate the constraints are eliminated.
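The two signal-pin checks can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the coupling magnitudes and bounds are hypothetical, the glitch bound is applied to the absolute value of the net coupling, and the falling-edge check is assumed to mirror the rising-edge one.

```python
from itertools import product

# Hypothetical coupling magnitudes k_1..k_3 (volts) for neighbors at distance 1..3.
K = {1: 0.10, 2: 0.04, 3: 0.02}
P_0 = 0.12     # glitch bound for a static victim (assumed)
P_1 = 0.0      # rising victim: net coupling must not oppose the edge (assumed)
P_NEG1 = 0.0   # falling victim: mirrored bound (assumed)


def net_coupling(left, right):
    """Signed coupling on the victim; left/right list neighbors nearest first."""
    return (sum(K[d] * v for d, v in enumerate(left, start=1)) +
            sum(K[d] * v for d, v in enumerate(right, start=1)))


def is_legal(victim, left, right):
    """Apply the glitch or edge-degradation check that matches the victim state."""
    coupling = net_coupling(left, right)
    if victim == 0:                  # static: bound the glitch magnitude
        return abs(coupling) <= P_0
    if victim == 1:                  # rising: coupling should aid the edge
        return coupling >= P_1
    return -coupling >= P_NEG1       # falling: mirrored check (assumption)


# p = 3 gives six neighbors; test every combination for a static victim.
STATES = (-1, 0, 1)
neighbor_patterns = list(product(STATES, repeat=6))
legal_static = [t for t in neighbor_patterns if is_legal(0, t[:3], t[3:])]
print(len(legal_static), "of", len(neighbor_patterns), "patterns keep the victim quiet")
```

For the victim $v_2^j$ in the examples, `left` would hold $(v_1^j, v_0^j, v_4^{j-1})$ and `right` would hold $(v_3^j, v_4^j, v_0^{j+1})$.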

## 3) Methodology – Power Pin Constraints

**Supply Bounce** : coupling is bounded by  $P_{bnc}$ 

**Example:** 

 $v_0^{j}$  = VDD or VSS. The total number of switching signals that use  $v_0^{j}$  to return current must be considered. Due to the symmetry of the bus arrangement, signal pins always return current through two supply pins, i.e.,  $(v_0^{j-1} \text{ and } v_0^{j})$  or  $(v_d^{j} \text{ and } v_d^{j+1})$. This divides the self inductance of the return path by 2. Let z = |L di/dt| for any pin. Then:

If the supply pin is VDD:

$$(z/2) \cdot (\# \text{ of } v_i^j \text{ pins that are } 1) \leq P_{bnc}$$

If the supply pin is Vss:

$$(z/2) \cdot (\# \text{ of } v_i^j \text{ pins that are } -1) \leq P_{bnc}$$
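A minimal Python sketch of the supply-pin checks follows. The function name and the numeric values of `z` and `p_bnc` are illustrative assumptions.

```python
def supply_bounce_ok(transitions, z, p_bnc):
    """Check the VDD and Vss bounce constraints for one segment.

    transitions: values in {-1, 0, 1} for the signal pins of the segment.
    z: |L di/dt| of a single pin in volts; it is halved because current
       returns through two supply pins.
    """
    rising = sum(1 for v in transitions if v == 1)     # counted in the VDD constraint
    falling = sum(1 for v in transitions if v == -1)   # counted in the Vss constraint
    vdd_ok = (z / 2) * rising <= p_bnc
    vss_ok = (z / 2) * falling <= p_bnc
    return vdd_ok and vss_ok


# Hypothetical values: z = 50 mV per pin, bound of 60 mV.
print(supply_bounce_ok((1, 1, 0, -1, -1), z=0.05, p_bnc=0.06))  # two rising, two falling
print(supply_bounce_ok((1, 1, 1, 0, 0), z=0.05, p_bnc=0.06))    # three rising
```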

## 3) Methodology – Power Constraints

**Power Consumption** : consumption is bounded by $P_{power}$

**Example:** 

For segment *j*, the total number of switching signals can be constrained to reduce power:

$$(\# \text{ of } v_i^j \text{ pins in segment } j \text{ that are } 1 \text{ or } -1) \leq P_{power}$$

## 3) Methodology – Constructing Legal Vector Sequences

- For each bit in the *j*<sup>th</sup> segment bus, constraints are written.
- If the pin is a signal, 3 constraint equations are written:
  - v<sub>0</sub><sup>j</sup> = 0: the bit is static and a *glitching* constraint is written.
  - v<sub>0</sub><sup>j</sup> = 1: the bit is rising and an *edge degradation* constraint is written.
  - v<sub>0</sub><sup>j</sup> = -1: the bit is falling and an *edge degradation* constraint is written.
- If the pin is VDD, 1 constraint equation is written to avoid *supply bounce*.
- If the pin is Vss, 1 constraint equation is written to avoid ground bounce.
- For the segment, 1 constraint equation is written to constrain *power*.

## 3) Methodology – Constructing Legal Vector Sequences

• The total number of constraint equations written is therefore:

$$(3 \cdot n - 3)$$

• Each equation must be evaluated for each possible transition to verify if the transition meets the constraints. The total number of transitions that are evaluated depends on *n* and *p*:

$$3^{(n+2p-6)}$$

• This follows since there are *n*-2 signal pins in the segment *j*, and 2*p*-4 signal pins in neighboring segments.

• The values of *n* and *p* are small in practice, hence this is tractable.
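The two counts above can be verified with a few lines of Python. The sketch enumerates the candidate transitions for the n = 7, p = 2 bus and, as an illustration only, applies a hypothetical power bound of one switching pin.

```python
from itertools import product


def num_constraint_equations(n):
    """3 per signal pin (n - 2 of them), plus 1 each for VDD, Vss, and power."""
    return 3 * (n - 2) + 1 + 1 + 1


def num_transitions(n, p):
    """3 states per signal pin: n - 2 in segment j, 2p - 4 in its neighbors."""
    return 3 ** ((n - 2) + (2 * p - 4))


n, p = 7, 2
print(num_constraint_equations(n), num_transitions(n, p))

# Brute-force enumeration of every transition that must be evaluated.
candidates = list(product((-1, 0, 1), repeat=n + 2 * p - 6))

# Hypothetical power bound: at most one signal pin may switch per transition.
P_POWER = 1
low_power = [t for t in candidates if sum(abs(v) for v in t) <= P_POWER]
print(len(candidates), len(low_power))
```

Because the exponent grows with both *n* and *p*, exhaustive enumeration is only practical for the small segment sizes considered here.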

**3) Methodology – Constructing the CODEC** 

- The remaining legal transitions are used to create the CODEC.
- The total number of remaining legal transitions depends on how aggressively the user-defined constants  $(P_0, P_1, P_{-1}, P_{bnc}, P_{power})$  are chosen.
- From the remaining legal transitions, find the effective bus width *m* that can be encoded using a physical bus of width *n*, using a memory-based CODEC.
  - Utilize a fixpoint computation

**3) Methodology – Constructing the CODEC** 


- Represent remaining legal transitions in a digraph
- Algorithm to find CODEC:
- Let *n* = size of physical bus
- Let *m* = size of effective bus
- Then the digraph of legal transitions of the *n* bit bus can encode an *m* bit bus (m < n) iff we can find a closed set *S* of nodes such that
  - $|S| \ge 2^m$
  - Each vertex *s* in *S* has at least 2<sup>*m*</sup> out-edges (including self-edges) to vertices *s*' in *S*

• Now we can synthesize the encoder and decoder (memory based).
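One way to realize the closed-set search is a pruning fixpoint: start from all vertices and repeatedly drop any vertex with fewer than 2<sup>*m*</sup> successors left in the set. The Python sketch below assumes this reading of the fixpoint computation. The function names and the toy legality rule are hypothetical and stand in for the real constraint equations.

```python
def closed_set(edges, m):
    """Largest subset S in which every node keeps >= 2**m successors inside S.

    edges maps each node to the set of nodes reachable by a legal transition
    (self-edges included). Returns an empty set if no such S exists.
    """
    need = 2 ** m
    s = set(edges)
    changed = True
    while changed:                       # iterate until nothing is removed
        changed = False
        for node in list(s):
            if len(edges[node] & s) < need:
                s.discard(node)
                changed = True
    return s if len(s) >= need else set()


def max_effective_width(edges, n):
    """Largest m < n for which a closed set exists, together with that set."""
    for m in range(n - 1, 0, -1):
        s = closed_set(edges, m)
        if s:
            return m, s
    return 0, set()


def build_codec(edges, s, m):
    """Lookup tables: (state, data) -> next state, and (state, next) -> data."""
    encoder, decoder = {}, {}
    for state in s:
        successors = sorted(edges[state] & s)[: 2 ** m]
        for data, nxt in enumerate(successors):
            encoder[(state, data)] = nxt
            decoder[(state, nxt)] = data
    return encoder, decoder


def toy_legal(a, b, n=3):
    """Toy rule (assumption): at most one bit rises and at most one bit falls."""
    rising = sum(1 for i in range(n) if not (a >> i) & 1 and (b >> i) & 1)
    falling = sum(1 for i in range(n) if (a >> i) & 1 and not (b >> i) & 1)
    return rising <= 1 and falling <= 1


N = 3
edges = {a: {b for b in range(2 ** N) if toy_legal(a, b)} for a in range(2 ** N)}
m, s = max_effective_width(edges, N)
encoder, decoder = build_codec(edges, s, m)
print(f"physical width {N}, effective width {m}, |S| = {len(s)}")
```

The encoder and decoder dictionaries play the role of the memory-based CODEC: the encoder is indexed by the previous bus state and the data word, and the decoder by the previous and current bus states.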

## 4) Experimental Results – 5 Signal Pins

**Example Bus:** n=7, p=2



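| Encoding       | Constraints                       | Bound        |
|----------------|-----------------------------------|--------------|
| Aggressive     | $P_0$, $P_1$, $P_{-1}$, $P_{bnc}$ | 5% of VDD    |
| Non-Aggressive | $P_0$, $P_1$, $P_{-1}$, $P_{bnc}$ | 12.5% of VDD |
| Power          | $P_{power}$                       | 20% of Max   |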

# 4) Experimental Results – Constraint Equations

# **Transitions Eliminated due to Rule Violations**

| Transition | Rules violated (Aggressive) | Rules violated (Non-Aggressive) |
|------------|-----------------------------|---------------------------------|
| 011        | 1, 4                        | -                               |
| 0-1-1      | 4, 11                       | -                               |
| 101        | 1, 7                        | -                               |
| 110        | 1, 10                       | -                               |
| 111        | 1, 2, 5, 8                  | 11                              |
| 11-1       | 1                           | -                               |
| 1-11       | 1                           | -                               |
| 1-1-1      | 11                          | -                               |
| -10-1      | 7, 11                       | -                               |
| -111       | 1                           | -                               |
| -11-1      | 11                          | -                               |
| -1-10      | 10, 11                      | -                               |
| -1-11      | 11                          | -                               |
| -1-1-1     | 3, 6, 9, 11                 | 1                               |

• Encoded data avoids Inductive X-talk pattern



[Figure: simulated ground bounce versus Time (ns), 0.0 to 2.5 ns, for the Original, Aggressive, and Non-Aggressive buses]

**Ground Bounce Simulation** 

[Figure: simulated glitch versus Time (ns), 0.0 to 2.5 ns, for the Original, Aggressive, and Non-Aggressive buses]

**Glitch Simulation**

[Figure: simulated switching edge versus Time (ns), 0.0 to 2.5 ns, for the Original, Aggressive, and Non-Aggressive buses]

**Edge Degradation Simulation** 

## 4) Experimental Results – CASE 2: Variable di/dt

- di/dt was swept for both the non-encoded and encoded configurations.
- The maximum di/dt that resulted in a failure was recorded.
- Failure threshold: 5% of VDD (Aggressive) and 12.5% of VDD (Non-Aggressive).
- The maximum di/dt was converted to data rate and throughput.

|                             | Original | Aggressive | Non-Aggressive |
|-----------------------------|----------|------------|----------------|
| Maximum di/dt               | 8 MA/s   | 19.9 MA/s  | 37 MA/s        |
| Maximum data rate per pin   | 133 Mb/s | 333 Mb/s   | 667 Mb/s       |
| Effective bus width         | 5        | 4          | 2              |
| Total throughput            | 667 Mb/s | 1332 Mb/s  | 1332 Mb/s      |
| Improvement                 | -        | 100%       | 100%           |
| Power constraint (% of max) | 100%     | 20%        | 20%            |
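The throughput rows follow from multiplying the per-pin data rate by the effective bus width. The short Python check below uses the tabulated values. Because the per-pin rates in the table appear to be rounded, the recomputed products (665, 1332, and 1334 Mb/s) differ slightly from the tabulated totals.

```python
def throughput_mbps(rate_per_pin_mbps, effective_width):
    """Total throughput = per-pin data rate x number of effective data bits."""
    return rate_per_pin_mbps * effective_width


def improvement_pct(baseline, encoded):
    """Relative throughput gain over the non-encoded bus, in percent."""
    return 100.0 * (encoded - baseline) / baseline


original = throughput_mbps(133, 5)
aggressive = throughput_mbps(333, 4)
non_aggressive = throughput_mbps(667, 2)
print(original, aggressive, non_aggressive)

# Using the tabulated totals, the gain rounds to the 100% reported above.
print(round(improvement_pct(667, 1332), 1))
```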

## 4) Experimental Results – ASIC Synthesis

- A TSMC 0.13 µm ASIC process was used.
- Delay and area were extracted.

| Metric     | Bus Size (m) | Aggressive | Non-Aggressive |
|------------|--------------|------------|----------------|
| Delay (ns) | 2            | 0.170      | N/A            |
| Delay (ns) | 4            | 0.670      | 0.503          |
| Delay (ns) | 6            | 1.150      | 0.955          |
| Delay (ns) | 8            | 1.310      | 0.983          |
| Area (µm²) | 2            | 22         | N/A            |
| Area (µm²) | 4            | 152        | 114            |
| Area (µm²) | 6            | 614        | 509            |
| Area (µm²) | 8            | 1,181      | 886            |

# 4) Experimental Results – FPGA Implementation

- A Xilinx Virtex-II, 0.35 µm FPGA was used.
- Delay and area were extracted.

| Metric              | Bus Size (m) | Aggressive & Non-Aggressive |
|---------------------|--------------|-----------------------------|
| Delay (ns)          | 2            | 0.351                       |
| Delay (ns)          | 4            | 1.020                       |
| Delay (ns)          | 6            | 1.450                       |
| Delay (ns)          | 8            | 1.610                       |
| FPGA Usage          | 2            | < 1%                        |
| FPGA Usage          | 4            | < 1%                        |
| FPGA Usage          | 6            | < 1%                        |
| FPGA Usage          | 8            | < 1%                        |
| FPGA Implementation | 2            | 3x, 2-Input FG's            |
| FPGA Implementation | 4            | 6x, 4-Input FG's            |
| FPGA Implementation | 6            | 9x, 6-Input FG's            |
| FPGA Implementation | 8            | 12x, 8-Input FG's           |

## 5) Conclusion

- Using a single mathematical framework, inductive X-talk & power constraints can be written that consider supply bounce, glitching, and edge degradation.
- This technique can be used to encode off-chip data transmission to reduce inductive X-talk & power to acceptable levels.
- It was demonstrated that even after reducing the effective bus size, the improvement in per pin data-rate resulted in an *increase* in throughput compared to a non-encoded bus.

# Thank you!