### CGTA: Current Gain-based Timing Analysis for Logic Cells

S. Nazarian, M. Pedram

University of Southern California EE-Systems, Los Angeles CA 90089



T. Lin, E. Tuncer

Magma Design Automation Santa Clara, CA 95054



#### Crosstalk-aware Logic Cell (Gate) Timing Analysis – Background

- Gate-level timing analysis tools such as STA and SSTA tools are used as efficient alternatives to Spice with an acceptable level of accuracy
- In many timing analysis tools, large errors can be observed when crosstalk noise is present
- Example: Static Timing Analysis (STA)



#### **Motivation**



Circuit (Timing) Delay Analysis in STA and SSTA tools

# Crosstalk-aware Logic Cell Timing Analysis – Motivation

- STA/SSTA tools utilize delay models for both interconnections and logic cells
  - The function of a cell delay model is to take an input (which may be subjected to coupling noise) waveform and produce the waveform for the cell output
    - This process is known as cell delay (or timing) analysis
- Two main classes of techniques
  1. Voltage-based techniques
  2. Current-based techniques



# Voltage-based Cell Delay Modeling

 Conventional cell delay analysis tools are inaccurate, mainly due to approximation of input with a saturated ramp, i.e., Γ<sub>eff</sub>






The larger the number of 0.5V<sub>dd</sub> crossing points, the larger the pessimism can be

# Non-voltage-based Cell Delay Modeling

• Equation-based: Characterization of real silicon to equation-based models is not generally feasible



- Current-based is more accurate than voltage-based modeling, esp. in considering the impact of the shape of a noisy waveform
- Motivation for the proposed current-based model:
  - The existing current-based cell delay models are too complex to use in a CAD tool

# Existing Current-based Models

• Keller et al model:

$$i_{\text{out}} = I(v_{\text{in}}, v_{\text{out}}) + C_{g}(v_{\text{in}}, v_{\text{out}}) \frac{\Delta v_{\text{out}}}{\Delta t} - C_{M}(v_{\text{in}}, v_{\text{out}}) \frac{\Delta v_{\text{in}}}{\Delta t}$$


- Miller capacitor  $C_M = C_{in}$
- Internal parasitic (to ground) capacitor  $C_g = C_{in} + C_{out}$
- A 2-D lookup table is used to store values of I(V<sub>in</sub>, V<sub>out</sub>), which are found through a series of Spice-based DC simulations
- C<sub>M</sub> and C<sub>g</sub> are assumed to be constant and calculated through a series of transient simulations with voltage transitions applied at the input and output nodes, during which the current flowing through the output node is measured

# Current-based Models (Cont'd)

- Keller et al model is the most accurate current-based model; however, it is too complex to be utilized in existing CAD tools and flows
- Blade is a simpler model with C<sub>M</sub> (or C<sub>in</sub>) set to 0
- Complexity is mainly due to the dependence on output voltage

# Our CGTA Cell Delay Model – Current Gain

- Goal: Given a (noisy) voltage waveform at the cell input, determine the output voltage waveform which has min error w.r.t. the actual output waveform
- Clearly, the output voltage of a cell is a function of the input voltage, the output parasitic capacitances, the output load, and V<sub>dd</sub>
- We define the current gain, $\rho_c$, as the derivative of the output current with respect to the input voltage, i.e., $\Delta i_{out}/\Delta v_{in}$



#### I<sub>gain</sub> Table

 Each logic cell in a standard cell library is precharacterized with an I<sub>gain</sub>(K×L) lookup table
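A lookup in a K×L table of this kind can be sketched as below. This is only an illustration: the slides do not state what the two table axes are, so the sketch assumes one axis is the input voltage and the other the output load capacitance, and the names `interp_index` and `lookup_gain` are hypothetical. Bilinear interpolation is likewise an assumed choice for reading values between grid points.

```python
from bisect import bisect_right


def interp_index(axis, x):
    """Return (i, w) with x ~= (1 - w) * axis[i] + w * axis[i + 1].

    `axis` must be sorted and hold at least two points; x is clamped
    to the axis range, so no extrapolation takes place.
    """
    if x <= axis[0]:
        return 0, 0.0
    if x >= axis[-1]:
        return len(axis) - 2, 1.0
    i = bisect_right(axis, x) - 1
    w = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, w


def lookup_gain(table, vin_axis, cap_axis, v_in, c_load):
    """Bilinearly interpolate a K x L table of current-gain values.

    table[i][j] is the gain at vin_axis[i] and cap_axis[j]
    (axis meaning is an assumption, not taken from the slides).
    """
    i, wv = interp_index(vin_axis, v_in)
    j, wc = interp_index(cap_axis, c_load)
    low = (1.0 - wc) * table[i][j] + wc * table[i][j + 1]
    high = (1.0 - wc) * table[i + 1][j] + wc * table[i + 1][j + 1]
    return (1.0 - wv) * low + wv * high
```

Each lookup costs one binary search per axis, so the table size affects characterization effort far more than it affects the per-sample lookup time.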




#### **Output Voltage Calculation**

$$i_{out}(t_{k+1}) = i_{out}(t_k) + \rho_c(t_k) \cdot \Delta v_{in}(t_{k+1}) + \frac{1}{2}\rho'_c(t_k) \{\Delta v_{in}(t_{k+1})\}^2 + \dots + \frac{1}{n!}\rho_{c}^{(n-1)}(t_{k})\{\Delta v_{in}(t_{k+1})\}^{n}$$

- $i_{out}(t_0)$  is initialized to zero.  $\rho_c(t_k)$  is a shorthand notation for  $\rho_c(v_{in}(t_k))$
- $\rho_{c}(v_{in}(t_{k}))$  is found from the  $I_{gain}$  table, possibly by interpolation
- $\rho_{c}^{(n-1)}(t_{k}) = \frac{\Delta^{n-1} \rho_{c}}{\Delta v_{in}^{n-1}}(t_{k}) = \frac{\Delta^{n} i_{out}}{\Delta v_{in}^{n}}(t_{k})$  is the n<sup>th</sup> order current gain, which is calculated directly during the initial characterization process or is approximated from entries in the  $I_{gain}$  table
- In practice n=1 (or 2) is sufficient for accurate timing analysis of a logic cell subjected to a noiseless ramp (or a noisy input waveform); Δt = t<sub>k+1</sub> − t<sub>k</sub> is the sampling time
- Note that the P computed output values may not be equidistant. A set of P equidistant points is computed based on a weighted average of the two nearest values found by Taylor expansion
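The progressive update above can be sketched in a few lines. The sketch makes several assumptions that the slides do not spell out: $\Delta v_{in}(t_{k+1})$ is read as $v_{in}(t_{k+1}) - v_{in}(t_k)$, the current gain is supplied as a callable `rho` (in practice it would come from the I<sub>gain</sub> table), and the output voltage is obtained by trapezoidal integration of the current into a single lumped load capacitor `c_load`. The function names are hypothetical.

```python
def propagate_current(v_in, rho, rho_prime=None, i0=0.0):
    """Advance i_out sample by sample with a Taylor step of order 1 or 2.

    v_in      : input-voltage samples v_in(t_0), ..., v_in(t_P)
    rho       : callable, rho(v) -> current gain at input voltage v
    rho_prime : optional callable, d(rho)/dv; when given, the
                second-order term is added (n = 2)
    i0        : initial output current (zero, as in the slides)
    """
    i_out = [i0]
    for k in range(len(v_in) - 1):
        dv = v_in[k + 1] - v_in[k]  # assumed meaning of delta v_in(t_{k+1})
        step = rho(v_in[k]) * dv
        if rho_prime is not None:
            step += 0.5 * rho_prime(v_in[k]) * dv * dv
        i_out.append(i_out[-1] + step)
    return i_out


def integrate_voltage(i_out, dt, c_load, v0):
    """Trapezoidal integration of i_out into v_out.

    Assumes a lumped capacitive load and that positive i_out charges it;
    both are assumptions of this sketch.
    """
    v_out = [v0]
    for k in range(len(i_out) - 1):
        dq = 0.5 * (i_out[k] + i_out[k + 1]) * dt
        v_out.append(v_out[-1] + dq / c_load)
    return v_out
```

Both loops visit each of the P samples once, which is consistent with the O(P) cost quoted for the current and voltage computations.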

#### An Example Result of the CGTA Model




#### CGTA Cell Timing Analyzer – Experimental Results



- Size of the I<sub>gain</sub> table: (20,5)
- Comparison with Hspice: the output voltage waveforms generated by the CGTA delay calculator matched Hspice-produced waveforms with only a 1-3% error


### **Conventional Cell Delay Modeling**

• Find an equivalent input line,  $\Gamma_{\rm eff}$ , such that when it is applied to the input of a cell, it generates an output waveform that matches the actual waveform in terms of its arrival time and transition time



# **Existing Cell Delay Analyzers**



- Least Square Fit approximations
  - Least Square Fitting (LSF):  $\Gamma_{\rm eff}$  is the best least square linear fit of the noisy input waveform

$$\sum_{t_{10\%}}^{t_{90\%}} \{ v_{in}^{\text{noisy}} (t_k) - (a \times t_k + b) \}^2$$
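Minimizing the sum of squares above over a and b has the standard closed-form solution of simple linear regression. A minimal sketch follows, assuming the caller has already restricted the samples to the [t<sub>10%</sub>, t<sub>90%</sub>] window; the name `lsf_ramp` is hypothetical.

```python
def lsf_ramp(t, v):
    """Closed-form least-squares line v ~= a * t + b over the given samples.

    t, v : equal-length sequences of sample times and noisy input voltages.
    Needs at least two distinct time points.
    """
    p = len(t)
    st = sum(t)
    sv = sum(v)
    stt = sum(x * x for x in t)
    stv = sum(x * y for x, y in zip(t, v))
    a = (p * stv - st * sv) / (p * stt - st * st)
    b = (sv - a * st) / p
    return a, b
```

The four sums are each a single pass over the samples, so the fit is linear in the number of points.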

# Existing Cell Delay Analyzers (cont.)

- LSF approximations (cont.)
  - Weighted LSF

$$\sum_{t_{10\%}}^{t_{90\%}} \{ \rho^{\text{noiseless}} (t_k) (v_{in}^{\text{noisy}} (t_k) - (a \times t_k + b))^2 \}$$

- Elmore-based
  - The slope of the line is selected such that the area enclosed by that line and the lines  $v_1(t) = 0.5V_{dd}$  and  $v_2(t) = V_{dd}$  is equal to the area enclosed by the noisy input and the lines  $v_1$  and  $v_2$





# Experimental Setup

- For both configurations we set the arrival time and slew (transition time) of the victim line input to 1000ps and 150ps, respectively.
- Configuration I is a pair of 1000µm coupled interconnect lines running parallel to one another with a total distributed coupling value of 100fF.
  - Both aggressor and victim line inputs have a slew of 150ps. For configuration I we swept the arrival time of the aggressor line input from 500 to 1500ps in steps of 5ps.
- Configuration II includes two aggressor lines each with 100fF total coupling and a victim, all of which are 500µm long.
  - We maintained a fixed offset of -100ps between the signal arrival times of the 1st and 2nd aggressor line inputs while sweeping the arrival time of the 2nd aggressor line input. The two aggressor inputs have slews of 200ps and 400ps, respectively.

#### Current-based Cell Delay Analysis – Experimental Results





Delay error (psec) = delay<sub>Hspice</sub> – delay<sub>method</sub>

| Method                     | Approx. runtime per case (μsec) | Config. I max error (psec) | Config. I avg error (psec) | Config. II max error (psec) | Config. II avg error (psec) |
|----------------------------|---------------------------------|----------------------------|----------------------------|-----------------------------|-----------------------------|
| Noiseless Point-based      | 40                              | 81.3                       | 29.3                       | 134.2                       | 48.5                        |
| Noisy Point-based          | 40                              | 82.7                       | 24.5                       | 144.5                       | 51.3                        |
| Least Square Fitting (LSF) | 40                              | 75.1                       | 30.9                       | 110.8                       | 45.4                        |
| Elmore-based [Nazarian]    | 40                              | 82.3                       | 14.5                       | 145.3                       | 33.4                        |
| Weighted LSF [Hashimoto]   | 60                              | 42.4                       | 10.3                       | 49.3                        | 17.4                        |
| CGTA                       | 100                             | 11.4                       | 3.7                        | 13.8                        | 3.9                         |

• All cells from a TSMC 130nm, 1.2V production cell library

• Crosstalk noise pulse amplitude of less than 0.36V (i.e., 30% of  $V_{dd}$)

### Conclusions

- The CGTA logic cell timing analyzer was presented to address the complexity issue with the existing current-based cell delay models
- A pre-characterized table of current gain of i<sub>out</sub> to V<sub>in</sub> and C<sub>out</sub> values is utilized in combination with the Taylor series expansion to progressively compute the output current waveform
- The output voltage is then produced by integrating the output current
- Experimental results show the accuracy and efficiency of this new delay model

# **BACKUP SLIDES**



The function of a cell delay model is to take an input (subjected to noise) waveform and produce the waveform for the cell output. This process is known as cell delay (or timing) analysis.

#### **Conventional Pre-Characterization**





# Complexity Analysis

- All conventional gate delay propagation techniques can determine the required crossing points for the waveforms such as the 0.5Vdd crossing points in O(P) time
- They can all apply closed form formulas (e.g. the one for LSF) to find the coefficients a and b for  $\Gamma_{\rm eff}$  in O(P), because the closed form formulas consist of several summations over P
- Weighted LSF (WLS) has an additional characterization step to calculate the weighting factor in the LSF formula, which is likewise of order O(P)
- Characterization process: CGTA needs to estimate  $\rho_{\rm c}$ , which is of order O(size(I<sub>gain</sub>))
- Taylor expansion also has the complexity of O(P)
- CGTA-based cell delay analysis technique takes O(P) to calculate current
- To compute the output voltage the integration takes O(P)
- Hence, the worst case complexity of CGTA (similar to that of the conventional voltage-based techniques) is O(P)
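The O(P) cost of locating crossing points can be seen from a single pass over the P samples. The sketch below is an illustration only: it assumes piecewise-linear interpolation between samples, and the name `crossing_times` is hypothetical.

```python
def crossing_times(t, v, level):
    """Times at which waveform v crosses `level`, found in one O(P) pass.

    A crossing is reported whenever consecutive samples lie on opposite
    sides of `level`; its time is linearly interpolated between them.
    """
    times = []
    for k in range(len(t) - 1):
        a = v[k] - level
        b = v[k + 1] - level
        # One sample strictly below the level and the other not below it.
        if (a < 0.0) != (b < 0.0):
            times.append(t[k] + (t[k + 1] - t[k]) * a / (a - b))
    return times
```

Calling it with `level = 0.5 * vdd` on a noisy waveform returns every 0.5V<sub>dd</sub> crossing, so a result with more than one entry flags the multiple-crossing case discussed for voltage-based models.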



#### CGTA-based Cell Delay Model – Experimental Results



- Coupling capacitances of 200, 200, and 220fF exist, and the signal transitions on the aggressor lines occur close enough to create large crosstalk-induced fluctuations around the 0.5V<sub>dd</sub> level and hence cause multiple 0.5V<sub>dd</sub> crossing points at the output of the victim
- Although the error in the 0.5V<sub>dd</sub> propagation delay value is quite low (less than 1%), the equivalent output waveform does not match the Hspice waveform as closely as those in parts (a) and (b)



### Weighted LSF Calculation Steps (Step I)

• Find the derivative for the noiseless input:

$$\rho^{\text{noiseless}}(t) = \frac{\partial v_{\text{out}}^{\text{noiseless}}(t)}{\partial v_{\text{in}}^{\text{noiseless}}(t)} = \frac{\partial v_{\text{out}}^{\text{noiseless}}(t) / \partial t}{\partial v_{\text{in}}^{\text{noiseless}}(t) / \partial t}$$

- Calculate the noiseless critical region [t<sub>10%</sub>, t<sub>90%</sub>]
- $\rho^{\text{noiseless}}$  is non-zero only for points in the noiseless critical region; otherwise it is set to zero
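Step I can be sketched numerically as a ratio of finite differences. The sketch assumes the noiseless input and output waveforms are sampled at the same time points, uses central differences (an assumed choice), and takes the critical-region bounds as arguments rather than deriving them; `noiseless_sensitivity` is a hypothetical name.

```python
def noiseless_sensitivity(t, v_in, v_out, t_lo, t_hi):
    """rho(t_k) = (dv_out/dt) / (dv_in/dt) inside [t_lo, t_hi], else 0.

    t, v_in, v_out : equal-length samples of the noiseless waveforms
    t_lo, t_hi     : bounds of the noiseless critical region
    """
    rho = [0.0] * len(t)
    for k in range(1, len(t) - 1):
        if not (t_lo <= t[k] <= t_hi):
            continue
        # Both central differences span t[k-1]..t[k+1], so dt cancels.
        dvin = v_in[k + 1] - v_in[k - 1]
        dvout = v_out[k + 1] - v_out[k - 1]
        rho[k] = dvout / dvin if dvin != 0.0 else 0.0
    return rho
```

For an inverting cell the values come out negative, so a caller using them as fitting weights may want their magnitude; the slides do not say which convention is used.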



# Weighted LSF Calculation Steps (Step II)

• Determining  $\Gamma_{\text{eff}}$ :
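One way to carry out this step is the closed-form weighted least-squares line, which minimizes the weighted LSF objective with $\rho^{\text{noiseless}}(t_k)$ as the weights. The sketch below is not taken from the slides: it assumes non-negative weights that are not all zero (for an inverting cell, for instance, the magnitude of the sensitivity), and `weighted_lsf_ramp` is a hypothetical name.

```python
def weighted_lsf_ramp(t, v, w):
    """Weighted least-squares line v ~= a * t + b.

    t, v : sample times and noisy input voltages
    w    : per-sample weights (assumed non-negative, with at least two
           distinct time points carrying non-zero weight)
    """
    sw = sum(w)
    swt = sum(wk * tk for wk, tk in zip(w, t))
    swv = sum(wk * vk for wk, vk in zip(w, v))
    swtt = sum(wk * tk * tk for wk, tk in zip(w, t))
    swtv = sum(wk * tk * vk for wk, tk, vk in zip(w, t, v))
    det = sw * swtt - swt * swt
    a = (sw * swtv - swt * swv) / det
    b = (swv - a * swt) / sw
    return a, b
```

Samples with zero weight drop out of every sum, which matches the rule that the sensitivity is set to zero outside the noiseless critical region.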



