Automated Extraction of Accurate Delay/Timing Macromodels of Digital Gates and Latches using Trajectory Piecewise Methods

> Sandeep Dabas\*, Ning Dong<sup>+</sup>, Jaijeet Roychowdhury\*

\* University of Minnesota, Twin Cities, USA; <sup>+</sup> Texas Instruments, Dallas, USA

## **Timing Models for Digital Logic**

- Replace gate with <u>simple</u> macromodel that captures timing/delay properties
  - motivation: <u>fast timing analysis</u> of large digital systems





## **Existing Timing/Delay Modelling Methods**

- **Current-source models** struggle with:
  - internal nodes / capacitances
  - <u>memory</u> and dynamics (latches/registers)
  - multiple input switching (MIS)
  - power/ground supply droop
  - dynamic nonlinear loading
- Ad-hoc, manually derived topological templates:
  - second-order device effects are difficult to abstract manually

## High Speed Digital == Analog/RF!

- Shrinking device dimensions
  - highly non-ideal device characteristics
- Increasing chip density/complexity
  - interference and noise
- Increasingly visible analog/high-frequency effects
  - nonlinear resistive/capacitive loading
  - interconnect (inductive/capacitive/transmission lines)
  - dynamic IR drops, crosstalk

*Figure: a large circuit/system, described by $\frac{d}{dt}q(x(t)) + f(x(t)) + b(t) = 0$, is passed through automated algorithms for macromodel generation to yield a small, simple macromodel, giving speedups and anonymity.*
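The "large circuit/system" in this flow is the full SPICE-level model that a macromodel replaces. As a minimal sketch of what simulating it involves, the Python below integrates a scalar instance of $\frac{d}{dt}q(x) + f(x) + b(t) = 0$ with backward Euler, solving each time step by Newton iteration. The helper `backward_euler_dae` and the RC example (capacitor $C$ to ground, resistor $R$ from a step input) are illustrative assumptions, not part of ADME.

```python
from typing import Callable, List


def backward_euler_dae(
    q: Callable[[float], float],
    dq: Callable[[float], float],
    f: Callable[[float], float],
    df: Callable[[float], float],
    b: Callable[[float], float],
    x0: float,
    h: float,
    n_steps: int,
    tol: float = 1e-12,
    max_newton: int = 50,
) -> List[float]:
    """Integrate d/dt q(x) + f(x) + b(t) = 0 (scalar x) with backward Euler.

    Each step solves (q(x) - q(x_prev))/h + f(x) + b(t) = 0 for x by Newton.
    """
    xs = [x0]
    x_prev = x0
    for n in range(1, n_steps + 1):
        t = n * h
        x = x_prev  # Newton starts from the previous time point
        for _ in range(max_newton):
            g = (q(x) - q(x_prev)) / h + f(x) + b(t)  # discretized residual
            dg = dq(x) / h + df(x)                    # its derivative w.r.t. x
            step = g / dg
            x -= step
            if abs(step) < tol:
                break
        else:
            raise RuntimeError(f"Newton did not converge at step {n}")
        xs.append(x)
        x_prev = x
    return xs


# Hypothetical example: node with C to ground, driven through R by a 1 V step.
# KCL: C dx/dt + (x - u)/R = 0, i.e. q(x) = C x, f(x) = x/R, b(t) = -u(t)/R.
R, C = 1e3, 1e-12
xs = backward_euler_dae(
    q=lambda x: C * x, dq=lambda x: C,
    f=lambda x: x / R, df=lambda x: 1.0 / R,
    b=lambda t: -1.0 / R,
    x0=0.0, h=R * C / 100, n_steps=500,
)
print(f"x after 5 RC: {xs[-1]:.4f} V")
```

After five time constants this gives about 0.9931 V, slightly below the exact $1 - e^{-5} \approx 0.9933$ V because backward Euler is only first-order accurate. A real gate netlist has a vector $x$ with hundreds of unknowns and a Jacobian solve at every Newton step, which is the cost a small macromodel avoids.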

### **Trajectory Piecewise Macromodelling**

- <u>Push-button</u> macromodel generation for nonlinear systems; previously applied to analog/RF
- Example: clipping and slew-rate effects captured for a current-mirror op-amp
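Trajectory piecewise (TP) macromodelling stores a small set of local models, each linearized at an "expansion point" on a training trajectory and projected to a reduced state $z$ of size $q$, and blends them as the state moves. The Python/NumPy sketch below evaluates such a blended model in the piecewise-linear (TPWL) case, written in ODE form for brevity. The distance-based weights with $\beta = 25$ follow a scheme commonly used in the TPWL literature (Rewienski and White); whether ADME uses exactly this weighting is an assumption, and the PWP variant would replace each affine local model with a polynomial one. Function names are hypothetical.

```python
import numpy as np


def tpwl_weights(z, z_points, beta=25.0):
    """Weights w_i(z) over the P expansion points z_points (shape P x q).

    Nearest points dominate: w_i ~ exp(-beta * d_i / min_j d_j), normalized to sum to 1.
    """
    d = np.linalg.norm(z_points - z, axis=1)
    m = d.min()
    if m == 0.0:  # exactly on an expansion point: use only that local model
        w = (d == 0.0).astype(float)
    else:
        w = np.exp(-beta * d / m)
    return w / w.sum()


def tpwl_rhs(z, u, z_points, A, k, B, beta=25.0):
    """Right-hand side of a reduced TPWL model dz/dt = sum_i w_i(z) (A_i z + k_i) + B u.

    A: P x q x q local Jacobians, k: P x q local offsets, B: q x m input matrix.
    """
    w = tpwl_weights(z, z_points, beta)
    local = np.einsum("pij,j->pi", A, z) + k  # every local affine model evaluated at z
    return w @ local + B @ u
```

Because the weights are sharply peaked, only the one or two nearest local models matter at any instant, so evaluating the model costs a few $q \times q$ products instead of a full nonlinear device evaluation.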



## **TP Macromodelling for Digital Logic**



*Figure: axis labelled "dynamical system complexity".*

## Automated Delay Model Extraction (ADME)

- Technique for <u>extracting accurate timing delay models from SPICE-level netlists</u>
  - Core: trajectory-piecewise nonlinear macromodelling (TPWL/PWP)
- <u>Automated</u>: push-button extraction via algorithm
- Accuracy extracted from the lowest (transistor) level
- Effectively captures complex nonlinearities and effects:
  - multiple input/output transitions
  - linear/nonlinear loading and capacitive effects
  - supply droop and substrate interference
- Validated on important combinatorial/sequential circuits
- General in applicability: independent of design style, complexity, topology and process technology

### Generating Delay Models via ADME: an illustration

- Example: 2-input XOR gate
- Designed in 0.18 µm static CMOS technology
- MOSFETs modelled using BSIM3



- Important controlling parameters for the ADME algorithm:
  - training input / expansion points
  - merging of trajectories
  - optimal order (size)

### Training Input and Expansion Points: speed and accuracy tradeoff

- Good training input:
  - covers the extreme bounds of the state space
  - covers frequently visited regions of the state space
  - captures dynamic nonlinearities
- Selection of macromodel "expansion points":
  - a new point is added when the relative error exceeds $\alpha$ (error tolerance)
  - lower $\alpha$: more expansion points, lower speedup
- For XOR-2: $\alpha$ = 0.005 to 0.05, N = 36, q = 10, speedup ≈ 2x
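The trade-off between $\alpha$ and the number of expansion points can be illustrated with a greedy pass over a training trajectory. The sketch below is an assumption about how "relative error" is measured: it uses the distance from the nearest already-selected point, divided by the norm of the current state, a criterion common in TPWL work; ADME's own error measure may differ. The helper name and the circular test trajectory are hypothetical.

```python
import numpy as np


def select_expansion_points(trajectory, alpha, eps=1e-12):
    """Greedy pass over a training trajectory (rows = states in time order).

    State k becomes a new expansion point when its distance to the nearest point
    already selected, relative to its own norm, exceeds the tolerance alpha.
    Returns the indices of the selected states.
    """
    chosen = [0]  # always linearize at the initial state
    for k in range(1, len(trajectory)):
        x = trajectory[k]
        nearest = np.linalg.norm(trajectory[chosen] - x, axis=1).min()
        if nearest / max(np.linalg.norm(x), eps) > alpha:
            chosen.append(k)
    return chosen


# Hypothetical training trajectory: a unit circle sampled at 2001 points.
t = np.linspace(0.0, 2.0 * np.pi, 2001)
traj = np.column_stack([np.cos(t), np.sin(t)])
for alpha in (0.05, 0.005):
    print(alpha, len(select_expansion_points(traj, alpha)))
```

Tightening $\alpha$ by 10x here yields roughly 10x more expansion points, each of which the macromodel must store and weigh during simulation; this is the speed/accuracy trade-off listed above.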



### Re-usability of Macromodel and Merging: broadly applicable macromodel

- Same training input:
  - no re-generation of the macromodel
  - good accuracy achieved even with different test inputs



- Merging trajectories from several training inputs:
  - better state-space coverage
  - lower redundancy, negligible reduction in simulation speedup (1.5x here)
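One way to picture merging is as a union of the expansion points collected from each training run in which near-duplicates are dropped, so that extra trajectories widen coverage without piling up redundant local models. The Python sketch below assumes a simple Euclidean distance tolerance `tol`; the helper is hypothetical, and in a full TP flow the projection bases of the runs would also have to be combined (for instance by the SVD-based common-subspace step).

```python
import numpy as np


def merge_expansion_points(points_a, points_b, tol):
    """Union of two expansion-point sets (rows are states).

    A point from points_b is kept only if it lies farther than tol (Euclidean)
    from every point kept so far, so near-duplicates do not add redundant models.
    """
    merged = [np.asarray(p, dtype=float) for p in points_a]
    for p in np.asarray(points_b, dtype=float):
        if not merged or min(np.linalg.norm(p - m) for m in merged) > tol:
            merged.append(p)
    return np.array(merged)


# Hypothetical points from two training runs: B repeats one point of A almost exactly.
a = np.array([[0.0, 0.0], [1.0, 0.0]])
b = np.array([[1.0, 0.001], [2.0, 0.0]])
print(merge_expansion_points(a, b, tol=0.01))
```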





## **Optimal Model Order (Size): Common Minimum Subspace**

- *Singular value* based common subspace:
  - SVD of the projection bases
  - a sudden drop in the singular values indicates the common minimum subspace
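The singular-value test can be sketched in a few lines of Python with NumPy: stack the projection bases obtained from the different trajectories side by side, take their SVD, and place the order $q$ at the largest ratio between consecutive singular values. Reading "sudden drop" as "largest consecutive ratio" is an assumption, and `common_subspace` is a hypothetical helper rather than ADME's code.

```python
import numpy as np


def common_subspace(bases):
    """Return (basis, singular values, q) for the subspace shared by several bases.

    The bases (each n x r_i) are stacked side by side; q is placed at the
    largest ratio sigma_i / sigma_{i+1}, i.e. the sudden drop in singular values.
    """
    stacked = np.hstack(bases)
    U, s, _ = np.linalg.svd(stacked, full_matrices=False)
    ratios = s[:-1] / np.maximum(s[1:], np.finfo(float).tiny)
    q = int(np.argmax(ratios)) + 1
    return U[:, :q], s, q


# Hypothetical data: two noisy bases that both span the same 3-D subspace of R^10.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((10, 3)))
V1 = Q @ rng.standard_normal((3, 4)) + 1e-8 * rng.standard_normal((10, 4))
V2 = Q @ rng.standard_normal((3, 5)) + 1e-8 * rng.standard_normal((10, 5))
basis, s, q = common_subspace([V1, V2])
print("q =", q)
```

The returned columns of `U` are orthonormal, so they can serve directly as the reduced projection basis of order $q$.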



- Effect of an order below the optimal q=10:
  - plot shown for q=8
  - model does not converge for q < 8



### Application and Validation of ADME: accuracy and speedup illustration

- Combinatorial circuits:
  - multi-input gates (NAND-2, NOR-2, XOR-3, 1-bit full adder)
  - multi-level cascade (internal-node effects)
- Sequential circuits:
  - NAND-based latch
  - NOR-based latch
- Effects studied with the above circuits:
  - internal-node (capacitive) effects
  - loading effects
  - transistor-internal nonlinear effects

## **Multi-input Combinatorial Gate/Circuits**

- 2-input NAND:
  - W/L: 3 (NMOS), 6 (PMOS)
  - the capacitance of internal node 'X' makes the propagation delay depend on the input pattern
- Effects observed with the ADME-based macromodel:
  - captures the internal-node effect above
  - case (b) shows the worst-case delay (A=1, B=1→0)
- Simulation results:
  - Full: 28.7s
  - ADME: 16.6s (speedup 1.7x)
  - MM generation time: 4s





## **Multi-input Combinatorial Gate/Circuits**

- 3-input XOR:
  - 24 MOSFETs (n=68, q=24)
  - manual macromodelling more laborious than for 2-input gates
- Effects observed with the ADME-based macromodel:
  - captures the internal-node effect, as shown by the black curve
  - propagation delay with load (red) is higher than unloaded (cyan), as expected
- Simulation results:
  - Full: 168.7s
  - ADME: 39.5s (speedup 4.2x)
  - MM generation time: 12s
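Delay comparisons such as loaded versus unloaded, or full versus ADME, rest on measuring propagation delay from waveforms. A common convention, assumed here, is the time between the 50% crossing of the input edge and the 50% crossing of the resulting output edge. The Python sketch below assumes each waveform has a single transition in the window; the helper names, the ramp waveforms and the 1.8 V supply (typical for 0.18 µm CMOS) are illustrative assumptions.

```python
import numpy as np


def crossing_time(t, v, level, rising):
    """Time of the first crossing of `level` in the given direction,
    linearly interpolated between the two samples that bracket it."""
    above = v >= level
    hits = (~above[:-1] & above[1:]) if rising else (above[:-1] & ~above[1:])
    idx = np.flatnonzero(hits)
    if idx.size == 0:
        raise ValueError("waveform never crosses the requested level")
    i = idx[0]
    return t[i] + (level - v[i]) * (t[i + 1] - t[i]) / (v[i + 1] - v[i])


def propagation_delay(t, v_in, v_out, vdd, in_rising, out_rising):
    """50%-to-50% delay from an input edge to the output edge it causes."""
    half = 0.5 * vdd
    return (crossing_time(t, v_out, half, out_rising)
            - crossing_time(t, v_in, half, in_rising))


# Hypothetical waveforms: input rises 1-2 ns, (inverting) output falls 3-5 ns.
t = np.linspace(0.0, 10e-9, 10001)
vdd = 1.8
v_in = vdd * np.clip((t - 1e-9) / 1e-9, 0.0, 1.0)
v_out = vdd * (1.0 - np.clip((t - 3e-9) / 2e-9, 0.0, 1.0))
delay = propagation_delay(t, v_in, v_out, vdd, in_rising=True, out_rising=False)
print(f"t_pHL = {delay * 1e9:.2f} ns")
```

Applying the same measurement to full-simulation and macromodel waveforms gives a direct delay-error figure alongside the visual overlay.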





## **Multi-input Combinatorial Gate/Circuits**

- 1-bit Full Adder:
  - 42 MOSFETs (n=113, q=28)
  - manual modelling more difficult and error-prone than automated extraction
- Effects observed with the ADME-based macromodel:
  - matches the full simulation accurately
  - sum bit (red): L-H delay exceeds H-L delay, as expected (weak pull-up: series MOSFETs)
- Simulation results:
  - Full: 219.2s
  - ADME: 32.8s (speedup 6.7x)
  - MM generation time: 25s
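A first-order RC view explains the L-H/H-L asymmetry. This is a textbook approximation, not part of ADME, which extracts the effect from the transistor-level netlist without any such simplification. Assume the pull-up path has $k_p$ series PMOS devices of on-resistance $R_p$, the pull-down path has $k_n$ series NMOS devices of on-resistance $R_n$, both drive a load $C_L$, and internal-node capacitances are ignored:

$$
t_{pLH} \approx \ln 2 \, k_p R_p C_L, \qquad
t_{pHL} \approx \ln 2 \, k_n R_n C_L, \qquad
\frac{t_{pLH}}{t_{pHL}} \approx \frac{k_p R_p}{k_n R_n}
$$

A deeper pull-up stack ($k_p > k_n$), compounded by lower hole mobility ($R_p > R_n$ at equal $W/L$), therefore makes the L-H transition slower unless the series PMOS devices are upsized.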





## **Multi-level Cascade Combinatorial Circuits**

- Chain of basic gates:
  - 4-input circuit (n=70, q=22)
  - 5 pF capacitive load applied
- Effects observed with the ADME-based macromodel:
  - matches the full simulation accurately for cascaded gates, even for a 4-input circuit
  - the internal-node waveform (black) shows good matching at internal nodes too
- Simulation results:
  - Full: 143.8s
  - ADME: 28.2s (speedup 5x)
  - MM generation time: 14s





## **Basic Sequential Circuits**

- NAND/NOR based latch:
  - set-reset latch (n=26, q=8)
  - no capacitive load applied
- Effects observed with the ADME-based macromodel:
  - maintains and captures the memory state of the latch, including the don't-care state (red and magenta)
  - matching of multiple output waveforms also verified
- Simulation results:
  - Full: 53.8s
  - ADME: 18.2s (speedup 3x)
  - MM generation time: 10s
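The slides report speedup as full simulation time divided by ADME simulation time and list macromodel generation time separately. Assuming generation is a one-off cost amortized over every later simulation that reuses the macromodel, the short Python sketch below tabulates both for the five reported circuits, using only the timings given on the slides. The function names are illustrative.

```python
import math

# (full simulation [s], ADME simulation [s], macromodel generation [s]) from the slides
RESULTS = {
    "NAND-2": (28.7, 16.6, 4.0),
    "XOR-3": (168.7, 39.5, 12.0),
    "1-bit full adder": (219.2, 32.8, 25.0),
    "4-input cascade": (143.8, 28.2, 14.0),
    "NAND/NOR SR latch": (53.8, 18.2, 10.0),
}


def speedup(full, adme):
    """Simulation-time ratio, ignoring the one-off generation cost."""
    return full / adme


def breakeven_runs(full, adme, gen):
    """Smallest number of runs n with gen + n * adme < n * full, or None if never."""
    saved = full - adme
    if saved <= 0:
        return None
    return math.floor(gen / saved) + 1


for name, (full, adme, gen) in RESULTS.items():
    print(f"{name:18s} speedup {speedup(full, adme):.2f}x, "
          f"pays off after {breakeven_runs(full, adme, gen)} run(s)")
```

For every circuit, generation time plus one ADME run is already shorter than one full run, so the macromodel pays for itself on its first use. The slide figures are the same ratios quoted to fewer digits (for example 168.7/39.5 ≈ 4.27 for XOR-3).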




## **Summary and Future Directions**

- <u>ADME</u>: automated extraction of accurate timing delay models from SPICE-level netlists
- Key advantages:
  - <u>Automated</u>: push-button extraction via algorithm
  - Accurate: from lowest (transistor) level
  - Broadly applicable:
    - multiple input/output transitions
    - linear/nonlinear loading and capacitive effects
    - supply droop and substrate interference
    - internal dynamics
    - memory and latches
- Validated on important combinatorial/sequential circuits
- Future work
  - specialization/reimplementation of TPW core to obtain much greater speedups