ASP-DAC 2009 Technical Program

The 14th Asia and South Pacific Design Automation Conference

Session 4C Signal/Power Integrity and Simulation
Time: 10:15 - 12:20 Wednesday, January 21, 2009
Location: Room 414+415
Chairs: Hideki Asai (Shizuoka University, Japan), Sheldon Tan (University of California, Riverside, United States)

4C-1 (Time: 10:15 - 10:40)

Title	Stochastic Current Prediction Enabled Frequency Actuator for Runtime Resonance Noise Reduction
Author	*Yiyu Shi (University of California, Los Angeles, United States), Jinjun Xiong, Howard Chen (IBM Thomas J. Watson Research Center, United States), Lei He (University of California, Los Angeles, United States)
Page	pp. 373 - 378
Keyword	Stochastic current modeling, frequency actuator, resonance noise
Abstract	A power delivery network (PDN) is a distributed RLC network with its dominant resonance frequency in the low-to-middle frequency range. Though the working frequencies of high-performance chips are generally much higher than this resonance frequency, the chip's runtime loading frequency is not. When a chip executes a chunk of instructions repeatedly, the induced current load may have harmonic components close to this resonance frequency, causing excessive power integrity degradation. Existing PDN design solutions, however, mainly target high-frequency noise and are not effective at suppressing such resonance noise. In this work, we propose a novel approach to proactively suppress this type of noise. A method based on a high-dimensional generalized Markov process is developed to predict current load variation. Based on this prediction, a clock frequency actuator design is proposed to proactively select an optimal clock frequency to suppress the resonance. To the best of our knowledge, this is the first in-depth study on proactively reducing PDN resonance noise induced by runtime instruction execution.
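The mechanism described in this abstract (a repeated instruction loop producing load harmonics near the PDN resonance) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the lumped LC approximation of the distributed network, the inductance and capacitance values, the loop length, the candidate clocks, and the function names. The selection rule at the end is a crude stand-in, not the paper's Markov-based predictor or actuator design.

```python
import math

def resonance_frequency_hz(inductance_h, capacitance_f):
    """Resonance of a lumped LC approximation: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))

def nearest_harmonic(clock_hz, loop_cycles, target_hz):
    """Return (k, frequency) of the loop-rate harmonic closest to target_hz."""
    fundamental = clock_hz / loop_cycles  # repetition rate of the loop
    k = max(1, round(target_hz / fundamental))
    return k, k * fundamental

def relative_offset(clock_hz, loop_cycles, target_hz):
    """Relative distance between target_hz and the nearest loop harmonic."""
    _, harmonic = nearest_harmonic(clock_hz, loop_cycles, target_hz)
    return abs(harmonic - target_hz) / target_hz

# Hypothetical package inductance and on-die capacitance (assumptions).
f_res = resonance_frequency_hz(1e-10, 1e-7)

# Hypothetical loop of 60 cycles and three candidate clock frequencies.
LOOP_CYCLES = 60
CANDIDATE_CLOCKS_HZ = (3.0e9, 2.7e9, 2.4e9)

# Crude selection rule: prefer the clock whose nearest harmonic is
# farthest from the resonance frequency.
best_clock = max(CANDIDATE_CLOCKS_HZ,
                 key=lambda f: relative_offset(f, LOOP_CYCLES, f_res))

for f in CANDIDATE_CLOCKS_HZ:
    k, harmonic = nearest_harmonic(f, LOOP_CYCLES, f_res)
    print(f"clock {f / 1e9:.1f} GHz: harmonic k={k} at {harmonic / 1e6:.1f} MHz")
print(f"resonance {f_res / 1e6:.2f} MHz, selected clock {best_clock / 1e9:.1f} GHz")
```

With these assumed numbers, the loop's fundamental at the highest clock falls almost on top of the resonance, while a lower clock moves it away, which is the intuition behind adjusting the clock frequency at runtime.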

4C-2 (Time: 10:40 - 11:05)

Title	Fast Analysis of Nontree-Clock Network Considering Environmental Uncertainty by Parameterized and Incremental Macromodeling
Author	Hai Wang (University of California, Riverside, United States), Hao Yu (Berkeley Design Automation, United States), *Sheldon X.D. Tan (University of California, Riverside, United States)
Page	pp. 379 - 384
Keyword	clock network, environmental uncertainties, macromodeling
Abstract	It is challenging to verify clock skew for large-scale nontree clock networks with environmental uncertainties such as supply voltage fluctuation and temperature gradient. This paper presents a fast clock-skew analysis via parameterized incremental truncated-balanced-realization, called the piTBR method. Environmental uncertainties are parametrically and structurally added into the state equation of the clock network. A compact macromodel is obtained by the subspace projection constructed from the singular value decomposition (SVD) of circuit output waveforms. To reduce the computational cost, we propose an incremental SVD method that only needs to partially update the projection matrix by analyzing the perturbed output waveform owing to environmental uncertainties. Experiments on a number of clock networks show that, compared with macromodeling by the fast TBR method, our method reduces the computational cost on the order of 100× with similar accuracy. In addition, compared with macromodeling by the Krylov-subspace-based method, our method reduces the waveform error by 2× with a similar runtime.

4C-3 (Time: 11:05 - 11:30)

Title	High Performance On-Chip Differential Signaling Using Passive Compensation for Global Communication
Author	Ling Zhang, Yulei Zhang (University of California, San Diego, United States), Akira Tsuchiya (Kyoto University, Japan), Masanori Hashimoto (Osaka University, Japan), Ernest Kuh (University of California, Berkeley, United States), *Chung-Kuan Cheng (University of California, San Diego, United States)
Page	pp. 385 - 390
Keyword	High performance, passive compensation, on-chip T-line
Abstract	To address the performance limitation brought by the scaling issues of on-chip global wires, a new configuration for global wiring using on-chip lossy transmission lines is proposed and optimized. We propose a signaling structure to compensate for the distortion and attenuation of on-chip transmission lines, which uses passive compensation and inserts repeated transceivers composed of sense amplifiers and inverter chains. An optimization flow for designing this scheme based on eye-diagram prediction and sequential quadratic programming (SQP) is devised. This flow is used to study the latency, power dissipation and throughput performance of the new global wiring scheme as the technology scales from 90nm to 22nm. Compared with repeated RC wires, experimental results demonstrate that at the 22nm technology node, the new scheme can reduce the normalized delay by 80%-95% and the normalized energy consumption by 50%-94%. The normalized latency is 10 ps/mm, the energy per bit is 20 pJ/m, and the throughput is 15 Gbps/um. All performance metrics are scalable with technology, which makes this approach a potential candidate to break the "interconnect wall" of digital system performance.

4C-4 (Time: 11:30 - 11:55)

Title	Noise Minimization During Power-Up Stage for a Multi-Domain Power Network
Author	*Wanping Zhang (Qualcomm Inc./University of California, San Diego, United States), Yi Zhu (University of California, San Diego, United States), Wenjian Yu (Tsinghua University, China), Amirali Shayan, Renshen Wang (University of California, San Diego, United States), Zhi Zhu (Qualcomm Inc., United States), Chung-Kuan Cheng (University of California, San Diego, United States)
Page	pp. 391 - 396
Keyword	Noise, Power-up sequence, Multi-domain
Abstract	With the popularity of Multiple Power Domain (MPD) design, noise analysis and minimization for multi-domain power networks are becoming important. This paper describes an efficient heuristic algorithm to arrange the power-up sequence in a multi-domain power network in order to minimize the noise. We present a formulation of this problem and show it is NP-complete. Therefore, we propose a simulated annealing (SA) based algorithm with preprocessing. Experimental results show that the proposed algorithm can reduce the noise to values close to the minimum. In terms of efficiency, the SA algorithm is hundreds of times faster than the enumeration method, and its running time scales well with the number of domains for these cases. In addition, we discuss the trade-off between power-up efficiency and noise.
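As a rough illustration of simulated annealing applied to an ordering problem of this kind, the Python sketch below searches over power-up sequences by swapping pairs of domains. The cost model (sum of a coupling value between consecutively powered-up domains), the coupling matrix, the cooling schedule, and all names are hypothetical; the paper's actual noise formulation and its preprocessing step are not reproduced here.

```python
import math
import random

def sequence_noise(order, coupling):
    """Toy cost: sum of coupling between consecutively powered-up domains."""
    return sum(coupling[a][b] for a, b in zip(order, order[1:]))

def anneal_power_up_order(coupling, t_start=1.0, t_end=1e-3, alpha=0.95,
                          moves_per_temp=50, seed=0):
    """Simulated annealing over permutations using pairwise swaps."""
    rng = random.Random(seed)  # seeded for repeatable runs
    n = len(coupling)
    current = list(range(n))
    current_cost = sequence_noise(current, coupling)
    best, best_cost = current[:], current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            i, j = rng.sample(range(n), 2)
            candidate = current[:]
            candidate[i], candidate[j] = candidate[j], candidate[i]
            cost = sequence_noise(candidate, coupling)
            delta = cost - current_cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / t) to escape local minima.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, cost
                if cost < best_cost:
                    best, best_cost = candidate[:], cost
        t *= alpha  # geometric cooling
    return best, best_cost

# Hypothetical symmetric coupling matrix for five power domains.
COUPLING = [
    [0.0, 0.9, 0.2, 0.7, 0.4],
    [0.9, 0.0, 0.6, 0.1, 0.8],
    [0.2, 0.6, 0.0, 0.5, 0.3],
    [0.7, 0.1, 0.5, 0.0, 0.9],
    [0.4, 0.8, 0.3, 0.9, 0.0],
]

order, noise = anneal_power_up_order(COUPLING)
print("power-up order:", order, "toy noise:", noise)
```

For five domains an exhaustive search over all 120 orderings is trivial; the annealing approach matters because the number of orderings grows factorially with the number of domains.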

4C-5s (Time: 11:55 - 12:07)

Title	Parallel Transistor Level Circuit Simulation using Domain Decomposition Methods
Author	*He Peng, Chung-Kuan Cheng (University of California, San Diego, United States)
Page	pp. 397 - 402
Keyword	SPICE, parallel circuit simulation, domain decomposition, multi-core simulation
Abstract	This paper presents an efficient parallel transistor-level full-chip circuit simulation tool with SPICE accuracy. The new approach partitions the circuit into a linear domain and several nonlinear domains based on circuit nonlinearity and connectivity. The linear domain is solved by a parallel fast linear solver, while the nonlinear domains are distributed across different processors and solved in parallel by a direct solver. A parallel domain decomposition technique is used to iteratively solve the different partitions of the circuit and ensure convergence. Different domain decomposition techniques are discussed. Orders-of-magnitude speedup over SPICE is observed for sets of large-scale VLSI circuits.

4C-6s (Time: 12:07 - 12:19)

Title	Fast Circuit Simulation on Graphics Processing Units
Author	Kanupriya Gulati (Texas A&M University, United States), John F. Croix (Nascentric, Inc., United States), *Sunil P. Khatri (Texas A&M University, United States), Rahm Shastry (Nascentric, Inc., United States)
Page	pp. 403 - 408
Keyword	SPICE, device model evaluations, Graphics Processing Units
Abstract	SPICE-based circuit simulation is a traditional workhorse in the VLSI design process. Given the pivotal role of SPICE in the IC design flow, there has been significant interest in accelerating SPICE. Since a large fraction (on average 75%) of the SPICE runtime is spent in evaluating transistor model equations, a significant speedup can be achieved if these evaluations are accelerated. This paper reports our early efforts to accelerate transistor model evaluations using a Graphics Processing Unit (GPU). We have integrated this accelerator with a commercial fast SPICE tool. Our experiments demonstrate that significant speedups (2.36X on average) can be obtained. The asymptotic speedup that can be obtained is about 4X. We demonstrate that with circuits consisting of as few as about 1000 transistors, speedups in the neighborhood of this asymptotic value can be obtained. By utilizing the recently announced (but not currently available) quad GPU systems, this speedup could be enhanced further, especially for larger designs.
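The 4X asymptotic figure is consistent with Amdahl's law applied to the stated 75% share of runtime spent in model evaluation. A minimal Python sketch of that arithmetic follows; the function name and the sample kernel speedups are illustrative assumptions, not values from the paper.

```python
import math

def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: total speedup when only part of the runtime is accelerated."""
    serial = 1.0 - accelerated_fraction
    return 1.0 / (serial + accelerated_fraction / kernel_speedup)

# Share of SPICE runtime spent in model evaluation (figure from the abstract).
FRACTION = 0.75

# Upper bound as the accelerated part becomes infinitely fast.
asymptote = overall_speedup(FRACTION, math.inf)

# Hypothetical kernel speedups, for illustration only.
for s in (2, 4, 10, 100):
    print(f"kernel {s:>3}x -> overall {overall_speedup(FRACTION, s):.2f}x")
print(f"asymptotic limit: {asymptote:.2f}x")
```

The remaining 25% of the runtime is untouched by the accelerator, so no amount of GPU speedup on model evaluation alone can push the overall gain past 1 / 0.25.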