ASP-DAC 2009 Technical Program

The 14th Asia and South Pacific Design Automation Conference

Session 3B Advances in Timing Analysis and Modeling
Time: 15:55 - 18:00 Tuesday, January 20, 2009
Location: Room 413
Chairs: Shih-Hsu Huang (Chung Yuan Christian University, Taiwan), Atsushi Takahashi (Tokyo Institute of Technology, Japan)

3B-1 (Time: 15:55 - 16:20)

Title	Accelerating Statistical Static Timing Analysis Using Graphics Processing Units
Author	Kanupriya Gulati, *Sunil P. Khatri (Texas A&M University, United States)
Page	pp. 260 - 265
Keyword	Graphics Processing Units, Monte Carlo, Statistical Static Timing Analysis
Abstract	In this paper, we explore the implementation of Monte Carlo based statistical static timing analysis (SSTA) on a Graphics Processing Unit (GPU). SSTA via Monte Carlo simulations is a computationally expensive, but important step required to achieve design timing closure. It provides an accurate estimate of delay variations and their impact on design yield. The large number of threads that can be computed in parallel on a GPU suggests a natural fit for the problem of Monte Carlo based SSTA to the GPU platform. Our implementation performs multiple delay simulations at a single gate in parallel. A parallel implementation of the Mersenne Twister pseudo-random number generator on the GPU, followed by Box-Muller transformations (also implemented on the GPU) is used for generating gate delay numbers from a normal distribution. The mean and standard deviation of the pin-to-output delay distributions for all inputs and for every gate, are obtained using a memory lookup, which benets from the large memory bandwidth of the GPU. Threads which execute in parallel have no data/control dependencies on each other. All threads compute identical instructions, but on different data, as required by the Single Instruction Multiple Data (SIMD) programming semantics of the GPU. Our approach is implemented on a NVIDIA GeForce GTX 8800 GPU card. Our results indicate that our approach can obtain an average speedup of about 260X as compared to a serial CPU implementation. With the recently announced quad 8800 GPU cards, we estimate that our approach would attain a speedup of over 785X. The correctness of the Monte Carlo based SSTA implemented on a GPU has been verified by comparing its results with a CPU based implementation.
Slides

3B-2 (Time: 16:20 - 16:45)

Title	Trade-off Analysis between Timing Error Rate and Power Dissipation for Adaptive Speed Control with Timing Error Prediction
Author	*Hiroshi Fuketa, Masanori Hashimoto, Yukio Mitsuyama, Takao Onoye (Osaka University, Japan)
Page	pp. 266 - 271
Keyword	adaptive speed control, timing error prediction, canary FF, low power design, subthreshold circuit
Abstract	Timing margin of a chip varies chip by chip due to manufacturing variability, and depends on operating environment and aging. Adaptive speed control with timing error prediction is a promising approach to mitigate the timing margin variation, whereas it inherently has a critical risk of timing error occurrence when a circuit is slowed down. This paper presents how to evaluate the relation between timing error rate and power dissipation in self-adaptive circuits with timing error prediction. The discussion is experimentally validated using a 32-bit ripple carry adder in subthreshold operation in a 90nm CMOS process. We show a trade-off between timing error rate and power dissipation, and reveal the dependency of the trade-off on design parameters.
Slides

3B-3 (Time: 16:45 - 17:10)

Title	Statistical Analysis of On-Chip Power Grid Networks by Variational Extended Truncated Balanced Realization Method
Author	*Duo Li, Sheldon Tan (University of California at Riverside, United States), Gengsheng Chen, Xuan Zeng (Fudan University, China)
Page	pp. 272 - 277
Keyword	Power grid, TBR, Reduction, Interconnect, Variation
Abstract	In this paper, we present a novel statistical analysis approach for large power grid network analysis under process variations. The new algorithm is very efficient and scalable for huge networks with a large number of variational variables. This approach, called varETBR for variational extended truncated balanced realization, is based on model order reduction techniques to reduce the circuit matrices before the variational simulation. It performs the parameterized reduction on the original system using variation-bearing subspaces. varETBR calculates variational response Gramians by Monte-Carlo based numerical integration considering both system and input source variations for generating the projection subspace. varETBR is very scalable for the number of variables and is flexible for different variational distributions and ranges as demonstrated in experimental results. After the reduction, Monte-Carlo based statistical simulation is performed on the reduced system and the statistical responses of the original system are obtained thereafter. Experimental results, on a number of IBM benchmark circuits [15] up to 1.6 million nodes, show that the varETBR can be 4500X faster than the Monte-Carlo method and is much more scalable than one of the recently proposed approaches.

3B-4 (Time: 17:10 - 17:35)

Title	Bound-Based Identification of Timing-Violating Paths Under Variability
Author	*Lin Xie, Azadeh Davoodi (University of Wisconsin at Madison, United States)
Page	pp. 278 - 283
Keyword	variability, statistical timing analysis, timing-violating path, violation probability
Abstract	We introduce a bound-based technique to identify the top M timing-violating paths in a circuit under variability. These are the paths with the highest violation probability (i.e., C_p) which is the probability that a path (i.e., p) violates the timing constraint. To compute C_p, we require the violation probabilities of the nodes (i.e., C_n) and edges (i.e., C_e) on the path. First, we show computing C_n and C_e of all the nodes and edges requires only two rounds of Statistical Static Timing Analysis and then for each node/edge we need one table lookup for probability calculation using a technique known as Pearson Curve. Given C_n and C_e, our major contribution is in computing upper and lower bounds for C_p of an arbitrary path segment. We show constant-time for incremental update of the bounds when extending a path segment to a longer one. These bounds can be used to exactly construct the top violating paths. If the goal is to find the single most-violating path, we show a bound-based formulation that can prune a large portion of circuit without losing optimality. In our simulations, we verify the correctness and accuracy of our bounds for individual paths. We also verify identification of selected paths using Monte Carlo simulation. We obtain near-optimal accuracy with extremely fast runtimes.
Slides

3B-5 (Time: 17:35 - 18:00)

Title	Adaptive Techniques for Overcoming Performance Degradation due to Aging in Digital Circuits
Author	Sanjay Kumar, Chris Kim, *Sachin S. Sapatnekar (University of Minnesota, United States)
Page	pp. 284 - 289
Keyword	Reliability, Adaptive Body Bias, NBTI, Leakage, Delay
Abstract	Negative Bias Temperature Instability (NBTI) in PMOS transistors has become a major reliability concern in present-day digital circuit design. Further, with the recent usage of Hf-based high-k dielectrics for gate leakage reduction, Positive Bias Temperature Instability (PBTI), the dual effect in NMOS transistors has also reached significant levels. Consequently, designers are required to build in substantial guardbands into their designs, leading to large area and power overheads, in order to guarantee reliable operation over the lifetime of a chip. We propose a guard-banding technique based on adaptive body bias (ABB) and adaptive supply voltage (ASV), to recover the performance of an aged circuit, and compare its merits over previous approaches.
Slides