Tuesday, January 19, 2010 |
Room 101A | Room 101B | Room 101C | Room 101D |
---|---|---|---|
Opening 8:30 - 9:00 |
Keynote Session I 9:00 - 10:00 |
Keynote Session II 10:20 - 11:20 |
Keynote Session III 11:20 - 12:20 |
13:30 - 15:10 | 13:30 - 15:10 | 13:30 - 15:10 | 13:30 - 15:10 |
15:30 - 17:10 | 15:30 - 17:10 | 15:30 - 17:10 | 15:30 - 17:10 |
Wednesday, January 20, 2010 |
Room 101A | Room 101B | Room 101C | Room 101D |
---|---|---|---|
8:30 - 10:10 | 8:30 - 10:10 | 8:30 - 10:10 | 8:30 - 10:10 |
10:30 - 12:10 | 10:30 - 12:10 | 10:30 - 12:10 | 10:30 - 12:10 |
13:30 - 15:10 | 13:30 - 15:10 | 13:30 - 15:10 | 13:30 - 15:10 |
15:30 - 17:10 | 15:30 - 17:10 | 15:30 - 17:10 | 15:30 - 17:10 |
Thursday, January 21, 2010 |
Tuesday, January 19, 2010 |
Title | (Keynote Address) I Attended the Nineteenth Design Automation Conference |
Author | Chung-Laung Liu (National Tsing Hua University, Taiwan) |
Abstract | I presented a technical paper at the Nineteenth Design Automation Conference in 1982. The twenty-eight years since then have been a short time, a long time, and a wonderful time for the profession! |
Title | (Keynote Address) Delivering 10X Design Improvements |
Author | Walden C. Rhines (Mentor Graphics, U.S.A.) |
Abstract | Time and time again, escalating complexity has threatened to derail the IC industry from the extraordinary 35% annual reduction in transistor pricing it has enjoyed the past 40+ years. Fortunately, in each and every instance, creative engineers and companies have seen this as a challenge and opportunity to innovate. As a result, the electronic design automation industry has repeatedly delivered order of magnitude improvements in every aspect of the IC design cycle for over three decades. Today, the exponential rise in complexity has quickened its pace as the industry moves toward adoption of 28 nm and below. Dr. Wally Rhines will discuss how in the next five years, 10X improvements in design methodologies are needed in four principal areas: high-level system design, verification, embedded software development, and back-end physical design and test. He will provide a roadmap for the next wave of changes needed to successfully negotiate rising complexity, highlighting where they will most likely occur. |
Title | (Keynote Address) IC Design for the Intuitive Life Style |
Author | Jim Lai (Global Unichip Corporation, Taiwan) |
Abstract | Over the past two decades, computer, consumer, and communication products have remained the main driving force pushing design technology deeper into advanced technology nodes. The consumer market seems to have run out of fuel in the past year. Will new innovation or convergence drive the next explosion in consumer applications? In this speech, Mr. Jim Lai will discuss how technology has changed our lifestyle and how human needs push IC design toward different applications. Drawing on close observation of the IC design industry, he will point out the challenges the industry currently faces and outline the trends and potential solutions that can support the continuous evolution of human life. |
Title | A PUF Design for Secure FPGA-Based Embedded Systems |
Author | *Jason H. Anderson (University of Toronto, Canada) |
Page | pp. 1 - 6 |
Keyword | Embedded systems, hardware security, FPGAs, PUF, IC counterfeiting |
Abstract | The concept of having an integrated circuit (IC) generate its own unique digital signature has broad application in areas such as embedded systems security, and IP/IC counterpiracy. Physically unclonable functions (PUFs) are circuits that compute a unique signature for a given IC based on the process variations inherent in the IC manufacturing process. This paper presents the first PUF design specifically targeted for field-programmable gate arrays (FPGAs). Our novel design makes use of the underlying FPGA architecture, and unlike prior published PUFs, the proposed PUF can be naturally embedded into a design’s HDL, consuming very little area, and does not require the use of “hard macros” with fixed routing. Measured results on the Xilinx Virtex-5 65 nm FPGA demonstrate PUF signatures to be both unique and reliable under temperature variation. |
Slides |
Title | Adaptive Power Management for Real-Time Event Streams |
Author | *Kai Huang (ETH Zurich, Switzerland), Luca Santinelli (Scuola Superiore Sant'Anna of Pisa, Italy), Jian-Jia Chen, Lothar Thiele (ETH Zurich, Switzerland), Giorgio C. Buttazzo (Scuola Superiore Sant'Anna of Pisa, Italy) |
Page | pp. 7 - 12 |
Keyword | Adaptive Power Management, Energy Minimization, Real-Time Event Streams, Real-Time Calculus |
Abstract | Dynamic power management has become essential for battery-driven embedded systems. This paper explores how to efficiently and effectively reduce the energy consumption of a device (system) for serving multiple event streams under hard real-time constraints. Considering two different preemptive scheduling policies, namely earliest deadline first and fixed priority, we propose algorithms to adaptively control the power mode of the device according to historical arrivals of events. Our algorithms not only handle arbitrary event arrivals but also provide hard real-time guarantees with respect to both timing and backlog constraints. We also present simulation results to demonstrate the effectiveness of our approaches. |
Slides |
Title | An Alternative Polychronous Model and Synthesis Methodology for Model-Driven Embedded Software |
Author | Bijoy Antony Jose, *Sandeep Kumar Shukla (FERMAT Lab, Virginia Tech, U.S.A.) |
Page | pp. 13 - 18 |
Keyword | Embedded Software, Polychrony, Model-driven software, code synthesis |
Abstract | Multi-clocked synchronous (a.k.a. Polychronous) specification languages do not assume that execution proceeds by sampling inputs at predetermined global synchronization points. The software synthesized from such specifications is paced by the arrival of certain inputs or the evaluation of certain internal variables. Here, we present an alternative polychronous model of computation termed the Multi-rate Instantaneous Channel connected Data Flow (MRICDF) actor network model. Sequential embedded software can be synthesized from MRICDF specifications using epoch analysis, a technique proposed to form a unique order of events without a reference time line. We show how to decide on the implementability of an MRICDF specification and how additional epoch information can help in synthesizing deterministic sequential software. The semantics of MRICDF is akin to that of SIGNAL, but is visual and easier to specify. Also, our prime implicate based epoch analysis technique avoids the complex clock-tree based analysis required in SIGNAL. We experimented with the usability of the MRICDF formalism by creating EmCodeSyn, our visual specification and synthesis tool. Our aim is to make polychronous specification based software synthesis more accessible to engineers by proposing this alternative model with a different semantic exposition and simpler analysis techniques. |
Slides |
Title | Trace-based Performance Analysis Framework for Heterogeneous Multicore Systems |
Author | Shih-Hao Hung, *Chia-Heng Tu, Thean-Siew Soon (National Taiwan University, Taiwan) |
Page | pp. 19 - 24 |
Keyword | Performance analysis tool, heterogeneous multicore platform, trace-based performance analysis |
Abstract | Performance evaluation is key to the optimization of computer applications on multicore systems. While many techniques and profiling tools are available for measuring performance on homogeneous multicore platforms, most of them depend on hardware support from the vendors. For developing applications on heterogeneous multicore systems, very few analysis tools exist to help developers. This paper describes a software-based trace collection and performance analysis framework that can be ported to a variety of platforms via code instrumentation at the source level. A pure software profiling toolkit, called ParallelTracer, was implemented based on ANTLR, an open source parser generator, to support this framework. In this paper, we present our framework and toolkit. We use the IBM Cell processor as a case study to demonstrate the capability of ParallelTracer. Our results show that ParallelTracer provides useful information that helps programmers understand program behavior and identify potential performance bottlenecks via graphical visualization. We also discuss the runtime overhead of ParallelTracer. With proper usage, the performance and code size overhead introduced by our toolkit are limited to around 19% to 5% and 9%, respectively, for the benchmark program in the case study. |
Slides |
Title | Efficient Model Reduction of Interconnects Via Double Gramians Approximation |
Author | Boyuan Yan, *Sheldon Tan (UC Riverside, U.S.A.), Gengsheng Chen (Fudan University, China), Yici Cai (Tsinghua University, China) |
Page | pp. 25 - 30 |
Keyword | model order reduction, interconnect, SVD, simulation |
Abstract | Gramian approximation methods have been proposed recently to overcome the high computing costs of classical balanced truncation based reduction methods. However, those methods typically gain efficiency by projecting the original system onto only one dominant subspace of the approximate system gramian (for instance, using only the controllability gramian). This single gramian reduction method can lead to large errors, as the subspaces of controllability and observability can be quite different for general interconnects with unsymmetric system matrices. In this paper, we propose a fast balanced truncation method in which the system is balanced in terms of two approximate gramians, as in the classical balanced truncation method. The novelty of the new method is that it keeps computing costs similar to those of the single gramian method. The proposed algorithm is based on a generalized SVD-based balancing scheme such that the dominant subspace of the approximate gramian product can be obtained very efficiently without explicitly forming the gramians. Experimental results on a number of published benchmarks show that the proposed method is much more accurate than the single gramian method with similar computing costs. |
Title | Wideband Reduced Modeling of Interconnect Circuits by Adaptive Complex-Valued Sampling Method |
Author | Hai Wang, *Sheldon Tan (UC Riverside, U.S.A.), Gengsheng Chen (Fudan University, China) |
Page | pp. 31 - 36 |
Keyword | interconnect, simulation, model order reduction |
Abstract | In this paper, we propose a new wideband model order reduction method for interconnect circuits that uses a novel adaptive sampling and error estimation scheme. We address the outstanding error control problems in the existing sampling-based reduction framework. In the new method, called WBMOR, we explicitly compute the exact residual errors to guide the sampling process. We show that by sampling along the imaginary axis and performing a new complex-valued reduction, the reduced model matches the original model exactly at the sample points. We show theoretically that the proposed method can achieve the error bound over a given frequency range. In practice, the new algorithm can help designers choose the best order of the reduced model for the given frequency range and error bound via an adaptive sampling scheme. As a result, it can perform accurate wideband reductions of interconnect circuits for analog and RF applications. We compare several sampling schemes, such as linear, logarithmic, and recently proposed re-sampling methods. Experimental results on a number of RLC circuits show that WBMOR is much more accurate than all the other simple sampling methods and the recently proposed re-sampling scheme with the same reduction orders. Compared with the real-valued sampling methods, the complex-valued sampling method is more accurate for the same computational costs. |
Title | VISA: Versatile Impulse Structure Approximation for Time-Domain Linear Macromodeling |
Author | *Chi-Un Lei, Ngai Wong (The University of Hong Kong, Hong Kong) |
Page | pp. 37 - 42 |
Keyword | Linear Macromodeling, Interpolations, Walsh theorem, Rational function, Time domain |
Abstract | We develop a rational function macromodeling algorithm named VISA (Versatile Impulse Structure Approximation) for macromodeling of system responses with (discrete) time-sampled data. The ideas of Walsh theorem and complementary signal are introduced to convert the macromodeling problem into a non-pole-based Steiglitz-McBride (SM) iteration (a class of first- and second-order interpolations) without initial guess and eigenvalue computation. We demonstrate the fast convergence and the versatile macromodeling requirement adoption through a P-norm approximation expansion, using examples from practical measured data. |
Slides |
Title | An Extension of the Generalized Hamiltonian Method to S-parameter Descriptor Systems |
Author | *Zheng Zhang, Ngai Wong (Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong) |
Page | pp. 43 - 47 |
Keyword | Passivity, Descriptor system, Bounded Realness, Scattering parameter |
Abstract | A generalized Hamiltonian method (GHM) was recently proposed for the passivity test of hybrid descriptor systems [1]. This paper extends the GHM theory to its S-parameter counterpart. Based on the S-parameter GHM, a passivity test flow is proposed, which is capable of detecting nonpassive regions of descriptor-form physical models. The proposed method is applicable to S-parameter and hybrid systems either in the standard state-space or descriptor forms. Experimental results confirm the effectiveness and accuracy of the proposed method. |
Title | Simultaneous Slack Budgeting and Retiming for Synchronous Circuits Optimization |
Author | *Shenghua Liu, Yuchun Ma, Xian-Long Hong (Tsinghua University, China), Yu Wang (Tsinghua University, China) |
Page | pp. 49 - 54 |
Keyword | Retiming, Slack budgeting, gate-level synthesis, Maximum Independent Set |
Abstract | With the challenges of growing functionality and scaling chip size, possible performance improvements should be considered in the earlier IC design stages, which gives more freedom to later optimization. Potential slack, an effective metric of possible performance improvements, is considered in this work, which is, as far as we know, the first to maximize the potential slack by retiming for synchronous sequential circuits. A simultaneous slack budgeting and incremental retiming algorithm is proposed for maximizing potential slack. The overall slack budget is optimized by relocating the FFs iteratively with the MIS-based slack estimation. Compared with the potential slack of a well-known min-period retiming, our algorithm improves potential slack by 19.6% on average without degrading circuit performance, in reasonable runtime. Furthermore, at the expense of a small amount of timing performance, 0.52% and 2.08%, the potential slack is increased on average by 19.89% and 28.16%, respectively, which hints at the tradeoff between timing performance and the slack budget. |
Slides |
Title | A Fast SPFD-based Rewiring Technique |
Author | *Pongstorn Maidee, Kia Bazargan (University of Minnesota, U.S.A.) |
Page | pp. 55 - 60 |
Keyword | rewiring, SPFD, SAT, BDD |
Abstract | Circuit rewiring can be used to explore a larger solution space by modifying circuit structure to suit a given optimization problem. Among several rewiring techniques that have been proposed, SPFD-based rewiring has been shown to be more effective in terms of solution space coverage. However, its adoption in practice has been limited due to its long runtime. We propose a novel SAT-based algorithm that is much faster than the traditional BDD-based methods. Unlike BDD-based methods that completely specify all pairs of SPFD using BDDs, our algorithm uses a few SAT instances to perform rewiring for a given wire without explicitly enumerating all SPFDs. Experimental results show that our algorithm's runtime is only 13% of that of a conventional one when each wire has at most 25 candidate wires and the runtime scales well with the number of candidate wires considered. Our approach evaluates each rewiring instance independently in the order of milliseconds, rendering deployment of an SPFD-based rewiring inside the optimization loop of synthesis tools a possibility. |
Slides |
Title | iRetILP: An Efficient Incremental Algorithm for Min-period Retiming under General Delay Model |
Author | Debasish Das (Northwestern University, U.S.A.), Jia Wang (Illinois Institute of Technology, U.S.A.), *Hai Zhou (Northwestern University, U.S.A.) |
Page | pp. 61 - 67 |
Keyword | integer linear programming, algorithm design, timing optimization |
Abstract | Retiming is one of the most powerful sequential transformations; it relocates flip-flops in a circuit without changing its functionality. The min-period retiming problem seeks a solution with the minimal clock period. Since most min-period retiming works assume a simple constant delay model that does not take into account many prominent electrical effects in ultra-deep-submicron VLSI designs, a general delay model was proposed to improve the accuracy of the retiming optimization. Due to the complexity of the general delay model, the formulation of min-period retiming under such a model is based on integer linear programming (ILP). However, because the previous ILP formulation was derived on a dense path graph, it incurred huge storage and running time overhead for the ILP solvers, and its application was limited to small circuits. In this paper, we present the iRetILP algorithm to solve the min-period retiming problem efficiently under the general delay model by formulating and solving the ILP problems incrementally. Experimental results show that iRetILP is on average 100× faster than the previous algorithm for small circuits and is highly scalable to large circuits in terms of memory consumption and running time. |
Slides |
Title | (Invited Paper) Room-Temperature Fuel Cells and Their Integration into Portable and Embedded Systems |
Author | *Naehyuck Chang, Jueun Seo, Donghwa Shin, Younghyun Kim (Seoul National University, Republic of Korea) |
Page | pp. 69 - 74 |
Keyword | Fuel cell, DMFC, hybrid, BOP control |
Abstract | Direct methanol fuel cells (DMFCs) are a promising next-generation energy source for portable applications, due to their high energy density and the ease of handling of the liquid fuel. However, the limited range of output power obtainable from a fuel cell requires hybridization, that is, the introduction of a battery, to form a stand-alone portable power source. Furthermore, the stringent operating conditions to be met by active DMFC systems mandate complicated balance of plant (BOP) control. We present a complete hybrid active DMFC system design and implementation in which a DMFC stack and a Li-ion battery are linked by a hybridization circuit to share the applied load, exploiting the high energy density of the fuel cell and the high power density of the battery. We describe systems for fuel delivery, air supply, temperature management, current and voltage measurement, DC-DC conversion and power distribution, motor driving, battery charge management, DMFC and circuit protection, and control of the DMFC and battery as a hybrid. We have designed and implemented an embedded system controller that consists of a 32-bit microcontroller, running under a real-time operating system, that incorporates multiple cascaded feedback control loops which manage the dynamics of BOP control. We demonstrate reliable and efficient maintenance of a constant fuel cell output current in spite of severe fluctuation of the load current. |
Title | (Invited Paper) Maximizing the Harvested Energy for Micro-power Applications through Efficient MPPT and PMU Design |
Author | Hui Shao, *Chi-Ying Tsui, Wing-Hung Ki (Hong Kong University of Science and Technology, Hong Kong) |
Page | pp. 75 - 80 |
Keyword | Energy harvesting, micro-power systems, MPPT, power management |
Abstract | Energy harvesting is becoming more and more popular for micro-power applications where the environmental energy is used to power up the systems. In order to prolong the device lifetime and guarantee the system operation, the harvested power from the energy transducer to supply the system load should be maximized. This paper reviews different techniques and solutions to maximize the harvested power. Different environmental energy sources and the characteristics of the corresponding energy transducers are discussed. Algorithms to detect and track the maximum power point (MPP) of the energy transducer are summarized. Different power management unit (PMU) designs to execute MPP tracking (MPPT) algorithms are presented. |
Title | (Invited Paper) Dynamic Power Management in Environmentally Powered Systems |
Author | Clemens Moser, *Jian-Jia Chen, Lothar Thiele (ETH Zurich, Switzerland) |
Page | pp. 81 - 88 |
Keyword | power management, embedded systems, energy harvesting, model predictive control, optimization |
Abstract | In this paper a framework for energy management in energy harvesting embedded systems is presented. As a possible example scenario, we focus on wireless sensor nodes which are powered by solar cells. We demonstrate that classical power management solutions have to be reconceived and/or new problems arise if perpetual operation of the system is required. In particular, we provide a set of algorithms and methods for different application scenarios, including real-time scheduling, application rate control as well as reward maximization. The goal is to optimize the performance of the application subject to given energy constraints. Our methods optimize the system performance which allows the usage of, e.g., smaller solar cells and smaller batteries. Our theoretical results are supported by simulations using long-term measurements of solar energy in an outdoor environment. Furthermore, to demonstrate the practical relevance of our approaches, we measured the implementation overhead of our algorithms on real sensor nodes. |
Title | (Invited Paper) Micro-scale Energy Harvesting: A System Design Perspective |
Author | Chao Lu, *Vijay Raghunathan, Kaushik Roy (Purdue University, U.S.A.) |
Page | pp. 89 - 94 |
Keyword | Energy Harvesting, Low Power Design, Power Management |
Abstract | Harvesting electrical power from environmental energy sources is an attractive and increasingly feasible option for several micro-scale electronic systems such as biomedical implants and wireless sensor nodes that need to operate autonomously for long periods of time (months to years). However, designing highly efficient micro-scale energy harvesting systems requires an in-depth understanding of various design considerations and tradeoffs. This paper provides an overview of the area of micro-scale energy harvesting and discusses the various challenges and considerations involved from a system-design perspective. |
Title | Co-Optimization of Memory Access and Task Scheduling on MPSoC Architectures with Multi-Level Memory |
Author | Yi He (The University of Texas at Dallas, U.S.A.), *Chun Jason Xue (City University of Hong Kong, Hong Kong), Cathy Qun Xu, Edwin Sha (The University of Texas at Dallas, U.S.A.) |
Page | pp. 95 - 100 |
Keyword | co-optimization, scheduling, memory access, MPSoC |
Abstract | An MPSoC system usually consists of a number of processors, a memory hierarchy and a communication mechanism between processors. Because of the gap between the constantly increasing processor speed and slower memory access, how to utilize the memory subsystem more efficiently has become a critical issue for improving the overall system performance. To address this problem, two algorithms are proposed in this paper. The first one uses the integer linear programming method so that the memory access cost is minimized while tasks are scheduled in as short a time as possible. The second one is a heuristic algorithm which can achieve close to optimum results with linear running time. The experimental results show that the memory access cost can be reduced by up to 56% compared to LIST scheduling. |
Slides |
Title | A New Compilation Technique for SIMD Code Generation across Basic Block Boundaries |
Author | *Hiroaki Tanaka, Yutaka Ota, Nobu Matsumoto (Center for Semiconductor Research and Development, Semiconductor Company, Toshiba Corporation, Japan), Takuji Hieda, Yoshinori Takeuchi, Masaharu Imai (Graduate School of Information Science and Technology, Osaka University, Japan) |
Page | pp. 101 - 106 |
Keyword | Compiler Optimization, SIMD instructions, Control Flow |
Abstract | Although SIMD instructions are effective for many digital signal processing applications, current compilers cannot take full advantage of them. One factor inhibiting SIMD code generation is control flow structure; the target scope of SIMD code generation is currently limited to a single basic block or a loop that consists of a single basic block. SIMD instructions typically cannot be mapped across basic block boundaries even if the basic blocks inside the control structure have enough parallelism. In this paper, a new compilation technique to generate SIMD code without modifying the control flow structure is proposed. The data dependency between basic blocks is exploited to generate SIMD instructions. A packing cost is introduced for effective vectorization to maintain data dependency across basic block boundaries. Experimental results show that the new SIMD code generation technique reduces the dynamic execution cycles of inter prediction in an H.264 decoder by 67%. |
Slides |
Title | LibGALS: A Library for GALS Systems Design and Modeling |
Author | *Wei-Tsun Sun, Zoran Salcic, Avinash Malik (University of Auckland, New Zealand) |
Page | pp. 107 - 112 |
Keyword | GALS, Asynchronous, Synchronous, Programming Languages, Operating Systems |
Abstract | LibGALS is a library and run-time environment that extends a multi-process host operating system (OS) to support the design of Globally Asynchronous Locally Synchronous (GALS) software systems and models. LibGALS provides an application programming interface (API) that enables the designer to describe GALS concurrent programs and reactivity in sequential programming languages. Moreover, it facilitates the interface between the GALS concurrent program and other processes through the services provided by the host OS. LibGALS is also suitable as a target for code generation from GALS and synchronous concurrent languages. The experiments demonstrate code size and run-time gains when compared with other approaches to GALS system implementation. |
Slides |
Title | Joint Variable Partitioning and Bank Selection Instruction Optimization on Embedded Systems with Multiple Memory Banks |
Author | *Tiantian Liu, Minming Li, Chun Jason Xue (City University of Hong Kong, Hong Kong) |
Page | pp. 113 - 118 |
Keyword | partitioned memory architecture, bank switching, variable partition |
Abstract | Bank switching is a technique to increase memory size without extending address buses. A special instruction, the Bank Selection Instruction (BSL), is inserted into programs to modify the bank register to point to the right bank, which increases both code size and runtime overhead. In this paper, we carefully partition variables into different banks and insert BSLs at different positions so that these overheads are minimized. Minimizing code size and minimizing runtime overhead are the two objectives investigated in this paper. |
Slides |
Title | On-Chip Power Network Optimization with Decoupling Capacitors and Controlled-ESRs |
Author | Wanping Zhang (Qualcomm Inc./UCSD, U.S.A.), Ling Zhang, Amirali Shayan (UCSD, U.S.A.), Wenjian Yu (Tsinghua University, China), Xiang Hu (UCSD, U.S.A.), Zhi Zhu (Qualcomm Inc., U.S.A.), Ege Engin (SDSU, U.S.A.), *Chung-Kuan Cheng (UCSD, U.S.A.) |
Page | pp. 119 - 124 |
Keyword | Power Network, Decap, Controlled-ESR |
Abstract | In this paper, we propose an efficient approach to minimize the noise on power networks via the allocation of decoupling capacitors (decap) and controlled equivalent series resistors (ESR). The controlled-ESR is introduced to reduce the on-chip power voltage fluctuation, including both voltage drop and overshoot. We formulate an optimization problem of noise minimization with the constraint of decap budget. A revised sensitivity calculation method is derived to consider both voltage drop and overshoot. The sequential quadratic programming (SQP) algorithm is adopted to solve the optimization problem where the revised sensitivity is regarded as the gradient. Experimental results show that considering voltage drop without overshoot leads to underestimating noise by 4.8%. We also demonstrate that the controlled-ESR is able to reduce the noise by 25% with the same decap budget. |
Slides |
Title | An Adaptive Parallel Flow for Power Distribution Network Simulation Using Discrete Fourier Transform |
Author | Xiang Hu, Wenbo Zhao, Peng Du, Amirali Shayan, *Chung-Kuan Cheng (University of California, San Diego, U.S.A.) |
Page | pp. 125 - 130 |
Keyword | power distribution network, discrete Fourier transform, parallel processing |
Abstract | A frequency-time-domain co-simulation flow using the discrete Fourier transform (DFT) is introduced in this paper to analyze large power distribution networks (PDNs). The flow not only allows designers to gain insight into the frequency-domain characteristics of the PDN but also to obtain accurate time-domain voltage responses for different load current profiles. An adaptive method achieves accurate results in even less time than the basic DFT flow. In addition, parallel processing is incorporated, which leads to a significant reduction in simulation time. Error bounds of the DFT flow are derived to assure the accuracy of simulation results. Experimental results show that the proposed flow has a relative error of 0.093% and a speedup of 10x compared to SPICE transient simulation with a single processor. |
Slides |
Title | Technique for Controlling Power-Mode Transition Noise in Distributed Sleep Transistor Network |
Author | *Yongho Lee, Taewhan Kim (Seoul National University, Republic of Korea) |
Page | pp. 131 - 136 |
Keyword | leakage power, circuit, current noise, performance, optimization |
Abstract | Power gating is an effective technique for achieving both low leakage and high performance in circuits. This work proposes a systematic solution to the problem of integrating the power-up control of sleep transistors into the power-gated design flow for a distributed sleep transistor network, taking into account both a power-mode transition noise constraint and a performance loss constraint. |
Slides |
Title | A Novel FDTD Algorithm Based on Alternating-Direction Explicit Method with PML Absorbing Boundary Condition |
Author | *Shuichi Aono (SESAME Technology Inc., Japan), Masaki Unno, Hideki Asai (Shizuoka University, Japan) |
Page | pp. 137 - 141 |
Keyword | FDTD method, explicit method, PML |
Abstract | In this paper, we propose a new FDTD (Finite-Difference Time-Domain) method using the alternating-direction explicit (ADE) method for efficient electromagnetic field simulation. Furthermore, a modified PML (Perfectly Matched Layer) absorbing boundary condition, which is applicable to the proposed method, is introduced. Finally, the efficiency of the ADE-FDTD method is evaluated by computer simulations. |
Title | Speeding Up SoC Virtual Platform Simulation by Data-Dependency-Aware Synchronization and Scheduling |
Author | Kuen-Huei Lin, Siao-Jie Cai, *Chung-Yang (Ric) Huang (National Taiwan University, Taiwan) |
Page | pp. 143 - 148 |
Keyword | Virtual platform simulation, data-dependency-aware, virtual synchronization, trace-driven simulation |
Abstract | In this paper, we propose a novel simulation scheme, called data-dependency-aware synchronization and scheduling, for SoC virtual platform simulation. In contrast to conventional clock- or transaction-based synchronization, our simulation scheme can work with the clock decoupling and direct-data-access techniques to implement the trace-driven virtual synchronization methodology. In addition, we extend the virtual synchronization concept to handle the interrupt signals in the system. This enables porting an operating system (uCLinux) to our virtual platform. The experimental results show that our virtual platform can achieve a simulation speed of 3 to 5 million instructions per second, or a 44 times speed-up over the conventional cycle-accurate approach, while still maintaining the same cycle-count accuracy. |
Title | SCGPSim: A Fast SystemC Simulator on GPUs |
Author | Mahesh Nanjundappa (FERMAT LAB, Virginia Polytechnic Institute and State University, U.S.A.), Hiren D Patel (Department of ECE, University of Waterloo, Canada), Bijoy A Jose, *Sandeep K Shukla (FERMAT LAB, Virginia Polytechnic Institute and State University, U.S.A.) |
Page | pp. 149 - 154 |
Keyword | GPGPU, SystemC, CUDA, Parallel Simulation |
Abstract | The main objective of this paper is to speed up the simulation performance of SystemC designs at the RTL abstraction level by exploiting the high degree of parallelism afforded by today's general purpose graphics processors (GPGPUs). Our approach parallelizes SystemC's discrete-event simulation (DES) on GPGPUs by transforming the model of computation of DES into a model of concurrent threads that synchronize as and when necessary. Unlike the cooperative threading model employed in the SystemC reference implementation, our threading model is capable of executing in parallel on the large number of simple processing units available on GPUs. Our simulation infrastructure is called SCGPSim and it includes a source-to-source (S2S) translator to transform synthesizable SystemC models into parallelly executable programs targeting an NVIDIA GPU. The translator retains the simulation semantics of the original designs by applying semantics preserving transformations. The resulting transformed models mapped onto the massively parallel architecture of GPUs improve simulation efficiency quite substantially. Preliminary experiments with varying-sized examples such as AES, ALU, and FIR have shown simulation speed-ups ranging from 30x to 100x. Considering that our transformations are not yet optimized, we believe that optimizing them will improve the simulation performance even further. |
Slides |
Title | A Flexible Hybrid Simulation Platform Targeting Multiple Configurable Processors SoC |
Author | *Hao Shen, Frédéric Pétrot (TIMA Laboratory, INP Grenoble, France) |
Page | pp. 155 - 160 |
Keyword | semi-hosting, hybrid, simulation, MPSoC, configurable processor |
Abstract | Multiple Configurable Processors System-on-Chip (MCPSoC) platforms have both performance and power advantages for embedded applications. Unfortunately, at early design stages, because of processor configuration, I/O device changes, and MCPSoC architecture modifications, designers waste much time on Operating System (OS) porting work with general Instruction Set Simulator (ISS) based SoC simulation platforms. In this paper, we propose a hybrid simulation platform which uses a general ISS and implements the Hardware Abstraction Layer (HAL) Application Programming Interfaces (APIs) and I/O device driver APIs directly with SystemC modules on host machines. This hybrid simulation platform can shorten the application validation process by avoiding the assembly code and hard-coded address modifications of traditional OS porting work. Finally, we show the advantages of our new hybrid simulation platform with a video decoding case study. |
Slides |
Title | A Fast Heuristic Scheduling Algorithm for Periodic ConcurrenC Models |
Author | *Weiwei Chen, Rainer Doemer (Center for Embedded Computer Systems, University of California, Irvine, U.S.A.) |
Page | pp. 161 - 166 |
Keyword | ConcurrenC, Static Scheduling, Model of Computation, System Level Description Language |
Abstract | Embedded system design usually starts from an executable specification model described in a C-based System Level Description Language (SLDL), such as SystemC or SpecC. In this paper, we identify a subset of well-defined C-based design models, called periodic ConcurrenC models, that can be statically scheduled, resulting in significantly higher simulation and execution speed. We propose a novel heuristic scheduling algorithm that not only is faster than classic matrix-based synchronous data flow (SDF) scheduling approaches, but also reduces the model execution time by an order of magnitude over the default discrete event simulation. |
Slides |
Title | (Invited Paper) Design of Networks on Chips for 3D ICs |
Author | *Srinivasan Murali (iNoCs/EPFL, Switzerland), Luca Benini (University of Bologna, Italy), Giovanni De Micheli (EPFL, Switzerland) |
Page | pp. 167 - 168 |
Keyword | Networks on Chips, 3D, topology, application-specific |
Abstract | Three-dimensional integrated circuits (3D ICs), in which multiple silicon layers are stacked vertically, have emerged recently. 3D ICs have a smaller form factor and shorter, more efficiently used wires, and they allow integration of diverse technologies in the same device. The use of Networks on Chips (NoCs) to connect components in a 3D chip is a necessity. In this short paper, we outline the design of application-specific NoCs for 3D ICs. |
Title | (Panel Discussion) 3D Integration and Networks on Chips (Panel) |
Author | Organizer & Moderator: Srinivasan Murali (iNoCs/EPFL, Switzerland), Panelists: Ruchir Puri (IBM, U.S.A.), Paull Marchal (IMEC, Belgium), Yuan Xie (Pennsylvania State University, U.S.A.), Ahmed Jerraya (LETI, France), Nobuaki Miyakawa (Honda Research, Japan) |
Abstract | Vertical stacking of multiple silicon layers, referred to as 3D stacking, is emerging as an attractive solution to continue the pace of growth of Systems on Chips (SoCs). 3D designs have a smaller footprint and shorter wires, leading to lower wire delay and power consumption. Heterogeneous systems can be built effectively, with each layer supporting a different technology. The 3D technology has been maturing over the years in addressing thermal issues and achieving high yield. To tackle the on-chip communication problem, a scalable networking paradigm, Networks on Chips (NoCs), has recently emerged. NoCs provide better structure, modularity and scalability when compared to traditional interconnect solutions. NoCs are a necessity for 3D chips: they provide arbitrary scalability of the interconnects across additional layers, efficiently parallelize communication in each layer and help control the number of vertical wires needed for inter-layer communication. The combined use of 3D integration technologies and NoCs introduces new opportunities and challenges for designers. In this panel, we will discuss the current state of the art of 3D technologies and how NoC-based solutions solve the interconnect problems. We will also discuss the opportunities and challenges in adopting NoCs for 3D ICs. |
Wednesday, January 20, 2010 |
Title | Three-Dimensional Integrated Circuit (3D IC) Floorplan and Power/Ground Network Co-synthesis |
Author | Paul Falkerstern, Yuan Xie (Pennsylvania State University, U.S.A.), Yao-Wen Chang (National Taiwan University, Taiwan), *Yu Wang (Tsinghua University, China) |
Page | pp. 169 - 174 |
Keyword | 3D Integration, Emerging Technology, Floorplanning, Co-design |
Abstract | Three-Dimensional Integrated Circuits (3D ICs) are an emerging technology that improves on existing 2D designs by providing smaller chip area, higher performance, and lower power consumption. However, before 3D ICs become a viable technology, the 3D design space needs to be fully explored and 3D EDA tools need to be developed. To help explore the 3D design space and fill the need for 3D EDA tools, this work develops the 3D Floorplan and Power/Ground (P/G) Co-synthesis tool, which develops the floorplan and the P/G network concurrently. Most current 3D IC floorplanners neglect the effects of the 3D P/G network on the design, which may lead to large IR drops in the circuit. To create feasible floorplans with efficient P/G networks, the 3D Floorplan and P/G Co-synthesis tool optimizes the floorplan in terms of wirelength, area, P/G routing area, and IR drops. The tool integrates a 3D B*-tree floorplan representation, a resistive P/G mesh, and a Simulated Annealing (SA) engine to explore the floorplan and P/G network of a 3D IC. The results of experiments using the 3D Floorplan and P/G Co-synthesis tool show that 3D ICs tend to increase the P/G routing area while decreasing the IR drops in the circuit. By considering the IR drop while floorplanning, exploring the 3D P/G design space, and evaluating 3D IC’s effect on 3D P/G networks, the 3D Floorplan and P/G Co-synthesis tool can develop a more efficient 3D IC. |
Title | Power and Slew-aware Clock Network Design for Through-Silicon-Via (TSV) Based 3D ICs |
Author | *Xin Zhao, Sung Kyu Lim (Georgia Institute of Technology, U.S.A.) |
Page | pp. 175 - 180 |
Keyword | 3D clock delivery, low power, clock slew control |
Abstract | In this paper, three design techniques are presented to effectively reduce the clock power consumption and slew of the 3D clock distribution network: (1) controlling the bound on the number of through-silicon-vias (TSVs) used between adjacent dies, (2) controlling the maximum load capacitance of the clock buffer, and (3) adjusting the clock source location in the 3D stack. We discuss how these design factors affect the overall wirelength, clock power, slew, skew, and routing congestion in practical 3D clock network design. SPICE simulation indicates that: (1) a 3D clock tree with multiple TSVs achieves up to 31% power saving, 52% wirelength saving and better slew control as compared with the single-TSV case; (2) by placing the clock source on the middle die in the 3D stack, an additional 7.7% power savings, 9.2% wirelength savings, and 33% TSV savings are obtained compared with the clock source on the topmost die. This work aims at helping designers construct reliable low-power and low-slew 3D clock networks by making the right decisions on TSV usage, clock buffer insertion, and clock source placement. |
Slides |
Title | A Novel Si-Tunnel FET based SRAM Design for Ultra Low-Power 0.3V VDD Applications |
Author | Jawar Singh (University of Bristol, U.K.), Ramakrishnan Krishnan, Saurabh Mookerjea, Suman Datta, *Vijaykrishnan Narayanan (The Pennsylvania State University, U.S.A.), Dhiraj Pradhan (University of Bristol, U.K.) |
Page | pp. 181 - 186 |
Keyword | TFET, SRAM |
Abstract | Steep sub-threshold transistors are promising candidates to replace traditional MOSFETs for sub-threshold leakage reduction. In this paper, we explore the use of Inter-Band Tunnel Field Effect Transistors (TFETs) in SRAMs at ultra-low supply voltages. The uni-directional current conduction of TFETs limits the viability of 6T SRAM cells. To overcome this limitation, 7T SRAM designs were proposed earlier at the cost of extra silicon area. In this paper, we propose a novel 6T SRAM design using Si-TFETs for reliable operation with low leakage at ultra-low voltages. We also demonstrate that, using the proposed design, a functional 6T TFET SRAM with comparable stability margins and faster performance at low voltages can be realized when compared with the 7T TFET SRAM cell. We achieve a leakage reduction improvement of 700X and 1600X over traditional CMOS SRAM designs at VDD of 0.3V and 0.5V, which makes the design suitable for ultra-low-power applications. |
Title | CAD Reference Flow for 3D Via-Last Integrated Circuits |
Author | *Chang-Tzu Lin, Ding-Ming Kwai, Yung-Fa Chou, Ting-Sheng Chen, Wen-Ching Wu (SoC Technology Center, Industrial Technology Research Institute, Taiwan) |
Page | pp. 187 - 192 |
Keyword | 3D-LSI, CAD, Via-last, Through-Silicon Via (TSV), Face-to-Back Bonding |
Abstract | Next-decade computing power and the interconnect bottleneck challenge conventional IC design due to ever-increasing demands for high frequency and high bandwidth. Three-dimensional large-scale integration (3D-LSI) provides an opportunity to realize such high-performance cores while reducing long latency. In this paper, we present a reference flow for the implementation of 3D via-last ICs in a scalable face-to-back bonding style, which leverages a mature set of 2D IC physical design tools. The first enabling technology of 3D-LSI is the through-silicon via (TSV). Two TSV diameters are exemplified in the flow, namely, 5μm and 50μm. We propose an easy-to-adopt method to address TSV-aware mixed-sized placement by considering the obstructions generated from the adjacent tier's floorplan, subject to certain TSV alignment constraints. Furthermore, a clock tree synthesis (CTS) technique for a homogeneous die stack is developed to dramatically reduce the clock latency and skew. The mixed-sized placement and CTS of each tier can be done without iteration. To the best of our knowledge, no work has been published in the literature discussing CTS for 3D via-last integration in a face-to-back fashion. Finally, to complete the proposed flow, 2D timing-driven routing and modified off-line design rule check (DRC) and layout versus schematic (LVS) verification are performed. |
Slides |
Title | Energy and Performance Driven Circuit Design for Emerging Phase-Change Memory |
Author | Dimin Niu, *Yibo Chen, Xiangyu Dong, Yuan Xie (The Pennsylvania State University, U.S.A.) |
Page | pp. 193 - 198 |
Keyword | Phase Change Memory, access device, design methodology |
Abstract | Phase-Change Random Access Memory (PRAM) has become one of the most promising emerging memory technologies, due to its attractive features such as high density, fast access, non-volatility, and good scalability. The physical characteristics of a PRAM cell mainly depend on the material characteristics and the fabrication process. However, the access device and the operating voltage have a significant impact on PRAM performance, energy dissipation, and lifetime. In this paper, we study the design constraints for PRAM memory arrays, and propose design optimizations of the access device and the circuit operating voltage. The important features of PRAM memory, such as power consumption, read/write stability, speed, and lifetime, are all considered as constraints in the proposed optimizations. Experimental results show that the proposed methodology can provide a reliable design space for the access device and the operating voltage. |
Title | Current Source Modeling in the Presence of Body Bias |
Author | Saket Gupta, *Sachin S. Sapatnekar (University of Minnesota, U.S.A.) |
Page | pp. 199 - 204 |
Keyword | Current source modeling, timing analysis, body bias |
Abstract | With the increasing use of adaptive body biases in high-performance designs, it has become necessary to build timing models that can include these effects. State-of-the-art timing tools use current source models (CSMs), which have proven to be fast and accurate. However, a straightforward extension of CSMs to incorporate multiple body biases results in unreasonably large characterization tables for each cell. We propose a new approach to compactly capture body bias effects within a mainstream CSM framework. Our approach features a table reduction method for compact storage, and a fast and novel waveform sensitivity method for timing evaluation. On a 45nm technology, we demonstrate high accuracy, with worst-case errors of under 5% in both slew and delay as compared to HSPICE. We show a speedup of over five orders of magnitude over HSPICE and almost 70x over conventional CSMs. |
Slides |
Title | Manifold Construction and Parameterization for Nonlinear Manifold-Based Model Reduction |
Author | *Chenjie Gu, Jaijeet Roychowdhury (University of California, Berkeley, U.S.A.) |
Page | pp. 205 - 210 |
Keyword | Model Reduction, Manifold, Integral Curve |
Abstract | We present a new manifold construction and parameterization algorithm for model reduction approaches based on projection on manifolds. The new algorithm employs two key ideas: (1) we define an ideal manifold for nonlinear model reduction to be the solution of a set of differential equations with the property that the tangent space at any point on the manifold spans the same subspace as the low-order subspace (e.g., Krylov subspace generated by moment-matching techniques) of the linearized system; (2) we propose the concept of normalized integral curve equations, which are repeatedly solved to identify an almost-ideal manifold. The manifold constructed by our algorithm inherits the important property in [1] that it covers important system responses such as DC and AC responses. It also preserves better local distance metrics on the manifold, thanks to the employment of normalized integral curve equations. To gauge the quality of the resulting manifold, we also derive an error bound of the moments of linearized systems, assuming moment-matching techniques are employed to generate low-order subspaces for linearized systems. The algorithm is also more systematic and generalizable to higher dimensions than the ad hoc procedure in [1]. We illustrate the key ideas through a simple 2-D example. We also combine this new manifold construction and parameterization algorithm with maniMOR [1] to generate reduced models for a quadratic nonlinear system and a CMOS circuit. Simulation results are provided, together with comparisons to full models as well as TPWL reduced models [2]. |
Slides |
Title | A Fast Analog Mismatch Analysis by an Incremental and Stochastic Trajectory Piecewise Linear Macromodel |
Author | *Hao Yu (Berkeley Design Automation, U.S.A.), Xuexin Liu, Hai Wang, Sheldon Tan (UC Riverside, U.S.A.) |
Page | pp. 211 - 216 |
Keyword | mismatch, stochastic differential-algebra-equation, nonlinear macromodel |
Abstract | To cope with the increasing complexity of analyzing analog mismatch in sub-90nm designs, this paper presents a fast non-Monte-Carlo method to calculate mismatch in the time domain. The local random mismatch is described by a noise source with an explicit dependence on geometric parameters, and is further expanded by stochastic orthogonal polynomials (SOPs). This forms a stochastic differential-algebra-equation (SDAE). To deal with large-scale problems, the SDAE is linearized at a number of snapshots along the nominal transient trajectory, and hence is naturally embedded into a trajectory-piecewise-linear (TPWL) macromodel. The TPWL is improved with a novel incremental aggregation of subspaces identified at those snapshots. Experiments show that the proposed method, isTPWL, is hundreds of times faster than the Monte-Carlo method with similar accuracy. In addition, our macromodel further reduces runtime by up to 25X, and is faster to build and more accurate to simulate compared to existing approaches. |
Slides |
Title | Formal Verification of Tunnel Diode Oscillator with Temperature Variations |
Author | *Kusum Lata, H S Jamadagni (CEDT, Indian Institute of Science, Bangalore, India) |
Page | pp. 217 - 222 |
Keyword | Analog and Mixed Signal Design, Formal Verification, Simulation, Hybrid Systems |
Abstract | In this paper, we propose an extension to the formal verification approach for hybrid systems to verify the Tunnel Diode Oscillator (TDO) with temperature variations. This enables the same platform that is used for validating the hybrid system to also be used to formally verify the TDO with temperature variations. The proposed approach utilizes simulation traces from the actual implementation of the analog circuits to carry out the formal analysis and verification. We demonstrate our approach using Checkmate [1], with the TDO as a case study. Current-voltage simulations were performed on a tunnel diode and the basic features of the I-V characteristics were analyzed in the temperature range 100-300K. The TDO is designed and validated based on these characteristics. In particular, the TDO has been formally verified for a continuous range of initial conditions over this temperature range. |
Title | Constrained Global Scheduling of Streaming Applications on MPSoCs |
Author | *Jun Zhu, Ingo Sander, Axel Jantsch (Royal Institute of Technology, Sweden) |
Page | pp. 223 - 228 |
Keyword | synchronous data flow, scheduling, buffer minimization, streaming applications, MPSoCs |
Abstract | We present a global scheduling framework for synchronous data flow (SDF) streaming applications on MPSoCs, based on optimized computation and contention-free routing. The global scheduling of processor computation and communication transactions is formulated as a constraint-based problem, to avoid the scheduling overhead in TDMA-like heuristic schemes. A public domain constraint solver is exploited to solve the NP-complete scheduling problem efficiently, together with problem-specific constraint modeling techniques. Experimental results show that the proposed framework can achieve a high, predictable application throughput with minimized buffer cost. For instance, for applications in the communication domain, higher throughput (up to 87%) has been observed with less buffer cost, compared to scenarios considering the heuristic scheduling overhead. |
Slides |
Title | Analyzing Impact of Multiple ABB and AVS Domains on Throughput of Power and Thermal-Constrained Multi-Core Processors |
Author | Jungseob Lee, Shi-Ting Zhou, *Nam Sung Kim (University of Wisconsin-Madison, U.S.A.) |
Page | pp. 229 - 234 |
Keyword | Multicore, AVS, ABB |
Abstract | Recently, the semiconductor industry has integrated more cores in a single die, which substantially improves the throughput of processors running highly parallel applications. However, many existing applications do not have high enough parallelism to exploit multiple cores in a die, slowing the transition to many-core processors with smaller and more numerous cores that benefit future applications with high parallelism. In this paper, we analyze the impact of multiple adaptive voltage scaling (AVS) and adaptive body biasing (ABB) domains on the throughput of power- and thermal-constrained multi-core processors when they are combined with per-core power-gating (PCPG). Both AVS and ABB can be effectively used to either increase frequency (thus throughput) or decrease power consumption of the processors. Meanwhile, PCPG can provide extra power and thermal headroom when an application’s parallelism is limited. First, we analyze the throughput impact of applying AVS, ABB, and PCPG to power- and thermal-constrained multi-core processors. Second, we investigate the impact of multiple AVS and ABB domains on the throughput, and recommend the most cost-effective number of domains for AVS and ABB in 16- and 8-core processors. Our analysis using the 32nm predictive technology model considering within-die variations suggests that the most cost-effective number of domains for AVS and/or ABB should be one for each when they are combined with PCPG in both 16- and 8-core processors. Since within-die core-to-core variations provide many choices in terms of core frequency and power consumption for limited-parallelism applications, one AVS or ABB domain can lead to a throughput improvement of 1.77~2.49×; more than one AVS and/or ABB domain improves the throughput only marginally. |
Title | Source-Level Timing Annotation for Fast and Accurate TLM Computation Model Generation |
Author | Kai-Li Lin, *Chen-Kang Lo, Ren-Song Tsay (National Tsing Hua University, Taiwan) |
Page | pp. 235 - 240 |
Keyword | TLM, timing annotation |
Abstract | This paper proposes a source-level timing annotation method for generating accurate transaction level models for software computation modules. While the Transaction Level Modeling (TLM) approach is now widely adopted for system modeling and simulation speed improvement, timing estimation accuracy is often compromised. To obtain reliable and accurate estimation results at the system level, we propose a timing annotation method for accurate TLM computation model generation that considers processor architectures with pipeline and cache structures, which are challenging but critical to accurate timing estimation. The experiments show that our results are within 2% of cycle-accurate results and the approach is three orders of magnitude faster than conventional ISS approaches. |
Title | Improved On-Chip Router Analytical Power and Area Modeling |
Author | Andrew B. Kahng, Bill Lin, *Kambiz Samadi (UC San Diego, U.S.A.) |
Page | pp. 241 - 246 |
Keyword | Network-on-Chip, System-level, On-Chip Router |
Abstract | Over the course of this decade, uniprocessor chips have given way to multi-core chips which have become the primary building blocks of today’s computer systems. The presence of multiple cores on a chip shifts the focus from computation to communication as a key bottleneck to achieving performance improvements. As industry moves towards many-core chips, networks-on-chip (NoCs) are emerging as the scalable fabric for interconnecting the cores. With power now the first-order design constraint, early-stage estimation of NoC power has become crucially important. Existing power models (e.g., ORION 2.0 [12], Xpipes [7], etc.) are based on a specific router microarchitecture and circuit implementation. Therefore, when validating them against different NoC prototypes (i.e., different router implementations), we saw significant deviation (up to 40% on average) that can lead to erroneous NoC design choices. This has prompted our development of a new, accurate, architecture- and circuit-implementation-independent router power and area modeling methodology with complete portability across existing NoC component libraries. Validation against a range of implemented router designs confirms substantial improvement in accuracy over existing models. |
Slides |
Title | (Invited Paper) Data Learning Based Diagnosis |
Author | *Li-C. Wang (Univ. of California, Santa Barbara, U.S.A.) |
Page | pp. 247 - 254 |
Keyword | diagnostics, machine learning, rule induction, yield |
Abstract | This paper illustrates a link between the traditional perspective of diagnosis and a new perspective in which diagnosis is seen as a form of data learning. We illustrate a diagnosis framework that employs various data learning techniques to implement two diagnosis approaches: feature ranking and rule extraction. We review the work that has been accomplished in implementing this framework and further discuss issues with its practical application. |
Slides |
Title | (Invited Paper) Using Introspective Software-based Testing for Post-silicon Debug and Repair |
Author | Todd Austin (Univ. of Michigan, U.S.A.) |
Title | (Invited Paper) Post-silicon Debugging for Multi-core Designs |
Author | *Valeria Bertacco (Univ. of Michigan, U.S.A.) |
Page | pp. 255 - 258 |
Keyword | validation, post-silicon, multi-core |
Abstract | Escaped errors in released silicon are growing in number due to the increasing complexity of modern processor designs and shrinking production schedules. Worsening the problem are recent trends towards chip multiprocessors (CMPs) with complex and sometimes non-deterministic memory subsystems prone to subtle, devastating bugs. This deteriorating situation is causing a growing portion of the validation effort to shift to post-silicon, when the first few hardware prototypes become available and where validation experiments are run directly on newly manufactured prototype hardware. In this work we first discuss the current needs of the industry in this space. We then give an overview of some recent ideas developed in our research group to leverage the performance advantage of post-silicon validation, while sidestepping its limitations of low observability and debuggability. Finally we present some of today's general trends in post-silicon validation research. |
Title | (Invited Paper) Low-cost Design for Repair with Circuit Partitioning |
Author | Kyungho Kim, Byungtae Kang, Dongyun Kim (Samsung Electronics Co., Republic of Korea), Sungchul Lee, Juyong Shin, *Hyunchul Shin (Hanyang University, Republic of Korea) |
Page | pp. 259 - 261 |
Keyword | low-cost, repair, partition |
Abstract | Silicon validation becomes difficult because of rapidly increasing complexity and operation speed of integrated circuits. When an error is found after a chip is fabricated, post-silicon repair is necessary. Full mask revision may significantly increase the cost and time-to-market. In this paper, we describe partial metal revision techniques in which only top-level metal layers are revised to fix “small” errors with minimal increase of the cost. When an error cannot be fixed by partial metal layer revision, full metal revision or full mask revision is necessary. However, frequently errors are small enough to be fixed by partial metal layer revision. Effective partitioning and pin-extension to top-level metal layers can significantly improve the repairability by using top-level metal revision. |
Title | (Invited Paper) On Signal Tracing in Post-silicon Validation |
Author | *Qiang Xu, Xiao Liu (The Chinese University of Hong Kong, Hong Kong) |
Page | pp. 262 - 267 |
Keyword | Verification, Design, Trace-Based, Post-Silicon Validation |
Abstract | It is increasingly difficult to guarantee first-silicon success for complex integrated circuit (IC) designs. Post-silicon validation has thus become an essential step in the IC design flow. Tracing internal signals during the circuit's normal operation, which provides real-time visibility into the circuit under debug (CUD), is one of the most effective silicon debug techniques and has gained wide acceptance in industrial designs. Trace-based debug solutions, however, involve non-trivial design-for-debug overhead. How to conduct signal tracing effectively for bug elimination is therefore a challenging task for IC designers. In this paper, we provide an in-depth discussion of trace-based debug strategy and review recent advancements in this important area. |
Title | CrossRouter: A Droplet Router for Cross-Referencing Digital Microfluidic Biochips |
Author | *Zigang Xiao, Evangeline F.Y. Young (The Chinese University of Hong Kong, Hong Kong) |
Page | pp. 269 - 274 |
Keyword | droplet routing, biochip, cross-referencing, DMFB, microfluidic |
Abstract | The Digital Microfluidic Biochip (DMFB) has drawn a lot of attention recently. It offers a promising platform for various kinds of biochemical experiments. A DMFB that uses cross-referencing technology to drive droplet movements scales down the number of control pins on the chip, which not only brings down manufacturing cost but also allows large-scale chip design. However, the cross-referencing scheme, which imposes different voltages on rows and columns to activate the cells, might cause severe electrode interference, and hence greatly decrease the degree of parallelism of droplet routing. Most previous papers obtain a direct-addressing result first, and then convert it to a cross-referencing compatible result. This paper proposes a new method that solves the droplet routing problem on cross-referencing biochips directly. Experimental results on public benchmarks demonstrate the effectiveness and efficiency of our method in comparison with the latest work on this problem. |
Slides |
Title | Optimal Simultaneous Pin Assignment and Escape Routing for Dense PCBs |
Author | *Hui Kong, Tan Yan, Martin D. F. Wong (University of Illinois at Urbana-Champaign, U.S.A.) |
Page | pp. 275 - 280 |
Keyword | Pin assignment, Escape routing, PCB routing, Algorithms |
Abstract | In PCB designs, pin positions greatly affect the routability of the design. State-of-the-art pin assignment algorithms are guided by simple (heuristic) metrics to estimate routability and thus have no guarantee of obtaining a routable solution. In this paper, we present a novel approach to obtain a pin assignment solution that guarantees routability. We show that the problem of simultaneous pin assignment and escape routing can be solved optimally in polynomial time. We then focus on pin assignment and escape routing for the terminals in a bus, present algorithmic enhancements, and discuss the trade-offs between single-layer and multi-layer implementations. We tested our approach on a state-of-the-art industrial board with 80 buses (over 7000 nets). The pin assignment and escape routing solutions for all 80 buses are successfully obtained in less than 5 minutes of CPU time. |
Title | CAFE router: A Fast Connectivity Aware Multiple Nets Routing Algorithm for Routing Grid with Obstacles |
Author | *Yukihide Kohira (The University of Aizu, Japan), Atsushi Takahashi (Osaka University, Japan) |
Page | pp. 281 - 286 |
Keyword | river routing, length-matching, PCB routing |
Abstract | In this paper, we propose CAFE router which obtains routes of multiple nets with target wire lengths for single layer routing grid with obstacles. CAFE router extends the route of each net from a pin to the other pin greedily so that the wire length of the net approaches its target wire length. Experiments show that CAFE router obtains the routes of nets with small length error in short time. |
Title | Obstacle-Aware Longest Path using Rectangular Pattern Detouring in Routing Grids |
Author | Jin-Tai Yan, Ming-Ching Jhong, *Zhi-Wei Chen (Chung Hua University, Taiwan) |
Page | pp. 287 - 292 |
Keyword | detailed routing, detouring path, bus routing, pattern routing |
Abstract | As clock frequencies increase, signal propagation delays on PCBs are required to meet timing specifications with very high accuracy. Generally speaking, the length controllability of a net determines the routing delay of the net: if a routing result has higher length controllability, the routing delay is obtained with higher accuracy. In this paper, given a start terminal, S, and a target terminal, T, in an m×n routing grid with obstacles, an efficient O(mn log(mn)) algorithm is proposed to generate the longest path from S to T, based on a rectangular partition of the routing grid and an analysis of unreachable grids in rectangular pattern detouring. Compared with the US routing[5], our proposed routing approach achieves longer paths for the tested examples in less CPU time. |
Slides |
Title | A Performance-Constrained Template-Based Layout Retargeting Algorithm for Analog Integrated Circuits |
Author | Zheng Liu, *Lihong Zhang (Memorial University of Newfoundland, Canada) |
Page | pp. 293 - 298 |
Keyword | Retargeting, Performance Sensitivity, Layout Parasitics, Piecewise, Constraints |
Abstract | Performance of analog integrated circuits is highly sensitive to layout parasitics. This paper presents an improved template-based algorithm that automatically conducts performance-constrained parasitic-aware retargeting and optimization of analog layouts. In order to achieve desired circuit performance, performance sensitivities with respect to layout parasitics are first determined. Then the algorithm applies a piecewise-sensitivity model to control parasitic-related layout geometries by directly constructing a set of performance constraints subject to maximum performance deviation due to parasitics. The formulated problem is finally solved using graph-based techniques combined with mixed-integer nonlinear programming. The proposed method has been incorporated into a parasitic-aware automatic layout optimization and retargeting tool. It has been demonstrated to be effective and efficient especially when adapting layout design for new technologies or updated specifications. |
Slides |
Title | Symmetry-Aware TCG-Based Placement Design under Complex Multi-Group Constraints for Analog Circuit Layouts |
Author | *Rui He, Lihong Zhang (Memorial University of Newfoundland, Canada) |
Page | pp. 299 - 304 |
Keyword | placement, analog layout, symmetry constraints |
Abstract | This paper presents a solution to handling complex multi-group symmetry constraints in the placement design using transitive closure graph (TCG) representation for analog layouts. We propose a set of symmetric-feasible conditions, which can automatically satisfy symmetry requirements. We also develop a new contour-based packing scheme with time complexity of O(g*n*lgn), where g is the number of symmetry groups and n is the number of the placed cells. Furthermore, we devise a set of perturbation operations with time complexity of O(n). Our experimental results show the effectiveness and superiority of this proposed scheme compared to the other state-of-the-art placement algorithms for analog layout design. |
Slides |
Title | Regularity-Oriented Analog Placement with Diffusion Sharing and Well Island Generation |
Author | *Shigetoshi Nakatake (University of Kitakyushu, Japan), Masahiro Kawakita, Takao Ito (Toshiba Corp., Japan), Masahiro Kojima, Michiko Kojima, Kenji Izumi, Tadayuki Habasaki (NEC Micro Systems, Ltd., Japan) |
Page | pp. 305 - 311 |
Keyword | analog placement, regularity-oriented, well island, diffusion sharing |
Abstract | This paper presents a novel regularity evaluation of placement structure and MOS analog specific layout techniques called diffusion sharing and well island generation, which are developed based on Sequence-Pair. The regular structures such as topological row, array and repetitive structure are characterized by the way of forming subsequences of a sequence-pair. A placement objective is formulated balancing the regularity and the area efficiency. Furthermore, diffusion sharing and well island can be also identified looking into forming of a sequence-pair. In experiments, we applied our regularity-oriented placement mixed with the constraint-driven technique to real analog designs, and attained the results comparable to manual designs even when imposing symmetry constraints. Besides, the results also revealed the regularity serves to increase row-structures applicable to the diffusion-sharing for the area and wire-length saving. |
Title | A Novel Characterization Technique for High Speed I/O Mixed Signal Circuit Components Using Random Jitter Injection |
Author | *Ji Hwan (Paul) Chun (Intel Corporation, U.S.A.), Jae Wook Lee, Jacob A. Abraham (The University of Texas at Austin, U.S.A.) |
Page | pp. 312 - 317 |
Keyword | Phase interpolator, High Speed I/O, Linearity, SerDes, DNL |
Abstract | Timing problems in high-speed serial communications are mitigated with phase-interpolator (PI) circuitry. Linearity testing of PI has been challenging, even though PI is widely used in modern high speed I/O architectures. Previous research has focused on implementing additional built-in circuits to measure PI linearity. In this paper, we present a cost effective PI linearity measurement technique which requires no significant modification of existing I/O circuits. Our method uses jitter distributions obtained from random jitter injected into the data channel. Two distributions are separately obtained using undersampling and sampling using PI. The proposed algorithm calculates the differential nonlinearity (DNL) from the difference of these distributions. Simulation results show that the average prediction RMS error for the DNL calculation is 0.31 LSB. |
Title | Technology Mapping with Crosstalk Noise Avoidance |
Author | Fang-Yu Fan (TSMC, Taiwan), *Hung-Ming Chen (NCTU, Taiwan), I-Min Liu (Atoptech, U.S.A.) |
Page | pp. 319 - 324 |
Keyword | Technology Mapping, Crosstalk Noise |
Abstract | In today's VLSI designs, crosstalk effects that cause chips to fail or suffer from low yields have become an essential design issue. In this paper, we attempt to reduce crosstalk noise in the logic and physical synthesis stage, a task that is usually done in the post-layout stage. We propose a technology mapping method that reduces crosstalk noise while meeting delay constraints. The algorithm, employing a dynamic programming framework, determines the routing of fanin nets for all the matches in the matching phase to estimate track utilization probabilistically. These routings are stored as virtual routing maps to compute the crosstalk noise during the covering phase, which selects the crosstalk-minimal solutions satisfying the delay constraints rather than the delay-minimal ones. This problem is different from wire congestion-driven technology mapping, and our experimental results are encouraging. In experiments on benchmark circuits in a 90nm process, the results show that, with a 7% area increase, our approach improves crosstalk by 28% on average compared to conventional delay- and/or congestion-driven technology mapping. The overall result is better than that of efforts made in the post-layout stage and has been validated by modern commercial EDA tools. In addition, the proposed approach can be applied to local technology remapping at the post-placement/post-routing and ECO stages as well. |
Slides |
Title | Fault-Tolerant Resynthesis with Dual-Output LUTs |
Author | Ju-Yueh Lee (Electrical Engineering Department, UCLA, U.S.A.), Yu Hu (Electrical and Computer Engineering Department, University of Alberta, Canada), Rupak Majumdar (Computer Science Department, UCLA, U.S.A.), *Lei He (Electrical Engineering Department, UCLA, U.S.A.), Minming Li (Computer Science Department, City University of Hong Kong, Hong Kong) |
Page | pp. 325 - 330 |
Keyword | FPGA, Fault-tolerant, Logic Synthesis |
Abstract | We present a fault-tolerant post-mapping resynthesis for FPGA-based designs that exploits the dual-output feature of modern FPGA architectures to improve the reliability of a mapped circuit against faults. Emerging FPGA architectures, such as 6-LUTs in Xilinx Virtex-5 and 8-input ALMs in Altera Stratix-III, have a secondary LUT output that allows access to non-occupied SRAM bits. We show that this architectural feature can be used to build redundancy for fault masking with limited area and performance overhead. Our algorithm improves reliability of a mapping by performing two basic operations: duplication (in which free configuration bits are used to duplicate a logic function whose value is obtained at the secondary output) and encoding (in which two copies of the same logic function are ANDed or ORed together in the fanout of the duplicated logic). The problem of fault tolerant post-mapping resynthesis is then formulated as the optimal duplication and encoding scheme that ensures the minimal circuit fault rate w.r.t. a stochastic single fault model. We present an ILP formulation of this problem and an efficient algorithm based on generalized network flow. On MCNC benchmarks, experimental results show that for combinational circuits the proposed approach improves mean-time-to-failure(MTTF) by 27% with 4% area overhead, and the proposed approach with explicit area redundancy improves MTTF by 113% with 36% area overhead, compared to the baseline mapping by ABC. This provides a viable fault tolerance solution for non-mission critical applications compared to TMR (triple modular redundancy) which has a 5×-6× area overhead. |
Slides |
Title | TRECO: Dynamic Technology Remapping for Timing Engineering Change Orders |
Author | *Kuan-Hsien Ho, Jie-Hong Roland Jiang, Yao-Wen Chang (National Taiwan University, Taiwan) |
Page | pp. 331 - 336 |
Keyword | Engineering Change Orders, Technology Remapping, Spare Cell |
Abstract | Due to the increasing IC design complexity, Engineering Change Orders (ECOs) have become a necessary technique to resolve late-found functional and/or timing deficiencies. To fix timing violations, the principles of gate sizing and buffer insertion are commonly used in post-mask ECO. These techniques however may not be powerful enough, especially when spare cells are inserted in a way of striking a balance between functional and timing repair capabilities. We propose a post-mask ECO technique, called TRECO, to remedy timing violations based on technology remapping, which supports functional ECO as well. Unlike conventional technology mapping, TRECO performs technology mapping with respect to a limited set of spare cells and confronts dynamic changes of wiring cost incurred by different spare-cell selections. With a pre-computed lookup table of representative circuit templates, TRECO iteratively performs technology remapping to restructure timing critical subcircuits until no timing violation remains. Experimental results on five industrial designs show the effectiveness of TRECO in ECO timing optimization. |
Slides |
Title | Multi-Operand Adder Synthesis on FPGAs Using Generalized Parallel Counters |
Author | *Taeko Matsunaga, Shinji Kimura (Waseda University, Japan), Yusuke Matsunaga (Kyushu University, Japan) |
Page | pp. 337 - 342 |
Keyword | multi-operand addition, generalized parallel counter, FPGA, arithmetic synthesis |
Abstract | Multi-operand adders usually consist of compression trees, which reduce the number of operands per bit to two, and a carry-propagate adder for the two operands in ASIC implementation. The former part is usually realized using full adders or (3;2) counters, as in Wallace trees, in ASICs, while adder trees or dedicated hardware are used in FPGAs. In this paper, an approach to realize compression trees on FPGAs is proposed. In an FPGA with m-input LUTs, any counter with up to m inputs can be realized with one LUT per output. Our approach utilizes generalized parallel counters (GPCs) with up to m inputs and synthesizes high-performance compression trees by setting intermediate height limits in the compression process, as in Dadda's multipliers. Experimental results show its effectiveness against existing approaches at the GPC level and on Altera's Stratix III. |
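The "intermediate height limits" mentioned in the abstract above can be illustrated with the classic Dadda sequence. The sketch below is not the paper's algorithm: it assumes plain (3;2) counters, for which each stage can reduce a column of height floor(1.5 × d) to height d. The function name `dadda_height_limits` is hypothetical, and GPC-based trees would generally use different limits.

```python
def dadda_height_limits(max_height: int) -> list[int]:
    """Return Dadda's stage height limits that lie strictly below max_height.

    Assumption: (3;2) counters (full adders), so the limits follow
    d_1 = 2, d_{j+1} = floor(1.5 * d_j). This is illustrative only and
    does not model generalized parallel counters.
    """
    limits = [2]
    # Keep growing the sequence while the next limit is still below the
    # initial column height that has to be compressed.
    while limits[-1] * 3 // 2 < max_height:
        limits.append(limits[-1] * 3 // 2)
    return limits


if __name__ == "__main__":
    # Stage targets for 16 operands, applied from the largest limit down to 2.
    print(dadda_height_limits(16)[::-1])  # prints [13, 9, 6, 4, 3, 2]
```

Each compression stage then uses only as many counters as needed to bring every column down to the next limit in the list.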
Title | Checker-Pattern and Shared Two Pixels LOFIC CMOS Image Sensors |
Author | *Yoshiaki Tashiro, Shun Kawada, Shin Sakai, Shigetoshi Sugawa (Tohoku University, Japan) |
Page | pp. 343 - 344 |
Keyword | CMOS Image Sensor, high-resolution, Pixel Scaling, LOFIC |
Abstract | Two wide dynamic range CMOS image sensors with lateral overflow integration capacitor have been developed. A checker-pattern image sensor has achieved high area efficiency by placing the color filters and on-chip microlens along the direction at an angle of 45-degree. A shared two pixels image sensor has achieved small pixel pitch by introducing a lateral overflow gate in each pixel. The fabricated image sensors exhibit high full well capacity, low noise, wide dynamic range and high resolution performance. |
Slides |
Title | A CMOS Image Sensor With 2.0-e- Random Noise and 110-ke- Full Well Capacity Using Column Source Follower Readout Circuits |
Author | *Takahiro Kohara, Wonghee Lee (Graduate School of Engineering, Tohoku University, Japan), Koichi Mizobuchi (DISP Development, Texas Instruments Japan, Japan), Shigetoshi Sugawa (Graduate School of Engineering, Tohoku University, Japan) |
Page | pp. 345 - 346 |
Keyword | CMOS image sensor, Column amplifier, noise, full well capacity, lateral overflow integration |
Abstract | A low noise CMOS image sensor without degradation of saturation performance has been developed by using column amplifiers of the gains of about 1.0 in a lateral overflow integration capacitor technology. The 1/4-inch, SVGA CMOS image sensor has achieved 0.98 column readout gain, 100-uV/e- conversion gain, 2.0-e- total random noise, 0.5-e- in readout circuits, 110,000-e- full well capacity and 95-dB dynamic range. Moreover, we measure the pixel noises by using developed readout circuits and optimize pixel operating condition. |
Slides |
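As a rough consistency check on the figures quoted in the preceding abstract, dynamic range is commonly expressed as 20·log10 of the ratio between full well capacity and the noise floor. The sketch below assumes that this is the definition behind the reported 95 dB; the function name `dynamic_range_db` is hypothetical.

```python
import math


def dynamic_range_db(full_well_e: float, noise_e: float) -> float:
    """Dynamic range in dB from full well capacity and random noise (both in electrons).

    Assumption: the common definition DR = 20 * log10(full_well / noise).
    """
    return 20.0 * math.log10(full_well_e / noise_e)


if __name__ == "__main__":
    # Values quoted in the abstract: 110,000 e- full well, 2.0 e- total random noise.
    print(round(dynamic_range_db(110_000, 2.0), 1))  # prints 94.8
```

Under that assumption, the computed value of about 94.8 dB is in line with the 95 dB stated in the abstract.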
Title | Checkered White-RGB Color LOFIC CMOS Image Sensor |
Author | *Shun Kawada, Shin Sakai, Yoshiaki Tashiro, Shigetoshi Sugawa (Tohoku University, Japan) |
Page | pp. 347 - 348 |
Keyword | high sensitivity, wide dynamic range, White-RGB, LOFIC, CMOS |
Abstract | We succeeded in developing a checkered White-RGB color CMOS image sensor based on a lateral overflow integration capacitor (LOFIC) architecture. The LOFIC CMOS image sensor with a 1/3.3-inch optical format, 1280(H) x 480(V) pixels, 4.2-um effective pixel pitch along with 45-degree direction was designed and fabricated through 0.18-um 2-Poly 3-Metal CMOS technology with buried pinned photodiode process. The image sensor has achieved about 108-uV/e- high conversion gain and about 102-dB dynamic range performance in one exposure. |
Slides |
Title | A Versatile Recognition Processor for Sensor Network Applications |
Author | *Risako Takashima, Hanai Yuya, Yuichi Hori, Tadahiro Kuroda (Keio University, Japan) |
Page | pp. 349 - 350 |
Keyword | detection, recognition, wireless sensor network, Haar-like feature |
Abstract | A versatile recognition processor is presented that comprises 2.1M transistors using a 90nm CMOS technology. It performs detection and recognition from image/video, sound and acceleration signals with energy dissipation of sub-mJ/frame. The versatility and the energy efficiency are attributed to optimal architecture design employing Haar-like Feature and Cascaded Classifier. |
Slides |
Title | A 2-6 GHz Fully Integrated Tunable CMOS Power Amplifier for Multi-Standard Transmitters |
Author | Daisuke Imanishi, *JeeYoung Hong, Kenichi Okada, Akira Matsuzawa (Tokyo Institute of Technology, Japan) |
Page | pp. 351 - 352 |
Keyword | Power amplifier, CMOS, tunable, multi-band, SDR |
Abstract | A tunable power amplifier (PA) from 2.1GHz to 6.0GHz is presented for multi-standard radios. The proposed multi-band PA can tune the output impedance to 50Ohm over a wide frequency range, so external isolators following PAs can be eliminated. The PA is implemented by using a 0.18um CMOS process, and the supply voltage is 3.3V. Over all of the frequency range, the PA realizes output return loss S22 of smaller than -8dB, power gain of larger than 12dB, output 1-dB compression point of larger than 15dBm. |
Slides |
Title | An Embedded Debugging/Performance Monitoring Engine for a Tile-based 3D Graphics SoC Development |
Author | *Liang-Bi Chen, Tsung-Yu Ho, Jiun-Cheng Ju, Cheng-Lung Chiang, Chung-Nan Lee, Ing-Jer Huang (National Sun Yat-Sen University, Taiwan) |
Page | pp. 353 - 354 |
Keyword | 3D graphics, Debugging, Performance monitoring, Bus protocol checker, Bus tracer |
Abstract | This paper presents an embedded debugging/performance monitoring engine (EDPME), which is capable of collecting run-time characteristics, detecting AHB on-chip bus protocol errors/inefficiencies, and capturing on-chip AHB bus traces at various abstraction levels with a compression ratio of up to 98%, for low-cost tile-based 3D graphics SoC development. |
Slides |
Title | Cascaded Time Difference Amplifier using Differential Logic Delay Cell |
Author | *Shingo Mandai (The University of Tokyo, Japan), Toru Nakura, Makoto Ikeda, Kunihiro Asada (VLSI Design and Education Center(VDEC), The University of Tokyo, Japan) |
Page | pp. 355 - 356 |
Keyword | Time Difference Amplifier, Time Amplifier, TOF, TDC |
Abstract | We introduce a 4x4 cascaded time difference amplifier (TDA) using differential logic delay cells in a 0.18um CMOS process. By employing differential logic cells for the delay chain instead of CMOS logic cells, our TDA has stable time difference gain (TD gain) and fine time resolution. Measurement results show that our TDA achieves less than 5.5% TD gain offset and a 250ps input range. |
Slides |
Title | Built-in Self At-Speed Delay Binning and Calibration Mechanism in Wireless Test Platform |
Author | Chen-I Chung, Jyun-Sian Jhou, *Ching-Hwa Cheng (Feng Chia University, Taiwan) |
Page | pp. 357 - 358 |
Keyword | at-speed delay test, Built-In Self Test (BIST) |
Abstract | An at-speed BIST delay testing technique is proposed. It differs from traditional circuit speed testing techniques by changing the system clock rate. This method supplies test pattern to the circuit using lower-speed clock frequency, then applies internal BIST circuit to adjust clock edge for circuit at-speed delay testing and speed binning. The self wide-range (26%~76%), fine-scale (34ps) duty cycle adjustment technique with high-precision (28ps) calibration circuit is proposed for at-speed delay test and performance binning. The contribution of this work is the proposal of a feasible self at-speed delay testing technique. Test chip DFT strategies are fully validated by instruments and HOY wireless test system. Key words: scan based delay testing, at speed testing, speed binning |
Title | Dynamic Voltage Domain Assignment Technique for Low Power Performance Manageable Cell Based Design |
Author | Elone Lee, Feng-Tso Chien, *Ching-Hwa Cheng (Feng Chia University, Taiwan), Jiun-In Guo (National Chung Cheng University, Taiwan) |
Page | pp. 359 - 360 |
Keyword | multi voltage, voltage domain, performance-power manageable |
Abstract | Multi-voltage technique is an effective way to reduce power consumption. In the proposed voltage domain programmable (VDP) technique, the high and low voltage domains applied to logic gates are programmable. The different voltage domains allow the chip performance and power consumption to be flexibly adjusted during circuit operation. In the proposed on-chip technique, the power switches can be flexibly programmed after chip manufacturing. A video decoder test chip that demonstrates this methodology achieves a 55% power reduction with a good power-performance management mechanism. |
Title | Adaptive Performance Control with Embedded Timing Error Predictive Sensors for Subthreshold Circuits |
Author | *Hiroshi Fuketa, Masanori Hashimoto, Yukio Mitsuyama, Takao Onoye (Osaka University, Japan) |
Page | pp. 361 - 362 |
Keyword | subthreshold circuit, manufacturing variability, body biasing |
Abstract | This paper presents an adaptive technique for compensating manufacturing and environmental variability in subthreshold circuits using "canary flip-flop" that can predict timing errors. A 32-bit Kogge-Stone adder whose performance was controlled by body-biasing was fabricated in a 65nm CMOS process. Measurement results show that the adaptive control can reduce the power dissipation by 46% in comparison with the worst-case design with guardbanding. |
Slides |
Title | A 60GHz Direct-Conversion Transmitter in 65nm CMOS Technology |
Author | *Naoki Takayama, Kouta Matsushita, Shogo Ito, Ning Li, Kenichi Okada, Akira Matsuzawa (Tokyo Institute of Technology, Japan) |
Page | pp. 363 - 364 |
Keyword | transmitter, mm-wave, CMOS, power amplifier |
Abstract | This paper presents a 60 GHz direct-conversion transmitter in 65 nm CMOS technology. The power amplifier consists of 4-stage transistors. The circuit model of de-coupling capacitor is built as a transmission line to consider the physical length. In the measurement results, the conversion gain is above 9.6dB at 58-65GHz band, and the 1 dB compression point is 1.6 dBm with 60 GHz LO frequency and 1 dB LO power. |
Slides |
Title | An Electrically Adjustable 3-Terminal Regulator with Post-Fabrication Level-Trimming Function |
Author | *Hiroyuki Morimoto, Hiroki Koike, Kazuyuki Nakamura (Kyushu Institute of Technology, Japan) |
Page | pp. 365 - 366 |
Keyword | 3-terminal, regulator, serial, control, adjustment |
Abstract | This paper describes a new technique for 3-terminal regulators to adjust the output voltage level without additional terminals or extra off-chip components. By applying a serial control pattern using the intermediate voltage level between the supply voltage and the regulator output, the adjustment data in the internal nonvolatile memory are safely updated without noise disturbance. In an on-board test with a chip fabricated using a 0.35-um standard CMOS process, we confirm successful output voltage adjustment with sub-10mV precision. |
Title | Fine Resolution Double Edge Clipping with Calibration Technique for Built-In At-Speed Delay Testing |
Author | Chen-I Chung, Shuo-Wen Chang, Feng-Tso Chien, *Ching-Hwa Cheng (Feng Chia University, Taiwan) |
Page | pp. 367 - 368 |
Keyword | at speed delay test, built-in self test, launch off capture |
Abstract | At speed Built-In Self Test (BIST) circuit can solve many test challenges generated from traditionally slower Automatic Test Equipment (ATE). In this paper, a double edge clipping technique is proposed for built-in at-speed delay testing requirements. It differs from traditional circuit delay testing techniques by changing the clock rate using external ATE. This method uses lower-speed input clock frequency, then applies internal BIST technique to adjust clock edges for circuit at-speed delay testing and speed binning. Test chips are fully validated. The fine-scale (16ps) progressive capture edge adjustment technique with high-precision (28ps) calibration circuit is effective for at-speed delay testing and performance binning. |
Title | Geyser-1: A MIPS R3000 CPU core with fine-grained run-time Power Gating |
Author | Daisuke Ikebuchi, Naomi Seki, Yuu Kojima, *Masahiro Kamata, Zhao Lei, Hideharu Amano (Keio University, Japan), Toshiki Shirai, Satoshi Koyama, Tatsunori Hashida, Yusuke Umahashi, Hiroki Masuda, Kimiyoshi Usami (Shibaura Institute of Technology, Japan), Seidai Takeda, Hiroshi Nakamura (University of Tokyo, Japan), Mitaro Namiki (University of Agriculture and Technology, Japan), Masaaki Kondo (The University of Electro-Communications, Japan) |
Page | pp. 369 - 370 |
Keyword | CPU, Power Gating, Low leakage power |
Abstract | Geyser-1, a prototype MIPS R3000 CPU with fine-grained runtime power gating (PG) for major computational components in the execution stage, is available. Function units such as the ALU, shifter, multiplier and divider are power-gated and controlled at runtime such that only the function unit to be used by an instruction is powered on, minimizing the leakage power. The evaluation results with the real chip reveal that the fine-grained runtime PG mechanism works at a clock frequency of at least 60MHz without electrical problems. It reduces the leakage power by 7% at 25°C and by 24% at 80°C. The evaluation using benchmark programs shows that the power consumption can be reduced by 3% at 25°C and by 30% at 80°C. |
Slides |
Title | A WiMAX Turbo Decoder with Tailbiting BIP Architecture |
Author | *Hiroaki Arai, Naoto Miyamoto, Koji Kotani (Tohoku University, Japan), Hisanori Fujisawa (Fujitsu Laboratories Ltd., Japan), Takashi Ito (Tohoku University, Japan) |
Page | pp. 371 - 372 |
Keyword | WiMAX, turbo decoder, tailbiting, block-interleaved pipelining (BIP), Max-Log-MAP |
Abstract | A tailbiting block-interleaved pipelining (TB-BIP) is proposed for deeply-pipelined turbo decoders. Conventional sliding window block-interleaved pipelining (SW-BIP) turbo decoders suffer from many warm-up calculations when the number of pipeline stages is increased. However, by using TB-BIP, more than 50% of the warm-up calculations are eliminated as compared to SW-BIP. We have implemented a TB-BIP WiMAX turbo decoder with four pipeline stages in an area of 3.8 mm² using a 0.18-um CMOS technology. The chip achieved 45 Mbps/iter and 3.11 nJ/b/iter at 99 MHz operation. |
Title | Temporal Circuit Partitioning for a 90nm CMOS Multi-Context FPGA and its Delay Measurement |
Author | *Naoto Miyamoto, Tadahiro Ohmi (Tohoku University, Japan) |
Page | pp. 373 - 374 |
Keyword | multi-context FPGA, execution latency, temporal partitioning, temporal communication |
Abstract | In this paper, we present a multi-context FPGA named Flexible Processor (FP) and its execution delay measurement results. A temporal partitioning algorithm for the FP has been developed, which divides a long critical path into equal-length shorter paths. The FP has been designed and fabricated by using a 90nm CMOS technology. From the measurement results, the execution latency remains constant regardless of the number of contexts used. |
Slides |
Title | Design and Chip Implementation of an Instruction Scheduling Free Ubiquitous Processor |
Author | *Masa-aki Fukase, Ryosuke Murakami, Tomoaki Sato (Hirosaki University, Japan) |
Page | pp. 375 - 376 |
Keyword | Ubiquitous Processor, CMOS chip |
Abstract | Instruction scheduling is a crucial issue for cutting-edge VLSI processors that exploit parallelism to achieve power-conscious high performance. A double scheme that merges scalar units into a multifunctional unit (MFU) and wave-pipelines the resultant MFU achieves instruction-scheduling-free ILP (instruction-level parallelism). Applying the double scheme to chip design, the latest chip of the ubiquitous processor architecture, HCgorilla, is implemented using 0.18-um CMOS standard cell technology. |
Slides |
Title | MUCCRA-3: A Low Power Dynamically Reconfigurable Processor Array |
Author | Yoshiki Saito, Toru Sano, Masaru Kato, Vasutan Tunbunheng, Yoshihiro Yasuda, *Masayuki Kimura, Hideharu Amano (Keio University, Japan) |
Page | pp. 377 - 378 |
Keyword | dynamically reconfigurable |
Abstract | MuCCRA-3 is a low power coarse-grained Dynamically Reconfigurable Processor Array (DRPA) for a flexible off-loading engine in various SoC(System-on-a-Chip). Similar to the other DRPAs, it has an array of processing elements (PEs), a simple coarse-grained processor, consisting of an ALU and a register file, and dynamic reconfiguration of the array enables time-multiplexed execution. For low power computation, the PE array structure of MuCCRA-3 is optimized according to the evaluation results of previous prototypes, MuCCRA-1 and 2[1], and was implemented with 65nm low power CMOS process from Fujitsu. By using a real chip, the power consumption and performance are evaluated. The evaluation results suggest that MuCCRA-3 works with extremely low power. |
Slides |
Title | Rapid Prototyping on a Structured ASIC Fabric |
Author | *Steve C.L. Yuen, Yan-Qing Ai, Brian P.W. Chan (Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong), Thomas C.P. Chau (Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong), Sam M.H. Ho, Oscar K.L. Lau, Kong-Pang Pun (Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong), Philip H.W. Leong (School of Electrical and Information Engineering, University of Sydney, Australia), Oliver C.S. Choy (Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong) |
Page | pp. 379 - 380 |
Keyword | structured ASIC, metal programmable fabric, rapid prototyping, cell design, tool integration |
Abstract | We describe the architecture of a structured ASIC fabric in which the logic and routing can be customized using three masks. A standard Cadence based design flow is employed, and using an active dynamic backlight controller as an example, performance is compared to that of an ASIC implementation in the same technology. |
Slides |
Title | A High Performance Low Complexity Joint Transceiver for Closed-Loop MIMO Applications |
Author | Jian-Lung Tzeng, Chien-Jen Huang, *Yu-Han Yuan, Hsi-Pin Ma (National Tsing Hua University, Taiwan) |
Page | pp. 381 - 382 |
Keyword | Closed-Loop MIMO, Baseband Signal Processing, FPGA implementation |
Abstract | An efficient and practicable MIMO transceiver is implemented in which transmitter antenna selection is applied to a QR detector and GMD precoding through a limited feedback channel. For over 4x5 antenna selection, the proposed antenna selection scheme can save more than 50% of the computational complexity compared with the exhaustive method. From the simulation results, the proposed transceiver can achieve over 6 dB SNR improvement over its open-loop V-BLAST counterparts at BER=10^-2 under an i.i.d. channel. Finally, a MIMO joint transceiver hardware platform on a Xilinx FPGA is realized to verify the proposed algorithm and architecture. |
Slides |
Title | A Fast Symbolic Computation Approach to Statistical Analysis of Mesh Networks with Multiple Sources |
Author | *Zhigang Hao, Guoyong Shi (Shanghai Jiaotong University, China) |
Page | pp. 383 - 388 |
Keyword | mesh, symbolic, moment, sensitivity, multiple sources |
Abstract | Mesh circuits typically consist of many resistive links and many sources. Accurate analysis of massive mesh networks is demanding in the current integrated circuit design practice, yet their computation confronts numerous challenges. When variation is considered, mesh analysis becomes a much harder task. This paper proposes a symbolic computation technique that can be applied to the moment-based analysis of mesh networks with multiple sources. The variation issues are easily taken care of by a structured computation mechanism, which can naturally facilitate sensitivity based analysis. Applications are addressed by applying the computation technique to a set of mesh circuits with varying sizes. |
Slides |
Title | Minimizing Clock Latency Range in Robust Clock Tree Synthesis |
Author | *Wen-Hao Liu, Yih-Lang Li, Hui-chi Chen (National Chiao Tung University, Taiwan) |
Page | pp. 389 - 394 |
Keyword | Clock Tree Routing, Buffer Insertion, Wire sizing, Zero skew, Clock Latency Range |
Abstract | In the ISPD’09 Clock Network Synthesis (CNS) Contest, clock latency range (CLR) was initially minimized across multiple supply voltages under capacitance and slew constraints. CLR approximates the summation of the clock skew and the maximum source-to-sink delay variation for multiple supply voltages. This work develops an efficient three-stage clock tree synthesis flow for CLR minimization. Experimental results reveal that the proposed flow yields lower CLR than the top three winners of the ISPD’09 CNS contest by 59%, 52.7% and 35.4%, respectively. |
Slides |
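As a rough illustration of the CLR metric described in the abstract above, the sketch below takes CLR as the spread between the largest and smallest source-to-sink latency over all sinks and all supply voltages. The function names, the data layout, and the latency numbers are hypothetical; they are not taken from the paper or from the contest benchmarks.

```python
def skew(sink_latencies):
    """Clock skew at one supply voltage: slowest sink minus fastest sink."""
    return max(sink_latencies) - min(sink_latencies)


def clock_latency_range(latencies_by_vdd):
    """Assumed CLR definition: spread of latency over all sinks and voltages.

    latencies_by_vdd maps a supply voltage (V) to a list of
    source-to-sink latencies (ps), one entry per sink.
    """
    all_latencies = [t for sinks in latencies_by_vdd.values() for t in sinks]
    return max(all_latencies) - min(all_latencies)


# Hypothetical latencies (ps) for three sinks at two supply voltages.
example = {
    1.0: [510.0, 518.0, 522.0],  # lower voltage, slower clock tree
    1.2: [430.0, 436.0, 441.0],  # higher voltage, faster clock tree
}

clr = clock_latency_range(example)
# Largest per-sink latency change between the two voltages.
max_variation = max(lo - hi for lo, hi in zip(example[1.0], example[1.2]))
print(clr, skew(example[1.0]), max_variation)
```

With these made-up numbers the CLR is close to, but not identical to, the skew plus the largest per-sink delay variation, which is consistent with the abstract describing the relation as an approximation.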
Title | Blockage-Avoiding Buffered Clock-Tree Synthesis for Clock Latency-Range and Skew Minimization |
Author | *Xin-Wei Shih, Chung-Chun Cheng, Yuan-Kai Ho, Yao-Wen Chang (Graduate Institute of Electronics Engineering, National Taiwan University, Taiwan) |
Page | pp. 395 - 400 |
Keyword | Clock Tree Synthesis, Buffer Insertion, Skew and Slew Rate, Process Variation, Clock Latency Range |
Abstract | In high-performance nanometer synchronous chip design, a buffered clock tree with high tolerance of process variations is essential. The nominal clock skew always plays a crucial role in determining circuit performance and thus should be a first-order objective for clock-tree synthesis. The clock latency range (CLR), which is the latency difference under different supply voltages, is defined by the 2009 ACM ISPD Clock Network Synthesis Contest as the major optimization objective to measure the effects of process variation on clock-tree synthesis. In this paper, we propose a three-level framework which effectively constructs clock trees by performing blockage-avoiding buffer insertion with both nominal skew and CLR minimization. To cope with the objectives, we present a novel three-stage TTR clock-tree construction algorithm which consists of clock-tree Topology Generation, Tapping-Point Determination, and Routing. Experimental results show that our framework with the TTR algorithm achieves the best average quality for both nominal skew and CLR, compared to all the participating teams for the 2009 ISPD Clock Network Synthesis Contest. |
Slides |
Title | Improved Clock-Gating Control Scheme for Transparent Pipeline |
Author | *Jung Hwan Choi (Samsung Electronics, Republic of Korea), Byung Guk Kim (Purdue University, U.S.A.), Aurobindo Dasgupta (Intel Corp., U.S.A.), Kaushik Roy (Purdue University, U.S.A.) |
Page | pp. 401 - 406 |
Keyword | Low-power, Clock-gating |
Abstract | This paper presents a stage-level clock-gating scheme for clock power improvement. The proposed technique efficiently implements the concept of a transparent pipeline, which reduces clocking power by dynamically making pipeline registers transparent. We develop a new control scheme for transparent pipelines that can be applied to any number of pipeline stages. A low-overhead flip-flop with a transparent mode is also proposed to reduce implementation overhead. The proposed clock-gating control logic is extended to pipeline collapsing, which allows an energy/performance trade-off through dynamic frequency scaling. Simulation results on IBM 90nm technology show that the proposed approach has less overhead (~25%) than the previous transparent pipeline scheme and reduces clocking power by up to 40% in a 64-bit 7-stage pipeline compared with the traditional stage-level clock-gating technique. |
Slides |
Title | Scan-Based Attack against Elliptic Curve Cryptosystems |
Author | *Ryuta Nara, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki (Waseda University, Japan) |
Page | pp. 407 - 412 |
Keyword | scan path, scan-based attack, elliptic curve cryptosystem, LSI |
Abstract | Scan-based attacks are techniques to decipher a secret key using scanned data obtained from a cryptography circuit. Public-key cryptography, such as RSA and elliptic curve cryptosystem (ECC), is extensively used but conventional scan-based attacks cannot be applied to it, because it has a complicated algorithm as well as a complicated architecture. This paper proposes a scan-based attack which enables us to decipher a secret key in ECC. The proposed method is based on detecting intermediate values calculated in ECC. By monitoring the 1-bit sequence in the scan path, we can find out the register position specific to the intermediate value in it and we can know whether this intermediate value is calculated or not in the target ECC circuit. By using several intermediate values, we can decipher a secret key. The experimental results demonstrate that a secret key in a practical ECC circuit can be deciphered using 29 points over the elliptic curve E within 40 seconds. |
Slides |
Title | Secure and Testable Scan Design Using Extended de Bruijn Graphs |
Author | Hideo Fujiwara, *Marie Engelene J. Obien (Nara Institute of Science and Technology, Japan) |
Page | pp. 413 - 418 |
Keyword | Secure scan design, security, testability, design for test, extended de Bruijn graph |
Abstract | In this paper, we first introduce extended de Bruijn graphs to design extended shift registers that are functionally equivalent but not structurally equivalent to shift registers. Using the extended shift registers, we present a new secure and testable scan design approach that aims to satisfy both testability and security of digital circuits. The approach only replaces the original scan registers with modified scan registers called extended scan registers. This method requires very little area overhead and no performance overhead. New concepts of scan security and scan testability are also introduced. |
Slides |
Title | Correlating System Test Fmax with Structural Test Fmax and Process Monitoring Measurements |
Author | *Chia-Ying (Janine) Chen (University of California, Santa Barbara, U.S.A.), Jing Zeng (Advanced Micro Devices, Inc, U.S.A.), Li-C. Wang (University of California, Santa Barbara, U.S.A.), Michael Mateja (Advanced Micro Devices, Inc, U.S.A.) |
Page | pp. 419 - 424 |
Keyword | correlation analysis, data learning approach, system test, structural test |
Abstract | System test has been the standard measurement to evaluate performance variability of high-performance microprocessors. The question of whether or not many of the lower-cost alternative tests can be used to reduce system test has been studied for many years. This paper utilizes a data-learning approach for correlating three test datasets, structural test, ring oscillator test, and scan flush test, with system test. With the data-learning approach, higher correlation can be found without altering test measurements or test conditions. Rather, the approach utilizes new optimization algorithms to extract more useful information in the three test datasets, with particular success using the structural test data. To further minimize test cost, process monitoring measurements (ring oscillator and scan flush tests) are used to reduce the need for high-frequency structural test. We demonstrate our methodology on a recent high-performance microprocessor design. |
Slides |
Title | Guided Gate-level ATPG for Sequential Circuits using a High-level Test Generation Approach |
Author | *Bijan Alizadeh, Masahiro Fujita (The University of Tokyo, Japan) |
Page | pp. 425 - 430 |
Keyword | High-level Test Generation, Functional Verification, ATPG, HED |
Abstract | This paper proposes a non-scan gate-level Automatic Test Pattern Generation (ATPG) methodology which keeps the regularity in the arithmetic operations while reasoning about these operations for generating high-level test patterns from only the faulty behavior of the design. Then, by considering the generated high-level test patterns as constraints and passing them to an SMT solver, we are able to automatically and efficiently generate gate-level test patterns. Experimental results show the robustness and reliability of our method compared to other contemporary methods in terms of fault coverage and CPU time. |
Slides |
Title | Optimizing Power and Performance for Reliable On-Chip Networks |
Author | Aditya Yanamandra, Soumya Eachempati, Niranjan Soundararajan, *Vijaykrishnan Narayanan, Mary Jane Irwin, Ramakrishnan Krishnan (The Pennsylvania State University, U.S.A.) |
Page | pp. 431 - 436 |
Keyword | Network-on-chip, Error Correction Code, Architectural Vulnerability Factor, residual error rate, throttling |
Abstract | We propose novel techniques to minimize the power and performance penalties in protecting the NoC against soft errors, while giving desired reliability guarantees. Some applications have inherent error tolerance which can be exploited to save power, by turning off the error correction mechanisms for a fraction of the total time without trading off reliability. To further increase the power savings, we bound the vulnerability of a router by throttling the traffic into the router. In order to minimize the throughput loss due to throttling, we propose dividing the die into domains and using multiple vulnerability bounds across these domains. We explore both static and dynamic selection of vulnerability bounds. We find that for applications with an error tolerance of 10% of the raw error rate, the dynamic multiple vulnerability bound scheme can save up to 44% of power expended for error correction at a marginal network throughput loss of 3%. |
Slides |
Title | A Low Latency Wormhole Router for Asynchronous On-chip Networks |
Author | *Wei Song, Doug Edwards (the University of Manchester, U.K.) |
Page | pp. 437 - 443 |
Keyword | NoC, Asynchronous, wormhole |
Abstract | Asynchronous on-chip networks are power efficient and tolerant to process variation, but they are slower than synchronous on-chip networks. A low latency asynchronous wormhole router is proposed using sliced sub-channels and the lookahead pipeline. Channel slicing removes the C-element tree in the completion detection circuit and converts a channel into multiple independent sub-channels, reducing the cycle period. The lookahead pipeline uses the early evaluation protocol to reduce the cycle period. Using the lookahead pipeline on the pipeline stages with the maximal cycle period improves the overall throughput. The router is implemented in a 0.13 µm technology. The cycle period of the router at the typical corner is 1.7 ns, providing 2.35 GByte/sec throughput per port. |
Slides |
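The throughput figure quoted in the abstract above can be sanity-checked with a short calculation. The sketch below assumes a 32-bit data path per port; that width is an assumption chosen because it reproduces the quoted number, and it is not stated in the abstract.

```python
CYCLE_PERIOD_NS = 1.7          # cycle period at the typical corner (from the abstract)
ASSUMED_PORT_WIDTH_BITS = 32   # assumption: data width per port, not given in the abstract


def throughput_gbyte_per_s(period_ns, width_bits):
    """One data word is transferred per cycle.

    Bytes per nanosecond equals GByte/s when 1 GByte = 10^9 bytes.
    """
    bytes_per_cycle = width_bits / 8
    return bytes_per_cycle / period_ns


throughput = throughput_gbyte_per_s(CYCLE_PERIOD_NS, ASSUMED_PORT_WIDTH_BITS)
print(f"{throughput:.2f} GByte/s per port")
```

Under that assumption, 4 bytes every 1.7 ns comes to roughly 2.35 GByte/s, matching the per-port throughput reported.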
Title | Combined Use of Rising and Falling Edge Triggered Clocks for Peak Current Reduction in IP-Based SoC Designs |
Author | Tsung-Yi Wu (National Changhua University of Education, Taiwan), How-Rern Lin (Providence University, Taiwan), Tzi-Wei Kao, *Shi-Yi Huang, Tai-Lun Li (National Changhua University of Education, Taiwan) |
Page | pp. 444 - 449 |
Keyword | IP-based SoC Design, IR Drop, Clock Scheme, Handshaking Protocol, Virtual Component Interface |
Abstract | In a typical synchronous SoC design, a huge peak current often occurs near the time of an active clock edge because of the aggregate switching of a large number of transistors. A huge peak current forces circuit designers to increase the number of power and ground pads to prevent voltage-drop problems. The number of gates switching together can be reduced if the SoC design uses a clock scheme of mixed positive and negative triggering edges rather than one of pure positive (or pure negative) triggering edges. In this paper, we propose a clock-triggering-edge assignment technique and algorithms that can assign either a rising triggering edge or a falling triggering edge to each clock of each IP core or block of a given IP-based SoC design. The goal of the algorithms is to reduce the peak current of the design. Experimental results show that our algorithms can reduce the peak current by up to 56.3%. Our technique can also be applied to a level-sensitive design. |
Slides |
Title | Workload Capacity Considering NBTI Degradation in Multi-core Systems |
Author | Jin Sun, Roman Lysecky, Karthik Shankar (The University of Arizona, U.S.A.), Avinash Kodi (Ohio University, U.S.A.), Ahmed Louri, *Janet M. Wang (The University of Arizona, U.S.A.) |
Page | pp. 450 - 455 |
Keyword | Multi-core Systems, Negative Bias Temperature Instability, Mean-Time-To-Failure, Workload Balancing |
Abstract | As device feature sizes continue to shrink, long-term reliability issues such as Negative Bias Temperature Instability (NBTI) lead to low yields and short mean-time-to-failure (MTTF) in multi-core systems. This paper proposes a new workload balancing scheme based on a device-level fractional NBTI model to balance the workload among active cores while relaxing stressed ones. The proposed method employs the Capacity Rate (CR) provided by the NBTI model, applies a Dynamic Zoning (DZ) algorithm to group cores into zones to process task flows, and then uses Dynamic Task Scheduling (DTS) to allocate tasks in each zone with balanced workload and minimum communication cost. Experimental results on a 64-core system show that by allowing a small part of the cores to relax over a short time period (10 seconds), the proposed methodology improves multi-core system yield (percentage of core failures) by 20%, while extending MTTF by 30% with insignificant degradation in performance (less than 3%). |
Slides |
Title | (Invited Paper) Overview of ITRI's Parallel Architecture Core (PAC) DSP Project: from VLIW DSP Processor to Android-ready Multicore Computing Platform |
Author | *An-Yeu (Andy) Wu (STC/ITRI, Taiwan) |
Keyword | VLIW, DSP, parallel architecture, SoC, multimedia |
Abstract | The Industrial Technology Research Institute (ITRI) PAC (Parallel Architecture Core) project was initiated in 2003. The target is to develop a low-power and high-performance programmable platform for multimedia applications. In the first PAC project phase (2004-2006), a 5-way VLIW DSP (PACDSP) processor was developed with our patented distributed & ping-pong register file and variable-length VLIW encoding techniques. Recently, a tri-core PACDSP-based SoC, PAC-Duo (ARM9 + two DSP Cores), has also been designed and fabricated in TSMC 90nm technology to demonstrate the outstanding performance and energy efficiency of the multicore approach for multimedia processing such as real-time H.264 codec. In addition, to link with Web-based services, the Google Android software stack and an OpenCore-based multimedia library have been successfully implemented and verified on the PAC-Duo SoC. To assist with architectural exploration of the next-generation PAC-Duo SoC (PAC-Duo+), Electronic System Level (ESL) analysis with power information is also conducted. The future direction of ITRI multicore project planning will also be addressed in this presentation. |
Title | (Invited Paper) Design and Verification Methods of Toshiba's Wireless LAN Baseband SoC |
Author | *Masanori Kuwahara (Toshiba, Japan) |
Page | pp. 457 - 463 |
Keyword | wireless LAN, MAC, PHY, low power, verification |
Abstract | This paper presents design and verification methods of Toshiba's wireless LAN (WLAN) baseband SoCs. An FPGA-based high-speed and reliable verification environment for the physical layer (PHY), a new SDL-based hardware design method for the media access control layer (MAC), and an ultra-low-power design resulting in power consumption of 22 µW in the deep-sleep mode are described. |
Slides |
Title | (Invited Paper) Programmable Platform for Multimedia SoC |
Author | *Bor-Sung Liang (Sunplus Core Technology, Taiwan) |
Keyword | multimedia platform, SoC |
Abstract | Nowadays SoC development suffers from high R&D cost. High system complexity and mask costs dramatically increase R&D expense. Moreover, current multimedia SoCs need to support many video/audio CODEC formats. To address these problems, an SoC with programmability provides the flexibility to retarget various applications so that R&D cost is shared, and makes it easier to meet time-to-market and time-in-market requirements. In this talk we will share some experiences with programmable platforms for multimedia SoCs. |
Title | (Invited Paper) SOC for Car Navigation Systems with a 55.3 GOPS Image Recognition Engine |
Author | *Hiroyuki Hamasaki, Yasuhiko Hoshi, Atsushi Nakamura, Akihiro Yamamoto (Renesas Technology, Japan), Hideaki Kido, Shoji Muramatsu (Hitachi Ltd, Japan) |
Page | pp. 464 - 465 |
Keyword | SoC, image recognition, car navigation systems |
Abstract | This paper introduces a System on a Chip (SoC) for car navigation systems, equipped with dual RISC processors, an image recognition engine operating at up to 55.3 GOPS, and multiple accelerators and peripherals. The SoC delivers high performance for the image recognition applications installed in advanced vehicles while simultaneously running navigation functions such as graphics. Furthermore, we have developed the SoC to meet automotive specifications, including cost and size. To demonstrate the SoC's capability, we report a practical application, pedestrian detection, which we accelerate by combining the RISC processor with the image recognition engine. |
Slides |
Title | A Dual-MST Approach for Clock Network Synthesis |
Author | *Jingwei Lu, Wing-Kai Chow, Chiu-Wing Sham (The Hong Kong Polytechnic University, Hong Kong), Fung-Yu Young (The Chinese University of Hong Kong, Hong Kong) |
Page | pp. 467 - 473 |
Keyword | CNS, CLR |
Abstract | In nanometer-scale VLSI physical design, the clock network has become a major factor in determining the overall performance of a digital circuit. Clock skew and PVT (Process, Voltage and Temperature) variation contribute significantly to its behavior and are critical issues for clock network synthesis (CNS). Previous works on CNS include delay balancing, geometric matching and link insertion techniques. However, traditional methods mainly focus on skew and wirelength minimization, which may worsen the effects of process variation. In this paper, a novel clock network synthesizer is proposed and several algorithms are introduced for performance improvement. A dual-MST (DMST) geometric matching approach is proposed for topology construction; it helps balance the tree structure to reduce the variation effect. A recursive buffer insertion technique and a blockage handling method are also presented for proper distribution of buffers and saving of capacitance. Experimental results show that our matching approach is better than the traditional methods, and in particular our synthesizer has better performance compared to the results of the winner in the ISPD 2009 contest. |
Slides |
Title | Buffered Clock Tree Sizing for Skew Minimization Under Power and Thermal Budgets |
Author | Krit Athikulwongse, *Xin Zhao, Sung Kyu Lim (Georgia Institute of Technology, U.S.A.) |
Page | pp. 474 - 479 |
Keyword | clock tree synthesis, low power, thermal budget |
Abstract | In this paper, we study the clock tree sizing problem for thermal-aware skew minimization under power and thermal budgets. Clock wire/buffer sizing affects not only the delay/skew, but also the power dissipation of the clock tree. This effect in turn triggers changes in thermal distribution, making re-computation of the delay/skew necessary. Thus, the interaction among skew, power, and temperature is highly complicated if tied with clock wire/buffer sizing. In order to efficiently combat the time-varying nature of underlying thermal profile, we focus on two kinds of skew, depending on the number of thermal profiles given: skew value and skew range. The former refers to the skew value computed under a single steady-state thermal profile, while the latter refers to the skew range computed based on multiple thermal profiles. Our thermal-aware sequential-linear-programming approach maintains near-zero skew value and narrow skew range while keeping the power dissipation and temperature under the given budgets. |
Slides |
Title | Critical-PMOS-Aware Clock Tree Design Methodology for Anti-Aging Zero Skew Clock Gating |
Author | Shih-Hsu Huang, Chia-Ming Chang, *Wen-Pin Tu, Song-Bin Pan (Chung Yuan Christian University, Taiwan) |
Page | pp. 480 - 485 |
Keyword | Clock Tree Synthesis, Design for Reliability, Delay Degradation, Clock Skew, Gated Clock Design |
Abstract | Due to clock gating, the PMOS transistors in the clock tree often have different active probabilities, which lead to different NBTI delay degradations. To ensure that the clock skew is always zero, there is a demand to eliminate the degradation difference. In this paper, we present a critical-PMOS-aware clock tree design methodology to deal with this problem. First, we prove that, under the same tree topology, the NAND-type-matching clock tree has the minimum number of critical PMOS transistors. Then, we propose a 0-1 ILP (integer linear programming) approach to minimize the power consumption overhead while eliminating the degradation difference. Benchmark data consistently show that our design methodology can achieve very good results in terms of both the clock skew (due to the degradation difference) and the power consumption overhead. |
Slides |
Title | Clock Tree Embedding for 3D ICs |
Author | *Tak-Yung Kim, Taewhan Kim (Seoul National University, Republic of Korea) |
Page | pp. 486 - 491 |
Keyword | 3D ICs, clock tree, TSV, routing, optimization |
Abstract | This paper addresses the fundamental problem of zero skew clock tree embedding in 3D ICs. We propose an algorithm, called ZCTE-3D, that solves this problem for a given tree topology. The primary objective is to minimize the cost of TSVs together with finding embedding layers, and the secondary objective is to minimize the cost of wirelength. We show that ZCTE-3D solves the problem optimally in polynomial time under the linear delay model, while it solves the problem suboptimally under the Elmore delay model. |
Slides |
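To make the linear delay model mentioned in the abstract above concrete, the sketch below shows the textbook zero-skew merge of two subtrees when wire delay is taken as proportional to wire length. This is not ZCTE-3D: TSV cost and layer assignment are ignored, and the function name and unit delay constant are hypothetical.

```python
def zero_skew_merge(d1, d2, dist):
    """Merge two subtrees so both sides see the same delay at the new root.

    d1, d2: delays from each subtree root to its sinks.
    dist:   distance between the two subtree roots.
    Wire delay is assumed equal to wire length (unit delay per unit length).
    Returns (len1, len2, merged_delay), where len1 and len2 are the wire
    lengths from the merging point to subtree roots 1 and 2.
    """
    x = (dist + d2 - d1) / 2.0  # balance point: d1 + x == d2 + (dist - x)
    if x < 0:
        # Subtree 1 is slower by more than dist: tap at root 1 and
        # lengthen (snake) the wire to root 2 to make up the difference.
        len1, len2 = 0.0, d1 - d2
    elif x > dist:
        # Symmetric case: tap at root 2 and snake the wire to root 1.
        len1, len2 = d2 - d1, 0.0
    else:
        len1, len2 = x, dist - x
    return len1, len2, d1 + len1


balanced = zero_skew_merge(3.0, 5.0, 10.0)
detoured = zero_skew_merge(20.0, 2.0, 10.0)
print(balanced, detoured)
```

In the first call the merging point lies on the straight connection between the roots; in the second the delay gap exceeds the distance, so the extra wire on the faster side absorbs the difference.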
Title | Improved Weight Assignment for Logic Switching Activity During At-Speed Test Pattern Generation |
Author | *Meng-Fan Wu, Hsin-Chieh Pan, Teng-Han Wang, Jiun-Lang Huang (National Taiwan University, Taiwan), Kun-Han Tsai, Wu-Tung Cheng (Mentor Graphics Corporation, U.S.A.) |
Page | pp. 493 - 498 |
Keyword | at-speed testing, IR-drop, weighted switching activity |
Abstract | For two-pattern at-speed scan testing, the excessive power supply noise at the launch cycle may cause the circuit under test to malfunction, leading to yield loss. This paper proposes a new weight assignment scheme for logic switching activity; it enhances the IR-drop assessment capability of the existing weighted switching activity (WSA) model. By including the power grid network structure information, the proposed weight assignment better reflects the regional IR-drop impact of each switching event. For ATPG, such comprehensive information is crucial in determining whether a switching event burdens the IR-drop effect. Simulation results show that, compared with previous weight assignment schemes, the estimated regional IR-drop profiles better correlate with those generated by commercial tools. |
Slides |
Title | Graph Partition based Path Selection for Testing of Small Delay Defects |
Author | Zijian He, *Tao Lv, Huawei Li, Xiaowei Li (Institute of Computing Technology, CAS, China) |
Page | pp. 499 - 504 |
Keyword | SDD, delay testing, graph partition, Monte Carlo |
Abstract | Critical path selection plays an important role in testing of small delay defects (SDDs). For some timing-balanced circuits, the number of candidate critical paths may be very large, which makes Monte Carlo simulation based statistical timing analysis very inefficient. A fast path selection approach based on graph partition is proposed in this paper. First, a critical path graph (CPG) is generated to implicitly enumerate almost all candidate critical paths, and then the CPG is partitioned into several subgraphs, each containing a limited number of paths, using two graph partition approaches. After that, Monte Carlo simulation is applied on each subgraph for path selection. Finally, according to the partition topology of the CPG and the path sets selected from each subgraph, a path set for the original CPG is generated using Union and Cartesian product operations for testing SDDs. Experimental results show that for circuits containing large numbers of candidate critical paths, the proposed path selection approach can reduce the CPU time significantly and maintain a high probability of capturing delay failures compared to path selection methods based on general Monte Carlo simulation. |
Slides |
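The recombination step in the abstract above, building a path set for the whole graph from per-subgraph path sets, might be pictured as follows. This is only a sketch under an assumed interpretation: subgraphs cut in series are combined with a Cartesian product, and subgraphs lying side by side are combined with a union. The helper names and the tiny example are hypothetical and not taken from the paper.

```python
from itertools import product


def combine_series(front_paths, back_paths):
    """Cartesian product: every front segment joined to every back segment."""
    return [front + back for front, back in product(front_paths, back_paths)]


def combine_parallel(*path_sets):
    """Union: keep each distinct path once, preserving first-seen order."""
    merged = []
    for path_set in path_sets:
        for path in path_set:
            if path not in merged:
                merged.append(path)
    return merged


# Hypothetical selected path segments, each a tuple of node names.
front = [("a", "b"), ("a", "c")]
back = [("d", "z"), ("e", "z")]
side = [("a", "f", "z")]

full_paths = combine_parallel(combine_series(front, back), side)
print(full_paths)
```

Because only the paths selected within each subgraph are combined, the product stays small even when the original graph contains a very large number of candidate paths.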
Title | Functional and Partially-Functional Skewed-Load Tests |
Author | Irith Pomeranz (Purdue University, U.S.A.), *Sudhakar M. Reddy (University of Iowa, U.S.A.) |
Page | pp. 505 - 510 |
Keyword | broadside tests, skewed-load tests, transition faults |
Abstract | Functional broadside tests were defined to address overtesting that may occur with unrestricted scan-based tests. However, the fault coverage achievable by functional broadside tests is lower than the fault coverage achievable by unrestricted scan-based tests. It was observed that skewed-load tests can improve the fault coverage achievable by unrestricted broadside tests. Motivated by these observations, we define functional (and partially-functional) skewed-load tests to improve the fault coverage of functional broadside tests while attempting to curb overtesting. We present experimental results to demonstrate the ability of functional skewed-load tests to improve the fault coverage without exceeding the maximum switching activity of functional broadside tests (which is one indication of potential overtesting). |
Slides |
Title | Emulating and Diagnosing IR-Drop by Using Dynamic SDF |
Author | Ke Peng (University of Connecticut, U.S.A.), Yu Huang, Ruifeng Guo, *Wu-Tung Cheng (Mentor Graphics, U.S.A.), Mohammad Tehranipoor (University of Connecticut, U.S.A.) |
Page | pp. 511 - 516 |
Keyword | IR-drop defect, diagnosis, dynamic SDF |
Abstract | The SDF (Standard Delay Format) information is very important in timing-aware simulation of VLSI designs. Conventionally, however, SDF is design dependent but pattern independent; we call this static SDF in this paper. Static SDF ignores all dynamic pattern-dependent parameters, such as IR-drop and crosstalk. In this paper, we propose a novel pattern-dependent SDF, called dynamic SDF, and we apply this technique to take IR-drop effects into consideration. With the proposed IR-drop-aware SDF files, we can improve the accuracy of simulation. We also perform diagnosis on the failed patterns with the IR-drop-aware SDF files and pinpoint the pattern-dependent IR-drop defects in the design. Experimental results demonstrate the efficiency of this method. |
Slides |
Title | Application-Specific 3D Network-on-Chip Design Using Simulated Allocation |
Author | Pingqiang Zhou (University of Minnesota, U.S.A.), Ping-Hung Yuh (National Taiwan University, Taiwan), *Sachin S. Sapatnekar (University of Minnesota, U.S.A.) |
Page | pp. 517 - 522 |
Keyword | 3D, Networks on chip (NoC), floorplanning, topology synthesis, application-specific NoCs |
Abstract | Three-dimensional (3D) silicon integration technologies have provided new opportunities for Network-on-Chip (NoC) architecture design in Systems-on-Chip (SoCs). In this paper, we consider the application-specific NoC architecture design problem in a 3D environment. We present an efficient floorplan-aware 3D NoC synthesis algorithm, based on simulated allocation, a stochastic method for traffic flow routing, and accurate power and delay models for NoC components. We demonstrate that this method finds greatly improved topologies for various design objectives such as NoC power (average savings of 34%), network latency (average reduction of 35%) and chip temperature (average reduction of 20%). |
Slides |
Title | A3MAP: Architecture-Aware Analytic Mapping for Networks-on-Chip |
Author | Wooyoung Jang, *David Z. Pan (University of Texas at Austin, U.S.A.) |
Page | pp. 523 - 528 |
Keyword | Networks-on-chip, Multiprocessor System-on-Chip, Task mapping, Homogeneous/heterogeneous core |
Abstract | In this paper, we propose a novel, global A3MAP (Architecture-Aware Analytic Mapping) algorithm for NoC (Networks-on-Chip) based MPSoC (Multi-Processor System-on-Chip), applicable not only to homogeneous cores on a regular mesh architecture, as in most previous mapping algorithms, but also to heterogeneous cores on an irregular mesh or custom architecture. As a main contribution, we develop a simple yet efficient interconnection matrix that models any task graph and network. The task mapping problem is then formulated exactly as an MIQP (Mixed Integer Quadratic Programming) problem. Since MIQP is NP-hard [14], we propose two effective heuristics: a successive relaxation algorithm and a genetic algorithm. Experimental results show that A3MAP with the successive relaxation algorithm reduces traffic by up to 5.7%, 16.1% and 7.3% on average in regular mesh, irregular mesh and custom network, respectively, compared to the previous state-of-the-art work [1]. A3MAP with the genetic algorithm reduces traffic further, by up to 8.8%, 29.4% and 16.1% on average compared to [1] in regular mesh, irregular mesh and custom network, respectively, at the cost of longer runtime. |
Slides |
Title | Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip |
Author | *Jonas Diemer, Rolf Ernst (Institute of Computer and Network Engineering, TU Braunschweig, Germany), Michael Kauschke (Intel, Germany) |
Page | pp. 529 - 534 |
Keyword | Network-on-Chip, Quality-of-Service, MPSoC, Best-Effort, Low-Latency |
Abstract | Networks-on-chip (NoC) for future multi- and many-core processor platforms face an increasing diversity of traffic requirements, ranging from streaming traffic with real-time requirements to bursty best-effort. The best-effort traffic usually results from applications running on general-purpose processors with caches and is very sensitive to latency. Hence, the NoC must provide guaranteed services to some traffic streams, while maintaining low latency and high throughput of best-effort traffic. In this paper, we propose a run-time configurable NoC that enables bandwidth guarantees with minimum impact on latency for best-effort traffic. This is achieved by prioritization and distributed traffic shaping of best-effort traffic. The analysis and evaluation of our quality-of-service scheme show that it can provide tight bandwidth guarantees for streaming traffic. At the same time, the average latencies of best-effort traffic improved by up to 47% compared to a standard prioritization scheme. |
Slides |
Title | Floorplanning and Topology Generation for Application-Specific Network-on-Chip |
Author | *Bei Yu, Sheqin Dong (Tsinghua University, China), Song Chen, Satoshi Goto (Waseda University, Japan) |
Page | pp. 535 - 540 |
Keyword | Networks-on-Chip, partition driven floorplanning, switches insertion, network interfaces insertion, path allocation |
Abstract | Network-on-Chip (NoC) architectures have been proposed as a promising alternative to classical bus-based communication architectures. In this paper, we propose a two-phase framework to solve the application-specific NoC topology generation problem. In the floorplanning phase, we carry out partition-driven floorplanning. In the post-floorplanning phase, a heuristic method and a min-cost max-flow algorithm are used to insert switches and network interfaces. Finally, we allocate paths to minimize power consumption. The experimental results show that our algorithm is effective for power saving. |
Slides |
Title | (Invited Paper) (Tutorial) Is 3D Integration an Opportunity or Just a Hype? |
Author | *Jin-Fu Li (National Central University/Industrial Technology Research Institute, Taiwan), Cheng-Wen Wu (National Tsing Hua University, Taiwan) |
Page | pp. 541 - 543 |
Keyword | 3D Integration, through silicon via, test, design, VLSI |
Abstract | Three-dimensional (3D) integration using through silicon via (TSV) is an emerging technology for integrated circuit designs. 3D integration technology provides numerous opportunities to designers looking for more cost-effective system chip solutions. In addition to stacking homogeneous memory dies, 3D integration technology supports heterogeneous integration of memories, logic, sensors, etc. It eases the interconnect performance limitation, provides higher functionality, results in small form factor, etc. On the other hand, there are challenges that should be overcome before volume production of TSV-based 3D ICs becomes possible, e.g., technological challenges, yield and test challenges, thermal and power challenges, infrastructure challenges, etc. |
Slides |
Title | (Panel Discussion) (Panel) Is 3D Integration an Opportunity or Just a Hype? |
Author | Organizers & Moderators: Cheng-Wen Wu (National Tsing Hua University/Industrial Technology Research Institute, Taiwan), Jin-Fu Li (National Central University/Industrial Technology Research Institute, Taiwan), Panelists: Albert Li (GUC, Taiwan), Erik Jan Marinissen (IMEC, Belgium), Ding-Ming Kwai (Industrial Technology Research Institute, Taiwan), Kyu-Myung Choi (Samsung, Republic of Korea), Makoto Takahashi (Toshiba, Japan) |
Page | pp. 544 - 547 |
Keyword | 3D Integration, through silicon via, test, design |
Abstract | Three-dimensional (3D) integration using through silicon via (TSV) is an emerging technology for integrated circuit designs. 3D integration technology provides numerous opportunities to designers looking for more cost-effective system chip solutions. In addition to stacking homogeneous memory dies, 3D integration technology supports heterogeneous integration of memories, logic, sensors, etc. It eases the interconnect performance limitation, provides higher functionality, results in small form factor, etc. On the other hand, there are challenges that should be overcome before volume production of TSV-based 3D ICs becomes possible, e.g., technological challenges, yield and test challenges, thermal and power challenges, infrastructure challenges, etc. |
Thursday, January 21, 2010 |
Title | Configurable Multi-product Floorplanning |
Author | Qiang Ma, *Martin D. F. Wong (University of Illinois at Urbana-Champaign, U.S.A.), Kai-Yuan Chao (Intel Corporation, U.S.A.) |
Page | pp. 549 - 554 |
Keyword | Floorplanning, Multi-product |
Abstract | Reusing existing designs at different function levels can shorten time-to-market and has therefore become an industrial trend. However, an SoC floorplan with IP reuse usually targets a single product, and substantial design re-convergence effort is still required for other products. Therefore, the problem of designing a floorplan that simultaneously optimizes multiple products, or Multi-product Floorplanning, is introduced. To the best of our knowledge, this is the first work in the literature that addresses this newly emerged problem. The effectiveness of our approach is validated by promising results on several data sets derived from industrial test cases. |
Slides |
Title | UFO: Unified Convex Optimization Algorithms for Fixed-Outline Floorplanning |
Author | *Jai-Ming Lin, Hsi Hung (National Cheng Kung University, Taiwan) |
Page | pp. 555 - 560 |
Keyword | floorplan, convex, fixed-outline |
Abstract | In this paper, we apply two convex optimization methods, named UFO, to fixed-outline floorplanning. Our approach consists of two stages: a global distribution stage and a local legalization stage. In the first stage, we model modules as circles and use a pull-push model to distribute modules over a fixed outline while considering wirelength. Because good results can be obtained after the first stage, the second stage does not need to consider wirelength and is devoted to legalizing modules. To keep the good results of the first stage, we also propose a procedure to extract the geometrical relations of modules from a layout and record them in constraint graphs in the second stage. Then, a quadratic function together with non-overlap and boundary constraints is used to determine the locations and shapes of modules. We have implemented the two convex functions in MATLAB, and experimental results demonstrate that UFO clearly outperforms the results reported in the literature on the GSRC benchmark. |
Slides |
Title | Fixed-outline Thermal-aware 3D Floorplanning |
Author | *Linfu Xiao (The Chinese University of Hong Kong, Hong Kong), Subarna Sinha (Synopsys, U.S.A.), Jingyu Xu (Synopsys, China), Evangeline F.Y. Young (The Chinese University of Hong Kong, Hong Kong) |
Page | pp. 561 - 567 |
Keyword | 3D IC, floorplan, thermal, fixed outline |
Abstract | In this paper, we present a novel algorithm for 3D floorplanning with fixed-outline constraints and a particular emphasis on thermal awareness. A computationally efficient thermal model that can guide the thermal-aware floorplanning algorithm to reduce the peak temperature is proposed. We also present a novel white space redistribution algorithm to dissipate hotspots. Thermal through-silicon via (TSV) insertion is performed during the floorplanning process as a means to control the peak temperature. Experimental results are very promising and demonstrate that the proposed floorplanning algorithm has a high success rate at meeting the fixed-outline constraints while effectively limiting the rise in peak temperature. |
Slides |
Title | A Hierarchical Bin-Based Legalizer for Standard-Cell Designs with Minimal Disturbance |
Author | Yu-Min Lee, *Tsung-You Wu, Po-Yi Chiang (National Chiao Tung University, Taiwan) |
Page | pp. 568 - 573 |
Keyword | legalization, hierarchical method, bin-based technique |
Abstract | In this work, a hierarchical bin-based legalization approach, HiBinLegalizer, is developed to legalize standard cells with minimal movement. First, a chip is divided into several bins of equal size. Then, starting with the most crowded unlegalized bin, a merging procedure is used to integrate bins into a cross-shaped or square-shaped region until the cell density in that region is less than a specific cell-density threshold. After that, an efficient legalization method, which simultaneously preserves the cell order in each row and minimizes the weighted sum of movement distances, is used to legalize cells within that region so as to limit the movable scope. To improve the legalization quality, HiBinLegalizer refreshes the positions of legalized cells during legalization. The legalizing procedure is repeated until no cells overlap. Compared with the state-of-the-art method, Abacus, HiBinLegalizer can reduce the total movement of cells to be 48% on average and save the largest movement of cells to be 140% on average. Moreover, HiBinLegalizer can reduce the HPWL by 47% and obtain an average 1.11 times runtime speedup. |
Slides |
Title | An Analytical Dynamic Scaling of Supply Voltage and Body Bias Exploiting Memory Stall Time Variation |
Author | *Jungsoo Kim, Younghoon Lee (KAIST, Republic of Korea), Sungjoo Yoo (POSTECH, Republic of Korea), Chong-Min Kyung (KAIST, Republic of Korea) |
Page | pp. 575 - 580 |
Keyword | DVFS, energy optimization, runtime distribution, memory stall |
Abstract | The success of workload prediction, which is critical in achieving low energy consumption via dynamic voltage and frequency scaling (DVFS), depends on the accuracy of modeling the major sources of workload variation. Among them, memory stall time, whose variation is significant especially in memory-bound applications, has been mostly neglected or handled with overly simplistic assumptions in previous work. In this paper, we present an analytical DVFS method which takes into account variations in both computation and memory stall cycles. The proposed method reduces leakage power consumption as well as switching power consumption through combined Vdd/Vbb scaling. Experimental results on MPEG4 and H.264 decoders show that, compared to previous methods, our method achieves additional energy reductions of up to 30.0% and 15.8%, respectively. |
Slides |
Title | Bounded Potential Slack: Enabling Time Budgeting for Dual-Vt Allocation of Hierarchical Design |
Author | *Jun Seomun, Seungwhun Paik, Youngsoo Shin (KAIST, Republic of Korea) |
Page | pp. 581 - 586 |
Keyword | Hierarchical design, slack, budgeting, dual-Vt, leakage |
Abstract | Time budgeting, which assigns timing assertions at block boundaries, is a crucial step in hierarchical design. The proportion of high- and low-Vt gates in each block, which determines overall leakage power consumption, is dictated by the timing assertions, yet dual-Vt allocation is not taken into account during conventional time budgeting. Bounded potential slack is introduced as a measure of dual-Vt allocation, and is experimentally shown to be strongly correlated with the percentage of high-Vt gates. A new time budgeting method is proposed with the objective of achieving bounded potential slack, which is formulated as a linear programming problem. In experiments with example hierarchical designs implemented in a 45-nm commercial technology, the proposed time budgeting reduced leakage power by 32% on average compared to conventional time budgeting, when both are followed by the same dual-Vt allocation. The time budgeting is also applied to voltage island design, where each block can have its own Vdd with a mix of high- and low-Vt gates. |
Slides |
Title | Dynamic Power Estimation for Deep Submicron Circuits with Process Variation |
Author | Quang Dinh, *Deming Chen, Martin Wong (UIUC, U.S.A.) |
Page | pp. 587 - 592 |
Keyword | Dynamic Power, Process Variation, Statistical Design, Power Estimation |
Abstract | Dynamic power consumption in CMOS circuits is usually estimated based on the number of signal transitions. However, when considering glitches, this is not accurate because narrow glitches consume less power than wide glitches. Glitch width and transition density modeling is further complicated by the effect of process variation. This paper presents a fast and accurate dynamic power estimation method that considers the detailed effect of process variation. First, we extend the probabilistic modeling approach to handle timing variations. Then the power consumption of a logic gate is computed based on the transition waveforms of its inputs. Both mean values and standard deviations of the dynamic power are estimated with high confidence based on accurate device characterization data. Compared with SPICE-based Monte Carlo simulations for small circuits, our power estimator reports power results within 3% error for the mean and 5% error for the standard deviation with six orders of magnitude speedup. For medium and large benchmarks, it is impossible to run Monte Carlo simulations with enough samples due to very long runtime, while our estimator can finish within minutes. |
Slides |
Title | Runtime Temperature-Based Power Estimation for Optimizing Throughput of Thermal-Constrained Multi-Core Processors |
Author | Dongkeun Oh, Nam Sung Kim, Yu Hen Hu (University of Wisconsin, U.S.A.), *Charlie Chung Ping Chen (National Taiwan University, Taiwan), Azadeh Davoodi (University of Wisconsin, U.S.A.) |
Page | pp. 593 - 599 |
Keyword | thermal sensor, optimization |
Abstract | Technology scaling has allowed integration of multiple cores into a single die. However, the high power consumption of each core leads to very high heat density, limiting the throughput of thermal-constrained multi-core processors. To maximize the throughput, various software-based dynamic thermal management and optimization techniques have been proposed, many of which depend on accurate temperature sensing of each core. However, according to our investigation, basing dynamic thermal management and throughput optimization decisions only on the temperature of each core can result in suboptimal throughput in certain circumstances. In this paper, we propose 1) a dynamic power estimation method using a single thermal sensor for each core in multi-core processors, 2) a die temperature reconstruction method using the estimated power, and 3) a throughput optimization method based on the estimated power instead of the temperature. According to our experiment using 90nm technology, the proposed method results in less than 3% error in estimating the power and hot-spot temperature of a multi-core processor. Furthermore, the proposed throughput optimization method based on the estimated power leads to up to 4% higher throughput than a temperature-based optimization method. |
Title | Managing Verification Error Traces with Bounded Model Debugging |
Author | *Sean Safarpour (Vennsa Technologies, Canada), Andreas Veneris, Farid Najm (University of Toronto, Canada) |
Page | pp. 601 - 606 |
Keyword | debugging, verification, simulation, formal |
Abstract | Managing long verification error traces is one of the key challenges of automated debugging engines. Today, debuggers rely on the iterative logic array to model sequential behavior, which drastically limits their application. This work presents Bounded Model Debugging, an iterative, systematic and practical methodology to allow debuggers to tackle larger problems than previously possible. Based on the empirical observation that errors are excited in temporal proximity of the observed failures, we present a framework that improves performance by up to two orders of magnitude and solves 2.7× more problems than a conventional debugger. |
Slides |
Title | Automatic Assertion Extraction via Sequential Data Mining of Simulation Traces |
Author | *Po-Hsien Chang, Li-C. Wang (University of California, Santa Barbara, U.S.A.) |
Page | pp. 607 - 612 |
Keyword | Verification, Simulation traces, Data mining |
Abstract | This paper studies the problem of automatic assertion extraction at the input boundary of a given unit embedded in a system. This paper proposes a data mining approach that analyzes simulation traces to extract the assertions. We borrow two key concepts from sequential data mining and develop an effective assertion extraction approach specific to our problem. These two concepts are (1) the sliding-window-based episode definition that decides the space of all potential assertions and (2) the Support-Confidence framework that evaluates the meaningfulness of potential assertions using a given simulation trace. We implement the approach in a system simulation environment built on the AMBA 2.0 standard. Experimental results demonstrate the feasibility of the proposed approach, and the validity of the extracted assertions is verified by comparing them to the transactions defined in the specification. |
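The Support-Confidence idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the trace encoding (one set of active signal names per cycle), the signal names `req` and `gnt`, the function name, and the exact definitions of support and confidence are all assumptions, borrowed from the common association-rule formulation.

```python
from typing import List, Set, Tuple


def support_confidence(
    trace: List[Set[str]], antecedent: str, consequent: str, window: int
) -> Tuple[float, float]:
    """Score the candidate rule "antecedent is followed by consequent
    within `window` cycles" against a simulation trace.

    Assumed definitions (hypothetical, for illustration only):
      support    = matching occurrences / trace length
      confidence = matching occurrences / occurrences of the antecedent
    """
    n_antecedent = 0
    n_both = 0
    for t, events in enumerate(trace):
        if antecedent in events:
            n_antecedent += 1
            # Sliding window: look at the next `window` cycles only.
            later = trace[t + 1 : t + 1 + window]
            if any(consequent in e for e in later):
                n_both += 1
    support = n_both / len(trace) if trace else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical 8-cycle trace with a request/grant handshake.
trace = [{"req"}, {"gnt"}, set(), {"req"}, set(), {"gnt"}, {"req"}, set()]
support, confidence = support_confidence(trace, "req", "gnt", window=2)
```

In this toy trace, two of the three requests are granted within the window, so a rule with a high confidence threshold would be rejected while a looser one would be kept as a candidate assertion.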
Title | Automatic Constraint Generation for Guided Random Simulation |
Author | *Hu-Hsi Yeh, Chung-Yang (Ric) Huang (National Taiwan University, Taiwan) |
Page | pp. 613 - 618 |
Keyword | constraint, simulation, verification |
Abstract | In this paper, we propose an Automatic Target Constraint Generation (ATCG) technique to automatically generate compact and high-quality constraints for the guided random simulation environment. Our objective is to tackle the biggest bottleneck of the entire constrained random simulation process: the time-consuming and error-prone manual testbench composition process. By taking only the design under verification and simulation coverage as our inputs, our automatic constraint generation technique can successfully generate just a few key constraints while achieving very high simulation coverage. Our experimental results show that the proposed approach can outperform both directed and random simulations in both coverage and simulation runtime for a variety of designs. |
Title | A Method for Debugging of Pipelined Processors in Formal Verification by Correspondence Checking |
Author | *Miroslav Velev, Ping Gao (Aries Design Automation, U.S.A.) |
Page | pp. 619 - 624 |
Keyword | formal verification, debugging, pipelined processors, correspondence checking, SAT |
Abstract | Presented is a method for debugging of pipelined processors in their formal verification with the highly automatic and scalable approach of Correspondence Checking, where a pipelined/superscalar/VLIW implementation is compared against a non-pipelined specification via an inductive correctness criterion based on symbolic simulation in a way that guarantees the correctness of the implementation for all possible execution scenarios. The benefit from the proposed method increases with the complexity of the processor under formal verification. For a 12-stage VLIW processor that imitates the Intel Itanium in many features, the method reduced the size of the EUFM correctness formulas from buggy processors by up to an order of magnitude, the number of Boolean variables in the equivalent propositional correctness formulas and the number of 1s in the counterexample traces by up to 2 orders of magnitude, and resulted in an average speedup in detecting the bugs of 2 orders of magnitude, thus increasing the productivity of the processor designers. |
Title | (Invited Paper) Resilient Design in Scaled CMOS for Energy Efficiency |
Author | James Tschanz, Keith Bowman, Muhammad Khellah, Chris Wilkerson, Bibiche Geuskens, Dinesh Somasekhar, Arijit Raychowdhury, Jaydeep Kulkarni, Carlos Tokunaga, Shih-Lien Lu, Tanay Karnik, *Vivek K. De (Intel Corporation, U.S.A.) |
Page | p. 625 |
Keyword | Resiliency, Variations |
Abstract | Opportunities for resiliency to improve energy efficiency of processor designs in scaled CMOS technologies are discussed. |
Slides |
Title | (Invited Paper) Benefits and Barriers to Probabilistic Design |
Author | *Siva Narendra (Tyfone, Inc./Portland State University, U.S.A.) |
Page | pp. 626 - 627 |
Keyword | CMOS variation and leakage, Probabilistic Design, Adaptive design, Stochastic design |
Abstract | The undisputed increase in IOFF and large variations in IOFF, combined with an ION/IOFF ratio approaching unity, lead to transistors that become increasingly unreliable and unpredictable switches. A more comprehensive case is made for why there are therefore compelling benefits to design based on probabilistic methods. However, given the maturity of our industry, it is more likely that this will be realized in evolutionary steps toward such an ultimate change in our design and thought process. While this is a barrier to revolutionary innovation, it is likely the best possible mode for widespread commercial success. A potential three-phase evolutionary roadmap toward that ultimate true probabilistic design is presented as well. |
Title | (Invited Paper) A Probabilistic Boolean Logic for Energy Efficient Circuit and System Design |
Author | Lakshmi N. B. Chakrapani (Rice University, U.S.A.), *Krishna Palem (Rice University/Nanyang Technological University, U.S.A.) |
Page | pp. 628 - 635 |
Abstract | We introduce probabilistic design, a methodology to design circuits using gates with probabilistic behavior. Probabilistic design is of great value, since the international technology roadmap for semiconductors (ITRS) forecasts that devices and interconnect are likely to suffer from frequent transient and permanent failures, as a consequence of technology scaling. We first provide the theoretical basis for probabilistic design, rooted in a novel Probabilistic Boolean Logic (PBL). By combining the properties of PBL with the properties of noise susceptible CMOS devices, we derive design principles and demonstrate that probabilistic design is a viable methodology to design circuits using gates with probabilistic behavior, which has been shown to be a useful approach for implementing ultra low-energy circuit designs. |
Title | (Panel Discussion) Dependable Silicon Design with Unreliable Components |
Author | Organizer & Moderator: Vincent Mooney (Georgia Institute of Technology/Nanyang Technological University, U.S.A.), Panelists: Vivek K. De (Intel Corporation, U.S.A.), Siva Narendra (Tyfone, Inc., U.S.A.), Krishna Palem (Rice University/Nanyang Technological University, U.S.A.) |
Title | A New Graph-theoretic, Multi-objective Layout Decomposition Framework for Double Patterning Lithography |
Author | *Jae-Seok Yang (University of Texas at Austin, U.S.A.), Katrina Lu (Intel, U.S.A.), MinSik Cho (IBM Research, U.S.A.), Kun Yuan, David Z. Pan (University of Texas at Austin, U.S.A.) |
Page | pp. 637 - 644 |
Keyword | Double patterning lithography, Decomposition, Overlay, Balanced density, min-cut partitioning |
Abstract | As Double Patterning Lithography (DPL) becomes the leading candidate for the sub-30nm lithography process, we need a fast and lithography-friendly decomposition framework. In this paper, we propose a multi-objective min-cut based decomposition framework that simultaneously addresses stitch minimization, balanced density, and overlay compensation. The key challenge of DPL is to accomplish high-quality decomposition for large-scale layouts in reasonable runtime with the following objectives: a) the number of stitches is minimized, b) the balance between the two decomposed layers is maximized for further enhanced patterning, c) the impact of overlay on coupling capacitance is reduced for less timing variation. We use a graph-theoretic algorithm for minimum stitch insertion and balanced density. Additional decomposition constraints for self-overlay compensation are obtained by integer linear programming (ILP). With these constraints, global decomposition is executed by our modified FM graph partitioning algorithm. Experimental results show that the proposed framework is highly scalable and fast: we can decompose all 15 benchmark circuits in five minutes in a density-balanced fashion, while an ILP-based approach can finish only the smallest five circuits. In addition, we can remove more than 95% of the timing variation induced by overlay for tested structures. |
Slides |
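As background for the decomposition problem in this abstract, the sketch below shows the simplest textbook view of double patterning: assigning layout features to two masks by 2-coloring a conflict graph. This is a swapped-in illustration, not the paper's min-cut framework; it ignores stitches, density balance, and overlay, and the function name and the integer feature ids are assumptions.

```python
from collections import deque
from typing import List, Optional, Tuple


def two_color(num_features: int, conflicts: List[Tuple[int, int]]) -> Optional[List[int]]:
    """Assign each feature to mask 0 or mask 1 so that conflicting
    features (those spaced too closely to share a mask) are separated.

    Returns None when the conflict graph has an odd cycle, i.e. when
    no stitch-free two-mask assignment exists.
    """
    adj: List[List[int]] = [[] for _ in range(num_features)]
    for a, b in conflicts:
        adj[a].append(b)
        adj[b].append(a)

    mask: List[Optional[int]] = [None] * num_features
    for start in range(num_features):
        if mask[start] is not None:
            continue
        # Breadth-first search over one connected component.
        mask[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if mask[v] is None:
                    mask[v] = 1 - mask[u]
                    queue.append(v)
                elif mask[v] == mask[u]:
                    return None  # odd cycle: a stitch would be needed
    return mask


chain = two_color(4, [(0, 1), (1, 2), (2, 3)])
triangle = two_color(3, [(0, 1), (1, 2), (2, 0)])
```

A chain of four features alternates between the two masks, while a triangle of mutual conflicts cannot be decomposed without splitting a feature, which is where stitch insertion comes in.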
Title | A Robust Pixel-Based RET Optimization Algorithm Independent of Initial Conditions |
Author | *Jinyu Zhang (Institute of Microelectronics, Tsinghua University, China), Min-Chun Tsai (Brion Technology, U.S.A.), Wei Xiong, Yan Wang, Zhiping Yu (Institute of Microelectronics, Tsinghua University, China) |
Page | pp. 645 - 650 |
Keyword | lithography, ILT, optimization, Initial condition |
Abstract | A robust pixel-based optimization algorithm is proposed for mask synthesis in inverse lithography technology (ILT) to improve the resolution and pattern fidelity in optical lithography. Results show that the final image fidelity is almost independent of the initial condition. To demonstrate the robustness of the algorithm, six typical desired mask patterns and two mask technologies are applied in mask synthesis optimization using 100 randomly generated initial conditions. The critical dimension (CD) is 60nm and a partial-coherence imaging system is applied. It is found that the final edge placement error (EPE) and iteration number depend only weakly on the initial conditions. Good final image fidelity can be acquired using arbitrary initial conditions. This algorithm is several orders of magnitude faster and more effective than other gradient-based algorithms and simulated annealing algorithms. |
Title | A New Method to Improve Accuracy of Parasitics Extraction Considering Sub-wavelength Lithography Effects |
Author | *Kuen-Yu Tsai, Wei-Jhih Hsieh, Yuan-Ching Lu, Bo-Sen Chang, Sheng-Wei Chien, Yi-Chang Lu (National Taiwan University, Taiwan) |
Page | pp. 651 - 656 |
Keyword | Lithography, proximity effect, resolution enhancement technique, parasitics extraction, layout parameter extraction |
Abstract | Modern nanometer integrated circuits are patterned by sub-wavelength lithography with significant shape deviation from drawn layouts. Full-chip parasitics extraction faces new challenges since shape distortions such as line end rounding and corner rounding cannot be accurately characterized by existing layout parameter extraction (LPE) techniques which assume perfect polygons. A new LPE method and efficient shape approximation algorithms are proposed to account for the shape distortions. Preliminary results verified by field solver simulations indicate that accuracy of parasitics extraction can be significantly improved. |
Slides |
Title | Dead Via Minimization by Simultaneous Routing and Redundant Via Insertion |
Author | *Chih-Ta Lin, Yen-Hung Lin, Guan-Chan Su, Yih-Lang Li (National Chiao Tung University, Taiwan) |
Page | pp. 657 - 662 |
Keyword | redundant via, track assignment, detailed routing |
Abstract | While via failure significantly contributes to yield loss during manufacturing, post-routing redundant via insertion method is the conventional means of reducing the via failure rate, but only alive vias can be protected. As existing dead vias still lower manufacturing yield, identifying a routing result with fewer dead vias can increase the redundant via insertion rate, subsequently enhancing the yield of chips. This work presents, for the first time, a redundant-via-aware routing system to retain redundant via resources in track assignment, in which redundant vias are inserted in detailed routing. The proposed via prediction scheme performs trial route using L-shaped patterns to estimate via positions. Meanwhile, the proposed redundant-via-aware detailed router gradually relaxes the limitation on the number of generated dead vias during path searching to minimize the number of dead vias. Experimental results indicate that the proposed redundant-via-aware routing system is, to our knowledge, the first routing system that can achieve 100% redundant via insertion rate with all MCNC benchmark circuits. |
Title | Statistical Timing Verification for Transparently Latched Circuits through Structural Graph Traversal |
Author | *Xingliang Yuan, Jia Wang (Illinois Institute of Technology, U.S.A.) |
Page | pp. 663 - 668 |
Keyword | SSTA, latch, polynomial algorithm |
Abstract | Level-sensitive transparent latches are widely used in high-performance sequential circuit designs. Under process variations, the timing of a transparently latched circuit will adapt random delays at runtime due to time borrowing. The central problem in determining the timing yield is to compute the probability of the presence of a positive cycle in the latest latch timing graph. Existing algorithms are either optimistic, since cycles are omitted, or require iterations that cannot be polynomially bounded. In this paper, we present the first algorithm to compute this probability based on block-based statistical timing analysis that, first, covers all cycles through a structural graph traversal, and second, terminates within a polynomial number of statistical "sum" and "max" operations. Experimental results confirm that the proposed approach is effective and efficient. |
Slides |
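The positive-cycle condition named in this abstract has a well-known deterministic counterpart, sketched below in Python. It is not the paper's statistical algorithm: edge weights here are fixed numbers rather than random variables, and the function name and the edge-list encoding are assumptions. The sketch uses Bellman-Ford-style longest-path relaxation, in which plain addition and `max` play the roles that the statistical "sum" and "max" operations play in the paper.

```python
from typing import List, Tuple


def has_positive_cycle(num_nodes: int, edges: List[Tuple[int, int, float]]) -> bool:
    """Return True if the directed graph contains a cycle whose edge
    weights sum to a positive value.

    Every node starts at distance 0, which is equivalent to adding a
    virtual source with a zero-weight edge to each node. Without a
    positive cycle, longest-path values settle within num_nodes - 1
    rounds, so a change in round num_nodes proves a positive cycle.
    """
    if num_nodes == 0:
        return False
    dist = [0.0] * num_nodes
    for _ in range(num_nodes):
        changed = False
        for u, v, w in edges:
            if dist[u] + w > dist[v]:  # deterministic "sum" then "max"
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False  # values converged: no positive cycle
    return True  # still improving after num_nodes rounds


# Hypothetical three-latch loops; weights stand in for timing slacks.
violating = has_positive_cycle(3, [(0, 1, 2.0), (1, 2, -1.0), (2, 0, -0.5)])
safe = has_positive_cycle(3, [(0, 1, 2.0), (1, 2, -1.0), (2, 0, -1.5)])
```

The first loop sums to +0.5 and is flagged; the second sums to -0.5 and converges after two rounds. In the statistical setting the question becomes the probability of such a cycle rather than a yes/no answer.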
Title | A Unified Multi-Corner Multi-Mode Static Timing Analysis Engine |
Author | Jing Jia Nian, *Shih Heng Tsai, Chung Yang (Ric) Huang (GIEE, National Taiwan University, Taiwan) |
Page | pp. 669 - 674 |
Keyword | timing analysis, process variation, corner-based |
Abstract | In this paper, we propose a unified Multi-Corner Multi-Mode (MCMM) static timing analysis (STA) engine that can efficiently compute the worst-case delay over the process corners in various very-large-scale circuits. Our key contributions include: (1) a seamless integration of the path- and parameter-based branch-and-bound algorithms so that the engine is very robust for different kinds of circuits, (2) an improved search space pruning technique, and (3) a simple yet efficient critical path delay bound for the initial search space pruning. Our experimental results show that our engine can significantly outperform the prior MCMM STA approaches on various benchmark circuits with different numbers of process parameters. |
Slides |
Title | Statistical Time Borrowing for Pulsed-Latch Circuit Designs |
Author | *Seungwhun Paik, Lee-eun Yu, Youngsoo Shin (KAIST, Republic of Korea) |
Page | pp. 675 - 680 |
Keyword | pulsed-latch, time borrowing, statistical approach |
Abstract | Pulsed-latch inherits the advantage of latch in less sequencing overhead while taking the advantage of flip-flop in its convenience during timing analysis. Even though this advantage comes from the fact that pulsed-latch uses a short pulse, it is still capable of a small amount of time borrowing. A problem of allocating pulse width (out of a few predefined widths), where each width is modeled by a random variable, is formulated for minimizing the clock period of pulsed-latch circuits; this is equivalent to assigning a random variable that represents the amount of time borrowed by the combinational block between each latch pair. A statistical approach is important in this problem because assuming +3σ of all pulse widths does not represent the worst case. An allocation algorithm called SPWA as well as an algorithm to compute timing yield is proposed. In experiments with 45-nm technology, compared to the case of no time borrowing, the clock period was reduced by 12.2% and 11.7% on average when the yield constraint Yc is 0.85 and 0.95, respectively; this is compared to the deterministic counterpart called DPWA, which reduced the clock period by 7.6% and 7.3%. More importantly, DPWA failed to satisfy the yield constraints in four (out of eleven) circuits while the yield constraints were always satisfied in SPWA. |
Title | Design Time Body Bias Selection for Parametric Yield Improvement |
Author | *Cheng Zhuo, Yung-Hsu Chang, Dennis Sylvester, David Blaauw (Univ. of Michigan, Ann Arbor, U.S.A.) |
Page | pp. 681 - 688 |
Keyword | body bias, power, delay, optimization, design time |
Abstract | Circuits designed in aggressively scaled technologies face both stringent power constraints and increased process variability. Achieving high parametric yield is a key design objective, but is complicated by the correlation between power and performance. This paper proposes a novel design time body bias selection framework for parametric yield optimization while reducing testing costs. The framework considers both inter- and intra-die variations as well as power-performance correlations. This approach uses a feature extraction technique to explore the underlying similarity between the gates for effective clustering. Once the gates are clustered, a Gaussian quadrature based model is applied for fast yield analysis and optimization. This work also introduces an incremental method for statistical power computation to further reduce the optimization complexity. The proposed framework improves parametric yield from 39% to 80% on average for 11 benchmark circuits while runtime is linear with circuit size and on the order of minutes for designs with up to 15K gates. |
Title | Minimizing Leakage Power in Aging-Bounded High-level Synthesis with Design Time Multi-Vth Assignment |
Author | *Yibo Chen (Penn State University, U.S.A.), Yu Wang (Tsinghua University, China), Yuan Xie (Penn State University, U.S.A.), Andres Takach (Mentor Graphics Corporation, U.S.A.) |
Page | pp. 689 - 694 |
Keyword | high-level synthesis, aging, leakage power |
Abstract | Aging effects (such as Negative Bias Temperature Instability (NBTI)) can cause the temporal degradation of the threshold voltage of transistors, and have become major reliability concerns for deep-sub-micron (DSM) designs. Meanwhile, leakage power dissipation becomes dominant in total power as technology scales. While multi-threshold voltage assignment has been shown to be an effective way to reduce leakage, NBTI-degradation rates vary with the initial threshold voltage assignment, which motivates the co-optimization of leakage reduction and NBTI mitigation. This paper minimizes leakage power during high-level synthesis of circuits with bounded delay degradation (thus guaranteed lifetime), using multi-Vth resource libraries. We first propose a fast evaluation approach for NBTI-induced degradation of architectural function units, and multi-Vth resource libraries are built with degradation characterized for each function unit. We then propose an aging-bounded high-level synthesis framework, within which the degraded delays are used to guide the synthesis, and leakage power is optimized through the proposed aging-aware resource rebinding algorithm. Experimental results show that the proposed techniques can effectively reduce the leakage power, with an extra 26% leakage reduction compared to the traditional aging-unaware multi-Vth assignment approach. |
Title | A Global Interconnect Reduction Technique during High Level Synthesis |
Author | Taemin Kim (Department of Computer Science, University of California, Los Angeles, U.S.A.), *Xun Liu (North Carolina State University, U.S.A.) |
Page | pp. 695 - 700 |
Keyword | High-level synthesis, Global Interconnect, Binding algorithm |
Abstract | In this paper, we propose an interconnect binding algorithm during high-level synthesis for global interconnect reduction. Our scheme is based on the observation that not all functional units (FUs) operate all the time. When idle, FUs can be reconfigured as pass-through logic for data transfer, reducing the interconnect requirement. Our algorithm formulates the interconnect reduction problem as a modified min-cost max-flow problem. It not only reduces the overall length of global interconnects but also minimizes the power overhead without introducing any timing violations. Experimental results show that, for a suite of digital processing benchmark circuits, our algorithm reduces global interconnects by 8.5% on average in comparison to previously proposed schemes. It further lowers the overall design power by 4.8%. |
Slides |
Title | Incremental High-Level Synthesis |
Author | Luciano Lavagno (Cadence Design Systems, U.S.A.), Mototsugu Fujii (Renesas Technology Corp., Japan), Alex Kondratyev (Cadence Design Systems, U.S.A.), Noriyasu Nakayama (Fujitsu Advanced Technologies, Japan), Mitsuru Tatesawa (Renesas Technology Corp., Japan), Yosinori Watanabe (Cadence Design Systems, U.S.A.), *Qiang Zhu (Cadence Design Systems, Japan) |
Page | pp. 701 - 706 |
Keyword | ECO, high-level synthesis, incremental synthesis |
Abstract | The widespread acceptance of high-level synthesis as a mainstream tool mostly depends on its tight integration with the RTL-to-GDSII design flow that follows it. A key aspect is the handling of so-called Engineering Change Orders (ECOs), i.e. minor changes required to fix small functional bugs or meet performance requirements late in the design cycle. Traditional high-level synthesis has attempted to optimize the output logic as well as possible. However, in the ECO scenario the goal is to implement the required change with as few modifications as possible to the RTL, logic netlist, placed netlist and layout. In this paper we show how, by judiciously changing the internal databases used by the tool to match the original design as much as possible, one can achieve minimal impact and implement ECOs in truly incremental mode, while full-blown re-synthesis would lead to massive unnecessary downstream changes. The tool essentially matches source constructs between the original and the ECO design, and copies as many synthesis decisions as possible from the original design to the ECO design. |
Title | A High-Level Synthesis Flow for Custom Instruction Set Extensions for Application-Specific Processors |
Author | Nagaraju Pothineni (Google, India), *Philip Brisk, Paolo Ienne (EPFL, Switzerland), Anshul Kumar, Kolin Paul (Indian Institute of Technology, Delhi, India) |
Page | pp. 707 - 712 |
Keyword | Synthesis, Instruction set extension |
Abstract | Custom instruction set extensions (ISEs) are added to an extensible base processor to provide application-specific functionality at a low cost. As only one ISE executes at a time, resources can be shared. This paper presents a new high-level synthesis flow targeting ISEs. We emphasize a new technique for resource allocation, binding, and port assignment during synthesis. Our method is derived from prior work on datapath merging, and increases area reduction by accounting for the cost of multiplexors that must be inserted into the resulting datapath to achieve multi-operational functionality. |
Slides |
Title | (Invited Paper) Computer-aided Recoding for Multi-core Systems |
Author | *Rainer Doemer (University of California, Irvine, U.S.A.) |
Page | pp. 713 - 716 |
Keyword | Embedded Systems, System Design, Modeling, Recoding |
Abstract | The design of embedded computing systems faces a serious productivity gap due to the increasing complexity of their hardware and software components. One solution to address this problem is the modeling at higher levels of abstraction. However, manually writing proper executable system models is challenging, error-prone, and very time-consuming. We aim to automate critical coding tasks in the creation of system models. This paper outlines a novel modeling technique called computer-aided recoding which automates the process of writing abstract models of embedded systems by use of advanced computer-aided design (CAD) techniques. Using an interactive, designer-controlled approach with automated source code transformations, our computer-aided recoding technique derives an executable parallel system model directly from available sequential reference code. Specifically, we describe three sets of source code transformations that create structural hierarchy, expose potential parallelism, and create explicit communication and synchronization. As a result, system modeling is significantly streamlined. Our experimental results demonstrate the shortened design time and higher productivity. |
Slides |
Title | (Invited Paper) TLM Automation for Multi-core Design |
Author | *Samar Abdi (Concordia University, Canada) |
Page | pp. 717 - 724 |
Keyword | transaction level modeling, multi-core design, embedded systems |
Abstract | Transaction Level Models (TLMs) are being increasingly used by multi-core system designers for design validation and embedded SW development. However, with well defined modeling semantics and TLM automation tools, it is also possible to use TLMs for multi-core design. This paper presents recent research in automatic generation of timed TLMs for early, yet reliable, evaluation of multi-core design decisions. The TLMs are automatically generated from a given mapping of a concurrent application to a multi-core platform. The application code is annotated with delays at the basic-block level of granularity. Similarly, the platform services, such as communication and scheduling, also include timing delays. The TLM automation methods have been implemented in the Embedded System Environment (ESE) toolset. Our experimental results with ESE demonstrate that multi-core TLMs can be generated in the order of seconds; they simulate close to host-compiled application execution speed, and are more than 90% accurate compared to board measurements on average for industrial size examples. Therefore, TLM automation enables early and reliable evaluation of multi-core design decisions. |
Slides |
Title | (Invited Paper) Platform Modeling for Exploration and Synthesis |
Author | *Andreas Gerstlauer (University of Texas, Austin, U.S.A.), Gunar Schirner (Northeastern University, Boston, U.S.A.) |
Page | pp. 725 - 731 |
Keyword | System-level design, Modeling, TLM, ESL |
Abstract | Ever increasing complexity and heterogeneity of system platforms drive the need for a move to higher levels of abstraction accompanied by corresponding design automation tools. Well-defined design models are the basis for any automated flow. In this paper, we present an overview and taxonomy of platform modeling at various levels. Experiments demonstrate the benefits of fast yet accurate intermediate models at varying levels for rapid, early design space exploration. Furthermore, paired with automatic model generation and hardware/software synthesis, an automated path from specification to implementation becomes possible. |
Slides |
Title | (Invited Paper) Application of ESL Synthesis on GSM Edge Algorithm for Base Station |
Author | *Alan P. Su (Global Unichip, Taiwan) |
Page | pp. 732 - 737 |
Keyword | ESL, ESL Synthesis, hardware-software codesign, hardware-software partitioning, Genetic Algorithm |
Abstract | Electronic System Level (ESL) design methodology has been widely adopted in SoC design, especially for designs with multiple cores. High level synthesis is now becoming a standard tool in the ESL design flow. People use the term ESL Synthesis to suggest the solution for multicore system synthesis. In this paper we argue that ESL Synthesis is architecture synthesis, high level synthesis and software synthesis combined. A multicore architecture synthesis algorithm has been implemented and proven in experimental industrial use. We successfully synthesized the target application, a GSM Edge algorithm for base station, into single and multicore systems. With this experience we developed a theory of how high level synthesis and software synthesis should work with architecture synthesis to perform the task of ESL synthesis. Possible future research directions inspired by this work are also proposed. Key contributions of this work are (1) a user-defined cost function mechanism, (2) a warranted convergence mechanism and (3) the combination of these two mechanisms to waive the need for a universal cost function. |
Slides |
Title | Analyzing Electrical Effects of RTA-driven Local Anneal Temperature Variation |
Author | *Vivek Joshi (University of Michigan, U.S.A.), Kanak Agarwal (IBM Austin Research Lab, U.S.A.), Dennis Sylvester, David Blaauw (University of Michigan, U.S.A.) |
Page | pp. 739 - 744 |
Keyword | RTA, thermal analysis, Performance, Leakage |
Abstract | Suppressing device leakage while maximizing drive current is the prime focus of the semiconductor industry. Rapid Thermal Annealing (RTA) drives process development on this front by enabling fabrication steps such as shallow junction formation that require a low thermal budget. However, the decrease in junction anneal time for more aggressive device scaling has reduced the characteristic thermal length to dimensions less than the typical die size. Also, the amount of heat transferred, and hence the local anneal temperature, is affected by the layout pattern dependence of optical properties in a region. This variation in local anneal temperature causes a variation in performance and leakage across the chip by affecting the threshold voltage (Vth) and extrinsic transistor resistance (Rext). In this work, we propose a new local anneal temperature variation aware analysis framework which incorporates the effect of RTA induced temperature variation into timing and leakage analysis. We solve for chip level anneal temperature distribution, and employ TCAD based device level models for drive current (Ion) and leakage current (Ioff) dependence on anneal temperature variation, to capture the variation in device performance and leakage based on its position in the layout. Experimental results based on a 45nm experimental test chip show anneal temperature variations of up to 10.5°C, which result in ~6.8% variation in device performance and 2.45X variation in device leakage across the chip. The corresponding variation in inverter delay was found to be ~7.3%. The temperature variation for a 65nm test chip was found to be ~8.65°C. |
Slides |
Title | Physical Design Techniques for Optimizing RTA-induced Variations |
Author | Yaoguang Wei (University of Minnesota, U.S.A.), Jiang Hu (Texas A&M University, U.S.A.), Frank Liu (IBM Austin Research Lab, U.S.A.), *Sachin Sapatnekar (University of Minnesota, U.S.A.) |
Page | pp. 745 - 750 |
Keyword | rapid thermal annealing, dummy fill, floorplanning |
Abstract | At 65nm and below, Rapid Thermal Annealing (RTA) makes a significant contribution to manufacturing process variations, degrading the parametric yield. RTA-induced variability strongly depends on circuit layout patterns, particularly the distribution of the density of the Shallow Trench Isolation (STI) regions. In this work, we investigate a two-step approach to reduce the impact of RTA-induced variations. We first solve a floorplanning problem that aims to reduce the RTA variations by evening out the STI density distribution. Next, we insert dummy polysilicon fills to further improve the uniformity of the STI density. Experimental results show that our floorplanner can reduce the global RTA variations by 39% and the local variations by 29% on average with low overhead compared to a traditional floorplanner, and the proposed dummy fill algorithm can further reduce the RTA variations to negligible amounts. Moreover, when inserting dummy fills, for the layouts obtained by our floorplanner, on average, 24% fewer dummy polysilicon fills are inserted, as compared to the results from a traditional floorplanner. |
Slides |
Title | On Confidence in Characterization and Application of Variation Models |
Author | Lerong Cheng, Puneet Gupta, *Lei He (UCLA, U.S.A.) |
Page | pp. 751 - 756 |
Keyword | variation, confidence interval |
Abstract | In this paper we study statistics of statistics. Due to the limited number of samples (especially in the case of lot-to-lot variation), calibrated models have a low degree of confidence. The problem is further exacerbated when production volumes are low (< 65 lots), causing additional loss of confidence in the statistical analysis (since production only sees a small snapshot of the entire distribution). We mathematically derive the confidence intervals for commonly used statistical measures (mean, variance, percentile corner) and analyses (SPICE corner extraction, statistical timing). Our estimates are within 2% of simulated confidence values. Our experiments (with variability assumptions derived from test silicon data from a 45nm industrial process) indicate that for moderate characterization volumes (10 lots) and low-to-medium production volumes (15 lots), a significant guardband (e.g., 34.7% of standard deviation for single parameter corner, 38.7% of standard deviation for SPICE corner, and 52% of standard deviation for 95%-tile point of circuit delay) is needed to ensure 95% confidence in the results. The guardbands are non-negligible for all cases when either production or characterization volume is not large. The proposed methods are not runtime-intensive (always within 10s), as they require Monte-Carlo simulations on closed-form expressions. |
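The loss of confidence from small sample counts can be illustrated with a minimal Python sketch. It is not the paper's derivation: it assumes a single Gaussian-distributed parameter and uses plain Monte Carlo resampling to see how far a standard deviation estimated from a handful of lots can stray from the true value. The function names and the choice of 10 and 40 lots are illustrative assumptions.

```python
import math
import random


def sample_std(values):
    """Unbiased-variance sample standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def sigma_ratio_interval(n_lots, confidence=0.95, trials=4000, seed=0):
    """Monte Carlo interval for (estimated sigma / true sigma).

    Assumption: the lot-level parameter is Gaussian with true sigma = 1,
    so the returned bounds are directly relative to the true sigma.
    """
    rng = random.Random(seed)
    ratios = sorted(
        sample_std([rng.gauss(0.0, 1.0) for _ in range(n_lots)])
        for _ in range(trials)
    )
    lo_idx = int((1.0 - confidence) / 2.0 * trials)
    hi_idx = min(trials - 1, int((1.0 + confidence) / 2.0 * trials))
    return ratios[lo_idx], ratios[hi_idx]


if __name__ == "__main__":
    for lots in (10, 40):
        lo, hi = sigma_ratio_interval(lots)
        print(f"{lots} lots: estimated sigma lies in [{lo:.2f}, {hi:.2f}] x true sigma")
```

With few lots the interval is wide, which is the intuition behind the guardbands discussed in the abstract; the interval narrows as the number of lots grows.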
Title | Incremental Solution of Power Grids using Random Walks |
Author | *Baktash Boghrati, Sachin S. Sapatnekar (University of Minnesota, U.S.A.) |
Page | pp. 757 - 762 |
Keyword | power grid, random walk, linear equation solver, incremental analysis |
Abstract | It is common for a designer to make multiple small changes to a power grid, corresponding to "what if" scenarios, in an attempt to improve its performance. To evaluate the effects of such an incremental change, the circuit may go through incremental analysis. This paper presents a computationally efficient method for fast and accurate incremental analysis, using random walks to identify a region of influence (RoI) of a change, so that this RoI can then be analyzed by any other solver. Our experimental results demonstrate the accuracy and computational efficiency of this method. |
Slides |
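As background for the random-walk idea, the sketch below shows the generic random-walk estimate of a single node voltage in a resistive power grid. It is not the paper's region-of-influence method. Starting from the node of interest, a walker moves to a neighbor with probability proportional to the branch conductance, pays a cost of load current over total conductance at every node it visits, and collects the pad voltage when it reaches a supply pad. The three-node chain, its unit conductances, and the 10 mA loads are made-up illustrative values.

```python
import random


def random_walk_voltage(start, neighbors, load, pad_voltage, walks=20000, seed=0):
    """Estimate the voltage at `start` by averaging random-walk payoffs.

    neighbors:   node -> list of (neighbor, conductance) for non-pad nodes
    load:        node -> current drawn at that node
    pad_voltage: node -> fixed voltage (walks terminate at these nodes)
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        node, payoff = start, 0.0
        while node not in pad_voltage:
            branches = neighbors[node]
            g_total = sum(g for _, g in branches)
            payoff -= load[node] / g_total          # cost paid at this node
            node = rng.choices(
                [n for n, _ in branches], weights=[g for _, g in branches]
            )[0]
        total += payoff + pad_voltage[node]          # reward at the pad
    return total / walks


# Hypothetical chain: pad(0) - 1 - 2 - 3, unit conductances, 10 mA per node.
NEIGHBORS = {1: [(0, 1.0), (2, 1.0)], 2: [(1, 1.0), (3, 1.0)], 3: [(2, 1.0)]}
LOAD = {1: 0.01, 2: 0.01, 3: 0.01}
PADS = {0: 1.0}

if __name__ == "__main__":
    v3 = random_walk_voltage(3, NEIGHBORS, LOAD, PADS)
    print(f"estimated V(3) = {v3:.3f} (hand-solved exact value: 0.94)")
```

For this chain the exact voltages can be solved by hand (0.97, 0.95, and 0.94 at nodes 1 to 3), which gives a reference for the estimate. Because each walk only touches the nodes it visits, the approach is naturally local, which is what makes it attractive for analyzing small changes.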
Title | Efficient Power Grid Integrity Analysis Using On-the-Fly Error Check and Reduction |
Author | Duo Li, *Sheldon Tan, Ning Mi (University of California, Riverside, U.S.A.), Yici Cai (Tsinghua University, China) |
Page | pp. 763 - 768 |
Keyword | power grid analysis, model order reduction, truncated balanced realization, IR drop analysis |
Abstract | In this paper, we present a new voltage IR drop analysis approach for large on-chip power delivery networks. The new approach is based on a recently proposed sampling-based reduction technique to reduce the circuit matrices before the simulation. Due to the disruptive nature of tap current waveforms in typical industry power grid networks, input current sources typically have a wide frequency power spectrum. To avoid excessive sampling, the new approach introduces an error check mechanism and an on-the-fly error reduction scheme during the simulation of the reduced circuits to improve the accuracy of estimating the large IR drops. The proposed method presents a new way to combine model order reduction and simulation to achieve overall simulation efficiency. The new method can also easily trade errors for speed for different applications. Experimental results show the proposed IR drop analysis method can significantly reduce the errors of the existing ETBR method at a similar computing cost, while it can achieve a speedup of 10X or more over the commercial power grid simulator in UltraSim with about 1-2% errors on a number of real industry benchmark circuits. |
Title | PS-FPG: Pattern Selection based co-design of Floorplan and Power/Ground Network with Wiring Resource Optimization |
Author | Li Li (WuHan University of Technology, China), *Yuchun Ma (Tsinghua Univ., China), Ning Xu (WuHan University of Technology, China), Yu Wang, Xianlong Hong (Tsinghua Univ., China) |
Page | pp. 769 - 774 |
Keyword | P/G network, Floorplanning, Wiring resource |
Abstract | As technology advances, the voltage (IR) drop in the Power/Ground (P/G) network becomes a serious problem in modern IC design. Co-design of the P/G network with the floorplan can improve the power design quality. Unlike traditional approaches, which analyze the P/G network during the floorplanning iterations, in this paper an efficient pattern selection method is used to provide gradient information for fast signal-integrity estimation. We also propose a novel P/G aware incremental algorithm which can intelligently fix the violations during the floorplanning process. The P/G pin assignment and wire sizing methods are adopted during the floorplanning process so that the power routing resource can be minimized with the constraints of IR drop and electromigration (EM) considered. Experimental results based on the MCNC benchmarks show that our design not only significantly speeds up the optimization process, but also optimizes the power routing resource while the quality of the floorplanning is maintained. |
Title | Gate Delay Estimation in STA under Dynamic Power Supply Noise |
Author | *Takaaki Okumura, Fumihiro Minami, Kenji Shimazaki, Kimihiko Kuwada (Semiconductor Technology Academic Research Center, Japan), Masanori Hashimoto (Osaka University, Japan) |
Page | pp. 775 - 780 |
Keyword | power, noise, timing |
Abstract | This paper presents a gate delay estimation method that takes into account dynamic power supply noise. We review STA based on static IR-drop analysis and a conventional method for dynamic noise waveform, and reveal their limitations and problems that originate from circuit structures and higher delay sensitivity to voltage in advanced technologies. We then propose a gate delay computation that overcomes the problems with iterative computations and consideration of input voltage drop. Evaluation results with various circuits and noise injection timings show that the proposed method estimates path delay fluctuation well within 2% error on average. |
Slides |
Title | Parametric Yield Driven Resource Binding in Behavioral Synthesis with Multi-Vth/Vdd Library |
Author | *Yibo Chen (Penn State University, U.S.A.), Yu Wang (Tsinghua University, China), Yuan Xie (Penn State University, U.S.A.), Andres Takach (Mentor Graphics Corporation, U.S.A.) |
Page | pp. 781 - 786 |
Keyword | high-level synthesis, parametric yield |
Abstract | The ever-increasing chip power dissipation in SoCs has imposed great challenges on today's circuit design. It has been shown that multiple threshold and supply voltages assignment (multi-Vth/Vdd) is an effective way to reduce power dissipation. However, most of the prior multi-Vth/Vdd optimizations are performed under deterministic conditions. With the increasing process variability that has significant impact on both the power dissipation and performance of circuit designs, it is necessary to employ statistical approaches in analysis and optimizations for low power. This paper studies the impact of process variations on the multi-Vth/Vdd technique at the behavioral synthesis level. A multi-Vth/Vdd resource library is characterized for delay and power variations at different voltage combinations. A parametric yield-driven resource binding algorithm is then proposed, which uses the characterized power and delay distributions and efficiently maximizes power yield under a timing yield constraint. During the resource binding process, voltage level converters are inserted between resources when required. Experimental results show that significant power reduction can be achieved with the proposed variation-aware framework, compared with traditional worst-case based deterministic approaches. |
Slides |
Title | Optimizing Blocks in an SoC Using Symbolic Code-Statement Reachability Analysis |
Author | *Hong-Zu Chou (National Taiwan University, Taiwan), Kai-Hui Chang (Avery Design Systems, U.S.A.), Sy-Yen Kuo (National Taiwan University, Taiwan) |
Page | pp. 787 - 792 |
Keyword | RTL symbolic simulation, Reachability, Synthesis, Optimization |
Abstract | Optimizing blocks in a System-on-Chip (SoC) circuit is becoming more and more important nowadays due to the use of third-party Intellectual Properties (IPs) and reused design blocks. In this paper, we propose techniques and methodologies that utilize abundant external don’t-cares that exist in an SoC environment for block optimization. Our symbolic code-statement reachability analysis can extract don’t-care conditions from constrained-random testbenches or other design blocks to identify unreachable conditional blocks in the design code. Those blocks can then be removed before logic synthesis is performed to produce smaller and more power-efficient final circuits. Our results show that we can optimize designs under different constraints and provide additional flexibility for SoC design flows. |
Slides |
Title | High Level Event Driven Thermal Estimation for Thermal Aware Task Allocation and Scheduling |
Author | *Jin Cui, Douglas L. Maskell (Nanyang Technological University, Singapore) |
Page | pp. 793 - 798 |
Keyword | Thermal-Aware Scheduling, Event Driven, High Level Estimation |
Abstract | Thermal aware scheduling (TAS) is an important system level optimization for CMP and MPSoC. An event driven thermal estimation method which can assist dynamic TAS is proposed in this paper. The event driven thermal estimation is based upon a thermal map which is updated only when a high level event occurs. To minimize the overhead while maintaining the estimation accuracy, prebuilt look-up tables and the superposition principle are used to speed up the solution of the thermal RC network. Experimental results show our method is accurate, producing thermal estimations of similar quality to existing thermal simulators, while having a considerably reduced computational complexity. Our event driven thermal estimation technique is significantly better, in terms of accuracy, than existing TAS schedulers, making it highly suitable for integration into the OS kernel. |
Slides |
Title | Mapping and Scheduling of Parallel C Applications with Ant Colony Optimization onto Heterogeneous Reconfigurable MPSoCs |
Author | Fabrizio Ferrandi, *Christian Pilato, Donatella Sciuto, Antonino Tumeo (Politecnico di Milano, Italy) |
Page | pp. 799 - 804 |
Keyword | MPSoC, Ant Colony Optimization, scheduling, mapping |
Abstract | Efficient mapping and scheduling of partitioned applications are crucial to improve the performance on today's reconfigurable multiprocessor systems-on-chip (MPSoCs) platforms. Most existing heuristics adopt the Directed Acyclic (task) Graph as representation, which, unfortunately, is not able to represent typical embedded applications (e.g., real-time and loop-partitioned). In this paper we propose a novel approach, based on Ant Colony Optimization, that explores different alternative designs to determine an efficient hardware-software partitioning, to decide the task allocation and to establish the execution order of the tasks, dealing with different design constraints imposed by a reconfigurable heterogeneous MPSoC. Moreover, it can be applied to any parallel C application, represented through Hierarchical Task Graphs. We show that our methodology, addressing a realistic target architecture, outperforms existing approaches on a representative set of embedded applications. |
Slides |
Title | (Invited Paper) -Possibility of ESL- A Software Centric System Design for Multicore SoC in the Upstream Phase |
Author | *Koichiro Yamashita (Fujitsu Laboratories Ltd., Japan) |
Page | pp. 805 - 808 |
Keyword | Multi-core system, Parallel software, SMP, Linux, Symbian, Unix and RealTime OS, Multimedia |
Abstract | For embedded systems, in which both hardware and software are rapidly advancing and expanding, there is a growing need to comprehensively and quantitatively estimate system performance at an early stage in the design process, especially for multi-core based SoCs. However, it can be difficult to estimate the performance of the actual target system using only simple estimation methods. ESL technology enables high-precision assessment of system performance, without implemented hardware, using software that runs on an OS. Applying the proposed assessment environment during upstream design of the target system makes it possible to investigate performance characteristics and bottlenecks and to avoid the risk of re-design. This will be the key issue of ESL methodology. It may be tightly related to software architecture, and it represents a different point of view from typical upstream hardware design. |
Slides |
Title | (Invited Paper) Design of Complex Image Processing Systems in ESL |
Author | Benjamin Carrion Schafer (NEC Corporation, Japan), Ashish Trambadia (NECHCL ST, Japan), *Kazutoshi Wakabayashi (NEC Corporation, Japan) |
Page | pp. 809 - 814 |
Keyword | ESL, C, synthesis, image processing, design space exploration |
Abstract | This work presents the design of a complex image processing IP developed completely in C. We present the latest advances in ESL synthesis and demonstrate its main advantages over conventional RT-level flows. In particular we focus on the ability of behavioral synthesis to shorten the design cycle, perform functional verification and quickly explore the design space, obtaining multiple dominating implementations with unique area vs. speed characteristics from an initial untimed behavioral description. A feature extraction process is presented in detail, showing how automatic design space exploration can lead to Pareto optimal (non-dominated) designs ranging from 524,648 gates to 584,868 gates and latencies of 38 to 69 state counts for the smallest and fastest design respectively, taking approximately 6.3 hours. |
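The notion of Pareto optimal (non-dominated) designs in an area versus latency trade-off can be sketched in Python as follows. This is a generic Pareto filter and not the exploration tool described in the abstract. The candidate design points are invented for illustration (area in thousands of gates, latency in state counts) and are not results from the paper.

```python
def pareto_front(designs):
    """Return the (area, latency) points that no other point dominates.

    A point dominates another if it is no worse in both objectives and
    strictly better in at least one. Smaller is better for both.
    """
    front = []
    for d in designs:
        dominated = any(
            o[0] <= d[0] and o[1] <= d[1] and (o[0] < d[0] or o[1] < d[1])
            for o in designs
        )
        if not dominated:
            front.append(d)
    return sorted(front)


# Hypothetical candidates: (area in k-gates, latency in states).
CANDIDATES = [(520, 70), (540, 55), (560, 60), (585, 38), (590, 40)]

if __name__ == "__main__":
    for area, latency in pareto_front(CANDIDATES):
        print(f"area={area}k gates, latency={latency} states")
```

In this made-up set, (560, 60) is dropped because (540, 55) is both smaller and faster, and (590, 40) is dropped because (585, 38) beats it on both axes. The remaining points each represent a distinct area versus speed trade-off.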
Title | (Invited Paper) PAC Duo System Power Estimation at ESL |
Author | *Wen-Tsan Hsieh, Jen-Chieh Yeh (Industrial Technology Research Institute, Taiwan), Shi-Yu Huang (TinnoTek Corp./National Tsing Hua Univ., Taiwan) |
Page | pp. 815 - 820 |
Keyword | ESL power estimation, ESL power analysis |
Abstract | In this work, we develop an electronic system-level (ESL) power estimation framework which uses a specified power model interface. Using the proposed power model interface, we can easily integrate various power models into an ESL virtual platform. Designers can choose either coarse-grained or fine-grained power models according to the trade-off between accuracy and computing cost. The experimental results show the proposed method can accurately estimate the system power trend immediately, compared with the traditional method. We also demonstrate the capability of system power and performance analysis in both hardware view and software view by using our approach at ESL. Meanwhile, it can be used directly for high level architecture exploration. |
Title | (Invited Paper) A Practice of ESL Verification Methodology from SystemC to FPGA -Using EPC Class-1 Generation-2 RFID Tag Design as An Example |
Author | *William Young (TSMC, Taiwan), Chua-Huang Huang (Feng Chia University, Taiwan), Alan P. Su (Global Unichip Corp., Taiwan), C. P. Jou, Fu-Lung Hsueh (TSMC, Taiwan) |
Page | pp. 821 - 824 |
Keyword | ESL, verification methodology, SystemC, FPGA |
Abstract | This paper presents the first published industrial practice (to the best of our knowledge) of reusing a high-level/C++ system simulation model through the OSCI TLM 2.0 Library to verify its corresponding RTL implementation in an FPGA. The ESL verification methodology is employed in the design regression of an EPC C1Gen2 RFID tag. A speedup of around 200 times is observed using ESL over conventional RTL simulation in regression runs (after logic bug fixes). This clearly shows ESL verification is a successful candidate for reusing a high-level test harness for IC functional verification, especially in today's increasingly complex IC design world. On top of the successful use of the ESL functional verification flow on the design, we also show the infrastructure for using the SystemC Verification Library (SCV) for formal verification. The functional and formal verification combined is thus the proposed ESL verification methodology. |
Slides |
Title | Slack Redistribution for Graceful Degradation Under Voltage Overscaling |
Author | Andrew B. Kahng, *Seokhyeong Kang (UC San Diego, U.S.A.), Rakesh Kumar, John Sartori (UIUC, U.S.A.) |
Page | pp. 825 - 831 |
Keyword | power optimization, voltage scaling, slack redistribution, reliability, cell swap |
Abstract | Modern digital IC designs have a critical operating point, or “wall of slack”, that limits voltage scaling. Even with an error-tolerance mechanism, scaling voltage below a critical voltage - so-called overscaling - results in more timing errors than can be effectively detected or corrected. This limits the effectiveness of voltage scaling in trading off system reliability and power. We propose a design-level approach to trading off reliability and voltage (power) in, e.g., microprocessor designs. We increase the range of voltage values at which the (timing) error rate is acceptable; we achieve this through techniques for power-aware slack redistribution that shift the timing slack of frequently-exercised, near-critical timing paths in a power- and area-efficient manner. The resulting designs heuristically minimize the voltage at which the maximum allowable error rate is encountered, thus minimizing power consumption for a prescribed maximum error rate and allowing the design to fail more gracefully. Compared with baseline designs, we achieve a maximum of 32.8% and an average of 12.5% power reduction at an error rate of 2%. The area overhead of our techniques, as evaluated through physical implementation (synthesis, placement and routing), is no more than 2.7%. |
Slides |
Title | A Decoder-Based Switch Box to Mitigate Soft Errors in SRAM-Based FPGAs |
Author | *Hassan Ebrahimi, Morteza Zamani, HamidReza Zarandi (Amirkabir, Iran) |
Page | pp. 832 - 837 |
Keyword | Reliability, Soft-error, SRAM-Based FPGAs, Single event upset (SEU), switch box |
Abstract | This paper proposes a new switch box architecture in SRAM-based FPGAs to mitigate soft error effects. In this switch box architecture, the number of SRAM bits required for programming the switch box is reduced to 67% without any impact on the routing capability of the switch box. This architecture does not require any modification of the existing placement and routing algorithms. The architecture was evaluated on several MCNC benchmarks using the VPR tool. The experimental results show that this architecture decreases the susceptibility of switch boxes to SEUs by about 20% on average compared to traditional ones. |
Title | On Process-Aware 1-D Standard Cell Design |
Author | Hongbo Zhang, *Martin D. F. Wong (University of Illinois at Urbana-Champaign, U.S.A.), Kai-Yuan Chao (Intel Corporation, U.S.A.) |
Page | pp. 838 - 842 |
Keyword | Standard Cell Design, 1-D patterning, Gap Distribution |
Abstract | When VLSI technology scales down to the sub-40nm process node, system process variation introduced by lithography is a persistent challenge to manufacturability. The limitations of resolution enhancement technologies (RETs) force designers to adopt a regular cell design methodology. In this paper, targeting 1-D cell design, we use simulation data to analyze the relationship between the line-end gap distribution and printability. Based on the gap distribution preferences, an optimal algorithm is provided to efficiently extend the line ends and insert dummies, which significantly improves the gap distribution and helps printability. Experimental results on 45nm and 32nm processes show that significant improvement can be obtained on edge placement error (EPE). |
Title | D-A Converter Based Variation Analysis for Analog Layout Design |
Author | *Bo Liu, Toru Fujimura, Bo Yang (University of Kitakyushu, Japan), Shigetoshi Nakatake (University of Kitakyushu, Japan) |
Page | pp. 843 - 848 |
Keyword | relative variation, λ-dependency variation, layout structure |
Abstract | For analog circuits, the current source is one of the most essential functions, and variation of its characteristics seriously influences the accuracy of the performance. This paper presents a new methodology for analyzing the layout dependency of the variation of the current source transistor. We employ a current-driven D-A converter to investigate the dependency of the current source upon the relative accuracy and the lambda. We implemented the D/A converters with various layout structures in a TEG (Test Element Group) and evaluated them. The analysis convinced us that diffusion sharing and gate folding significantly influence the variation of lambda and relative accuracy. |
Slides |
Title | Rule-Based Optimization of Reversible Circuits |
Author | *Mona Arabzadeh, Mehdi Saeedi, Morteza Saheb Zamani (Amirkabir University of Technology, Iran) |
Page | pp. 849 - 854 |
Keyword | Reversible circuits, Synthesis, Optimization |
Abstract | Reversible logic has applications in various research areas including low-power design and quantum computation. In this paper, a rule-based optimization approach for reversible circuits is proposed which uses both negative and positive control Toffoli gates during the optimization. To this end, a set of rules for removing NOT gates and optimizing sub-circuits with common-target gates is proposed. To evaluate the proposed approach, the best-reported synthesized circuits and the results of a recent synthesis algorithm which uses both negative and positive controls are used. Our experiments reveal the potential of the proposed approach in optimizing synthesized circuits. |
Slides |
Title | Variation Tolerant Logic Mapping for Crossbar Array Nano Architectures |
Author | Cihan Tunc (Northeastern University, U.S.A.), *Mehdi Tahoori (Northeastern University/Karlsruhe Institute of Technology, U.S.A.) |
Page | pp. 855 - 860 |
Keyword | nano crossbar, variation, defect tolerance, logic mapping |
Abstract | The bottom-up self-assembly nanofabrication process yields nanodevices with significantly more variation than the conventional top-down lithography used in CMOS fabrication. This is in addition to the increased defect density expected for self-assembled nanodevices. Therefore, tolerating variation, in addition to defects, is one of the major design challenges in emerging nano architectures. In this paper, we present a solution for variation-tolerant logic mapping for FET-based crossbar array nano architectures using Simulated Annealing. Furthermore, we extended the framework for defect tolerance. Experimental results, including a comparison with an exact method, confirm the effectiveness of the proposed approach. |
Slides |
Title | Generalised Threshold Gate Synthesis based on AND/OR/NOT Representation of Boolean Function |
Author | *Marek Arkadiusz Bawiec, Maciej Nikodem (Wrocław University of Technology, Poland) |
Page | pp. 861 - 866 |
Keyword | generalised threshold gate, negative differential resistance, boolean logic, synthesis |
Abstract | This paper focuses on generalized threshold gates (GTGs) that implement Boolean logic functions using elements with negative differential resistance (NDR). GTGs are capable of implementing Boolean functions; however, no effective synthesis algorithms have been proposed so far. We show that GTGs can be effectively implemented using unate functions. Our synthesis algorithm ensures that the circuit implementing an n-variable Boolean function consists of at most n+2 NDR elements and can be further optimized by reducing the number of switching elements. |
Slides |
Title | Novel Dual-Vth Independent-gate FinFET Circuits |
Author | Masoud Rostami, *Kartik Mohanram (Rice University, U.S.A.) |
Page | pp. 867 - 872 |
Keyword | FinFETs, dual-Vth, independent-gate, library |
Abstract | This paper describes gate work function and oxide thickness tuning to realize novel circuits using dual-Vth independent-gate FinFETs. Dual-Vth FinFETs with independent gates enable series and parallel merge transformations in logic gates, realizing compact low power alternatives. Furthermore, they also enable the design of a new class of compact logic gates with higher expressive power and flexibility than conventional forms, e.g., implementing 12 unique Boolean functions using only four transistors. The gates are designed and calibrated using the University of Florida double-gate model into a technology library. Synthesis results for 14 benchmark circuits from the ISCAS and OpenSPARC suites indicate that on average, the enhanced library reduces delay, power, and area by 9%, 21%, and 27%, respectively, over a conventional library designed using FinFETs in 32nm technology. |
Title | Hybrid Dynamic Energy and Thermal Management in Heterogeneous Embedded Multiprocessor SoCs |
Author | Shervin Sharifi, Ayse Kivilcim Coskun, *Tajana Simunic Rosing (University of California, San Diego, U.S.A.) |
Page | pp. 873 - 878 |
Keyword | Temperature, Thermal, Embedded Systems, Multiprocessor SoC, Heterogeneous |
Abstract | Heterogeneous multiprocessor system-on-chips (MPSoCs) which consist of cores with various power and performance characteristics can customize their configuration to achieve higher performance per Watt. On the other hand, inherent imbalance in power densities across MPSoCs leads to non-uniform temperature distributions, which affect performance and reliability adversely. In addition, managing temperature might result in conflicting decisions with achieving higher energy efficiency. In this work, we propose a joint thermal and energy management technique specifically designed for heterogeneous MPSoCs. Our technique identifies the performance demands of the current workload. By utilizing job scheduling and voltage/frequency scaling dynamically, we meet the desired performance while minimizing the energy consumption and the thermal imbalance. In comparison to performance-aware policies such as load balancing, our technique simultaneously reduces the thermal hot spots, temperature gradients, and energy consumption significantly. |
Slides |
Title | Energy Efficient Joint Scheduling and Multi-core Interconnect Design |
Author | Cathy Qun Xu (University of Texas at Dallas, U.S.A.), *Chun Jason Xue (City University of Hong Kong, China), Yi He, Edwin H.M. Sha (University of Texas at Dallas, U.S.A.) |
Page | pp. 879 - 884 |
Keyword | Scheduling, Interconnection network, Low power |
Abstract | Energy-efficient, high-performance interconnect is critical for multi-core architectures. Interconnect with power-saving segmented buses satisfies the tight latency and high-volume data transfer needs of applications with large embedded parallelism. This paper analyzes the major energy consumption factors of interconnect with segmented buses from high-level synthesis. It presents a computation and inter-core data transfer scheduling algorithm to minimize the interconnect energy consumption by addressing the analyzed factors while exploiting an application's maximum parallelism. This paper jointly considers scheduling and interconnect design. It presents an application-specific approach to determine the minimum number of segmented buses required and an optimal inter-core data transfer schedule, which can be used to configure the switches on the segmented buses to avoid bus contention and minimize interconnect energy consumption for a given application. Experimental results show that the proposed scheduling algorithm can reduce interconnect dynamic energy consumption by about 71% and static energy consumption by about 23% on average compared to other communication-cost-conscious scheduling techniques for the evaluated high-parallelism DSP applications. |
Slides |
Title | Dynamic and Adaptive Allocation of Applications on MPSoC Platforms |
Author | *Andreas Schranzhofer, Jian-Jia Chen (Swiss Federal Institute of Technology (ETH), Zürich, Switzerland), Luca Santinelli (Scuola Superiore Sant'Anna, Pisa, Italy), Lothar Thiele (Swiss Federal Institute of Technology (ETH), Zürich, Switzerland) |
Page | pp. 885 - 890 |
Keyword | MPSoC, multi-mode application, mapping, dynamic, adaptive |
Abstract | Multi-Processor Systems-on-Chip (MPSoC) are an increasingly important design paradigm not only for mobile embedded systems but also for industrial applications such as automotive and avionic systems. Such systems typically execute multiple concurrent applications with different execution modes. Modes define differences in functionality and computational resource demands, and each is assigned an execution probability. We propose a dynamic mapping approach to maintain low power consumption over the system lifetime. Mapping templates for different application modes and execution probabilities are computed offline and stored on the system. At runtime, a manager monitors the system and chooses an appropriate pre-computed template. Experiments show that our approach outperforms global static mapping approaches by up to 45%. |
Slides |
Title | Cool and Save: Cooling Aware Dynamic Workload Scheduling in Multi-socket CPU Systems |
Author | Raid Ayoub, *Tajana Rosing (University of California at San Diego, U.S.A.) |
Page | pp. 891 - 896 |
Keyword | Cooling, Workload scheduling, Multi-socket CPU |
Abstract | Traditionally, CPU workload scheduling and fan control in multi-socket systems have been designed separately, leading to less efficient solutions. In this paper we present Cool and Save, a cooling-aware dynamic workload management strategy that is significantly more energy efficient than state-of-the-art solutions in multi-socket CPU systems because it performs workload scheduling in tandem with controlling socket fan speeds. Our experimental results indicate that applying our scheme gives average fan energy savings of 73% while reducing the maximum fan speed by 53%, thus leading to lower vibrations and lower noise levels. |
Slides |
Title | (Invited Paper) The Shrink Wrapped Myth: Cross Platform Software |
Author | *Mike Olivarez (Freescale Semiconductor, Inc., U.S.A.) |
Keyword | Software Reuse, Architecture, Compiler, RTOS, Open OS |
Abstract | "Shrink wrapped" software has been a goal from mainframes to embedded systems. Many issues related to architecture and performance requirements keep this from being a reality. This presentation focuses on the Architecture, Compiler, OS and Performance and how they affect overall reuse for embedded systems. Freescale's MXC cellular architecture is used as an example of these points and of getting a system to market. |
Slides |
Title | (Invited Paper) Using Software to Achieve Low Power Solutions |
Author | *Albert Shiue (Alvaview Technologies, Taiwan) |
Keyword | low power design, compiler optimization, video codec design, multimedia software for low power, MPEG, H.264, DSP, the lightest player, internet radio and internet TV |
Abstract | We present how software can achieve low-power solutions. This includes the use of loop transformation techniques and compiler optimization for video codec design. We also address an architectural feature of DSPs that makes code generation difficult, namely the use of multiple data memory banks. This feature increases memory bandwidth by permitting multiple data memory accesses to occur in parallel when the referenced variables belong to different data memory banks and the registers involved conform to a strict set of conditions. We present algorithms that attempt to maximize the performance, minimize the energy, and therefore maximize the benefit of this architectural feature. Experimental results demonstrate that our algorithms generate high-performance, low-energy code for the M56000 DSP. We also demo Alvaview products: iiplayer (the lightest player for embedded systems) and internet radio & TV. All of these keep CPU utilization and memory usage low on embedded systems such as MIDs, netbooks, and mobile phones. This demonstrates our theme of using software to achieve low-power solutions. |
Title | (Invited Paper) MPSoC Programming using the MAPS Compiler |
Author | Rainer Leupers, *Jeronimo Castrillon (RWTH Aachen University, Germany) |
Page | pp. 897 - 902 |
Keyword | MPSoC Programming, Compiler, Multi-application, KPN, Scheduling and Mapping |
Abstract | The problem of efficiently programming complex embedded heterogeneous Multi-Processor Systems-on-Chip (MPSoCs) continues to be one of the biggest hurdles in the IT community. Extracting parallelism from sequential applications, dealing with different programming models, and handling real time constraints in the presence of multiple concurrent applications are some of the challenges that make MPSoC programming so difficult. In this paper we describe the MAPS tool suite, which tries to tackle these aspects of MPSoC programming in an integrated development environment built upon the Eclipse framework. We give an overview of the MAPS framework, highlighting its differences to the previous work in [7], and report on experiences using the tool. |
Slides |
Title | (Invited Paper) System-level Development of Embedded Software |
Author | *Gunar Schirner (Northeastern University, U.S.A.), Andreas Gerstlauer (University of Texas, Austin, U.S.A.), Rainer Domer (University of California, Irvine, U.S.A.) |
Page | pp. 903 - 909 |
Keyword | System Level Design, Embedded Software |
Abstract | Embedded software plays an increasingly important role in implementing modern embedded systems. Development of embedded software, and of Hardware-dependent Software in particular, is challenging due to the tight integration with the underlying hardware architecture. In this paper, we describe our system-level design approach that allows designers to develop software in the form of a platform-agnostic specification. Our design environment enables exploration of different architectural alternatives and subsequently generates the software implementation. It generates the application code, communication drivers, and an adaptation to a chosen RTOS. It completes the process by producing the final target binary for each processor. Our experimental results demonstrate the automatic generation of the binaries for five control- and media-oriented applications. |
Slides |