ASP-DAC 2013 Technical Program

The 18th Asia and South Pacific Design Automation Conference

Session 1D University Design Contest
Time: 10:20 - 12:20 Wednesday, January 23, 2013
Chairs: Hiroshi Kawaguchi (Kobe University, Japan), Tetsuo Hironaka (Hiroshima City University, Japan)

1D-1 (Time: 10:20 - 10:25)

Title	A 40-nm 144-mW VLSI Processor for Real-time 60-kWord Continuous Speech Recognition
Author	*Guangji He, Takanobu Sugahara, Tsuyoshi Fujinaga, Yuki Miyamoto, Hiroki Noguchi, Shintaro Izumi, Hiroshi Kawaguchi, Masahiko Yoshimoto (Kobe University, Japan)
Page	pp. 71 - 72
Keyword	hidden Markov model(HMM), large vocabulary continuous speech recognition(LVCSR), memory bandwidth reduction
Abstract	We have developed a low-power VLSI chip for 60- kWord real-time continuous speech recognition based on a context-dependent Hidden Markov Model (HMM). Our implementation includes a cache architecture using locality of speech recognition, beam pruning using a dynamic threshold, two-stage language model searching, highly parallel Gaussian Mixture Model (GMM) computation based on the mixture level, a variable-frame look-ahead scheme, and elastic pipeline operation between the Viterbi transition and GMM processing. Results show that our implementation achieves 95% bandwidth reduction (70.86 MB/s) and 78% required frequency reduction (126.5 MHz). The test chip, fabricated using 40 nm CMOS technology, contains 1.9 M transistors for logic and 7.8 Mbit on-chip memory. It dissipates 144 mW at 126.5 MHz and 1.1 V for 60 kWord real-time continuous speech recognition.
Slides

1D-2 (Time: 10:25 - 10:30)

Title	A 24.5-53.6pJ/pixel 4320p 60fps H.264/AVC Intra-Frame Video Encoder Chip in 65nm CMOS
Author	*Dajiang Zhou, Gang He, Wei Fei, Zhixiang Chen, Jinjia Zhou, Satoshi Goto (Waseda University, Japan)
Page	pp. 73 - 74
Keyword	H.264/AVC, 4320p, video encoder, low power
Abstract	An H.264/AVC intra-frame video encoder is implemented in 65nm CMOS. With an efficient intra prediction design, its maximum throughput reaches 1991Mpixels/s for 7680x4320p 60fps video, 9.4x to 32x faster than previous designs. The encoder also incorporates a 1.41Gbins/s CABAC architecture that has been enhanced by 31%. Moreover, low energy consumption is achieved by the high parallelism and hardware efficiency of this design. 1080p 30fps encoding dissipates only 2mW at 0.8V and 9MHz.
Slides

1D-3 (Time: 10:30 - 10:35)

Title	A Low Power Multimedia Processor Implementing Dynamic Voltage and Frequency Scaling Technique
Author	Tadayoshi Enomoto, *Nobuaki Kobayashi (Chuo University, Japan)
Page	pp. 75 - 76
Keyword	motion estimation, Multimedia Processor, DVFS, power dissipation
Abstract	A 90-nm CMOS multimedia processor was developed by employing dynamic voltage and frequency scaling (DVFS) technique to greatly reduce the power dissipation (P). To adaptively predict the optimum supply voltage (VD) and the optimum clock frequency (fc) a fast motion estimation (ME) algorithm, an absolute difference accumulator as well as a DVFS controller were developed. Measured P of the multimedia processor was 34.4 µW, which was only 0.48% that of a conventional multimedia processor.
Slides

1D-4 (Time: 10:35 - 10:40)

Title	A 40-nm 0.5-V 12.9-pJ/Access 8T SRAM Using Low-Power Disturb Mitigation Technique
Author	*Shusuke Yoshimoto, Masaharu Terada, Shunsuke Okumura (Kobe University, Japan), Toshikazu Suzuki (Panasonic Corporation, Japan), Shinji Miyano (Semiconductor Technology Academic Research Center, Japan), Hiroshi Kawaguchi, Masahiko Yoshimoto (Kobe University, Japan)
Page	pp. 77 - 78
Keyword	SRAM, 8T, Low power, half select, write back
Abstract	This paper presents a novel disturb mitigation technique which achieves low-power and low-voltage SRAM. Our proposed technique consists of a floating bitline technique and a low-swing bitline driver (LSBD). We fabricated a 512-Kb 8T SRAM test chip that operates at a single 0.5-V supply voltage. The proposed technique achieves 1.52-pJ/access active energy in a write cycle and 72.8-uW leakage power, which are 59.4% and 26.0% better than the conventional write-back technique.
Slides

1D-5 (Time: 10:40 - 10:45)

Title	A Physical Unclonable Function Chip Exploiting Load Transistors’ Variation in SRAM Bitcells
Author	*Shunsuke Okumura, Shusuke Yoshimoto, Hiroshi Kawaguchi (Kobe University, Japan), Masahiko Yoshimoto (Kobe University/JST CREST, Japan)
Page	pp. 79 - 80
Keyword	SRAM, PUF, Chip ID
Abstract	We propose a chip identification (ID) generating scheme with random variation of transistor characteristics in SRAM bitcells. In the proposed scheme, a unique fingerprint is generated by grounding both bitlines. It has high speed, and it can be implemented in a very small area overhead. We fabricated test chips in a 65-nm process and obtained 12,288 sets of unique 128-bit fingerprints, which are evaluated in this paper. The failure rate of the IDs is found to be 2.1 × 10-12.
Slides

1D-6 (Time: 10:45 - 10:50)

Title	Over 10-Times High-speed, Energy Efficient 3D TSV-Integrated Hybrid ReRAM/MLC NAND SSD by Intelligent Data Fragmentation Suppression
Author	*Chao Sun (Chuo University/University of Tokyo, Japan), Hiroki Fujii (University of Tokyo, Japan), Kousuke Miyaji, Koh Johguchi (Chuo University, Japan), Kazuhide Higuchi (University of Tokyo, Japan), Ken Takeuchi (Chuo University, Japan)
Page	pp. 81 - 82
Keyword	SSD, ReRAM, TSV, MLC NAND
Abstract	A 3D through-silicon-via (TSV)-integrated hybrid ReRAM/multi-level-cell (MLC) NAND solid-state drive's (SSD's) architecture is proposed with NAND-like interface (I/F) and sector-access overwrite policy for ReRAM. Furthermore, intelligent data management algorithms are proposed to suppress data fragmentation and excess usage of MLC NAND. As a result, 11-times performance increase, 6.9-times endurance enhancement and 93% write energy reduction are achieved. Both ReRAM write and read latency should be less than 3us to obtain these improvements. The required endurance for ReRAM is 10^5.

1D-7 (Time: 10:50 - 10:55)

Title	Highly Reliable Solid-State Drives (SSDs) with Error-Prediction LDPC (EP-LDPC) Architecture and Error-Recovery Scheme
Author	*Shuhei Tanakamaru, Yuki Yanagihara (The University of Tokyo, Japan), Ken Takeuchi (Chuo University, Japan)
Page	pp. 83 - 84
Keyword	Solid-state drive, SSD, Error-correcting code, ECC, LDPC
Abstract	11-times extended lifetime, 76% reduced error SSD is proposed. The error-prediction LDPC realizes both 7-times faster read and high reliability. Errors are most efficiently corrected by calibrating memory data based on the VTH, inter-cell coupling, write/erase cycles and data-retention time. The error-recovery scheme with a program-disturb error-recovery pulse and a data-retention error-recovery pulse is also proposed to reduce the program-disturb error and the data-retention error by 76% and 56%, respectively.

1D-8 (Time: 10:55 - 11:00)

Title	A 3Gb/s 2.08mm² 100b Error-Correcting BCH Decoder in 0.13µm CMOS Process
Author	*Youngjoo Lee, Hoyoung Yoo, In-Cheol Park (KAIST, Republic of Korea)
Page	pp. 85 - 86
Keyword	ECC, BCH, decoder, optimization
Abstract	This paper presents a high-throughput BCH decoder that can correct 100 bit-errors. Several optimization methods are proposed to reduce the hardware complexity caused by the large error-correction capability. Based on the proposed methods, an 8-parallel decoder is designed for the (9592, 8192, 100) BCH code, which achieves a decoding throughput of 3Gb/s and occupies 2.08mm^2 in 0.13ìm CMOS process.

1D-9 (Time: 11:00 - 11:05)

Title	A 6.72-Gb/s, 8pJ/bit/iteration WPAN LDPC Decoder in 65nm CMOS
Author	*Zhixiang Chen, Xiao Peng, Xiongxin Zhao, Leona Okamura, Dajiang Zhou, Satoshi Goto (Graduate School of Information, Production and Systems, Waseda University, Japan)
Page	pp. 87 - 88
Keyword	LDPC, Decoder, WPAN, IEEE 802.15.3c, Permutation Network
Abstract	An LDPC decoder in 65nm CMOS targeting WPAN (IEEE 802.15.3c) is presented with measurement results. A modified-PCM based message permutation strategy with compatible data flow is proposed to solve the network problem raised by high parallelism LDPC decoding. Compared to the state-of-art, decoder chip achieves 17.7%, 33.5% and 49% improvements in chip density, gate count and energy efficiency, respectively.
Slides

1D-10 (Time: 11:05 - 11:10)

Title	A 7.5Gb/s Referenceless Transceiver for UHDTV with Adaptive Equalization and Bandwidth Scanning Technique in 0.13um CMOS Process
Author	*Junyoung Song (Korea University, Republic of Korea), Hyunwoo Lee (Hynix Inc., Republic of Korea), Sewook Hwang (Korea University, Republic of Korea), Inhwa Jung (Hynix Inc., Republic of Korea), Chulwoo Kim (Korea University, Republic of Korea)
Page	pp. 89 - 90
Keyword	Transceiver, CDR, PLL, Equalizer, Wireline
Abstract	A 7.5Gb/s referenceless transceiver for the ultra-high definition television is designed in a 0.13µm CMOS process. By applying the dynamic pre-emphasis calibration and the bandwidth scanning clock generators, measured eye opening and jitter of the clock are enhanced by 39.6% and 40%, respectively. Also the data-width comparison based adaptive equalizer with self-adjusting reference voltage is proposed.
Slides

1D-11 (Time: 11:10 - 11:15)

Title	A 12.5 Gb/s/Link Non-Contact Multi Drop Bus System with Impedance-Matched Transmission Line Couplers and Dicode Partial-Response Channel Transceivers
Author	*Atsutake Kosuge, Wataru Mizuhara, Noriyuki Miura, Masao Taguchi, Hiroki Ishikuro, Tadahiro Kuroda (Keio University, Japan)
Page	pp. 91 - 92
Keyword	Memory Interface, Coupler, Partial Response, Multi-Drop Bus
Abstract	A reduced-reflection multi-drop bus system using Dicode (1-D) partial response signaling transceiver is presented for the first time in the world. Directional couplers on transmission lines arranged with equi-energy distributing and exact impedance matched conditions allow the bus to reach to 12.5Gbps/link speed, which is the world’s fastest data link speed with multi-drop bus architecture. Dicode partial-response signaling method with a half-rate architecture was used where a precoder is placed in the transmitter to make the signal best fit for the channel to eliminate inter symbol interference (ISI).
Slides

1D-12 (Time: 11:15 - 11:20)

Title	315MHz OOK Transceiver with 38-µW Receiver and 36-µW Transmitter in 40-nm CMOS
Author	*Shunta Iguchi (University of Tokyo, Japan), Akira Saito (Semiconductor Technology Academic Research Center, Japan), Kentaro Honda, Yunfei Zheng (University of Tokyo, Japan), Kazunori Watanabe (Semiconductor Technology Academic Research Center, Japan), Takayasu Sakurai, Makoto Takamiya (University of Tokyo, Japan)
Page	pp. 93 - 94
Keyword	Transceiver, Sensor node, Low voltage, Low power, Intermittent sampling
Abstract	A 1-Mbps, 315MHz OOK transceiver in 40-nm CMOS for body area networks is developed. Both a 38-pJ/bit carrier-frequency-free intermittent sampling receiver with -55dBm sensitivity and a 36-pJ/bit transmitter applied dual supply voltage scheme with -20dBm output power achieve the lowest energy in the published transceivers for wireless sensor networks.
Slides

1D-13 (Time: 11:20 - 11:25)

Title	A Full 4-Channel 60GHz Direct-Conversion Transceiver
Author	*Seitaro Kawai, Ryo Minami, Ahmed Musa, Takahiro Sato, Ning Li, Tatsuya Yamaguchi, Yasuaki Takeuchi, Yuki Tsukui, Kenichi Okada, Akira Matsuzawa (Tokyo Institute of Technology, Japan)
Page	pp. 95 - 96
Keyword	60GHz, CMOS, tranceiver
Abstract	This paper presents a 60-GHz direct-conversion transceiver in 65 nm CMOS technology. By the proposed gain peaking technique, this transceiver realizes good gain flatness and is capable of more than 7Gbps in 16QAM wireless communication for every channel of IEEE802.15.3c standard within EVM of around -23dB. The transceiver consumes 319mW in transmitting and 223 mW in receiving, that includes the PLL consumption.
Slides

1D-14 (Time: 11:25 - 11:30)

Title	A Sub-harmonic Injection-locked Frequency Synthesizer with Frequency Calibration Scheme for Use in 60GHz TDD Transceivers
Author	*Teerachot Siriburanon, Wei Deng, Ahmed Musa, Kenichi Okada, Akira Matsuzawa (Tokyo Institute of Technology, Japan)
Page	pp. 97 - 98
Keyword	60GHz, Synthesizer, Calibration, Injection-locked
Abstract	A 58.1-to-65.0 GHz frequency synthesizer using sub-harmonic injection-locking technique is presented. The synthesizer can generate all 60GHz channels defined by IEEE 802.15.3c, wirelessHD, IEEE 802.11ad, WiGig, and ECMA-387. A frequency calibration scheme is proposed to monitor frequency shift resulting from environmental variations. Implemented in a 65nm CMOS process, the synthesizer achieves a typical phase noise of -117 dBc/Hz @10MHz offset from a carrier frequency of 61.56 GHz.

1D-15 (Time: 11:30 - 11:35)

Title	A Fractional-N Harmonic Injection-locked Frequency Synthesizer with 10MHz-6.6GHz Quadrature Outputs for Software-Defined Radios
Author	*Wei Deng, Ahmed Musa, Kenichi Okada, Akira Matsuzawa (Tokyo Institute of Technology, Japan)
Page	pp. 99 - 100
Keyword	synthesizer, fractional-N, SDR, PLL, injection-locked
Abstract	This paper presents an area-efficient frequency synthesizer with a quadrature phase output using a fractional-N injection-locking technique for software-defined radios. A background calibration scheme is proposed to compensate for the PVT variations. Implemented in a 65nm CMOS process, this work demonstrates 10 MHz to 6.6 GHz continuous quadrature frequency coverage, while only occupies a small area of 0.38 mm2 and consumes 16-26 mW depending on output frequency, from a 1.2 V power supply. The normalized phase noise achieves –135.3 dBc/Hz at 3 MHz offset, and -95.1 dBc/Hz in-band phase noise at 10 kHz offset, from a 1.7 GHz carrier frequency.

1D-16 (Time: 11:35 - 11:40)

Title	A Ring-VCO-Based Sub-Sampling PLL CMOS Circuit with 0.73 ps Jitter and 20.4 mW Power Consumption
Author	*Kenta Sogo, Akihiro Toya, Takamaro Kikkawa (Research Institute for Nanodevice and Bio Systems, Hiroshima University, Japan)
Page	pp. 101 - 102
Keyword	PLL, CMOS, JItter, Sampling, Phase noise
Abstract	This paper presents a ring voltage–controlled- oscillator(ring-VCO)-based sub-sampling phase locked loop (PLL) CMOS circuit with low phase noise and low jitter. A 2.08 GHz PLL is developed by use of 65 nm CMOS technology. The in-band phase noise is -119.1 dBc/Hz at 1 MHz and the output jitter integrated from 1 kHz to 10 MHz is 0.73 ps (rms) with the power consumpition 20.4 mW. The normalized jitter-power product is -229.7 dB.
Slides

1D-17 (Time: 11:40 - 11:45)

Title	Design of a Clock Jitter Reduction Circuit Using Gated Phase Blending Between Self-Delayed Clock Edges
Author	*Kiichi Niitsu, Naohiro Harigai, Daiki Hirabayashi, Daiki Oki, Masato Sakurai (Gunma University, Japan), Osamu Kobayashi (STARC, Japan), Takahiro J. Yamaguchi, Haruo Kobayashi (Gunma University, Japan)
Page	pp. 103 - 104
Keyword	jitter, clock, PLL, jitter reduction, CMOS
Abstract	Design of a clock jitter reduction circuit that exploits the phase blending technique between the uncorrelated self-delayed clock edges is demonstrated. By blending uncorrelated clock edges, the output clock edges approach the ideal timing and, thus, timing jitter can be reduced by a factor of square root of two per stage. Measurement results with a 180-nm CMOS prototype chip demonstrated approximately four-fold reduction in timing jitter from 30.2ps to 8.8ps in 500-MHz clock by cascading the proposed circuit with four-stages.

1D-18 (Time: 11:45 - 11:50)

Title	A 25-Gb/s LD Driver with Area-Effective Inductor in a 0.18-µm CMOS
Author	*Takeshi Kuboki (Kyoto University, Japan), Yusuke Ohtomo (NTT Electronics, Japan), Akira Tsuchiya (Kyoto University, Japan), Keiji Kishine (University of Shiga Prefecture, Japan), Hidetoshi Onodera (Kyoto University, Japan)
Page	pp. 105 - 106
Keyword	optical interconnect, LD driver, Interwoven inductor
Abstract	This paper presents high-speed and area-efficient laser-diode driver with interwoven inductor in a 0.18-μm CMOS. We interweave ten peaking inductors for area-effective implementation as well as performance enhancement. Interwoven inductor can not only achieve area-efficiency but also tune frequency characteristic. Mutual inductances of interwoven inductor enhance bandwidth and suppress group delay dispersion. The test chip area is 0.32 mm2 and the maximum operating speed is 25 Gb/s.
Slides

1D-19 (Time: 11:50 - 11:55)

Title	A Regulated Charge Pump with Low-Power Integrated Optimum Power Point Tracking Algorithm for Indoor Solar Energy Harvesting
Author	*Jungmoon Kim, Chulwoo Kim (Korea University, Republic of Korea)
Page	pp. 107 - 108
Keyword	Photovoltaic systems, solar energy harvesting, charge pump, maximum power point tracking, optimum power point tracking
Abstract	This paper presents a regulated charge pump (CP) with an integrated optimum power point tracking (OPPT) algorithm designed for indoor solar energy harvesting. The proposed OPPT circuit does not require a current sensor that consumes power proportionally to the load. The solar cell voltage is regulated at the optimum power point; the CP output is regulated according to the target voltage. The controller of the OPPT circuit and CP dissipates only 450nW, so the proposed technique is appropriate for indoor solar energy harvesting applications under dim lighting conditions.
Slides

1D-20 (Time: 11:55 - 12:00)

Title	A Low Voltage Buck DC-DC Converter Using On-Chip Gate Boost Technique in 40nm CMOS
Author	*Xin Zhang, Po-Hung Chen (University of Tokyo, Japan), Yoshikatsu Ryu (Semiconductor Technology Academic Research Center, Japan), Koichi Ishida (University of Tokyo, Japan), Yasuyuki Okuma, Kazunori Watanabe (Semiconductor Technology Academic Research Center, Japan), Takayasu Sakurai, Makoto Takamiya (University of Tokyo, Japan)
Page	pp. 109 - 110
Keyword	DC-DC converter, PWM controller, low voltage
Abstract	A low voltage buck DC-DC converter (0.45-V input, 0.4-V output) with on-chip gate boosted (OGB) and clock frequency scaled digital PWM controller is designed in 40-nm CMOS process. The highest efficiency to date is achieved at the output power less than 40µW. In order to compensate for the die-to-die delay variations of a delay line in the proposed digital PWM controller, a linear delay trimming by a logarithmic stress voltage (LSV) scheme with good controllability is also proposed and verified in measurement.
Slides

1D-21 (Time: 12:00 - 12:05)

Title	A 0.35-0.8V 8b 0.5-35MS/s 2bit/step Extremely-low Power SAR ADC
Author	*Kentaro Yoshioka, Akira Shikata, Ryota Sekimoto, Tadahiro Kuroda, Hiroki Ishikuro (Keio University, Japan)
Page	pp. 111 - 112
Keyword	SAR ADC, Extreme-low voltage, 2bit/step, Low power, Power efficient
Abstract	An extremely low-voltage operating high speed and low power 2bit/step asynchronous SAR ADC is presented. Wide range dynamic threshold configuring comparator is proposed to enable power and area efficient 2bit/step operation. By configuring the comparator threshold by simple Vcm biased current sources, the ADC holds immunity against 10% power supply variation. The prototype ADC fabricated in 40nm CMOS achieved 44.3 dB SNDR with 6.14 MS/s at a single supply voltage of 0.5 V. The ADC achieved a peak FoM of 5.9fJ/conv-step at 0.4V and operates down to 0.35V.
Slides