6B: Application Examples with Leading Edge Design Methodology

6B-1

Title	Flow-Through-Queue based Power Management for Gigabit Ethernet Controller
Author	Hwisung Jung (University of Southern California, United States), Andy Hwang (Broadcom Corp., United States), *Massoud Pedram (University of Southern California, United States)
Abstract	This paper presents a novel architectural mechanism and a power management structure for the design of an energy-efficient Gigabit Ethernet controller. Key characteristics of such a controller are the low latency and high bandwidth required to meet the pressing demands of extremely high frame and control data, which in turn cause difficulties in managing power dissipation. We propose a flow-through-queue (FTQ) based power management method, which allows some of the tasks involved in processing the frame data to be offloaded. This in turn enables the use of multiple clock rates and multiple voltages for different cores inside the Ethernet controller. A modeling approach based on a semi-Markov decision process (SMDP) and queuing models is employed, which allows one to apply mathematical programming formulations for energy optimization under performance constraints. The proposed Gigabit Ethernet controller is designed with a 130nm CMOS technology that includes both high and low threshold voltages. Experimental results show that the proposed power optimization method can achieve system-wide energy savings under tighter performance constraints.
Slides (pdf file)	6B-1

6B-2

Title	Approximation Algorithm for Process Mapping on Network Processor Architectures
Author	*Chris Ostler, Karam S. Chatha, Goran Konjevod (Arizona State University, United States)
Abstract	The high performance requirements of networking applications have led to the advent of programmable network processor (NP) architectures that incorporate symmetric multi-processing and block multi-threading. The paper presents an automated system-level design technique for process mapping on such architectures with the objective of maximizing the worst-case throughput of the application. As this mapping must be done in the presence of resource (processors and code size) constraints, this is an NP-complete problem. We present a polynomial-time approximation algorithm which has a proven guarantee to generate solutions with throughput at least 1/2 that of optimal solutions. The proposed algorithm was utilized to map realistic applications on the Intel IXP2400 (NP) architecture, and produced solutions within 78% of optimal.
Slides (pdf file)	6B-2

6B-3

Title	Implementation of a Real Time Programmable Encoder for Low Density Parity Check Code on a Reconfigurable Instruction Cell Architecture (RICA)
Author	*Zahid Khan, Tughrul Arslan (The University of Edinburgh, Great Britain)
Abstract	This paper presents a real-time programmable irregular Low Density Parity Check (LDPC) encoder as specified in the IEEE P802.16E/D7 standard. The encoder is programmable for frame sizes from 576 to 2304 and for five different code rates. The H matrix is efficiently generated and stored for a particular frame size and code rate. The encoder is implemented on the Reconfigurable Instruction Cell Architecture (RICA), which has recently emerged as an ultra-low-power, high-performance, ANSI-C programmable embedded core. Different general and architecture-specific optimization techniques are applied to enhance the throughput. With the architecture, a throughput of 10 to 19 Mbps has been achieved.
Slides (pdf file)	6B-3

6B-4

Title	VLSI Design of Multi Standard Turbo Decoder for 3G and Beyond
Author	*Imran Ahmed, Tughrul Arslan (University of Edinburgh, Great Britain)
Abstract	Turbo decoding architectures have greater error-correcting capability than any other known code. Due to their excellent performance, turbo codes have been employed in several transmission systems such as CDMA2000, WCDMA (UMTS), ADSL, IEEE 802.16 metropolitan networks, etc. The computation kernel of the algorithm is very similar and we have exploited this commonality for a turbo decoder VLSI design suitable for deployment using platform-based system-on-chip methodologies. The Turbo and Viterbi components of the unified array are also individually reconfigurable for different standards. This supports the 4G concept that a user can be simultaneously connected to several access technologies (for example, Wi-Fi, 3G, GSM, etc.) and can seamlessly move between them. A new normalization scheme for turbo decoding is presented to suit reconfigurable mappings. We have also shown a dynamic reconfiguration methodology for a context switch between the Turbo and Viterbi decoders which does not waste any clock cycles. The reconfigurable Turbo decoder fabric is implemented by reusing components of the Viterbi decoder on a 180 nm UMC process technology.
Slides (pdf file)	No Show

6B-5

Title	A High-Throughput Low-Power AES Cipher for Network Applications
Author	Shin-Yi Lin, *Chih-Tsun Huang (National Tsing Hua University, Taiwan)
Abstract	We propose a full-featured, high-throughput, low-power AES cipher which is suitable for widespread network applications. Different modes of operation are implemented, i.e., the ECB, CBC, CTR and CCM modes. Our cipher utilizes a cost-efficient two-stage pipeline for the CCM mode using a single datapath. With the design-for-test circuitry, the maximum throughput is 4.27 Gbps using a 0.13um CMOS technology with a 333MHz clock rate. The hardware cost is 86.2K gates with a power consumption of 40.9mW.
Slides (pdf file)	6B-5
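
The figures quoted in the 6B-5 abstract can be related to one another with some simple arithmetic. The sketch below is only illustrative: the 128-bit block size comes from the AES standard rather than from the abstract, the function names are hypothetical, and the energy figure assumes that the quoted 40.9 mW applies at the maximum throughput, which the abstract does not state.

```python
# Figures quoted in the 6B-5 abstract.
THROUGHPUT_BPS = 4.27e9   # maximum throughput: 4.27 Gbps
CLOCK_HZ = 333e6          # clock rate: 333 MHz
POWER_W = 40.9e-3         # power: 40.9 mW

# Assumption (not stated in the abstract): AES operates on 128-bit blocks,
# as fixed by the AES standard.
BLOCK_BITS = 128


def cycles_per_block(throughput_bps, clock_hz, block_bits=BLOCK_BITS):
    """Average number of clock cycles spent per cipher block."""
    return block_bits * clock_hz / throughput_bps


def energy_per_bit_pj(power_w, throughput_bps):
    """Energy per processed bit in picojoules.

    Assumes the power figure is measured at the given throughput.
    """
    return power_w / throughput_bps * 1e12


if __name__ == "__main__":
    print(f"cycles per 128-bit block: {cycles_per_block(THROUGHPUT_BPS, CLOCK_HZ):.2f}")
    print(f"energy per bit: {energy_per_bit_pj(POWER_W, THROUGHPUT_BPS):.2f} pJ")
```

Under these assumptions the design spends roughly 10 clock cycles per block and about 9.6 pJ per bit. Ten cycles per block would be consistent with processing one AES-128 round per cycle, but the abstract does not state the key length or the round architecture, so this reading is a guess.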

Last Updated on: January 29, 2007