A High-Throughput Low-Power Fully Parallel 1024-bit <sup>1</sup>/<sub>2</sub>-Rate Low Density Parity Check Code Decoder in 3-Dimensional Integrated Circuits

### C.-J. Richard Shi

Department of Electrical Engineering, University of Washington, Seattle, WA

1/24/2006

### Presented by Sheldon X. D. Tan

Department of Electrical Engineering, University of California, Riverside, CA









## High-Throughput Fully-Parallel LDPC Decoder and Applications



• Low-density parity-check (LDPC) codes are emerging as error-correcting standards for many military and commercial applications because of their near-Shannon-limit performance.

- Military Joint Tactical Radio Systems (JTRS)
- NASA Space Communications Project
- NASA OMNI Project





- Direct instantiation of the Tanner-graph representation of the LDPC code
- Two types of computation nodes: variable nodes and check nodes
- Ideal for high-throughput and low-power applications

$$H \cdot c = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \\ c_4 \\ c_5 \\ c_6 \end{bmatrix} = 0$$
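As a small illustration of the parity-check equation above, the following sketch evaluates H·c over GF(2) in plain Python and lists the Tanner-graph edges implied by H. The matrix is copied from the equation as printed; the helper names `syndrome` and `tanner_edges` and the test word are illustrative assumptions, not part of the decoder design.

```python
# Parity-check matrix H, copied row by row from the equation above.
H = [
    [1, 1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 1, 0, 1],
]


def syndrome(H, c):
    """Return H*c over GF(2); an all-zero result means every check is satisfied."""
    return [sum(h * bit for h, bit in zip(row, c)) % 2 for row in H]


def tanner_edges(H):
    """List (check_node, variable_node) pairs, one per nonzero entry of H."""
    return [(i, j) for i, row in enumerate(H) for j, h in enumerate(row) if h]


codeword = [1, 1, 0, 0, 1, 0, 1]  # hypothetical word chosen to satisfy every row of H
corrupted = codeword.copy()
corrupted[3] ^= 1                 # flip the bit at index 3 to model a single error

print(syndrome(H, codeword))      # all checks satisfied
print(syndrome(H, corrupted))     # checks that involve bit 3 now fail
print(len(tanner_edges(H)))       # number of check-to-variable connections
```

Each row of H corresponds to one check node and each column to one variable node, so the edge list is the wiring that a direct Tanner-graph instantiation has to route.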







# **3D Integration Technology**



• However, a fully parallel implementation poses serious interconnect design challenges in standard 2D technology.

#### **Reference:**

A. Blanksby and C. J. Howland, "A 690-mW 1-Gb/s 1024-b, rate ½ low-density parity-check code decoder," IEEE Journal of Solid-State Circuits, vol. 37, no. 3, pp. 404-412, Mar. 2002.

• To address these interconnect design challenges, we explore the use of 3D IC technology.



**Cross-section of 3-tier 3D integration** 



### **The 3D 3-Tier LDPC Design**





**Top-View of 3-tier Final Layout** 



#### The simulated code performance

• The main data path is designed as 16 parallel three-stage pipelines.

• This allows the decoder to achieve a throughput of about 2 Gb/s at a clock frequency of 128 MHz (128 MHz × 16 ≈ 2 Gb/s).

- The blue curve shows the BER vs. SNR performance down to a BER of 10<sup>-5</sup>.

- The green curve shows fast iteration convergence with increasing SNR.
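The throughput figure quoted above can be reproduced with a short calculation. This is only a sketch: the function name is hypothetical, and it assumes each of the 16 pipelines delivers one bit per clock cycle, which is what the slide's arithmetic implies.

```python
def throughput_bps(clock_hz, pipelines, bits_per_pipeline_per_cycle=1):
    """Aggregate throughput, assuming a fixed number of bits per pipeline per cycle."""
    return clock_hz * pipelines * bits_per_pipeline_per_cycle


# 128 MHz clock and 16 parallel pipelines, as stated on the slide.
rate = throughput_bps(clock_hz=128e6, pipelines=16)
print(f"{rate / 1e9:.3f} Gb/s")
```

Under that assumption the exact product is 2.048 Gb/s, which the slide rounds to 2 Gb/s.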



# Summary



| 2D vs. 3D | Area (mm²) | Total wire length (m) | Max. wire length before buffer insertion (mm) | Max. wire length after buffer insertion (mm) | Buffers used | Clock skew (ns) | Power dissipation (mW) |
|-------------|-------------------------|--------------------------|---------------------------------------------------------|--------------------------------------------------------|----------------|-------------------|------------------------------|
| 2D design | 18.238 × 15.92 = 290.3 | 182.4 | 13.82 | 4 | 32900 | 2.33 | 750 |
| 3D design | (6.4 × 6.227) × 3 = 119.5 | 67.4 | 8.68 | 4.1 | 24636 | 1 | 430 |
| Improvement | 250% | 270% | 160% | | 130% | 230% | 43% |

→ Overall, significant improvements based on a real silicon comparison (8M-transistor LDPC ASIC; MIT-LL 3D 3-tier and 2D processes)
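The improvement figures can be cross-checked against the raw 2D and 3D values in the summary table. The sketch below recomputes them two ways, since the table appears to quote most rows as a 2D/3D ratio and the power row as a percentage reduction; the `improvement` helper and the dictionary layout are illustrative assumptions.

```python
# (2D value, 3D value) pairs taken from the summary table.
metrics = {
    "area (mm^2)": (290.3, 119.5),
    "total wire length (m)": (182.4, 67.4),
    "max wire length before buffer insertion (mm)": (13.82, 8.68),
    "buffers used": (32900, 24636),
    "clock skew (ns)": (2.33, 1.0),
    "power dissipation (mW)": (750, 430),
}


def improvement(v2d, v3d):
    """Return (2D/3D ratio, percentage reduction relative to the 2D value)."""
    return v2d / v3d, 100.0 * (1.0 - v3d / v2d)


for name, (v2d, v3d) in metrics.items():
    ratio, reduction = improvement(v2d, v3d)
    print(f"{name}: 2D/3D = {ratio:.2f}, reduction = {reduction:.0f}%")
```

The computed ratios can then be compared with the rounded percentages in the table's improvement row.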

### Contribution

• The first large-scale 3D ASIC implementation (2M gates).

• The first demonstration, through real silicon tape-out and simulation, that a 3D IC process yields an order-of-magnitude improvement over the corresponding 2D process in power-delay-area product (1.75 × 2.5 × 2.5 ≈ 11).

• Proves the viability of our automated 3D design flow through the implementation of a large-scale silicon ASIC design.
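The power-delay-area figure quoted in the contributions is a product of three individual gains. As a minimal sketch, assuming the factors are listed in the order power, delay, area (the slide does not label them explicitly):

```python
import math

# Gains of the 3D design over the 2D design; the power/delay/area labels are an assumption.
gains = {"power": 1.75, "delay": 2.5, "area": 2.5}

product = math.prod(gains.values())
print(product)  # 10.9375
```

The exact product is 10.9375, which the slide rounds to 11.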