## Row-Based Area-Array I/O Design Planning in Concurrent Chip-Package Design Flow

Ren-Jie Lee, \*Hung-Ming Chen (EE Dept., National Chiao Tung Univ., Taiwan)

## Outline

Introduction

Novel I/O-bump tile design and I/O-row based planning

Package-aware I/O-bump planning methods

Experimental results

## **Previous Work**

#### Wire-bonding package

Peripheral I/O-pad

#### Flip-chip package

Area-array I/O-bump

- Extrinsic area-array I/O
  - network-flow-based [5]
  - ILP-based [6]
- Intrinsic area-array I/O
  - I/O clustering method [7]
  - constraint-driven I/O planning [8]



## Motivation

#### Issues in previous works

#### Bumps are assumed to be arranged in fixed array locations

- Flexibility in optimizing chip and package designs is restricted
- Costly RDL routing or I/O planning is needed

#### Pin-out (ballplan) is ignored

- This can lead to a complicated or failed package design

#### The conventional design flow is a sequential flow

- Satisfying the entire system's design constraints results in long and costly re-spin cycles (see next slide)

## **Motivation (cont.)**

#### **Conventional chip-package design flow** (IC-driven)



## **Our Contributions**

- We propose a concurrent design flow
- We design specific I/O-bump tiles with an I/O-row based scheme
- We develop two heuristics and one optimization algorithm to place I/O-bumps



# Novel I/O-bump tile design and I/O-row based planning



2011/01/28

# Novel I/O-bump tile design and I/O-row based planning (cont.)




## **Problem Description**

#### Input:

- The given net names and locations for n package balls.
- The design rules for chip and package.

#### Output:

- The assigned net names and locations for p I/Os and p bumps (p = n).
- The preliminary assignment provided for chip-level core-I/O placement and package-level bump-ball routing.

#### Assignment criteria:

- Minimum possible number of routing layers (minimum net crossing number).
- Minimum timing delay (minimum total net length).
- Minimum signal skew (minimum sum of length difference/deviation on each net).
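The three criteria can be evaluated on flylines (straight ball-to-bump connections). The sketch below is illustrative only and rests on assumptions not stated on the slide: balls and bumps are each indexed left to right along one row, two nets cross when their ball order and bump order disagree, and length deviation is the sum of absolute deviations from the mean flyline length. The function name `flyline_metrics` is hypothetical.

```python
from itertools import combinations
from math import hypot
from statistics import mean


def flyline_metrics(balls, bumps, assignment):
    """Return (crossings, total_length, length_deviation).

    balls, bumps: lists of (x, y), each indexed in left-to-right order (assumption).
    assignment[i] is the bump index connected to ball i.
    """
    lengths = [
        hypot(balls[i][0] - bumps[j][0], balls[i][1] - bumps[j][1])
        for i, j in enumerate(assignment)
    ]
    # Assumed crossing model: nets a < b cross when their bump indices are inverted.
    crossings = sum(
        1
        for a, b in combinations(range(len(assignment)), 2)
        if assignment[a] > assignment[b]
    )
    avg = mean(lengths)
    deviation = sum(abs(length - avg) for length in lengths)
    return crossings, sum(lengths), deviation


balls = [(0, 10), (10, 10), (20, 10)]
bumps = [(0, 0), (10, 0), (20, 0)]
print(flyline_metrics(balls, bumps, [0, 1, 2]))  # order-preserving assignment
print(flyline_metrics(balls, bumps, [1, 0, 2]))  # first two nets swapped
```

The order-preserving assignment has no crossing, while swapping the first two nets introduces one crossing and lengthens both flylines.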

## Package-aware I/O-bump planning methods

#### Heuristic SORT: Double Sorting for Planar Planning



\*Monotonic routing is a route with no U-turn path. It consumes fewer routing resources and results in higher routing completion compared with non-monotonic routing [6].
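The slide's SORT illustration is not reproduced here, so the following is only one possible reading of "double sorting", under stated assumptions: sort the balls along one axis, sort the bump slots along the same axis, and pair them rank by rank, which keeps the flylines free of crossings in that one-dimensional model. The function name `sort_assign` and the 1-D coordinates are hypothetical.

```python
def sort_assign(ball_x, bump_x):
    """Pair the k-th ball from the left with the k-th bump slot from the left.

    Returns assignment, where assignment[i] is the bump index for ball i.
    Assumes len(ball_x) == len(bump_x), matching p = n in the problem description.
    """
    ball_order = sorted(range(len(ball_x)), key=lambda i: ball_x[i])
    bump_order = sorted(range(len(bump_x)), key=lambda j: bump_x[j])
    assignment = [None] * len(ball_x)
    for i, j in zip(ball_order, bump_order):
        assignment[i] = j
    return assignment


print(sort_assign([30, 10, 20], [5, 25, 15]))  # [1, 0, 2]
```

Because both sides are consumed in the same sorted order, no pair of nets has inverted ranks, which is the property a planar (crossing-free) plan needs in this simplified model.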

## Package-aware I/O-bump planning methods (cont.)

#### Heuristic GREEDY: Shortening Flylines Between I/O-Bumps and Package Balls
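As a rough illustration of flyline shortening, the sketch below repeatedly commits the shortest remaining ball-to-bump pair. This is an assumed greedy rule chosen for illustration; the slide does not spell out the exact selection order, and `greedy_assign` is a hypothetical name.

```python
from math import hypot


def greedy_assign(balls, bumps):
    """Greedily pair balls with bump slots, shortest flyline first.

    balls, bumps: equal-length lists of (x, y).
    Returns assignment, where assignment[i] is the bump index for ball i.
    """
    pairs = sorted(
        (hypot(bx - sx, by - sy), i, j)
        for i, (bx, by) in enumerate(balls)
        for j, (sx, sy) in enumerate(bumps)
    )
    assignment = [None] * len(balls)
    used_bumps = set()
    for _, i, j in pairs:
        # Commit a pair only if both the ball and the bump slot are still free.
        if assignment[i] is None and j not in used_bumps:
            assignment[i] = j
            used_bumps.add(j)
    return assignment


print(greedy_assign([(0, 10), (10, 10)], [(9, 0), (0, 0)]))  # [1, 0]
```

A rule like this ignores net ordering entirely, so short flylines can come at the price of crossings.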



## Package-aware I/O-bump planning methods (cont.)

#### Optimization WBIPT: Matching-Based Assignment



\* where $Diff_{ij} = |Order_{ball_i} - Order_{bump_j}|$ is obtained by directly subtracting the order of $Bump_j$ from the order of $Ball_i$, which gives an upper bound on the crossing number [14].
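The edge-weight formula of the matching did not survive on this slide, so the sketch below assumes a weighted sum, $\alpha \cdot Diff_{ij} + \beta \cdot Length_{ij}$, with $Diff_{ij}$ as defined in the footnote and $Length_{ij}$ the flyline length. That cost form, the name `matching_assign`, and the caller-supplied order values are assumptions. The search is brute force over all permutations, which is only workable for tiny examples; a realistic instance would use a proper minimum-cost bipartite matching solver.

```python
from itertools import permutations
from math import hypot


def matching_assign(balls, bumps, ball_order, bump_order, alpha, beta):
    """Minimum-cost one-to-one assignment of balls to bumps (brute force).

    Assumed cost of pairing ball i with bump j:
        alpha * |ball_order[i] - bump_order[j]| + beta * flyline length
    Returns assignment, where assignment[i] is the bump index for ball i.
    """
    n = len(balls)

    def cost(i, j):
        diff = abs(ball_order[i] - bump_order[j])
        length = hypot(balls[i][0] - bumps[j][0], balls[i][1] - bumps[j][1])
        return alpha * diff + beta * length

    best = min(
        permutations(range(n)),
        key=lambda perm: sum(cost(i, perm[i]) for i in range(n)),
    )
    return list(best)


balls = [(0, 10), (10, 10)]
bumps = [(8, 0), (2, 0)]
order = [0, 1]  # hypothetical order values for both balls and bumps
print(matching_assign(balls, bumps, order, order, alpha=5000, beta=1.0))  # [0, 1]
print(matching_assign(balls, bumps, order, order, alpha=1.0, beta=1.0))   # [1, 0]
```

With a large $\alpha$ the order term dominates and the order-preserving pairing wins; with a small $\alpha$ the shorter flylines win even though the orders are swapped.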

## **Experimental Results**

#### The industrial chip designs

| Design | Tech. (um) | Peripheral I/O die size ($um^2$) | Peripheral I/O size ($um^2$) | I/O number | Area-array I/O die size ($um^2$) | I/O-bump tile size ($um^2$) | Die size difference |
|--------|------------|----------------------------------|------------------------------|------------|----------------------------------|-----------------------------|---------------------|
| d1     | 0.18       | $2500^2$                         | $115 \times 65$              | 220        | $2327^2$                         | $160 \times 80$             | -6.92%              |
| \*d2   | 0.18       | $3250^2$                         | $200 \times 60$              | 188        | $3475^2$                         | $160 \times 80$             | +6.93%              |
| \*d3   | 0.18       | $2510^2$                         | $140 \times 65$              | 130        | $2742^2$                         | $160 \times 80$             | +9.25%              |
| d4     | 0.13       | $2580^2$                         | $120 \times 75$              | 200        | $2364^2$                         | $160 \times 80$             | -8.39%              |
| d5     | 0.13       | $4720^2$                         | $115 \times 50$              | 628        | $4600^2$                         | $160 \times 80$             | -2.55%              |
| d6     | 0.09       | $6800^2$                         | $175 \times 65$              | 390        | $6645^2$                         | $160 \times 80$             | -2.29%              |

(The utilization rate of core cells is kept the same.)

\* d2 and d3 are core-limited designs.

## **Experimental Results (cont.)**

#### **The summary of six I/O-bump planning methods**

|    | I/O-Bump Planning Method               |
|----|----------------------------------------|
| #1 | SORT                                   |
| #2 | GREEDY                                 |
| #3 | WBIPT ($\alpha = 5000$, $\beta = 1.0$) |
| #4 | WBIPT ($\alpha = 2500$, $\beta = 1.0$) |
| #5 | WBIPT ($\alpha = 1000$, $\beta = 1.0$) |
| #6 | WBIPT ($\alpha = 500$, $\beta = 1.0$)  |

#### The I/O-bump planning on test case d5 (random)

| Method | Net crossing | Total wirelength (um) | Wirelength increase | Total length deviation (um) | Length deviation increase | Runtime (sec) |
|--------|--------------|-----------------------|---------------------|-----------------------------|---------------------------|---------------|
| #1a    | 0            | 5473480               | 1.010x              | 1432024                     | 1.989x                    | < 2.0         |
| #2a    | 1056         | 5416680               | -                   | 720120                      | -                         | < 2.0         |
| #3a    | 0            | 5473480               | 1.010x              | 1432024                     | 1.989x                    | < 5.5         |
| #4a    | 32           | 5461600               | 1.008x              | 1209704                     | 1.680x                    | < 5.5         |
| #5a    | 140          | 5447080               | 1.006x              | 996644                      | 1.384x                    | < 5.5         |
| #6a    | 376          | 5437040               | 1.004x              | 920888                      | 1.279x                    | < 5.5         |

(Net crossing, wirelength, and length deviation are the flyline criteria; "-" stands for the baseline.)
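The "Increase" columns are ratios against the GREEDY row (#2a), which serves as the baseline. A short check using values copied from the table (the helper name `increase` is hypothetical):

```python
def increase(value, baseline):
    """Format value/baseline the way the table reports it, e.g. '1.010x'."""
    return f"{value / baseline:.3f}x"


# Baseline row #2a (GREEDY): total wirelength and total length deviation in um.
BASE_WIRELENGTH = 5416680
BASE_DEVIATION = 720120

# (total wirelength, total length deviation) for selected rows of the table.
rows = {
    "#1a": (5473480, 1432024),
    "#4a": (5461600, 1209704),
    "#6a": (5437040, 920888),
}

for name, (wirelength, deviation) in rows.items():
    print(name, increase(wirelength, BASE_WIRELENGTH), increase(deviation, BASE_DEVIATION))
```

For #1a this reproduces the reported 1.010x wirelength and 1.989x length deviation.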

## **Experimental Results (cont.)**

#### The I/O-bump planning on test case d5 (uniform)

| Method | Net crossing | Total wirelength (um) | Wirelength increase | Total length deviation (um) | Length deviation increase | Runtime (sec) |
|--------|--------------|-----------------------|---------------------|-----------------------------|---------------------------|---------------|
| #1b    | 0            | 5813320               | 1.007x              | 1645728                     | 1.167x                    | < 2.0         |
| #2b    | 1432         | 5775240               | -                   | 1410436                     | -                         | < 2.0         |
| #3b    | 0            | 5813320               | 1.007x              | 1645728                     | 1.167x                    | < 5.5         |
| #4b    | 24           | 5802480               | 1.005x              | 1505192                     | 1.067x                    | < 5.5         |
| #5b    | 32           | 5794920               | 1.003x              | 1480016                     | 1.049x                    | < 5.5         |
| #6b    | 148          | 5786320               | 1.002x              | 1374196                     | 0.974x                    | < 5.5         |

(Net crossing, wirelength, and length deviation are the flyline criteria; "-" stands for the baseline.)

## **Experimental Results (cont.)**

Results of normalized performance metrics



## Conclusion

We propose a concurrent design flow which completes the core-I/O placement and package routing in parallel.

With our I/O-bump tile designs and I/O-row based scheme, we improve the flexibility in arranging I/Os and bumps.

Two heuristics and one optimization algorithm are provided to implement the package-aware I/O-bump planning.

# Thank You