## Design Methodology for 2.4GHz Dual-Core Microprocessor

Noriyuki Ito, Hiroaki Komatsu, Akira Kanuma, Akihiro Yoshitake, Yoshiyasu Tanamura, Hiroyuki Sugiyama, Ryoichi Yamashita, <u>Kenichi Nabeya</u>, Hironobu Yoshino, Hitoshi Yamanaka, Masahiro Yanagida, Yoshitomo Ozeki, Kinya Ishizaka, Takeshi Kono, Yutaka Isoda

**Fujitsu Limited** 



- Design requirements
- CAD system overview
- Clock delay calculation
- Custom macro design
- Test
- Conclusions

## **Design requirements**

- High performance
  - 2.4 GHz
- Use of state-of-the-art process
  - 90 nm, 10 layers
- Short design time
  - About 12 months

### SPARC64 VI dual-core microprocessor



- Process: 90nm, Cu metallization, 10 metal layers
- Frequency: 2.4GHz
- Die size: 20.38mm x 20.67mm
- Transistor count: 540M
- Level 2 on-chip cache: 6MB
- I/O signal count: 412
- Power dissipation: less than 120W

## **CAD system construction**

### Important issues

- All user experience and know-how accumulated in the existing CAD tools must carry over to the new CAD system.
- In-house tools and EDA vendor tools are combined where each fits best.

### Design steps in which EDA vendor tools are used

|   | Design step                                      | Why?                                               |
|---|--------------------------------------------------|----------------------------------------------------|
| 1 | Logic simulator, emulator                        | Tools are mature                                   |
| 2 | Editors for cell/macro design, circuit simulator | No competitive advantage with in-house development |
| 3 | Noise analysis based on transistors              | In-house development would not be ready in time    |
| 4 | DRC, LVS                                         | Specified as a sign-off tool                       |

## CAD system construction (cont'd)

### Design steps in which in-house tools are used

|   | Design step                                   | Why?                                                         |
|---|-----------------------------------------------|--------------------------------------------------------------|
| 1 | Logical and physical design rule checkers     | Must verify our original design rules                        |
| 2 | Layout, timing analysis                       | Key tools should remain stable across generations            |
| 3 | Routing                                       | Needs extensive tuning for the state-of-the-art CMOS process |
| 4 | Placement, routing                            | No tools are available for engineered design                 |
| 5 | Noise analysis based on standard cells/macros | Must ensure correct margins                                  |
| 6 | Clock design, power grid design               | Capability of EDA vendor tools is insufficient               |



## **Physical handling of cores**

Three types of core handling in a chip



## Physical handling of cores (cont'd)

Type "flat" has the following pros and cons:

### Pros

- No characterization
  - Fewer computer resources
  - More accurate analyses
- No new hierarchy
  - No modification to the existing CAD system
  - No additional design work at the core-level hierarchy

### Cons

- Cannot handle a core as one instance




## **Clock delay calculation**

### H-shaped clock distribution



## Clock delay calculation (cont'd)

### Clock design technique



## Clock delay calculation (cont'd)

### Correlation between two methods

| Path  | SPICE, split wires (post-route) | Elmore, one wire (Steiner) | Error | Elmore, one wire (post-route) | Error |
|-------|---------------------------------|----------------------------|-------|-------------------------------|-------|
| Path1 | 246.5ps                         | 272.1ps                    | 10.4% | 255.1ps                       | 3.5%  |
| Path2 | 144.2ps                         | 148.8ps                    | 3.2%  | 138.9ps                       | -3.7% |
| Path3 | 181.6ps                         | 183.0ps                    | 0.8%  | 176.7ps                       | -2.7% |
Post-route error is within 4%!
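As a minimal sketch of how this comparison can be checked, the Python below computes the Elmore delay of a simple RC ladder and the relative error of each Elmore estimate against the SPICE value, using the delays from the table. The function names and the RC-ladder wire model are assumptions for illustration only; the slides do not describe how the in-house tool models the merged wire.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay at the far end of an RC ladder.

    resistances[i] is the series resistance of segment i (ohms) and
    capacitances[i] is the capacitance at the end of segment i (farads).
    Each resistance is weighted by all capacitance downstream of it.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    downstream_c = sum(capacitances)
    for r, c in zip(resistances, capacitances):
        delay += r * downstream_c
        downstream_c -= c
    return delay


def relative_error_pct(estimate_ps, reference_ps):
    """Signed error of an estimate relative to the reference, in percent."""
    return 100.0 * (estimate_ps - reference_ps) / reference_ps


# Delays in ps, copied from the table:
# (SPICE post-route, Elmore Steiner, Elmore post-route)
PATHS = {
    "Path1": (246.5, 272.1, 255.1),
    "Path2": (144.2, 148.8, 138.9),
    "Path3": (181.6, 183.0, 176.7),
}

for name, (spice, steiner, post_route) in PATHS.items():
    steiner_err = relative_error_pct(steiner, spice)
    post_err = relative_error_pct(post_route, spice)
    print(f"{name}: Steiner {steiner_err:+.1f}%, post-route {post_err:+.1f}%")
```

The post-route errors stay within 4% in magnitude, while the Steiner estimate for Path1 is about 10% high, which is why the 4% figure applies to the post-route column.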

### To speed up SPICE simulation

- The clock distribution circuit is divided, from root to leaves, into several hundred groups.
- SPICE simulation takes about 5 hours on a 1.3GHz UNIX server with 5 jobs running in parallel.
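The root-to-leaves split with a fixed number of parallel jobs can be pictured with the sketch below. It is only an illustration: `simulate_group` is a hypothetical stand-in for one SPICE job, the group dictionary layout is invented, and the assumption that each group needs its parent group's output arrival time before it can run is ours, not something the slides state.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_group(group, input_arrival_ps):
    """Hypothetical stand-in for one SPICE job on one group of the clock tree.

    Returns the clock arrival time at the group's outputs.
    """
    return input_arrival_ps + group["delay_ps"]


def run_clock_tree(groups, max_jobs=5):
    """Simulate groups from root to leaves, at most max_jobs at a time.

    groups maps a group name to {"parent": name or None, "delay_ps": float}.
    A group becomes ready once its parent's arrival time is known.
    """
    arrival = {}
    pending = dict(groups)
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        while pending:
            ready = [
                name for name, g in pending.items()
                if g["parent"] is None or g["parent"] in arrival
            ]
            if not ready:
                raise ValueError("cycle or missing parent in group list")
            futures = {
                name: pool.submit(
                    simulate_group,
                    pending[name],
                    arrival.get(pending[name]["parent"], 0.0),
                )
                for name in ready
            }
            for name, future in futures.items():
                arrival[name] = future.result()
                del pending[name]
    return arrival
```

Groups at the same depth are independent of each other in this model, so they are the ones that can share the parallel job slots.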


## **Custom macro design tools**

### Vendor tools and in-house tools



## **TrTf analysis**

### Purpose of TrTf analysis

Rise time (Tr) and fall time (Tf) values are checked in transistor-level custom logic and in the peripheral logic of RAMs.
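A minimal sketch of such a check is shown below: it measures Tr and Tf on a sampled waveform and compares them against a limit. The 10%/90% measurement thresholds, the linear interpolation, and all function names are assumptions for illustration; the slides do not give the thresholds or limits used by the actual tool.

```python
def crossing_time(times, volts, level):
    """First time the waveform rises through `level` (linear interpolation)."""
    samples = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v0 < level <= v1:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never rises through the requested level")


def rise_time(times, volts, vdd, lo=0.1, hi=0.9):
    """Time taken to rise from lo*vdd to hi*vdd (assumed 10%-90%)."""
    return crossing_time(times, volts, hi * vdd) - crossing_time(times, volts, lo * vdd)


def fall_time(times, volts, vdd, lo=0.1, hi=0.9):
    """Time taken to fall from hi*vdd to lo*vdd, via the mirrored waveform."""
    mirrored = [vdd - v for v in volts]
    return rise_time(times, mirrored, vdd, lo, hi)


def violates_limit(transition_ps, limit_ps):
    """True if a measured transition is slower than the allowed limit."""
    return transition_ps > limit_ps


# Example: a 1.0 V linear ramp sampled at 0, 50 and 100 ps.
t = [0.0, 50.0, 100.0]
tr = rise_time(t, [0.0, 0.5, 1.0], vdd=1.0)
tf = fall_time(t, [1.0, 0.5, 0.0], vdd=1.0)
print(f"Tr = {tr:.1f} ps, Tf = {tf:.1f} ps")
```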



## TrTf analysis (cont'd)

### TrTf analysis tool implementation



## TrTf analysis (cont'd)

### Circuit model for TrTf analysis



## TrTf analysis (cont'd)

### Execution time in TrTf analysis

| Macro | # of transistors | # of resistors | # of capacitors | CPU time (hours) |
|-------|------------------|----------------|-----------------|------------------|
| No.1  | 12,054    | 190,740   | 409,582   | 0.3         |
| No.2  | 22,304    | 319,780   | 837,945   | 2.3         |
| No.3  | 2,056,140 | 6,421,074 | 4,777,646 | 14.5        |
| No.4  | 1,109,245 | 3,479,851 | 2,014,669 | 10.5        |
| No.5  | 291,288   | 3,461,953 | 8,047,243 | 27.7        |

## **Test**



The logic BIST circuit reduces the number of test vectors by 87% compared with conventional scan chains.



## Test (cont'd)

### Usefulness of good/no-good test in each core



## Statistics of test generation and verification of tests

Generating the test vectors and verifying them, including the delay test, took about 3 weeks.

### Generation

| Test     | # Faults        | # Vectors | Coverage | CPU Time (Hours) |
|----------|-----------------|-----------|----------|------------------|
| SCAN     | 17,216,718      |           | 00.7%    | 0.6              |
| FUNCTION | 26,877,639      | 3,343     | 99.7%    | 40.8             |
| RBIST    | N/A             | N/A       | N/A      | 0.1              |
| DELAY    | 19,705,662      | 3,971     | 91.0%    | 110.0            |

### Verification

| Test     | CPU Time (Hours) | Relative Verification Time |
|----------|------------------|----------------------------|
| SCAN     | 402.2            | 21.1%                      |
| FUNCTION | 12.6             | 0.7%                       |
| RBIST    | 1452.5           | 76.3%                      |
| DELAY    | 35.8             | 1.9%                       |
| Total    | 1903.1           | 100.0%                     |
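The relative verification times follow directly from the CPU hours. The short sketch below recomputes them from the table values; the variable names are ours.

```python
# CPU hours per test type, copied from the verification table.
cpu_hours = {
    "SCAN": 402.2,
    "FUNCTION": 12.6,
    "RBIST": 1452.5,
    "DELAY": 35.8,
}

total_hours = sum(cpu_hours.values())

# Share of total verification time per test type, in percent.
share_pct = {
    test: round(100.0 * hours / total_hours, 1)
    for test, hours in cpu_hours.items()
}

for test, pct in share_pct.items():
    print(f"{test:<8} {cpu_hours[test]:>7.1f} h  {pct:>5.1f}%")
print(f"{'Total':<8} {total_hours:>7.1f} h")
```

RBIST verification dominates at roughly three quarters of the total, even though RBIST generation takes only 0.1 hours.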


## **Conclusions**

- Our design methodology was successfully applied to the design of a 2.4GHz dual-core microprocessor.
  - In timing analysis, the turnaround time for modifying the clock distribution circuit is reduced by treating split and shielded wires as one wire.
  - In custom macro design, signal integrity analysis is enhanced. TrTf analysis is very fast and is applicable to large-scale custom macros such as RAMs.

### Future work

- We are improving our system for the development of much higher performance microprocessors with 4 or more cores.
- We will focus on statistical timing analysis, power grid analysis, and delay test and diagnosis to improve yield and reliability.