An Interface-Circuit Synthesis Method with Configurable Processor Core in IP-Based SoC Designs

Shunitsu Kohara, Naoki Tomono, Junpei Uchida, Yuichiro Miyaoka, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

Department of Computer Science, Waseda University

January 26th, 2006

## Outline

- Introduction
- IP-Based SoC Design Method
  - Design Flow
  - Architecture Model
  - Interface
- IFC\_Synthesizer
  - IFC Architecture
  - IFC Synthesis Method
- Experimental Results
- Conclusion

## Introduction

# Introduction (1)

- Requirements for SoC design
  - Small area, high performance, and low energy
  - Design in a short period and at low cost
- Methods for designing in a short period
  - IP-Based Design: reuse of IPs (Intellectual Property)
  - Configurable Module: adjustment of performance / area cost



# Introduction (2)

IP-Based SoC Design:



- The interface circuit should be designed automatically
- Previous work on interface generation targets such a situation



- The interface circuit is affected by the configurable processor core



# Introduction (4)

Our Proposal:

- ✓ IFC Architecture
- ✓ IFC Synthesis Method

IFC: An Interface Circuit between a Configurable Processor Core and a Hardware IP

## **IP-Based SoC Design Method**

#### **Design Flow**

1. Application HW/SW partitioning
2. (HW part) Selecting HW IPs from DB
3. (SW part) Processor Core Synthesis
4. Interface Circuit Synthesis



## IFC\_Synthesizer

Input:

- ✓ Hardware IP interface description (CWL)
- ✓ Processor core parameters
- ✓ IFC templates (HDL)

Output:

- ✓ Interface circuit (HDL)

Flow: the hardware IP interface description (CWL) is converted to XML by the CWL-to-XML converter. IFC\_Synthesizer then takes the interface description (XML), the processor core parameters, and the interface circuit templates (HDL), and outputs the interface circuit (HDL).
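The template-driven part of this flow can be illustrated with a minimal sketch. It assumes that an IFC template is HDL text with named placeholders that are filled in from parameters; the template text, the placeholder names, and `render_register` are hypothetical and are not taken from IFC\_Synthesizer, which the slides say is written in Ruby.

```python
from string import Template

# Hypothetical HDL template; the real IFC templates are not shown in the slides.
REGISTER_TEMPLATE = Template("""\
entity ${name} is
  port (
    CLK  : in  std_logic;
    DIN  : in  std_logic_vector(${msb} downto 0);
    DOUT : out std_logic_vector(${msb} downto 0)
  );
end ${name};
""")


def render_register(name, width):
    """Fill the template with a unit name and a data width in bits."""
    if width < 1:
        raise ValueError("width must be at least 1 bit")
    return REGISTER_TEMPLATE.substitute(name=name, msb=width - 1)


# Example: a 32-bit register entity (name and width are illustrative).
print(render_register("IFC_REGISTER", 32))
```

Keeping the HDL in a template and the parameters outside it is one simple way to let the same unit be regenerated whenever the processor core configuration changes.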

#### Interface Description Language: CWL (Component Wrapper Language)



<pre>
interface ex1:
  port:
    input.clock clk;
    input.control cmd;
    input.control req;
    output.control ack;
    output.data[31:0] dat;
  endport
  alphabet:
    signalset all = {clk, cmd, req, ack, dat};
      W: {R, ?, 0, 0, ?};
      Q(Xa): {R, 0, 1, 0, ?};
      ERR: {R, 0, 1, 1, ?};
    endsignalset
  endalphabet
  word:
    read(reg[9:0] Xa, reg[7:0] Xd) :
      Q(Xa) W{0,8} [O(Xd) | ERR];
  endword
endinterface
</pre>

#### **Architecture Model**



- ✓ A Processor Core
- ✓ A Memory
- ✓ Several Hardware IPs with IFC
- ✓ A Shared Bus



# Interface between Processor Core and Hardware IP

#### Based on ARM7TDMI Coprocessor Interface

- ✓ Signal Interface: for the handshake protocol
- ✓ Instruction Interface: for data processing and transfer

## **Signal Interface**

| name | meaning                        | direction                   |
|------|--------------------------------|-----------------------------|
| nCPI | Not CoProcessor<br>Instruction | Processor -><br>CoProcessor |
| CPA  | CoProcessor Absent             | CoProcessor -><br>Processor |
| CPB  | CoProcessor Busy               | CoProcessor -><br>Processor |

\* Here, CoProcessor = Hardware IP
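As an illustration of how the three signals interact, the following sketch classifies one handshake cycle. It assumes the signal polarity of the ARM7TDMI coprocessor interface (nCPI LOW offers an instruction, CPA LOW means a coprocessor accepts it, CPB LOW means it is ready); the function name and the returned labels are hypothetical, and the synthesized HANDSHAKE unit may differ in detail.

```python
def handshake_outcome(ncpi, cpa, cpb):
    """Classify one handshake cycle from the signal levels (0 = LOW, 1 = HIGH)."""
    if ncpi == 1:
        return "idle"        # no Hardware-IP-Instruction is being offered
    if cpa == 1:
        return "absent"      # no hardware IP accepted the instruction
    if cpb == 1:
        return "busy-wait"   # accepted, but the hardware IP is not ready yet
    return "execute"         # accepted and ready


# Example: the instruction is offered and accepted, but the hardware IP is busy.
print(handshake_outcome(ncpi=0, cpa=0, cpb=1))
```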



## **Instruction Interface**

Hardware-IP-Instructions:

- ✓ CDP (CoProcessor Data Operation): operate on data in the hardware IP
- ✓ LDC/STC (CoProcessor Load/Store Operation): transfer data between the hardware IP and memory
- ✓ MRC/MCR (Register Transfer Operation): transfer data between the hardware IP and the processor core
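A decoder for these instruction classes could look like the sketch below. It assumes the standard 32-bit ARM coprocessor instruction encoding (bits 27:24 = 1110 for CDP/MRC/MCR, bits 27:25 = 110 for LDC/STC, bit 20 as the load/store direction bit). This is an assumption for illustration only: the encoding used by a synthesized processor core may differ. The function name `classify` is hypothetical.

```python
def classify(word):
    """Return the Hardware-IP-Instruction class of a 32-bit word, or None."""
    load = (word >> 20) & 1                # L bit: 1 = MRC or LDC, 0 = MCR or STC
    if ((word >> 24) & 0xF) == 0b1110:     # CDP, MRC, or MCR
        if ((word >> 4) & 1) == 0:
            return "CDP"                   # bit 4 clear: data operation
        return "MRC" if load else "MCR"    # bit 4 set: register transfer
    if ((word >> 25) & 0b111) == 0b110:    # LDC or STC
        return "LDC" if load else "STC"
    return None                            # not a Hardware-IP-Instruction


# Example words (condition field 0b1110 = always; other fields left at zero).
for word in (0xEE000000, 0xEE100010, 0xEE000010, 0xED900000, 0xED800000):
    print(f"{word:#010x} -> {classify(word)}")
```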

## IFC\_Synthesizer

- IFC Architecture
- IFC Synthesis Method

#### IFC Architecture – transferring data



- ✓ BUS\_I/O: controlling data flow via the shared bus
- ✓ REGISTER: saving data from / to the shared memory

#### IFC Architecture – control



- ✓ DECODER: Decoding Hardware-IP-Instructions
- ✓ INST\_QUEUE: Preserving decoded bit vectors
- ✓ HANDSHAKE: Handling handshake protocol
- ✓ CONTROLLER: Controlling all units in IFC with control signals

#### **IFC Synthesis Method**



- ✓ Synthesizing CONTROLLER is essential
- ✓ Refer to the paper for the other units



## **CONTROLLER Synthesis Algorithm**

1. Ports Decision: External and internal ports in IFC are decided
2. States Decision: States for processing and transferring data are decided
3. Sub-states Decision: Sub-states, which define control signals to all the units, are decided
4. Sub-state Transitions Decision: Transitions among sub-states are decided

## Step1: Ports Decision

#### **Step2: States Decision**



Hardware IP CWL

#### Step3: Sub-States Decision



# Step4: Sub-States Transitions Decision

Hardware IP interface description (CWL):

<pre>
port:
  input.clock CLK;
  input.enable EN;
  input.control[1:0] CONT;
  input.data[7:0] ADR;
  output.data[31:0] DATA;
endport
alphabet:
  signalset a = {CLK, EN, CONT, ADR, DATA};
    I: {R, 1, 2'b01, x, Z };
    N: {R, 1, 2'b00, x, Z };
    R(Xa): {R, 1, 2'b10, Xa, Z };
    O(Xd): {R, 1, 2'b11, X, Xd };
  endsignalset
endalphabet
word:
  proc(Xa,Xd): (R(Xa) N[2])[1,2] O(Xd)[3];
endword
</pre>

Resulting sub-state transitions (HDL excerpt):

<pre>
elsif CURRENT_STATE = S_CDP_2_2 then
  if CNT_Q = 3 then
    NEXT_STATE &lt;= S_CDP_2_1;
  elsif CNT_Q = 6 then
    NEXT_STATE &lt;= S_CDP_2_3;
  else
    NEXT_STATE &lt;= S_CDP_2_2;
  end if;
</pre>

- Deciding sequences of sub-states from the "word" section in CWL
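To make the step from a CWL word to sub-state sequences more concrete, the sketch below expands the word `(R(Xa) N[2])[1,2] O(Xd)[3]` into the letter sequences it allows. It rests on two assumptions not confirmed by the slides: that `[n]` means "repeat n times" and that `[1,2]` means "repeat one to two times". The nested `WORD` structure and `expand` are hypothetical; the real IFC\_Synthesizer works from the XML form of the CWL description.

```python
from itertools import product

# Hypothetical nested form of: proc(Xa,Xd): (R(Xa) N[2])[1,2] O(Xd)[3];
# Each node is (item, min_repeat, max_repeat); an item is a letter or a node list.
WORD = [
    ([("R", 1, 1), ("N", 2, 2)], 1, 2),
    ("O", 3, 3),
]


def expand(nodes):
    """Return every letter sequence that the node list can produce."""
    sequences = [[]]
    for item, lo, hi in nodes:
        # Sequences produced by a single occurrence of the item.
        once = [[item]] if isinstance(item, str) else expand(item)
        # Sequences produced by lo..hi consecutive occurrences.
        repeated = []
        for count in range(lo, hi + 1):
            for combo in product(once, repeat=count):
                repeated.append([letter for part in combo for letter in part])
        sequences = [head + tail for head in sequences for tail in repeated]
    return sequences


for seq in expand(WORD):
    print(len(seq), " ".join(seq))
```

Under these assumptions the word admits two sequences, one with a single `R N N` group and one with two, each followed by three `O` letters. A controller generator could assign one sub-state per letter and derive counter-based transitions from the group lengths.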

# **Experimental Results**



## **Target Application**

#### ✓ MPEG-4 Encoder




#### Using Hardware IPs

| Area [mm <sup>2</sup> ] |
|-------------------------|
| 0.3904                  |
| 2.4480                  |
| 3.6000                  |
|                         |

#### **Configuration of Synthesized Processor Cores**

| Name | Area [mm<sup>2</sup>] | Frequency [MHz] | Kernel | Issue | #ALUs              | #Regs |
|------|-----------------------|-----------------|--------|-------|--------------------|-------|
| A    | 5.9723                | 81.300          | RISC   | 4     | ALU x 2<br>MUL x 2 | 47    |
| B    | 1.7554                | 70.225          | DSP    | 2     | ALU x 1<br>MUL x 1 | 8     |

# Results (1)

| function     | Processor Core | IFC area [mm<sup>2</sup>] |
|--------------|----------------|---------------------------|
| RGB to YCrCb | A              | 0.1080                    |
| RGB to YCrCb | B              | 0.1148                    |
| DCT/IDCT     | A              | 0.1028                    |
| DCT/IDCT     | B              | 0.1108                    |
| ME/MC        | A              | 0.1547                    |
| ME/MC        | B              | 0.1638                    |
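One way to read these figures is to set each IFC area against the area of the hardware IP it wraps, using the hardware IP areas reported on the "Using Hardware IPs" slide. The sketch below only performs that division on the numbers from the slides; the ratio itself and the name `ifc_overhead` are not part of the original results.

```python
# Hardware IP areas [mm^2], from the "Using Hardware IPs" slide.
IP_AREA_MM2 = {"RGB to YCrCb": 0.3904, "DCT/IDCT": 2.4480, "ME/MC": 3.6000}

# Synthesized IFC areas [mm^2], keyed by (function, processor core).
IFC_AREA_MM2 = {
    ("RGB to YCrCb", "A"): 0.1080,
    ("RGB to YCrCb", "B"): 0.1148,
    ("DCT/IDCT", "A"): 0.1028,
    ("DCT/IDCT", "B"): 0.1108,
    ("ME/MC", "A"): 0.1547,
    ("ME/MC", "B"): 0.1638,
}


def ifc_overhead(function, core):
    """IFC area as a fraction of the area of the hardware IP it serves."""
    return IFC_AREA_MM2[(function, core)] / IP_AREA_MM2[function]


for function, core in IFC_AREA_MM2:
    print(f"{function:12s} core {core}: {100 * ifc_overhead(function, core):.1f} %")
```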

# Results (2)

#### ✓ IFC\_Synthesizer

- Implemented in Ruby Language
- Executed on Linux 2.4, Pentium III 500MHz, RAM 192MB
- ✓ Execution time of IFC\_Synthesizer
  - ✓ Max: 9.4 [sec]
  - ✓ Min: 4.3 [sec]



IFC\_Synthesizer reduces the cost of designing an interface circuit

## Conclusion

- ✓ Our proposal:
  - IFC Architecture
  - IFC Synthesis method
- IFC\_Synthesizer reduces IFC development cost
  - Execution time ... less than 10 [sec]
  - ✓ Manual design ... about 3 [days]
- ✓ Future Work
  - Clock Gating for Low Energy Consumption

# Thank you





Why are IFCs needed? Couldn't the IFCs be placed inside the synthesized processor core?

- ✓ The result is the same even if the IFCs are placed inside the synthesized processor core.
- Since hardware IPs operate in parallel, each hardware IP requires its own independent IFC.

## Why is CWL adopted?

- ✓ An interface language is required for the hardware IP database.
- CWL is based on regular expressions, so waveforms can be described simply.
- A CWL parser (XML converter) is already available.

# Why is the XML converter needed?

✓ For parsing.



What is the Ruby language? Why is Ruby adopted?

- Ruby is an object-oriented scripting language.
- ✓ http://www.ruby-lang.org
- IFC\_Synthesizer does not need high performance.
- Of course, other languages could be adopted.

### Architecture of Processor Core



- DSP: 3 pipeline stages
- RISC: 5 pipeline stages



"A hardware/software co-synthesis system for digital signal processor core," IEICE Trans. on Fundamentals, vol. E82-A, no. 11, 1999.

#### Connections



### **IFC** Architecture



# IFC Synthesis Method – BUS\_I/O



- Depends on the bit width of the shared bus
- ✓ Independent of the hardware IP in use

#### IFC Architecture – BUS\_I/O



- Controlling data flow via the shared bus:
  1. Hardware-IP-Instructions to DECODER
  2. Input data from BUS to REGISTER
  3. Output data from REGISTER to BUS

### IFC Architecture – DECODER



- ✓ Decoding Hardware-IP-Instructions
- ✓ Queuing decoded bit vector into INST\_QUEUE

## IFC Architecture – INST\_QUEUE



- Preserving decoded bit vectors
- ✓ Dequeuing them into CONTROLLER

## IFC Architecture – HANDSHAKE



- ✓ Interface for handshake signals (nCPI, CPA, CPB)
- ✓ Handling handshake protocol
- ✓ Communication with CONTROLLER

## IFC Architecture – REGISTER



- (input REGISTER) Saving data from the shared memory
- (result REGISTER) Saving data from the hardware IP

## IFC Architecture – CONTROLLER



- ✓ Controlling all units in IFC with control signals
- Controlling the hardware IP for processing data
- ✓ See below for further details

# IFC Synthesis Method – DECODER



- Depends on the synthesized processor core
   ... Instruction encoding specification
- ✓ Independent of the hardware IP in use

#### IFC Synthesis Method – INST\_QUEUE



- Depends on the synthesized processor core
   ... Pipeline stages
- Independent of the hardware IP in use
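A behavioral model of such a queue can be sketched as a bounded FIFO whose depth is a synthesis parameter. The slides state only that INST\_QUEUE depends on the pipeline stages of the processor core; choosing the depth equal to the number of pipeline stages, as in the usage lines below, is an assumption made for illustration, and the class `InstQueue` is hypothetical.

```python
from collections import deque


class InstQueue:
    """Bounded FIFO of decoded bit vectors (behavioral model only)."""

    def __init__(self, depth):
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.depth = depth
        self._items = deque()

    def enqueue(self, bit_vector):
        """Store a decoded bit vector; return False if the queue is full."""
        if len(self._items) >= self.depth:
            return False
        self._items.append(bit_vector)
        return True

    def dequeue(self):
        """Return the oldest bit vector, or None if the queue is empty."""
        return self._items.popleft() if self._items else None


# Assumed sizing: one entry per pipeline stage of a 5-stage RISC kernel.
queue = InstQueue(depth=5)
queue.enqueue(0b1010)
print(queue.dequeue())
```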

#### IFC Synthesis Method – HANDSHAKE



✓ Fixed

... Handshake protocol has been defined.

#### IFC Synthesis Method – REGISTER



- The register size is given by the processor core synthesis system
  - ... Hardware-IP-instructions used in the software include the length of the data to be transferred

#### IFC Synthesis Method – CONTROLLER



✓ Next slides ...

# Using Hardware IPs

| function     | area [mm <sup>2</sup> ] |  |  |
|--------------|-------------------------|--|--|
| RGB to YCrCb | 0.3904                  |  |  |
| DCT / IDCT   | 2.4480                  |  |  |
| ME / MC      | 3.6000                  |  |  |

\* Hitachi 0.35 um CMOS

#### Configuration of Synthesized Processor Cores

| Name | Area [mm<sup>2</sup>] | Frequency [MHz] | Kernel | Issue | #ALUs              | #Regs |
|------|-----------------------|-----------------|--------|-------|--------------------|-------|
| A    | 5.9723                | 81.300          | RISC   | 4     | ALU x 2<br>MUL x 2 | 47    |
| B    | 1.7554                | 70.225          | DSP    | 2     | ALU x 1<br>MUL x 1 | 8     |