### MuCCRA-3: A Low Power Dynamically Reconfigurable Processor Array

Yoshiki Saito, Toru Sano, Masaru Kato, Vasutan Tunbunheng, Yoshihiro Yasuda, Masayuki Kimura and Hideharu Amano

> Department of Information and Computer Science, Keio University, Japan

Paper Session Number: 4D-19

# MuCCRA-3 Overview

- Dynamically Reconfigurable Processor Array (DRPA)
- Flexible offloading engine in various SoCs
  - streaming applications
- Multi-Context style
  - Datapath changing clock by clock
- Instruction-Level Parallelism
- Data-Level Parallelism

### MuCCRA-3

- Third prototype chip
- Optimized for low power consumption





Fig. 1: Datapath changing on DRPA

# Structure of MuCCRA-3



- PE Array
  - 4×4 PEs
  - Data bit width: 16 bits
  - No. of instructions: 14
- Data Memory (DMEM)
  - 16 bits × 128 words × 8 RAMs
- Inter PE Network
  - Island-style Links
  - Direct Links



- Multi-Context style
  - 1 clock context switching
  - 32 hardware contexts
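
As a quick check of the figures listed above, the following sketch works out the number of PEs and the total DMEM capacity. The constant and function names are illustrative and do not come from the poster; only the numeric values do.

```python
# Parameters as listed on the poster; the identifier names are assumptions.
PE_ROWS, PE_COLS = 4, 4        # 4x4 PE array
DATA_WIDTH_BITS = 16           # data bit width
DMEM_WORDS_PER_RAM = 128       # words per data-memory RAM
DMEM_RAMS = 8                  # number of data-memory RAMs


def dmem_capacity_bits() -> int:
    """Total DMEM capacity: width x words x number of RAMs."""
    return DATA_WIDTH_BITS * DMEM_WORDS_PER_RAM * DMEM_RAMS


num_pes = PE_ROWS * PE_COLS
total_bits = dmem_capacity_bits()
total_bytes = total_bits // 8

print(f"PEs: {num_pes}")
print(f"DMEM: {total_bits} bits = {total_bytes} bytes")
```

With the listed values this gives 16 PEs and 16,384 bits (2,048 bytes) of data memory.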

## **PE Structure and Context Switching**



#### Fig. 3: Structure of PE

- Dynamically Reconfigurable Modules
  - ALU, Register file, ALU Input SELs, SEs
  - Changes function of each module clock by clock
- Context Memory
  - Stores configuration data
  - Supplies configuration data to each module
    - selected by the Context Pointer from the Context Switch Controller (CSC)
  - No. of Entries: 32
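
The multi-context mechanism described above can be sketched as a small behavioral model. This is only an illustration under simplifying assumptions: the `Config` and `PE` classes, the three example operations, and the pointer schedule are hypothetical, and a real configuration word would also cover the register file, ALU input selectors and SEs. Only the 32-entry context memory and the 16-bit data width are taken from the poster.

```python
from dataclasses import dataclass
from typing import Callable, List

NUM_CONTEXTS = 32   # from the poster: 32 context-memory entries
MASK16 = 0xFFFF     # from the poster: 16-bit data width


@dataclass(frozen=True)
class Config:
    """Hypothetical configuration word; only the ALU operation is modelled."""
    name: str
    op: Callable[[int, int], int]


class PE:
    """Toy PE whose ALU function is selected each clock by a context pointer."""

    def __init__(self, contexts: List[Config]):
        if len(contexts) > NUM_CONTEXTS:
            raise ValueError("context memory holds at most 32 entries")
        self.context_memory = list(contexts)

    def step(self, context_pointer: int, a: int, b: int) -> int:
        # One clock: read the configuration selected by the pointer,
        # apply it, and truncate the result to 16 bits.
        cfg = self.context_memory[context_pointer]
        return cfg.op(a, b) & MASK16


pe = PE([
    Config("add", lambda a, b: a + b),
    Config("sub", lambda a, b: a - b),
    Config("and", lambda a, b: a & b),
])

# Hypothetical sequence of context pointers, one per clock.
schedule = [0, 1, 2, 0]
acc = 1
trace = []
for ptr in schedule:
    acc = pe.step(ptr, acc, 3)
    trace.append(acc)

print(trace)
```

The point of the model is that the same hardware unit performs a different operation on every clock, with no reload of configuration data between clocks: the pointer only selects among entries already stored in the context memory.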

# **Chip Implementation**



Fig. 4: MuCCRA-3 Floorplan

- Fujitsu e-shuttle 65nm CMOS Process
- Synthesis: Synopsys Design Compiler 2007.12-SP3
- Layout: Cadence SoC Encounter 7.1
- Target clock frequency: 41.4 MHz

# **Evaluation Results**



Fig. 5: Power for Executing Applications

- Power for Applications
  - Power consumption of clock tree: 30% of entire power (Sepia)
  - 10–13 mW power consumption for each application
- Energy consumption
  - Approximately 90 times lower than that of a DSP (TI TMS320C6201)
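
For a rough sense of scale, the power figures can be converted to energy per clock cycle using E = P / f. The sketch below assumes the applications run at the 41.4 MHz target clock frequency, which the poster does not state explicitly; the function names are illustrative. It makes no attempt to reproduce the DSP comparison, which would need execution times not given here.

```python
TARGET_FREQ_MHZ = 41.4      # target clock frequency from the poster
CLOCK_TREE_SHARE = 0.30     # clock tree: 30% of entire power (poster)


def energy_per_cycle_nj(power_mw: float, freq_mhz: float) -> float:
    """Energy per clock cycle in nJ.

    mW / MHz = (1e-3 W) / (1e6 Hz) = 1e-9 J, so the ratio is already in nJ.
    """
    return power_mw / freq_mhz


def clock_tree_power_mw(total_power_mw: float) -> float:
    """Portion of the total power attributed to the clock tree."""
    return CLOCK_TREE_SHARE * total_power_mw


for p_mw in (10.0, 13.0):   # reported range: 10 to 13 mW per application
    e_nj = energy_per_cycle_nj(p_mw, TARGET_FREQ_MHZ)
    ct_mw = clock_tree_power_mw(p_mw)
    print(f"{p_mw:.0f} mW: {e_nj:.3f} nJ/cycle, clock tree about {ct_mw:.1f} mW")
```

Under that assumption, the reported range corresponds to roughly 0.24 to 0.31 nJ per cycle, of which about 3 to 3.9 mW of the power is spent in the clock tree.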

#### Please come to see our poster!!