Multithreaded Coprocessor Interface for Multi-Core Multimedia SoC

Shih-Hao Ou

Department of Electronics Engineering

National Chiao Tung University, Taiwan

## Outline

- Introduction
- Proposed Multithreaded Coprocessor Interface
- Performance Evaluation
- Conclusion

## Introduction

- Dual-core and multi-core platforms (combining an MPU and DSPs) are popular in multimedia and communication applications
  - MPU for control-oriented tasks, such as the user interface and system coordination
  - DSP for computation-intensive tasks
- As application complexity grows rapidly:
  - Multiple tasks tend to use the DSP concurrently
  - Modern DSPs exploit parallelism among tasks to use the architecture more efficiently
  - Thus, DSP task management is required

### Dual-Core Software Architecture

- Task management can be performed on
  - The DSP itself (with an OS or a kernel)
    - Poorly suited to the DSP's intensive program flows and interrupt handling
    - Significant context-switch overhead
    - DSP-specific functional units sit idle while kernel code runs
  - The MPU (as a device driver)
    - DSP utilization depends heavily on the MPU's response time
- Example: TI OMAP
  - DSP/BIOS (kernel) & DSP/BIOS Link (or Linux DSP Gateway)
  - Problems
    - Inefficient mailbox-based, interrupt-driven IPC
    - Thick software layer with high context-switch overhead
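
To make the response-time argument concrete, the sketch below uses a simple first-order model that is our own assumption, not something measured in this work: each DSP task runs for `t_exec_us` and is then followed by `t_wait_us` of idle time while the MPU-side driver notices completion and dispatches the next task. DSP utilization is then U = t_exec / (t_exec + t_wait). The function name and the numbers in the usage example are hypothetical.

```python
def dsp_utilization(t_exec_us: float, t_wait_us: float) -> float:
    """Hypothetical first-order model: fraction of time the DSP does useful
    work when each task of length t_exec_us is followed by t_wait_us of idle
    time spent waiting for the MPU (mailbox/interrupt round trip, driver
    scheduling)."""
    if t_exec_us <= 0 or t_wait_us < 0:
        raise ValueError("t_exec_us must be > 0 and t_wait_us must be >= 0")
    return t_exec_us / (t_exec_us + t_wait_us)


if __name__ == "__main__":
    # Illustrative values only: a 100 us kernel with increasingly slow MPU responses.
    for wait in (0.0, 10.0, 50.0, 100.0):
        print(f"wait={wait:6.1f} us -> utilization={dsp_utilization(100.0, wait):.2f}")
```

Under this model, an MPU response time equal to the task length halves DSP utilization, which is why short tasks are hit hardest by a slow driver path.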





## Proposed Multithreaded Coprocessor Interface

- Intelligent host processor interface (HPI) with task management capability
  - Dedicated controller that offloads task management from both processors
    - Off the DSP (i.e., DSP hardware is used more efficiently)
    - Off the MPU (i.e., quicker response time)
  - Specific task loading mechanism
    - Instant task initialization
    - Reduced controller complexity



### Task Scheduling



#### Priority queue

| Task | Program address | Destination | Queue pointer (head) | Queue pointer (tail) |
|------|-----------------|-------------|----------------------|----------------------|
| VLC  | &VLC            | -           | 1                    | 0                    |
| Q    | &Q              | VLC         | 1                    | 1                    |
| DCT  | &DCT            | Q           | 2                    | 1                    |
| CST  | &CST            | DCT         | 4                    | 2                    |
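
The Destination column links the tasks into the JPEG encoding pipeline: each entry names the task that consumes its output, so following the links from CST gives CST, DCT, Q, VLC. The sketch below encodes the table rows as Python records and walks that chain. The record layout and field names (`TaskEntry`, `pipeline_from`) are illustrative assumptions; the slide does not say how the hardware stores these entries, and the head/tail values are copied as-is without assuming their exact semantics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskEntry:
    """One row of the task table (hypothetical field names)."""
    name: str
    program_address: str          # symbolic, e.g. "&DCT"
    destination: Optional[str]    # task that consumes this task's output
    queue_head: int
    queue_tail: int


# Rows copied from the table above.
TASK_TABLE = {
    t.name: t
    for t in (
        TaskEntry("VLC", "&VLC", None, 1, 0),
        TaskEntry("Q", "&Q", "VLC", 1, 1),
        TaskEntry("DCT", "&DCT", "Q", 2, 1),
        TaskEntry("CST", "&CST", "DCT", 4, 2),
    )
}


def pipeline_from(start: str, table: dict) -> list:
    """Follow Destination links from `start` until a task has no destination."""
    order, current = [], start
    while current is not None:
        if current in order:  # guard against a cyclic (malformed) table
            raise ValueError(f"cycle detected at {current}")
        order.append(current)
        current = table[current].destination
    return order


if __name__ == "__main__":
    print(" -> ".join(pipeline_from("CST", TASK_TABLE)))
```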

#### Dispatch table

| Enable | Task |
|--------|------|
| 1      | VLC  |
| 0      | -    |
| 0      | -    |
| 0      | -    |
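
One plausible reading of the two tables is a priority-ordered dispatcher that fills free dispatch slots with ready tasks. The sketch below models that reading under explicitly hypothetical assumptions not stated on the slide: the rows of the priority queue are listed from highest to lowest priority, a task is ready when it has at least one pending input block (supplied here as a separate `pending` count rather than derived from the head/tail pointers), and each task occupies at most one slot. With only VLC holding input, the result mirrors the dispatch table shown above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DispatchSlot:
    enable: bool = False
    task: Optional[str] = None


def dispatch(priority_order: List[str], pending: dict, slots: List[DispatchSlot]) -> None:
    """Fill free dispatch slots with the highest-priority ready tasks.

    Hypothetical policy: a task is ready when pending[task] > 0, a task that
    is already enabled in some slot is not dispatched twice, and
    priority_order runs from highest to lowest priority.
    """
    running = {s.task for s in slots if s.enable}
    ready = [t for t in priority_order if pending.get(t, 0) > 0 and t not in running]
    for slot in slots:
        if not ready:
            break
        if not slot.enable:
            slot.task = ready.pop(0)
            slot.enable = True


if __name__ == "__main__":
    slots = [DispatchSlot() for _ in range(4)]
    # Illustrative state: only VLC has input waiting.
    dispatch(["VLC", "Q", "DCT", "CST"], {"VLC": 1}, slots)
    for s in slots:
        print(int(s.enable), s.task or "-")
```

Keeping readiness as an explicit input avoids guessing how the hardware interprets the queue pointers; a real controller would derive it from its own queue state.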

### Task Loading



**Computing Kernels** 

## Performance Evaluation

Experimental setup

- CoWare ESL platform
  - ARM926 (@297 MHz) + TI C'64 (@594 MHz)
- Applications
  - 256×256 JPEG encoding
- Experimental results (total execution time)
  - Task management on the MPU (uClinux): 47.409 ms
  - Task management on the DSP (µC/OS-II): 17.844 ms
  - Proposed HPI: 15.315 ms
- Implementation
  - The area overhead of the proposed HPI is only 0.65% of the DSP core
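
The 67% figure in the conclusion can be re-derived from these totals: relative to MPU-based task management, the HPI cuts execution time from 47.409 ms to 15.315 ms, a reduction of (47.409 - 15.315) / 47.409 ≈ 67.7%, or roughly a 3.1× speedup. Against DSP-based task management the gain is smaller, about 14.2% (≈1.17×). The snippet below only recomputes these ratios from the reported numbers; the dictionary keys and the helper name are our own labels.

```python
# Total execution times (ms) for 256x256 JPEG encoding, as reported above.
RESULTS_MS = {
    "MPU": 47.409,
    "DSP": 17.844,
    "HPI": 15.315,
}


def reduction_and_speedup(baseline_ms: float, new_ms: float) -> tuple:
    """Return (fractional reduction in execution time, speedup factor)."""
    return (baseline_ms - new_ms) / baseline_ms, baseline_ms / new_ms


if __name__ == "__main__":
    hpi = RESULTS_MS["HPI"]
    for name in ("MPU", "DSP"):
        red, speedup = reduction_and_speedup(RESULTS_MS[name], hpi)
        print(f"HPI vs {name}-based management: {red:.1%} less time, {speedup:.2f}x speedup")
```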

## Conclusion

- A multithreaded coprocessor interface with dynamic task management capability
  - Dedicated controller for task management
  - Specific task loading mechanism
- Our approach reduces the total execution time on a dual-core platform by about 67% compared with MPU-based task management (47.409 ms to 15.315 ms), with a hardware overhead of only 0.65% of the DSP core area