### A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses

Bu-Ching Lin, Geeng-Wei Lee, Juinn-Dar Huang, Jing-Yang Jou Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan





## Outline

- Introduction
- Previous Works
  - fixed priority
  - time division multiple access (TDMA)
  - Lottery
  - RT\_lottery
- Proposed Arbitration Architecture
- Experimental Results
- Conclusions

# Introduction(1/2)

- Shared bus is widely used in current SoC designs
  - master initiate communication transactions
  - slave respond to transactions initiated by masters
  - arbiter manage the usage of bus



## Introduction(2/2)

- Requirements in different applications
  - complete transactions of all requests before the corresponding deadlines in real-time applications
  - take at least a fixed fraction of total bandwidth in multimedia applications
- Difficult to satisfy both real-time and bandwidth requirements simultaneously

an innovative arbitration algorithm is required

### **Previous Works**

- Existing arbitration algorithms
  - fixed priority
  - time division multiple access (TDMA)
  - Lottery
  - RT\_lottery

## **Fixed Priority**

Among the requesting masters, the one with the Ar highest priority gets granted



#### Pros

simple, low hardware cost and easy to implement

#### Cons

- starvation problem the masters with lower priority hardly get the service
- lack of control over real-time and bandwidth requirements



- Execution time is divided into time slots which are statically assigned to masters
- 2<sup>nd</sup>-level of arbitration is usually adopted to alleviate the wasted time slots



TDMA (2/2)

#### Pros

- deterministic worst-case response latency
- reserved bandwidth for each master

#### Cons

- difficult to design time slot sequences in an unpredictable system
- more slots → more bandwidth and shorter latency
  - what if a master with LOW bandwidth requirement but needs SHORT response latency?





# Latency and Bandwidth in Lottery

- Response latency and bandwidth allocation both controlled by the number of tickets
  - e.g., 3 masters have similar traffic behaviors



## Summary of Lottery

#### Pros

- good control over bandwidth allocation in network switching applications
- fair average response latency

#### Cons

- no hard real-time consideration
- no independent controllability over response latency and bandwidth allocation

# **RT\_lottery**

- A 2-level arbitration algorithm dealing with real-time and bandwidth requirements simultaneously
- The proposed architecture
  - 1<sup>st</sup> level real-time handler
    - handles the hard real-time requirements
  - 2<sup>nd</sup> level Lottery with tuned weight
    - reserves the bandwidth allocation for each master



### **Real-Time Handler**

- Similar to earliest deadline first scheduling (EDF)
  - the request with earliest deadline and below the warning line gets granted

#### Deadline

- the time limit for a master to complete a request
- missing the deadline is regarded as the real-time violation

#### Warning line

the worst case of scheduling the contending requests

# Weight Tuning

A ticket redistribution mechanism to meet the required bandwidth by simulation





# Summary of Previous Works

|  |                                                                                                                  | Pros                                               | Cons                                                         |
|--|------------------------------------------------------------------------------------------------------------------|----------------------------------------------------|--------------------------------------------------------------|
|  | fixed prioritysimplicity<br>area efficiencyTDMAdeterministic worst-case latency<br>reserved bandwidth allocation |                                                    | no real-time consideration<br>no means for bandwidth control |
|  |                                                                                                                  |                                                    | no hard real-time guarantee<br>no precise bandwidth control  |
|  | Lottery                                                                                                          | reserved bandwidth allocation fair average latency | no real-time consideration<br>no precise bandwidth control   |
|  | RT_lottery                                                                                                       | hard real-time guarantee                           | limitation of Weight Tuning                                  |

# **RB\_lottery Architecture**

- 3-level arbitration algorithm
  - real-time handler handles the hard real-time requirements
  - Lottery with tuned weight reserves the bandwidth allocation for each master
    - bandwidth regulator –provides fine-grained control over bandwidth allocation



# An Example of RB\_lottery

- Bandwidth regulator monitors the bus traffic
  - record the transactions of each master
  - temporarily block the requests from masters that have already got the required bandwidth in a period



### The Implementation

- Observation window (w) the execution time is divided into windows of size w cycles
  - block the requests of over-served masters temporarily
- Bandwidth register the allocated bandwidth in the current window



### **RB\_lottery Algorithm Flow**



- check whether a new window starts
- grant the most urgent master
- stochastically grant an unblocked master
  - record the transaction cycles
- check the allocated bandwidth
- reset all the blocked signals and proceed to the next window





## **Experiment Setup**

#### Behavior of masters

|          | type | beat/prob. |       | interval/prob. |       |       |       |       |
|----------|------|------------|-------|----------------|-------|-------|-------|-------|
| Master 1 | D    | 8/50       | 16/50 | 6/10           | 7/20  | 8/40  | 9/20  | 10/10 |
| Master 2 | D    | 1/50       | 4/50  | 10/10          | 11/20 | 12/40 | 13/20 | 14/10 |
| Master 3 | D    | 8/50       | 16/50 | 6/10           | 7/20  | 8/40  | 9/20  | 10/10 |
| Master 4 | D    | 1/50       | 4/50  | 10/10          | 11/20 | 12/40 | 13/20 | 14/10 |
| Master 5 | D_R  | 8/50       | 16/50 | 10/10          | 11/20 | 12/40 | 13/20 | 14/10 |
| Master 6 | D_R  | 1/50       | 4/50  | 10/10          | 11/20 | 12/40 | 13/20 | 14/10 |
| Master 7 | ND_R | 8/50       | 16/50 | 65/10          | 66/20 | 67/40 | 68/20 | 69/10 |
| Master 8 | ND_R | 1/50       | 4/50  | 85/10          | 86/20 | 87/40 | 88/20 | 89/10 |

Heavy-Traffic Light-Traffic

- 4 D type masters, 2 D\_R type masters and 2 ND\_R type masters in the simulation system
- □ Half of masters are heavy-traffic

# Performance Comparisons (1/2)

- Fail cases of different arbitration algorithms
  - 100 random required-bandwidth combinations for each workload
  - 102400 simulation cycles for each combination

| Workload<br>(%) | Fixed<br>Priority | Lottery | TDMA<br>+<br>Lottery | RT<br>_lottery | RB<br>_lottery |
|-----------------|-------------------|---------|----------------------|----------------|----------------|
| 60              | 100               | 100     | 95                   | 0              | 0              |
| 65              | 100               | 100     | 98                   | 0              | 0              |
| 70              | 100               | 100     | 100                  | 0              | 0              |
| 75              | 100               | 100     | 100                  | 10             | 0              |
| 80              | 100               | 100     | 100                  | 18             | 0              |
| 85              | 100               | 100     | 100                  | 37             | 1              |
| 90              | 100               | 100     | 100                  | 55             | 12             |
| 95              | 100               | 100     | 100                  | 74             | 44             |

# Performance Comparisons (2/2)

#### Hardware comparisons

|             | Fixed<br>Priority Lottery |      | TDMA<br>+<br>Lottery | RT<br>_lottery | RB<br>_lottery |
|-------------|---------------------------|------|----------------------|----------------|----------------|
| Gate counts | 215                       | 4296 | 4917                 | 5134           | 5814           |
|             |                           |      |                      |                |                |

#### Summary

|  |                        | real-time capability | bandwidth capability                                        |  |  |
|--|------------------------|----------------------|-------------------------------------------------------------|--|--|
|  |                        |                      | poor                                                        |  |  |
|  |                        |                      | good but weight tuning is required                          |  |  |
|  | TDMA+Lottery           | no guarantee         | good only in low loaded bus<br>(workload < 60%)             |  |  |
|  | RT_lottery always hold |                      | good but still fails in high loaded bus<br>(workload < 75%) |  |  |
|  | RB_lottery             | always hold          | good even in extremely high loaded bus                      |  |  |

# **Observation Window Comparisons**

- Fail cases in different size of observation window of RB\_lottery
  - 100 random required-bandwidth combinations for each workload
  - 102400 simulation cycles for each combination
    - the size of observation window ranges from 128 to 2048

|                                         | 1   |     |     |      |      | 60       |
|-----------------------------------------|-----|-----|-----|------|------|----------|
| Workload The size of observation window |     |     |     |      |      | r(       |
| (%)                                     | 128 | 256 | 512 | 1024 | 2048 | aquinu   |
| 85                                      | 4   | 1   | 0   | 0    | 0    | <u> </u> |
| 87                                      | 11  | 1   | 0   | 0    | 0    | os atter |
| 89                                      | 25  | 11  | 4   | 2    | 0    | il p     |
| 91                                      | 37  | 25  | 10  | 7    | 7    |          |
| 93                                      | 42  | 31  | 24  | 20   | 14   | 0        |
| 95                                      | 57  | 44  | 33  | 32   | 28   | ~2° 55   |
|                                         |     |     |     |      |      | V        |



### Conclusions

- RB\_lottery is proposed to provide
  - hard real-time guarantee
  - fine-grained bandwidth control

### The observation window in the bandwidth regulator

the larger size of observation window, the better controllability over bandwidth requirements A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses

# Thank you!