

### Energy-Efficient Real-Time Task Scheduling in Multiprocessor DVS Systems

<u>Jian-Jia Chen</u><sup>\*</sup>, Chuan Yue Yang, Tei-Wei Kuo, and Chi-Sheng Shih



Embedded Systems and Wireless Networking Lab. Department of Computer Science and Information Engineering, National Taiwan University \*Fisheries Agency, Council of Agriculture, Executive Yuan, Taiwan





# Agenda

#### Introduction

Scheduling Algorithms

- Energy-Efficient Scheduling for Homogeneous Multiprocessor Systems
  - Negligible leakage power consumption
  - Non-negligible leakage power consumption
- Energy-Efficient Scheduling for Heterogeneous Two-Processor Systems
- Allocation Cost Minimization for Multiprocessor Synthesis under Energy Constraints
- Conclusion and Open Issues

# Motivations for Power Saving



- Rapid Increasing of Power Consumption
  - Modern hardware design increases the power consumption of circuits.
  - The power consumption of processors increases dramatically.
- Slow Increasing of the Battery Capacity
  - The battery capacity increases about 5% per year
  - Battery life time is a major concern for embedded systems
- Embedded Systems vs Servers

The reduction of power is also needed to cut the power bill off







#### Hardware Methodology for Power Saving

- Dynamic power management (DPM)
  The operation mode of the system
  ACPI
- Micro-architecture technique
  - Adaptive architecture
  - Cache management
- Dynamic voltage scaling (DVS)
  - Supply voltage scaling
    - Intel Xscale, StrongARM; Transmeta Crusoe, Intel Pentium 4
    - Intel SpeedStep, AMD PowerNow!



Threshold voltage scaling

 $^{\circ}$ N E W S<sup>Lab</sup>



# Dynamic Voltage Scaling

- A higher supply voltage usually results in a higher frequency (or higher execution speed)
  - s =  $k * (V_{dd} V_t)^2 / (V_{dd})$ , where
    - s is the corresponding speed of the supply voltage  $V_{dd}$  and
    - $V_t$  is the threshold voltage
- The dynamic power consumption function P() of the execution speeds of a processor is a convex function:
  - P(s) =  $C_{ef} V_{dd}^2 s$ , in which  $C_{ef}$  is the switch capacitance related to tasks under executions
  - $P(s) = C_{ef} s^3 / k^2 , \text{ when } V_t = 0$
- The static power consumption comes from the leakage current
  - A constant or
  - A sub-linear function of speed s



#### SNEWS<sup>Lob</sup> An Example of Power Consumption Functions





# Energy Efficiency

*Energy-efficient* scheduling is to minimize the energy consumption while the performance index or the timing constraint is guaranteed

To minimize the energy consumption resulting from the dynamic voltage scaling circuits

*Slow down the execution speed* while the timing constraints could be met

To minimize the energy consumption resulting from the leakage current

Turn the circuit off whenever needed





#### Non-Negligible Leakage Power



ASP-DAC 2007, Yokohama Japan



# Why Multiprocessor?









## Related Work

[Gruian et al. ASP-DAC'01, Zhang et al. DAC'02]:

Heuristic algorithms based on the well-known list-scheduling

[Mishra et al. IPDPS'03]:

Heuristic algorithms based on the well-known list-scheduling for tasks with precedence constraints with communication costs

[Anderson and Baruah ICDCS'04]:

Heuristic partition algorithms to trade the number of processors with the energy consumption

[Aydin and Yang IPDPS'03, AlEnawy and Aydin RTAS'05]:

Heuristic algorithms based on traditional bin packing strategies







# Agenda

#### Introduction

Scheduling Algorithms

- Energy-Efficient Scheduling for Homogeneous Multiprocessor Systems
  - Negligible leakage power consumption
  - Non-negligible leakage power consumption
- Energy-Efficient Scheduling for Heterogeneous Two-Processor Systems
- Allocation Cost Minimization for Multiprocessor Synthesis under Energy Constraints
- Conclusion and Open Issues



# **Problem Definition**

#### Given a set **T** of *n* periodic real-time tasks:

- $\tau_i$  is characterized by its
  - arrival time: O
  - computing requirement: c<sub>i</sub> cycles
  - period:  $p_i$
  - relative deadline:  $p_i$
- A homogeneous multiprocessor environment with *M* processors

# The objective is to derive a feasible schedule so that

- each task is on a processor,
- each task completes in time, and
- the energy consumption is minimized



 $L_1$ 

 $L_2$ 

 $L_3$ 



### Algorithm Largest-Task First (LTF)



$$M = 3$$

- 1. Sort tasks in a nonincreasing order of  $c_i/p_i$
- 2. Assign tasks in a greedy manner to the processor with the smallest load
- 3. Execute tasks on a processor at the speed with 100% utilization

100% utilization is optimalprfor energy-efficiency whenwidenticaltasks are with an identicalpower consumption function

Jian-Jia Chen, He "Multiprocessor E

 $\tau_{2}$ 

Loads  $(c_i/p_i)$ 

 $\tau_1$ 

 $au_{2}$ 

 $\tau_{4}$ 

Jian-Jia Chen, He Real-Time Tasks Tei-Wei Kuo, Algorithm LTF is a 1.13-approximation algorithm RTS 2004.

neduling of

ASP-DAC 2007, Yokohama Japan

[Aydin et al. RTSS'01]: EDF

schedule by executing tasks

at a constant speed with

Multiprocessor Systems", in RTAS 2006.

# *Leakage-Aware Largest-Task-First* (LA+LTF)





M = 3



- 1. Sort tasks in a non-increasing order of their loads  $(c_i/p_i)$
- 2. Assign tasks in a greedy manner to the processor with the smallest total estimated utilizations
- 3. Decide execution speeds

Algorithm LA+LTF is a 1.283-approximation algorithm when the overhead on turning processors on/off is negligible



#### Scheduling Scheme Non-negligible Overhead on Turning Processors on/off

- She total load of tasks in T is no more than  $S_0$ 
  - An optimal solution will execute tasks on only one processor
  - It becomes a uniprocessor scheduling problem
    Apply the 2-approximation algorithm for uniprocessor
    EDF scheduling strategy by Irani et al. in SODA 2003
- The total load of tasks in T is greater than  $s_0$ 
  - Apply Algorithm LA+LTF for task assignment
  - Apply Algorithm FF (first-fit) for task reassignment



M=4



### Algorithm FF (First-Fit)



ASP-DAC 2007, Yokohama Japan



## Simulation Results



• Normalized energy: the energy consumption of the derived schedule divided by a lower bound of the input instance



• M=8





# Agenda

#### Introduction

#### Scheduling Algorithms

- Energy-Efficient Scheduling for Homogeneous Multiprocessor Systems
  - Negligible leakage power consumption
  - Non-negligible leakage power consumption
- Energy-Efficient Scheduling for Heterogeneous Two-Processor Systems
- Allocation Cost Minimization for Multiprocessor Synthesis under Energy Constraints
- Conclusion and Future Work





## System Models

- Processing element models
  - DVS PE
    - Ideal PE ( $S_{min} \sim S_{max}$ ) vs. Non-Ideal PE ( $S_{min} = S_1, S_2, ..., S_M = S_{max}$ )
    - Dynamic power consumption vs. static power consumption
    - Power consumption:  $P_1(s)$
  - Non-DVS PE
    - Workload-independence vs. workload-dependence
    - A networking device or an FPGA
    - Power consumption:  $P_2$
- Task models: a set T of n tasks
  - Period:  $p_i$
  - Worst-case execution cycles on the DVS PE: c<sub>i</sub>
  - Execution requirement on the non-DVS PE: u<sub>i</sub>
  - Feasibility constraints:

$$\sum_{\tau_i \text{ on the non-DVS PE}} u_i \leq$$





Chia-Mei Hung, Jian-Jia Chen, and Tei-Wei Kuo, "Energy-Efficient Real-Time Task Scheduling for a DVS System with a Non-DVS Processing Element", in IEEE Real-Time Systems Symposium (RTSS) 2006

N E W S Lab

 $\tau_{5}$ 

0.2

0.2

 $c_i/p_i$ 

 $\mathcal{U}_i$ 

 $au_4$ 

0.2

0.3

 $\tau_3$ 

0.2

0.4

 $\tau_2$ 

0.25

0.5



 $u_i$ 

## Algorithm E-GREEDY

- Sort the tasks in a non-increasing order of ÷
- Put all the tasks on the non-DVS PE initially

Consider tasks in the order to move to the DVS PE:



E-GREEDY: an 8-approximation algorithm

| τ <sub>1</sub><br>0.15 | Minimize            | $\sum_{\tau_i \text{ on the DVS PE}}$ | $c_i/p_i$ |
|------------------------|---------------------|---------------------------------------|-----------|
| 0.6                    | while $\sum_{\tau}$ | $_{i}$ on the DVS PE $u_{i} \geq$     | 1         |



 $\frac{\frac{c_i}{p_i}}{\frac{u_i}{u_i}}$ 



## **S-GREEDY**

Steps

- 1. Sort the tasks non-increasingly according to
- 2. Put all the tasks on the DVS PE
- 3. Generate a solution (A): go through from the first task

If a task keeps saving more energy during its migration, then move it to the non-DVS PE if feasible; otherwise fix it on the DVS PE





 $au_1$ 

0.2

0.2

 $c_i/p_i$ 

 $\mathcal{U}_i$ 

































 $au_1$ 

0.2

0.2

 $c_i/p_i$ 

 $\mathcal{U}_i$ 









#### Steps

- 1. Sort the tasks non-increasingly according to  $\frac{c_i/p_i}{u_i}$
- 2. Put all the tasks on the DVS PE
- 3. Generate a solution (A): go through from the first task, if a task keeps saving more energy during its migration, then move it to the non-DVS PE; otherwise fix it on the DVS PE
- 4. Generate a solution (B): move the most energysaving task to the non-DVS PE









|                       | $	au_1$ | $	au_2$ | $	au_3$ | $	au_4$ | $	au_5$ | <b>Only</b> τ |
|-----------------------|---------|---------|---------|---------|---------|---------------|
| $c_i/p_i$             | 0.2     | 0.2     | 0.2     | 0.25    | 0.15    | <b>,</b>      |
| <i>u</i> <sub>i</sub> | 0.2     | 0.3     | 0.4     | 0.5     | 0.6     | $(P_1(s) = $  |

| Only $\tau_1$ is moved to the non-DVS PE              |
|-------------------------------------------------------|
| $(P_1(s) = s^3, P_2 = 0.4, S_{max} = 1, S_{min} = 0)$ |







#### Steps of S-GREEDY

- 1. Sort the tasks non-increasingly according to  $\frac{c_i/p_i}{u_i}$
- 2. Put all the tasks on the DVS PE
- 3. Generate a solution (A): go through from the first task, if a task keeps saving more energy during its migration, then move it to the non-DVS PE; otherwise fix it on the DVS PE
- 4. Generate a solution (B): move the most energysaving task to the non-DVS PE
- 5. Return the better solution between (A) and (B)
- A 0.5-approximation ratio



## Evaluation Results (1)

Non-Ideal DVS PE & workload-independent non-DVS PE:





ASP-DAC 2007, Yokohama Japan



## Evaluation Results (2)

#### Ideal DVS PE & workload-dependent non-DVS PE:

 $n=10, P_2=558mW, U_1^*=1$ 









# Agenda

#### Introduction

#### Scheduling Algorithms

- Energy-Efficient Scheduling for Homogeneous Multiprocessor Systems
  - Negligible leakage power consumption
  - Non-negligible leakage power consumption
- Energy-Efficient Scheduling for Heterogeneous Two-Processor Systems
- Allocation Cost Minimization for Multiprocessor Synthesis under Energy Constraints
- Conclusion and Open Issues



# **Problem Definition**

#### 🧇 Input

- **m** processor types:  $M_j$  with cost  $C_j$ , j = 1 ... m
- n independent periodic real-time tasks:
  - Period of task  $\tau_i$ :  $p_i$
  - Relative deadline of task τ<sub>i</sub>: *p<sub>i</sub>*
  - Required cycles of a task instance of task  $\tau_i$  at  $M_j$ :  $c_{i,j}$
- An energy budget in the hyper-period *L* of tasks: *E<sub>budget</sub>*
- 🧆 Output
  - Select a multisubset of these *m* processor types
  - Assign each task to one allocated processor
  - Determine execution speeds of tasks
  - Consume no more than *E<sub>budget</sub>* in energy
  - Minimize the allocation cost of allocated processors



J.-J. Chen and T.-W. Kuo. Allocation cost minimization for periodic hard real-time tasks in energy-constrained DVS systems. In *ICCAD* 2006.

S<sup>Lab</sup> N E W S<sup>Lab</sup>



# Approaches for Non-Ideal Processors

Formulate the problem into an integer linear programming problem

- Relax the problem into a naïve linear programming (LP) problem → unbounded in worst cases
- Relax the problem into m linear programming
  - Sort processor types from cheap to expensive ones
  - Restrict no processor type with index larger than j is used for each j=1,2,...,m
- Find the solution with the minimum objective value among the m linear programming relaxations with feasible solutions
  - Algorithm ROUNDING
    - Assign tasks onto processors based on the optimal LP solution
    - Have (m+2)-approximation
  - Algorithm E-ROUNDING





## **Evaluation Results**



# Approaches for Ideal Processors

- One processor type:
  - Incrementally allocate processors until the energy consumption is satisfied
  - At most use (1.189)m<sup>\*</sup>+1 processors, where m<sup>\*</sup> is the optimal cost
- Multiple processor types:
  - Determine the (virtual) discrete speeds of processors manually
  - Apply algorithms for non-ideal processors to assign tasks and energy constraint on each processor type
  - Adopt the incremental approach to allocate processors in each processor type

$$\begin{array}{c} \tau_{1} \\ r_{2} \\ r_{3} \\ r_{4} \\ r_{5} \\$$







#### **Evaluation Results**









## Summary

- Energy-Efficient Multiprocessor Scheduling
  - Homogeneous multiprocessor systems
    - Negligible-leakage: A 1.13-approximation algorithm for homogeneous systems
    - Non-negligible leakage
      - A 1.283-approximation algorithm with negligible overhead in turning on/off processors
      - 2-approximation algorithms for the other case
  - Heterogeneous multiprocessor systems
    - Approximation algorithms for systems with two processors
- Energy-Constrained DVS Synthesis

Approximation algorithms based on parametric linear programming relaxations

- Ideal processor types
- Non-ideal processor types







## Additional Material

- Chip-multiprocessor (CMP) architecture [Yang, Chen, and Kuo in DATE 2005]
- Multiprocessor energy-efficient scheduling for tasks with different power characteristics [Chen and Kuo in ICPP 2005]
- Energy-efficient slack reclamation for multiprocessor systems [Chen, Yang, and Kuo in SUTC 2006]
- Energy-efficient scheduling with task rejection [Chen, Kuo, Yang, and King in DATE 2007]
- Multiprocessor instantaneous temperature minimization [Chen, Hung and Kuo in RTAS 2007]



 $\bigcirc$  N E W S<sup>Lab</sup>



# **Open Issues**

#### Energy-Efficient Scheduling for

- DVS systems and I/O devices
- Heterogeneous multiprocessors
- Tasks with precedence constraints
- Tasks with uncertain execution paths

#### Energy-Aware Synthesis

- Multi-dimensional floor-planning
- Tasks with precedence constraints
- Thermal-Aware Issues
  - Thermal-constrained scheduling
  - Thermal-constrained synthesis





## Questions and Suggestions?



