#### Fast Multi-Domain Clock Skew Scheduling for Peak Current Reduction

#### Shih-Hsu Huang, Chia-Ming Chang, and Yow-Tyng Nieh

#### Department of Electronic Engineering, Chung Yuan Christian University, Taiwan



# Outline

- Introduction
- Preliminaries
- ASAP and ALAP
- Zone-Based Scheduling
- Experimental Results
- Conclusions

# Introduction

- In a high-speed synchronous digital circuit, a huge current peak is often observed at each clock edge due to the simultaneous signal switching.
- By skewing the clock inputs to various registers, it is possible to reduce this current peak.
- The peak current may lead to logic errors due to voltage drop or reliability problems due to electromigration.





#### Non-Zero Clock Skew Circuit



# Multiple Clocking Domains

- However, in fact, it is very difficult to implement a wide spectrum of dedicated clock delays.
- The architecture of multiple clocking domains, which restricts the clock arrival time of each register to specified clocking domains, provides an alternative to a wide spectrum of dedicated clock delays.

# Previous Work

- Vittal, Ha, Brewer and Marek-Sadowska ever used the 0-1 integer linear programming (ILP) formulation to minimize the peak current under the constraint of multi-domain clock skew scheduling.
- Although their approach guarantees the optimality (in terms of the specified cost function), the run time of the 0-1 ILP solver grows dramatically with the increase of binary variables.
- The high computational expense has limited the use of their approach.

# Our Approach

- We derive the ASAP (as-soon-as-possible) schedule and the ALAP (as-late-as-possible) schedule of each register. As a result, we can prune all the redundant binary variables and all the redundant constraints without sacrificing the exactness (optimality) of the solution.
- We also propose a zone-based scheduling algorithm to solve a large circuit heuristically. Experiments with large circuits consistently show that the zone-based scheduling achieves nearoptimal solutions within an acceptable run time.

# Outline

- Introduction
- Preliminaries
- ASAP and ALAP
- Zone-Based Scheduling
- Experimental Results
- Conclusions

# Our Advantages

- The proposed algorithms, including ASAP scheduling, ALAP scheduling, and zonebased scheduling, are independent of the used peak current model (i.e., applicable to any peak current model).
- The peak current can be reduced without any penalty on clock cycle time and average power dissipation.

### **0-1 ILP Formulation**

- Multi-domain clock skew scheduling is easier to implement because it only uses n discrete clocking domains: d<sub>1</sub>, d<sub>2</sub>, ...and d<sub>n</sub>
- The clock arrival time of register  $R_i$ , denoted as  $T_{Ci}$ , must be one of the n clocking domains.
- Binary variable S<sub>i,k</sub> is 1 if and only if T<sub>Ci</sub> is equal to d<sub>k</sub>.

 $d_3$ 

d₄

 $d_5$ 

de

 $d_7$ 

T<sub>ci</sub>

 $d_1$ 

 $d_2$ 

#### 0-1 ILP Formulation

- Multi-domain clock skew scheduling is easier to implement because it only uses n discrete clocking domains: d<sub>1</sub>, d<sub>2</sub>, ...and d<sub>n</sub>
- The clock arrival time of register  $R_i$ , denoted as  $T_{Ci}$ , must be one of the n clocking domains.
- Binary variable S<sub>i,k</sub> is 1 if and only if T<sub>Ci</sub> is equal to d<sub>k</sub>.









# Formulation 2



Double Clocking Constraint

$$T_{cj} - T_{ci} \le T_{PDi, j(min)}$$

$$\sum_{k=1}^{n} \mathbf{d}_{k} \cdot \mathbf{S}_{j,k} - \sum_{k=1}^{n} \mathbf{d}_{k} \cdot \mathbf{S}_{i,k} \leq \mathbf{T}_{\text{PDi},j(\text{min})}$$

# Formulation 3



$$\Gamma_{ci} - T_{cj} \le P - T_{PDi,j(max)}$$

$$\sum_{k=1}^{n} d_k \cdot S_{i,k} - \sum_{k=1}^{n} d_k \cdot S_{j,k} \le P - T_{\text{PD}i,j(\text{max})}$$

## Formulation 4



The Objective is to minimize the value of Peak\_Current (without loss of generality, here we assume the current of each register is the same).

# Outline

- Introduction
- Preliminaries
- ASAP and ALAP
- Zone-Based Scheduling
- Experimental Results
- Conclusions

## ASAP and ALAP

- Due to the zero clocking constraints and the double clocking constraints, many binary variables used in the 0-1 ILP formulations are definitely to be 0.
- We derive the ASAP schedule and the ALAP schedule to prune the redundancies without sacrificing the exactness (optimality) of the solution.
- For each register R<sub>i</sub>, we use ASAP<sub>i</sub> to denote the earliest clock arrival time and ALAP<sub>i</sub> to denote the latest clock arrival time.

#### **Forward Restriction**

For each data path from register  $R_i$  to register  $R_j$ , if the range of  $T_{ci}$  is fixed, then the range of  $T_{ci}$  is restricted as below:

 $ASAP_{j} \ge ASAP_{i} + P - T_{PDi, j(max)}$  $ALAP_{j} \le ALAP_{i} + T_{PDi, j(min)}$ 

#### **Backward Restriction**

For each data path from register  $R_i$  to register  $R_j$ , if the range of  $T_{cj}$  is fixed, then the range of  $T_{ci}$  is restricted as below:

 $ASAP_{i} \le ASAP_{j} + P - T_{PDi, j(max)}$  $ALAP_{i} \ge ALAP_{j} - T_{PDi, j(min)}$ 



















# Outline

- Introduction
- Preliminaries
- ASAP and ALAP
- Zone-Based Scheduling
- Experimental Results
- Conclusions

### **Zone-Based Scheduling**

- For a large circuit, the number of binary variables is still very large even though all the redundant binary variables are pruned. We propose a two-phase zone-based approach to solve a large circuit heuristically.
- A zone is defined as a portion of registers in the circuit graph. The maximum number of binary variables associated with a zone is limited. Therefore, each zone can be scheduled efficiently.

### **First Phase**

- Given a target clock period, we use [4] to obtain an initial feasible schedule.
- Then, we partition the registers in the circuit graph into several zones. Note that a register associated with few binary variables has higher priority to be re-scheduled.

### First Phase – Zone Partitioning



The maximum number of binary variables within a zone : 6

 $Z_1 = \{R_2, R_3, R_4\}$  $Z_2 = \{R_1, R_5\}$  $Z_3 = \{R_6\}$ 







 $Peak\_Current = 5$ 







 $Peak_Current = 4$ 

## First Phase – Zone 3



## First Phase – Zone 3



### First Phase – Zone 3



 $Peak_Current = 3$ 

## Second Phase

- A clocking domain is called the hottest clocking domain, if and only if the total current derivative at this clocking domain is exactly the same as the peak current.
- The registers scheduled in hottest clocking domains are re-scheduled. The iteration process repeats until the peak current cannot be further reduced.



 $Z_4 = \{R_2, R_4, R_5\}$ 



 $Z_4 = \{R_2, R_4, R_5\}$ 



 $Z_4 = \{R_2, R_4, R_5\}$ 



 $Z_4 = \{R_2, R_4, R_5\}$ 

Peak\_Current = 2

## Outline

- Introduction
- Preliminaries
- ASAP and ALAP
- Zone-Based Scheduling
- Experimental Results
- Conclusions

### **Experimental Results**

- The circuits from ISCAS'89 benchmarks suites are targeted to a 0.35 um cell library for the experiments.
- We use Extended LINGO Release 8.0 as the 0-1 ILP solver.
- Running on a personal computer with AMD K8-3GHz CPU and 1G Bytes RAM.

# **Pruning Redundancies**

| Circuit |                | [6]          |                 | Reduced Formulations |              |                 |  |
|---------|----------------|--------------|-----------------|----------------------|--------------|-----------------|--|
|         | #variable<br>s | #constraints | CPU<br>time (s) | #variables           | #constraints | CPU<br>time (s) |  |
| S3271   | 6612           | 2447         | 30087           | 3342                 | 2027         | 88              |  |
| S3384   | 24339          | 4316         |                 | 4917                 | 3531         | 24              |  |
| S5378   | 6265           | 4860         | 16046           | 2088                 | 342          | 59              |  |
| S6669   | 45171          | 5476         |                 | 3951                 | 3265         | 122             |  |
| S9234   | 14364          | 6803         |                 | 6554                 | 6042         | 232             |  |
| S13207  | 48837          | 10062        |                 | 20057                | 8416         | 362             |  |
| S15850  | 56715          | 34418        |                 | 18124                | 28314        |                 |  |
| S35932  | 60480          | 12081        | 714             | 17516                | 3714         | 271             |  |
| S38584  | 114708         | 32189        |                 | 40018                | 29569        |                 |  |

## **Results of Zone Scheduling**

| Circuit | 500 variables in a zone |                  |                    | 2000 variables in a zone |                  |                    | whole-circuit   |                    |
|---------|-------------------------|------------------|--------------------|--------------------------|------------------|--------------------|-----------------|--------------------|
|         | peak<br>current         | tackled<br>Zones | CPU<br>Time<br>(s) | peak<br>current          | tackled<br>Zones | CPU<br>time<br>(s) | peak<br>current | CPU<br>time<br>(s) |
| S3271   | 4                       | 8                | 10                 | 4                        | 4                | 88                 | 4               | 88                 |
| S3384   | 5                       | 10               | 5                  | 5                        | 4                | 14                 | 5               | 24                 |
| S5378   | 11                      | 7                | 7                  | 11                       | 2                | 54                 | 11              | 59                 |
| S6669   | 11                      | 10               | 11                 | 11                       | 5                | 26                 | 11              | 122                |
| S9234   | 7                       | 8                | 23                 | 7                        | 5                | 37                 | 7               | 232                |
| S13207  | 16                      | 28               | 205                | 16                       | 11               | 285                | 16              | 362                |
| S15850  | 15                      | 29               | 501                | 15                       | 12               | 712                |                 |                    |
| S35932  | 288                     | 35               | 133                | 288                      | 9                | 240                | 288             | 271                |
| S38584  | 38                      | 54               | 334                | 36                       | 21               | 513                |                 |                    |

## Benchmark S6669



## Outline

- Introduction
- Preliminaries
- ASAP and ALAP
- Zone-Based Scheduling
- Experimental Results
- Conclusions

## Conclusions

- This paper investigates a fast multidomain clock skew scheduling for peak current reduction.
- The main contribution of our work is that it overcomes the high computational expense of previous work.
- Our approach achieves good results within an acceptable run time.

# Thank You!