



# Lifetime-aware Design Methodology for Dynamic Partially Reconfigurable Systems

Presented by: Siva Satyendra Sahoo

Siva Satyendra Sahoo, Dr. Bharadwaj Veeravalli<br/>Department of Electrical and Computer Engineering,<br/>National University of Singapore<br/>satyendra@u.nus.edu, elebv@nus.edu.sg

Dr. Tuan D.A. Nguyen, Dr. Akash Kumar<br/>Center for Advancing Electronics Design,<br/>Technische Universität Dresden<br/>tuan\_duy\_anh.nguyen1, akash.kumar@tu-dresden.de

- Motivation
- Dynamic Partial Reconfiguration (DPR):
  - Background
  - Features
  - Aging mitigation
- System model
- System Design methodology
- Experiment and Results
- Conclusion


# Increasing fault-rates

[Figure: transistor scaling, insufficient voltage scaling, manufacturing defects and increased variability leading to an increased fault rate]



Void (open circuit) and hillock (short circuit) Electromigration



J. Keane and C. H. Kim, "An odometer for CPUs," in IEEE Spectrum, vol. 48, no. 5, pp. 28-33, May 2011.

J. Srinivasan, S. V. Adve, P. Bose and J. A. Rivers, "The impact of technology scaling on lifetime reliability," Dependable Systems and Networks, 2004 International Conference on, 2004, pp. 177-186.

# Increasing fault-rates



- Gate-oxide breakdown: hard dielectric breakdown
- Hot carrier injection
- Electromigration: void (open circuit) and hillock (short circuit)



## **Reduced System Lifetime**








- Mission failures
- Reduced safety in critical systems
  - Power plants, transportation, medical, etc.








### Meeting increasing computation demands:

- Parallelism
- Custom Computing
  - Hardware Accelerators

### *Finite* computation resources!

### ☆ *Time-sharing* of computing resources:

- Cost-efficient *parallel* systems
- Multi-processor and/or Multi-core SoCs:
  - Multiple applications sharing a number of *instruction-set processor pipelines*
- FPGAs:
  - Multiple hardware accelerators sharing *reconfigurable hardware*
  - Parallel + "Custom"







# Dynamic Partial Reconfiguration (DPR)

- Partially Reconfigurable Modules (PRMs)
- Partially Reconfigurable Regions (PRRs)

[Figure: PR regions (PRRs) $R_1$ and $R_2$ hosting PR modules (PRMs); execution trace with reconfiguration and compute phases, $M_1$ and $M_2$ on $R_1$, $M_3$ and $M_4$ on $R_2$]


# PRM-PRR compatibility





| PRMs ↓ / PRRs → | $R_1$ | $R_2$ |
|-----------------|-------|-------|
| $M_1$           | ✓     | ✓     |
| $M_2$           | ✓     | ×     |
| $M_3$           | ×     | ✓     |
| $M_4$           | ✓     | ✓     |

PRM-PRR Compatibility

Affects:

- PRRs' size
- #PRRs (*available* parallelism)
- Bitstreams *storage*
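The compatibility table can be read as a small lookup structure. The sketch below is illustrative only: it assumes $M_1$ and $M_4$ fit both regions, $M_2$ fits only $R_1$ and $M_3$ fits only $R_2$, and it assumes one partial bitstream is stored per compatible PRM-PRR pair. The names `COMPAT`, `prms_per_prr` and `bitstream_count` are hypothetical and do not come from the slides.

```python
# Hypothetical encoding of a PRM-PRR compatibility table.
# True means the module (PRM) can be placed in the region (PRR).
COMPAT = {
    "M1": {"R1": True, "R2": True},
    "M2": {"R1": True, "R2": False},
    "M3": {"R1": False, "R2": True},
    "M4": {"R1": True, "R2": True},
}


def prms_per_prr(compat):
    """Invert the table: list the modules each region can host."""
    result = {}
    for prm, row in compat.items():
        for prr, ok in row.items():
            result.setdefault(prr, [])
            if ok:
                result[prr].append(prm)
    return result


def bitstream_count(compat):
    """Assumption: one partial bitstream per compatible PRM-PRR pair."""
    return sum(ok for row in compat.values() for ok in row.values())


if __name__ == "__main__":
    print(prms_per_prr(COMPAT))
    print(bitstream_count(COMPAT))
```

Under these assumptions, making a module compatible with more regions increases scheduling freedom but also the number of bitstreams to store, which is the trade-off the list above points to.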






# Scheduling PRMs on PRRs

Deadlines

- Latency (Timing Reliability)
- System MTTF (Lifetime Reliability)
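The latency side of scheduling can be illustrated with a toy execution trace. This is a minimal sketch with made-up numbers: it assumes every task pays a reconfiguration time followed by a compute time on its PRR, that tasks on one PRR run back to back, and that regions reconfigure independently (contention on the configuration port and task dependencies are ignored). `TRACE`, `prr_finish_times`, `makespan` and `meets_deadline` are hypothetical names.

```python
from collections import defaultdict

# Each entry: (PRR, PRM, reconfiguration time, compute time); values are made up.
TRACE = [
    ("R1", "M1", 2, 5),
    ("R1", "M2", 2, 3),
    ("R2", "M3", 3, 4),
    ("R2", "M4", 3, 6),
]


def prr_finish_times(trace):
    """Accumulate reconfiguration + compute time per region."""
    finish = defaultdict(int)
    for prr, _prm, reconfig, compute in trace:
        finish[prr] += reconfig + compute
    return dict(finish)


def makespan(trace):
    """Latency of the whole trace: the region that finishes last."""
    return max(prr_finish_times(trace).values())


def meets_deadline(trace, deadline):
    """Timing reliability check against a single application deadline."""
    return makespan(trace) <= deadline
```

With these numbers, `R2` finishes last, so it determines whether the deadline is met.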







# System-level Spatial Redundancy

### ☆ Number of available PRRs

- Increased available *parallelism*
- *Net* aging reduced



# DPR and System Lifetime

- Scheduling: deadlines and aging
- System-level spatial redundancy

Tools to improve the system MTTF


# System MTTF-aware Scheduling

- ☆ Aim: Reduce aging of each PRR
- ☆ Constraints:
  - Execution latency
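One simple way to picture this aim is a greedy wear-levelling heuristic. The sketch below is not the MILP formulation solved with Gurobi in the experiments; it is a swapped-in heuristic that uses accumulated busy time per PRR as a stand-in for aging, which is an assumption. Task dependencies and reconfiguration overhead are ignored, and `greedy_wear_levelling` and its arguments are hypothetical names.

```python
def greedy_wear_levelling(tasks, compat, prrs, deadline):
    """Assign each task to the compatible PRR with the least accumulated busy time.

    tasks:  list of (task_id, prm_type, exec_time), handled in list order.
    compat: dict mapping prm_type -> set of PRR names that can host it.
    Returns (assignment, busy), or None if a task has no compatible PRR
    or the latency constraint is violated.
    """
    busy = {prr: 0 for prr in prrs}
    assignment = {}
    for task_id, prm, exec_time in tasks:
        candidates = [p for p in prrs if p in compat[prm]]
        if not candidates:
            return None
        # Least-stressed region first; ties go to the earlier entry in prrs.
        target = min(candidates, key=lambda p: busy[p])
        busy[target] += exec_time
        assignment[task_id] = target
    # Latency constraint: the busiest region must finish before the deadline.
    if max(busy.values()) > deadline:
        return None
    return assignment, busy


PRRS = ["R1", "R2"]
COMPAT = {"M1": {"R1", "R2"}, "M2": {"R1"}, "M3": {"R2"}, "M4": {"R1", "R2"}}
TASKS = [(0, "M1", 4), (1, "M2", 3), (2, "M3", 2), (3, "M4", 5)]
```

Calling `greedy_wear_levelling(TASKS, COMPAT, PRRS, deadline=10)` balances the busy time across both regions; a tighter deadline makes the same call return `None`.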



# System MTTF-aware system partitioning

- Homogeneous v/s heterogeneous PRRs
- Effects:
  - Maximum #PRRs
  - Aging of each PRR




# System model: *Application*

- Task-graph
- Parameters for problem formulation

[Figure: Application task-graph]

| Parameter | Description                        |
|-----------|------------------------------------|
| TaskID    | Serial number of task              |
| TaskType  | Type of PRM used                   |
| StartT    | Start time of task                 |
| ExecT     | Expected execution time            |
| EndT      | End time of task                   |
| TaskCLBs  | CLBs used for PRM implementation   |
| TaskBRAMs | BRAMs used for PRM implementation  |
| TaskDSPs  | DSPs used for PRM implementation   |
| TaskMTTF  | Expected MTTF of the task PRM      |
| TaskD     | Any soft/hard deadline of the task |

Task-level parameters
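The task-level parameters map naturally onto a record type. The following is a minimal sketch, assuming that EndT is simply StartT + ExecT (reconfiguration time is not modelled) and that a missing TaskD means the task has no deadline. The class and field names are hypothetical Python spellings of the parameters in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    task_id: int                      # TaskID
    task_type: str                    # TaskType: type of PRM used
    start_t: float                    # StartT
    exec_t: float                     # ExecT
    clbs: int                         # TaskCLBs
    brams: int                        # TaskBRAMs
    dsps: int                         # TaskDSPs
    mttf: float                       # TaskMTTF
    deadline: Optional[float] = None  # TaskD (None: no deadline)

    @property
    def end_t(self) -> float:
        # EndT, assuming the task runs uninterrupted from StartT.
        return self.start_t + self.exec_t

    def meets_deadline(self) -> bool:
        return self.deadline is None or self.end_t <= self.deadline
```

For example, a task with `start_t=10.0`, `exec_t=5.0` and `deadline=20.0` ends at 15.0 and meets its deadline.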



# PR-HMPSoC: PRRs and static components

NoC-based system



| Parameter               | Description                           |
|-------------------------|---------------------------------------|
| prrID <sub>r</sub>      | Serial number of PRR                  |
| prrCLBs <sub>r</sub>    | CLBs present in the PRR               |
| prrBRAMs <sub>r</sub>   | BRAMs present in the PRR              |
| prrDSPs <sub>r</sub>    | DSPs present in the PRR               |
| prrMTTF <sub>r</sub>    | Estimated MTTF of the PRR             |
| prrPRMs <sub>r</sub>    | List of PRMs supported by the PRR     |
| prrExTrace <sub>r</sub> | Schedule of task execution on the PRR |

PRR parameters
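The PRR parameters can be sketched the same way. The example below assumes, as a simplification, that a PRM is compatible with a PRR whenever the region offers at least as many CLBs, BRAMs and DSPs as the module needs; real floorplanning constraints are not modelled. The execution trace is stored as `(task_id, start, end)` tuples, which is a hypothetical layout for prrExTrace.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PRR:
    prr_id: int                                   # prrID
    clbs: int                                     # prrCLBs
    brams: int                                    # prrBRAMs
    dsps: int                                     # prrDSPs
    mttf: float = 0.0                             # prrMTTF (estimate)
    prms: List[str] = field(default_factory=list)  # prrPRMs
    # prrExTrace as (task_id, start, end) tuples; this layout is an assumption.
    ex_trace: List[Tuple[int, float, float]] = field(default_factory=list)

    def fits(self, clbs: int, brams: int, dsps: int) -> bool:
        """Resource-only compatibility check (simplifying assumption)."""
        return clbs <= self.clbs and brams <= self.brams and dsps <= self.dsps

    def busy_time(self) -> float:
        """Total time the region spends executing tasks in its trace."""
        return sum(end - start for _tid, start, end in self.ex_trace)
```

A region's busy time is one possible input to an aging estimate, though the slides do not state which model is used.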

T. D. A. Nguyen and A. Kumar. PR-HMPSoC: A versatile partially reconfigurable heterogeneous Multiprocessor System-on-Chip for dynamic FPGA-based embedded systems. In *Proceedings of FPL*, 2014

(Xiang2010)










T. D. Nguyen and A. Kumar. PRFloor: An Automatic Floorplanner for Partially Reconfigurable FPGA Systems. In *Proceedings of FPGA*, 2016.








# Experiments and Results

#### Experiment Setup :

- Two CPUs: Intel Xeon E5-2609 v2 @ 2.50GHz (quad-core), 32 GB of memory
- Ubuntu 14.04 LTS 64-bit
- Virtex-6 XC6VLX240T
- *Gurobi* Solver for finding MILP solution
- Task-graphs generated using TGFF


- **IP Pool:** 
  - 50 real-world hardware accelerators
  - Synthesized using Xilinx Vivado Suite (ver 16.2)

| PRM                   | LUT   | BRAM | DSP | Source    |
|-----------------------|-------|------|-----|-----------|
| DFDIV                 | 7309  | 1    | 24  | CHStone   |
| DFMUL                 | 4051  | 1    | 16  | CHStone   |
| Log2                  | 8212  | 0    | 0   | EPFL      |
| ADPCM                 | 6222  | 6    | 126 | OpenCores |
| FFT1024               | 19796 | 18   | 52  | OpenCores |
| SHA                   | 3069  | 20   | 0   | OpenCores |
| JPEG                  | 6581  | 11   | 10  | OpenCores |
| Video Stream Scaler   | 524   | 2    | 11  | Xilinx    |
| Video Test Pattern    | 2543  | 3    | 12  | Xilinx    |
| Microblaze (Max Area) | 5539  | 5    | 6   | Xilinx    |

Some notable PRMs used in experiments

Xilinx. 2017. Intellectual Property. www.xilinx.com/products/intellectualproperty.html. (2017).

EPFL. 2017. Combinational Benchmark Suite. lsi.epich/benchmarks. (201

OpenCores. 2017. www.opencores.org. (2017)

### Experiment Setup:

Optimization modes:

- Homogeneous / heterogeneous
- Minimize makespan / Maximize system MTTF

Task-graph types:

- Parallelism: *Fat* / *Slim*




### ☆ Results: Fat graphs

System MTTF-aware scheduling: *homogeneous* PRRs

- *sysMTTF*: Maximize system MTTF with deadline constraints
- *makespan*: Minimize makespan with deadline constraints

### ☆ Results: Fat graphs

System MTTF-aware scheduling: *heterogeneous* PRRs

- *sysMTTF*: Maximize system MTTF with deadline constraints
- *makespan*: Minimize makespan with deadline constraints

### Results: Fat graphs

#### System MTTF-aware system partitioning: *homogeneous* v/s *heterogeneous*

- *homogeneous*: Maximize system MTTF with deadline constraints using *homogeneous* PRRs
- *heterogeneous*: Maximize system MTTF with deadline constraints using *heterogeneous* PRRs


### ☆ Results: Summary

| Scenarios   | T=5  | T=10 | T=15   | T=20 | T=25 | T=30 | T=35 | T=40 | T=45   | T=50 |
|-------------|------|------|--------|------|------|------|------|------|--------|------|
| Fat, Large  | 0.00 | 0.21 | 0.82   | 0.75 | 1.52 | 1.37 | 6.62 | 7.96 | 8.33   | 7.33 |
| Slim, Large | 0.00 | 0.00 | 1.24   | 1.36 | 1.42 | 1.95 | 9.57 | 1.76 | 13.16  | 1.13 |
| Fat, Small  | 0.00 | 0.00 | 0.00   | 0.00 | 0.00 | 0.00 | 0.17 | 0.06 | 0.06   | 0.00 |
| Slim, Small | 0.00 | 0.00 | 0.00   | 0.00 | 0.05 | 0.11 | 0.00 | 0.00 | 0.08   | 0.00 |

SysMTTF Improvements of Heterogeneous vs. Homogeneous Systems

 $\frac{sysMTTF_{hetero} - sysMTTF_{homo}}{sysMTTF_{homo}}$ 
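The metric above is a plain relative difference and can be computed as follows. This is a small sketch with hypothetical input values; the slides do not state whether the table entries are reported as ratios or percentages, so the function returns the raw ratio.

```python
def relative_improvement(sys_mttf_hetero: float, sys_mttf_homo: float) -> float:
    """(sysMTTF_hetero - sysMTTF_homo) / sysMTTF_homo."""
    if sys_mttf_homo <= 0:
        raise ValueError("homogeneous system MTTF must be positive")
    return (sys_mttf_hetero - sys_mttf_homo) / sys_mttf_homo


if __name__ == "__main__":
    # Hypothetical MTTF values in arbitrary time units.
    print(relative_improvement(12.0, 10.0))
```

A value of zero means the heterogeneous partitioning gave no lifetime benefit over the homogeneous one for that scenario.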

# Conclusion

- A design methodology for lifetime-aware DPR-based systems was proposed
  - Scheduling with *aging-estimation*
  - Integration of *resource constraints* into scheduler
- Investigated *homogeneous* v/s *heterogeneous* PRRs
- Investigate trade-off between *aging-related* and *externally-induced* permanent faults (*future work*)
- Use other *global* optimization methods (*future work*)





### Results: Variation of System MTTF with #PRRs in a typical application with 25 tasks









### ☆ Results: Slim graphs

System MTTF-aware scheduling : *homogeneous* PRRs



### ☆ Results: Slim graphs

System MTTF-aware scheduling : *heterogeneous* PRRs



### ☆ Results: Slim graphs

System MTTF-aware system partitioning: homogeneous v/s heterogeneous

