



### Low Power Design of the Next-Generation High Efficiency Video Coding

Authors: Muhammad Shafique, Jörg Henkel

CES – Chair for Embedded Systems

ces.itec.kit.edu

#### Outline



- Introduction to the High Efficiency Video Coding (HEVC)
- HEVC Analysis
  - Complexity, memory access, thermal
- Power-Efficient HEVC System Design
- Conclusion

#### High Efficiency Video Coding (HEVC)



Ultra-HD (or Super Hi-Vision)

- 7680×4320 ≈ 33 million pixels per frame
- By 2017: video expected to make up 80%-90% of global internet traffic

*Full HD @ 30 fps*: 1 second ≈ 712 Mbits; 1 hour ≈ 2.4 Tbits
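The raw data volumes quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes 8-bit 4:2:0 sampling (12 bits per pixel on average) and binary prefixes (1 Mbit = 2^20 bits); neither assumption is stated on the slide, but together they match the quoted figures.

```python
# Raw (uncompressed) bit rate of a video stream.
# Assumptions, not taken from the slide: 8-bit 4:2:0 sampling
# (12 bits per pixel on average) and binary prefixes.

def raw_bits_per_second(width, height, fps, bits_per_pixel=12):
    """Bits produced per second by an uncompressed video stream."""
    return width * height * bits_per_pixel * fps

full_hd = raw_bits_per_second(1920, 1080, 30)
mbits_per_second = full_hd / 2**20         # binary megabits
tbits_per_hour = full_hd * 3600 / 2**40    # binary terabits
uhd_pixels = 7680 * 4320                   # pixels in one Ultra-HD frame

print(f"Full HD @ 30 fps: {mbits_per_second:.0f} Mbits per second")
print(f"Full HD @ 30 fps: {tbits_per_hour:.1f} Tbits per hour")
print(f"Ultra-HD frame: {uhd_pixels / 1e6:.1f} million pixels")
```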

- New video compression standards/techniques required
- JCT-VC's High Efficiency Video Coding (HEVC)
  - ~2× compression efficiency compared to H.264



#### Challenges for Developing HEVC-based Multimedia Systems





#### **HEVC Overview: Encoding Flow**





#### **HEVC Overview: Slices and Tiles**









#### **HEVC Overview: Tree-Block Structure**







#### **CTU Distribution**





#### **HEVC Overview: Intra and Inter Prediction**



#### **HEVC Intra Prediction**

**HEVC Inter Prediction** 



- HEVC-Intra: ~2.56× more mode decisions than H.264
- HEVC-Inter: ~2.2× more complex than H.264

#### **HEVC Overview: Motion Estimation**



- Block Matching (BM) or Motion Estimation (ME)
  - Compresses by searching temporally neighboring frames for matching blocks
  - High energy and time cost, but high compression efficiency (H.264-Inter, HEVC-Inter)
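As an illustration of block matching, the sketch below runs an exhaustive (full) search with the sum of absolute differences (SAD) as the matching cost, then forms the residue block. It is a toy example on synthetic frames, not the search strategy of any particular encoder; the function names and the frame pattern are made up for illustration.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between the block of `cur` at (cx, cy)
    and the block of `ref` at (rx, ry)."""
    return sum(abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
               for dy in range(size) for dx in range(size))

def full_search(cur, ref, cx, cy, size, search_range):
    """Test every displacement within +/- search_range and keep the cheapest."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, cx, cy, cx, cy, size)
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            rx, ry = cx + mx, cy + my
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block would leave the reference frame
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if cost < best_cost:
                best_mv, best_cost = (mx, my), cost
    return best_mv, best_cost

def residue_block(cur, ref, cx, cy, mv, size):
    """Difference between the current block and its motion-compensated prediction."""
    mx, my = mv
    return [[cur[cy + dy][cx + dx] - ref[cy + my + dy][cx + mx + dx]
             for dx in range(size)] for dy in range(size)]

# Synthetic 16x16 reference frame; the current frame shows the same content
# displaced so that the block at (4, 4) matches the reference at (6, 5).
ref = [[3 * x + 7 * y for x in range(16)] for y in range(16)]
cur = [[ref[y + 1][x + 2] if y + 1 < 16 and x + 2 < 16 else 0
        for x in range(16)] for y in range(16)]

mv, cost = full_search(cur, ref, cx=4, cy=4, size=4, search_range=4)
residue = residue_block(cur, ref, 4, 4, mv, 4)
print(mv, cost)
```

The nested loops make the cost visible: each block evaluates (2 × search_range + 1)² candidates, and every candidate reads a full block from the reference frame.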



[Figure panels: Reference Frame, Current Frame, Residue Frame]








*Early PU size prediction* may provide a significant reduction in computational and energy requirements.

#### **HEVC Analysis: CTU Distribution**







#### Memory Access for Motion Estimation

- Memory accesses of HEVC ≈ 3.86× of H.264
- Most of the on-chip memory is wasted (leakage power)



Adapting the search window size at run-time increases the potential for leakage power savings.
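To make the leakage argument concrete, the sketch below estimates how much of an on-chip search-window buffer goes unused when the search range is adapted to recent motion. The adaptation policy, the margin, the block size, and the maximum range are all hypothetical values chosen for illustration; the slides do not specify them.

```python
def window_bytes(search_range, block_size=64):
    """Bytes needed to hold a square search window of 8-bit luma samples."""
    side = 2 * search_range + block_size
    return side * side

def adapt_search_range(recent_mvs, max_range=64, margin=4):
    """Hypothetical policy: cover the largest recent motion plus a safety margin."""
    if not recent_mvs:
        return max_range  # no history yet, fall back to the full window
    largest = max(max(abs(mx), abs(my)) for mx, my in recent_mvs)
    return min(max_range, largest + margin)

recent_mvs = [(3, -2), (5, 1), (-4, 0)]    # made-up motion vectors
search_range = adapt_search_range(recent_mvs)
used = window_bytes(search_range)
provisioned = window_bytes(64)             # buffer sized for the worst case
gateable_fraction = 1 - used / provisioned

print(f"search range {search_range}: {used} of {provisioned} bytes in use, "
      f"{gateable_fraction:.0%} could be power-gated")
```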

#### Using a thermal camera setup





The DIAS Pyroview thermal camera operates at 50 Hz with a spatial resolution of 50 µm.

Copyright: © Chair for Embedded Systems (CES), Karlsruhe Institute of Technology (KIT), Germany

Dual-core processor (1.8 GHz). Source: Intel

#### Temperature Measurements for HEVC [RaceHorses@37QP vs. 22QP]














#### **Analysis and Statistics**



| Parameter     | Value | SAD    | SSE    | SATD  | Kbps    |
|---------------|-------|--------|--------|-------|---------|
| Max. CU Depth | 4     | 771    | 263    | 51    | 3001.7  |
| Max. CU Depth | 3     | 659.15 | 229.08 | 42.1  | 3320.9  |
| AMP           | 1     | 771    | 263    | 51    | 3001.7  |
| AMP           | 0     | 665.74 | 237.1  | 44.27 | 3072.92 |

#### Variance and Motion based Classification
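A variance and motion based classification could look roughly like the sketch below: flat, static blocks are offered only large PU sizes, while textured, moving blocks are offered small ones. This is an illustrative heuristic, not the authors' algorithm; the thresholds, the candidate size lists, and the function names are hypothetical.

```python
def variance(block):
    """Population variance of the pixel values in a 2-D block."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)

def candidate_pu_sizes(block, colocated_mv, var_threshold=100.0, motion_threshold=2):
    """Return the PU sizes worth evaluating for this block.

    Thresholds and size lists are hypothetical placeholders.
    """
    textured = variance(block) > var_threshold
    moving = max(abs(colocated_mv[0]), abs(colocated_mv[1])) > motion_threshold
    if not textured and not moving:
        return [64, 32]   # homogeneous and static: large PUs suffice
    if textured and moving:
        return [16, 8]    # detailed and moving: small PUs likely win
    return [32, 16]       # mixed case: try the middle sizes

flat = [[128] * 8 for _ in range(8)]
checker = [[0 if (x + y) % 2 == 0 else 255 for x in range(8)] for y in range(8)]

print(candidate_pu_sizes(flat, (0, 0)))
print(candidate_pu_sizes(checker, (5, -3)))
```

Pruning the candidate list before the full rate-distortion search is where the computational savings would come from.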

#### **Complexity Reduction: PU Size Estimation**





#### **Time Savings and Video Quality Results**









Shafique @ ASPDAC, Jan. 2014





#### **HEVC Thermal Management**










#### Power Efficient HEVC Design: Hardware Architecture





#### Hardware Accelerators





[Figure legend: Occupied Slices (luma), Occupied Slices (chroma); labels: 8 PPC, Predictor]



- External memory holds the current frame
  - High density, low read and write power
- On-chip SRAM memory (FIFO) holds only the current block
  - High read and write speed and low dynamic write power
  - Hides latencies from HEVC engine





- One MRAM buffer holds a full reference frame
- Each column (sector) of reference buffer is power-gated
- Reference read and write masters read and write data to the MRAM buffer



#### **AMBER: Reference Buffer Power Management**



Observation: Not all of the search window is used

Block matching algorithm accesses only a small percentage of reference buffer sectors
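The observation above suggests a simple gating rule: keep powered only the column sectors that block matching actually reads. The sketch below derives such a plan from a list of accessed coordinates. The sector width, the access list, and the function names are hypothetical; the slides do not describe how AMBER predicts which sectors will be needed.

```python
def sectors_touched(accesses, sector_width):
    """Map the x-coordinates of reference-buffer accesses to column sector indices."""
    return {x // sector_width for x, _y in accesses}

def gating_plan(accesses, frame_width, sector_width):
    """Return one flag per sector: True = keep powered, False = power-gate."""
    num_sectors = -(-frame_width // sector_width)  # ceiling division
    active = sectors_touched(accesses, sector_width)
    return [s in active for s in range(num_sectors)]

# Made-up (x, y) positions read by block matching for one block.
accesses = [(130, 40), (140, 44), (200, 48), (135, 60)]
plan = gating_plan(accesses, frame_width=1920, sector_width=64)
gated = plan.count(False)

print(f"{gated} of {len(plan)} sectors can be power-gated")
```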



#### **Power Consumption (4 reference frames)**





Increasing the number of reference frames improves the power savings of the AMBER system over the search window approach.

#### Conclusion



- Comprehensive analysis of HEVC
  - Architecture, power, thermal and complexity
- Challenges posed by HEVC
  - Architectural (memory, reconfiguration, accelerators)
  - Power/thermal (power-gating, configuration control)
  - Complexity (parallelization, many-core, workload balancing)
- Both hardware and software need to be optimized while leveraging application-specific knowledge

#### Our approach

- Adaptive complexity management
- Video tiling, workload budgeting, CU/PU partitioning
- Power and thermal aware HEVC configuration
- Hybrid video memory hierarchy with content-driven power-gating
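As a minimal sketch of video tiling for workload distribution, the code below splits a frame into a uniform grid of tiles measured in CTUs, so that each tile could be handed to a separate core or thread. It assumes a 64×64 CTU and uses integer-division boundaries so that tile sizes differ by at most one CTU, in the spirit of HEVC's uniform tile spacing. The function names are hypothetical, and the content-aware workload budgeting mentioned above is not modeled.

```python
def uniform_boundaries(length_in_ctus, parts):
    """Split a dimension into `parts` spans whose sizes differ by at most one CTU."""
    edges = [(i * length_in_ctus) // parts for i in range(parts + 1)]
    return [edges[i + 1] - edges[i] for i in range(parts)]

def tile_grid(width, height, cols, rows, ctu_size=64):
    """Return the column widths and row heights of a uniform tile grid, in CTUs."""
    width_ctus = -(-width // ctu_size)    # ceiling division
    height_ctus = -(-height // ctu_size)
    return uniform_boundaries(width_ctus, cols), uniform_boundaries(height_ctus, rows)

col_widths, row_heights = tile_grid(1920, 1080, cols=4, rows=2)
ctus_per_tile = [w * h for h in row_heights for w in col_widths]

print(col_widths, row_heights)
print(ctus_per_tile, sum(ctus_per_tile))
```

CTU count is only a first-order proxy for workload; tiles with complex texture or motion take longer to encode, which is why budgeting per tile matters.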

#### ces265: Multi-threaded HEVC Encoder



- Open-source
- C++ based
- Multithreading via pthread API
- One thread of ces265 ≈ 13.2× faster than HM-9.2



#### Web

- http://ces.itec.kit.edu/ces265/
- Download
  - https://sourceforge.net/projects/ces265/



#### Acknowledgement







- Muhammad Usman Karim Khan
- Daniel Palomino
- Claudio M. Diniz
- Felipe Sampaio

#### Thank you! Questions?
