The 19th Asia and South Pacific Design Automation Conference

Session 4S  Special Session: Design Automation Methods for Highly-Complex Multimedia Systems
Time: 10:10 - 12:15 Wednesday, January 22, 2014
Location: Room 302
Organizer: Sri Parameswaran (University of New South Wales, Australia)

4S-1 (Time: 10:10 - 10:40)
Title: (Invited Paper) SDG2KPN: System Dependency Graph to Function-Level KPN Generation of Legacy Code for MPSoCs
Author: Jude Angelo Ambrose, Jorgen Peddersen (University of New South Wales, Australia), Alvin Labios, Yusuke Yachide (Canon Information Systems Research Australia (CiSRA), Australia), *Sri Parameswaran (University of New South Wales, Australia)
Page: pp. 267 - 273
Keyword: MPSoC, KPN
Abstract: The Multiprocessor System-on-Chip (MPSoC) paradigm as a viable implementation platform for parallel processing has expanded to encompass embedded devices. The ability to execute code in parallel gives MPSoCs the potential to achieve high performance with low power consumption. In order for sequential legacy code to take advantage of the MPSoC design paradigm, it must first be partitioned into data flow graphs (such as Kahn Process Networks, KPNs) to ensure the data elements can be correctly passed between the separate processing elements that operate on them. Existing techniques are inadequate for use in complex legacy code. This paper proposes SDG2KPN, a System Dependency Graph to KPN conversion methodology targeting the conversion of legacy code. By creating KPNs at the granularity of the function-/procedure-level, SDG2KPN is the first of its kind to support shared and global variables as well as many more program patterns/application types. We also provide a design flow which allows the creation of MPSoC systems utilizing the produced KPNs. We demonstrate the applicability of our approach by retargeting several sequential applications to the Tensilica MPSoC framework. Our system parallelized AES, an application of 950 lines, in 4.8 seconds, while H.264, of 57896 lines, took 164.9 seconds to parallelize.

4S-2 (Time: 10:40 - 11:10)
Title: (Invited Paper) Low Power Design of the Next-Generation High Efficiency Video Coding
Author: *Muhammad Shafique, Jörg Henkel (Karlsruhe Institute of Technology, Germany)
Page: pp. 274 - 281
Keyword: Low Power, HEVC, Temperature, Accelerator, Video Memory
Abstract: This paper provides a comprehensive analysis of the computational complexity, temperature, and memory access behavior for the next-generation High Efficiency Video Coding (HEVC) standard. We highlight the associated design challenges and present several low-power algorithmic and architectural techniques for developing power-efficient HEVC-based multimedia systems. We explore the interplay between the algorithms and architectures to provide high power efficiency while leveraging the application-specific knowledge and video content characteristics.

4S-3 (Time: 11:10 - 11:40)
Title: (Invited Paper) Mapping Complex Algorithm into FPGA with High Level Synthesis
Author: *Kazutoshi Wakabayashi, Takashi Takenaka, Hiroaki Inoue (NEC Corp., Japan)
Page: pp. 282 - 284
Keyword: High Level Synthesis, FPGA, Control dependency, data dependency, compiler
Abstract: This presentation discusses the comparison between “Reconfigurable Chip with High Level Synthesis” and “CPU, GPGPU with compiler such as CUDA” from the compiler perspective. Initially, we introduce several demands for acceleration with FPGA to achieve low-latency calculation and control. As an application example, we show high-frequency trading. We accelerate it with an FPGA NIC using C-based and SQL-based HLS, and show the necessity of a reconfigurable chip that can be customized with a high-level language. Then, we illustrate the difference between an FPGA and a processor (CPU, GPGPU) with the “FSM+Datapath” model and examine how the architectural difference affects the delay and parallelism of operations. Next, we discuss parallelization of operations and threads with High Level Synthesis for FPGA and with a software compiler for processors. The main advantage of the former method is that it can parallelize operations beyond control dependencies, while the latter method has to obey control dependencies. Finally, some experimental results show that “FPGA and HLS” deliver better performance than a processor for control-intensive algorithms.

4S-4 (Time: 11:40 - 12:10)
Title: (Invited Paper) Leveraging Parallelism in the Presence of Control Flow on CGRAs
Author: Jihyun Ryoo, Kyuseung Han, *Kiyoung Choi (Seoul National University, Republic of Korea)
Page: pp. 285 - 291
Keyword: CGRA, control flow, mapping
Abstract: Coarse-Grained Reconfigurable Architectures (CGRAs) are suitable for accelerating data-intensive applications in embedded systems due to their high performance and power efficiency. However, as application programs become more complex and contain more control flows, it becomes harder to accelerate such programs on CGRAs. Previous research on this issue has focused on correct execution of control flows rather than their acceleration. This paper reveals how control flows degrade the performance of programs and proposes software approaches to accelerating control flows by exploiting parallelism residing in each conditional as well as among conditionals. Experiments show that our proposed techniques improve performance by 2.51 times on average.