ASP-DAC 2014 Technical Program

The 19th Asia and South Pacific Design Automation Conference

Session 4S Special Session: Design Automation Methods for Highly-Complex Multimedia Systems
Time: 10:10 - 12:15 Wednesday, January 22, 2014
Location: Room 302
Organizer: Sri Parameswaran (University of New South Wales, Australia)

4S-1 (Time: 10:10 - 10:40)

Title	(Invited Paper) SDG2KPN: System Dependency Graph to Function-Level KPN Generation of Legacy Code for MPSoCs
Author	Jude Angelo Ambrose, Jorgen Peddersen (University of New South Wales, Australia), Alvin Labios, Yusuke Yachide (Canon Information Systems Research Australia (CiSRA), Australia), *Sri Parameswaran (University of New South Wales, Australia)
Page	pp. 267 - 273
Keyword	MPSoC, KPN
Abstract	The Multiprocessor System-on-Chip (MPSoC) paradigm as a viable implementation platform for parallel processing has expanded to encompass embedded devices. The ability to execute code in parallel gives MPSoCs the potential to achieve high performance with low power consumption. In order for sequential legacy code to take advantage of the MPSoC design paradigm, it must first be partitioned into data flow graphs (such as Kahn Process Networks --- KPNs) to ensure the data elements can be correctly passed between the separate processing elements that operate on them. Existing techniques are inadequate for use in complex legacy code. This paper proposes SDG2KPN, a System Dependency Graph to KPN conversion methodology targeting the conversion of legacy code. By creating KPNs at the granularity of the function-/procedure-level, SDG2KPN is the first of its kind to support shared and global variables as well as many more program patterns/application types. We also provide a design flow which allows the creation of MPSoC systems utilizing the produced KPNs. We demonstrate the applicability of our approach by retargeting several sequential applications to the Tensilica MPSoC framework. Our system parallelized AES, an application of 950 lines, in 4.8 seconds, while H.264, of 57896 lines, took 164.9 seconds to parallelize.
Slides

4S-2 (Time: 10:40 - 11:10)

Title	(Invited Paper) Low Power Design of the Next-Generation High Efficiency Video Coding
Author	*Muhammad Shafique, Jörg Henkel (Karlsruhe Institute of Technology, Germany)
Page	pp. 274 - 281
Keyword	Low Power, HEVC, Temperature, Accelerator, Video Memory
Abstract	This paper provides a comprehensive analysis of the computational complexity, temperature, and memory access behavior for the next-generation High Efficiency Video Coding (HEVC) standard. We highlight the associated design challenges and present several low-power algorithmic and architectural techniques for developing power-efficient HEVC-based multimedia system. We explore the interplay between the algorithms and architectures to provide high power efficiency while leveraging the application-specific knowledge and video content characteristics.
Slides

4S-3 (Time: 11:10 - 11:40)

Title	(Invited Paper) Mapping Complex Algorithm into FPGA with High Level Synthesis
Author	*Kazutoshi Wakabayashi, Takashi Takenaka, Hiroaki Inoue (NEC Corp., Japan)
Page	pp. 282 - 284
Keyword	High Level Synthesis, FPGA, Contol dependency, data dependency, compiler
Abstract	This presentation discusses on the comparison between “Reconfigurable Chip with High Level Synthesis” and “CPU, GPCPU with compiler such as CUDA” from the compiler perspective. Initially, we introduce several demands for acceleration with FPGA to achieve low latency calculation and control. As an application example, we show a High Frequency Trading. We accelerate it by FPGA NIC with C-based and SQL-based HLS, and show the necessity of high level language customizable reconfigurable chip. Then, we illustrate the difference of FPGA and processor (CPU, GPGPU) with the “FSM+Datapath” model and examine how the architecture difference affects delay and parallelism of operations. Next, we discuss parallelization of operations, threads with High Level Synthesis for FPGA and software compiler for processors. The main advantage of the former method is it is able to parallelize operations beyond control dependencies while the latter method has to obey control dependencies. Finally, some experimental results prove that “FPGA and HLS” generate better performance than a processor for control intensive algorithm.

4S-4 (Time: 11:40 - 12:10)

Title	(Invited Paper) Leveraging Parallelism in the Presence of Control Flow on CGRAs
Author	Jihyun Ryoo, Kyuseung Han, *Kiyoung Choi (Seoul National University, Republic of Korea)
Page	pp. 285 - 291
Keyword	CGRA, control flow, mapping
Abstract	Coarse-Grained Reconfigurable Architectures (CGRAs) are suitable for accelerating data-intensive applications in embedded systems due to high performance and power efficiency. However, as application programs become complex having more control flows in them, it becomes harder to accelerate such programs on CGRAs. Previous researches on this issue have focused on correct execution of control flows rather than their acceleration. This paper reveals how control flows degrade the performance of programs and proposes a software approaches to accelerating control flows by exploiting parallelism residing in each conditionals as well as among conditionals. Experiments show that our proposed techniques improve performance by 2.51 times on average.
Slides