Title | (Invited Paper) SMYLE Project: Toward High-Performance, Low-Power Computing on Manycore-Processor SoCs |
Author | *Koji Inoue (Kyushu University, Japan) |
Page | pp. 558 - 560 |
Keyword | manycore, SoC, low power, high performance, processor |
Abstract | This paper introduces a manycore research project called SMYLE (Scalable ManYcore for Low Energy computing). The aims of this project are: 1) proposing a manycore SoC architecture and developing a suitable programming and execution environment, 2) designing a domain specific manycore system for emerging video mining applications, and 3) releasing developed software tools and FPGA emulation environments to accelerate manycore research and development in the community. The project started in December 2010 with full support from the New Energy and Industrial Technology Development Organization (NEDO). |
Title | (Invited Paper) SMYLEref: A Reference Architecture for Manycore-Processor SoCs |
Author | *Masaaki Kondo, Son Truong Nguyen (The University of Electro-Communications, Japan), Tomoya Hirao, Takeshi Soga, Hiroshi Sasaki, Koji Inoue (Kyushu University, Japan) |
Page | pp. 561 - 564 |
Keyword | Manycore Processor, Prototyping, FPGA |
Abstract | Nowadays, the trend of developing microprocessors with tens of
cores brings a promising prospect for embedded systems. Realizing a
high performance and low power many-core processor is becoming a
primary technical challenge. We are currently developing a
many-core processor architecture for embedded systems as a part of
a NEDO project. This paper introduces the many-core architecture
called SMYLEref along with the concept of Virtual Accelerator on
Many-core, in which many cores on a chip are utilized as a hardware
platform for realizing multiple virtual accelerators. We are
developing its prototype system with off-the-shelf FPGA evaluation
boards. In this paper, we introduce the architecture of SMYLEref
and the details of the prototype system. In addition, several
initial experiments with the prototype system are also presented. |
Title | (Invited Paper) SMYLE OpenCL: A Programming Framework for Embedded Many-core SoCs |
Author | *Hiroyuki Tomiyama, Takuji Hieda, Naoki Nishiyama, Noriko Etani, Ittetsu Taniguchi (Ritsumeikan University, Japan) |
Page | pp. 565 - 567 |
Keyword | manycore SoCs, OpenCL, embedded systems |
Abstract | Embedded SoC architecture has shifted from the
single-core to the multi/many-core paradigm because of better
power/performance efficiency. In order to exploit the potential
power/performance efficiency of the many-core architecture, a
parallel computing framework is necessary. OpenCL is one of the
most popular parallel computing frameworks in the field of
general-purpose computing on GPUs and multicore servers.
However, the existing OpenCL implementations are not suitable
for embedded real-time systems because of the large runtime
overhead. In this paper, we describe a lightweight OpenCL
framework for embedded multi/many-core SoCs. Our OpenCL
framework minimizes the runtime overhead by statically
creating threads and mapping them onto cores. Preliminary
experiments on an FPGA prototype board with a five-core
architecture show a significant reduction in runtime overhead
compared with an existing OpenCL framework. |
Title | (Invited Paper) Support Tools for Porting Legacy Applications to Multicore |
Author | Yuri Ardila, *Natsuki Kawai, Takashi Nakamura, Yosuke Tamura (Fixstars Corporation, Japan) |
Page | pp. 568 - 573 |
Keyword | auto-parallelizer, performance estimation, benchmark, parallel computing |
Abstract | This paper presents PEMAP, an automated performance estimation tool that projects the performance of hand-parallelized programs from their sequential versions, and BEMAP, a benchmark suite for measuring the performance of an auto-parallelizer or even a machine. BEMAP is an open-source project, and documentation covering code explanations and experimental results is also provided. Our experiments with PEMAP show that we can estimate the performance of hand-parallelized programs with an error of 0.44% of the sequential program's performance on average, while our experiments with BEMAP show that the ability of an auto-parallelizer can be measured by comparing its compiled code to hand-tuned parallelized OpenCL code, thereby assisting the development of the auto-parallelizer tool. |
Title | (Invited Paper) Manycore Processor for Video Mining Applications |
Author | *Yukoh Matsumoto, Hiroyuki Uchida, Michiya Hagimoto, Yasumori Hibi, Sunao Torii, Masamichi Izumida (TOPS Systems Corporation, Japan) |
Page | pp. 574 - 575 |
Abstract | Through Architecture-Algorithm co-design for Video Mining Applications, we designed a scalable Manycore processor consisting of clustered heterogeneous cores with stream processing capabilities and zero-overhead inter-process communication through FIFOs with a hardware-software mechanism. To achieve high performance and low power consumption, and especially to reduce the memory accesses required by Video Mining Applications, each application is partitioned to exploit both task and data parallelism and is programmed as distributed stream processing with a relatively large local register file, based on the Kahn Process Network model. |