Title | (Invited Paper) Aspects of GPU for General Purpose High Performance Computing |
Author | *Reiji Suda (The University of Tokyo/JST CREST, Japan), Takayuki Aoki (Tokyo Institute of Technology/JST CREST, Japan), Shoichi Hirasawa (University of Electro-Communications/JST CREST, Japan), Akira Nukada (Tokyo Institute of Technology/JST CREST, Japan), Hiroki Honda (University of Electro-Communications/JST CREST, Japan), Satoshi Matsuoka (Tokyo Institute of Technology/JST CREST/NII, Japan) |
Page | pp. 216 - 223 |
Keyword | GPU computing, performance evaluation, scheduling algorithm, task parallel paradigm |
Abstract | We discuss hardware and software aspects of GPGPU, focusing specifically on NVIDIA cards and CUDA, from the viewpoint of parallel computing. We identify the major weaknesses of GPUs compared with the newest supercomputers and summarize them in only four points: large SIMD vector length, small memory, absence of a fast L2 cache, and high register spill penalty. On the software side, we derive an optimal scheduling algorithm for hiding the latency of host-device data transfer, and discuss SPMD parallelism on GPUs. |
Title | (Invited Paper) Parallelizing Fundamental Algorithms such as Sorting on Multi-core Processors for EDA Acceleration |
Author | *Masato Edahiro (System IP Core Research Laboratories, NEC Corporation/Department of Computer Science, University of Tokyo, Japan) |
Page | pp. 230 - 233 |
Keyword | multi-core, many-core, parallel algorithm, sorting |
Abstract | Fundamental algorithms should be parallelized to accelerate EDA software on multi-core architectures. In this paper, we introduce algorithms that scale on multi-core processors. As an example, we present a sorting algorithm called Map Sort, which uses a map from subsets of the input data to intervals of the data range. Experimental results show that, compared with quick sort on a single CPU, the processing time of Map Sort is comparable on one CPU and three times faster on four CPUs. |