|Title||Computation and Data Transfer Co-Scheduling for Interconnection Bus Minimization|
|Author||Cathy Qun Xu (University of Texas at Dallas, United States), *Chun Jason Xue, Bessie C Hu (City University of Hong Kong, Hong Kong), Edwin H.M. Sha (University of Texas at Dallas, United States)|
|Page||pp. 311 - 316|
|Keyword||Scheduling, Interconnection network, clustered processors, data path synthesis|
|Abstract||High instruction-level parallelism in DSP and media applications demands highly clustered architectures. It is challenging to design an efficient, flexible, yet cost-effective interconnection network that satisfies rapidly increasing inter-cluster data transfer needs. This paper presents a computation and data transfer co-scheduling technique that minimizes the number of partially connected interconnection buses required for a given embedded application while also minimizing its schedule length. Previous research in this area focused on scheduling computations to minimize the number of inter-cluster data transfers. The proposed co-scheduling technique not only schedules computations to reduce the number of inter-cluster data transfers, but also schedules those transfers to minimize the number of partially connected buses required in the inter-cluster connection network. Experimental results indicate that 52.3% fewer buses are required compared to the current best-known technique, while achieving the same schedule length minimization.|
|Title||Prototyping Pipelined Applications on a Heterogeneous FPGA Multiprocessor Virtual Platform|
|Author||*Antonino Tumeo, Marco Branca, Lorenzo Camerini, Marco Ceriani (Politecnico di Milano, Italy), Matteo Monchiero (HP Labs, United States), Gianluca Palermo, Fabrizio Ferrandi, Donatella Sciuto (Politecnico di Milano, Italy)|
|Page||pp. 317 - 322|
|Keyword||FPGA, Prototyping, Pipelining, Multiprocessor, Multimedia|
|Abstract||Multiprocessors on a chip are now a reality. The semiconductor industry has recognized this approach as the most efficient way to exploit chip resources, but the success of this paradigm relies heavily on the efficiency and widespread adoption of parallel software. Among the many techniques for expressing application parallelism, this paper focuses on pipelining, a technique well suited to data-intensive multimedia applications. We introduce an FPGA-based prototyping platform and a methodology for these applications. Our platform consists of a mix of standard and custom heterogeneous cores. We discuss several case studies, analyzing the interaction between the architecture and the applications, and show that multimedia and telecommunication applications with unbalanced pipeline stages can be easily deployed. Our framework eases the development cycle and enables developers to focus directly on the problems posed by the programming model, in the direction of implementing a production system.|
|Title||Partial Conflict-Relieving Programmable Address Shuffler for Parallel Memories in Multi-Core Processor|
|Author||*Young-Su Kwon, Bon-Tae Koo, Nak-Woong Eum (Electronics and Telecommunications Research Institute, Republic of Korea)|
|Page||pp. 329 - 334|
|Keyword||parallel memory, access conflict, multi-core, memory|
|Abstract||The advancement of process technology enables the integration of multiple cores featuring parallel processing. The requirement for extensive memory bandwidth creates a major performance bottleneck in multi-core architectures for media applications. While a parallel memory system is a viable solution for handling the large number of memory transactions required by multiple cores, memory access conflicts caused by simultaneous accesses to an identical memory page by two or more cores limit the overall performance. We propose and evaluate a programmable memory address shuffler, together with a novel memory shuffling algorithm, integrated into multi-core architectures with a parallel memory system. By analyzing the access pattern of the application, the address shuffler efficiently translates requested memory addresses into shuffled addresses such that access conflicts diminish. We demonstrate that the shuffling of sub-pages can be represented by a cyclic linked list, which enables partial address shuffling with a minimal number of shuffling table entries. The programmable address shuffler reduces the number of access conflicts by 83% for pitch-shifting audio decompression.|
|Title||HitME: Low Power Hit MEmory Buffer for Embedded Systems|
|Author||Andhi Janapsatya, *Sri Parameswaran, Aleksandar Ignjatovic (University of New South Wales, Australia)|
|Page||pp. 335 - 340|
|Keyword||memory, low power, cache, loop cache|
|Abstract||In this paper, we present a novel HitME (Hit-MEmory) buffer to reduce the energy consumption of the memory hierarchy in embedded processors. The HitME buffer is a small direct-mapped cache memory added to existing cache memory hierarchies. The HitME buffer is loaded only when there is a hit in the L1 cache; otherwise, the L1 cache is updated from memory and the processor's memory request is served directly from the L1 cache. The strategy works because roughly 90% of memory locations are accessed only once, and these accesses often pollute the cache.
Energy reduction is achieved by reducing the number of accesses to the L1 cache memory. Experimental results show that the HitME buffer reduces L1 cache accesses, resulting in lower energy consumption for the memory hierarchy. This decrease in L1 cache accesses reduces cache-system energy consumption by an average of 60.9% compared to a traditional L1 cache memory architecture, and by 6.4% compared to a filter cache architecture, for 70nm cache technology.|
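The fill policy described in this abstract can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the L1 is modeled as an unbounded set, the HitME buffer as a direct-mapped dictionary of assumed size, and the class and names are hypothetical. The point shown is that a line enters the buffer only on its second access (its first L1 hit), so once-accessed lines never pollute it.

```python
# Sketch of the HitME fill policy: load the buffer only on an L1 hit,
# never on an L1 miss, so once-accessed lines are filtered out.

HITME_ENTRIES = 8  # assumed buffer size for illustration

class HitMEHierarchy:
    def __init__(self):
        self.buffer = {}   # direct-mapped: index bits -> line address
        self.l1 = set()    # L1 contents (capacity ignored in this sketch)
        self.buffer_hits = self.l1_hits = self.l1_misses = 0

    def access(self, line):
        idx = line % HITME_ENTRIES
        if self.buffer.get(idx) == line:   # cheapest case: buffer hit,
            self.buffer_hits += 1          # L1 is not accessed at all
        elif line in self.l1:              # L1 hit: the line is now known
            self.l1_hits += 1              # to be reused, so promote it
            self.buffer[idx] = line        # into the HitME buffer
        else:                              # L1 miss: fill L1 from memory and
            self.l1_misses += 1            # serve from L1, but do NOT load
            self.l1.add(line)              # the buffer with a cold line

h = HitMEHierarchy()
for line in [1, 2, 3, 1, 1, 4]:  # line 1 is reused; 2, 3, 4 touched once
    h.access(line)
print(h.l1_misses, h.l1_hits, h.buffer_hits)  # -> 4 1 1
```

In the trace above, lines 2, 3, and 4 never reach the buffer, while line 1 is promoted on its first reuse and served from the buffer thereafter; energy savings come from those buffer hits replacing full L1 accesses.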