ASP-DAC 2009 Technical Program

The 14th Asia and South Pacific Design Automation Conference

Session 4A System Level Architectures
Time: 10:15 - 12:20 Wednesday, January 21, 2009
Location: Room 411+412
Chairs: Samar Abdi (University of California, Irvine, United States), Jun Yang (Univ. of Pittsburgh)

4A-1 (Time: 10:15 - 10:40)

Title	Computation and Data Transfer Co-Scheduling for Interconnection Bus Minimization
Author	Cathy Qun Xu (University of Texas at Dallas, United States), *Chun Jason Xue, Bessie C Hu (City University of Hong Kong, Hong Kong), Edwin H.M. Sha (University of Texas at Dallas, United States)
Page	pp. 311 - 316
Keyword	Scheduling, Interconnection network, clustered processors, data path synthesis
Abstract	High Instruction-Level-Parallelism in DSP and media applications demands highly clustered architecture. It is challenge to design an efficient, flexible yet cost saving inter-connection network to satisfy the rapid increasing inter-cluster data transfer needs. This paper presents a computation and data transfer co-scheduling technique to minimize the number of partially connected interconnection buses required for a given embedded application while minimizing its schedule length. Previous researches in this area focused on scheduling computations to minimize the number of inter-cluster data transfers. The proposed co-scheduling technique not only schedule computations to reduce the number of inter-cluster data transfers, but also schedule inter-cluster data transfers to minimize the number of required partially connected buses for inter-cluster connection network. Experimental results indicate that 52.3% fewer buses required compared to current best known technique while achieving the same schedule length minimization.

4A-2 (Time: 10:40 - 11:05)

Title	Prototyping Pipelined Applications on a Heterogeneous FPGA Multiprocessor Virtual Platform
Author	*Antonino Tumeo, Marco Branca, Lorenzo Camerini, Marco Ceriani (Politecnico di Milano, Italy), Matteo Monchiero (HP Labs, United States), Gianluca Palermo, Fabrizio Ferrandi, Donatella Sciuto (Politecnico di Milano, Italy)
Page	pp. 317 - 322
Keyword	FPGA, Prototyping, Pipelining, Multiprocessor, Multimedia
Abstract	Multiprocessors on a chip are the reality of these days. Semiconductor industry has recognized this approach as the most efficient in order to exploit chip resources, but the success of this paradigm heavily relies on the efficiency and widespread diffusion of parallel software. Among the many techniques to express the parallelism of applications, this paper focuses on pipelining, a technique well suited to data-intensive multimedia applications. We introduce a prototyping platform (FPGA-based) and a methodology for these applications. Our platform consists of a mix of standard and custom heterogeneous cores. We discuss several case studies, analyzing the interaction of the architecture and applications and we show that multimedia and telecommunication applications with unbalanced pipeline stages can be easily deployed. Our framework eases the development cycle and enables the developers to focus directly on the problems posed by the programming model in the direction of the implementation of a production system.
Slides

4A-3 (Time: 11:05 - 11:30)

Title	Variability-Aware Robust Design Space Exploration of Chip Multiprocessor Architectures
Author	*Gianluca Palermo, Cristina Silvano, Vittorio Zaccaria (Politecnico di Milano, DEI, Italy)
Page	pp. 323 - 328
Keyword	Design Space Exploration
Abstract	In the context of a design space exploration framework for supporting the platform-based design approach, we address the problem of robustness with respect to manufacturing process variations. First, we introduce response surface modeling techniques to enable an efficient evaluation of the statistical measures of execution time and energy consumption for each system configuration. We then introduce a robust design space exploration frameworkto afford the problem of the impact of manufacturing process variations onto the system-level metrics and consequently onto the application-level constraints. We finally provide a comparison of our design space exploration technique with conventional approaches.
Slides

4A-4 (Time: 11:30 - 11:55)

Title	Partial Conflict-Relieving Programmable Address Shuffler for Parallel Memories in Multi-Core Processor
Author	*Young-Su Kwon, Bon-Tae Koo, Nak-Woong Eum (Electronics and Telecommunications Research Institute, Republic of Korea)
Page	pp. 329 - 334
Keyword	parallel memory, access conflict, multi-core, memory
Abstract	The advancement of process technology enables the integration of multiple cores featuring parallel processing. The requirement of extensive memory bandwidth puts a major performance bottleneck in multi-core architectures for media applications. While the parallel memory system is a viable solution to account for a large amount of memory transactions required by multiple cores, memory access conflicts caused by simultaneous accesses to an identical memory page by two or more cores limit the performance of multi-core architectures. We propose and evaluate the programmable memory address shuffler associated with the novel memory shuffling algorithm integrated in multi-core architectures with parallel memory system. The address shuffler efficiently translates the requested memory addresses into the shuffled addresses such that access conflicts diminish by analyzing the access pattern of the application. We demonstrate that the shuffling of sub-pages is represented by cyclic linked list which enables partial address shuffling with the minimal number of shuffling table entries. The programmable address shuffler reduces the amount of access conflicts by 83% for pitch-shifting audio decompression.

4A-5 (Time: 11:55 - 12:20)

Title	HitME: Low Power Hit MEmory Buffer for Embedded Systems
Author	Andhi Janapsatya, *Sri Parameswaran, Aleksandar Ignjatovic (University of New South Wales, Australia)
Page	pp. 335 - 340
Keyword	memory, low power, cache, loop cache
Abstract	In this paper, we present a novel HitME (Hit-MEmory) buffer to reduce the energy consumption of memory hierarchy in embedded processors. The HitME buffer is a small direct-mapped cache memory that is added as additional memory into existing cache memory hierarchies. The HitME buffer is loaded only when there is a hit on L1 cache. Otherwise, L1 cache is updated from the memory and the processor's memory request is served directly from the L1 cache. The strategy works due to the fact that 90% of memory accesses are only accessed once, and these often pollute the cache. Energy reduction is achieved by reducing the number of accesses to the L1 cache memory. Experimental results show that the use of HitME buffer will reduce the L1 cache accesses resulting in a reduction in the energy consumption of the memory hierarchy. This decrease in L1 cache accesses reduces the cache system energy consumption by an average of 60.9% when compared to traditional L1 cache memory architecture and an energy reduction of 6.4% when compared to filter cache architecture for 70nm cache technology.