ASP-DAC 2013 Technical Program

The 18th Asia and South Pacific Design Automation Conference

Session 7B Simulation Acceleration
Time: 10:20 - 12:20 Friday, January 25, 2013
Chairs: Farhad Mehdipour (Kyushu University, Japan), Antoine Trouve (Institute of Systems, Information Technologies and Nanotechnologies, Japan)

7B-1 (Time: 10:20 - 10:50)

Title	Native Simulation of Complex VLIW Instruction Sets using Static Binary Translation and Hardware-Assisted Virtualization
Author	*Mian-Muhammad Hamayun, Frédéric Pétrot, Nicolas Fournel (TIMA Laboratory, CNRS/INP Grenoble/UJF, France)
Page	pp. 576 - 581
Keyword	System Simulation, Static Binary Translation, Hardware-Assisted Virtualization, VLIW
Abstract	We introduce a static binary translation flow in a native simulation context for cross-compiled VLIW executables. This approach is useful when either the source code is not available or the target platform is not supported by any retargetable compilation framework, which is usually the case for VLIW processors. The generated simulators execute on a Hardware-Assisted Virtualization (HAV) based native platform. We have implemented this approach for a TI C6x series processor, and our simulation results show a speed-up of around two orders of magnitude compared to cycle-accurate simulators.

7B-2 (Time: 10:50 - 11:20)

Title	RExCache: Rapid Exploration of Unified Last-level Cache
Author	*Su Myat Min Shwe, Haris Javaid, Sri Parameswaran (University of New South Wales, Australia)
Page	pp. 582 - 587
Keyword	estimator, exploration, cache
Abstract	In this paper, we propose to explore the design space of a unified last-level cache to improve system performance and energy efficiency. The challenge is to quickly estimate the execution time and energy consumption of the system with distinct cache configurations using a minimal number of slow full-system cycle-accurate simulations. To this end, we propose a novel, simple yet highly accurate execution time estimator and a simple, reasonably accurate energy estimator. Our framework, RExCache, combines a cycle-accurate simulator and a trace-driven cache simulator with our novel execution time estimator and energy estimator to avoid cycle-accurate simulations of all the last-level cache configurations. Once execution time and energy estimates are available from the estimators, RExCache chooses the cache configuration with minimum execution time or minimum energy consumption. Our experiments with nine different applications from MediaBench and 330 last-level cache configurations show that the execution time and energy estimators had average absolute accuracies of at least 99.74% and 80.31%, respectively. RExCache took only a few hours (21 hours for H.264enc) to explore last-level cache configurations, compared to several days for the traditional method (36 days for H.264enc) and cycle-accurate simulations (257 days for H.264enc), enabling quick exploration of the last-level cache. When 100 different real-time constraints on execution time and energy were used, all the cache configurations found by RExCache were similar to those from cycle-accurate simulations. On the other hand, the traditional method found correct cache configurations for only 69 out of 100 constraints. Thus, RExCache has better absolute accuracy than the traditional method while reducing the simulation time by at least 97%.

7B-3 (Time: 11:20 - 11:50)

Title	An Efficient Hybrid Synchronization Technique for Scalable Multi-Core Instruction Set Simulations
Author	*Bo-Han Zeng, Ren-Song Tsay, Ting-Chi Wang (National Tsing Hua University, Taiwan)
Page	pp. 588 - 593
Keyword	Timing Synchronization, Multi-Core Simulator, Instruction Set Simulator
Abstract	Multi-core system simulation techniques have been essential to system development in recent years. Although these techniques have been studied extensively, we have found that both conventional polling and collaborative timing synchronization approaches encounter a severe scalability issue when the number of target cores exceeds the number of host cores. To resolve this issue, we propose an effective hybrid technique that combines the advantages of the two approaches. According to the experimental results, the proposed technique effectively resolves the scalability issue and shows one to four orders of magnitude of improvement compared to conventional approaches.