Title | Amphisbaena: Modeling Two Orthogonal Ways to Hunt on Heterogeneous Many-Cores |
Author | *Jun Ma (University of Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences, China), Guihai Yan, Yinhe Han, Xiaowei Li (Institute of Computing Technology, Chinese Academy of Sciences, China) |
Page | pp. 394 - 399 |
Keyword | performance modeling, heterogeneous architecture, amphisbaena, scale-out speedup, scale-up speedup |
Abstract | Heterogeneous many-cores can deliver high performance or energy efficiency. There are two orthogonal ways to improve performance: 1) scale-out, by exploiting thread-level parallelism, and 2) scale-up, by enabling core heterogeneity. Predicting the performance of such architectures is increasingly challenging. We propose a comprehensive performance model, Amphisbaena (Phi), built from two orthogonal functions, alpha and beta. Function alpha describes the scale-out speedup and function beta handles the scale-up speedup. The Phi model can tell not only the overall speedup of a given multithreading and core mapping strategy, but also how to improve that mapping, and hence should serve as a promising performance predictor for future heterogeneous many-cores. The results show that the Phi model's error rate is within 12%, lower than that of state-of-the-art methods. We demonstrate an application of the Phi model by introducing a heuristic scheduling algorithm, which outperforms the baselines by 13% on average. |
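As a rough illustration of the abstract's decomposition into two orthogonal terms (not the paper's actual Phi formulation), the sketch below composes a scale-out term alpha and a scale-up term beta into an overall speedup estimate. The Amdahl-style form of alpha, the per-core performance ratio used for beta, and the example parameters are assumptions for illustration only.

```python
# Illustrative sketch only: composes a scale-out term (alpha) and a
# scale-up term (beta) into an overall speedup, in the spirit of the
# abstract's two orthogonal functions. The concrete forms below
# (Amdahl-style scaling, a per-core performance ratio) are assumptions,
# not the paper's Phi model.

def alpha(n_threads: int, parallel_fraction: float) -> float:
    """Scale-out speedup from thread-level parallelism (Amdahl-style assumption)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_threads)

def beta(core_perf_ratio: float) -> float:
    """Scale-up speedup from running on a faster (big) core.

    core_perf_ratio: performance of the assigned core relative to the
    baseline core type (assumed given, e.g. from profiling).
    """
    return core_perf_ratio

def phi(n_threads: int, parallel_fraction: float, core_perf_ratio: float) -> float:
    """Overall speedup estimate: product of the two orthogonal terms."""
    return alpha(n_threads, parallel_fraction) * beta(core_perf_ratio)

if __name__ == "__main__":
    # Example: 8 threads on cores 1.5x faster than the baseline core,
    # with 90% of the work parallelisable.
    print(f"estimated speedup: {phi(8, 0.9, 1.5):.2f}x")
```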
Title | Co-Simulation Framework for Streamlining Microprocessor Development on Standard ASIC Design Flow |
Author | *Tomoyuki Nakabayashi, Tomoyuki Sugiyama, Takahiro Sasaki (Mie University, Japan), Eric Rotenberg (North Carolina State University, U.S.A.), Toshio Kondo (Mie University, Japan) |
Page | pp. 400 - 405 |
Keyword | co-simulation, development environment, ASIC design, microprocessor |
Abstract | In this paper, we present a practical processor co-simulation framework on a standard ASIC design flow.
We propose an off-chip system call emulator, checkpoint mechanism, and cache warming mechanism to streamline design and verification of a processor.
These mechanisms enable short turnaround times, processor prototyping, and highly accurate evaluation results.
All our proposed approaches can be used consistently not only in RTL simulation but also in gate- and transistor-level simulation, and even in chip evaluation with an LSI tester. |
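As a rough, purely software illustration of the "off-chip system call emulator" idea (not the paper's RTL or tester interface), the sketch below services system call requests arriving from a device under test; the request format and the in-memory transport are hypothetical.

```python
# Illustrative sketch only: a host-side ("off-chip") system call emulator
# loop. The request dictionaries and the plain Python list standing in for
# a simulator or LSI-tester transport are hypothetical.

import sys

def emulate_syscall(request):
    """Service one syscall request coming from the device under test."""
    name = request["name"]
    if name == "write":
        # request["data"] is assumed to be bytes captured from DUT memory.
        sys.stdout.write(request["data"].decode(errors="replace"))
        return len(request["data"])
    if name == "exit":
        return None  # signals end of simulation
    raise NotImplementedError(f"unhandled syscall: {name}")

def serve(requests):
    """Drain syscall requests until the guest program exits."""
    for req in requests:
        if emulate_syscall(req) is None:
            break

if __name__ == "__main__":
    serve([
        {"name": "write", "data": b"hello from the DUT\n"},
        {"name": "exit"},
    ])
```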
Title | Annotation and Analysis Combined Cache Modeling for Native Simulation |
Author | Rongjie Yan (State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China), *De Ma (Institute of Microelectronic CAD, Hangzhou Dianzi University, China), Kai Huang, Xiaoxu Zhang, Siwen Xiu (Institute of VLSI Design, Zhejiang University, China) |
Page | pp. 406 - 411 |
Keyword | cache model, dynamic annotation, static analysis, native simulation |
Abstract | To accelerate performance estimation for MPSoCs and raise its accuracy, we propose a method that combines static analysis with dynamic annotation to efficiently model the cache mechanism in native simulation. We use a new cache model to statically analyze segmental profiling results and speed up simulation, and take advantage of a dynamic annotation technique to precisely trace the addresses of local variables. Experimental results show that the proposed techniques enable efficient and more accurate system performance estimation. |
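To make the annotation side of this approach concrete (a minimal sketch, not the paper's combined static/dynamic method), the example below drives a toy direct-mapped cache model through explicit access callbacks of the kind an instrumenter could insert into natively executed code. The cache geometry, the `annotate_access` hook, and the traced local-buffer addresses are hypothetical.

```python
# Illustrative sketch only: a tiny cache model fed by explicit "annotation"
# callbacks that report the addresses touched by natively executed code.
# Geometry and addresses below are assumptions for illustration.

class DirectMappedCache:
    def __init__(self, lines: int, line_size: int):
        self.lines = lines
        self.line_size = line_size
        self.tags = [None] * lines
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> None:
        block = addr // self.line_size
        index = block % self.lines
        if self.tags[index] == block:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[index] = block

cache = DirectMappedCache(lines=64, line_size=32)

def annotate_access(addr: int) -> None:
    """Hook an instrumenter would insert at each memory access."""
    cache.access(addr)

if __name__ == "__main__":
    base = 0x1000                 # assumed stack address of a local array
    for i in range(256):          # sequential walk over a local buffer
        annotate_access(base + 4 * i)
    print(f"hits={cache.hits} misses={cache.misses}")
```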
Title | A Scorchingly Fast FPGA-Based Precise L1 LRU Cache Simulator |
Author | *Josef Schneider, Jorgen Peddersen, Sri Parameswaran (University of New South Wales, Australia) |
Page | pp. 412 - 417 |
Keyword | Cache simulation, FPGA, LRU |
Abstract | Judicious selection of cache configuration is critical in embedded systems, as the cache design can impact power consumption and processor throughput. A large cache increases cache hits but requires more hardware and more power, and is slower for each access. A smaller cache is more economical and faster per access, but may incur significantly more cache misses, resulting in a slower system. For a given application, or a class of applications, on a given hardware system, the designer can aim to optimise the cache configuration through cache simulation. We present here the first multiple-cache simulator based on hardware. The FPGA implementation is characterised by a trace consumption rate of 100 MHz, making our cache simulation core up to 53x faster, for a set of benchmarks, than the fastest software-based cache simulator. Our cache simulator can determine the hit rates of 308 cache configurations, 44 of which it can evaluate simultaneously. |
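The general reason one pass over a trace can serve many LRU cache sizes at once is the LRU inclusion property (Mattson's stack algorithm). The sketch below is a plain software analogue of that idea only; it is not the paper's FPGA design, and it assumes a fully associative LRU cache, a fixed line size, and a synthetic trace.

```python
# Illustrative sketch only: one pass over an address trace yields LRU hit
# counts for many capacities at once, via the stack-distance (inclusion)
# property of LRU. Trace and line size below are assumptions.

from collections import Counter

def stack_distances(trace, line_size=32):
    """Return a Counter of LRU stack distances for the block trace."""
    stack = []          # most-recently-used block kept at the end
    dist = Counter()
    for addr in trace:
        block = addr // line_size
        if block in stack:
            d = len(stack) - stack.index(block)   # 1-based distance from MRU
            dist[d] += 1
            stack.remove(block)
        else:
            dist["miss"] += 1                     # cold miss in any finite cache
        stack.append(block)
    return dist, len(trace)

def hit_rate(dist, total, capacity_blocks):
    """Hit rate of a fully associative LRU cache of the given capacity."""
    hits = sum(c for d, c in dist.items() if d != "miss" and d <= capacity_blocks)
    return hits / total

if __name__ == "__main__":
    trace = [0x1000 + 4 * (i % 512) for i in range(4096)]   # synthetic trace
    dist, total = stack_distances(trace)
    for blocks in (16, 32, 64, 128):
        print(f"{blocks:4d} blocks: hit rate {hit_rate(dist, total, blocks):.3f}")
```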