Title | Amphisbaena: Modeling Two Orthogonal Ways to Hunt on Heterogeneous Many-Cores |
Author | *Jun Ma (University of Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences, China), Guihai Yan, Yinhe Han, Xiaowei Li (Institute of Computing Technology, Chinese Academy of Sciences, China) |
Page | pp. 394 - 399 |
Keyword | performance modeling, heterogeneous architecture, amphisbaena, scale-out speedup, scale-up speedup |
Abstract | Heterogeneous many-cores can deliver high performance or energy efficiency. There are two orthogonal ways to improve performance: 1) scale-out, by exploiting thread-level parallelism, and 2) scale-up, by enabling core heterogeneity. Predicting the performance of such architectures is increasingly challenging. We propose a comprehensive performance model, Amphisbaena (Phi), built from two orthogonal functions, alpha and beta. Function alpha describes the scale-out speedup and function beta handles the scale-up speedup. The Phi model can tell not only the overall speedup of a given multithreading and core mapping strategy, but also how to improve that mapping, and hence should serve as a promising performance predictor for future heterogeneous many-cores. The results show that the Phi model's error rate is within 12%, lower than that of state-of-the-art methods. We demonstrate an application of the Phi model by introducing a heuristic scheduling algorithm, which outperforms the baselines by 13% on average. |
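As a rough illustration of the abstract's decomposition into two orthogonal terms (not the paper's actual Phi formulation), the sketch below composes a scale-out term alpha and a scale-up term beta into an overall speedup estimate. The Amdahl-style form of alpha, the per-core performance ratio used for beta, and the example parameters are assumptions for illustration only.

```python
# Illustrative sketch only: composes a scale-out term (alpha) and a
# scale-up term (beta) into an overall speedup, in the spirit of the
# abstract's two orthogonal functions. The concrete forms below
# (Amdahl-style scaling, a per-core performance ratio) are assumptions,
# not the paper's Phi model.

def alpha(n_threads: int, parallel_fraction: float) -> float:
    """Scale-out speedup from thread-level parallelism (Amdahl-style assumption)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_threads)

def beta(core_perf_ratio: float) -> float:
    """Scale-up speedup from running on a faster (big) core.

    core_perf_ratio: performance of the assigned core relative to the
    baseline core type (assumed given, e.g. from profiling).
    """
    return core_perf_ratio

def phi(n_threads: int, parallel_fraction: float, core_perf_ratio: float) -> float:
    """Overall speedup estimate: product of the two orthogonal terms."""
    return alpha(n_threads, parallel_fraction) * beta(core_perf_ratio)

if __name__ == "__main__":
    # Example: 8 threads on cores 1.5x faster than the baseline core,
    # with 90% of the work parallelisable.
    print(f"estimated speedup: {phi(8, 0.9, 1.5):.2f}x")
```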
Title | Co-Simulation Framework for Streamlining Microprocessor Development on Standard ASIC Design Flow |
Author | *Tomoyuki Nakabayashi, Tomoyuki Sugiyama, Takahiro Sasaki (Mie University, Japan), Eric Rotenberg (North Carolina State University, U.S.A.), Toshio Kondo (Mie University, Japan) |
Page | pp. 400 - 405 |
Keyword | co-simulation, development environment, ASIC design, microprocessor |
Abstract | In this paper, we present a practical processor co-simulation framework on a standard ASIC design flow.
We propose an off-chip system call emulator, checkpoint mechanism, and cache warming mechanism to streamline design and verification of a processor.
These mechanisms enable short turnaround times, processor prototyping, and highly accurate evaluation results.
All our proposed approaches can be used consistently not only in RTL simulation but also in gate- and transistor-level simulation, and even in chip evaluation with an LSI tester. |
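As a rough, purely software illustration of the "off-chip system call emulator" idea (not the paper's RTL or tester interface), the sketch below services system call requests arriving from a device under test; the request format and the in-memory transport are hypothetical.

```python
# Illustrative sketch only: a host-side ("off-chip") system call emulator
# loop. The request dictionaries and the plain Python list standing in for
# a simulator or LSI-tester transport are hypothetical.

import sys

def emulate_syscall(request):
    """Service one syscall request coming from the device under test."""
    name = request["name"]
    if name == "write":
        # request["data"] is assumed to be bytes captured from DUT memory.
        sys.stdout.write(request["data"].decode(errors="replace"))
        return len(request["data"])
    if name == "exit":
        return None  # signals end of simulation
    raise NotImplementedError(f"unhandled syscall: {name}")

def serve(requests):
    """Drain syscall requests until the guest program exits."""
    for req in requests:
        if emulate_syscall(req) is None:
            break

if __name__ == "__main__":
    serve([
        {"name": "write", "data": b"hello from the DUT\n"},
        {"name": "exit"},
    ])
```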
Title | Annotation and Analysis Combined Cache Modeling for Native Simulation |
Author | Rongjie Yan (State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China), *De Ma (Institute of Microelectronic CAD, Hangzhou Dianzi University, China), Kai Huang, Xiaoxu Zhang, Siwen Xiu (Institute of VLSI Design, Zhejiang University, China) |
Page | pp. 406 - 411 |
Keyword | cache model, dynamic annotation, static analysis, native simulation |
Abstract | To accelerate performance estimation for MPSoCs and raise its accuracy, we propose a method that combines static analysis with dynamic annotation to efficiently model the cache mechanism in native simulation. We use a new cache model to statically analyze segmental profiling results and speed up simulation, and take advantage of a dynamic annotation technique to precisely trace the addresses of local variables. Experimental results show that the proposed techniques enable efficient and more accurate system performance estimation. |
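To make the annotation side of this approach concrete (a minimal sketch, not the paper's combined static/dynamic method), the example below drives a toy direct-mapped cache model through explicit access callbacks of the kind an instrumenter could insert into natively executed code. The cache geometry, the `annotate_access` hook, and the traced local-buffer addresses are hypothetical.

```python
# Illustrative sketch only: a tiny cache model fed by explicit "annotation"
# callbacks that report the addresses touched by natively executed code.
# Geometry and addresses below are assumptions for illustration.

class DirectMappedCache:
    def __init__(self, lines: int, line_size: int):
        self.lines = lines
        self.line_size = line_size
        self.tags = [None] * lines
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> None:
        block = addr // self.line_size
        index = block % self.lines
        if self.tags[index] == block:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[index] = block

cache = DirectMappedCache(lines=64, line_size=32)

def annotate_access(addr: int) -> None:
    """Hook an instrumenter would insert at each memory access."""
    cache.access(addr)

if __name__ == "__main__":
    base = 0x1000                 # assumed stack address of a local array
    for i in range(256):          # sequential walk over a local buffer
        annotate_access(base + 4 * i)
    print(f"hits={cache.hits} misses={cache.misses}")
```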
Title | A Scorchingly Fast FPGA-Based Precise L1 LRU Cache Simulator |
Author | *Josef Schneider, Jorgen Peddersen, Sri Parameswaran (University of New South Wales, Australia) |
Page | pp. 412 - 417 |
Keyword | Cache simulation, FPGA, LRU |
Abstract | Judicious selection of cache configuration is critical in embedded systems, as the cache design can impact power consumption and processor throughput. A large cache increases cache hits but requires more hardware and more power, and is slower for each access. A smaller cache is more economical and faster per access, but may incur significantly more cache misses, resulting in a slower system. For a given application, or a class of applications, on a given hardware system, the designer can aim to optimise the cache configuration through cache simulation. We present here the first multiple-cache simulator based on hardware. The FPGA implementation is characterised by a trace consumption rate of 100 MHz, making our cache simulation core up to 53x faster, for a set of benchmarks, than the fastest software-based cache simulator. Our cache simulator can determine the hit rates of 308 cache configurations, 44 of which it can evaluate simultaneously. |
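The general reason one pass over a trace can serve many LRU cache sizes at once is the LRU inclusion property (Mattson's stack algorithm). The sketch below is a plain software analogue of that idea only; it is not the paper's FPGA design, and it assumes a fully associative LRU cache, a fixed line size, and a synthetic trace.

```python
# Illustrative sketch only: one pass over an address trace yields LRU hit
# counts for many capacities at once, via the stack-distance (inclusion)
# property of LRU. Trace and line size below are assumptions.

from collections import Counter

def stack_distances(trace, line_size=32):
    """Return a Counter of LRU stack distances for the block trace."""
    stack = []          # most-recently-used block kept at the end
    dist = Counter()
    for addr in trace:
        block = addr // line_size
        if block in stack:
            d = len(stack) - stack.index(block)   # 1-based distance from MRU
            dist[d] += 1
            stack.remove(block)
        else:
            dist["miss"] += 1                     # cold miss in any finite cache
        stack.append(block)
    return dist, len(trace)

def hit_rate(dist, total, capacity_blocks):
    """Hit rate of a fully associative LRU cache of the given capacity."""
    hits = sum(c for d, c in dist.items() if d != "miss" and d <= capacity_blocks)
    return hits / total

if __name__ == "__main__":
    trace = [0x1000 + 4 * (i % 512) for i in range(4096)]   # synthetic trace
    dist, total = stack_distances(trace)
    for blocks in (16, 32, 64, 128):
        print(f"{blocks:4d} blocks: hit rate {hit_rate(dist, total, blocks):.3f}")
```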