ASP-DAC 2015 Technical Program

The 20th Asia and South Pacific Design Automation Conference

Session 9B (Special Session) System-Level Designs and Tools for Multicore Systems
Time: 15:50 - 17:30 Thursday, January 22, 2015
Location: Room 104
Chair: Chung-Ta King (National Tsing Hua University, Taiwan)

9B-1 (Time: 15:50 - 16:15)

Title	(Invited Paper) Heterogeneous Architecture Design with Emerging 3D and Non-Volatile Memory Technologies
Author	Qiaosha Zou, Matthew Poremba (The Pennsylvania State University, U.S.A.), Rui He, Wei Yang, Junfeng Zhao (Huawei Shannon Lab, China), *Yuan Xie (University of California at Santa Barbara, U.S.A.)
Page	pp. 785 - 790
Abstract	In this paper, different perspectives of heterogeneous architecture options will be presented. With 3D die stacking, disparate and heterogeneous technologies, such as CMOS logic and emerging non-volatile memory, can be integrated on the same chip, enabling a new paradigm of architecture design. Furthermore, a DRAM/NVM heterogeneous memory architecture combines the benefits of different technologies and sheds light on future hybrid memory architectures. Design tradeoffs will be discussed with preliminary results presented.

9B-2 (Time: 16:15 - 16:40)

Title	(Invited Paper) Alleviate Chip I/O Pin Constraints for Multicore Processors through Optical Interconnects
Author	*Zhehui Wang, Jiang Xu, Peng Yang, Xuan Wang, Zhe Wang, Luan H.K. Duong, Zhifei Wang, Haoran Li, Rafael K.V. Maeda, Xiaowen Wu (Hong Kong University of Science and Technology, Hong Kong), Yaoyao Ye, Qinfen Hao (Huawei Technologies, China)
Page	pp. 791 - 796
Keyword	interconnect, modeling, performance
Abstract	Chip I/O pins are an increasingly limited resource and significantly affect the performance, power and cost of multicore processors. Optical interconnects promise low power and high bandwidth, and are potential alternatives to electrical interconnects. This work systematically developed a set of analytical models for electrical and optical interconnects to study their structures, receiver sensitivities, crosstalk noise, and attenuation. We verified the models against published implementation results. The analytical models quantitatively identified the advantages of optical interconnects in terms of bandwidth, energy consumption, and transmission distance. We showed that optical interconnects can significantly reduce chip pin counts. For example, compared to electrical interconnects, optical interconnects can save at least 92% of signal pins when connecting chips more than 25 cm (10 inches) apart.

9B-3 (Time: 16:40 - 17:05)

Title	(Invited Paper) A Fast and Accurate Network-on-Chip Timing Simulator with a Flit Propagation Model
Author	Ting-Shuo Hsu, Jun-Lin Chiu, Chao-Kai Yu, *Jing-Jia Liou (National Tsing Hua University, Taiwan)
Page	pp. 797 - 802
Keyword	network-on-chip, network-on-chip simulator, wormhole switching, router microarchitecture
Abstract	Network-on-chip (NoC) can be a simulation bottleneck in a many-core system. Traditional cycle-accurate NoC simulators need a long simulation time, as they synchronize all components (routers and FIFOs) every cycle to guarantee exact behavior. Also, a NoC simulation cannot gain speed from transaction-level modeling (TLM) without some accuracy loss, because the transaction timings of a simulated packet depend on other packets due to wormhole switching. In this paper, we propose a novel NoC simulation method which can calculate cycle-accurate timings with wormhole switching. Instead of updating the states of routers and FIFOs cycle by cycle, we use a pre-built model to calculate a flit's exact times at the ports of routers in a NoC. The results of the proposed simulator are verified against NoC implementations (cycle-accurate at RTL) created by a commercial NoC compiler. All timing results match perfectly with packet waveforms generated by these NoCs (with a 40-325 times speedup). As another comparison, the speed of the simulator is similar to or faster than (0.5-23X) that of a TG2 NoC model, which is a SystemC transaction-level model without timing accuracy (because it ignores wormhole traffic).
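The idea in the 9B-3 abstract of calculating flit times directly, rather than updating every router each cycle, can be illustrated with a small sketch. This is not the paper's flit propagation model. It is a toy under loudly stated assumptions: each link carries one packet at a time, the head flit needs a fixed `hop_latency` per hop, body flits follow one per cycle, packets are served in injection order, and back-pressure on upstream links (which real wormhole switching propagates) is ignored. The function name `head_flit_times` and the packet tuple layout are hypothetical.

```python
def head_flit_times(packets, hop_latency=1):
    """Compute head-flit arrival cycles per hop without a per-cycle loop.

    packets: list of (inject_cycle, length_in_flits, path), where path is
             a list of link names (hypothetical layout for illustration).
    Returns one list of head-flit arrival cycles per packet, in the order
    the packets are served (sorted by injection cycle).
    """
    link_free = {}  # link name -> first cycle at which the link is free
    results = []
    # Assumption: contention is resolved in injection order.
    for inject, length, path in sorted(packets, key=lambda p: p[0]):
        head = inject
        times = []
        for link in path:
            # The head flit advances after hop_latency, or waits until
            # the previous packet's tail flit has left this link.
            head = max(head + hop_latency, link_free.get(link, 0))
            # Assumption: the tail flit follows length - 1 cycles later,
            # with no upstream stall (back-pressure is ignored).
            link_free[link] = head + length
            times.append(head)
        results.append(times)
    return results


if __name__ == "__main__":
    # Packet A: 4 flits over links a then b; packet B: 2 flits over b only.
    print(head_flit_times([(0, 4, ["a", "b"]), (1, 2, ["b"])]))
```

In this example, the second packet's head flit has to wait until the first packet's tail has cleared link `b`, which is the kind of inter-packet dependency the abstract attributes to wormhole switching.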

9B-4 (Time: 17:05 - 17:30)

Title	(Invited Paper) Application-Level Embedded Communication Tracer for Many-Core Systems
Author	*Chih-Tsun Huang, Kuan-Chun Tasi, Jun-Shen Lin, Hsiao-Wei Chien (National Tsing Hua University, Taiwan)
Page	pp. 803 - 808
Keyword	Many-Core Systems, Embedded Tracer, Application Level, Debugging
Abstract	Design verification and debugging across both software and hardware is increasingly challenging for many-core systems. We present an embedded tracer architecture for application-level communication. Not only can the trace information be optimized, but verification can also be performed efficiently at the system level. The unified architecture consolidates the debugging flow at different abstraction levels, and facilitates the performance analysis of the entire system as well. The use-case study and experiments demonstrate the effectiveness of the proposed tracer architecture.