Tutorials
ASP-DAC 2024 offers attendees a set of intensive three-hour introductions to specific topics. If you register for tutorials, you have the option to select two of the eight topics. This year, all tutorials will be held in person.
- Date: Monday, January 22, 2024 (9:30 — 17:00)
| Time | Room 204 | Room 205 | Room 206 | Room 207 |
| --- | --- | --- | --- | --- |
| 9:30 — 12:30 (KST) | Tutorial-1: Tutorial to NeuroSim: A Versatile Benchmark Framework for AI Hardware | Tutorial-2: Toward Robust Neural Network Computation on Emerging Crossbar-based Hardware and Digital Systems | Tutorial-3: Morpher: A Compiler and Simulator Framework for CGRA | Tutorial-4: Machine Learning for Computational Lithography |
| 14:00 — 17:00 (KST) | Tutorial-5: Low Power Design: Current Practice and Opportunities | Tutorial-6: Leading the industry, Samsung CXL Technology | Tutorial-7: Sparse Acceleration for Artificial Intelligence: Progress and Trends | Tutorial-8: CircuitOps and OpenROAD: Unleashing ML EDA for Research and Education |
Tutorial-1: Monday, January 22, 9:30—12:30 (KST) @ Room 204
Tutorial to NeuroSim: A Versatile Benchmark Framework for AI Hardware
- Speaker:
- Shimeng Yu (Georgia Institute of Technology, USA)
Abstract:
NeuroSim is a widely used open-source simulator for benchmarking AI hardware. It is primarily developed for compute-in-memory (CIM) accelerators for deep neural network (DNN) inference and training, with hierarchical design options from the device level through the circuit level and up to the algorithm level. This tutorial introduces the research community to new features and recent updates of NeuroSim, including technology support down to the 1 nm node, new modes of operation such as digital CIM (DCIM) and compute-in-3D NAND, and heterogeneous 3D integration of chiplets to support ultra-large AI models. A real-time demo will also give attendees hands-on experience in using and modifying the NeuroSim framework to suit their own research purposes (a toy estimation sketch, separate from NeuroSim itself, follows the outline below).
Tutorial target and outline:
- i. Introduction to AI hardware and CIM paradigm
- ii. NeuroSim hardware-level modeling methodologies: area, latency, and energy estimation
- iii. NeuroSim software-level modeling methodologies: supporting quantization and other non-ideal device effects.
- iv. Examples of running inference engine and training accelerators for convolutional neural networks with various technology choices.
- v. NeuroSim extension 1: technology updates to 1 nm node and DCIM support
- vi. NeuroSim extension 2: TPU-like architecture benchmark with novel global buffer memory designs
- vii. NeuroSim extension 3: chiplet based integration for ultra-large-scale transformer model
- viii. NeuroSim extension 4: 3D NAND based CIM for hyperdimensional computing
- ix. Demo of running inference engine benchmarking (DNN+NeuroSim V1.4)
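To convey the flavor of the hardware roll-up that NeuroSim automates, the toy Python sketch below estimates crossbar-array count, area, and energy for a single fully connected layer. All constants, formulas, and function names are invented for this illustration; they are not NeuroSim's API, models, or calibrated numbers.

```python
# Toy illustration of a CIM-style area/energy roll-up, in the spirit of what
# NeuroSim automates. The constants and formulas below are made up for this
# sketch; NeuroSim derives such numbers from calibrated device/circuit models.

import math

ARRAY_ROWS, ARRAY_COLS = 128, 128   # synaptic array size (assumed)
BITS_PER_CELL = 2                   # weight bits stored per device (assumed)
WEIGHT_BITS = 8                     # weight precision of the DNN layer
AREA_PER_ARRAY_UM2 = 500.0          # hypothetical area of one array + periphery
ENERGY_PER_MAC_PJ = 0.05            # hypothetical energy per analog MAC

def layer_estimate(in_features: int, out_features: int, num_inputs: int):
    """Estimate array count, area, and energy for one fully connected layer."""
    cells_per_weight = math.ceil(WEIGHT_BITS / BITS_PER_CELL)
    cols_needed = out_features * cells_per_weight
    arrays = math.ceil(in_features / ARRAY_ROWS) * math.ceil(cols_needed / ARRAY_COLS)
    area_um2 = arrays * AREA_PER_ARRAY_UM2
    macs = in_features * out_features * num_inputs
    energy_uj = macs * ENERGY_PER_MAC_PJ * 1e-6   # pJ -> uJ
    return arrays, area_um2, energy_uj

if __name__ == "__main__":
    arrays, area, energy = layer_estimate(512, 256, num_inputs=10_000)
    print(f"arrays={arrays}, area={area:.0f} um^2, energy={energy:.2f} uJ")
```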
Biography:
Prof. Shimeng Yu is a full professor of electrical and computer engineering at the Georgia Institute of Technology. He received his Ph.D. in electrical engineering from Stanford University. Among his honors, Prof. Yu received the NSF CAREER Award in 2016, the IEEE Electron Devices Society (EDS) Early Career Award in 2017, the ACM Special Interest Group on Design Automation (SIGDA) Outstanding New Faculty Award in 2018, the Semiconductor Research Corporation (SRC) Young Faculty Award in 2019, and the ACM/IEEE Design Automation Conference (DAC) Under-40 Innovators Award in 2020, and he served as an IEEE Circuits and Systems Society (CASS) Distinguished Lecturer in 2021-2022 and an IEEE Electron Devices Society (EDS) Distinguished Lecturer in 2022-2023. Prof. Yu is active in service to the EDA community as a TPC member of DAC, ICCAD, DATE, etc. He has given multiple short-course presentations at IEDM, ISCAS, and ESSCIRC, and has given workshop presentations at DAC and the Design Automation Summer School (DASS).
Tutorial-2: Monday, January 22, 9:30—12:30 (KST) @ Room 205
Toward Robust Neural Network Computation on Emerging Crossbar-based Hardware and Digital Systems
- Speakers:
- Yiyu Shi (University of Notre Dame, USA)
Masanori Hashimoto (Kyoto University, Japan)
Abstract:
As a promising alternative to traditional neural network computation platforms, Compute-in-Memory (CiM) neural accelerators based on emerging devices have been intensively studied. These accelerators present an opportunity to overcome memory-access bottlenecks, but they face significant design challenges: non-ideal conditions resulting from the manufacturing process of these devices induce uncertainties, so the actual weight values in deployed accelerators may deviate from those trained offline in data centers, leading to performance degradation. The first part of this tutorial will cover:
- Efficient worst-case analysis for neural network inference using emerging device-based CiM,
- Enhancement of worst-case performance through noise-injection training (see the sketch after this list),
- Co-design of software and neural architecture specifically for emerging device-based CiMs.
The second part will cover:
- Identification of vulnerabilities in neural networks,
- Reliability analysis and enhancement of AI accelerators for edge computing,
- Reliability assessment of GPUs against soft errors.
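As a minimal illustration of the noise-injection idea referenced above, the PyTorch sketch below perturbs layer weights with a relative Gaussian term during training so that the learned weights tolerate device variation at deployment. The layer name, noise model, and hyperparameters are assumptions made for this sketch, not the speakers' actual method or code.

```python
# Minimal sketch of noise-injection training for a CiM-deployed layer.
# The Gaussian weight perturbation is a stand-in for device variation;
# real work uses measured or modelled non-idealities.

import torch
import torch.nn as nn

class NoisyLinear(nn.Linear):
    """Linear layer that injects weight noise during training only."""
    def __init__(self, in_features, out_features, noise_std=0.02):
        super().__init__(in_features, out_features)
        self.noise_std = noise_std

    def forward(self, x):
        if self.training and self.noise_std > 0:
            # Relative Gaussian perturbation of the weights (assumed noise model).
            noisy_w = self.weight * (1 + torch.randn_like(self.weight) * self.noise_std)
            return nn.functional.linear(x, noisy_w, self.bias)
        return super().forward(x)

model = nn.Sequential(NoisyLinear(784, 256), nn.ReLU(), NoisyLinear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))  # dummy batch
opt.zero_grad()
loss_fn(model(x), y).backward()
opt.step()
```

Evaluating such a model with noise disabled versus several independently perturbed weight samples gives a quick, informal sense of its robustness margin.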
Biographies:


Tutorial-3: Monday, January 22, 9:30—12:30 (KST) @ Room 206
Morpher: A Compiler and Simulator Framework for CGRA
- Speakers:
- Tulika Mitra (National University of Singapore, Singapore)
Zhaoying Li (National University of Singapore, Singapore)
Thilini Kaushalya Bandara (National University of Singapore, Singapore)
Abstract:
Coarse-Grained Reconfigurable Architecture (CGRA) provides a promising pathway to scale the performance and energy efficiency of computing systems by accelerating compute-intensive loop kernels. However, no end-to-end open-source toolchain for CGRA exists that supports architectural design space exploration, compilation, simulation, and FPGA emulation for real-world applications. This hands-on tutorial presents Morpher, an open-source end-to-end compilation and simulation framework for CGRA, featuring state-of-the-art mappers, assisting in design space exploration, and enabling application-level testing of CGRA. Morpher can take a real-world application with a compute-intensive kernel and a user-provided CGRA architecture as input, compile the kernel using different mapping methods, and automatically validate the compiled binaries through cycle-accurate simulation using test data extracted from the application. Thanks to its feature-rich compiler, Morpher can handle real-world application kernels rather than being limited to simple toy kernels. The Morpher architecture description language allows users to easily specify a variety of architectural features such as complex interconnects, multi-hop routing, and memory organizations. Morpher is available online at https://github.com/ecolab-nus/morpher.
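To give a flavor of the mapping problem such a CGRA compiler solves, the toy sketch below brute-forces a placement of a tiny dataflow graph onto a small grid of processing elements under a single-hop connectivity constraint. The grid, graph, and search are invented for illustration; they are not Morpher's mapper, architecture description format, or API.

```python
# Toy illustration of CGRA mapping: place a small dataflow graph onto a 2x3
# grid of processing elements (PEs) so that every data edge connects
# neighbouring PEs. Real mappers (such as those in Morpher) also handle time
# (initiation interval), routing resources, multi-hop paths, and memory
# access; none of that is modelled here.

from itertools import permutations

# Dataflow graph of a tiny kernel, a*b + c style: nodes and data edges.
nodes = ["load_a", "load_b", "mul", "add"]
edges = [("load_a", "mul"), ("load_b", "mul"), ("mul", "add")]

# 2x3 grid of PEs connected to their orthogonal neighbours (assumed topology).
pes = [(r, c) for r in range(2) for c in range(3)]

def adjacent(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1]) == 1

def valid(mapping):
    return all(adjacent(mapping[u], mapping[v]) for u, v in edges)

for perm in permutations(pes, len(nodes)):
    mapping = dict(zip(nodes, perm))
    if valid(mapping):
        print("feasible placement:", mapping)
        break
else:
    print("no single-hop placement on this grid")
```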
Biographies:
Mr. Zhaoying Li is currently working toward his Ph.D. degree at the National University of Singapore. His current research interests include reconfigurable architectures and compiler optimizations.
Ms. Thilini Kaushalya Bandara is a fourth-year PhD student at the School of Computing, National University of Singapore. Her research interests include hardware-software co-design of power-efficient reconfigurable architectures and their design space exploration.
Tutorial-4: Monday, January 22, 9:30—12:30 (KST) @ Room 207
Machine Learning for Computational Lithography
- Speakers:
- Yonghwi Kwon (Synopsys, USA)
Haoyu Yang (NVIDIA Research, USA)
Abstract:
As the technology node shrinks, the number of mask layers and the pattern density have been increasing exponentially. This has led to a growing need for faster and more accurate mask optimization techniques to achieve high manufacturing yield and faster turn-around time (TAT). Machine learning has emerged as a promising solution to this challenge, as it can be used to automate and accelerate the mask optimization process. This tutorial will introduce recent studies on using machine learning for computational lithography. We will start with a comprehensive introduction to computational lithography, including its challenges and how machine learning can be applied to address them. We will then present recent research in four key areas:
- (1) Mask optimization: This is the most time-consuming step in the resolution enhancement technique (RET) flow. This tutorial compares different approaches to machine learning-based mask optimization, based on their features and machine learning model architectures (a toy gradient-based sketch follows this list).
- (2) Lithography modeling: An accurate and fast lithography model is essential for every step of the RET flow. Machine learning can be used to develop more accurate and efficient lithography models, by incorporating physical properties into the model and learning from real-world data.
- (3) Sampling and synthesis of test patterns: A comprehensive set of test patterns is needed for efficient modeling and machine learning training. Machine learning can be used to identify effective sampling methods and generate new patterns for better coverage.
- (4) Hotspot prediction and correction: Lithography hotspots can lead to circuit failure. Machine learning can be used to predict hotspots and develop correction methods that can improve the yield of manufactured chips.
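As a toy illustration of gradient-based mask optimization (area (1) above), the sketch below treats a Gaussian blur plus a sigmoid resist threshold as a differentiable stand-in for the lithography model and runs gradient descent on the mask. Every model choice and constant here is an assumption for illustration, not a production OPC/ILT flow.

```python
# Toy inverse-lithography-style mask optimization: a differentiable "litho
# model" (Gaussian blur + sigmoid resist threshold) and gradient descent on
# the mask. Purely illustrative; real flows use calibrated optical and resist
# models over full-chip layouts.

import torch
import torch.nn.functional as F

N = 32
target = torch.zeros(1, 1, N, N)
target[:, :, 12:20, 8:24] = 1.0              # desired printed rectangle

# Fixed 5x5 Gaussian kernel as a stand-in for the optical point-spread function.
coords = torch.arange(5) - 2
g = torch.exp(-coords.float() ** 2 / 2.0)
kernel = (g[:, None] * g[None, :])
kernel = (kernel / kernel.sum()).view(1, 1, 5, 5)

mask_logits = target.clone().requires_grad_(True)    # start from the target shape
opt = torch.optim.Adam([mask_logits], lr=0.1)

for step in range(200):
    mask = torch.sigmoid(4 * mask_logits)             # continuous mask in [0, 1]
    aerial = F.conv2d(mask, kernel, padding=2)        # blurred "aerial image"
    printed = torch.sigmoid(25 * (aerial - 0.5))      # resist threshold model
    loss = F.mse_loss(printed, target)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final pattern error: {loss.item():.4f}")
```

The structure of the loop (forward lithography model, pattern-error loss, gradient update on the mask) is what carries over to real mask optimization; only the models and scale differ.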
Biographies:


Tutorial-5: Monday, January 22, 14:00—17:00 (KST) @ Room 204
Low Power Design: Current Practice and Opportunities
- Speaker:
- Gang Qu (University of Maryland, USA)
Abstract:
Power and energy efficiency have been among the most critical design criteria for the past several decades. This tutorial consists of three parts that will help both academic researchers and industrial practitioners understand the current state of the art and the new advances, challenges, and opportunities related to low power design. After a brief motivation, we will review some of the most popular low power design techniques, including dynamic voltage and frequency scaling (DVFS), clock gating, and power gating. We will then cover recent advances such as approximate computing and in-memory computing. Finally, we will share with the audience some of the security pitfalls in implementing these low power methods. This tutorial is designed for graduate students and professionals from industry and government working in the general fields of EDA, embedded systems, and the Internet of Things. Previous knowledge of low power design and security is not required.
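The leverage behind DVFS comes from the dynamic-power relation P_dyn ≈ α·C·V²·f: lowering voltage and frequency together reduces power superlinearly even though the task runs longer. The back-of-the-envelope sketch below uses purely illustrative (assumed) numbers, not figures from any specific chip.

```python
# Back-of-the-envelope DVFS arithmetic: dynamic power scales as alpha*C*V^2*f,
# so running slower at a lower voltage can cut energy even though the task
# takes longer. All numbers below are illustrative.

def dynamic_power(alpha, c_eff, vdd, freq):
    return alpha * c_eff * vdd ** 2 * freq

alpha, c_eff = 0.2, 1e-9          # activity factor, effective capacitance (F)
cycles = 2e9                      # work to do, in clock cycles

for vdd, freq in [(1.0, 2.0e9), (0.8, 1.2e9)]:   # nominal vs. scaled point
    power = dynamic_power(alpha, c_eff, vdd, freq)
    runtime = cycles / freq
    energy = power * runtime
    print(f"Vdd={vdd:.1f} V, f={freq/1e9:.1f} GHz: "
          f"P={power:.3f} W, t={runtime:.2f} s, E={energy:.3f} J")
```

In this example the scaled operating point trades roughly 1.67x longer runtime for about a 36% reduction in dynamic energy; leakage, which DVFS does not remove, is ignored here.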
Biography:
Tutorial-6: Monday, January 22, 14:00—17:00 (KST) @ Room 205
Leading the industry, Samsung CXL Technology
- Speakers:
- Jeonghyeon Cho (Samsung Electronics, Republic of Korea)
Jinin So (Samsung Electronics, Republic of Korea)
Kyungsan Kim (Samsung Electronics, Republic of Korea)
Abstract:
1. Leading the industry, Samsung CXL Technology: The rapid development of data-intensive technology has driven an increasing demand for new architectural solutions with scalable, composable, and coherent computing environments. Recent efforts on Compute Express Link (CXL) are a key enabler in accelerating the shift toward memory-centric architecture. CXL is an industry-standard interconnect protocol that lets various processors efficiently expand memory capacity and bandwidth with a memory-semantic protocol. Memory connected over CXL also allows handshaking communication that can bring a processing-near-memory (PNM) engine into the memory. As a leader in memory solutions, Samsung Electronics has developed CXL-enabled memory solutions: CXL-MXP and CXL-PNM. CXL-MXP allows more flexible memory expansion than current DIMM-based memory solutions, and CXL-PNM is the world's first CXL-based PNM solution for GPT inference acceleration. By adopting the CXL protocol in its memory solutions, Samsung aims to expand the CXL memory ecosystem while strengthening its presence in the next-generation memory solutions market.
2. Expanding Memory Boundaries through CXL-Enabled Devices: With the growth of data volumes in data analytics and machine learning applications, memory system performance is becoming increasingly critical. To support this computation, memory technology has evolved in both bandwidth and capacity through CXL-enabled devices. CXL technology can improve the efficiency of various solutions such as CPUs/GPUs, ML accelerators, and memory. Memory chipmakers have focused on CXL's support for both memory and accelerators to pursue preemptive research and development and, consequently, deliver the latest integrated solutions. In this tutorial, CXL-enabled devices and solutions will be discussed: the CXL-based memory expander (MXP) and processing-near-memory (PNM). MXP provides robust disaggregated memory pooling and expansion capability for processors and accelerators to overcome memory bandwidth and capacity constraints. PNM enables performance several times faster than dozens of CPU cores on memory-intensive computations. The CXL protocol's challenges and various research topics will be discussed as well.
3. Software Challenges of a Heterogeneous CXL Compute Pool, and SMDK: CXL is leading a new architecture of heterogeneous compute systems with various device types and connectivity topologies. The technology complies with open-standard connectivity and inherits the experience of conventional memory-tiering systems. In reality, however, due to the novelty of CXL devices and their architectural deployment, the technology raises a number of software challenges across layers to properly utilize the expanded computing resource pool consisting of CXL memory and accelerators, CXL-MXP and CXL-PNM (processing near memory). This tutorial explains software considerations for the CXL compute pool and SMDK, Samsung's CXL software suite for leveraging the pool. SMDK provides a variety of software functionalities in an optimized way and has been open-sourced for the CXL industry and researchers since March 2022.
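As a small host-side illustration of what an expanded CXL memory pool looks like to system software, the sketch below lists NUMA nodes (CXL memory expanders commonly appear as CPU-less nodes) and enumerates the Linux CXL bus. The sysfs paths assume a Linux kernel with CXL support and may be absent elsewhere; this sketch is not part of SMDK.

```python
# Quick host-side inspection sketch (Linux): CXL memory expanders are commonly
# exposed as CPU-less NUMA nodes, so listing NUMA nodes and the CXL bus gives
# a first view of the expanded memory pool. Read-only; requires a kernel with
# CXL support for the /sys/bus/cxl path to exist.

import glob
import os

for node in sorted(glob.glob("/sys/devices/system/node/node*")):
    with open(os.path.join(node, "cpulist")) as f:
        cpus = f.read().strip()
    kind = "CPU-less (possible CXL/expander memory)" if not cpus else f"CPUs {cpus}"
    print(f"{os.path.basename(node)}: {kind}")

cxl_devices = glob.glob("/sys/bus/cxl/devices/*")
print("CXL bus devices:", [os.path.basename(d) for d in cxl_devices] or "none found")
```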
Biographies:



Tutorial-7: Monday, January 22, 14:00—17:00 (KST) @ Room 206
Sparse Acceleration for Artificial Intelligence: Progress and Trends
- Speakers:
- Guohao Dai (Shanghai Jiao Tong University, China)
Xiaoming Chen (Chinese Academy of Sciences, China)
Mingyu Gao (Tsinghua University, China)
Zhenhua Zhu (Tsinghua University, China)
Abstract:
After decades of advancements, artificial intelligence algorithms have become increasingly sophisticated, with sparse computing playing a pivotal role in their evolution. On the one hand, sparsity is an important means of compressing neural network models and reducing computational workload. Moreover, generative algorithms such as Large Language Models (LLMs) have brought AI into the 2.0 era, and the large computational complexity of LLMs makes using sparsity to reduce workload even more crucial. On the other hand, for real-world sparse applications such as point cloud and graph processing, emerging AI algorithms have been developed to process sparse data. In this tutorial, we will review and summarize the characteristics of sparse computing in AI 1.0 and 2.0, and then discuss the development trend of sparse computing from the circuit level to the system level. The tutorial includes three parts:
(1) From the circuit perspective, emerging Processing-In-Memory (PIM) circuits have demonstrated attractive performance potential compared to von Neumann architectures. We will first explain the opportunities and challenges of deploying irregular sparse computing onto dense PIM circuits, and then introduce several PIM circuit and sparse algorithm co-optimization strategies that improve PIM energy efficiency and reduce communication latency overhead.
(2) From the architecture perspective, we will first present several domain-specific architectures (DSAs) for efficient sparse processing, including application-dedicated accelerators and Near-Data Processing (NDP) architectures. These DSAs achieve performance improvements of one to two orders of magnitude over CPUs in graph mining, recommendation systems, and other workloads. We will then discuss design ideas for, and envision the future of, general sparse processing across various sparsity patterns and sparse operators.
(3) From the system perspective, we will introduce sparse kernel optimization strategies on GPU systems. Based on these studies, an open-source sparse kernel library, dgSPARSE, will be presented; it outperforms commercial libraries on various graph neural network models and sparse operators. We will also discuss applying these system-level design methodologies to the optimization of LLMs.
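As a minimal CPU-side illustration of the kind of kernel such libraries target, the sketch below runs one sparse-matrix-dense-matrix multiplication (SpMM), the core of GNN feature aggregation, using SciPy. The sizes and density are arbitrary, and this is not dgSPARSE's API; GPU libraries accelerate the same access pattern.

```python
# Minimal CPU illustration of the SpMM kernel at the core of graph neural
# network aggregation: features of neighbours are combined through a sparse
# adjacency matrix. GPU libraries (cuSPARSE, dgSPARSE, etc.) accelerate the
# same pattern; this SciPy version is only for illustration.

import numpy as np
import scipy.sparse as sp

num_nodes, feat_dim, density = 1000, 64, 0.01

# Random sparse adjacency (unweighted, directed) and dense node features.
adj = sp.random(num_nodes, num_nodes, density=density, format="csr", dtype=np.float32)
features = np.random.rand(num_nodes, feat_dim).astype(np.float32)

# One aggregation step: each node combines the features of its neighbours.
aggregated = adj @ features          # SpMM: (N x N sparse) x (N x F dense)

print("nnz:", adj.nnz, "aggregated shape:", aggregated.shape)
```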
Biographies:



Tutorial-8: Monday, January 22, 14:00—17:00 (KST) @ Room 207 (*with possible overflow into the 17:00~18:00 time period)
CircuitOps and OpenROAD: Unleashing ML EDA for Research and Education
- Speakers:
- Andrew B. Kahng (University of California San Diego, USA)
Vidya A. Chhabria (Arizona State University, USA)
Bing-Yue Wu (Arizona State University, USA)
Abstract:
This tutorial will first present NVIDIA's CircuitOps approach to modeling chip data, and the generation of chip data using the open-source OpenROAD infrastructure (in particular, OpenDB and OpenSTA, along with Python APIs). The tutorial will highlight how the integration of CircuitOps and OpenROAD has created an ML EDA infrastructure that serves as a playground for users to directly experiment with generative and reinforcement learning-based ML techniques within an open-source EDA tool. Recently developed Python APIs around OpenROAD allow CircuitOps to be integrated with OpenROAD both to query data from OpenDB and to modify the design through ML-algorithm-to-OpenDB callbacks. As part of the tutorial, participants will work with OpenROAD's Python interpreter and leverage CircuitOps to (i) represent and query chip data in ML-friendly data formats such as graphs, numpy arrays, pandas dataframes, and images (an illustrative sketch follows the outline below), and (ii) modify circuit netlist information from a simple implementation of a reinforcement learning framework for logic gate sizing. Several detailed examples will show how ML EDA applications can be built on the OpenROAD and CircuitOps ML EDA infrastructure. The tutorial will also survey the rapidly evolving landscape of ML EDA, spanning generative methods, reinforcement learning, and other methods, that builds on open design data, data formats, and tool APIs. Attendees will receive pointers to optional pre-reading and exercises in case they would like to familiarize themselves with the subject matter before attending the tutorial. The tutorial is formulated to be as broadly interesting and useful as possible to students, researchers, and faculty, and to practicing engineers in both EDA and design.
Outline:
- Background: what an ML EDA infrastructure requires; the CircuitOps and OpenROAD infrastructure; data formats, APIs, and algorithms
- Key Python APIs for reading from and writing to the database, and the types of ML algorithms the infrastructure enables
- Hands-on session: Demonstration of the query-based APIs and ML-friendly data formats supported within OpenROAD (images, graphs, dataframes)
- Architecture and details of example ML EDA applications built on CircuitOps + OpenROAD
- Hands-on session: Demonstration of callback APIs and RL-based gate sizing iterations with CircuitOps + OpenROAD
- The current landscape of ML-EDA, emerging standards, data models/formats, and similar efforts worldwide
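As an illustrative sketch of the ML-friendly data representations mentioned above, the Python snippet below loads netlist-style tables into pandas, builds a networkx graph, and extracts a numpy feature matrix. The table contents and column names are hypothetical placeholders, not the actual CircuitOps IR schema or the OpenROAD Python API.

```python
# Sketch of the data-representation step: hold netlist-style data in pandas
# tables, then derive a graph view and a numpy feature matrix for ML. The
# rows and column names below are hypothetical placeholders, not the actual
# CircuitOps IR schema or OpenROAD API.

import networkx as nx
import pandas as pd

# Hypothetical IR-style tables: one row per cell instance, one row per connection.
cells = pd.DataFrame([
    {"cell": "u1", "libcell": "NAND2_X1", "area": 1.06, "slack": 0.12},
    {"cell": "u2", "libcell": "INV_X2",   "area": 0.53, "slack": -0.03},
    {"cell": "u3", "libcell": "DFF_X1",   "area": 4.52, "slack": 0.08},
])
edges = pd.DataFrame([
    {"src": "u1", "dst": "u2", "net": "n1", "wire_cap": 0.4},
    {"src": "u2", "dst": "u3", "net": "n2", "wire_cap": 0.7},
])

# Graph view, e.g. for GNN-style methods.
g = nx.DiGraph()
for row in cells.itertuples(index=False):
    g.add_node(row.cell, libcell=row.libcell, area=row.area, slack=row.slack)
for row in edges.itertuples(index=False):
    g.add_edge(row.src, row.dst, net=row.net, wire_cap=row.wire_cap)

# Numpy feature-matrix view, e.g. for classic ML models.
x = cells[["area", "slack"]].to_numpy()
print(g.number_of_nodes(), "cells,", g.number_of_edges(), "edges, feature matrix", x.shape)
```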
Biographies:


