Conference Program - ASP-DAC 2026

January 19-22, 2026 | Hong Kong Disneyland Hotel

Tutorial I

09:00-12:00 | Monday, January 19, 2026
(Tutorial 1) On-Device AI to Better Mobile and Implantable Devices in Healthcare
09:00-12:00
Sleeping Beauty 1/2

Yiyu Shi (University of Notre Dame)

Abstract
The increasing prevalence of chronic diseases, an aging population, and a shortage of healthcare professionals have prompted the widespread adoption of mobile and implantable devices to effectively manage various health conditions. In recent years, there has been growing interest in leveraging rapid advances in artificial intelligence (AI) to enhance the performance of these devices, resulting in better patient outcomes, reduced healthcare costs, and improved patient autonomy. Due to privacy, security, and safety considerations, inference must often be performed at the edge, with limited hardware resources. This challenge is compounded by inter-patient and intra-patient variability, heavy dependence on medical domain knowledge, and a lack of diversified training data.
In this tutorial, we will explore how hardware-AI co-design techniques, such as joint hardware and neural architecture optimization and fairness-aware pruning, can fundamentally transform mobile and implantable devices. We will share case studies, including the world's first smart Implantable Cardioverter Defibrillator (ICD) enabled by our research, illustrating how advanced edge AI methodologies can make these devices safer, more efficient, and more personalized. Attendees will gain actionable insights into deploying AI models under stringent constraints while addressing fairness, adaptability, and reliability challenges unique to healthcare applications.
(Tutorial 2) Design Methodologies and Toolchains for Compute-in-Memory: From Architectures to Systems
09:00-12:00
Sleeping Beauty 3

Xiaoming Chen (Institute of Computing Technology, Chinese Academy of Sciences)
Jianlei Yang (Beihang University)
Zhenhua Zhu (Tsinghua University)

Abstract
As the demand for computational efficiency in modern AI applications continues to rise, Compute-in-Memory (CIM) has emerged as a promising computation paradigm. By performing computations directly within memory arrays, CIM architectures overcome the von Neumann bottleneck of traditional architectures. While recent CIM hardware designs have demonstrated impressive efficiency gains for neural network workloads, architectural innovation has significantly outpaced the development of cohesive software toolchains necessary to program, optimize, and evaluate these novel architectures.
This tutorial addresses a critical gap in system-level CIM design by presenting comprehensive design methodologies and software frameworks, which bridge the divide between algorithm development and hardware implementation. Specifically, it aims to:
☆ Introduce the fundamentals of CIM design and analyze the algorithmic and architectural design space for CIM systems.
☆ Present state-of-the-art open-source frameworks that enable end-to-end design, simulation, compilation, and evaluation.
☆ Demonstrate practical workflows for algorithm mapping, performance modeling, and hardware-aware optimization.
Through detailed examination of existing design tools, intuitive examples, and hands-on demonstrations, this tutorial will offer attendees an opportunity to gain comprehensive insights into the current landscape of CIM design automation and methodologies that are essential for developing efficient AI accelerators.
(Tutorial 3) Design Automation for the Early Fault Tolerant Quantum Computing
09:00-12:00
Sleeping Beauty 5

Shigeru Yamashita (Ritsumeikan University)
He Li (Southeast University)
Zhiding Liang (CUHK)
Robert Wille (Technical University of Munich)

Abstract
As quantum computing transitions from NISQ experimentation to the early fault-tolerant (Early FTQC) era, progress hinges on cross-layer methods that can solve critical design automation challenges. Success in Early FTQC will require (i) reducing expensive non-Clifford resources (T-count/T-depth), (ii) co-designing algorithms and ansätze with problem structure and hardware constraints, and (iii) sustaining low physical error rates through scalable, hardware-aware calibration. This tutorial brings together four complementary perspectives to address these needs. We will explore T-depth-aware decomposition for MCT-intensive oracles, application-driven algorithm/ansatz co-design using contextual subspace strategies, and fine-grained, graph-parallel calibration protocols validated on real devices. Finally, we will present a unifying design-automation (QDA) view that connects today’s tools to the emerging requirements of Early FTQC. Attendees will leave with concrete techniques, open-source pointers, and evaluation checklists to apply immediately in their research and development.

Tutorial II

14:00-17:00 | Monday, January 19, 2026
(Tutorial 4) Bi-Directional Synergy: A Tutorial on Hardware Design for Agentic AI and Agentic AI for Hardware Design
14:00-17:00
Sleeping Beauty 1/2

Chaojian Li (The Hong Kong University of Science and Technology)
Zhongzhi Yu (NVIDIA Research)
Zhiyao Xie (The Hong Kong University of Science and Technology)

Abstract
Agentic AI systems, capable of reasoning, planning, and autonomous decision-making, are transforming how we design and deploy both AI algorithms and hardware systems. This tutorial focuses on the bi-directional synergy between hardware design for agentic AI and agentic AI for hardware design. We will cover three representative works: (1) ORCHES, which accelerates Large Language Model (LLM) reasoning for agentic AI using collaborative GPU–Processing-In-Memory (PIM) heterogeneous architectures, (2) Spec2RTL-Agent, an LLM-agent system that automates RTL code generation from complex design specifications, and (3) SLM-Agents, which makes the case that Small Language Models (SLMs) will be the future of agentic AI because of their efficiency and scalability. Participants will gain insights into (1) hardware challenges and opportunities in supporting reasoning-centric agentic AI applications, (2) LLM-based multi-agent workflows that can revolutionize hardware design automation, and (3) the significance of SLMs in shaping efficient and sustainable agentic AI systems. The tutorial concludes with a discussion on how hardware design and agentic AI can together drive a virtuous cycle of progress.
(Tutorial 5) APS: An MLIR-Based Hardware-Software Co-design Framework for Agile Processor Specialization
14:00-17:00
Sleeping Beauty 3

Yun (Eric) Liang (Peking University)
Youwei Xiao (Peking University)
Yuyang Zou (Peking University)

Abstract
The rapid evolution of domain-specific applications demands specialized processors with competitive performance and efficiency. While the open RISC-V instruction set architecture (ISA) simplifies the adoption of custom instruction extensions (ISAXs), the overall process of processor specialization remains challenging. It involves a complex interplay of multiple tasks, including behavioral architecture description, hardware synthesis and implementation, processor-ISAX adaptation, and compiler co-generation. Existing RISC-V ecosystems often address these challenges manually, lacking a fully automated and integrated solution. This tutorial introduces APS for agile processor specialization based on Multi-Level Intermediate Representation (MLIR). MLIR supports these diverse requirements within a unified infrastructure. APS provides a unified framework of powerful, open-source EDA tools for seamless hardware-software co-design, empowering designers to navigate the complexities of specialization with greater ease and efficiency.
(Tutorial 6) Post-Silicon Validation & Hardware Security in Modern Processors
14:00-17:00
Sleeping Beauty 5

Ravi Monani (Senior System Design Engineer, AMD; former Intel)

Abstract
Modern processors face a dual challenge: achieving peak performance while ensuring robust security and reliability. With increasing complexity in CPU/GPU/SoC architectures, post-silicon validation has become critical in detecting design flaws, mitigating microarchitectural vulnerabilities, and balancing power-performance tradeoffs. This tutorial provides a practitioner’s perspective, drawing on experiences from AMD and Intel, to bridge the gap between academic research and industrial practice. Topics will include silicon bring-up methodologies, case studies of hardware security vulnerabilities (e.g., speculative execution, side-channel attacks), debug and measurement techniques, and future challenges in secure processor design. Participants will gain insights into practical validation flows, security-hardening strategies, and opportunities for research collaboration with industry.

Opening and Keynote Session I

Opening Ceremony
08:05-08:20 | Tuesday, January 20, 2026 | Cinderella Ballroom 1/6/7/8
Keynote Addresses
08:20-09:50 | Tuesday, January 20, 2026 | Cinderella Ballroom 1/6/7/8
Chenming Hu
TSMC Distinguished Professor Emeritus
University of California, Berkeley
08:20-09:05
Keynote Address

FinFET - from Lab to Foundry to EDA/Fabless

Biography
Chenming Hu is the Emeritus TSMC Chair Professor of UC Berkeley and former CTO of TSMC. He led the creation of the BSIM standard model and the 3D transistor FinFET used in all phones, computers, data centers, and AI chips.
He received the US National Medal of Technology from President Obama and IEEE's highest honor (Medal of Honor). The EDA industry's Kaufman Award cited his “tremendous career of creativity and innovation that fueled the past four decades of the semiconductor industry, including its adoption of FinFET.”
Abstract
25 years ago, the keynote speaker of the 2001 ISSCC in San Francisco projected that processor chips would dissipate more heat per area than nuclear reactor cores and rocket engine nozzles in a decade. His projection echoed the 1996 industry consensus of an end to Moore's Law in 2007 "with no known solution" (in the ITRS - International Technology Roadmap for Semiconductors).
What was the cause of that chip heating crisis? How did FinFET prevent it from happening? How did FinFET find its way from the research laboratory to fabs, and the EDA/DAC and fabless communities? These and other FinFET stories will be told.
Yiran Chen
John Cocke Distinguished Professor
Duke University
09:05-09:50
Keynote Address

Edge AI: Everything, Everywhere, All at Once

Biography
Dr. Yiran Chen is the John Cocke Distinguished Professor of Electrical and Computer Engineering at Duke University. He serves as the Principal Investigator and Director of the NSF AI Institute for Edge Computing Leveraging Next Generation Networks (Athena) and Co-Director of the Duke Center for Computational Evolutionary Intelligence (DCEI). His research group focuses on innovations in emerging memory and storage systems, machine learning and neuromorphic computing, and edge computing. Dr. Chen has authored or coauthored over 700 publications and holds 96 U.S. patents. His work has received widespread recognition, including two Test-of-Time Awards and 14 Best Paper/Poster Awards. He is the recipient of the IEEE Circuits and Systems Society's Charles A. Desoer Technical Achievement Award and the IEEE Computer Society's Edward J. McCluskey Technical Achievement Award. He also serves as the inaugural Editor-in-Chief of the IEEE Transactions on Circuits and Systems for Artificial Intelligence (TCASAI) and the founding Chair of the IEEE Circuits and Systems Society's Machine Learning Circuits and Systems (MLCAS) Technical Committee. Dr. Chen is a Fellow of the AAAS, ACM, IEEE, and NAI, and a member of the European Academy of Sciences and Arts.
Abstract
Edge Artificial Intelligence (Edge AI) refers to systems that execute AI models directly on devices located at or near the point of data generation. Operating locally, these interconnected systems collect and process diverse forms of data, offering distinct advantages such as enhanced privacy and reduced latency. However, deploying AI models on resource-constrained platforms remains a major challenge. Such devices are limited in computing power, memory, energy, and communication capacity, creating a gap between the demands of advanced AI models and the capabilities of current hardware—ultimately hindering the widespread adoption of Edge AI systems. In this talk, we will explore algorithmic and hardware innovations that enable Edge AI to process multimodal data efficiently and effectively (Everything), operate reliably under stringent resource constraints (Everywhere), and collaborate seamlessly across heterogeneous platforms (All at Once).

Session 1A

(T4-C) AI Applications for Edge and Domain-Specific Systems
10:20-12:00 | Tuesday, January 20, 2026 | Snow White 1
Chair(s):
Sunmean Kim (Kyungpook National University)
Jeongwoo Park (Sungkyunkwan University)
1A-1
10:20-10:45

Video-based Visible-Event Cross-modal Person Re-identification for Edge AI Surveillance Systems

*Xinyun Zhang, Zixiao Wang (The Chinese University of Hong Kong), Yurui Kuang (The Chinese University of Hong Kong), Bei Yu (The Chinese University of Hong Kong)
Keywords
Edge AI, Event Camera, Person Re-identification, Deep Learning
Abstract
Video-based cross-modal person re-identification (ReID) is a critical task for video surveillance and security systems, particularly in resource-constrained edge AI environments. While existing cross-modal ReID methods primarily focus on thermal-visible matching, event cameras, with their low power consumption, high temporal resolution, and sparse data representation, offer significant advantages for edge-based surveillance systems by reducing data processing overhead and enabling robust performance under challenging lighting conditions. In this paper, we introduce a novel task: video-based visible-event person re-identification (VE ReID), which aims to match identities across RGB and event camera modalities. To the best of our knowledge, this is the first work to systematically define and investigate this cross-modal task in the context of event-driven edge AI. Specifically, we curate evaluation benchmarks from existing RGB-event datasets and synthesize a new RGB-event dataset, explicitly adapting them to the cross-modal ReID setting to enable a more comprehensive evaluation of VE ReID. Extensive experiments reveal that existing cross-modal state-of-the-art (SOTA) methods fail to effectively address the unique challenges posed by event data, highlighting the importance of tailored solutions for this task. To this end, we propose a novel method that constructs auxiliary modalities using frequency information from RGB and event tracklets, aligning them effectively through a fine-grained metric learning loss. Our approach not only achieves significant accuracy improvements over existing methods but also demonstrates the potential of event cameras for efficient and scalable edge AI surveillance applications. All codes and benchmarks will be made publicly available.
1A-2
10:45-11:10

REDM: Regression-Guided Diffusion Modeling for Universal Soft Sensor Enhancement in Semiconductor Process Control

*Weiping Xie, Yumeng Shi, Pang Guo, Yining Chen (Zhejiang University)
Keywords
Soft sensor technology, Diffusion model, Sample selection, Data generation
Abstract
In semiconductor manufacturing, soft sensors play a key role in Advanced Process Control (APC) by enabling real-time wafer-to-wafer monitoring. However, their performance is often limited by sparse labeled data, process variability, and model-specific tuning. To address these challenges, we propose REDM: a Regression-Guided Diffusion Modeling framework designed to boost the accuracy and robustness of soft sensor prediction across diverse fabrication stages. REDM generates high-fidelity virtual data guided by predictive regression objectives and incorporates a quality-aware filtering mechanism based on Sliced Wasserstein Distance and intra-subset Cosine Similarity. Through multi-objective selection techniques, REDM identifies informative virtual samples that balance distributional similarity and internal diversity, thereby enhancing downstream model training. We evaluate REDM on real-world datasets from three major semiconductor process stages: Chemical Vapor Deposition (CVD), Etching, and Chemical Mechanical Polishing (CMP). Across various regression models, REDM consistently enhances soft sensor performance, with an average R^2 improvement of 3.27%. Its independence from process-specific customization makes REDM a scalable and process-aware solution for soft sensor enhancement in smart manufacturing.
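For readers unfamiliar with the selection criteria named above, the sketch below illustrates, under our own assumptions rather than the authors' code, how a Sliced Wasserstein Distance term and an intra-subset cosine-similarity term can be combined to score candidate virtual-sample subsets; all function names and weightings are hypothetical.

    # Hypothetical illustration of quality-aware virtual-sample filtering:
    # score a candidate subset by distributional similarity to real data
    # (Sliced Wasserstein Distance) plus internal redundancy (mean cosine
    # similarity), then keep the lowest-scoring subset.
    import numpy as np

    def sliced_wasserstein(real, virtual, n_proj=64, seed=0):
        rng = np.random.default_rng(seed)
        dirs = rng.normal(size=(n_proj, real.shape[1]))
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
        q = np.linspace(0.0, 1.0, 100)
        dists = [np.mean(np.abs(np.quantile(real @ u, q) - np.quantile(virtual @ u, q)))
                 for u in dirs]
        return float(np.mean(dists))

    def mean_cosine_similarity(subset):
        x = subset / (np.linalg.norm(subset, axis=1, keepdims=True) + 1e-12)
        sim = x @ x.T
        n = len(subset)
        return float((sim.sum() - n) / (n * (n - 1)))

    def select_subset(real, virtual, k, n_trials=200, alpha=1.0, beta=1.0, seed=0):
        rng = np.random.default_rng(seed)
        best_idx, best_score = None, np.inf
        for _ in range(n_trials):
            idx = rng.choice(len(virtual), size=k, replace=False)
            score = (alpha * sliced_wasserstein(real, virtual[idx]) +
                     beta * mean_cosine_similarity(virtual[idx]))
            if score < best_score:
                best_idx, best_score = idx, score
        return best_idx  # indices of informative virtual samples to add to training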
1A-3
11:10-11:35

Benchmarking Continual Learning on Netlists with Circuit-Targeted Graph Neural Networks

*Rupesh Raj Karn, Johann Knechtel, Ozgur Sinanoglu (New York University Abu Dhabi)
Keywords
Continual Learning, Circuit Netlist, Lifelong Learning, Graph Convolutional Network (GCN), Catastrophic Forgetting, ISCAS85, EPFL
Abstract
The rapid evolution of integrated circuits demands that machine learning (ML) for electronic design automation (EDA) adapts to new circuit semantics without catastrophic forgetting (CF) of prior knowledge—a challenge unaddressed by commonly established, static training paradigms. Continual learning (CL) offers a promising approach, but its application to evolving netlists remains unexplored. Here, we present the first benchmarking study of CL on netlists with circuit-targeted graph neural networks (GNNs). We evaluate six CL methods, including spanning parameter regularization, replay-based, and hybrid approaches, all for a fixed GNN architecture for fair comparison. Our benchmarking covers three distinct GNN problems commonly used in EDA: gate-level node classification, connectivity-based link prediction, and structural graph classification. We find that replay-based CL techniques are particularly suitable for hindering CF in such circuit-targeted GNN applications. This work paves the way for future adaptive EDA tools for emerging design landscapes.
1A-4
11:35-12:00

LiveHPS-Lite: A Lightweight LiDAR-based Motion Capture System for Edge Applications

*Yiren Zhu, Junsheng Zhou, Yiming Ren, Hanshu Hezi, Yuexin Ma, Xin Lou (ShanghaiTech University)
Keywords
Human motion capture, LiDAR, Lightweight, minGRU, Edge Devices
Abstract
Recent advances in LiDAR-based 3D human motion capture have demonstrated significant potential for large-scale applications in unconstrained environments. However, achieving real-time performance remains challenging, particularly under the computational constraints of edge devices where deploying large deep learning models is often impractical. To address these limitations, we propose LiveHPS-Lite, a lightweight single-LiDAR-based human motion capture system, offering enhanced computational efficiency with competitive performance. In particular, we introduce a novel architecture by streamlining backbone components across all processing stages in the LiveHPS++ framework and replacing inconsistent sequential modules with parallelizable minGRUs. We implement the proposed architecture on NVIDIA Jetson Xavier NX with TensorRT acceleration, achieving real-time performance on the edge. Comprehensive evaluations on benchmark datasets show that LiveHPS-Lite achieves comparable or superior accuracy while significantly reducing computational complexity. Experimental results demonstrate that LiveHPS-Lite achieves up to 6.71x faster inference speeds compared to existing solutions, delivering real-time performance even on a computationally limited edge device. This work contributes a practical solution for deploying high-performance 3D human pose estimation models in real-world applications.

Session 1B

(SS-4) Toward Fully Automated DTCO: ML Frameworks across Technology, Cell, and Library Layers
10:20-11:35 | Tuesday, January 20, 2026 | Snow White 2
Chair(s):
Taewhan Kim (Seoul National University)
1B-1
10:20-10:45

ML-driven Design Technology Co-Optimization Framework for Advanced Technology Nodes

Hyunbae Seo, Handong Cho, Sehyeon Chung (Seoul National University), Kyumyung Choi (Sungkyunkwan University), *Taewhan Kim (Seoul National University)
Keywords
Design and technology co-optimization, standard cells, machine learning, physical design
Abstract
The goal of design and technology co-optimization (DTCO) is to find a combination of parameter options (i.e., parameter setting values) of a target process technology that produces a target design implementation with optimal PPA (performance, power, area). Since the number of parameters increases sharply as technology scales, much attention has recently been paid to automating this DTCO process in both semiconductor foundries and the academic research community. This paper addresses the problem of full DTCO automation that deals with analyzing the numerous parameter options at advanced technology nodes. Precisely, we develop a machine learning (ML) based DTCO automation framework, which supports three key features: (1) an effective analysis of the changes of DTCO parameter options within an acceptable runtime; (2) a full exploration of chip/block-level PPA metrics through automatic standard cell (SC) library generation, for which we develop a new technique that accelerates the iterative physical design process; (3) support for both Complementary FET (CFET) based SCs and multi-row-height SCs to account for future generations of technology. Through experiments with benchmark circuits, it is shown that our DTCO automation framework is able to accurately predict the direction and magnitude of PPA changes of target designs with 5x sampling efficiency. In addition, it is shown that our SC layout generator supporting CFET and multi-row-height SCs provides a timely DTCO process relevant to ongoing technology advancements.
1B-2
10:45-11:10

Standard Cell Layout Generation: Methodological Evolution and Architectural Impacts

Junghyun Yoon, Ikkyum Kim, Gyumin Kim, Sojung Park, *Heechun Park (Ulsan National Institute of Science and Technology)
Keywords
standard cell layout
Abstract
Over the past decade and beyond, standard cell layout generation has been a cornerstone of digital integrated circuit design, evolving from manual design practices to advanced AI-assisted methodologies. This survey systematically reviews the evolution of standard cell layout generation across two major axes: methodological advances and changes in transistor and cell architecture. First, we trace the methodological progression from manual design practices and early design-rule-based approaches to heuristic algorithms, exact optimization methods, and the most recent AI-driven paradigms. Then, we examine the evolution of transistor and cell structures: the transistor from planar CMOS to the emerging vertical devices (CFETs and Flip-FETs), and the cell architecture from multi-row structures to the integration of buried power rails (BPR) and backside metal layers. Finally, we highlight future research directions, including transistor-cell-chip co-design methodologies and optimization techniques that leverage the unique electrical characteristics of emerging CMOS devices. This comprehensive survey aims to provide insights into current state-of-the-art techniques while highlighting promising avenues for future innovation in standard cell layout generation for next-generation technologies.
1B-3
11:10-11:35

Fast Timing Library Characterization Through Selective Use of Regression Models

Manikanta Prahlad Manda (Sejong University), Seunggyu Lee (Korea Advanced Institute of Science and Technology), *Daijoon Hyun (Sejong University)
Keywords
Library characterization, regression model, table entry, timing parameter
Abstract
Timing behavior of standard cells is represented as two-dimensional tables in a timing library, where each table entry is obtained through transistor-level simulation. As technology scales, the number of design corners and standard cells has increased dramatically, leading to a substantial increase in simulation time for timing characterization. This may delay the design schedule or impose additional demands on tool licenses. To address this challenge, we propose a fast timing characterization method that selectively uses transistor-level simulation and model-based prediction. In this method, a subset of table entries is obtained through simulation, while the remaining entries are predicted by regression models trained on the simulated data. Multiple regression models are employed to capture the diverse characteristics of each entry location, and the most accurate model for each entry is identified at one corner, called an anchor corner. The selected models are then used to predict the corresponding entry at target corners. Experimental results show that the proposed method achieves high accuracy with a 40% reduction in runtime; the mean and 3-sigma absolute errors are 0.4% and 2.3%, respectively, representing a significant improvement over conventional methods. The accuracy of the proposed method is further validated on 7-nm technology libraries.
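As a rough illustration of the entry-wise model-selection idea described above (a simplified sketch under our own assumptions, not the authors' implementation), the snippet below fits several candidate regressors on the simulated subset of one table, identifies the best model per entry location at the anchor corner, and reuses those choices at a target corner; the data layout and model choices are assumptions.

    # Hypothetical sketch: per-entry regression-model selection at an anchor
    # corner, reused to predict non-simulated entries at target corners.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.svm import SVR
    from sklearn.ensemble import GradientBoostingRegressor

    CANDIDATES = (LinearRegression, SVR, GradientBoostingRegressor)

    def fit_candidates(sim_xy, sim_delay):
        # sim_xy: (n_sim, 2) simulated (input slew, output load) points of one table
        return [cls().fit(sim_xy, sim_delay) for cls in CANDIDATES]

    def select_models_at_anchor(models, rest_xy, rest_delay_truth):
        # At the anchor corner the remaining entries are simulated once as well,
        # so the most accurate candidate can be identified per entry location.
        preds = np.stack([m.predict(rest_xy) for m in models])   # (n_models, n_rest)
        return np.abs(preds - rest_delay_truth[None, :]).argmin(axis=0)

    def predict_at_target(sim_xy, sim_delay, rest_xy, winners):
        # Re-fit all candidates on the target corner's simulated subset, then
        # take, for each entry, the prediction of its pre-selected model type.
        preds = np.stack([m.predict(rest_xy) for m in fit_candidates(sim_xy, sim_delay)])
        return preds[winners, np.arange(len(winners))]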

Session 1C

(T12-D) Physical Attacks and Countermeasures
10:20-12:00 | Tuesday, January 20, 2026 | Snow White 3
Chair(s):
Qiang Liu (Tianjin University)
Jiaji He (Tianjin University)
1C-1
10:20-10:45

HyFault: Targeted Fault Injection Attacks on Hyperdimensional Computing Accelerators

*Brojogopal Sapui, Mehdi Tahoori (Karlsruhe Institute of Technology)
Keywords
Fault attack, Profiling, AI Accelerators, HDC, countermeasure
Abstract
Edge AI accelerators are a critical building block of numerous AI-driven applications deployed in resource-constrained environments such as IoT, automotive systems, and wearable devices. Hyperdimensional Computing (HDC) has recently emerged as a promising lightweight AI model for these edge scenarios, offering efficiency, simplicity, and inherent robustness against random computational faults. However, despite its advantages, the security implications of deploying HDC accelerators, particularly their resilience against targeted fault injection attacks, remain insufficiently explored. Such attacks pose tangible security risks, including intentional misclassification leading to denial-of-service or reliability degradation in critical decision-making systems. In this work, we precisely attack FPGA-based HDC accelerators using profiling and advanced voltage-level fault injection methods to evaluate their vulnerability. Our experiments reveal significant susceptibility during the critical similarity computation phase of the HDC inference pipeline, achieving targeted misclassification rates of up to ≈89% in BRAM-based implementations. To address these security vulnerabilities, we propose dual XOR masking and query hypervector randomization as practical, hardware-friendly countermeasures. Extensive real-hardware evaluations confirm that these defenses substantially reduce misclassification rates to ≈2%, significantly enhancing the security and reliability of edge-deployed HDC accelerators.
1C-2
10:45-11:10

PIR-Cache: Mitigating Conflict-Based Cache Side-Channel Attacks via Partial Indirect Replacement

*Hao Ma, Zhidong Wang, Wei Song (Institute of Information Engineering, Chinese Academy of Sciences)
Keywords
micro architecture, conflict-based cache side-channel attacks, cache randomization, eviction set searching algorithms, partial indirect replacement
Abstract
Conflict-based side-channel attacks allow attackers to monitor victims' access patterns by asserting malicious cache conflicts. While cache randomization has emerged as a potential defense, existing solutions face critical limitations. CEASER-S and DT4+EV10 fail to fully prevent existing eviction set searching algorithms. MIRAGE suffers from intolerable area and power overheads. Chameleon's relocation mechanism faces the problem of excessive power/energy consumption. To alleviate these limitations, we employ a randomized skewed set-associative cache with partial indirect replacement (PIR-Cache). Our approach effectively mitigates conflict-based side-channel attacks while incurring negligible runtime performance impact with moderate area and power overhead.
1C-3
11:10-11:35

An Efficient Defense Method Based on Progressive Fault-Aware Training and JS Divergence-Guided TMR for DNNs against Bit-Flip Attacks

*Huarun Zhou, Ran Dong, Zhaohui Guo, Qiang Liu (Tianjin University)
Keywords
Security for Deep Neural Networks, Progressive Fault-Aware Training, Bit-flip Attacks
Abstract
Deep neural networks (DNNs) have been increasingly deployed on edge devices, enabling edge-intelligent applications. However, this introduces significant security vulnerabilities, especially the degradation of network performance caused by external attacks such as bit-flip attacks (BFAs). Traditional defense methods based on spatial redundancy, such as triple modular redundancy (TMR), consume a significant amount of hardware resources, and existing fault-aware training (FAT) methods do not significantly improve robustness. To address these issues, we propose an efficient defense method, which combines fault-aware training and spatial redundancy, against BFAs for DNNs. Specifically, a progressive fault-aware training method is proposed to enhance the inherent robustness of DNN models. Subsequently, a JS divergence-guided TMR approach is developed, which identifies a small number of critical weights in the trained model that significantly impact model accuracy by JS divergence analysis and applies TMR only to the critical weights, to further enhance the model's robustness. The experimental results obtained on the VGG-13/ResNet-20/ResNet-34 models and the CIFAR-10 dataset show that compared with the FAT-type methods, our proposed method improves the robustness of the models by up to 5.6x; compared with the spatial redundancy methods, our method improves the robustness by 1.9x and reduces memory storage overhead by 23% on the experimental platform.
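The sketch below gives a rough, hypothetical picture of the JS divergence-guided selection step described above (not the authors' code): each weight is scored by the divergence its bit-flip induces on the model's output distribution over a calibration set, and only the top-ranked weights receive triplicated storage with majority voting.

    # Hypothetical sketch of JS-divergence-guided selective TMR.
    import numpy as np
    from scipy.spatial.distance import jensenshannon

    def criticality_scores(forward, weights, calib_inputs, flip_fn):
        # forward(weights, inputs) -> class-probability matrix; flip_fn flips one
        # stored bit of a weight. Both are stand-ins for the real model/encoding.
        base = forward(weights, calib_inputs)
        scores = np.zeros(len(weights))
        for k in range(len(weights)):
            flipped = weights.copy()
            flipped[k] = flip_fn(weights[k])
            pert = forward(flipped, calib_inputs)
            scores[k] = np.mean([jensenshannon(p, q) ** 2 for p, q in zip(base, pert)])
        return scores

    def protect_critical(weights, scores, fraction=0.01):
        # Triplicate only the most critical weights; the rest stay unprotected.
        k = max(1, int(fraction * len(weights)))
        critical = np.argsort(scores)[-k:]
        return {int(i): (weights[i], weights[i], weights[i]) for i in critical}

    def tmr_read(copy_a, copy_b, copy_c):
        # Bitwise majority vote over the three stored integer copies.
        return (copy_a & copy_b) | (copy_a & copy_c) | (copy_b & copy_c)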
1C-4
11:35-12:00

X-Matrix Shield: Defeating Tilted FIB and Rerouting Attacks through 3D-Interlaced Protection

Jiaji He, *Junfeng Cai (Tianjin University), Yaohua Wang (National University of Defense Technology), Fei Zhao (Peking University), Mao Ye (Tianjin University), Yongqiang Lyu (Tsinghua University)
Keywords
Integrated Circuits Security, Active Shield, Dual-Layer Structure, Direction Entropy, Artificial Fish-Swarm Algorithm
Abstract
As focused ion beam (FIB) technology advances and invasive attack methods evolve, tilted FIB and rerouting attacks pose serious threats to chip security. Existing single-layer active shields exhibit inherent limitations in defending against these advanced invasive attacks. This paper presents the X-Matrix Shield, a dual-layer structure that fortifies integrated circuits against sophisticated physical tampering. This research establishes a novel information theory-based evaluation framework with quantifiable direction entropy metrics. Additionally, the work develops an improved artificial fish-swarm algorithm (D-AFSA) to generate optimized 3D-interlaced protection paths. X-Matrix Shield can be directly integrated into the standard EDA layout flow, enabling security-aware physical design co-optimization. When implemented in 55 nm standard process technology, the X-Matrix Shield achieves a quality metric of 1.99. GDS3D simulations of protected AES modules further confirm the shield's effectiveness against advanced physical attacks compared to single-layer protection.

Session 1D

(SS-2) Design Automation for Quantum Error Correction: From Algorithms to Architectures
10:20-12:00 | Tuesday, January 20, 2026 | Sleeping Beauty 1/2
Chair(s):
Zhiyao Xie (The Hong Kong University of Science and Technology)
1D-1
10:20-10:45

Fault-tolerant State Preparation for Quantum Error Correction Codes: Leveraging Design Automation

*Robert Wille (TUM) with contributions from Lucas Berent (TUM), Markus Müller (Forschungszentrum Jülich), Tom Peham (TUM), Ludwig Schmid (TUM), Erik Weilandt (TUM)
Keywords
TBA
Abstract
State preparation is a foundational element of QEC, enabling the initialization of logical qubits in protected subspaces. This task is often performed manually and is prone to inefficiency, particularly as system sizes grow. This talk presents design automation techniques to synthesize fault-tolerant state preparation circuits for large Calderbank-Shor-Steane (CSS) codes. These techniques are implemented in the open-source Munich Quantum Toolkit (MQT-QECC) and allow for the automated generation of optimized circuits that are infeasible to construct by hand. This contribution highlights the vital role of CAD tools in expanding the scalability and reliability of quantum fault-tolerant workflows. Open source implementations of the presented methods are available at https://github.com/munich-quantum-toolkit/qecc.
1D-2
10:45-11:10

Hardware-Efficient Union-Find Decoder Towards Scalable Topological Quantum Codes

Shuang Liang, Jubo Xu, Yuncheng Lu, Hao (Mark) Chen (Imperial College London), Bo Yuan (Rutgers University), *Hongxiang Fan (Imperial College London)
Keywords
Quantum Error Correction, Union-Find, Hardware Design, QEC Decoder, Surface Code
Abstract
Quantum Error Correction (QEC) is essential for realizing large-scale, fault-tolerant quantum computing. Among QEC codes, topological quantum codes, which encode logical qubits in a lattice of physical qubits, have attracted considerable attention in both academia and industry. A key challenge, however, lies in designing decoders that meet the stringent latency and hardware efficiency requirements of practical quantum systems. Among mainstream approaches, the Union-Find (UF) decoder offers exceptionally low latency, but existing implementations often suffer from substantial hardware inefficiency. To address this limitation, we focus on developing a scalable UF decoder through general-purpose hardware architecture optimization. Experimental results demonstrate that our optimized UF decoder achieves good scalability in hardware overhead. Future directions include exploring advanced memory technologies and scheduling strategies to further improve memory efficiency in UF-based decoders, co-designing decoding algorithms and custom hardware to achieve better accuracy-latency tradeoffs, and optimizing QEC solutions for large-scale distributed quantum computing systems.
1D-3
11:10-11:35

Reinforcement Learning for Enhanced Advanced QEC Architectures Decoding

Yidong Zhou (Rensselaer Polytechnic Institute), Lingyi Kong (The Chinese University of Hong Kong), Yifeng Peng (Stevens Institute of Technology), *Zhiding Liang (The Chinese University of Hong Kong)
Keywords
quantum computing, quantum error correction, reinforcement learning, quantum low-density parity check codes
Abstract
The advent of promising quantum error correction (QEC) codes with efficient resource utilization and high-performance fault-tolerant quantum memories signifies a critical step towards realizing practical quantum computation. While surface codes have been a dominant approach, their limitations have spurred the development of more advanced QEC architectures. These advanced codes often present increased complexity, demanding innovative decoding methodologies. This work investigates the application of reinforcement learning (RL) techniques, including hybrid and multi-agent approaches, to enhance the decoding of various advanced QEC architectures. By leveraging the ability of RL to learn optimal strategies from noisy syndrome measurements, we explore the potential for achieving improved logical error rates and scalability compared to traditional decoding methods. Our approach examines the adaptation of reinforcement learning to exploit the structural properties of these modern QEC models. We also explore the benefits of combining different RL algorithms to address the multifaceted nature of the decoding problem, considering factors such as code degeneracy and real-world noise characteristics. With our proposed method, we are able to demonstrate that an autonomously trained agent can derive decoding schemes for the complex decoding requirement of advanced QEC architectures.
1D-4
11:35-12:00

Quantum Instruction Set Architecture: The Good, the Bad, and the Future

*Jianxin Chen (Tsinghua University)
Keywords
quantum instruction set, quantum error correction, fault-tolerant quantum computing, quantum design automation
Abstract
This presentation provides an overview of my team’s recent work, spanning the design of a quantum instruction set and its implications for system performance, quantum error correction, and chip architecture.
Since Shor’s algorithm demonstrated quantum computing’s potential for exponential speedups in solving critical problems like integer factorization, the field has drawn sustained, intense interest from both academia and industry. From fundamental physics experiments in laboratories of the 1990s—where only a handful of qubits could be manipulated—to today’s ability to precisely control hundreds of qubits for preliminary computing trials, humanity stands at a point analogous to that of our ancestors when they first mastered fire: we are now learning to harness the revolutionary power of qubit manipulation. In this talk, I will focus on key advancements in the design principles and engineering implementation of quantum computing instruction set architectures, outline the major challenges currently faced, and discuss directions for future development.
The systematic exploration of non-conventional quantum instruction sets commenced only a few years ago. Our recent work demonstrates that the underexplored √iSWAP gate offers significant advantages in both expressivity and fidelity—challenging the long-held assumption of a trade-off between these two properties. Partially inspired by parallel efforts to explore alternative two-qubit instructions, we introduced a unified control scheme capable of implementing arbitrary two-qubit gates efficiently. By tuning physical parameters such as pulse envelope amplitudes and frequency detuning, this approach enables, for the first time, direct and flexible realization of any two-qubit unitary operation. The so-called AshN scheme was subsequently validated through a series of experiments on superconducting quantum processors, confirming its feasibility.
This line of research, while striving to achieve a better balance between expressivity and accuracy, may nonetheless introduce unforeseen drawbacks—particularly concerning compatibility with established techniques. For instance, the widely used virtual-Z technique can fail when applied to two-qubit gates that do not preserve phase (i.e., those that are not phase carriers). To address this limitation, we propose a compilation scheme for arbitrary single-qubit gates on superconducting processors. The method leverages tunable phase shifts of microwave pulses to realize a continuous gate set, is compatible with any two-qubit gate, and requires calibration of only the X(π) and X(π/2) pulses.
With all the aforementioned advances integrated, it is unsurprising that quantum algorithms can now be implemented far more effectively than with conventional synthesis into CNOT and single-qubit gates. Notably—and somewhat surprisingly—early work in this direction not only demonstrates significant performance gains but also shows promise in mitigating the challenges posed by limited qubit connectivity, a key limitation of superconducting platforms compared to trapped ions or neutral atoms.
However, in the regime of quantum error correction—where stabilizer operations and Clifford gates are of primary interest—it remains unclear how much benefit the aforementioned non-conventional instruction sets will offer. By leveraging both CNOT and iSWAP instructions, our approach mitigates the impact of ancilla qubit defects during surface code stabilizer measurements, thereby enhancing the robustness and reliability of quantum computation. Moreover, similar techniques can be extended to quantum low-density parity-check (qLDPC) codes, offering the advantage of halving the number of required long-range interactions.
Much like early fire-builders learned to shape flame into tools, we are now shaping quantum interactions into reliable, programmable computation—turning raw physical potential into engineered reality. And just as the spark was essential to kindling fire, the quantum instruction set serves as the ignition point in this transformation—deserving far greater attention and dedicated research.

Session 1E

(T11-A) Ensuring High Quality Designs through Simulation and Verification Advances
10:20-11:35 | Tuesday, January 20, 2026 | Sleeping Beauty 3
Chair(s):
Yutaka Masuda (Nagoya University)
Senling Wang (Ehime University)
1E-1
10:20-10:45

Old School Never Die: A Classic Yet Novel Algorithm for Computing RC Current Response in VLSI

*Zongfeng Ma, Zhong Guan (Sun Yat-sen University)
Keywords
Signal line, RC network, Current response, Time domain model, Effective capacitance
Abstract
Accurate computation of signal line current response is paramount for timing, power, signal integrity, and electromigration analyses. The intricate interplay between nonlinear transistor characteristics and parasitic interconnect effects poses significant computational challenges, creating bottlenecks for rapid and precise waveform evaluation. Departing from emerging neural network approaches requiring extensive training datasets and complex models, this work revisits classical circuit principles to propose an “old school yet novel” algorithm that efficiently computes current waveforms without SPICE simulation. Our method achieves exceptional accuracy, with merely 1% deviation in key metrics versus SPICE references, while delivering approximately 100x speedup. Rigorously validated at the 7nm FinFET technology node, the algorithm has been integrated into our commercial EDA tool’s latest trial build. Comparative evaluation against state-of-the-art industrial solutions demonstrates superior accuracy and substantially reduced runtime, reaffirming the practical advantages and relevance of traditional methodologies amidst the rising tide of AI-driven modeling solutions.
1E-2
10:45-11:10

TargetFuzz: Enabling Directed Graybox Fuzzing via SAT-Guided Seed Generation

*Raghul Saravanan, Sai Manoj Pudukotai Dinakarrao (George Mason University)
Keywords
Fuzzing, Coverage-Guided-Fuzzing, Directed Fuzzing
Abstract
The ever-increasing complexity of design specifications for processors and intellectual property (IP) presents a formidable challenge for early bug detection in the modern IC design cycle. The recent advancements in hardware fuzzing have proven effective in the design verification of complex hardware designs. The modern IC design flow involves incremental updates and modifications to the hardware designs, necessitating rigorous verification and extending the overall verification period. A major challenge lies in generating high-quality seeds that maximize coverage and verification efficiency. While Coverage-Guided Fuzzing (CGF) enhances overall exploration, it lacks precision when targeting specific sites. DirectFuzz addresses this with directed test generation but suffers from key limitations, including limited HDL support, abstraction mismatches, and poor scalability for large target regions. In this work, to overcome these challenges, we propose TargetFuzz, a Directed Graybox Fuzzing (DGF) framework that integrates SAT (Boolean satisfiability) engines for precise and scalable seed generation. Our experimental results demonstrate that TargetFuzz scales to 30x more target sites while achieving 100% state coverage, reaches site coverage 1.5x faster, and delivers a 90x improvement in target state coverage compared to Coverage-Guided Fuzzing, demonstrating its potential to advance the state of the art in directed hardware fuzzing.
1E-3
11:10-11:35

VeriRAG: A Knowledge Graph-Augmented RAG for Verilog and Assertion Generation

Jayanth Thangellamudi, Raghul Saravanan, *Sai Manoj Pudukotai Dinakarrao (George Mason University)
Keywords
Hardware Description Language (HDL), Electronic Design Automation (EDA), Large Language Model (LLM), Retrieval-Augmented Generation (RAG), Verilog, SystemVerilog Assertions
Abstract
The adoption of Large Language Models (LLMs) in Electronic Design Automation (EDA) has demonstrated significant potential for automating HDL generation and verification; however, conventional prompt-based or fine-tuned approaches often fail to produce structurally consistent RTL and meaningful assertions for complex designs. We present VeriRAG, a hybrid retrieval-augmented generation framework that combines hardware-specific knowledge graphs with semantic vector embeddings. This hybrid retrieval strategy provides both symbolic structural context and semantic content, enabling the LLM to generate synthesizable Verilog and valid SystemVerilog Assertions (SVAs) without relying on rigid manual intervention or costly retraining. Experimental results across a diverse set of representative designs show that VeriRAG achieves up to 97% syntax correctness and 100% functional success for RTL generation, with SVAs reaching 100% syntax validity and 95% Formal Property Verification (FPV) pass rates using standard EDA tools. These results highlight the potential of combining symbolic knowledge graphs with retrieval-augmented generation for scalable, verifiable hardware design workflows.

Session 1F

(T7-A) Efficient Design of Spiking Neural Network Accelerators
10:20-12:00 | Tuesday, January 20, 2026 | Sleeping Beauty 5
Chair(s):
Can Li (Hong Kong University)
Tingting Zhang (McGill University)
1F-1
10:20-10:45

Spiking-NeRF: Neural Graphics Acceleration With Spiking Feature Encoding for Edge 3D Rendering

*Jianzhen Gao, Wei Liu, Yue Liu, Hengyi Zhou, Zhiyi Yu, Shanlin Xiao (Sun Yat-sen University)
Keywords
Neural Rendering, Spiking Neural Network, Hardware Accelerator, Neural Networks
Abstract
Neural Radiance Fields (NeRF) have demonstrated remarkable potential for high-fidelity 3D scene reconstruction and rendering. However, achieving real-time performance on edge GPUs and accelerators remains a major challenge due to two critical bottlenecks: the high memory demand of multi-resolution hash encoding and the considerable computational cost of floating-point interpolation. To address these limitations, we propose Spiking-NeRF, a brain-inspired algorithm-hardware co-design framework. On the algorithm side, we introduce a spiking feature encoding scheme based on Integrate-and-Fire (IF) neurons, which transforms continuous voxel features into sparse spikes, reducing hash storage overhead by 75%. We further propose a global importance-based pruning strategy that compresses hash tables by 71.3% by removing low-accessed entries. To reduce interpolation complexity, we design a hard-threshold weight discretization method that eliminates floating-point operations in favor of bitwise logic. On the hardware side, we accelerate critical stages of the NeRF pipeline by integrating a spike-skipping mechanism that dynamically bypasses hash entries, reducing memory traffic by 32.46%. We also co-optimize on-chip storage by leveraging access locality patterns across different resolution levels of the hash structure. Experimental results demonstrate that Spiking-NeRF achieves real-time rendering performance while maintaining high visual fidelity. Compared to edge GPUs, our design improves throughput by 111.2x and reduces power consumption by 41.67x. Against SOTA NeRF accelerators, Spiking-NeRF achieves up to 2.48x higher throughput and 5.45x lower energy usage, underscoring the potential of spike-based computing for next-generation low-power neural graphics systems.
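To make the spiking feature encoding step concrete, here is a minimal Integrate-and-Fire sketch under our own assumptions (illustrative only; the threshold, step count, and reset rule are not taken from the paper): continuous voxel features are integrated over a few time steps and emitted as sparse binary spikes that can replace dense floating-point features downstream.

    # Hypothetical IF-neuron encoding of continuous voxel features into spikes.
    import numpy as np

    def if_encode(features, n_steps=4, threshold=1.0):
        # features: (n_voxels, feat_dim) array, e.g. normalized hash-table entries
        membrane = np.zeros_like(features, dtype=np.float32)
        spikes = np.zeros((n_steps,) + features.shape, dtype=np.uint8)
        for t in range(n_steps):
            membrane += features              # integrate a constant input current
            fired = membrane >= threshold
            spikes[t] = fired                 # sparse binary spike map at step t
            membrane[fired] -= threshold      # reset by subtraction
        return spikes

    # Example: with threshold 1.0 and 4 steps, a feature of 0.3 fires once,
    # while a feature of 0.9 fires three times, preserving relative magnitude.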
1F-2
10:45-11:10

FlowQ: Fixed-point Low-precision Post-Training Quantization Framework for Efficient and Accurate SNN Inference

*Faaiz Asim, Sanhtet Aung, Jongeun Lee (Ulsan National Institute of Science and Technology)
Keywords
Spiking neural networks, Brain-inspired Computing, Post training quantization
Abstract
We propose FlowQ, a post-training quantization (PTQ) framework for spiking neural networks (SNNs) that balances accuracy and hardware efficiency through quantizer design and calibration-based scale optimization. While using different scales for weights and membrane potentials preserves accuracy, it typically incurs high hardware cost. In contrast, shared scale factors reduce hardware complexity but lead to significant accuracy degradation. FlowQ bridges this gap by using a hardware-friendly quantizer whose scales differ by a power of two, allowing multiplications to be replaced with simple bit-shift operations for negligible overhead. To further improve accuracy, we present FlowTune, a calibration algorithm that iteratively optimizes FlowQ’s scale factors by minimizing mean-square error (MSE), outperforming the commonly used absolute max-based scaling in SNN PTQ. Extensive experiments on CIFAR-10, CIFAR-100, DVS-Gestures, and ImageNet demonstrate the effectiveness of our approach. For example, on CIFAR-10 with VGG-16 at 4-bit precision, FlowQ achieves only a 1.48% accuracy drop. Compared to shared scale factors, FlowQ improves accuracy by 77.4% with just 1% energy and 0.25% area overhead.
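The power-of-two scale relation mentioned above can be illustrated with a tiny numeric sketch (an assumption of how such a quantizer could look, not FlowQ itself): when the weight scale is the membrane-potential scale divided by 2^k, the rescaling multiply in the integer membrane update reduces to a right shift by k.

    # Hypothetical sketch: power-of-two-related scales turn rescaling into a shift.
    import numpy as np

    def quantize(x, scale, bits=4):
        lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        return np.clip(np.round(x / scale), lo, hi).astype(np.int32)

    s_v, k = 0.05, 2                 # membrane-potential scale and shift amount
    s_w = s_v / (2 ** k)             # weight scale, constrained to s_v / 2**k

    w_q = quantize(np.array([0.09, -0.05, 0.04]), s_w)    # integer weights
    spikes = np.array([1, 0, 1], dtype=np.int32)          # binary input spikes
    v_q = np.int64(0)                                      # integer membrane potential

    # Integer-only update: (w_q * s_w) accumulates into v_q * s_v via a shift,
    # because s_w / s_v = 2**(-k); no floating-point multiply is needed.
    v_q = v_q + (np.sum(w_q * spikes) >> k)
    print(v_q, v_q * s_v)            # quantized potential and its real-valued view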
1F-3
11:10-11:35

An Algorithm-Hardware Co-Design for Efficient and Robust Spiking Neural Networks via Sparsity

*Wei Liu, Yinsheng Chen, Jilong Luo, Yusa Wang, Zhiyi Yu, Shanlin Xiao (Sun Yat-Sen University)
Keywords
hardware acceleration, fault tolerance, neuromorphic computing, sparse coding, sparsity-aware architecture
Abstract
Spiking Neural Network (SNN) deployment is trapped by a fundamental dilemma: robust rate codes are power-intensive, while energy-efficient temporal codes lack noise resilience. This paper resolves this conflict through an algorithm-hardware co-design that strategically leverages sparsity to achieve unprecedented efficiency and fault tolerance simultaneously. Our approach pairs a novel sparse coding scheme, which encodes information into a few highly significant spikes to inherently prune redundancy and enhance noise immunity, with a purpose-built accelerator. This accelerator features a hierarchical zero-skipping architecture that dynamically eliminates redundant computations. The resulting synergy is profound. Our system slashes network spike activity by 88% compared to rate coding while demonstrating superior resilience to injected faults. When benchmarked against state-of-the-art accelerators, our design consumes 88% less energy and delivers 4.5x higher throughput than a leading rate-coding design, while using 82% fewer LUTs. Furthermore, it outperforms an advanced temporal-coding accelerator with an 89% energy reduction and a staggering 26.8x increase in throughput. This work establishes that strategically engineered sparsity is not a compromise, but a direct pathway to creating SNNs that are simultaneously efficient, robust, and primed for mission-critical edge applications.
1F-4
11:35-12:00

LOKI: a 0.266 pJ/SOP Digital SNN Accelerator with Multi-Cycle Clock-Gated SRAM in 22nm

*Rick Luiken, Lorenzo Pes, Manil Dev Gomony, Sander Stuijk (TU Eindhoven)
Keywords
Spiking Neural Networks, Neuromorphic Computing, Edge Computing
Abstract
Bio-inspired sensors like Dynamic Vision Sensors (DVS) and silicon cochleas are often combined with Spiking Neural Networks (SNNs), enabling efficient, event-driven processing similar to biological sensory systems. To meet the low-power constraints of the edge, the SNN should run on a hardware architecture that can exploit the sparse nature of the spikes. In this paper, we introduce LOKI, a digital architecture for Fully-Connected (FC) SNNs. By using Multi-Cycle Clock-Gated (MCCG) SRAMs, LOKI can operate at 0.59 V, while running at a clock frequency of 667 MHz. At full throughput, LOKI only consumes 0.266 pJ/SOP. We evaluate LOKI on both the Neuromorphic MNIST (N-MNIST) and the Keyword Spotting (KWS) tasks, achieving 98.0% accuracy at 119.8 nJ/inference and 93.0% accuracy at 546.5 nJ/inference respectively.

Luncheon Talk I

12:30-13:15 | Tuesday, January 20, 2026 | Cinderella Ballroom 1/6/7/8
Patrick Groeneveld
Senior Fellow at AMD
Adjunct Professor, Stanford University
12:30-13:15
Luncheon Talk

When Moore Surpasses Mind: The Impact of 6 decades of Relentless Design Automation

Biography
Dr. Patrick Groeneveld is Senior Fellow at AMD and adjunct lecturer in Stanford University's Department of Electrical Engineering. With an extensive career in Electronic Design Automation, he has held roles at both Cadence and Synopsys and served as Chief Technologist at Magma Design Automation, where he contributed to the development of a pioneering RTL-to-GDS2 synthesis tool. Patrick has also worked with AI hardware startups and held a Full Professorship in Electrical Engineering at Eindhoven University. He is the Finance Chair on the Executive Committee of the Design Automation Conference. Patrick earned his MSc and PhD degrees from Delft University of Technology in the Netherlands.
Abstract
After sixty years of scaling, we've crossed a symbolic threshold: a single chip now contains more transistors than the human brain has neurons. Machines built from these devices are beginning to rival—or surpass—human intelligence. This transformation forces us to revisit a question raised at the very first Design Automation Conference in 1964: how does automation reshape our work and our society? Today, that question is more urgent than ever—not only for electronic designers but for the broader world that depends on automation. Decades of progress in Electronic Design Automation made these trillion-transistor systems possible. Synthesis, placement, and routing of billions of components—while balancing cost, performance, power, and reliability—represent one of the most intricate engineering achievements in human history.

Session 2A

(T5-B) Vision and Transformer Acceleration Architectures
13:30-15:35 | Tuesday, January 20, 2026 | Snow White 1
Chair(s):
Caiwen Ding (University of Minnesota - Twin Cities)
Sungju Ryu (Sogang University)
2A-1
13:30-13:55

PipeViT: Accelerating Vision Transformers via Intra-Layer Pipelining

*Xilang Zhou, Yiheng Xu, Haodong Lu, Jun Yu, Kun Wang (Fudan University)
Keywords
Vision Transformers, FPGA, Accelerator
Abstract
Vision Transformers (ViTs) have achieved high performance across various computer vision tasks by leveraging the attention mechanism. However, the attention module in ViTs severely hinders inference performance due to its low operational intensity. Existing approaches improve ViTs efficiency through pruning, sparsity, and linearization, but at the cost of fine-tuning overhead and accuracy degradation. In this paper, we propose PipeViT, a memory-efficient and low-latency accelerator for ViTs inference. The key insight of PipeViT is to exploit intra-layer acceleration opportunities. Specifically, we first fuse the attention operations into a single operator to reduce memory access overhead. Then, we divide the input of attention into multiple tiles to reduce the on-chip memory requirement. Finally, we pipeline the tiled attention computation to improve overall throughput. Based on the optimized dataflow, we design a heterogeneous dual-core architecture for efficient pipeline execution. Furthermore, to maximize hardware utilization, the architecture can be reconfigured into a single core with higher parallelism during the execution of the feed-forward network. Experimental results show that PipeViT achieves up to 19.3x, 1.5x, 2.1x, and 2.0x improvements in Frames Per Second (FPS) compared to state-of-the-art accelerators including ViTA, Auto-ViT, ME-ViT, and HeatViT. Additionally, PipeViT achieves up to 8.0x and 2.6x higher energy efficiency compared to CPU and GPU implementations, respectively.
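For readers who want a concrete picture of tiled, fused attention in general (a generic online-softmax sketch under our own assumptions, not PipeViT's dual-core dataflow), the snippet below processes key/value tiles one at a time so that only a single tile needs to be resident on chip.

    # Generic tiled attention with online softmax; illustrative only.
    import numpy as np

    def tiled_attention(q, k, v, tile=64):
        # q, k, v: (seq_len, d) matrices for one attention head
        n, d = q.shape
        out = np.zeros((n, v.shape[1]))
        row_max = np.full(n, -np.inf)    # running max of attention scores per query
        row_sum = np.zeros(n)            # running softmax denominator per query
        for s in range(0, n, tile):
            kt, vt = k[s:s + tile], v[s:s + tile]
            scores = (q @ kt.T) / np.sqrt(d)
            new_max = np.maximum(row_max, scores.max(axis=1))
            rescale = np.exp(row_max - new_max)          # fix up earlier partial sums
            p = np.exp(scores - new_max[:, None])
            row_sum = row_sum * rescale + p.sum(axis=1)
            out = out * rescale[:, None] + p @ vt
            row_max = new_max
        return out / row_sum[:, None]

    # Matches softmax(q @ k.T / sqrt(d)) @ v while touching one k/v tile at a time.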
2A-2
13:55-14:20

ConfASR: A Conformer Block Accelerator for Speech Recognition Optimized for Edge Devices

*Malte Wabnitz, Max Nilovic, Finn Scholz, Dominik Friedrich, Christian Lanius, Jie Lou, Tobias Gemmeke (RWTH Aachen University)
Keywords
Automatic Speech Recognition, Transformer, Attention, Convolution, ASIC, Edge Devices
Abstract
Attention-based neural networks, like transformers, have significantly improved automatic speech recognition (ASR). Adding convolution operations to transformers results in conformers, which enable better learning of local dependencies and reduce word error rate. We introduce ConfASR, the first conformer block accelerator designed for efficient ASR inference on edge devices. Our system is optimized both in terms of algorithms and hardware to support all transformer operations and additional features required for conformers, including depthwise-separable convolution and learned positional encoding. We propose a hardware-friendly normalization, shared scaling factors for non-linear functions, and an efficient dataflow with a shared MAC array that keeps all activations on chip. Implemented in a 22 nm FDSOI technology, ConfASR operates at 250 MHz with a power consumption of 359 mW, and a die area of 1.19 mm^2. It performs over 900 times faster than necessary for real-time streaming requirements. This makes the architecture suitable not only for ASR but also for other transformer-based applications. ConfASR reduces latency by over 4x and power consumption by 16x during real-time use compared to previous solutions, while supporting more functionality.
2A-3
14:20-14:45

LAPA: Log-Domain Prediction-Driven Dynamic Sparsity Accelerator for Transformer Model

Huizheng Wang, *Hongbin Wang, Shaojun Wei, Yang Hu, Shouyi Yin (Tsinghua University)
Keywords
Transformers, dynamic sparsity, low complexity, attention
Abstract
Attention-based Transformers have revolutionized natural language processing (NLP) and shown strong performance in computer vision (CV) tasks. However, as the input sequence varies, the computational bottlenecks in Transformer models exhibit dynamic behavior across stages, calling for a cross-stage sparse acceleration strategy. Unfortunately, most existing sparse Transformer approaches are single-stage based, and their sparsity prediction mechanisms lead to significant power overhead when applied across multiple stages. To this end, this paper proposes a log-domain attention prediction algorithm-architecture co-design, named LAPA. First, an asymmetric leading one computing (ALOC) scheme is designed to eliminate expensive multiplications. Next, a mixed-precision multi-round shifting accumulation (MRSA) mechanism is further proposed to mitigate the accumulation overhead. A data-feature dependent filter (DFF) strategy is designed to work in concert with the MRSA process. Finally, an elaborate accelerator is designed to translate the theoretical enhancement into practical hardware improvement. Experimental results show that LAPA achieves 3.52x, 3.24x and 2.79x higher energy efficiency than the state-of-the-art (SOTA) works Spatten, Sanger and FACT, respectively.
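As a rough illustration of the log-domain idea above (leading-one based approximation of multiplications used to cheaply rank attention candidates before exact computation), the sketch below works on assumed integer activations; it does not reproduce LAPA's ALOC, MRSA, or DFF mechanisms.

```python
def leading_one(x: int) -> int:
    """Position of the most significant set bit (floor(log2 x)) for x > 0."""
    return x.bit_length() - 1

def approx_mul(a: int, b: int) -> int:
    """Replace a*b by shifting one operand to the other's leading-one position."""
    if a == 0 or b == 0:
        return 0
    return a << leading_one(b)

def approx_dot(q, k):
    # Cheap, shift-only estimate of a dot product, keeping the sign exact.
    return sum(approx_mul(abs(a), abs(b)) * (1 if (a >= 0) == (b >= 0) else -1)
               for a, b in zip(q, k))

q = [3, -7, 12, 0]
keys = [[5, 2, -1, 9], [1, 1, 1, 1], [-8, 6, 4, 2]]
# Rank keys by the cheap log-domain score; exact attention would then be spent
# only on the top candidates (here: the top-2).
ranked = sorted(range(len(keys)), key=lambda i: approx_dot(q, keys[i]), reverse=True)
print("keep for exact attention:", ranked[:2])
```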
2A-4
14:45-15:10

BitStopper: An Efficient Transformer Attention Accelerator via Stage-fusion and Early Termination

Huizheng Wang, *Hongbin Wang, Shaojun Wei, Yang Hu, Shouyi Yin (Tsinghua University)
Keywords
Transformer, attention sparsity, stage-fusion, bit-grained processing, out-of-order execution
Abstract
Attention-based large language models (LLMs) have transformed natural language processing, but the quadratic cost of self-attention imposes significant compute and memory overhead. Dynamic sparse attention mitigates this, yet its hardware efficiency is limited by the added prediction stage, coupled with costly memory accesses. To address these limitations, this paper proposes BitStopper, a fine-grained algorithm-architecture co-design tailored for sparse attention. First, a bit-serial enable stage fusion (BSF) mechanism is proposed to reuse and minimize the memory access by progressively terminating trivial tokens and merging the prediction stage into the execution stage. Second, an adaptive and lightweight max-oriented threshold selection (MOTS) strategy is developed to work in concert with the bit-wise processing. Third, a bit-level out-of-order processing (BOOP) scheme is employed to enhance hardware utilization during the bit-wise termination. Finally, an elaborate architecture is designed to translate the theoretical complexity reduction into practical performance improvement. Extensive evaluations demonstrate that, compared to state-of-the-art (SOTA) Transformer accelerators, BitStopper achieves 2.03x and 1.89x speedups over Sanger and SOFA, respectively, while delivering 2.4x and 2.1x improvements in energy efficiency.
2A-5
15:10-15:35

JaneEye: A 12-nm 2K-FPS 18.9-μJ/Frame Event-based Eye Tracking Accelerator

*Tao Han, Ang Li (Delft University of Technology), Qinyu Chen (Leiden University), Chang Gao (Delft University of Technology)
Keywords
Eye Tracking, Extended Reality, ASIC, Deep Neural Network
Abstract
Eye tracking has become a key technology for gaze-based interactions in Extended Reality (XR). However, conventional frame-based eye-tracking systems often fall short of XR's stringent requirements for high accuracy, low latency, and energy efficiency. Event cameras present a compelling alternative, offering ultra-high temporal resolution and low power consumption. In this paper, we present JaneEye, an energy-efficient event-based eye-tracking hardware accelerator designed specifically for wearable devices, leveraging sparse, high-temporal-resolution event data. We introduce an ultra-lightweight neural network architecture featuring a novel ConvJANET layer, which simplifies the traditional ConvLSTM by retaining only the forget gate, thereby halving computational complexity without sacrificing temporal modeling capability. Our proposed model achieves high accuracy with a pixel error of 2.45 on the 3ET+ dataset, using only 17.6K parameters, with up to 1250 Hz event frame rate. To further enhance hardware efficiency, we employ custom linear approximations of activation functions (hardsigmoid and hardtanh) and fixed-point quantization. Through software-hardware co-design, our 12-nm ASIC implementation operates at 400 MHz, delivering an end-to-end latency of 0.5 ms (equivalent to 2000 Frames Per Second (FPS)) at an energy efficiency of 18.9 μJ/frame. JaneEye sets a new benchmark in low-power, high-performance eye-tracking solutions suitable for integration into next-generation XR wearables.
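A minimal sketch of the forget-gate-only recurrence described above: a JANET-style cell keeps a single gate that blends the previous hidden state with a new candidate. The paper's ConvJANET layer uses convolutions for the linear maps; the per-pixel (1x1) linear maps, sizes, and random weights below are assumptions used only to keep the example short.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def janet_step(x, h, Wf, Uf, bf, Wg, Ug, bg):
    """One time step: a single forget gate f blends old state and candidate."""
    f = sigmoid(x @ Wf + h @ Uf + bf)      # forget gate
    g = np.tanh(x @ Wg + h @ Ug + bg)      # candidate state
    return f * h + (1.0 - f) * g           # no input/output gates

rng = np.random.default_rng(1)
C_in, C_hid, hw = 2, 4, 8 * 8              # channels and flattened pixels
params = [rng.standard_normal(s) * 0.1 for s in
          [(C_in, C_hid), (C_hid, C_hid), (C_hid,)] * 2]
h = np.zeros((hw, C_hid))
for _ in range(5):                         # roll the cell over 5 event frames
    x = rng.standard_normal((hw, C_in))
    h = janet_step(x, h, *params)
print(h.shape)
```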

Session 2B

CEDA-EDS Special Session: 100 Years of the FET: From Technology Foundations to the EDA Ecosystem
13:30-15:35 | Tuesday, January 20, 2026 | Snow White 2
Chair(s):
Yu [Kevin] Cao (University of Minnesota)
2B-1
13:30-13:55

FET100: Celebrating the Past and Inspiring the Future

*Bin Zhao (Jr. Past President, IEEE Electron Devices Society)
Abstract
The field-effect transistor (FET) stands as one of the most consequential inventions in the history of modern electronics. From Julius Lilienfeld’s pioneering patents filed in 1925 and 1926 to today’s highly integrated nanoscale devices, the FET has enabled decades of transformative advances in electronics, computing, and communications. At the CEDA–EDS Special Session, this talk marks the FET100 milestone—100 Years of the FET—by reflecting on the origins and evolution of FET technologies, their central role in the transition from vacuum tubes to solid-state electronics, and their foundational impact on integrated circuits. The talk highlights how continued innovations in device architectures, materials, and 3D integration increasingly demand closer interactions between device technology, circuit design, and electronic design automation (EDA). EDA plays a critical role in translating FET innovations into scalable, manufacturable systems and in bridging device technology with circuit and system design. To further advance AI, energy-efficient computing, and emerging applications, FET100 serves not only as a celebration of the past, but also as an opportunity to strengthen the connection between FET technology foundations and the future EDA ecosystem.
2B-2
13:55-14:20

The FET at 100: Old and Needing Assistance

*Greg Yeric
Abstract
This special session celebrates the 100th anniversary of J.E. Lilienfeld's patent of the Field Effect Transistor. Lilienfeld did not make a working FET, due to the limited materials quality and understanding of semiconductor physics in 1926. He did live to see the first working FET, but the success and impact of his invention 100 years on would have been unfathomable to him. In 2026, the FET is in some ways a victim of its own success. The FET is used at such a scale that it contributes to appreciable amounts of global energy demand, and it has unlocked such value in fields such as communications and artificial intelligence that there is no end in sight to a continued increase in demand for more FETs. Yet it has scaled to such extreme nanoscale dimensions that it is no longer the controlling factor in addressing energy use. This talk will address challenges for the FET going forward, the needs and opportunities to replace it with something else, and the challenges of bringing any new technology up to the task of challenging the mighty FET.
2B-3
14:20-14:45

CMOS 2.0: UnFETtering the Scaling of CMOS

*Julien Ryckaert (IMEC)
Abstract
The Field Effect Transistor has long been a cornerstone of VLSI scaling. Indeed, together with interconnect scaling, it enabled a long series of technology generations that exponentially enhanced computing performance. Starting in the late 2000s, it has been supported by Design and System-Technology Co-Optimization. This evolution has shifted the focus from pure dimensional scaling to re-engineering the device architecture, better balancing current drive and device capacitance under geometric scaling. VLSI has more recently been supported by system-technology boosters that offer efficient scaling at the level of the SoC. These innovations include Backside technology and die stacking in 3D. Moving forward, in order to keep supporting the high demand in compute scaling, we will need to double down on the heterogeneity offered by 3D and backside processing. This transition from VLSI to Heterogeneous Large Scale Integration (HLSI) will require profound readjustments of the semiconductor ecosystem. Built on a more intimate optimization between geometric scaling and the assembly of various technologies, it not only requires innovation in process and material, but more importantly in system architecture and design enablement. Indeed, it will require breaking the SoC paradigm by better capitalizing on two properties offered by 3D technologies: heterogeneity and volumetric optimization.
2B-4
14:45-15:10

Compact Modeling - A Bridge between Foundry and Circuit Design

*Yogesh Singh Chauhan (IIT, Kanpur)
Abstract
Compact Models have been the backbone of IC design. In this talk, I will discuss the history and future of compact models.
2B-5
15:10-15:35

Nanoelectronic Modeling (NEMO): From Esoteric Quantum Theory to Software that Helps Design Tomorrow's Atomic-scaled Transistors and Global Impact in nanoHUB

*Gerhard Klimeck (Purdue University)
Abstract
TBA

Session 2C

(T13-B) From HDL to Hardware: Scalable Design Automation for Quantum Computing
13:30-15:35 | Tuesday, January 20, 2026 | Snow White 3
Chair(s):
Zhiding Liang (Chinese University of Hong Kong)
Johannes Geier (Technical University of Munich)
2C-1
13:30-13:55

Scalable Optimization with GIS-PIM: A Generalized Integer-State Probabilistic Ising Machine

*Chirag Garg, Sayeef Salahuddin (University of California, Berkeley)
Keywords
Ising Machine, Integer-state optimization, Combinatorial Optimization Problems (COP), Quantum-inspired computing
Abstract
Physics-inspired hardware solvers based on Ising machines have gained significant attention for addressing combinatorial optimization problems across diverse domains, including logistics, communication networks, and financial optimization. While the problems in these domains typically involve integer states, they are often reformulated to fit the quadratic unconstrained binary optimization (QUBO) framework supported by Ising machines. This is usually achieved by introducing penalty terms, which can degrade solution quality. In this work, we present a generalized integer-state probabilistic Ising machine (GIS-PIM) that natively supports integer-state encoding using compact binary representations. By avoiding one-hot encoding and associated constraints, GIS-PIM efficiently explores the solution space with fewer spin variables. The framework is implemented on a GPU and evaluated on publicly available graph coloring benchmarks, including both synthetic and real-world datasets. It demonstrates strong scalability, solving problem instances with up to 13 colors and graph sizes reaching approximately 20,000 nodes, a level not previously reported for other Ising-based methods. Our results indicate that the proposed approach: i) achieves solution accuracies exceeding 98%, competitive with leading classical methods such as Tabucol and graph neural network-based solvers; and ii) performs significantly better than state-of-the-art simulated bifurcation (SBM) and QUBO-based probabilistic Ising machines (Qu-PIM), achieving higher success probability and faster time-to-solution.
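To make the spin-count argument concrete, the sketch below contrasts one-hot and compact binary encodings of per-node color states, using the problem size quoted in the abstract (13 colors, roughly 20,000 nodes); the encoding arithmetic is generic and is not the GIS-PIM hardware mapping.

```python
import math

def spins_one_hot(n_nodes, n_colors):
    # One spin per (node, color) pair, plus penalty terms to enforce validity.
    return n_nodes * n_colors

def spins_binary(n_nodes, n_colors):
    # ceil(log2(colors)) spins per node encode the color index directly.
    return n_nodes * math.ceil(math.log2(n_colors))

def decode_color(bits):
    """Map a node's binary spin pattern, e.g. [1, 0, 1], to an integer color."""
    return sum(b << i for i, b in enumerate(bits))

n_nodes, n_colors = 20_000, 13
print(spins_one_hot(n_nodes, n_colors))   # 260000 spins with one-hot encoding
print(spins_binary(n_nodes, n_colors))    # 80000 spins with 4 bits per node
print(decode_color([1, 0, 1]))            # color 5
```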
2C-2
13:55-14:20

A Scalable and High-Quality Qubit Mapping and Shuttling Framework for Neutral Atom Quantum Devices

*Sung-Ying Hsieh, Wai-Kei Mak (National Tsing Hua University)
Keywords
neutral atom quantum computing, quantum circuit compilation, shuttle scheduling
Abstract
Advanced neutral atom quantum systems are being developed by the industry due to their unique hardware features that enable efficient quantum circuit execution. In fact, neutral atom quantum systems are the only platforms that simultaneously support both long-range qubit interactions and native multi-qubit gates. However, the unique characteristics of neutral atom devices limit the applicability of existing qubit mapping methods developed for other quantum devices, and the state-of-the-art mapping approach [13] for neutral atom devices suffers from long runtime and low shuttle scheduling parallelism. We introduce a novel mapping and shuttling framework for neutral atom devices. We first partition the input circuit into a sequence of subcircuits, each associated with a mapping that enables the execution of all gates in the subcircuit. To find these mappings efficiently, we adopt a flexible strategy that dynamically switches between two mapping search methods. Then, for each pair of consecutive mappings, we schedule the shuttling operations required to transition between them and prioritize timing-critical operations to improve parallelism. We performed experiments on three benchmark sets, including circuits with up to 1617 qubits and more than 100,000 gates. Our proposed framework produces high-quality routing solutions and consistently achieves higher fidelity with shorter runtime compared to existing approaches.
2C-3
14:20-14:45

Quantum Oracle Synthesis from HDL Designs via Multi Level Intermediate Representation

*Giacomo Lancellotti, Filippo Buda, Giacomo Carugati, Daniele Gazzola, Alessandro Barenghi, Giovanni Agosta, Gerardo Pelosi (Politecnico di Milano)
Keywords
Quantum circuit synthesis, compiler optimizations, MLIR
Abstract
Quantum computing is increasingly recognized as a promising approach for tackling computationally intractable problems. However, achieving the scalability necessary for real-world applications requires substantial advancements in the quantum software stack. In this work, we introduce a compiler toolchain based on a Multi-Level Intermediate Representation (MLIR) that automatically synthesizes quantum circuits from Hardware Description Language (HDL) specifications of classical functions into quantum assembly languages. Many quantum algorithms rely on combinatorial circuits as subroutines, which traditionally require extensive resources in terms of quantum gates and qubits and are often manually optimized. Our toolchain integrates a sequence of optimization passes that combine classical compiler techniques with quantum-specific improvements, resulting in an average qubit reduction of 30% and an average gate-count reduction of 20% in widely adopted benchmark circuits, including those used in cryptographic applications.
2C-4
14:45-15:10

Survival of the Optimized: An Evolutionary Approach to T-depth Reduction

*Archisman Ghosh, Avimita Chatterjee, Swaroop Ghosh (The Pennsylvania State University)
Keywords
Quantum Error Correction, T-depth optimization, Surface Code, Genetic Algorithm
Abstract
Quantum Error Correction (QEC) is the cornerstone of practical Fault-Tolerant Quantum Computing (FTQC), but incurs enormous resource overheads. Circuits must decompose into Clifford+T gates, and the non-transversal T gates demand costly magic-state distillation. As circuit complexity grows, sequential T-gate layers ("T-depth") increase, amplifying the spatiotemporal overhead of QEC. Optimizing T-depth is NP-hard, and existing greedy or brute-force strategies are either inefficient or computationally prohibitive. We frame T-depth reduction as a search optimization problem and present a Genetic Algorithm (GA) framework that approximates optimal layer-merge patterns across the non-convex search space. We introduce a mathematical formulation of the circuit expansion for systematic layer reordering and a greedy initial merge-pair selection, accelerating the convergence and enhancing the solution quality. In our benchmark with ~90-100 qubits, our method reduces T-depth by 79.23% and overall T-count by 41.86%. On standard reversible circuit benchmarks, we achieve a 2.58x average improvement in T-depth over the state-of-the-art methods, demonstrating its viability for near-term FTQC.
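The toy sketch below shows the shape of a genetic-algorithm search over layer-merge patterns. It uses a deliberately simplified model in which a layer is just the set of qubits carrying T gates and adjacent layers may merge only if those sets are disjoint; the real formulation must respect gate commutation and scheduling constraints, so this illustrates the GA loop rather than the paper's method.

```python
import random

random.seed(0)
LAYERS = [frozenset(random.sample(range(12), k=random.randint(1, 4)))
          for _ in range(30)]

def t_depth(mask):
    """Decode a merge mask into a T-depth: greedily merge where allowed."""
    depth, current = 1, set(LAYERS[0])
    for want_merge, layer in zip(mask, LAYERS[1:]):
        if want_merge and current.isdisjoint(layer):
            current |= layer
        else:
            depth, current = depth + 1, set(layer)
    return depth

def evolve(pop_size=40, generations=60, p_mut=0.05):
    n = len(LAYERS) - 1
    pop = [[random.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=t_depth)
        parents = pop[: pop_size // 2]                 # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n)
            child = a[:cut] + b[cut:]                  # one-point crossover
            child = [g ^ (random.random() < p_mut) for g in child]
            children.append(child)
        pop = parents + children
    return min(t_depth(ind) for ind in pop)

print("unmerged T-depth:", len(LAYERS), "-> optimized:", evolve())
```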
2C-5
15:10-15:35

Subgraph-based Qubit Mapping for Noisy Intermediate-Scale Quantum Computing

*Junyi Gao, Wei-Hsiang Tseng, Yao-Wen Chang (National Taiwan University)
Keywords
Noisy intermediate-scale quantum (NISQ) computing, Qubit mapping, Graph matching, Graph isomorphism
Abstract
The noisy intermediate-scale quantum (NISQ) computer significantly advances quantum computing technology. Due to the physical connectivity constraints of the NISQ device, its induced qubit mapping problem becomes more challenging. Recent works employ heuristics to achieve promising outcomes. However, they are limited to using only one type of center for graph matching, and their exhaustive traversal of the coupling graph results in high computation time. This paper strategically generates specific subgraphs during the initial mapping stage to reduce the solution space for the coupling graph. Then, we employ a bidirectional graph isomorphism search to improve initial mapping. In the main mapping stage, we develop an efficient search algorithm to minimize the number of inserted gates. Experimental results show that our method significantly outperforms the state-of-the-art work in reducing the number of inserted CNOT gates by 13.22% and the runtime by 19.61%.

Session 2D

University Design Contest
13:30-15:35 | Tuesday, January 20, 2026 | Sleeping Beauty 1/2
Chair(s):
Fengbin Tu (The Hong Kong University of Science and Technology)
2D-1

A 100V 86.2% Efficiency Fibonacci-Dickson Hybrid Boost Converter for Acoustic Screen Applications

*Chen Hu, Yifan Jiang (Southern University of Science and Technology), Yan Lu (Tsinghua University), Junmin Jiang (Southern University of Science and Technology)
Keywords
DC-DC converter, boost converter, high VCR, hybrid converter, switched-capacitor converter, acoustic surface audio
Abstract
This summary presents a 100V output hybrid boost converter for an acoustic display driver. The proposed design incorporates two switched-capacitor topologies, the Fibonacci and Dickson topologies, to balance the on-chip area of power switches and the number of off-chip capacitors. The power stage utilizes laterally diffused N-type MOS (LDMOS) devices to enhance overall performance. A stacked active bootstrap circuit is employed to ensure sufficient gate driving voltage in multilevel bootstrapping, thereby preventing forward voltage drops from the passive diode. The converter achieves a voltage conversion ratio (VCR) of 20-40x, with a maximum output voltage of 100 V from an input of 2.5-5 V. It delivers a maximum output power of 2 W, peak efficiency of 86.2%, and a power density of 7.43 W/cm³.
2D-2

A 5-to-1V DLDO-Hybrid-Sigma Converter Achieving Fast Transient for High-Density Power Delivery

*Zizhe Huang, Yuxiang Li, Yuekang Guo, Jing Jin, Jianjun Zhou (State Key Laboratory of Radio Frequency Heterogeneous Integration (Shanghai Jiao Tong University)), Junmin Jiang (Southern University of Science and Technology)
Keywords
digital low dropout (DLDO) regulator, hybrid dc-dc converter, sigma converter, auxiliary loop, fast transient response, high power density
Abstract
A hybrid sigma converter integrated with a digital low dropout (DLDO) regulator is presented in this paper and designed for high-density power delivery applications. The proposed converter employs an input-series and output-parallel (ISOP) topology, combining a high-side hybrid converter for high power efficiency and a low-side DLDO for rapid transient regulation. An auxiliary control loop minimizes the DLDO’s dropout voltage across wide input and output voltage ranges, eliminating efficiency degradation under non-optimal operating conditions. The converter achieves a peak efficiency of 92%, with a system power density of 101.3W/cm³. A 1.6µs response time and 36mV voltage droop are achieved with a 2A load transient.
2D-3

Full-Stack System Design and Prototyping for Fully Programmable Electronic-Photonic Neurocomputing

*Yinyi Liu (The Hong Kong University of Science and Technology), Peiyu Chen, Bohan Hu (MICS Thrust, The Hong Kong University of Science and Technology (Guangzhou)), Wei Zhang (ECE Department, The Hong Kong University of Science and Technology), Jiang Xu (MICS Thrust, The Hong Kong University of Science and Technology (Guangzhou))
Keywords
electronic-photonic computing, programmable architecture, neuromorphic system, full-stack solution, chip prototyping, RISC-V, MLIR
Abstract
This paper presents a fully-programmable electronic-photonic computing system designed for neuromorphic applications. It integrates diverse photonic arithmetic units, capable of matrix multiplications, dot-product operations, and 1D convolutions, fabricated using the LIGENTEC 200nm silicon nitride process. It also features a customized RISC-V instruction set architecture (ISA) for cross-domain control and scheduling. Electronic logics and peripheral drivers are implemented on an FPGA attached to custom-designed PCB boards. To streamline neural network migration and deployment onto our chip, we propose an auto-compilation and optimization framework built upon Torch and Multi-Level Intermediate Representation (MLIR). This framework is compatible with the RISC-V ecosystem and our customized photonic-involved ISA. Our approach enables efficient and flexible mapping of arbitrary neural networks to electronic-photonic hardware.
2D-4

Analysis and Design of Oblong Coils and Standard-Cell-Based Receiver for Area-Efficient Edge-Coupled Inductive Coupling Transceiver

*Yuki Mitarai, Mototsugu Hamada, Atsutake Kosuge (The University of Tokyo)
Keywords
3D integration, inductive coupling, hysteresis comparator
Abstract
Proximity inductive coupling interfaces provide a low-cost, high-yield solution for 3D assembly, thanks to their compatibility with standard CMOS processes. However, they suffer from challenges related to the design complexity of both the coil and the receiver. To address these issues, this work proposes a comprehensive approach that includes an analytical coil design methodology applicable to edge-coupled configurations, an oblong coil structure to improve layout efficiency, and a standard-cell-based receiver architecture that enables simplified and scalable implementation. The proposed oblong coil achieves a 4.5 times improvement in area efficiency compared to traditional square coils, while maintaining adequate coupling strength and crosstalk tolerance, as validated through a test chip fabricated in a 40 nm CMOS process. The proposed receiver leverages bias sharing and a digitally tunable, standard-cell-based hysteresis comparator, resulting in 0.23 times the area and 0.37 times the energy consumption relative to a conventional analog comparator, as confirmed through simulations in a 16 nm FinFET process.
2D-5

A RHP-Zero-Free Hybrid Step-Up Converter With 95.1% Peak Efficiency for Fast-Transient Applications

*Junyi Ruan (The Chinese University of Hong Kong, Shenzhen), Junmin Jiang (Southern University of Science and Technology), Chenzhou Ding (The Chinese University of Hong Kong, Shenzhen), Ka Nang Leung (The Chinese University of Hong Kong), Xun Liu (The Chinese University of Hong Kong, Shenzhen)
Keywords
DC-DC, Step-up converters, fast response, RHP zero elimination
Abstract
This paper presents a hybrid step-up converter with only one inductor and one flying capacitor. The proposed converter features a left-half-plane zero rather than a right-half-plane one. With a broad bandwidth up to a tenth of the switching frequency, the converter can attain a transient response as fast as that of a buck converter. In addition, the voltage conversion ratio of the proposed converter is identical to that of the conventional boost converter, thereby enabling a wide range of output voltage in systems powered by lithium-ion batteries. Measurement results demonstrate that a peak efficiency of 95.1% is obtained. Given a load current stepping from 50 mA to 500 mA in less than 160 ns, the settling time is merely 2.8 μs.
2D-6

A Relaxation Oscillator with 2.93µJ/cycle Energy Efficiency and 0.068% Period Jitter

*Yongjuan Shi (Southern University of Science and Technology), Xun Liu (Chinese University of Hong Kong, Shenzhen), Chen Hu (Southern University of Science and Technology), Xiyuan Tang (Peking University), Junmin Jiang (Southern University of Science and Technology)
Keywords
Relaxation Oscillator, Low Jitter, Low Power, temperature coefficient (TC), Supply Variations, Phase Noise.
Abstract
This paper presents a 2MHz relaxation oscillator (ROSC) designed for ultra-low power internet-of-things (IoT) applications. Dynamic comparator with dual slope booster (DSB) is utilized to decrease the output jitter of oscillating frequency. A feedback loop with cascaded floating inverter amplifier (C-FIA) is adopted such that (1) the requirement of comparator speed is significantly alleviated and (2) the power consumption of the amplifier is further reduced. The proposed ROSC was fabricated in a 180nm CMOS process and occupies only 0.1mm2 active area. The measurement results with 8 samples show that the average power consumption is only 2.93µJ/cycle (µW/MHz) at 1V supply voltage at room temperature. The average standard variation of the period jitter is 345ps, which is as low as 0.068% of the 500ns typical oscillation period (TOSC).
2D-7

TFLOP: Towards Energy-Efficient LLM Inference: An FPGA-Affinity Accelerator with Unified LUT-based Optimization

*Zongwu Wang (Shanghai Jiao Tong University), Zhongyi Tang (Shanghai Qizhi Institute), Fangxin Liu, Chenyang Guan, Li Jiang, Haibing Guan (Shanghai Jiao Tong University)
Keywords
LLM, FPGA, Accelerator, Product Quantization
Abstract
Large Language Models (LLMs) suffer from significant performance and energy efficiency bottlenecks during the memory-bound decoding stage, where GPUs are often underutilized. We propose TFLOP, a novel CPU-FPGA heterogeneous prototype system that addresses this challenge by employing a 4-bit product quantization scheme on model weights and the KV cache. This approach decomposes GEMV operations in the decoding stage into two hardware-friendly steps: centroid reconstruction and table lookup, which are efficiently mapped onto an FPGA's heterogeneous resources. Our key innovation is a unified FPGA architecture that can handle both row- and column-wise quantization, simplifying hardware design and improving efficiency. Evaluations show that TFLOP achieves superior performance, delivering a 2.76x speedup over the NVIDIA A100 GPU on the LLaMA-2-7B model, while maintaining high accuracy and exceptional energy efficiency.
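To illustrate the product-quantization decomposition mentioned above (a GEMV becomes small centroid dot products plus per-row table lookups), here is a NumPy sketch; the group width, codebook size, and nearest-centroid assignment are illustrative assumptions, not TFLOP's exact 4-bit scheme or its FPGA mapping.

```python
import numpy as np

rng = np.random.default_rng(0)
out_dim, in_dim, sub, K = 8, 32, 4, 16          # 16 centroids per 4-wide group
n_groups = in_dim // sub

W = rng.standard_normal((out_dim, in_dim))
# Per-group codebooks and codes (a real flow would learn these, e.g. k-means).
codebooks = rng.standard_normal((n_groups, K, sub))
codes = np.stack([
    np.argmin(((W[:, g*sub:(g+1)*sub][:, None, :] - codebooks[g]) ** 2).sum(-1), axis=1)
    for g in range(n_groups)], axis=1)           # (out_dim, n_groups) indices

def pq_gemv(x):
    y = np.zeros(out_dim)
    for g in range(n_groups):
        table = codebooks[g] @ x[g*sub:(g+1)*sub]   # K small dot products per group
        y += table[codes[:, g]]                     # one table lookup per output row
    return y

x = rng.standard_normal(in_dim)
W_hat = np.concatenate([codebooks[g][codes[:, g]] for g in range(n_groups)], axis=1)
assert np.allclose(pq_gemv(x), W_hat @ x)          # matches the dequantized GEMV
```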

Session 2E

(T10-A) Reliability-Driven and Low-Power Design
13:30-15:35 | Tuesday, January 20, 2026 | Sleeping Beauty 3
Chair(s):
Bei Yu (The Chinese University of Hong Kong)
Yu-Guang Chen (National Central University)
2E-1
13:30-13:55

MF-ECC: Memory-Free Error Correction for Hyperdimensional Computing Edge Accelerators

*Mahboobe Sadeghipour Roodsari, Mahta Mayahinia, Mehdi Tahoori (Karlsruhe Institute of Technology)
Keywords
Hyperdimensional Computing, Error-Correcting Code, Memory-free, Edge devices, fault tolerant, reliability
Abstract
Brain-inspired Hyperdimensional Computing (HDC) is emerging as a compelling paradigm for learning at the edge because of its one-shot learning capability, inherent scalability, and exceptionally low computational overhead. While HDC is robust to noise, soft and hard faults in the memory components of HDC accelerators can still significantly degrade accuracy. Conventional error correction codes (ECC) are commonly used to mitigate such faults, but their associated overhead makes them impractical for resource-constrained edge devices. In this paper, we present a novel memory-free error correction technique to enhance the fault tolerance of HDC systems without requiring any dedicated memory to store check-bits.
2E-2
13:55-14:20

Thermo-NAS: Thermal-resilient ultralow-cost IGZO-based Flexible Neuromorphic Circuits

*Priyanjana Pal, Tara Gheshlaghi (Karlsruhe Institute of Technology, Germany), Suman Balaji, Emre Ozer (Pragmatic Semiconductor), Mehdi B. Tahoori (Karlsruhe Institute of Technology, Germany)
Keywords
Thermal-resilient design, Temperature modeling, Neuromorphic Computing, Flexible Electronics, Neural Architecture Search, Activation Function, Neural Networks
Abstract
The demand for next-generation flexible electronics (FE) is rapidly increasing, especially in cost-sensitive consumer markets such as smart packaging, smart bandages, drug delivery systems, RFID tags, and wearable devices. Traditional silicon-based electronics, constrained by high manufacturing costs and rigid form-factor, are inadequate for these emerging applications. However, the lack of rigid packaging in FE, combined with their complex and variable operating conditions, makes them more susceptible to thermal issues, thereby leading to significant performance degradation, abnormal heating, and potential risks to device reliability and safety. To address these thermal challenges, we propose a novel approach to design thermal-resilient (TR) flexible analog neuromorphic circuits (f-NCs) based on amorphous indium-gallium-zinc oxide (a-IGZO) thin-film transistors (TFTs). This cross-layer approach integrates TR circuit design for activation functions (AFs) and evolutionary algorithm (EA) based TR training using Neural Architecture Search (NAS), optimizing both the circuit-level thermal resilience and the architecture-level training, ensuring robust performance of f-NCs under varying thermal conditions. Experiments on 13 benchmark datasets demonstrate that thermal variations result in up to a 50.3% accuracy loss, and the proposed evolutionary algorithm-based thermal-resilient training fully recovers this accuracy at the expense of 1.81x area and 1.39x power overhead.
2E-3
14:20-14:45

FIawase: A SET Fault Injection Framework Towards Exhaustive System-Level Impact Evaluation

*Mingtao Zhang, Quan Cheng, Masanori Hashimoto (Kyoto University)
Keywords
Reliability, fault injection, soft errors, single event transients, hardware emulation
Abstract
Single-event transients (SETs) threaten modern reliability-demanding SoCs equipped with error correction codes (ECC) for single event upset (SEU) mitigation. However, conventional gate-level SET fault injection (FI) remains prohibitively slow for practical reliability evaluation. This work presents FIawase, a high-throughput SET injection framework that enables comprehensive system-level evaluation of SET-induced soft errors. FIawase consists of two phases: a netlist-level inject-and-capture simulation, which systematically flips the output of every gate-cycle pair during program execution to record flip-flop changes one cycle later, and a scan-chain-based replay-and-measure emulation, which replays these patterns at hardware speed to quantify system-level impact. Implemented on an open-source RISC-V system, FIawase reduces a comprehensive SET injection campaign from decades of pure simulation to nearly a day, achieving over four orders of magnitude end-to-end speedup. FIawase takes a critical first step toward exhaustive, cycle-accurate SET analysis, enabling architectural and reliability research at previously infeasible scales.
2E-4
14:45-15:10

MASS: A Masking-aware Search Framework for Reliable QC-LDPC Code Construction in SSDs

*Xiaolu Li, Dingxin Wang, Zhengyao Ding, Jinye Wu, Qingnan Hu (Huazhong University of Science and Technology), Patrick Lee (The Chinese University of Hong Kong), Yuchong Hu, Dan Feng (Huazhong University of Science and Technology)
Keywords
QC-LDPC codes, SSD reliability, SSD performance, SSD simulation
Abstract
Quasi-Cyclic Low-Density Parity-Check (QC-LDPC) codes have been widely adopted in flash-based solid-state drives (SSDs) to ensure storage reliability due to their efficient encoding and decoding operations, as well as their compact structure. However, as SSD capacities grow, existing QC-LDPC codes face higher error rates, leading to both lifetime and performance degradation. We propose MASS, a new QC-LDPC code construction framework that leverages masking to remove error-prone substructures from the coding matrix during code construction, so as to enhance SSD reliability. MASS further adopts a smart decoding policy that selectively bypasses decoding operations to boost I/O performance. We implement MASS in the SSD simulator, MQSim. Evaluation shows that MASS reduces the decoding failure rate by up to 93%, and its smart decoding policy reduces the average response time by up to 72.6%.
2E-5
15:10-15:35

TIMBER: A Fast Algorithm for Timing and Power Optimization using Multi-bit Flip-flops

Aditya Das Sarma (University of Wisconsin at Madison), *Shui Jiang (The Chinese University of Hong Kong), Wan Luan Lee (University of Wisconsin at Madison), Tsung-Yi Ho (The Chinese University of Hong Kong), Tsung-Wei Huang (University of Wisconsin at Madison)
Keywords
power optimization, multi-bit flip-flop banking and debanking, ICCAD CAD Contest
Abstract
Multi-bit flip-flop (MBFF) banking and debanking is a widely adopted technique for optimizing power and total negative slack (TNS) during the post-placement stage of digital design. While banking flip-flops can reduce both power and area, excessive banking may lead to increased TNS due to significant register displacement, as well as bin density violations (BDVs) caused by over-placing MBFFs in legalized regions. To address these challenges, the EDA community recently organized a CAD Contest seeking innovative solutions from both academia and industry. In response, we present TIMBER, a fast and effective optimization algorithm that balances competing objectives in MBFF placement. Unlike existing methods, TIMBER employs a bin-density-aware placement strategy that simultaneously minimizes BDVs and TNS, while also achieving gains in power and area efficiency. To further enhance the runtime performance, TIMBER incorporates a parallelization strategy. Experimental results on the official 2024 CAD Contest benchmarks demonstrate that TIMBER outperforms the first-place winner, delivering on average 13.08x better solution quality, zero BDVs, 5.06x faster single-threaded runtime, 3.56x lower memory usage and up to 72.49x speedup in multi-threaded execution.
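A much-simplified sketch of the bin-density-aware idea: a candidate 2-bit banking is accepted only if the destination bin stays under a utilization cap. The areas, bin size, midpoint placement, and the omission of timing/TNS terms are all assumptions for illustration; this is not the TIMBER algorithm.

```python
BIN_SIZE, CAP = 10.0, 0.8
FF_AREA, MBFF2_AREA = 2.0, 3.4            # a 2-bit MBFF is smaller than two FFs

def bin_of(x, y):
    return (int(x // BIN_SIZE), int(y // BIN_SIZE))

def try_bank(ff_a, ff_b, bin_usage):
    """Return the MBFF location if banking keeps the target bin under the cap."""
    x = (ff_a[0] + ff_b[0]) / 2.0
    y = (ff_a[1] + ff_b[1]) / 2.0
    b = bin_of(x, y)
    new_usage = bin_usage.get(b, 0.0) + MBFF2_AREA
    if new_usage / (BIN_SIZE * BIN_SIZE) > CAP:
        return None                        # would create a bin-density violation
    bin_usage[b] = new_usage
    return (x, y)

usage = {(0, 0): 76.0}                     # bin (0,0) is already 76% utilized
print(try_bank((1.0, 2.0), (3.0, 4.0), usage))   # accepted: prints (2.0, 3.0)
print(try_bank((1.0, 2.0), (3.0, 4.0), usage))   # rejected: bin would exceed the cap
```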

Session 2F

(T8-C) Application of Generative and Predictive Methods to Design Optimization
13:30-15:35 | Tuesday, January 20, 2026 | Sleeping Beauty 5
Chair(s):
Victor Kravets (IBM Inc.)
Xinyu Chen (The Hong Kong University of Science and Technology (Guangzhou))
2F-1
13:30-13:55

GENIAL: Generative Design Space Exploration via Network Inversion for Low Power Algorithmic Logic Units

Maxence Bouvier, *Ryan Amaudruz, Felix Arnold, Renzo Andri, Lukas Cavigelli (Huawei Technologies Switzerland AG)
Keywords
RTL Generation, Logic Synthesis, High-level Synthesis, Artificial Intelligence, Machine Learning, Design Space Exploration
Abstract
As AI workloads proliferate, optimizing arithmetic units is becoming increasingly important to reduce the footprint of digital systems. Conventional design flows, which often rely on manual or heuristics-based optimization, are limited in their ability to thoroughly explore the vast design space. In this paper, we introduce GENIAL, a machine learning-based framework for the automatic generation and optimization of arithmetic units, more specifically multipliers. At the core of GENIAL is a Transformer-based surrogate model trained in two stages, involving self-supervised pretraining followed by supervised fine-tuning, to robustly forecast key hardware metrics such as power and area from abstracted design representations. By inverting the surrogate model, GENIAL efficiently searches for new operand encodings that directly minimize power consumption in arithmetic units for specific input data distributions. Extensive experiments on large datasets demonstrate that GENIAL is consistently more sample efficient than other methods, and converges faster towards optimized designs. This makes it possible to deploy a high-effort logic synthesis optimization flow in the loop, improving the accuracy of the surrogate model. Notably, GENIAL automatically discovers encodings that achieve up to 18% switching activity savings within multipliers on representative AI workloads compared with the conventional two’s complement. We also demonstrate the versatility of our approach by achieving significant improvements on Finite State Machines, highlighting GENIAL's applicability for a wide spectrum of logic functions. Together, these advances mark a significant step toward automated Quality-of-Results-optimized combinational circuit generation for digital systems.
2F-2
13:55-14:20

REvolution: An Evolutionary Framework for RTL Generation driven by Large Language Models

Kyungjun Min, *Kyumin Cho, Junhwan Jang, Seokhyeong Kang (Pohang University of Science and Technology)
Keywords
Large Language Models, Evolutionary Computation, RTL Generation, Electronic Design Automation
Abstract
Large Language Models (LLMs) are used for Register-Transfer Level (RTL) code generation, but they face two main challenges: functional correctness and Power, Performance, and Area (PPA) optimization. Iterative, feedback-based methods partially address these, but they are limited to local search, hindering the discovery of a global optimum. This paper introduces REvolution, a framework that combines Evolutionary Computation (EC) with LLMs for automatic RTL generation and optimization. REvolution evolves a population of candidates in parallel, each defined by a design strategy (Thought), RTL implementation (Code), and evaluation feedback. The framework includes a dual-population algorithm that divides candidates into Fail and Success groups for bug fixing and PPA optimization, respectively. An adaptive mechanism further improves search efficiency by dynamically adjusting the selection probability according to the success rates. Experiments on the VerilogEval and RTLLM benchmarks show that REvolution increased the initial pass rate of various LLMs by up to 24.0 percentage points. The DeepSeekV3 model achieved a final pass rate of 95.5%, comparable to state-of-the-art results, without the need for separate training or domain-specific tools. Additionally, the generated RTL designs showed significant PPA improvements over reference designs. This work introduces a new RTL design approach by combining LLMs' generative capabilities with EC's broad search power, overcoming the local-search limitations of previous methods.
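The adaptive selection mechanism can be pictured as below: the probability of drawing a candidate from the Fail pool (bug fixing) versus the Success pool (PPA optimization) drifts toward whichever pool has recently been productive. The update rule, window size, and the simulated success rates are assumptions for illustration, not the exact REvolution mechanism.

```python
import random

random.seed(0)

def pick_pool(p_fail):
    return "fail" if random.random() < p_fail else "success"

p_fail, alpha = 0.5, 0.2                  # start balanced; smoothing factor
recent = {"fail": [], "success": []}

for step in range(200):
    pool = pick_pool(p_fail)
    # Stand-in for "one LLM mutation attempt": assume fixes from the Fail pool
    # succeed 30% of the time and PPA tweaks from the Success pool 60%.
    succeeded = random.random() < (0.3 if pool == "fail" else 0.6)
    recent[pool].append(succeeded)
    rates = {k: (sum(v[-20:]) / len(v[-20:]) if v else 0.5) for k, v in recent.items()}
    # Spend more selections on whichever pool is currently paying off.
    target = rates["fail"] / (rates["fail"] + rates["success"] + 1e-9)
    p_fail = (1 - alpha) * p_fail + alpha * target

print(f"final P(select Fail pool) = {p_fail:.2f}")
```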
2F-3
14:20-14:45

AC-Refiner: Efficient Arithmetic Circuit Optimization Using Conditional Diffusion Models

*Chenhao Xue (Peking University), Kezhi Li (The Chinese University of Hong Kong), Jiaxing Zhang, Yi Ren (Peking University), Zhengyuan Shi (The Chinese University of Hong Kong), Chen Zhang (Shanghai Jiao Tong University), Yibo Lin, Lining Zhang (Peking University), Qiang Xu (The Chinese University of Hong Kong), Guangyu Sun (Peking University)
Keywords
Diffusion models, Arithmetic circuits, Design automation
Abstract
Arithmetic circuits, such as adders and multipliers, are fundamental components of digital systems, directly impacting the performance, power efficiency, and area footprint. However, optimizing these circuits remains challenging due to the vast design space and complex physical constraints. While recent deep learning-based approaches have shown promise, they struggle to consistently explore high-potential design variants, limiting their optimization efficiency. To address this challenge, we propose AC-Refiner, a novel arithmetic circuit optimization framework leveraging conditional diffusion models. Our key insight is to reframe arithmetic circuit synthesis as a conditional image generation task. By carefully conditioning the denoising diffusion process on target quality-of-results (QoRs), AC-Refiner consistently produces high-quality circuit designs. Furthermore, the explored designs are used to fine-tune the diffusion model, which focuses the exploration near the Pareto frontier. Experimental results demonstrate that AC-Refiner generates designs with superior Pareto optimality, outperforming state-of-the-art baselines. The performance gain is further validated by integrating AC-Refiner into practical applications.
2F-4
14:45-15:10

DeepCut: Structure-Aware GNN Framework for Efficient Cut Timing Prediction in Logic Synthesis

*Lingfeng Zhou, Yilong Zhou, Hao Gong (Hangzhou Dianzi University), Zhengyuan Shi, Qiang Xu (The Chinese University of Hong Kong), Zhufei Chu (Ningbo University), Yue Wu, Xiaoyan Yang (Hangzhou Dianzi University)
Keywords
Machine Learning in EDA, Graph Neural Networks, Delay Prediction, Logic Synthesis
Abstract
Achieving timing convergence during logic synthesis remains a significant challenge due to the weak correlation between pre-mapping optimization metrics and post-mapping performance. While recent approaches have introduced Graph Neural Networks (GNNs) to predict Quality-of-Result (QoR) during optimization, these methods often suffer from scalability limitations and considerable timing overhead. To address these issues, we present DeepCut, a learning-based framework for efficient and adaptable cut-level QoR prediction in logic optimization. DeepCut introduces a heterogeneous graph representation that encodes cut structures as supernodes and integrates skip-list connections to broaden the receptive field of the GNN. The proposed GNN architecture incorporates a structure-aware attention aggregator that dynamically captures both local and hierarchical features, enhancing prediction performance and speeding up inference. Evaluated on a comprehensive benchmark dataset of 46 designs, DeepCut achieves significant improvements over baseline models, with up to 48.8% higher R^2 scores and a 22.8% reduction in Mean Absolute Percentage Error (MAPE), while also delivering superior inference efficiency. Further experiments explored the application of DeepCut within the ABC rewrite framework, showing that the modified rewrite outperforms previous work in terms of delay optimization.
2F-5
15:10-15:35

Lorecast: Layout-Aware Performance and Power Forecasting from Natural Language

Runzhi Wang, Prianka Sengupta, Cristhian Roman Vicharra (Texas A&M University), *Yiran Chen (Duke University), Jiang Hu (Texas A&M University)
Keywords
LLM-assisted circuit design prediction, electronic design automation, performance and power estimation
Abstract
In chip design planning, obtaining reliable performance and power forecasts for various design options is of critical importance. Traditionally, this involves using system-level models, which often lack accuracy, or trial synthesis, which is both labor-intensive and time-consuming. We introduce a new methodology, called Lorecast, which accepts English prompts as input to rapidly generate layout-aware performance and power estimates. To the best of our knowledge, Lorecast is the first approach to enable performance and power forecasting directly from natural language descriptions. This approach bypasses the need for HDL code development and synthesis, making it both fast and user-friendly. Experimental results show that Lorecast achieves accuracy within a few percent of error compared to post-layout analysis, while significantly reducing turnaround time.

Session 3A

(T5-A) Advanced Accelerators for Emerging AI Workloads
15:55-18:00 | Tuesday, January 20, 2026 | Snow White 1
Chair(s):
Guohao Dai (Shanghai Jiao Tong University)
Zhenhua Zhu (HKUST)
3A-1
15:55-16:20

DeepPiC: xPU-PIM Cluster Architecture with Adaptive Resource-Aware Task Orchestration for DeepSeek-Style MoE Inference

*Zixu Li, Manni Li, Zijian Huang, Jiayu Yang, Wending Zhao, Yinyin Lin (Fudan University), Chengchen Wang, Haidong Tian, Xiankui Xiong (ZTE Corporation)
Keywords
Processing-in-Memory (PIM), DRAM, Large Language Model (LLM), Mixture-of-Experts (MoE), Cluster
Abstract
The success of DeepSeek has driven demand for deploying high-performance inference clusters. However, due to its Transformer-based autoregressive structure, DeepSeek remains severely bandwidth-bound, limiting the scalability of traditional xPU (e.g., GPU/TPU). While DRAM-based processing-in-memory (PIM) offers a promising solution to overcome memory bottlenecks, its use in inference clusters for DeepSeek remains underexplored due to three challenges: (1) non-trivial inter-device communication overhead; (2) the need for expert parallelism in the mixture-of-experts (MoE) module; and (3) lack of efficient task offloading to PIM. To this end, we propose DeepPiC, a novel xPU-PIM cluster architecture designed for DeepSeek-style models with multi-latent attention (MLA) and MoE modules. DeepPiC introduces a heterogeneous xPU+HBM-PIM device to accelerate low arithmetic intensity operations. It can seamlessly replace conventional xPU devices without any modification to cluster-level interconnect topology. However, DeepPiC cannot fully realize its performance potential under static scheduling, which fails to adapt to shifting compute and memory demands driven by multidimensional variability (model heterogeneity, cluster-scale volatility, runtime dynamics). This induces inter-device communication overhead and intra-device underutilization. Thus, we propose Adaptive Resource-Aware Task Orchestration (ARTO), a two-phase strategy that decouples global model partitioning from local task assignment by dynamically coordinating (1) cross-device parallelism optimization and (2) intra-device xPU/PIM mapping. Evaluated on DeepSeek V3-671B using H20-, A100-, and H200-Clusters (H20 serves as a compute-limited alternative to high-end GPUs), DeepPiC (H20+HBM-PIM) achieves up to 3x, 2x and 1.3x speedup over H20-, A100-, and H200-Cluster at small batch sizes, while maintaining 74% and 54% of A100- and H200-Cluster performance at large batch sizes. These results demonstrate that DeepPiC enables low-end xPU to approach or even exceed premium ones by fundamentally overcoming memory bottlenecks via adaptive scheduling that orchestrates PIM and xPU heterogeneous resources.
3A-2
16:20-16:45

MoEA: A Mixture of Experts Accelerator with Direct Token Access and Dynamic Expert Scheduling

*Zifeng Zhao (Fudan University; Jiashan Fudan Institute), Jiewen Zheng, Tianxing Xie, Xinghao Zhu (Fudan University), Gengsheng Chen (Fudan University; Jiashan Fudan Institute)
Keywords
AI Accelerator, Transformer, Mixture-of-Experts, Hardware Architectural Design
Abstract
Transformer-based large language models (LLMs) have achieved widespread adoption, but their growing model sizes impose substantial computational costs. The Mixture-of-Experts (MoE) mechanism alleviates this by sparsely activating a subset of experts for each token. However, it still faces two critical challenges: (1) token rearrangement incurs non-trivial overhead of off-chip data movement, and (2) static execution flow fails to exploit inter-expert token reuse. In this paper, we present MoEA, a specialized accelerator designed to address the inefficiencies in MoE inference. MoEA introduces two key innovations: a Direct Token Access mechanism that leverages a hardware-managed metadata queue to eliminate off-chip token rearrangement, and a Dynamic Expert Scheduler that captures inter-expert token reuse patterns and optimizes expert execution order to maximize token reuse. Evaluations on representative MoE benchmarks show that MoEA reduces off-chip memory access by 12.06% and achieves 259.24x, 9.63x and 1.16x speedups over CPU, GPU and EdgeMoE accelerator, respectively.
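The dynamic-scheduling idea (ordering expert execution so consecutively executed experts share routed tokens, letting on-chip activations be reused) can be sketched with a simple greedy ordering; the routing example and the greedy heuristic below are assumptions, not MoEA's scheduler.

```python
def schedule_experts(routing):
    """routing: dict expert_id -> set of token ids routed to that expert."""
    remaining = dict(routing)
    order = [max(remaining, key=lambda e: len(remaining[e]))]  # start with busiest
    del remaining[order[0]]
    while remaining:
        prev = routing[order[-1]]
        # Next, pick the expert sharing the most tokens with the one just run.
        nxt = max(remaining, key=lambda e: len(prev & remaining[e]))
        order.append(nxt)
        del remaining[nxt]
    return order

routing = {0: {1, 2, 3, 8}, 1: {4, 5}, 2: {2, 3, 8, 9}, 3: {4, 5, 6, 7}}
print(schedule_experts(routing))
# -> [0, 2, 1, 3]: experts 0 and 2 run back to back because they share tokens 2, 3, 8.
```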
3A-3
16:45-17:10

MoEA: A Mixed-Precision Edge Accelerator for CNN-MSA Models with Fine-Tuning Support

*Qiwei Dang, Chengyu Ma, Zhiwang Huo, Guoming Yang, Tian Xia, Wenzhe Zhao, Pengju Ren (Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University)
Keywords
Deep Neural Network, Low-precision data format, RISC-V, Hardware Accelerator
Abstract
Recent vision models integrating Convolutional Neural Networks (CNNs) and attention-based Transformers have achieved unprecedented accuracy, but face significant obstacles for edge deployment: sensitivity to quantization formats (especially for nonlinear functions in Transformers), high computational instruction complexity, and the inability to perform on-device fine-tuning for distribution shifts. To address these challenges, we propose Mixture-of-Edge-Architectures (MoEA), a RISC-V-based accelerator with three key innovations. First, the Mixed Fixed/Floating-Point (MFP) format unifies INT8 and Shared Exponent Floating-Point (SFP8) into a single datapath, enabling optimal mixed-precision strategies. Second, the typical VLIW instruction is condensed into a single direct memory access instruction to improve the computation performance. Third, a specialized engine integrates an FPGA-optimized Matrix Processing Unit (MPU), a Single Instruction Multiple Data (SIMD)-based Vector Processing Unit (VPU), and an enhanced Direct Memory Access (DMA) for back-propagation. Implemented on FPGAs, MoEA delivers 420 GOPS (0.99 GOPS/DSP) on ZCU102, with ResNet18 inference at 16 ms and fine-tuning at 70 ms. An 8-cluster variant on VX690T reduces ViT-Base latency to 23 ms, 1.81x ∼ 2.13x faster than prior accelerators, supporting versatile model deployment and adaptive edge intelligence.
3A-4
17:10-17:35

Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems

*En-Ming Huang (National Taiwan University), Li-Shang Lin (National Tsing Hua University), Chun-Yi Lee (National Taiwan University)
Keywords
CPU-GPU optimization, LLM Inference, MoE
Abstract
Large Language Models (LLMs) have achieved impressive results across various tasks, yet their high computational demands pose deployment challenges, especially on consumer-grade hardware. Mixture of Experts (MoE) models provide an efficient solution through selective activation of parameter subsets, which reduces computation requirements. Despite this efficiency, state-of-the-art MoE models still require substantial memory beyond typical consumer GPU capacities. Traditional offloading methods that transfer model weights between CPU and GPU introduce latency that limits inference performance. This paper presents a novel CPU-GPU collaborative inference framework that incorporates an expert caching mechanism on the GPU to reduce data transfer requirements and enable faster inference through cache hits. Computations are offloaded to CPU for efficient cache miss handling, which benefits from CPU multithreading optimizations. The evaluations of our framework demonstrate performance improvements and highlight the potential of CPU-GPU collaboration to maximize hardware utilization for single-request inference scenarios on consumer-grade systems.
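A minimal sketch of the expert-caching behaviour described above: recently used experts stay resident (standing in for GPU memory) and a miss falls back to CPU execution while the expert is staged into the cache for later tokens. Cache capacity, the LRU policy, and the routing trace are illustrative assumptions rather than the paper's framework.

```python
from collections import OrderedDict

class ExpertCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()        # expert_id -> placeholder for weights

    def run_expert(self, expert_id, token):
        if expert_id in self.cache:       # hit: execute on GPU, refresh recency
            self.cache.move_to_end(expert_id)
            return f"token {token}: expert {expert_id} on GPU"
        # Miss: compute on the CPU now, and stage the expert onto the GPU so
        # later tokens routed to it hit the cache.
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)
        self.cache[expert_id] = object()
        return f"token {token}: expert {expert_id} on CPU (cached for next time)"

cache = ExpertCache(capacity=2)
for tok, expert in enumerate([3, 1, 3, 7, 1, 3]):
    print(cache.run_expert(expert, tok))
```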
3A-5
17:35-18:00

BOA-3DGS: Backward-Striding Optimized Accelerator for Reduced Memory Contention in 3D Gaussian Splatting Training

*Hyukjun Kweon (Department of Semiconductor Convergence Engineering, Sungkyunkwan University), Jongyeop Kim (Department of Electrical and Computer Engineering, Sungkyunkwan University), Jeongwoo Park (Sungkyunkwan University)
Keywords
Neural Rendering, 3D Gaussian Splatting, Real-time Rendering, Scene Reconstruction, SLAM, Augmented Reality, Virtual Reality, Differentiable Rasterization, High-throughput Backpropagation
Abstract
3D Gaussian Splatting (3DGS) algorithms have gained increasing attention due to their ability to enable realistic 3D scene reconstruction with faster runtime. However, training these models on graphics processing units (GPUs) faces unique challenges during the backpropagation stage primarily due to memory barriers in the execution model, where every pixel within a tile computes the gradient for a single Gaussian per thread. In this paper, we propose the Backward-Striding Optimized Accelerator for 3D Gaussian Splatting (BOA-3DGS), a hardware-software co-optimized accelerator designed to optimize backward rasterization of 3D Gaussian Splatting by enabling pixels to stride through and select only relevant Gaussians for computing. However, such processing styles lead to a large number of memory stalls due to conflicting gradient storage and Gaussian parameter fetches. By introducing a pixel-independent, funnel-like multi-Gaussian alpha computation and a majority-based gradient accumulation method, we avoid such memory stalls to efficiently accelerate backpropagation of 3DGS. Together, these enhancements improve gradient calculation efficiency and accelerate the backpropagation stage of 3DGS rasterization. Experimental results demonstrate that BOA-3DGS achieves up to 1.58x speedup during backpropagation compared to prior work, while utilizing only 0.86x the area and consuming 0.99x power.

Session 3B

(T1-C) Advances in Agile Design Acceleration
15:55-18:00 | Tuesday, January 20, 2026 | Snow White 2
Chair(s):
Hiroki Nishikawa (The University of Osaka)
Shih-Hao Hung (National Taiwan University)
3B-1
15:55-16:20

Scalarium: A Unified Scala-based Co-Simulation Framework for Agile Chip Development

*Yuefeng Zhang, Cheng Zhang, Wenkai Zhou, Binzhe Yuan, Junsheng Chen, Xiangyu Zhang, Hao Geng, Xin Lou (ShanghaiTech University)
Keywords
Scala, SpinalHDL, Co-simulation, Unified Design Workflow, Agile Development
Abstract
Modern digital integrated circuit and system design workflows rely on hardware/software co-simulation that often employs multi-language methodologies (e.g., SystemC for modeling and VerilogHDL for implementation), introducing significant overhead from manual interface synchronization, cross-toolchain integration, and loss of high-level abstraction. To address these limitations, we propose Scalarium, a unified Scala-based co-simulation framework that integrates a cycle-driven simulator and SpinalHDL hardware modules within a single Scala environment, eliminating Verilog translation and proprietary DPI/PLI glue code. In particular, we propose: 1) a Scala-based iterative hardware design workflow for large-scale digital chip design; 2) an extensible cycle-driven simulation library for agile system modeling and accurate simulation, leveraging Scala’s expressive syntax and type system; and 3) a unified co-simulation platform enabling automatic type-safe hardware/software binding and direct data exchange. Evaluation on a neural rendering accelerator design project demonstrates a 74.8 times simulation speedup over register-transfer level (RTL) with minimal functional deviation (6.5%) and performance mismatch (2.4%), attributable to design differences rather than simulator inaccuracy. Scalarium enhances productivity, debuggability, and maintainability while preserving SpinalHDL's verification advantages.
3B-2
16:20-16:45

VFlow: Discovering Optimal Agentic Workflows for Verilog Generation

*Yangbo Wei, Zhen Huang (Shanghai Jiao Tong University), Lei He (Eastern Institute of Technology, Ningbo), Huang Li (Shanghai Jiao Tong University), Ting-Jung Lin (Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo), W. Xing Wei (The University of Sheffield)
Keywords
Hardware Description Languages, Verilog, Large Language Models, Automated Workflow Optimization, Monte Carlo Tree Search, Digital Circuit Design
Abstract
Hardware design automation faces challenges in generating high-quality Verilog code efficiently. This paper introduces VFlow, an automated framework that optimizes agentic workflows for Verilog code generation. Unlike traditional approaches relying on fixed prompts or manually designed flows, VFlow treats workflow discovery as a search over graph-structured LLM invocation sequences. It introduces a multi-population cooperative evolution (CEPE-MCTS) algorithm that balances multiple hardware objectives—functional correctness, area, power, timing and token cost—while sharing successful patterns and avoiding repeated failures. Integrated multi-level verification ensures syntactic correctness, functional behavior, and synthesizability. Experiments on VerilogEval and RTLLM2.0 show VFlow improves pass@1 by 20-30% over prompting baselines and closely matches designer-level area/power. Remarkably, VFlow enables small LLMs to outperform larger models with up to 10.9x ROI, offering a cost-effective solution for RTL design. This work paves the way for intelligent, automated hardware development, advancing LLM applications in EDA.
3B-3
16:45-17:10

SemanticBBV: A Semantic Signature for Cross-Program Knowledge Reuse in Microarchitecture Simulation

*Zhenguo Liu (The Hong Kong University of Science and Technology (Guangzhou)), Chengao Shi (The Hong Kong University of Science and Technology (HKUST)), Chen Ding (University of Rochester), Jiang Xu (The Hong Kong University of Science and Technology (Guangzhou))
Keywords
Semantic, Program Analysis, Microarchitecture Simulation, Embedding, Binary, Representation Learning
Abstract
For decades, sampling-based techniques have been the de facto standard for accelerating microarchitecture simulation, with the Basic Block Vector (BBV) serving as the cornerstone program representation. Yet the BBV's fundamental limitations, namely order-dependent IDs that prevent cross-program knowledge reuse and a lack of semantic content predictive of hardware performance, have left a massive potential for optimization untapped. To address these gaps, we introduce SemanticBBV, a novel two-stage framework that generates robust, performance-aware signatures for cross-program simulation reuse. First, a lightweight RWKV-based semantic encoder transforms assembly basic blocks into rich Basic Block Embeddings (BBEs), capturing deep functional semantics. Second, an order-invariant Set-Transformer aggregates these BBEs, weighted by execution frequency, into a final signature. Crucially, this stage is co-trained with a dual objective: a triplet loss for signature distinctiveness and a Cycles Per Instruction (CPI) regression task, directly imbuing the signature with performance sensitivity. Our evaluation demonstrates that SemanticBBV not only matches traditional BBVs in single-program accuracy but also enables unprecedented cross-program analysis. By simulating just 14 universal program points, we estimated the performance of ten SPEC benchmarks with 86.3% average accuracy, achieving a 7143x simulation speedup. Furthermore, the signature shows strong adaptability to new microarchitectures with minimal fine-tuning.
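To make the aggregation and training objective concrete, the sketch below (illustrative Python only, not the authors' code) shows a frequency-weighted, order-invariant pooling of basic-block embeddings and a combined triplet-plus-CPI-regression loss; the function names and toy vectors are invented for illustration.

```python
# Illustrative sketch only (not the paper's code): frequency-weighted,
# order-invariant aggregation of basic-block embeddings plus a combined
# triplet + CPI-regression objective, using plain NumPy.
import numpy as np

def aggregate_signature(bbes, freqs):
    """Order-invariant signature: execution-frequency-weighted mean of BBEs."""
    w = np.asarray(freqs, dtype=float)
    w = w / w.sum()
    return (np.asarray(bbes) * w[:, None]).sum(axis=0)

def triplet_loss(anchor, positive, negative, margin=1.0):
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def dual_objective(anchor, positive, negative, cpi_pred, cpi_true, alpha=0.5):
    """Weighted sum of signature-distinctiveness and CPI-regression terms."""
    return alpha * triplet_loss(anchor, positive, negative) + \
           (1 - alpha) * (cpi_pred - cpi_true) ** 2

# Toy usage: three 4-dimensional block embeddings with execution counts.
sig = aggregate_signature([[1, 0, 0, 2], [0, 1, 1, 0], [2, 2, 0, 1]],
                          freqs=[100, 10, 1])
print(sig, dual_objective(sig, sig + 0.1, sig + 5.0, cpi_pred=1.2, cpi_true=1.0))
```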
3B-4
17:10-17:35

CausalTuner: Will Causality Help High-Dimensional EDA Tool Parameter Tuning

*Ziyang Yu, Peng Xu, Su Zheng (The Chinese University of Hong Kong), Siyuan Xu (Huawei Noah’s Ark Lab), Hao Geng (ShanghaiTech University), Bei Yu (The Chinese University of Hong Kong), Martin D.F. Wong (Hong Kong Baptist University)
Keywords
Design space exploration, Causal inference, Bayesian optimization, VLSI
Abstract
Electronic Design Automation (EDA) tools are central to Very Large Scale Integration (VLSI) design, where numerous parameters govern the Quality-of-Result (QoR) metrics, including performance, power, and area. The high dimensionality of the parameter space, coupled with complex interactions, makes manual tuning inefficient and hinders the scalability of automated methods. Existing methods typically treat parameters as flat vectors, neglecting the EDA flow's hierarchical causal structure, where early-stage decisions constrain downstream stages. To address this, we propose CausalTuner, a causality-aware design space exploration framework for efficient parameter tuning. It employs a hybrid causal attention mechanism to capture stage-wise parameter interactions and embeds them into deep kernel Gaussian processes for accurate and generalizable surrogate modeling. The causal exploration strategies enhance sampling efficiency. Experiments show that CausalTuner outperforms state-of-the-art methods in both final QoR and efficiency.
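As background, the minimal Python sketch below shows a generic Bayesian-optimization loop with a Gaussian-process surrogate and expected-improvement acquisition; `run_flow` is a hypothetical stand-in for one EDA flow evaluation, and the paper's causal attention and deep-kernel GP are not reproduced here.

```python
# Minimal Bayesian-optimization sketch for tool-parameter tuning (illustration
# only; CausalTuner's causal attention and deep-kernel GP are not modeled).
# `run_flow` is a hypothetical stand-in for one EDA flow evaluation.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def run_flow(x):                      # hypothetical QoR metric (lower = better)
    return np.sum((x - 0.3) ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(size=(5, 4))          # 5 initial samples, 4 tool parameters
y = np.array([run_flow(x) for x in X])

for _ in range(20):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    cand = rng.uniform(size=(256, 4))             # random candidate settings
    mu, sigma = gp.predict(cand, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, run_flow(x_next))

print("best QoR found:", y.min())
```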
3B-5
17:35-18:00

Synergistic Bayesian Optimization and Reinforcement Learning with Bidirectional Interaction for Efficient VLSI Constraint Tuning

*Jiayi Tu (Southeast University), Jindong Tu (The Chinese University of Hong Kong, Shenzhen), Yuxuan Cai, Yi Zhang, Meng Zhang (Southeast University), Tinghuan Chen (The Chinese University of Hong Kong, Shenzhen)
Keywords
Design space exploration, Co-Adaptive Optimization, Closed-loop interaction, Automatic constraint tuning
Abstract
The exponential complexity growth of the very large-scale integration (VLSI) design space demands efficient automated tuning of constraint parameters. Confronting the dual limitations of Bayesian optimization (BO) in high-dimensional spaces, namely low efficiency and dependence on initial data, and the defect of reinforcement learning (RL), namely excessively high simulation costs due to inefficient exploration, this work proposes a dynamic closed-loop co-adaptive framework. The framework establishes a synergistic enhancement cycle through a bidirectional interaction mechanism: BO generates high-quality initial candidate points to guide RL exploration, while RL refines the Bayesian surrogate model through strategy-driven adaptive search of the parameter space and iterative feedback of new points. A novel coupled HV-WA reward function enhances Pareto frontier diversity while ensuring convergence. Compared to state-of-the-art methods, evaluations on the TV80 CPU across multiple process nodes demonstrate significantly improved Pareto frontier quality, a reduced number of electronic design automation (EDA) tool flow runs, and consistent robustness without historical data, establishing a new paradigm for automated constraint tuning.
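For readers unfamiliar with hypervolume-based rewards, the sketch below computes the two-objective hypervolume of a Pareto front in plain Python/NumPy; it is a generic illustration, not the paper's HV-WA reward.

```python
# Conceptual sketch of a hypervolume (HV) term such a reward might use
# (2-objective minimization, e.g. delay vs. power); not the paper's HV-WA code.
import numpy as np

def hypervolume_2d(points, ref):
    """HV dominated by `points` relative to reference `ref` (both minimized)."""
    pts = np.asarray([p for p in points if p[0] < ref[0] and p[1] < ref[1]])
    if len(pts) == 0:
        return 0.0
    pts = pts[np.argsort(pts[:, 0])]           # sort by first objective
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y < prev_y:                         # keep only non-dominated steps
            hv += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return hv

front = [(1.0, 4.0), (2.0, 2.5), (3.0, 1.0)]
print(hypervolume_2d(front, ref=(5.0, 5.0)))   # 11.5 for this toy front
```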

Session 3C

(T13-A) Beyond Silicon: Emerging Paradigms in EDA for Atomic Scale Computing, Photonics, and Microfluidics
15:55-18:00 | Tuesday, January 20, 2026 | Snow White 3
Chair(s):
Xunzhao Yin (Zhejiang University)
Andy Yu-Guang Chen (National Central University)
3C-1
15:55-16:20

Mastering the Exponential Complexity of Exact Physical Simulation of Silicon Dangling Bonds

*Willem Lambooy, Jan Drewniok, Marcel Walter (Technical University of Munich), Robert Wille (Technical University of Munich & SCCH GmbH)
Keywords
Silicon Dangling Bonds, Atom-Scale Computation, Physical Simulation
Abstract
Silicon Dangling Bond (SiDB) logic is a promising technology for energy-efficient computation, supported by significant advancements in manufacturing and design automation. However, physical simulation, essential for accurately predicting the behavior of SiDB logic prior to costly manufacturing, lags behind these developments. In particular, exact physical simulation, which scales exponentially with base 3, remains infeasible for larger SiDB assemblies, limiting its utility to small structures such as single gates. This computational bottleneck slows progress in SiDB technology and hinders the establishment of reliable ground truths for heuristic approaches. To address the challenge, this work presents a novel methodology for exact SiDB simulation that restructures the exponential search space according to a hierarchical clustering. The hierarchical structure enables systematic pruning of the search space at its different levels: it provides an ordering of interactions between clusters of SiDBs that facilitates effective exploitation of dynamically inferred, problem-specific constraints, much as in solving a Sudoku. Experimental results demonstrate that the effective exponential base can be lowered to approximately 1.3, enabling, for the first time, the exact physical simulation of entire multi-gate SiDB circuits in minutes, where the state of the art would take millions of years. This breakthrough establishes a robust ground truth for SiDB logic validation, marking a pivotal step toward scalable, energy-efficient, and atomic-scale computing.
3C-2
16:20-16:45

Built-In Self-Test for Locating Leakage Defects on Continuous-Flow Microfluidic Chips

*Jiahui Peng, Mengchu Li, Tsun-Ming Tseng, Ulf Schlichtmann (Technical University of Munich)
Keywords
BIST, coding design, CFMBs
Abstract
Continuous-flow microfluidic chips (CFMBs) are effective platforms for biochemical experiments using small volumes of fluids. Inside these chips, fluids are transported through flow channels and controlled by valves, which are actuated by pressure applied via control channels. As the scale of continuous-flow microfluidic chips grows, the likelihood of defects, such as blockage and leakage, increases, highlighting the growing importance of thorough chip testing. However, current leakage testing methods focus on detecting the existence of defects but cannot precisely locate the defective channels. This paper proposes the first coding methodology for the design of a built-in self-test (BIST) module that can precisely locate one or multiple leakage defects, generally with an extra cost of no more than two flow channels. Based on the new BIST module, a novel locating method is then proposed to reduce the number of test operations. For example, given 128 control channels and compared to the state-of-the-art approaches, our method reduces the average number of test operations by up to 85.9% and 76.6%, given one or two pair(s) of random leaky control channels, respectively.
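The sketch below illustrates the general idea of coding-based defect location in Python for the single-leak case: each control channel receives a distinct binary code, each test exercises one bit position, and the pass/fail outcomes spell out the leaky channel's code. It is a conceptual analogue only, not the paper's scheme, which also handles multiple simultaneous leaks with minimal extra flow channels.

```python
# Conceptual illustration of coding-based defect location (not the paper's
# exact scheme): give every control channel a distinct nonzero binary code;
# test j exercises the group whose j-th code bit is 1, and the vector of
# pass/fail outcomes spells out the code of a single leaky channel.
import math

def locate_single_leak(n_channels, leaky):
    k = math.ceil(math.log2(n_channels + 1))          # number of group tests
    code = lambda ch: ch + 1                          # nonzero code per channel
    outcome = 0
    for j in range(k):                                # run k group tests
        group = [c for c in range(n_channels) if (code(c) >> j) & 1]
        if leaky in group:                            # leak observed in test j
            outcome |= 1 << j
    return outcome - 1                                # decode back to channel id

assert locate_single_leak(128, leaky=42) == 42
print("located channel:", locate_single_leak(128, leaky=42),
      "using", math.ceil(math.log2(129)), "tests")
```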
3C-3
16:45-17:10

Accessible Ratio-Specific Mixing: Single-Pressure-Driven Multi-Reagent Mixer Design and Synthesis for 3D-Printed Microfluidics

*Yushen Zhang, Debraj Kundu, Tsun-Ming Tseng (Technical University of Munich), Sudip Roy (Indian Institute of Technology Roorkee), Shigeru Yamashita (Ritsumeikan University), Ulf Schlichtmann (Technical University of Munich)
Keywords
Microfluidics, Biochip, Lab-on-a-Chip, Sample Preparation, Design Synthesis, Design Automation, EDA, 3D-Printing, Bioengineering, Micromixing
Abstract
Precise reagent mixing in user-defined ratios is a fundamental requirement in many microfluidic applications, including diagnostics, chemical synthesis, and biological assays. However, existing solutions for ratio-specific mixing often rely on complex active components, such as multiple pressure sources, flow controllers, or on-chip valves, making them costly, bulky, and unsuitable for portable or low-resource settings. In this work, we present a mixer design and a synthesis method for generating 3D-printable microfluidic devices that achieve ratio-specific mixing using only a single constant pressure source. Our method decomposes the desired mixing ratio into additive subcomponents, each represented by a dedicated inlet channel with a tailored length to enforce the correct hydraulic resistance. The method outputs a complete microfluidic layout, ready for direct fabrication via 3D printers. We validate our approach through numerical simulations and physical prototyping across eight diverse mixing scenarios. Results show that the achieved mixing ratios closely resemble the target, demonstrating the method’s accuracy and robustness. This work enables low-cost, portable, and accessible microfluidic devices for ratio-specific solution delivery, broadening the scope of microfluidics in settings where simplicity, reproducibility, and affordability are critical.
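The following back-of-the-envelope Python sketch shows the underlying resistance argument: under a shared pressure source and equal cross-sections, the flow from each inlet scales inversely with channel length, so lengths chosen proportional to the reciprocals of the target ratio reproduce that ratio. The numbers and helper names are illustrative and do not reflect the paper's synthesis algorithm.

```python
# Back-of-the-envelope sketch of the resistance idea (not the paper's
# synthesis method): with one shared pressure source and equal channel
# cross-sections, hydraulic resistance grows with channel length, so the flow
# from inlet i scales as 1/L_i.  Choosing L_i proportional to 1/r_i therefore
# yields a mixing ratio r_1 : r_2 : ... at the junction.
import numpy as np

def channel_lengths(ratio, l_min=5.0):
    """Lengths (arbitrary units) realizing the target ratio; shortest = l_min."""
    r = np.asarray(ratio, dtype=float)
    lengths = 1.0 / r
    return lengths * (l_min / lengths.min())

def achieved_ratio(lengths):
    flows = 1.0 / np.asarray(lengths)          # Q_i proportional to 1/L_i
    return flows / flows.min()

target = [3, 2, 1]                             # desired 3:2:1 mix
L = channel_lengths(target)
print("lengths:", L, "-> achieved ratio:", achieved_ratio(L))
```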
3C-4
17:10-17:35

ENLighten: Lighten the Transformer, Enable Efficient Optical Acceleration

*Hanqing Zhu (UT Austin), Zhican Zhou (KAUST), Shupeng Ning (UT Austin), Xuhao Wu (KAUST), Ray Chen (UT Austin), Yating Wan (KAUST), David Pan (UT Austin)
Keywords
optical ai, transformer, prune, accelerator
Abstract
Photonic computing has emerged as a promising substrate for accelerating the dense linear-algebra operations at the heart of AI, but its adoption for large Transformer models remains in its infancy. In supporting these massive models, we identify two key bottlenecks: (1) costly electro-optic conversions and data-movement overheads that erode energy efficiency as model sizes scale; (2) a mismatch between limited on-chip photonic resources and the scale of Transformer workloads, which forces frequent reuse of photonic tensor cores and dilutes throughput gains. To address these challenges, we introduce a hardware-software co-design framework. First, we propose Lighten, a PTC-aware compression flow that post-hoc decomposes each Transformer weight matrix into a low-rank component plus a structured sparse component aligned to photonic tensor-core granularity, all without lengthy retraining. Second, we present ENLighten, a reconfigurable photonic accelerator architecture featuring dynamically adaptive tensor cores, driven by broadband light redistribution, for fine-grained sparsity support and full power gating of inactive parts. On ImageNet, Lighten prunes a Base-scale vision transformer by 50% with only a ~1% drop in top-1 accuracy after three epochs of fine-tuning, and when deployed on ENLighten it achieves a 2.5x improvement in energy-delay product over the previous state-of-the-art photonic Transformer accelerator.
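To illustrate the flavor of such a decomposition, the NumPy sketch below splits a weight matrix into a truncated-SVD low-rank part plus a block-structured sparse residual that keeps only the highest-energy tiles; the rank, tile size, and selection rule are arbitrary placeholders, not Lighten's actual settings.

```python
# Rough sketch of the "low-rank + structured sparse" post-hoc decomposition
# idea (illustration only): W ~= low_rank + S, where S keeps only the
# highest-energy blocks of the residual, matching a tensor-core tile size.
import numpy as np

def lighten_sketch(W, rank=8, block=16, keep_blocks=8):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]        # rank-`rank` part
    R = W - low_rank
    rows, cols = W.shape[0] // block, W.shape[1] // block
    tiles = R.reshape(rows, block, cols, block).transpose(0, 2, 1, 3)
    norms = np.linalg.norm(tiles, axis=(2, 3))             # per-block energy
    keep = np.argsort(norms, axis=None)[-keep_blocks:]     # top-energy blocks
    S = np.zeros_like(W)
    for idx in keep:
        i, j = divmod(idx, cols)
        S[i*block:(i+1)*block, j*block:(j+1)*block] = \
            R[i*block:(i+1)*block, j*block:(j+1)*block]
    return low_rank, S

W = np.random.default_rng(0).standard_normal((64, 64))
low, sparse = lighten_sketch(W)
print("relative error:", np.linalg.norm(W - low - sparse) / np.linalg.norm(W))
```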
3C-5
17:35-18:00

DCPPC: Digital Computation in Programmable Photonic Circuits

*Jun-Wei Liang, Iris Hui-Ru Jiang, Kai-Hsiang Chiu (National Taiwan University)
Keywords
Programmable photonic circuits, Digital computation, Boolean logic
Abstract
Photonic integrated circuits (PICs) have become a promising alternative to CMOS circuits due to their high speed and energy-saving characteristics. Programmable photonic circuits (PPCs), the field-programmable gate array (FPGA) counterpart of PICs, further offer reconfigurability and rapid integration. In this work, we propose a comprehensive methodology for digital computation on PPCs. We first construct the logic building blocks, which serve as unit cells. We devise a novel garbage collection scheme to resolve the signal distortion issue in these cells. A hybrid electronic-photonic scheme and three purely photonic schemes are further proposed to realize the PPC implementation flow of multilayer optical paths. Experimental validation confirms that the proposed framework successfully synthesizes all 3-input Boolean functions under NPN equivalence. We further extend our approach to support 4-input functions and evaluate its scalability through a case study on implementing a majority voting function, comparing our different proposed schemes.

Session 3D

(SS-3) Advances in AI-Driven Circuit Verification and Reliability Analysis
15:55-17:35 | Tuesday, January 20, 2026 | Sleeping Beauty 1/2
Chair(s):
Zhiyao Xie (The Hong Kong University of Science and Technology)
3D-1
15:55-16:20

SuperSAGA: A Supervisor-Subordinate Agentic Workflow for the Generation of Assertions

Subhajit Paul, *Ansuman Banerjee, Sumana Ghosh (Indian Statistical Institute), Sudhakar Surendran (Texas Instruments India), Raj Kumar Gajavelly (IBM Systems India)
Keywords
SystemVerilog Assertion, Agentic Workflow, Code Coverage, Formal Verification, Large Language Model
Abstract
We present SuperSAGA, an agentic semi-automated formal verification framework that assists in generating, debugging, and refining SystemVerilog Assertions (SVA) from natural language specifications. Rather than relying on fully manual workflows, SuperSAGA combines Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) to guide assertion development based on human-reviewed verification plans using an agentic workflow. The framework translates specifications into syntactically correct assertions, integrates feedback from formal verification tools, and supports iterative refinement using an orchestration of supervisor and subordinate agents. Evaluation on OpenTitan IP modules shows improved quantitative coverage over the state of the art and reduced manual effort, demonstrating the potential of guided automation in simplifying the assertion generation process for hardware designers.
3D-2
16:20-16:45

Understanding and Predicting Vmin Failures in Power Delivery Networks through Multi-Order Droop Signatures

Songyu Sun, Jingchao Hu, Zhou Jin, *Cheng Zhuo (Zhejiang University)
Keywords
Power Delivery Network, Voltage Droop, Minimum Supply Voltage, Power Integrity, Machine Learning
Abstract
As voltage margins continue to shrink in modern high-performance ICs, circuits become increasingly vulnerable to power supply noise, making the minimum supply voltage (Vmin) a critical metric for reliable operation. These voltage fluctuations arise from the multi-level characteristics of the power delivery network (PDN), whose frequency-dependent impedance induces multi-order voltage droops under dynamic loads. This paper presents a systematic framework for understanding and predicting Vmin failures in PDNs through multi-order droop signatures. We examine how varying input current profiles affect the relative impact of each PDN level and conduct a comprehensive statistical study to quantify the relationship between droop characteristics and multi-level contributions to Vmin. A machine-learning model is further developed to rapidly and accurately predict multilevel contribution ratios from input current profiles and droop signatures, offering insights into Vmin failures and facilitating efficient PDN optimization for improved power integrity.
3D-3
16:45-17:10

AssertMiner: Module-Level Spec Generation and Assertion Mining using Static Analysis Guided LLMs

Hongqin Lyu (Institute of Computing Technology, CAS; University of Chinese Academy of Sciences), Yonghao Wang (Institute of Computing Technology), Jiaxin Zhou (Beijing Normal University), Zhiteng Chao, Tiancheng Wang (Institute of Computing Technology), *Huawei Li (Institute of Computing Technology, CAS; University of Chinese Academy of Sciences)
Keywords
Functional Verification, Assertion Mining, Large Language Model, Specification Extraction
Abstract
Assertion-based verification (ABV) is a key approach to checking whether a logic design complies with its architectural specifications. Existing assertion generation methods based on design specifications typically produce only top-level assertions, overlooking verification needs on the implementation details in the modules at the micro-architectural level, where design errors occur more frequently. To address this limitation, we present AssertMiner, a module-level assertion generation framework that leverages static information generated from abstract syntax tree (AST) to assist LLMs in mining assertions. Specifically, it performs AST-based structural extraction to derive the module call graph, I/O table, and dataflow graph, guiding the LLM to generate module-level specifications and mine module-level assertions. Our evaluation demonstrates that AssertMiner outperforms existing methods such as AssertLLM and Spec2Assertion in generating high-quality assertions for modules. When integrated with these methods, AssertMiner can enhance the structural coverage and significantly improve the error detection capability, enabling a more comprehensive and efficient verification process.
3D-4
17:10-17:35

LLM-Assisted Circuit Verification: A Comprehensive Survey

Hongduo Liu, Yuntao Lu, Mingjun Wang, Xufeng Yao, *Bei Yu (The Chinese University of Hong Kong)
Keywords
LLM-Assisted Circuit Verification
Abstract
Circuit verification constitutes a significant bottleneck in modern electronic design, often consuming up to 70% of the development cycle due to the escalating complexity of circuits. The emergence of large language models (LLMs) presents a promising new frontier, demonstrating profound capabilities in code generation, debugging, and automated reasoning. This paper provides a comprehensive survey of LLM-assisted hardware verification. We systematically review state-of-the-art approaches that leverage LLMs for critical verification tasks, including assertion synthesis, testbench generation, automated debugging, and the development of collaborative verification frameworks. We analyze the effectiveness of various LLM strategies specifically designed for hardware verification applications, while also discussing the integration of LLMs with existing verification workflows and tools. Furthermore, the paper identifies key remaining challenges and outlines a forward-looking perspective on future research directions.

Session 3E

(T9-E) Timing Analysis and Timing-Aware Physical Synthesis
15:55-18:00 | Tuesday, January 20, 2026 | Sleeping Beauty 3
Chair(s):
Jeong-Tyng Li (National Tsing Hua University)
Yi-Yu Liu (National Taiwan University of Science and Technology)
3E-1
15:55-16:20

HeteroLatch: A CPU-GPU Heterogeneous Latch-Aware Timing Analysis Engine

*Xizhe Shi, Zizheng Guo, Yibo Lin, Zuodong Zhang, Yun Liang, Runsheng Wang (Peking University)
Keywords
Static timing analysis, CPU-GPU Heterogeneous, Latch-Aware
Abstract
Latches, prevalent in high-frequency circuits, challenge timing analysis due to time borrowing and latch loops, complicating static timing analysis (STA) algorithms and parallelization strategies. To address these issues, we propose HeteroLatch, a CPU-GPU cooperative framework that enables efficient latch-aware timing analysis. By integrating adaptive loop handling with hierarchical parallel timing propagation, our method mitigates sequential bottlenecks through CPU-GPU collaboration, hiding graph decomposition overhead via early termination, while optimizing GPU throughput with dynamic workload allocation. Experimental results demonstrate an average speed-up of 14.43x, 10.46x and 2.02x compared to industrial timers PrimeTime, OpenSTA and SOTA work, respectively. HeteroLatch bridges the gap between latch-specific timing complexities and GPU acceleration, offering a scalable solution for advanced-node verification.
3E-2
16:20-16:45

Novel Multi-Corner Delay Padding using Path Relationship Analysis and Dual Decomposition

*Kaixiang Zhu, Jiangnan Li, Lingli Wang, Wai-Shing Luk (Fudan University)
Keywords
Clock skew scheduling, Delay padding, Process variation, Dual decomposition
Abstract
Multi-corner timing analysis is essential for ensuring the robustness of circuits under variations in process, voltage, and temperature (PVT). Along with clock skew scheduling, delay padding is used to address hold violations. However, applying padding consistently across multiple corners is challenging due to conflicting constraints and the prevalence of "ping-pong" effects. This paper presents a novel methodology that uses dual decomposition to tackle this challenge. The problem is divided into a set of network flow problems, one for each corner. These problems are coupled through shared delay variables. Coordinating these subproblems using Lagrange multipliers ensures consistent padding assignments across corners. Additionally, traditional padding methods often struggle with physical feasibility. The incorporation of path relationship analysis is proposed to identify viable, physically feasible padding locations. Experimental results on industrial benchmarks demonstrate that the proposed method efficiently identifies feasible padding solutions and achieves the minimum clock period that satisfies the setup and hold time constraints for all corners. Compared to the single worst-case corner baseline, the optimized clock period is reduced by up to 9%, highlighting the effectiveness of our approach.
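As a toy illustration of dual decomposition, the Python sketch below couples simple quadratic per-corner subproblems through a shared delay variable and Lagrange-multiplier (price) updates; it conveys only the coordination mechanism and is not the paper's network-flow formulation.

```python
# Toy dual-decomposition sketch (illustration only): quadratic per-corner
# costs stand in for the per-corner network-flow subproblems, and the shared
# variable z plays the role of a padding delay that must agree across corners.
import numpy as np

targets = np.array([1.0, 3.0, 2.0])        # each corner's locally preferred delay
lam = np.zeros_like(targets)               # one Lagrange multiplier per corner
step = 0.5

for _ in range(100):
    # Each corner solves its own subproblem: argmin_x 0.5*(x - t)^2 + lam*x
    x = targets - lam
    z = x.mean()                           # consensus value of the shared delay
    lam += step * (x - z)                  # price update drives every x_c -> z

print("consensus delay:", z, "corner solutions:", x)
```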
3E-3
16:45-17:10

GNN-Based Timing Yield Prediction From Statistical Static Timing Analysis

*Chenbo Xi, Liang Zhang (ShanghaiTech University), Biwei Xie (Chinese Academy of Sciences), Pingqiang Zhou (ShanghaiTech University)
Keywords
Timing yield, Graph neural network, Statistical static timing analysis
Abstract
As one of the key indicators in digital IC design, timing yield is closely related to the topological correlation among individual timing paths. Timing yield analysis slows down the design cycle, as this time-consuming step needs to be performed repeatedly during post-routing optimization iterations. To improve the design efficiency, in this work, we propose a fast yet accurate GNN-based framework to predict the timing yield for the entire design from SSTA (statistical static timing analysis) results. The experimental results show that our method can achieve a speedup of more than one order of magnitude when compared with the conventional analysis flow.
3E-4
17:10-17:35

MIMIC: Machine Intelligence for Scalable Generation of Synthetic Timing Cone Datasets

*Ajay Yadav, Vinodh Kumar Ramasamy (Arizona State University), Juan Arturo Garza, Taylor Hannan, Mark Lee, Mahesh Sharma (Tenstorrent), Vidya A. Chhabria (Arizona State University)
Keywords
AI, synthetic data generation, timing cones, STA, dataset
Abstract
Many machine learning (ML)-based approaches have been proposed to predict and optimize timing. However, the effectiveness of these techniques is often limited by the lack of large-scale, diverse, and realistic datasets—particularly those that reflect the structural and timing complexities of industrial designs. In this work, we introduce MIMIC (Machine intelligence for scalable generation of synthetic timing cone datasets), a methodology for generating high-quality synthetic timing cones that mimic industry-scale netlist topologies and timing characteristics. The framework synthesizes a diverse dataset using a three-stage ML pipeline: (1) timing cone shape generation, (2) node type prediction for technology mapping, and (3) edge prediction to model internal cone connectivity. The generated cones closely resemble real designs. MIMIC produces synthetic timing cones for a large-scale netlist within a few seconds. We evaluate the dataset for realism and structural diversity using quantitative statistical metrics from both the EDA and ML worlds. We also demonstrate the effectiveness and generalizability of the MIMIC dataset for an ML timing prediction task.
3E-5
17:35-18:00

Differentiable Tier Assignment for Timing and Congestion-Aware Routing in 3D ICs

*Yuan-Hsiang Lu, Hao-Hsiang Hsiao (Georgia Institute of Technology), Yi-Chen Lu, Haoxing Ren (NVIDIA), Sung Kyu Lim (Georgia Institute of Technology)
Keywords
3D ICs, Congestion, Concurrent optimization, Metal layer sharing, Tier assignment
Abstract
State-of-the-art (SOTA) 3D physical design (PD) flows extend commercial 2D place-and-route (P&R) tools to enable signoff-quality 3D IC implementation through double metal stacking and inter-die metal layer sharing. While metal layer sharing introduces additional routing resources, the substantially higher manufacturing cost of face-to-face (F2F) inter-die vias compared to intra-die vias necessitates 3D-aware routing strategies to manage routability-cost trade-offs. To address this, we propose differentiable routing guidance for 3D ICs (DRG-3D), a GPU-accelerated differentiable optimization framework that provides routing guidance for 3D ICs. DRG-3D formulates a fully differentiable objective that simultaneously optimizes key 3D design metrics: routing congestion, wirelength, via cost, and F2F-via cost, which enables efficient and scalable gradient-based optimization over large-scale netlists. Experimental results show that DRG-3D outperforms the SOTA Pin-3D flow, achieving up to 8.37% reduction in routing overflow, 23.99% reduction in total negative slack (TNS), and 18.05% reduction in post-route timing violations.

Session 3F

(T8-B) Advanced Performance Optimization for High-level Synthesis and Scheduling
15:55-18:00 | Tuesday, January 20, 2026 | Sleeping Beauty 5
Chair(s):
Wei Zhang (The Hong Kong University of Science and Technology)
Zeke Wang (Zhejiang University)
3F-1
15:55-16:20

Automatic Recursion Elimination using Recurrence Relations for Synthesis of Stack-free Hardware

*Adam Musa, Christophe Dubach (McGill University)
Keywords
Recursion, High-Level Synthesis, Incrementalization, Automated Refactoring, Static Analysis, FPGA
Abstract
HLS eases hardware design by offering a higher level of abstraction. However, high-level programming concepts, such as recursion, are costly to synthesize, if at all possible. Recursion typically relies on a dynamic call stack, whose hardware implementation is resource-intensive and inefficient. Existing approaches solve this issue by replacing recursion with iteration using explicit stack arrays or by detecting specific patterns (e.g., tail recursion) to avoid using the stack. This paper introduces a novel technique for transforming recursive functions into equivalent stack-free iterative implementations. Using static analysis, a recurrence relation is extracted from the function, representing the function as a sequence bounded by the order of the recurrence relation. This relation is then used to optimize the process of incrementalization, constructing an iterative, synthesizable, and stack-free version of the function that uses a bounded static array. This approach is evaluated on a set of recursive benchmarks used in prior work. It eliminates recursion from 9 out of 19 benchmarks and achieves a 2.0x performance speedup over state-of-the-art solutions. Additionally, it removes the need for BRAM and reduces LUT usage by 12% over prior work.
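A minimal Python illustration of the underlying idea follows: a recursive function governed by an order-2 linear recurrence is rewritten as a loop over a bounded window of previous values, removing the call stack. The example is ours, not the paper's transformation pass.

```python
# Simplified illustration of the idea (not the paper's static-analysis pass):
# a recursive function whose calls follow a linear recurrence of order k can
# be rewritten as a loop over a bounded window of k previous values, removing
# the call stack and the unbounded storage, which makes it synthesizable.

def f_recursive(n):                 # order-2 recurrence: f(n) = f(n-1) + f(n-2)
    if n < 2:
        return 1
    return f_recursive(n - 1) + f_recursive(n - 2)

def f_iterative(n):
    window = [1, 1]                 # bounded static array, size = recurrence order
    for _ in range(n - 1):
        window[0], window[1] = window[1], window[0] + window[1]
    return window[1] if n >= 1 else window[0]

assert all(f_recursive(n) == f_iterative(n) for n in range(15))
```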
3F-2
16:20-16:45

FIFOAdvisor: A DSE Framework for Automated FIFO Sizing of High-Level Synthesis Designs

Stefan Abi-Karam, Rishov Sarkar (Georgia Institute of Technology), Suhail Basalama, Jason Cong (University of California, Los Angeles), *Callie Hao (Georgia Institute of Technology)
Keywords
FPGA, HLS, DSE, Dataflow Architecture, Streaming, Optimization
Abstract
Dataflow hardware designs are important for efficient algorithm implementations across various domains using high-level synthesis (HLS) targeting FPGAs. However, these designs pose a challenge: correctly and optimally sizing first-in-first-out (FIFO) channel buffers. FIFO sizes are user-defined parameters, introducing a trade-off between latency and area—undersized FIFOs cause stalls and increase latency, while oversized FIFOs waste on-chip memory. In many cases, insufficient FIFO sizes can also lead to deadlocks. Deciding the best FIFO sizes is non-trivial. Existing methods make limiting assumptions about FIFO access patterns, overallocate FIFOs conservatively, or use time-consuming RTL simulations to evaluate different FIFO sizes. Furthermore, we highlight that runtime-based analyses (i.e., simulation) are the only way to solve the FIFO optimization problem while ensuring a deadlock-free solution for designs with data-dependent control flow. To tackle this challenge, we propose FIFOAdvisor, a framework to automatically decide FIFO sizes in HLS designs. Our approach is powered by LightningSim, a fast simulator that is 99.9% cycle-accurate and supports millisecond-scale incremental simulations with new FIFO configurations. We formulate FIFO sizing as a dual-objective black-box optimization problem and explore various heuristic and search-based methods to analyze the latency-resource trade-off. We also integrate FIFOAdvisor with Stream-HLS, a recent framework for optimizing affine dataflow designs lowered from C++, MLIR, or PyTorch, enabling deeper optimization of the heavily-used FIFOs in these workloads. We evaluate FIFOAdvisor on a suite of Stream-HLS benchmarks, including linear algebra and deep learning workloads, to demonstrate our approach's ability to optimize large and dynamic dataflow patterns. Our results show Pareto-optimal latency-memory usage frontiers for FIFO configurations generated via different optimization strategies. Compared to baseline designs with naïvely-sized FIFOs, FIFOAdvisor identifies configurations with much lower memory usage and minimal delay overhead. Additionally, we measure the runtime of our optimization process and demonstrate significant speedups compared to traditional HLS/RTL co-simulation-based approaches, making FIFOAdvisor practical for rapid design space exploration. Finally, we present a case study using FIFOAdvisor to optimize a complex hardware accelerator with non-trivial data-dependent control flow. Code and results open-sourced at https://anonymous.4open.science/r/fifo-advisor.
3F-3
16:45-17:10

HLS-Timer: Fine-Grained Path-Level Timing Estimation for High-Level Synthesis

*Zibo Hu (Beijing University of Posts and Telecommunications), Zhe Lin (Sun Yat-sen University), Renjing Hou, Xingyu Qin, Jianwang Zhai, Kang Zhao (Beijing University of Posts and Telecommunications)
Keywords
High-Level Synthesis, Timing Estimation, Data Delay, Timing Path
Abstract
Electronic Design Automation (EDA) requires early-stage timing guidance to maximize optimization potential. Accurate timing estimation is essential in the High-Level Synthesis (HLS) stage. However, current HLS tools often produce inaccurate timing predictions, resulting in unmet performance targets. While post-synthesis EDA toolchains can provide precise timing analysis, their exhaustive methodologies are prohibitively time-consuming and computationally expensive. Recent machine learning approaches have demonstrated promising results in predicting design-level timing metrics in HLS designs, such as Worst Negative Slack (WNS) and Critical Path (CP) delay. Nevertheless, fine-grained, path-level timing estimation remains an unresolved challenge. In this work, we present HLS-Timer, the first path-level timing estimator for HLS. The proposed framework employs a graph-based representation of the HLS design, integrating local structural features with global contextual information to model timing paths and provide accurate, fine-grained delay predictions. Experimental results demonstrate that on previously unseen designs HLS-Timer achieves exceptional accuracy in path-level delay estimation (Pearson R = 0.94, R² = 0.93, MAPE = 18.96%), highlighting its strong generalization capability. Furthermore, it surpasses state-of-the-art baselines in design-level timing prediction, reducing MAPE to 9.97% for WNS and 6.79% for CP delays.
3F-4
17:10-17:35

FESTAL: Dataflow Accelerator Synthesis Framework with Graph-Based Fusion for FPGA

*Ruifan Xu, Yuyang Zou, Yun Liang (Peking University)
Keywords
High-level Synthesis, MLIR, Dataflow Architecture, FPGA, Streaming
Abstract
High-Level Synthesis (HLS) provides a promising approach to design hardware at the software level. However, recent research efforts primarily focus on computational optimization while assuming a perfect memory system. As a result, issues such as limited on-chip buffer capacity and high-latency off-chip memory access frequently become performance bottlenecks. Dataflow architectures address this by enabling parallel task execution with direct on-chip communication, reducing the need for external memory access. However, dataflow implementation presents significant challenges, such as determining inter-task communication and balancing compute and memory resources. A comprehensive modeling approach is necessary to fully leverage the benefits of dataflow for enhanced hardware performance. In this paper, we present Festal, a holistic FPGA synthesis framework that automatically generates efficient dataflow accelerators. Festal introduces a novel graph-based algorithm that systematically explores task fusion opportunities, optimizing inter-task communication patterns entirely on-chip and thereby reducing the need for off-chip memory access. By explicitly modeling memory constraints, the framework achieves a critical balance between computational workload and memory resources. Built on the MLIR infrastructure, Festal proposes a two-level IR to model memory management and streaming channels, providing an efficient solution for dataflow designs. Experimental results show that Festal achieves an average speedup of 2.06X on standard benchmark suites, outperforming the state-of-the-art synthesis framework. For real-world applications, Festal demonstrates performance comparable to custom FPGA accelerators, underscoring its practical effectiveness.
3F-5
17:35-18:00

ARCS: Architecture-Responsive CGRA Scheduling

*Omar Ragheb (Fujitsu Consulting (Canada) Inc., University of Toronto), Jason Anderson (University of Toronto)
Keywords
CGRAs, Mapping, Scheduling
Abstract
Scheduling is a key aspect of mapping applications to coarse-grained reconfigurable architectures (CGRAs). During scheduling, the number of pipeline registers on each path is determined to ensure that the data for an operation arrives at the correct cycle. Traditional scheduling methods, such as as-soon-as-possible (ASAP) and as-late-as-possible (ALAP), determine the pipeline registers without considering the placement of operations. This can restrict the mapping algorithm, forcing placement to accommodate scheduling and limiting routing options to satisfy both scheduling and placement. To overcome the limitations of traditional schedulers, we propose a method that adaptively adjusts the schedule during the initial routing phase. In this approach, the mapping algorithm begins with an ASAP schedule to establish initial schedule constraints. We then utilize simulated annealing for placement, and we employ an architecture-responsive scheduling algorithm post-placement to update the schedule of each edge based on the placement and generate an initial routing solution with overlaps. Afterwards, the PathFinder algorithm is applied to the initial routing solution, along with the generated schedule, to find a valid routing with no overlaps. Our results demonstrate that the architecture-responsive scheduling approach maintains quality comparable to that of conventional ASAP scheduling. Furthermore, architecture-responsive scheduling enables generic mapping of applications onto restricted architectures that do not allow routes to bypass pipeline registers, a challenge that traditional schedulers do not address.
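For context, the short Python sketch below computes a plain ASAP schedule over a small dataflow graph (each operation starts one cycle after its latest predecessor); it shows the baseline the proposed scheduler adapts, not the architecture-responsive algorithm itself.

```python
# Minimal ASAP scheduling of a dataflow graph, the baseline the proposed
# architecture-responsive scheduler starts from (illustrative sketch; the
# paper's CGRA-specific pipeline-register handling is not modeled).
from graphlib import TopologicalSorter

def asap_schedule(edges, nodes):
    """Cycle of each op = 1 + latest cycle among its predecessors."""
    preds = {n: [] for n in nodes}
    deps = {n: set() for n in nodes}
    for u, v in edges:
        preds[v].append(u)
        deps[v].add(u)
    cycle = {}
    for n in TopologicalSorter(deps).static_order():
        cycle[n] = 0 if not preds[n] else 1 + max(cycle[p] for p in preds[n])
    return cycle

nodes = ["load_a", "load_b", "mul", "add", "store"]
edges = [("load_a", "mul"), ("load_b", "mul"), ("mul", "add"),
         ("load_b", "add"), ("add", "store")]
print(asap_schedule(edges, nodes))
# {'load_a': 0, 'load_b': 0, 'mul': 1, 'add': 2, 'store': 3}
```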

Keynote Session II

Keynote Addresses
08:20-09:50 | Wednesday, January 21, 2026 | Cinderella Ballroom 1/6/7/8
Yuan Xie
Fang Professor of Engineering
Chair Professor, Department of Electronic and Computer Engineering
The Hong Kong University of Science and Technology
08:20-09:05
Keynote Address

Déjà Vu: From 3D to Chiplet and PIM/NDP — A Historical Perspective

Biography
Dr. Yuan Xie is currently with The Hong Kong University of Science and Technology as Chair Professor in the ECE department and FANG Professor of Engineering. He received a B.S. degree in Electronic Engineering from Tsinghua University and a Ph.D. degree in Computer Engineering from Princeton University. Before joining HKUST, he gained rich industry and academic experience: he was with Alibaba DAMO Academy and T-Head Semiconductor, was a Professor at the University of California, Santa Barbara (UCSB) and at Pennsylvania State University, worked at AMD Research, and was an Advisory Engineer with IBM Microelectronics. Yuan Xie is a Fellow of IEEE, ACM, and AAAS, and a recipient of many awards.
Abstract
In this talk, the speaker reflects on a career journey marked by exploration at the intersection of technology and architecture, transitioning between academia and industry. The discussion highlights how technological advancements can drive architectural innovation, while architectural choices, in turn, influence the adoption and evolution of new technologies. Drawing from personal experience, the speaker offers a historical perspective on the dynamic interplay between 3D integration, chiplet-based design, and Processing-in-Memory (PIM) / Near-Data Processing (NDP) paradigms.
Charles Alpert
Cadence AI Fellow
Cadence Design Systems, Inc.
09:05-09:50
Keynote Address

Harnessing Agentic AI to Accelerate Designer Productivity

Biography
Dr. Charles (Chuck) Alpert is Cadence's AI Fellow and drives cross-functional Agentic AI solutions throughout Cadence's software stack. Prior to this, he led various pioneering teams in digital implementation, including Global Routing, Clock Tree Synthesis, Genus Synthesis, and Cerebrus AI. Charles has published over 100 papers and received over 100 patents in the EDA space. He is a Cadence Master Inventor. He has served as Deputy Editor-in-Chief for IEEE Transactions on Computer-Aided Design, chaired the IEEE/ACM Design Automation Conference, and is an IEEE Fellow. He received B.S. and B.A. degrees from Stanford University and a Ph.D. in Computer Science from UCLA.
Abstract
As the complexity of chip designs continues to escalate and design cycles shrink, the demand for enhancing designer productivity becomes imperative. Currently, designers are entrenched in manual tasks such as writing RTL, creating verification test plans, and arduously debugging physical design flows. The industry is eagerly turning to agentic AI to elevate the abstraction level for design engineers. This talk delves into how to leverage frontier models to address vexing EDA problems and outlines the challenges ahead. By harnessing the power of agentic AI, we can accelerate the design process, reduce manual effort, and optimize outcomes, meeting the growing demands of the industry.

Session 4A

(T4-B) AI for Hardware, Systems, and Verification
10:20-12:00 | Wednesday, January 21, 2026 | Snow White 1
Chair(s):
Ting-Jung Lin (Ningbo Institute of Digital Twin, EIT)
Heechun Park (UNIST)
4A-1
10:20-10:45

CoLoRA: A Collaborative Scheduling Framework for Multi-Tenant LoRA LLM Inference

*Zechao Lin, Xingbin Wang, Yiming Xie, Dan Meng, Rui Hou (Institute of Information Engineering, Chinese Academy of Sciences)
Keywords
Multi-Tenant LoRA LLM Inference, Adaptive Priority Scheduling, Adapter-Aware Scheduling, Load-Aware Batch Scheduling, Unified Scheduler
Abstract
Large Language Models (LLMs) incur substantial resource costs during inference, driving widespread interest in Parameter-Efficient Fine-Tuning (PEFT) techniques. Among these, LoRA dramatically reduces overhead by updating only a few low-rank adapters. However, multi-tenant LoRA LLM inference faces challenges from heterogeneous requests and latency-throughput trade-offs. Moreover, inefficient adapter reuse, poor cache management, and non-adaptive batching strategies severely restrict inference efficiency, service quality, resource utilization, and fairness. To address these challenges, we propose CoLoRA—a collaborative scheduling framework for multi-tenant LoRA LLM inference, comprising four core modules: (1) Adaptive Priority Scheduling (APS), which dynamically integrates queue waiting time, adapter residency status, and SLA urgency to compute task priorities; (2) Adapter-Aware Scheduling (AAS), which enhances cache management by prioritizing SLA-critical, frequently used, and fairly shared adapters, thus reducing cold-start latency and fragmentation; (3) Load-Aware Batch Scheduling (LBS), which combines real-time GPU utilization and queue depth to adaptively form batches and coalesce tasks targeting the same adapter, thereby improving parallelism while controlling latency; and (4) a Unified Scheduler (US), which periodically gathers system metadata to orchestrate the submodules collaboratively and employs a feedback loop to optimize global strategies online. Evaluation on realistic multi-tenant workloads and popular open-source LLMs shows that CoLoRA, compared to conventional baselines, increases overall system throughput by 56.5%, reduces P95 latency of online requests by 34%, and significantly enhances GPU utilization and tenant-level fairness, demonstrating its promise for large-scale inference services.
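As a rough illustration of such priority scheduling, the Python sketch below combines waiting time, adapter residency, and SLA slack into a single score and dispatches the most urgent request first; the weights and formula are invented for illustration and are not CoLoRA's.

```python
# Hypothetical sketch of the kind of priority score an adaptive scheduler
# might compute from waiting time, adapter residency, and SLA urgency (the
# weights and formula here are made up for illustration, not CoLoRA's).
import heapq, time

def priority(req, now, w_wait=1.0, w_resident=0.5, w_sla=2.0):
    wait = now - req["arrival"]                       # seconds spent in queue
    resident = 1.0 if req["adapter_cached"] else 0.0  # adapter already on GPU?
    slack = max(req["sla_deadline"] - now, 1e-3)      # time left before SLA miss
    return w_wait * wait + w_resident * resident + w_sla / slack

now = time.time()
requests = [
    {"id": "A", "arrival": now - 2.0, "adapter_cached": True,  "sla_deadline": now + 5.0},
    {"id": "B", "arrival": now - 0.5, "adapter_cached": False, "sla_deadline": now + 0.8},
]
# heapq is a min-heap, so push negative priorities to pop the most urgent first.
heap = [(-priority(r, now), r["id"]) for r in requests]
heapq.heapify(heap)
print("dispatch order:", [heapq.heappop(heap)[1] for _ in range(len(heap))])
```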
4A-2
10:45-11:10

AutoVeriFix: Automatically Correcting Errors and Enhancing Functional Correctness in LLM-Generated Verilog Code

*Yan Tan (The Hong Kong University of Science and Technology (Guangzhou)), Xiangchen Meng, Zijun Jiang, Yangdi Lyu (The Hong Kong University of Science and Technology (Guangzhou))
Keywords
LLM-Generated Verilog, Functional Correctness, Automated Testing
Abstract
Large language models (LLMs) have demonstrated impressive capabilities in generating software code for high-level languages like Python and C++. However, their application to hardware description languages (HDLs), such as Verilog, is challenging due to the scarcity of high-quality training data. Current approaches to Verilog code generation with LLMs often focus on syntactic correctness, resulting in code with functional errors. To address these challenges, we present AutoVeriFix, a novel Python-assisted two-stage framework designed to enhance the functional correctness of LLM-generated Verilog code. In the first stage, LLMs are employed to generate high-level Python reference models that define the intended circuit behavior. In the second stage, these Python models facilitate the creation of automated tests that guide the generation of Verilog RTL implementations. Simulation discrepancies between the reference model and the Verilog code are iteratively used to fix errors and improve the LLM-generated Verilog code, thereby improving functional accuracy and reliability. Experimental results demonstrate that our approach significantly outperforms existing state-of-the-art methods in improving the functional correctness of Verilog code.
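The Python sketch below conveys the reference-model idea in miniature: a golden model of a saturating 8-bit adder is checked against a placeholder `simulate_rtl` function over random vectors; in the actual framework that placeholder would be an RTL simulation of the LLM-generated Verilog.

```python
# Tiny sketch of the "Python reference model checks the Verilog" idea: a
# golden model of a saturating 8-bit adder and a comparison loop over test
# vectors.  `simulate_rtl` is a placeholder for an actual Verilog simulation
# run, which this sketch does not perform.
import random

def golden_sat_add8(a, b):
    """Reference behaviour: 8-bit add that saturates at 255."""
    return min(a + b, 255)

def simulate_rtl(a, b):
    # Placeholder: in a real flow this would invoke the simulator on the
    # LLM-generated RTL and read back the result.
    return min(a + b, 255)

mismatches = []
for _ in range(1000):
    a, b = random.randrange(256), random.randrange(256)
    if simulate_rtl(a, b) != golden_sat_add8(a, b):
        mismatches.append((a, b))
print("mismatching vectors:", mismatches[:5], "total:", len(mismatches))
```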
4A-3
11:10-11:35

HDLxGraph: Bridging Large Language Models and HDL Repositories via HDL Graph Databases

*Pingqing Zheng, Jiayin Qin, Fuqi Zhang, Niraj Chitla (University of Minnesota, Twin Cities), Zishen Wan (Georgia Institute of Technology), Shang Wu (Northwestern University), Yu Cao, Caiwen Ding, Yang Zhao (University of Minnesota, Twin Cities)
Keywords
Graph RAG, Hardware description language, Semantic code search
Abstract
Large Language Models (LLMs) have demonstrated their potential in hardware design tasks, such as Hardware Description Language (HDL) generation and debugging. Yet, their performance in real-world, repository-level HDL projects with thousands or even tens of thousands of code lines is hindered. To this end, we propose HDLxGraph, a novel framework that integrates Graph Retrieval Augmented Generation (Graph RAG) with LLMs, introducing HDL-specific graph representations by incorporating Abstract Syntax Trees (ASTs) and Data Flow Graphs (DFGs) to capture both code graph view and hardware graph view. HDLxGraph utilizes a dual-retrieval mechanism that not only mitigates the limited recall issues inherent in similarity-based semantic retrieval by incorporating structural information, but also enhances its extensibility to various real-world tasks by a task-specific retrieval adaption. Additionally, to address the lack of comprehensive HDL search benchmarks, we introduce HDLSearch, a multi-granularity evaluation dataset derived from real-world repository-level projects. HDLxGraph improves search, debugging, and completion accuracy by 12.04%, 12.22%, 5.04% and 11.59%, 8.18%, 4.07% over similarity-based RAG and SOTA Graph RAG, respectively. The code of HDLxGraph and HDLSearch benchmark are available at https://anonymous.4open.science/r/HDLxGraph-87CC/.
4A-4
11:35-12:00

Chat-A^2: An LLM-aided Design Space Exploration Framework for High-Performance CPU Design

*Zhantong Zhu, Kangbo Bai, Tianyu Jia (Peking University)
Keywords
Design space exploration, High-performance CPU, LLM-aided methodology, CPU microarchitecture
Abstract
Multi-objective design space exploration (DSE) for complex high-performance CPUs presents a significant challenge due to extensive parameter ranges and the vastness of the design space. Prior works either overlook microarchitectural information during DSE or necessitate intricate analysis for power, performance and area (PPA) evaluations. In this work, we present a novel DSE methodology for CPU, which leverages large language models (LLMs) as an assistive tool to accelerate and automate the DSE flow. We develop an LLM-aided architecture-oriented DSE framework, i.e. Chat-A^2 with satisfactory effectiveness and flexibility. Chat-A^2 is able to quickly explore 98% of the Pareto hypervolume covered by the real Pareto optimal set without utilizing pre-sampled datasets or complex, customized analytical mechanisms. Experiments on an open-source high-performance RISC-V XiangShan CPU conclude that Chat-A^2 can obtain up to 19% normalized performance improvement while achieving 21.3% normalized area reduction compared to default CPU architecture optimized by professional human architects.

Session 4B

(T3-D) Accelerating LLMs with Near- and In-Memory Computing
10:20-12:00 | Wednesday, January 21, 2026 | Snow White 2
Chair(s):
Shanshi Huang (The Hong Kong University of Science and Technology (Guangzhou))
Shanlin Xiao (Sun Yat-sen University)
4B-1
10:20-10:45

BitROM: Weight Reload-Free CiROM Architecture Towards Billion-Parameter 1.58-bit LLM Inference

*Wenlun Zhang (Keio University), Xinyu Li (Nanjing University), Shimpei Ando, Kentaro Yoshioka (Keio University)
Keywords
Compute-in-Memory, Read-Only-Memory, BitNet, eDRAM, KV-Cache, Large Language Model
Abstract
Compute-in-Read-Only-Memory (CiROM) accelerators offer outstanding energy efficiency for CNNs by eliminating runtime weight updates. However, their scalability to Large Language Models (LLMs) is fundamentally constrained by their vast parameter sizes. Notably, LLaMA-7B—the smallest model in LLaMA series—demands more than 1,000 cm2 of silicon area even in advanced CMOS nodes. This paper presents BitROM, the first CiROM-based accelerator that overcomes this limitation through co-design with BitNet’s 1.58-bit quantization model, enabling practical and efficient LLM inference at the edge. BitROM introduces three key innovations: 1) a novel Bidirectional ROM Array that stores two ternary weights per transistor; 2) a Tri-Mode Local Accumulator optimized for ternary-weight computations; and 3) an integrated Decode-Refresh (DR) eDRAM that supports on-die KV-cache management, significantly reducing external memory access during decoding. In addition, BitROM integrates LoRA-based adapters to enable efficient transfer learning across various downstream tasks. Evaluated in 65nm CMOS, BitROM achieves 20.8 TOPS/W and a bit density of 4,967 kB/mm2—offering a 10x improvement in area efficiency over prior digital CiROM designs. Moreover, the DR eDRAM contributes to a 43.6% reduction in external DRAM access, further enhancing deployment efficiency for LLMs in edge applications.
4B-2
10:45-11:10

BALANCE: Bit and Layer-Aware Lightweight ECC Design Method for In-Flash Computing Based LLM Inference Accelerator

*Changwei Yan, Lishuo Deng, Mingbo Hao, Cai Li, Weiwei Shan (Southeast University)
Keywords
In-Flash Computing, Large Language Model Accelerator, Error Correction Code, NAND flash errors
Abstract
The shift of Large Language Model (LLM) inference to edge devices demands efficient hardware solutions that overcome memory, power, and computational constraints. In-Flash Computing (IFC) using NAND flash provides an energy-efficient alternative but requires robust Error Correction Codes (ECC), severely limiting logic area and power budgets. This paper presents BALANCE, a lightweight ECC co-design strategy tailored for IFC-based LLM accelerators. BALANCE introduces (1) a fine-grained bit-level significance evaluation, (2) a layer-level sensitivity assessment leveraging cosine similarity of residual connections, and (3) a stratified ECC allocation algorithm optimizing flash memory placement. Experimental evaluations on LLaMA3-8B demonstrate that, compared to state-of-the-art solutions Lincoln and AiF, BALANCE reduces ECC overhead, achieving area savings of 275x and 62x, and power reductions of 899x and 321x, respectively. This is accomplished while delivering 6% and 5.7% higher code rates and maintaining accuracy degradation within 0.5%. To our knowledge, BALANCE is the first work to systematically integrate LLM error sensitivity with NAND flash physics, enabling a new class of highly efficient and powerful IFC accelerators for the edge.
4B-3
11:10-11:35

BLADE: Boosting LLM Decoding's Communication Efficiency in DRAM-based PIM

*Yilong Zhao, Fangxin Liu, Zongwu Wang, Mingjian Li (Shanghai Jiao Tong University), Mingxing Zhang (Tsinghua University), Chixiao Chen (Fudan University), Li Jiang (Shanghai Jiao Tong University)
Keywords
Processing-in-Memory (PIM), Large Language Models (LLMs), Dynamic Parallelism, Transpose
Abstract
In recent years, the application of Large Language Models (LLMs) has grown rapidly. LLM inference consists of two stages: the prefill stage and the decoding stage. The prefill stage benefits from high data reuse, allowing GPUs to efficiently utilize computational resources. In contrast, the decoding stage is memory-bound and is more suited for Processing-in-Memory (PIM) techniques. PIM integrates computation units into memory banks to optimize the usage of internal memory bandwidth. However, the limited external bandwidth of PIM creates bottlenecks in two ways. First, PIM systems require high parallelism to fully utilize internal bandwidth, resulting in significant bank-to-bank communication. Second, the value cache must be arranged contiguously along the sequence length dimension to maximize DRAM row-buffer hits, which introduces additional transpose overhead. In this work, we propose BLADE, a novel PIM-based architecture designed to accelerate LLM decoding. First, we introduce a task division strategy for multi-head attention (MHA) layers and dynamic PIM parallelism scaling to optimize the balance between computation and communication time. This approach adapts to the increasing sequence length during the decoding process. Second, we leverage the differing DRAM access granularities of CPUs and PIM units to automatically arrange the transposed matrix contiguously in DRAM rows during value cache transfers. Extensive experiments demonstrate that our architecture can significantly reduce the communication overhead and achieve a 105.7x speedup and 41.6x energy efficiency compared to the GPU baseline.
4B-4
11:35-12:00

SpAct-NDP: Efficient LLM Inference via Sparse Activation on NDP-GPU Heterogeneous Architecture

*Jiaming Xu (Shanghai Jiao Tong University), Tongxin Xie (Tsinghua University), Yongkang Zhou, Jinhao Li, Yaoxiu Lian (Shanghai Jiao Tong University), Zhenhua Zhu, Yu Wang (Tsinghua University), Guohao Dai (Shanghai Jiao Tong University)
Keywords
Near-data Processing, LLM, GPU
Abstract
Sparse activation is caused by the activation function (e.g., ReLU) in the feed-forward network (FFN) of large language models (LLMs), and has recently emerged as a promising method for accelerating LLM inference in resource-constrained scenarios by effectively reducing computational workload and memory requirements with >80% predicted dynamic sparsity. In this paper, we identify that heavy and dynamic data transfer is the primary reason for the significant synchronization and poor GPU utilization during the decoding phase of LLM inference with sparse activation, and propose to apply the near-data-processing (NDP) architecture to handle the dynamic sparse activation, while addressing three critical challenges for further NDP-GPU collaboration optimization: (1) under-utilization of DRAM bandwidth during memory access of NDP; (2) workload imbalance across channels during computation of NDP; (3) time-consuming parsing of the predicted sparse pattern during NDP-GPU collaboration. To tackle the above challenges, we present SpAct-NDP, an NDP-GPU heterogeneous architecture for efficient LLM inference with sparse activation. (1) For memory access during NDP, we design a sparsity-aware weight mapping strategy that considers the characteristics of sparse activation to improve DRAM bandwidth utilization by balancing bank workload and eliminating redundant memory access. (2) For computation during NDP, we propose a two-level heuristic scheduling system to achieve channel-wise workload balance. (3) For NDP-GPU collaboration, we point out that parsing the predicted sparse pattern is better suited to GPUs with high parallelism and propose a request-weight pair parsing mechanism based on the input requests and sparse pattern on the GPU, reducing execution time by ∼3x and memory usage by ∼9x. Experiments show that SpAct-NDP achieves up to 2.17x and 1.92x end-to-end speedup and 1.53x and 1.45x energy efficiency compared with the SOTA software frameworks for LLMs with sparse activation on NVIDIA RTX 3090 and NVIDIA Tesla A100.

Session 4C

(T11-C) Tackling Reliability Issues across the Layers
10:20-11:35 | Wednesday, January 21, 2026 | Snow White 3
Chair(s):
Michihiro Shintani (Kyoto Institute of Technology)
Yutaka Masuda (Nagoya University)
4C-1
10:20-10:45

Quantifying Compiler-induced Reliability Loss in Software-Implemented Hardware Fault Tolerance

*Davide Baroffio (Politecnico di Milano), Johannes Geier (Technical University of Munich), Federico Reghenzani (Politecnico di Milano), Ulf Schlichtmann (Technical University of Munich), William Fornaciari (Politecnico di Milano)
Keywords
SIHFT, reliability, optimizations, RTL
Abstract
Compiler mechanisms for Software-Implemented Hardware Fault Tolerance (SIHFT) offer a cost-effective solution for reliability, paving the way towards the adoption of Commercial Off-The-Shelf (COTS) components in safety-critical environments. However, default compiler optimizations can remove the SIHFT-induced redundancy and checks. For this reason, the use of compiler optimizations has been discouraged in the literature. This article presents a comprehensive study of the reliability degradation introduced by LLVM's O2 optimization pipeline when using a state-of-the-art SIHFT tool. We quantify, via RTL fault injection, the impact of O2 at different optimization stages, identifying an increase in data corruption rate of up to 48x. We also propose a static exploration methodology to identify the LLVM passes that harm reliability. We then remove these harmful passes from the optimization pipeline, demonstrating how to tune optimization pipelines to make SIHFT successful even in the presence of compiler optimizations.
4C-2
10:45-11:10

WARP: Workload-Aware Reference Prediction for Reliable Multi-Bit FeFET Readout under Charge-Trapping Degradation

*Dhruv Thapar, Ashish Reddy Bommana, Arjun Chaudhuri (Arizona State University), Kai Ni (University of Notre Dame), Krishnendu Chakrabarty (Arizona State University)
Keywords
Ferroelectric FET (FeFET), Multi-level Cell, Read Reliability, Charge-Trapping, In-field Monitoring
Abstract
Ferroelectric FET (FeFET)-based arrays are promising candidates for energy-efficient, high-density non-volatile memory in data-intensive applications. However, charge-trapping-induced degradation and process variations pose significant reliability challenges. These effects lead to reduced memory window and degraded read accuracy over time. We propose a workload-aware degradation modeling and readout framework for FeFET arrays. First, we select a small set of representative workloads to efficiently capture degradation trends across a large workload space. We apply a two-step method to reduce read error: (a) adjust intermediate state currents to widen the separation between states; (b) select optimum reference thresholds based on the shifted distributions. Next, we perform detailed trade-off analysis involving degradation improvement, on-chip area, and the overhead of a memory-mapped CPU polling system for in-field workload tracking. Our framework improves read reliability with minimum hardware overhead and enables scalable in-field monitoring for future FeFET-based systems.
4C-3
11:10-11:35

PV-ReCAM: Process Variation-Aware Testing for ReRAM-based Content Addressable Memory

*Haneen G. Hezayyin, Mahta Mayahinia, Mehdi Tahoori (Karlsruhe Institute of Technology)
Keywords
Computation-in-Memory, Non-volatile memories, Redox-based RAM, Process variations, Testing
Abstract
Computation-in-Memory (CiM) is a promising solution to reduce the energy and latency caused by frequent data transfers between the processor and memory, a problem commonly referred to as the memory wall. This problem becomes even more serious in data-intensive applications, where comparing binary patterns to measure similarity is a common and demanding task. This can be implemented efficiently using Content Addressable Memory (CAM), which is well-suited for CiM-based acceleration of such tasks. To improve energy efficiency and performance, nonvolatile memories (NVM) such as ReRAM (Redox-based RAM) can be utilized for the realization of CiM-based CAM. However, ReRAM is highly susceptible to process variations (PV), due to the immaturity of its process and inherent stochasticity. Moreover, the analog realization of CAM functionality using NVMs makes it more sensitive to these non-idealities. Conventional March tests, originally designed for memory fault detection, become ineffective in the presence of PV, which can alter ReCAM behavior and lead to test escapes. To address these challenges, this work systematically analyzes the impact of PV on ReCAM functionality. It proposes a generalized PV-aware March test that optimizes test patterns for both hard defects and PV-induced soft defects, achieving 100% defect coverage.

Session 4D

(SS-1a) Mixed-precision: Silicon-to-Model Turbo Knob
10:20-11:35 | Wednesday, January 21, 2026 | Sleeping Beauty 1/2
Chair:
Li Jiang (Shanghai Jiao Tong University & Shanghai Qi Zhi Institute)
4D-1
10:20-10:45

When Posit Meets Microscaling: Energy Efficient Posit-Based Processing Element for Edge AI Computation

Yulin Wang (Ocean University of China), Seok-Bum Ko (University of Saskatchewan), Qi Wen (Ocean University of China; Shandong Key Laboratory of Intelligent Sensing Chips and System), Zhiqiang Wei (Ocean University of China; Qingdao University), *Hao Zhang (Ocean University of China; Shandong Key Laboratory of Intelligent Sensing Chips and System)
Keywords
Posit arithmetic, microscaling (MX), dot-product operation, AI accelerator, edge computation
Abstract
Low-precision computation is an effective method to improve energy efficiency when processing AI models at the edge. The design of the numeric format is important to maintain good accuracy while reducing energy consumption. However, current fixed-point or floating-point based formats are limited in either representation range or precision, and thus efficiency or accuracy is compromised. Posit formats can achieve both a large dynamic range and high precision; however, their computation overhead is too high. Inspired by the recent microscaling format, in this paper, a novel microscaling posit format and its corresponding dot-product based processing element are proposed. By designing a specific format, the dot-product computation overhead of the original posit format is significantly reduced. Implementation results show that the proposed processing element can achieve up to 79% area reduction and 74% power reduction when compared with other designs available in the literature, which makes the proposed designs especially suitable for edge AI computation.
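As a point of reference for the microscaling idea the abstract builds on, the sketch below shows a generic MX-style block dot product with a shared block scale and narrow integer elements. It is an assumption for illustration (the helpers mx_quantize and mx_dot are made up) and does not reproduce the paper's posit-based element format.

```python
# Minimal numpy sketch (assumption): a microscaling-style dot product in which
# each block shares one power-of-two scale and per-element payloads are narrow
# integers, so the inner MAC loop is pure integer arithmetic.
import numpy as np

def mx_quantize(block, elem_bits=8):
    """Quantize a block to (per-block scale, narrow integers)."""
    max_abs = np.max(np.abs(block)) + 1e-12
    scale = 2.0 ** np.ceil(np.log2(max_abs))        # shared power-of-two block exponent
    qmax = 2 ** (elem_bits - 1) - 1
    q = np.clip(np.round(block / scale * qmax), -qmax, qmax).astype(np.int32)
    return scale / qmax, q

def mx_dot(a_block, b_block, elem_bits=8):
    sa, qa = mx_quantize(a_block, elem_bits)
    sb, qb = mx_quantize(b_block, elem_bits)
    return (sa * sb) * int(np.dot(qa, qb))          # integer MACs, one scale multiply per block

a, b = np.random.randn(32), np.random.randn(32)
print(mx_dot(a, b), np.dot(a, b))                   # close, up to quantization error
```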
4D-2
10:45-11:10

When Low-Rank Meets Mixed-Precision: A Unified, Training-Free Framework for Efficient LLM Compression

Junjie Wang (Shanghai Jiao Tong University; Shanghai Qi Zhi Institute; Northeastern University at Qinhuangdao), *Fangxin Liu (Shanghai Jiao Tong University; Shanghai Qi Zhi Institute), Jinqi Zhu (Northeastern University at Qinhuangdao), Chenyang Guan (Shanghai Jiao Tong University), Tao Yang (Huawei Technologies Ltd.), Li Jiang (Shanghai Jiao Tong University; Shanghai Qi Zhi Institute), Haibing Guan (Shanghai Jiao Tong University)
Keywords
Low-precision, model compression, LLM acceleration
Abstract
The rapid growth of Large Language Models (LLMs) raises significant challenges for deployment in resource-constrained environments. Existing compression approaches, such as low-rank decomposition and quantization, are typically applied independently, which limits their effectiveness and fails to exploit their complementarity. To address this issue, we present a training-free framework for joint compression that integrates low-rank decomposition with mixed-precision quantization. We formulate the allocation of layer-wise rank and bit-width as a combinatorial optimization problem, guided by an input-aware sensitivity metric to allocate resources where they yield the highest accuracy retention. We further develop a sample-aware low-rank decomposition scheme with theoretical guarantees, and introduce a unified difference matrix to mitigate the coupled errors from structural approximation and quantization. Extensive experiments on diverse LLM architectures and datasets demonstrate that our method achieves state-of-the-art compression, reducing model size to 20% of the original while preserving inference accuracy. The code is available at https://github.com/zzzzzjq0126/HALO.git
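The joint scheme the abstract describes rests on combining a truncated low-rank factor with a quantized residual. The sketch below is an illustrative assumption, not the authors' HALO code; the helpers low_rank, quantize, and compress are hypothetical. It shows the basic form W ≈ UV + Q(W − UV), with rank and bit-width as the per-layer knobs a sensitivity-guided allocator would tune.

```python
# Minimal numpy sketch (assumption): joint low-rank + quantized-residual
# compression of one weight matrix.
import numpy as np

def low_rank(w, rank):
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]            # U (m x r), V (r x n)

def quantize(x, bits):
    qmax = 2 ** (bits - 1) - 1
    step = np.max(np.abs(x)) / qmax + 1e-12
    return np.clip(np.round(x / step), -qmax, qmax) * step  # fake-quantized residual

def compress(w, rank=8, bits=4):
    u, v = low_rank(w, rank)
    residual = w - u @ v
    return u @ v + quantize(residual, bits)

w = np.random.randn(64, 64)
w_hat = compress(w)
print(np.linalg.norm(w - w_hat) / np.linalg.norm(w))    # relative reconstruction error
```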
4D-3
11:10-11:35

Precision-Scalable Microscaling Datapaths with Optimized Reduction Tree for Efficient NPU Integration

Stef Cuyckens, Xiaoling Yi, Robin Geens, Joren Dumoulin, Martin Wiesner, *Chao Fang, Marian Verhelst (ESAT-MICAS, KU Leuven)
Keywords
neural processing unit
Abstract
Emerging continual learning applications necessitate next-generation neural processing unit (NPU) platforms to support both training and inference operations. The promising Microscaling (MX) standard enables narrow bit-widths for inference and large dynamic ranges for training. However, existing MX multiply-accumulate (MAC) designs face a critical trade-off: integer accumulation requires expensive conversions from narrow floating-point products, while FP32 accumulation suffers from quantization losses and costly normalization. To address these limitations, we propose a hybrid precision-scalable reduction tree for MX MACs that combines the benefits of both approaches, enabling efficient mixed-precision accumulation with controlled accuracy relaxation. Moreover, we integrate an 8x8 array of these MACs into the state-of-the-art (SotA) NPU integration platform, SNAX, to provide efficient control and data transfer to our optimized precision-scalable MX datapath. We evaluate our design both on MAC and system level and compare it to the SotA. Our integrated system achieves an energy efficiency of 657, 1438-1675, and 4065 GOPS/W, respectively, for MXINT8, MXFP8/6, and MXFP4, with a throughput of 64, 256, and 512 GOPS.

Session 4E

(T10-B) Explainable and Generative AI for Lithography and Yield Optimization
10:20-12:00 | Wednesday, January 21, 2026 | Sleeping Beauty 3
Chair(s):
Yuzhe Ma (HKUST(GZ))
Binwu Zhu (Southeast University)
4E-1
10:20-10:45

Understand and Detect: Lithographic Hotspot Detection by the Interpretable Graph Attention Network

Andy Liu, Silin Chen (Nanjing University, Suzhou), Guohao Wang, Wenzheng Zhao (ZetaTech Co., Ltd., Shanghai), Yuxiang Fu, *Ningmu Zou (Nanjing University, Suzhou)
Keywords
Design for manufacturability, Lithography hotspot detection, Graph neural networks, Interpretable AI
Abstract
Lithography hotspot detection plays a crucial role in the design-for-manufacturing (DFM) process. Recent developments in machine learning have demonstrated significant advantages in improving feature extraction capabilities, computational efficiency, and reducing false alarms in hotspot detection. However, deep learning models remain black-box approaches, with the interpretability challenge yet to be addressed. The topological features of the local patterns causing hotspot classification results also remain unknown. In this paper, we propose the first interpretable GNN framework for lithography hotspot detection, which achieves both high detection accuracy and precise hotspot localization within the layout. Our framework maps the geometric structure of layouts into graph representation. Then, we introduce a novel graph attention network (GAT) framework, encoding local topological features through attention queries on neighbors. Additionally, a novel graph interpretability method is designed by leveraging latent variables in edge distributions and subgraphs optimization, enabling the extraction of local topological features and providing detailed explanations of hotspot localization. Experimental results demonstrate that our approach achieves state-of-the-art (SOTA) performance on the ICCAD-2012 and ICCAD-2019 benchmarks. Moreover, we validate the interpretability of our GNN model on the ICCAD-2016 benchmark, accurately identifying hotspot locations within the lithographic design.
4E-2
10:45-11:10

Beyond Labels: Data-Efficient Wafer Yield Prediction with TabESA

*Pang Guo, Yining Chen (Zhejiang University)
Keywords
Wafer Yield Prediction, Semi-Supervised Learning, Smart manufacturing
Abstract
Accurate wafer yield prediction is vital for design-for-manufacturability and yield optimization in semiconductor production, enabling early defect detection and proactive process control. However, existing methods are constrained by their heavy dependence on large quantities of yield-labeled data—incurring high costs and limiting scalability. Meanwhile, vast amounts of unlabeled wafer test data remain untapped. We present TabESA (Tabular Enhanced Semi-supervised Architecture), a novel two-stage AI framework tailored for manufacturing-aware yield prediction with minimal supervision. In Stage 1, TabESA employs dual self-supervised learning tasks to uncover intrinsic patterns in unlabeled tabular data. In Stage 2, it introduces a consistency-based semi-supervised training scheme that integrates labeled and unlabeled samples to boost prediction robustness. Tested on real-world manufacturing datasets, TabESA achieves over 0.95 in accuracy, precision, F1-score, and AUC using only 128 labeled samples. It surpasses conventional supervised models by 15.7% in F1-score and outperforms state-of-the-art semi-supervised techniques by 19.1% in AUC. By leveraging unlabeled process data for yield estimation, TabESA provides a label-efficient, scalable, and industry-relevant solution for smart semiconductor manufacturing.
4E-3
11:10-11:35

Code, Not Canvas: Multi-Agent Layout Generation Beyond Vision Models

*Haoyu Yang, Haoxing Ren (NVIDIA)
Keywords
Technology Development, DFM, Test Layout Generation, LLM, Multi-AI Agent
Abstract
Rule-constrained chip layout generation is crucial in the semiconductor manufacturing industry, providing significant resources for technology development and data-driven methodologies. Generative AI has become the mainstream solution for layout generation, backboned by GAN, ViT, or Diffusion models. These methods leverage a two-phase flow comprising squish topology generation and DRC-aware geometry filling. However, the vision-model-based approach is sub-optimal and lacks controllability of the generated layouts. To address this, we reformulate layout generation as a coding problem, which offers the user maximal controllability of layout generation and bridges the gap between vision-based geometry generation and the coding capability of LLMs. Specifically, we deliver a multi-agent framework that writes Python code to create diverse GDSII layouts following given design rule constraints. We demonstrate superior performance for layout generation over SOTA vision-based models, GPT-4o, and Cursor in terms of DRC-clean pattern diversity.
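To make the "layout as code" reformulation concrete, the sketch below writes a few rule-compliant rectangles to GDSII. It assumes the open-source gdstk package and made-up toy design-rule constants (MIN_WIDTH, MIN_SPACE); it is not output of the paper's multi-agent framework.

```python
# Minimal sketch (assumption): programmatic GDSII generation with the open-source
# gdstk package, illustrating the "layout as code" idea behind the paper.
import gdstk

MIN_WIDTH, MIN_SPACE = 0.05, 0.05                       # toy design-rule values (assumed)

lib = gdstk.Library("toy_patterns")
cell = lib.new_cell("TOP")
x = 0.0
for i in range(4):                                      # a row of rule-compliant rectangles
    width = MIN_WIDTH * (i + 1)
    cell.add(gdstk.rectangle((x, 0.0), (x + width, 1.0), layer=1))
    x += width + MIN_SPACE                              # respect minimum spacing
lib.write_gds("toy_patterns.gds")
```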
4E-4
11:35-12:00

Integrated Re-Fragmentation and Curve Correction for Curvilinear Optical Proximity Correction

*Seohyun Kim, Shilong Zhang, Junha Jang, Youngsoo Shin (Korea Advanced Institute of Science and Technology)
Keywords
Curvilinear OPC, Mask fragmentation, Bayesian optimization
Abstract
Curvilinear optical proximity correction (OPC) treats segments as curves rather than lines, and offers improved correction accuracy. Once a set of segments is identified through fragmentation, it remains the same throughout OPC iterations, limiting OPC performance in runtime and accuracy. We propose an additional step of re-fragmentation that can be integrated with curve correction inside the OPC iteration loop. Two machine learning (ML) models are applied for quick re-fragmentation: (1) a U-Net is used to detect the critical segments, and (2) an MLP identifies the point along the critical curve where the curve should be divided. The reference samples to train the MLP are generated through Bayesian optimization. Compared to standard OPC, the proposed method yields a substantial reduction in OPC iterations (from 20 to 15) and runtime (from 188s to 152s), on average over test clips, when a target vertex placement error (VPE) is given. If the number of OPC iterations is fixed (to 25), the proposed method yields an average VPE of 1.12nm, much smaller than the 1.65nm achievable through standard OPC.

Session 4F

(T6-B) Smart Techniques for Analog & Mixed-Signal Design
10:20-11:35 | Wednesday, January 21, 2026 | Sleeping Beauty 5
Chair(s):
Fan Yang (Fudan University)
Pingqiang Zhou (ShanghaiTech University)
4F-1
10:20-10:45

MuaLLM: A Multimodal Large Language Model Agent for Circuit Design Assistance with Hybrid Contextual Retrieval-Augmented Generation

*Pravallika Abbineni, Saoud Aldowaish, Colin Liechty, Soroosh Noorzad (University of Utah), Ali Ghazizadeh Ghalati (University of Michigan), Morteza Fayazi (University of Utah)
Keywords
Circuit design automation, LLM, multimodal design assistant, RAG, agentic workflow, Reasoning and Act (ReAct) framework, database updating, scalable, open-source
Abstract
Conducting a comprehensive literature review is crucial for advancing circuit design methodologies. However, the rapid influx of state-of-the-art research, inconsistent data representation, and the complexity of optimizing circuit design objectives (e.g. power consumption) make this task significantly challenging. Traditional manual search methods are inefficient, time-consuming, and lack the reasoning capabilities required for synthesizing complex circuits. In this paper, we propose MuaLLM, an open-source multimodal Large Language Model (LLM) agent for circuit design assistance that integrates a hybrid Retrieval-Augmented Generation (RAG) framework with an adaptive vector database of circuit design research papers. Unlike conventional LLMs, the MuaLLM agent employs a Reason + Act (ReAct) workflow for iterative reasoning, goal-setting, and multi-step information retrieval. It functions as a question-answering design assistant, capable of interpreting complex queries and providing reasoned responses grounded in circuit literature. Its multimodal capabilities enable processing of both textual and visual data, facilitating more efficient and comprehensive analysis. The system dynamically adapts using intelligent search tools, automated document retrieval from the internet, and real-time database updates. Unlike conventional approaches constrained by model context limits, MuaLLM decouples retrieval from inference, enabling scalable reasoning over arbitrarily large corpora. At the maximum context length supported by standard LLMs, MuaLLM remains up to 10x less costly and 1.6x faster while maintaining the same accuracy. This allows rapid, no-human-in-the-loop database generation, overcoming the bottleneck of simulation-based dataset creation for circuits. To evaluate MuaLLM, we introduce two custom datasets: RAG-250, targeting retrieval and citation performance, and Reasoning-100 (Reas-100), focused on multistep reasoning in circuit design. MuaLLM achieves 90.1% recall on RAG-250, highlighting strong multimodal retrieval and citation accuracy. On Reas-100, it reaches 86.8% accuracy, demonstrating robust reasoning capabilities on complex design queries.
4F-2
10:45-11:10

MOSTAR: Multi-Stage Hierarchical Bayesian Optimization for Substructure-Aware High-Dimensional Analog Circuit Sizing

*Weijian Fan, Haoyi Zhang (Peking University), Weibin Lin (Shenzhen University), Runsheng Wang, Yibo Lin (Peking University)
Keywords
Bayesian Optimization, high-dimensional circuits, L2G-GNN, MOSTAR
Abstract
Analog circuit sizing is a critical challenge due to increasing circuit complexity and diverse performance requirements. Existing algorithms struggle with poor scalability in high-dimensional spaces and frequent convergence to local optima. To address these limitations, we propose MOSTAR, a multi-stage hierarchical Bayesian optimization framework that integrates a local-to-global GNN (L2G-GNN). L2G-GNN identifies circuit substructures and adds symmetric constraints to the circuit. MOSTAR employs additive Gaussian processes and a stage-adaptive constrained acquisition function to improve scalability in high-dimensional circuits. Furthermore, its dynamic search space adjustment strategy helps avoid local optima during optimization. Experiments show that our L2G-GNN achieves a substructure identification accuracy of 97.22%, and MOSTAR achieves average performance improvements of 2.16x on three basic circuits and 1.62x on two high-dimensional modular circuits, highlighting its efficacy in automating complex analog circuit sizing.
4F-3
11:10-11:35

Effective RC Reduction via Graph Sparsification for Accurate Post-Simulation of Mixed-Signal ICs

*Yibin Zhang, Zhiqiang Liu (Tsinghua University), Shan Shen (Nanjing University of Science and Technology), Chao Hu (EXCEEDA Inc.), Wenjian Yu (Tsinghua University)
Keywords
Mixed-Signal IC, Circuit Simulation, RC Reduction, Effective Resistance, Graph Spectral Sparsification
Abstract
Circuit simulation after parasitic extraction is crucial for designing sensitive mixed-signal integrated circuits (ICs). To speed up this time-consuming post-simulation task, a fast and effective RC reduction technique is demanded. In this work, we propose a node-elimination-plus-graph-sparsification framework to realize RC reduction for mixed-signal ICs. The proposed method combines the Time-Constant Equilibration Reduction (TICER) algorithm, efficient graph spectral sparsification, and a novel graph sparsification technique that keeps the most important off-tree edges, to separately sparsify the capacitance and resistance networks in a circuit. Experiments on realistic mixed-signal circuits have validated the efficiency and effectiveness of the proposed techniques, demonstrating a remarkable advantage over existing methods. The results show that the proposed method can bring up to nearly 7x speedup to the post-simulation while ensuring good accuracy of the critical performance metrics.

Invited Talk I

12:30-12:50 | Wednesday, January 21, 2026 | Cinderella Ballroom 1/6/7/8
Shulin Zeng
General Manager of Shanghai Company, Infinigence-AI
12:30-12:50
Invited Talk

The Creativity Revolution in the Age of Intelligent Agents

Biography
Dr. Shulin Zeng is a founding member of Infinigence-AI and serves as General Manager of its Shanghai company, leading the intelligent terminal business. He focuses on hardware-software co-optimization and AI accelerator design.
He received his B.Eng. (2018) and Ph.D. (2023) degrees from Tsinghua University, under Prof. Yu Wang. His first-author work won the FPGA 2025 Best Paper Award, marking the first Asia-Pacific team to receive this honor.
In industry, he led the development of an edge inference optimization engine achieving 3x end-to-end acceleration of large models on AI PCs, planned for deployment on 10M+ devices. He also proposed the world's first multimodal LPU IP, enabling single-FPGA inference of 7B models and text-to-video generation, delivering 4-6x energy-efficiency gains.
Abstract
The evolution of AI is shifting from model-centric intelligence toward agentic systems capable of autonomous reasoning and action. This talk examines how intelligent agents are transforming creativity into a scalable capability, reshaping how teams and organizations create value. Based on real-world production experience, it discusses the emerging infrastructure challenges of the Agentic AI era, including effectiveness, reliability, cost control, and closed-loop optimization. The talk highlights how Infinigence-AI's Agent Platform enables "Super Teams" to achieve outsized impact and supports the transition toward a scalable, agent-driven creative economy.

Session 5A

(T3-B) Emerging Memory Architectures and Their Applications
13:30-15:35 | Wednesday, January 21, 2026 | Snow White 1
Chair(s):
Yuhong Liang (Great Bay University)
Chun-Yi Lee (National Taiwan University)
5A-1
13:30-13:55

CADC: Crossbar-Aware Dendritic Convolution for Efficient In-memory Computing

*Shuai Dong, Junyi Yang, Ye Ke, Hongyang Shang, Arindam Basu (City University of Hong Kong)
Keywords
Dendritic computing, Convolution, In-memory computing, Crossbar, Sparsity
Abstract
Convolutional neural networks (CNNs) are computationally intensive and often accelerated using crossbar-based in-memory computing (IMC) architectures. However, large convolutional layers must be partitioned across multiple crossbars, generating numerous partial sums (psums) that require additional buffering, transfer, and accumulation, thus introducing significant system-level overhead. Inspired by dendritic computing principles from neuroscience, we propose crossbar-aware dendritic convolution (CADC), a novel approach that dramatically increases sparsity in psums by embedding a nonlinear dendritic function (zeroing negative values) directly within crossbar computations. Experimental results demonstrate that CADC significantly reduces psums, eliminating 80% in LeNet-5 on MNIST, 54% in ResNet-18 on CIFAR-10, 66% in VGG-16 on CIFAR-100, and up to 88% in spiking neural networks (SNN) on the DVS Gesture dataset. The induced sparsity from CADC provides two key benefits: (1) enabling zero-compression and zero-skipping, thus reducing buffer and transfer overhead by 29.3% and accumulation overhead by 47.9%; (2) minimizing ADC quantization noise accumulation, resulting in small accuracy degradation: only 0.01% for LeNet-5, 0.1% for ResNet-18, 0.5% for VGG-16, and 0.9% for SNN. Compared to vanilla convolution (vConv), CADC exhibits accuracy changes ranging from +0.11% to +0.19% for LeNet-5, -0.04% to -0.27% for ResNet-18, +0.99% to +1.60% for VGG-16, and -0.57% to +1.32% for SNN, across crossbar sizes from 64x64 to 256x256. Ultimately, an SRAM-based IMC implementation of CADC achieves 2.15 TOPS and 40.8 TOPS/W for ResNet-18 (4/2/4b), realizing an 11x-18x speedup and 1.9x-22.9x improvement in energy efficiency compared to existing IMC accelerators.
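A functional sketch of the dendritic idea follows: each crossbar-sized tile zeroes its negative partial sums before accumulation, which is where the reported psum sparsity comes from. The code is an assumption for illustration (the tile size and the helper dendritic_tiled_matvec are made up), not the CADC circuit itself.

```python
# Minimal numpy sketch (assumption): an im2col-lowered convolution computed as
# tiled matrix-vector products, where each crossbar-sized tile applies the
# dendritic nonlinearity max(psum, 0) in-array, so many partial sums become zero.
import numpy as np

def dendritic_tiled_matvec(weights, x, tile=64):
    """weights: [out, in] lowered conv weights; x: [in] lowered input patch."""
    out = np.zeros(weights.shape[0])
    zero_psums, total_psums = 0, 0
    for start in range(0, weights.shape[1], tile):       # one crossbar per input tile
        psum = np.maximum(weights[:, start:start + tile] @ x[start:start + tile], 0.0)
        zero_psums += int(np.sum(psum == 0.0))
        total_psums += psum.size
        out += psum                                      # only sparse, non-negative psums accumulate
    return out, zero_psums / total_psums

w, x = np.random.randn(128, 256), np.random.randn(256)
_, sparsity = dendritic_tiled_matvec(w, x)
print(f"zeroed partial sums: {sparsity:.0%}")            # roughly half for random data
```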
5A-2
13:55-14:20

M3DKV: Monolithic 3D Gain Cell Memory Enabled Efficient KV Cache & Processing

*Jiaqi Yang (Peking University), Yanbo Su (Tsinghua University), Yihan Fu (Peking University), Jianshi Tang (Tsinghua University), Bonan Yan (Peking University)
Keywords
Gain cell memory, Near-memory computing, KV cache, Hardware acceleration for large language models, Monolithic three-dimensional integration
Abstract
Transformer-based generative large language models (LLMs) have revolutionized natural language processing, yet their quadratic computational complexity growth with context length creates severe inference bottlenecks. While the LLM key-value cache (KV cache) enhances decoding efficiency, prolonged context processing inflicts frequent KV cache reloads that exacerbate memory bandwidth constraints. To address this hardware challenge, we propose M3DKV, a monolithic three-dimensional (3D) gain cell near-memory-computing accelerator featuring back-end-of-line (BEOL) cache layers for in-situ KV matrix buffering/computation and a front-end-of-line (FEOL) base layer for full self-attention operations. Through optimized 3D data organization, inter-layer dataflow management, and intelligent computation scheduling, our design achieves 0.29 TB/s/core on-die bandwidth while demonstrating 97.03x/268.01x speedup over GPU/CPU in decoding phases and 1.72x-262.16x better area efficiency per parameter versus state-of-the-art accelerators.
5A-3
14:20-14:45

MemSearch: An Efficient Memristive In-memory Search Engine with Configurable Similarity Measures

*Yingjie Yu, Houji Zhou, Tong Hu, Zhiwei Zhou (Huazhong University of Science and Technology), Jia Chen (AI Chip Center for Emerging Smart Systems (ACCESS), University of Science and Technology), Jiancong Li (Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology), Yi Li, Xiangshui Miao (Huazhong University of Science and Technology)
Keywords
In-memory computing, Vector database, MemSearch, Similarity measurement
Abstract
In-memory search has emerged as a promising solution for efficient discovery of nearest-neighbor vectors in general-purpose vector databases. However, the templated storage-in-array structure and the VMM-based computational form of in-memory search pose challenges in supporting generic distance computations. In this work, for the first time, we introduce a novel memristive in-memory similarity measure engine, MemSearch, for configurable distance calculations, including dot distance, ED, and CD. MemSearch highlights two aspects: data storage and distance computing. For data storage, we propose a Unified Similarity Element Mapping (USEM) scheme based on a pair array to accommodate various similarity calculations. For distance computing, we introduce a Reconfigurable Current Computing (RCC) circuit designed to process the multiple arithmetic rules in similarity calculations, with a slight increase of 4.4% and 9.9% in energy consumption for ED and CD, respectively. We have tested various datasets with different modalities, including images, voice, human activity, and text. Experimental results demonstrate that the MemSearch engine achieves improvements of 864x, 802x, and 1474x in energy efficiency over CMOS-based engines for dot distance, ED, and CD calculations, respectively. The MemSearch engine highlights its potential for future highly efficient general-purpose in-memory vector databases.
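For reference, the three similarity measures named above can be expressed functionally as below. This sketch is only an assumption for illustration (the search helper and the toy data are made up) and says nothing about the USEM mapping or RCC circuit that realize them in the memristive array.

```python
# Minimal numpy sketch (assumption): the three configurable similarity measures
# named in the abstract, expressed as a plain functional reference search.
import numpy as np

def search(query, database, metric="dot"):
    if metric == "dot":
        return int(np.argmax(database @ query))          # larger dot product = more similar
    if metric == "ed":                                   # Euclidean distance
        return int(np.argmin(np.linalg.norm(database - query, axis=1)))
    if metric == "cd":                                   # cosine distance
        sims = (database @ query) / (np.linalg.norm(database, axis=1)
                                     * np.linalg.norm(query) + 1e-12)
        return int(np.argmax(sims))
    raise ValueError(metric)

db = np.random.randn(1000, 64)
q = db[42] + 0.01 * np.random.randn(64)
print(search(q, db, "dot"), search(q, db, "ed"), search(q, db, "cd"))
```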
5A-4
14:45-15:10

SpANNS: Optimizing Approximate Nearest Neighbor Search for Sparse Vectors Using Near Memory Processing

Tianqi Zhang, Flavio Ponzina, *Tajana Rosing (UCSD)
Keywords
Approximate Nearest Neighbor Search, Near Memory Processing, CXL
Abstract
Approximate Nearest Neighbor Search (ANNS) is a fundamental operation in vector databases, enabling efficient similarity search in high-dimensional spaces. While dense ANNS has been optimized using specialized hardware accelerators, sparse ANNS remains limited by CPU-based implementations, hindering scalability. We propose SpANNS, a near-memory processing architecture for sparse ANNS. SpANNS combines a hybrid inverted index with efficient query management and runtime optimizations, achieving 15.2x to 21.6x faster execution over the state-of-the-art CPU baselines, offering scalable and efficient solutions for sparse vector search.
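The hybrid inverted index mentioned above builds on the classic sparse-vector inverted index. The sketch below is an illustrative assumption (build_index and query are hypothetical helpers) showing the CPU-style baseline structure, not the SpANNS near-memory design.

```python
# Minimal sketch (assumption): an inverted index for sparse-vector similarity
# search; each nonzero dimension maps to the postings of vectors containing it.
from collections import defaultdict

def build_index(vectors):
    """vectors: list of {dim: weight} sparse maps."""
    index = defaultdict(list)
    for vid, vec in enumerate(vectors):
        for dim, w in vec.items():
            index[dim].append((vid, w))                  # posting list per nonzero dimension
    return index

def query(index, q, top_k=3):
    scores = defaultdict(float)
    for dim, qw in q.items():                            # only shared nonzero dims contribute
        for vid, w in index.get(dim, []):
            scores[vid] += qw * w                        # accumulate sparse dot products
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

docs = [{0: 1.0, 5: 2.0}, {5: 1.5, 9: 0.5}, {2: 3.0}]
print(query(build_index(docs), {5: 1.0, 9: 2.0}))        # doc 1 scores highest here
```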
5A-5
15:10-15:35
Paper ID 2869

NBCache: An Efficient and Scalable Non-Blocking Cache for Coherent Multi-Chiplet Systems

*Zhirong Ye (Sun Yat-sen University), Yongchang Zhang (Tsinghua University), Peilin Wang, Tao Lu (Sun Yat-sen University), Zhaolin Li (Tsinghua University), Zhiyi Yu, Mingyu Wang (Sun Yat-sen University)
Keywords
Multi-chiplet, Memory-level parallelism, Non-blocking cache, Directory-based coherence, MSHR
Abstract
With the advancement of chiplet technology, a large number of processing cores can be modularly integrated to achieve enhanced parallelism, which introduces increased memory-level parallelism at the cost of unbalanced interconnect bandwidth. For recently proposed multi-chiplet systems, high cache bandwidth technologies and directory-based coherence have been widely employed to efficiently utilize limited inter-chiplet bandwidth while enhancing scalability in large-scale systems. Unfortunately, cache bandwidth improvement is limited by the current non-blocking cache, in which limited MSHR resources frequently result in cache blocking. Therefore, it is challenging to design an efficient cache architecture for directory-based coherence in multi-chiplet systems that minimizes the cache blocking time with reasonable overhead to leverage high memory-level parallelism. To address this problem, we propose NBCache, an efficient and scalable non-blocking cache architecture for directory-based coherence, in which the directory controller and MSHR are co-designed to support dynamic MSHR demands across heterogeneous workloads in multi-chiplet systems. Additionally, the modular design of NBCache enables it to adapt to different cache coherence protocols. Evaluation results show that NBCache achieves a geometric mean speedup of 1.30x over the prior design with comparable hardware overhead. Finally, NBCache performs close to the ideal unlimited-capacity MSHR with reasonable hardware overhead in different configurations of coherent multi-chiplet systems.

Session 5B

(SS-1b) From Uniform to Adaptive: The Precision-Scalable Computing Era for Edge Intelligence
13:30-14:45 | Wednesday, January 21, 2026 | Snow White 2
Chair:
Fangxin Liu (Shanghai Jiao Tong University)
5B-1
13:30-13:55

A Low-Power 12-lead Arrhythmia Detection SoC Featuring a Reconfigurable CNN and Mixed-Precision Computing

*Yuejun Zhang, Hanyu Shi, Qikang Li, Huihong Zhan, Xinyu Li, Qingxin Xie, Zhenkai Zhou (Ningbo University), Pengjun Wang (Wenzhou University)
Keywords
mixed-precision computing, reconfigurable convolutional neural networks, low-power, 12-lead ECG detection
Abstract
Cardiovascular diseases remain a leading global health threat, with arrhythmia being a key early indicator of cardiac abnormalities. The need for continuous cardiac monitoring has driven demand for portable, low-power arrhythmia detection systems. This paper presents a low-power mixed-precision System-on-Chip (SoC) solution designed for arrhythmia detection using 12-lead electrocardiogram (ECG) signals. The proposed approach employs a dynamically reconfigurable convolutional neural network (CNN) architecture with flexible hyperparameters, enhancing hardware adaptability while reducing resource overhead and power consumption. At the computation level, an 8-bit and 16-bit mixed-precision floating-point multiplier is introduced to effectively balance arithmetic accuracy and energy efficiency. Furthermore, clock gating and multi-threshold voltage techniques are employed at the digital back-end to further reduce the power consumption of the chip. Through system-level and module-level optimization, the proposed chip design is of great significance for enabling low-power arrhythmia detection in power-constrained portable medical devices.
5B-2
13:55-14:20

A Precision-Scalable Accelerator for Compressive Hyperspectral Image Reconstruction with a Lightweight DUN

*Shuo Zhang, Shengzhi Qiang, Wendong Mao, Zhongfeng Wang (Sun Yat-sen University)
Keywords
Precision-scalable, coded aperture snapshot, spectral imaging (CASSI), deep unfolding networks (DUNs)
Abstract
Hyperspectral images (HSIs) provide unparalleled spectral detail for material analysis across diverse fields, but their high data dimensionality challenges real-time processing. Coded Aperture Snapshot Spectral Imaging (CASSI) addresses this problem by compressing 3D spectral information into a single 2D snapshot, but it introduces an ill-posed reconstruction problem. In response, numerous effective methods employ Deep Unfolding Networks (DUNs), which combine data modules and prior modules to improve reconstruction quality. However, multi-head attention in prior modules causes significant storage and computational overhead, while redundant operations in data modules further reduce computational efficiency. To efficiently deploy DUNs on edge devices, this paper proposes an algorithm-hardware co-optimization framework for hyperspectral image reconstruction. First, a lightweight attention-free DUN prior module, Lightweight Spectral Prior (LSP), is designed. Second, a precision-scalable hardware architecture is developed to accelerate the DUN data module, which features a stage-wise bit-width allocation to enhance processing efficiency. Experimental results demonstrate that our algorithm achieves superior reconstruction quality compared with recent state-of-the-art methods. The hardware design for the data module is implemented on the Xilinx ZU19EG FPGA evaluation board. In FPGA-GPU heterogeneous system execution, we achieve up to 2.3x speedup in inference compared with GPU-only execution.
5B-3
14:20-14:45

Enhancing Trustworthiness Using Mixed Precision: Benchmarking, Opportunities and Challenges

Guanxi Lu, Hao (Mark) Chen, Zhiqiang Que, Wayne Luk, *Hongxiang Fan (Imperial College London)
Keywords
large language models, model quantization, mixed precision, model compression, natural language processing, low-bit inference, post-training quantization
Abstract
Large language models (LLMs) have shown promising performance across various tasks. However, their autoregressive decoding process poses significant challenges for efficient deployment on existing AI hardware. Quantization alleviates memory and compute pressure by compressing weights, activations, and KV caches to low precisions while preserving generation quality. However, existing quantization frameworks typically focus on perplexity or classification accuracy, often omitting critical trustworthiness metrics. This gap introduces risks when applying quantized LLMs to downstream high-stakes domains such as finance and healthcare. In this work, we systematically investigate the impact of quantization on four trustworthiness metrics (adversarial robustness, fairness, machine ethics, and out-of-distribution robustness) and identify the instability across compression ratios and quantization methods. Building on these observations, we develop a novel precision-ensemble voting approach that leverages predictions from mixed-precision variants of the same model and consistently improves performance by up to 5.8% on trustworthiness metrics. Our results highlight the importance of considering trustworthiness when developing model compression techniques and point to research opportunities at the intersection of compression and trustworthiness for safety-critical applications.
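The precision-ensemble voting idea can be stated in a few lines. The sketch below is an assumption for illustration (the variants list of predict callables stands in for quantized model instances) and is not the authors' implementation.

```python
# Minimal sketch (assumption): majority voting over predictions from several
# mixed-precision variants of the same model.
from collections import Counter

def precision_ensemble_vote(variants, prompt):
    votes = [predict(prompt) for predict in variants]    # one label per precision variant
    label, _ = Counter(votes).most_common(1)[0]          # majority decision
    return label

# Toy stand-ins for, e.g., INT4, INT8, and FP16 variants of one classifier.
variants = [lambda p: "safe", lambda p: "unsafe", lambda p: "safe"]
print(precision_ensemble_vote(variants, "example input"))  # -> "safe"
```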

Session 5C

(T12-C) System-Level Security and Secure Communication
13:30-15:35 | Wednesday, January 21, 2026 | Snow White 3
Chair(s):
Song Bian (Beihang University)
Danella Zhao (University of Arizona)
5C-1
13:30-13:55

GEmFuzz: Uncovering System-Level Vulnerabilities in SoCs via Emulation-Based Grey-Box Fuzzing

Shuvagata Saha, Ahmed Alhurubi, Tanvir Rahman, Hasan Al Shaikh, Sujan Kumar Saha, *Farimah Farahmandi, Mark Tehranipoor (University of Florida)
Keywords
System-on-Chip, Hardware Security, Emulation, Fuzzing, Security Verification
Abstract
Security verification of modern System-on-Chip (SoC) designs is becoming increasingly challenging due to the growing integration of third-party IPs and the complexity of hardware-software (HW/SW) interactions. This escalating complexity broadens the attack surface, leading to a higher number of potential vulnerabilities and longer detection times. Consequently, verification engineers face increasing pressure to ensure robust security within tight development schedules. Traditional techniques such as formal verification and information flow tracking often suffer from poor scalability, state space explosion, and significant manual effort, necessitating expert-level design knowledge. Fuzzing-based methodologies, while promising, typically rely on the availability of a golden reference model and struggle to scale effectively, which limits their applicability. Furthermore, the increasing intricacy of HW/SW stacks in modern SoCs introduces new classes of system-level vulnerabilities that remain largely unaddressed by existing approaches. To address these challenges, we propose GEmFuzz, a hardware emulation-based grey-box fuzzing framework for SoC security verification. GEmFuzz employs a hardware emulation server to execute the design under test (DUT) and leverages a cost-function-guided fuzzer to generate intelligent input patterns for vulnerability detection. We evaluate GEmFuzz on a RISC-V-based SoC and demonstrate its effectiveness in detecting a set of known system-level vulnerabilities. Additionally, it identifies two previously unknown vulnerabilities, highlighting the capability and promise of the proposed framework.
5C-2
13:55-14:20

Pack Defender: Proactive Defense Against Packet Attacks in NoCs Using an XGBoost-RNN Model

*Shengkai Hu (University of Southampton), Haoyu Wang (University of Oxford), Basel Halak, Boojoong Kang (University of Southampton)
Keywords
Hardware Security, Network-on-Chip (NoC), Multi-Processor System-on-Chip (SoC), Machine Learning
Abstract
The Network-on-Chip (NoC) serves as the critical communication backbone in modern Multi-Processor Systems-on-Chip (MPSoCs), particularly for Deep Learning (DL) hardware where it underpins the reliable execution of machine learning models by facilitating efficient data and weight exchange. However, the NoC is vulnerable to stealthy packet-based attacks initiated by malicious Intellectual Property (IP) cores. Such attacks can severely degrade NoC latency and throughput, which are critical for efficient DL inference, and even compromise the correctness of model execution. Current detection methods are inherently reactive; they identify anomalies by monitoring global system features only after an attack has manifested, lacking the foresight to anticipate impending threats. To address this limitation, we propose Pack Defender, a proactive NoC security framework based on temporal behavior modeling. Pack Defender accurately generates and forecasts future system states, enabling the identification of pre-attack warning signals. It also integrates the detection technique by reusing the partial prediction generative model for the first time, which eliminates the need for a separate detection module. Experimental results demonstrate that Pack Defender exhibits strong predictive power, yielding average/top-three similarities of 83%/92% for Source-Level Packet Dropping (SLPD) attacks and 90%/94% for In-Network Packet Diversion (INPD) attacks, respectively. The model's high accuracy is further validated by its low Mean Absolute Error (MAE), which was 0.05 for SLPD and 0.03 for INPD, confirming its ability to provide forward-looking security while maintaining high resource efficiency. The detection model (XGBoost) achieves 100% accuracy for both SLPD and INPD attacks, with recall rates of 96% and 99% respectively, significantly outperforming state-of-the-art methods that lack proactive prediction capabilities.
5C-3
14:20-14:45

Silentflow: Leveraging Trusted Execution for Resource-Limited MPC via Hardware-Algorithm Co-design

Zhuoran Li, Hanieh Totonchi Asl, Ebrahim Nouri (University of Arizona), Yifei Cai (Old Dominion University), *Danella Zhao (University of Arizona)
Keywords
Security & Privacy, Multiparty Computation, Trusted Execution Environment, FPGA acceleration
Abstract
Secure Multi-Party Computation (MPC) offers a practical foundation for privacy-preserving machine learning at the edge, with MPC commonly employed to support nonlinear operations. These MPC protocols fundamentally rely on Oblivious Transfer (OT), particularly Correlated OT (COT), to generate correlated randomness essential for secure computation. Although COT generation is efficient in conventional two-party settings with resource-rich participants, it becomes a critical bottleneck in real-world inference on resource-constrained devices (e.g., IoT sensors and wearables), due to both communication latency and limited computational capacity. To enable real-time secure inference, we introduce Silentflow, a highly efficient Trusted Execution Environment (TEE)-assisted protocol that eliminates communication in COT generation. We tackle the core performance bottleneck, low computational intensity, through structured algorithmic decomposition: kernel fusion for parallelism, Blocked On-chip eXpansion (BOX) to improve memory access patterns, and vectorized batch operations to maximize memory bandwidth utilization. Through design space exploration, we balance end-to-end latency and resource demands, achieving up to 39.51x speedup over state-of-the-art protocols. By offloading COT computations to a Zynq-7000 SoC, Silentflow accelerates PPMLaaS inference on the ImageNet dataset under resource constraints, achieving a 4.62x and 3.95x speedup over Cryptflow2 and Cheetah, respectively.
5C-4
14:45-15:10

ANIMo: Accelerating Nested Isolation with Monitor-free Domain Transition

*Yibin Xu, Han Wang, Yue Jin, Tianyi Huang, Tianyue Lu, Mingyu Chen (SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences)
Keywords
Hardware-based Security, Nested Isolation, Domain Transition
Abstract
Fine-grained domain isolation decomposes a user program into isolated domains, thus reducing the attack surface by preventing potential vulnerabilities from influencing other domains. However, as the monitor in the trusted domain handles domain transitions, the frequent permission verification and switching phases incur significant performance overheads, which prior studies have often overlooked. In this paper, we observe that the permissions of the child domain are inherently a subset of its parent domain. This relationship establishes a permission restriction that the hardware can quickly validate. Therefore, we designed an innovative monitor-free hardware mechanism named ANIMo, which enables an untrusted domain to directly manage permissions of its child domain under the hardware's supervision. As the hardware automatically performs permission verification and switching, we eliminate the associated overheads of interacting with the monitor during transitions between nested domains. We have implemented the prototype of ANIMo on the out-of-order superscalar RISC-V core XiangShan, with minimal hardware overhead (an extra 0.83% LUTs and 0.69% flip-flops) and evaluated it on an FPGA platform. Compared with state-of-the-art monitor-based approaches utilizing protection keys and bound registers, our ANIMo achieves speedups of 9.9x and 3.4x in domain transitions, respectively. Furthermore, we demonstrate the applicability of ANIMo by performing nested isolation over the sensitive module of NGINX.
5C-5
15:10-15:35

Reinforced Logic-Based Distributed Routing within Isolated Secure Zones

*Yogesh Verma, Mattis Hasler (Barkhausen Institut), Sebastian Haas, Friedrich Pauls (Barkhausen Institut)
Keywords
Logic-Based Distributed Routing, Secure Zone Routing, NoC Security, Trustworthy Design, Tiled Architecture
Abstract
Multiprocessor systems-on-chip (MPSoCs) have become the foundation of critical applications in AI, IoT, and autonomous systems. The Network-on-Chip (NoC) is the communication backbone of MPSoCs, enabling smooth and scalable communication between processing elements. Keeping it secure against evolving threats is imperative. Modern processor vulnerabilities expose MPSoCs to threats like timing side-channel attacks (TSCA) and denial-of-service (DoS) attacks, where attackers can infect IP blocks at runtime to target the NoC. This is especially concerning in shared-resource environments, where the lack of strict isolation between components enables wider propagation of such attacks. Routing within isolated secure zones can efficiently deal with such attacks. However, existing secure-zone-based approaches rely on routing tables, which become inefficient and unscalable as NoC sizes increase, necessitating an adaptive, logic-based approach. Although the latter offers better scalability, its coverage of complex topologies is limited. This paper presents a logic-based distributed routing strategy using the divide-and-conquer principle (LBDRdnc). It enhances the topological coverage of traditional LBDR methodologies by dividing irregular topologies into a combination of minimal-path sub-topologies. LBDRdnc is used to isolate sensitive data within secure zones, ensuring the system's integrity against inter-application attacks. Experimental results validate the efficacy of the proposed routing algorithm. Its performance was evaluated by comparison with a state-of-the-art routing-table-based approach, demonstrating improved performance, security, and scalability.

Session 5D

5D (Designer Forum 1) Toward Autonomous Chip Design: From Foundation Models to Agentic EDA
13:30-15:35 | Wednesday, January 21, 2026 | Sleeping Beauty 1/2
Chair(s):
Zhiyao Xie (The Hong Kong University of Science and Technology)
5D-1
13:30-13:55

Intelligent Chip Design with Agentic AI EDA

*James Gu (Cadence)
Biography
James Gu, Group Director of Digital Products, Digital and Verification R&D Department, Cadence Design Systems, Inc. He has worked in the chip design and electronic design automation industry for more than ten years. Currently, he is mainly responsible for deeply customizing the digital implementation process from RTL to GDS for customers in China at Cadence, to meet the needs of different products and industries, especially for chip design and tape-out under advanced processes for large customers. He has participated in the chip design and implementation of GPU/CPU for multiple important customers.
Abstract
To solve complex chip design challenges, we propose integrating agentic AI into the core of EDA systems. These AI agents can analyze design requirements, dynamically adjust design strategies across stages (from floorplanning and placement to clock tree synthesis and timing closure), and learn from historical design data to optimize PPA metrics in real time.
5D-2
13:55-14:20

The Al Transformation of Semiconductor EDA

*Muming Tang (Synopsys)
Biography
Muming Tang is a Senior AE Director at Synopsys. A 27-year IC design veteran, he brings a unique perspective from his prior role as a Senior Designer at Infineon. Now leading a team at Synopsys, he focuses on enabling customers in China to achieve success with advanced-node design implementation through flow optimization and high-performance IP integration.
Abstract
The intense competition in AI model development is fueling an unprecedented demand for advanced semiconductor chips. This surge, coupled with the critical need for shorter time-to-market, is fundamentally transforming EDA technologies and workflows. This presentation will outline the evolution of AI-driven EDA, demonstrate cutting-edge solutions and their performance in real-world designs, and conclude with a perspective on the future trajectory of EDA transformation.
5D-3
14:20-14:45

From Algorithmic Optimization to Autonomous Agents: Redefining EDA Workflows with AI

*Peng Zou (Shanghai LEDA Technology)
Biography
Peng Zou received his Ph.D. from Fudan University and is currently a Technical Expert at Shanghai LEDA Technology. He specializes in EDA placement and routing, spearheading PPA optimization for P&R tools. His current research focuses on the application of AI in EDA.
Abstract
As chip design scale and constraints intensify, traditional heuristic algorithms face significant efficiency bottlenecks in navigating huge search spaces. This talk traces the evolution of AI in EDA, transitioning from Reinforcement Learning-based design-space exploration to the LLM-driven Autonomous Agents. We will focus on how this evolution transcends the limitations of local optimization. By leveraging the reasoning, tool-use, and self-correction capabilities of AI Agents, we can achieve intelligent orchestration and closed-loop control of complex EDA toolchains, thereby defining the next generation of automated silicon design workflows.
5D-4
14:45-15:10

From Generation to Verified Synthesis: Bridging Industrial Reality via C-Guided Agents

*Min Li (Southeast University)
Biography
Min Li received the B.E. degree from the Department of Electronic Engineering, Shanghai Jiao Tong University in 2018, and the Ph.D. degree from the Department of Computer Science and Engineering, The Chinese University of Hong Kong in 2023. He is currently a researcher in the School of Integrated Circuit, Southeast University. His research interests include hardware formal verification and circuit representation learning.
Abstract
Current Large Language Models (LLMs) often fall short of industrial reality, primarily due to ambiguous natural language specifications and the lack of formal correctness guarantees. In this talk, we demonstrate a multi-agent framework designed to bridge this gap by shifting from probabilistic generation to verified synthesis. We illustrate how leveraging software reference models (e.g., C/C++) as executable formal specifications can anchor the design process, utilizing static analysis for planning and formal equivalence checking for counterexample-guided debugging. Specifically, we present the efficient realization of complex, datapath-intensive modules, including IEEE-754 floating-point units and industrial Hifloat8 formats, proving that agentic workflows can ensure functional equivalence to golden models at scale.
5D-5
15:10-15:35

SLEG: A LLM-based SVA Evaluation and Generation System

*Tao Lin (Xepic Inc.)
Biography
Tao Lin is an R&D manager at Xepic Inc. He received a Ph.D. degree from the University of Cincinnati and currently focuses on formal methods and AI applications in formal verification.
Abstract
In the field of Electronic Design Automation (EDA), assertion-based validation is powerful and indispensable for hardware design verification. The auto-generation of SystemVerilog Assertions (SVA) is one of the promising applications of Large Language Models (LLMs). However, due to the inherent behavioral complexity of SVAs, evaluating the quality of generated SVAs poses a significant challenge. In this talk, we demonstrate a formal-checking-based evaluation system that bridges this gap. An exhaustive quantitative study reveals that the formal-proof loop improves LLM performance in real industrial settings. By incorporating formal methods, we can push the boundaries of autonomous code generation for LLMs in the realm of hardware design, with this expansion underpinned by the inherent confidence of formal verification.

Session 5E

(T9-D) Performance-Driven Floorplanning and Global Placement
13:30-15:35 | Wednesday, January 21, 2026 | Sleeping Beauty 3
Chair(s):
Wai Kei Mak (National Tsing Hua University)
Yi-Yu Liu (National Taiwan University of Science and Technology)
5E-1
13:30-13:55

MdpoPlanner: Mask-Driven Floorplan via Reinforcement Learning-Based Placement Order

Yue Wu, *Caiyu Chen, Xiaoyan Yang (Hangzhou Dianzi University)
Keywords
floorplanning, reinforcement learning, placement order, mask
Abstract
Floorplanning has long been a critical task in physical design due to the large search space. Recent approaches have made progress by leveraging reinforcement learning (RL) to guide the sequential placement of blocks on the chip canvas. However, these methods remain susceptible to suboptimal solutions, as the placement order is still determined by handcrafted heuristic rules rather than being jointly optimized with the placement decisions. This paper proposes a reinforcement learning based mask-driven floorplanner called MdpoPlanner, enabling joint optimization of placement sequence and spatial arrangement. Based on the connection between blocks and layout, an agent is trained to identify the optimal sequence of blocks for placement. Once a macro is selected, the proposed mask-driven floorplanner determines its optimal location by encoding spatial constraints and inter-macro connectivity into a dynamic mask, enabling legal placements while minimizing wire length and enhancing overall floorplan quality. Compared to the state-of-the-art floorplanner, MdpoPlanner yielded an average HPWL improvement of 42.61% on fixed-outline MCNC benchmarks and 6.75% on scaled-outline GSRC benchmarks.
5E-2
13:55-14:20

C3PO: Commercial-Quality Global Placement via Coherent, Concurrent Timing, Routability, and Wirelength Optimization

Yi-Chen Lu (Nvidia), Hao-Hsiang Hsiao (Georgia Institute of Technology), Rongjian Liang, *Wen-Hao Liu (Nvidia), Haoxing Ren (NVIDIA)
Keywords
Differentiable Timing Optimization, Differentiable Routability Optimization, GPU-accelerated Differentiable Multi-Objective Placement
Abstract
Despite achieving orders-of-magnitude runtime speedup, GPU-accelerated placers (GPU-Placers) still have extremely limited industrial adoption, largely due to the wide gaps in Power, Performance, and Area (PPA) metrics compared to those well-established CPU-centric commercial Physical Design (PD) tools. To overcome this issue, we introduce C3PO, the first commercial-quality, differentiable, multi-objective global placer that performs concurrent timing, routability, and wirelength optimization in a coherent manner with custom CUDA kernels. Particularly, we propose a convex-based framework that dynamically computes objective weights at each placement iteration by solving a quadratic problem, eliminating the need of manual parameter tuning. In the experiments, we rigorously validate C3PO with an industry-leading commercial PD tool and demonstrate that on 8 designs from TILOS [1] and IWLS [2] in ASAP 7nm [3], C3PO consistently outperforms the commercial tool by up to 16.7% in routed wirelength and 19.6% in switching power with complete full-flow validation.
5E-3
14:20-14:45

A Timing-Driven Hierarchical Macro Placement Framework for Large-Scale Complex IP Blocks

*Wei Fu, Lixin Chen, Jinghui Zhou, Ziran Zhu (Southeast University)
Keywords
physical design, macro placement, incremental refinement, timing
Abstract
Macro placement critically influences physical design quality, yet optimizing timing characteristics at this stage remains a challenging research frontier. In this paper, we propose a novel timing-driven macro placement framework comprising two major components: a global macro placement approach and an incremental macro placement refinement approach. Our global placement begins by applying Hier-RTLMP's multilevel autoclustering engine to group cells into clusters. Each cluster is abstracted as a density-inflated pseudo macro, and an analytical method is applied to generate the initial top-level layout. Then, intra-group macro placement is guided by dataflow-based virtual connection and solved via an integer-linear programming (ILP) approach. Each macro group (macros within the same cluster) is subsequently abstracted as a bounding box and placed via boundary-aware optimization followed by legalization. To further enhance macro placement quality, we develop an incremental refinement method applied after standard cell placement. It consists of two specialized strategies: array-constrained macro swapping and projected gradient descent (PGD)-based free macro shift, both targeting critical path slack improvement. We compare our framework with the state-of-the-art macro placer Hier-RTLMP on 15 benchmark suites from ChiPBench. For 7 designs employing both global macro placement and incremental refinement, our approach improves worst negative slack (WNS) and total negative slack (TNS) by 46.4% and 19.8%, respectively. For the remaining 8 designs that do not require incremental refinement, our approach improves the WNS and TNS by 24.5% and 28.7%, respectively.
5E-4
14:45-15:10

Accelerating Electrostatics-based Global Placement with Enhanced FFT Computation

Hangyu Zhang (University of Minnesota Twin Cities), *Sachin Sapatnekar (University of Minnesota)
Keywords
Electrostatics-based placement, FFT acceleration, Electrical field calculation, Potential calculation, Global placement
Abstract
Global placement is essential for high-quality and efficient circuit placement for complex modern VLSI designs. Recent advancements, such as electrostatics-based analytic placement, have improved scalability and solution quality. This work demonstrates that using an accelerated FFT technique, AccFFT, for electric field computation significantly reduces runtime. Experimental results on standard benchmarks show significant improvements when it is incorporated into the ePlace-MS and Pplace-MS algorithms, e.g., a 5.78x speedup in FFT computation and a 32% total runtime improvement against ePlace-MS, with a 1.0% reduction in scaled half-perimeter wirelength after detailed placement.
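Illustrative sketch (not from the paper): the FFT step being accelerated solves a Poisson equation that turns the cell-density map into a potential and an electric field. The simplified, periodic-boundary NumPy version below only illustrates that computation; ePlace-style placers actually use DCT/DST expansions, and AccFFT's implementation differs.

```python
# Simplified, periodic-boundary sketch of the FFT step in electrostatics-based
# placement: solve Poisson's equation for the potential of the cell-density map
# and differentiate in the frequency domain to obtain the electric field.
import numpy as np

def density_to_field(density, bin_size=1.0):
    n, m = density.shape
    kx = 2 * np.pi * np.fft.fftfreq(n, d=bin_size)
    ky = 2 * np.pi * np.fft.fftfreq(m, d=bin_size)
    KX, KY = np.meshgrid(kx, ky, indexing="ij")
    k2 = KX**2 + KY**2
    k2[0, 0] = 1.0                                   # avoid division by zero at the DC term
    rho_hat = np.fft.fft2(density - density.mean())
    phi_hat = rho_hat / k2                           # potential: -laplacian(phi) = rho
    phi_hat[0, 0] = 0.0
    ex = np.real(np.fft.ifft2(-1j * KX * phi_hat))   # field E = -grad(phi)
    ey = np.real(np.fft.ifft2(-1j * KY * phi_hat))
    return ex, ey

ex, ey = density_to_field(np.random.rand(64, 64))    # toy 64x64 density map
```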
5E-5
15:10-15:35

Comprehensive Delay-Aware Net Weighting Framework for Timing-Driven Global Placement

*Lixin Chen, Keyu Peng, Jinghui Zhou, Hao Gu, Wei Fu (Southeast University), Shuting Cai (Guangdong University of Technology), Ziran Zhu (Southeast University)
Keywords
Net Weighting, Timing-Driven Global Placement, Timing Optimization, RC-Based Delay Estimation
Abstract
Timing optimization is a critical challenge in modern very large-scale integration (VLSI) design, where global placement plays a pivotal role in achieving timing closure. In this paper, we propose a comprehensive delay-aware net weighting framework for timing-driven global placement. Our framework systematically integrates a cell delay balancing factor, a net delay balancing factor, and a static timing analysis (STA)-based incremental timing factor to dynamically guide net weight adjustment throughout the global placement process. The cell delay balancing factor is computed by first analytically quantifying the drive strength of each cell to derive a reference wirelength for the net it drives, and then applying a sigmoid function to obtain a smooth weight adjustment, thereby promoting delay balance at the cell level. The net delay balancing factor leverages a lightweight RC-based delay estimation scheme to evaluate net delay efficiently, which is then used to balance the delays across different nets. Finally, the STA-based incremental timing factor targets critical nets identified by STA, with the aim of enhancing timing performance on critical paths. Experimental results on the ICCAD 2015 contest benchmarks show that our algorithm outperforms recent state-of-the-art timing-driven placers, achieving at least 40.4% and 8.6% average improvements in total negative slack (TNS) and worst negative slack (WNS), respectively, while maintaining comparable wirelength and runtime.
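Illustrative sketch (not from the paper): the exact sigmoid-based adjustment is not given in the abstract. A minimal assumed form, in which nets longer than their drive-strength-derived reference wirelength receive a smoothly increasing weight, might look as follows; all parameter names and values are assumptions.

```python
# Assumed, illustrative sigmoid-smoothed net weight; NOT the paper's exact formula.
import math

def cell_balance_weight(net_wl, ref_wl, alpha=1.0, tau=0.25):
    """net_wl: current net wirelength; ref_wl: reference wirelength derived
    from the driving cell's drive strength."""
    x = (net_wl - ref_wl) / (tau * ref_wl)        # normalized deviation from the reference
    return 1.0 + alpha / (1.0 + math.exp(-x))     # smooth adjustment in (1, 1 + alpha)

print(cell_balance_weight(net_wl=120.0, ref_wl=100.0))
```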

Session 5F

(T8-A) Advances in Logic Optimization and Technology Mapping
13:30-15:35 | Wednesday, January 21, 2026 | Sleeping Beauty 5
Chair(s):
Hongce Zhang (The Hong Kong University of Science and Technology (Guangzhou))
Yuanqing Cheng (Beihang University)
5F-1
13:30-13:55

DCLOG: Don't Cares-based Logic Optimization using Pre-training Graph Neural Networks

*Rongliang Fu, Libo Shen, Ziyi Wang (The Chinese University of Hong Kong), Zhengxing Lei (Xi'an Jiaotong University), Zixiao Wang (The Chinese University of Hong Kong), Junying Huang (Institute of Computing Technology, Chinese Academy of Sciences), Bei Yu, Tsung-Yi Ho (The Chinese University of Hong Kong)
Keywords
logic optimization, don't cares, graph neural network, majority-inverter graph
Abstract
Logic rewriting serves as a robust optimization technique that enhances Boolean networks by substituting small segments with more effective implementations. The incorporation of don't cares in this process often yields superior optimization results. Nevertheless, the calculation of don't cares within a Boolean network can be resource-intensive. Therefore, it is crucial to develop effective strategies that mitigate the computational costs associated with don't cares while simultaneously facilitating the exploration of improved optimization outcomes. To address these challenges, this paper proposes DCLOG, a don't cares-based logic optimization framework, to efficiently and effectively optimize a given Boolean network. DCLOG leverages a pre-trained graph neural network model to filter out cuts without don't cares and then performs an incremental window simulation to calculate don't cares for each cut. Experimental results demonstrate the effectiveness and efficiency of DCLOG on large Boolean networks, specifically average size reductions of 15.64% and 1.44% while requiring less than 23.84% and 44.70% of the average runtime compared with state-of-the-art methods for the majority-inverter graph (MIG), respectively.
5F-2
13:55-14:20

SOFA-H: Post-Synthesis Area Optimization via Functionally Encoded, Net-Driven Subgraph Mining and SAT-Based Hypercell Remapping

*Jimmy Y.-C. Lee, Yen-Ju Su (National Yang Ming Chiao Tung University), Jiun-Cheng Tsai (Mediatek), Aaron C.-W. Liang, Charles H.-P. Wen (National Yang Ming Chiao Tung University), Hsuan-Ming Huang (Mediatek)
Keywords
Post-synthesis optimization, area optimization, Frequent subgraph mining, circuit encoding, SAT
Abstract
Synthesized netlists often leave substantial room for area optimization due to the limited function diversity in standard cell libraries, which frequently results in recurring logic patterns that could be compacted through cell combination—referred to as hypercells in this work. While prior studies have demonstrated the potential of hypercell-based optimization, most lack efficient and scalable mining strategies. We present SOFA-H, a post-synthesis framework that extracts and remaps hypercells for maximum area reduction. SOFA-H (i) mines fanout-induced subgraphs and canonically encodes them using P-Representatives, (ii) selects an optimal set of hypercells with non-overlapping replacements via a one-shot weighted MaxSAT formulation, and (iii) supports high-input, multi-output cells with scalable runtime. Evaluated on the EPFL benchmark suite synthesized with FreePDK45 and ASAP7, SOFA-H achieves average area reductions of 12.2% and 7.4%, respectively, and runs 380x faster on average at ASAP7 compared to the state-of-the-art TeMACLE. These results demonstrate that the extracted hypercells offer a scalable and effective path to closing the area gap left by conventional synthesis.
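Illustrative sketch (not from the paper): the one-shot weighted MaxSAT selection of non-overlapping replacements can be pictured with a toy encoding like the one below, solved here with the PySAT RC2 solver. The candidate data and the encoding details are assumptions, not SOFA-H's.

```python
# Toy weighted-MaxSAT selection of non-overlapping hypercell replacements.
# Requires the python-sat package (PySAT). Candidates and weights are made up.
from pysat.formula import WCNF
from pysat.examples.rc2 import RC2

# candidate id -> (area saved if applied, set of netlist gates it would replace)
candidates = {1: (5, {"g1", "g2"}), 2: (3, {"g2", "g3"}), 3: (4, {"g4"})}

wcnf = WCNF()
for var, (saving, _) in candidates.items():
    wcnf.append([var], weight=saving)          # soft: prefer applying each candidate
ids = list(candidates)
for i in range(len(ids)):
    for j in range(i + 1, len(ids)):
        a, b = ids[i], ids[j]
        if candidates[a][1] & candidates[b][1]:
            wcnf.append([-a, -b])              # hard: overlapping replacements exclude each other

rc2 = RC2(wcnf)
model = rc2.compute()
rc2.delete()
print("selected hypercells:", [v for v in model if v > 0])   # expected: [1, 3]
```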
5F-3
14:20-14:45

Formalization of Rectification Learning for Economic Design Updates

*Victor N. Kravets (IBM Research), Jie-Hong R. Jiang (National Taiwan University)
Keywords
Engineering Change Order, Boolean functional synthesis, Design rectification
Abstract
Engineering Change Order (ECO) is the task of finding non-intrusive design implementation updates that comply with a specification revision. This paper states the rectification problem in quantified Boolean logic, giving a sound and complete capture of the update choices for an ECO. Its closed-form statement offers an analytical search for small patches that maximize logic sharing in the implementation. With the abstraction-refinement paradigm assisted by relevance classification, we effectively generalize the sampled knowledge of a revision, enabling the identification of compact updates without undue computational cost. Our experimental evaluation demonstrates synthesized patches with almost half as many gates as the reported state-of-the-art results.
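Illustrative sketch (not from the paper): single-point rectification problems are commonly captured by a quantified formula of roughly the following shape, where S is the revised specification, I is the implementation with the rectification point exposed as a fresh variable y, and the patch is a Skolem function f for y. The paper's exact closed-form statement may differ.

```latex
% Assumed 2QBF shape of single-point rectification (illustrative only):
%   a rectification at point y exists iff
\forall x\, \exists y\; \bigl( I(x, y) \leftrightarrow S(x) \bigr)
%   and any Skolem function f realizing y, i.e. satisfying
\forall x\; \bigl( I(x, f(x)) \leftrightarrow S(x) \bigr),
%   constitutes a valid ECO patch.
```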
5F-4
14:45-15:10

PhyMap: A Physically-Aware Incremental Mapping Framework with On-the-fly Post-Layout Critical Path Tracking

*Hongyang Pan, Cunqing Lan (Fudan University), Zhiang Wang (University of California, San Diego), Xuan Zeng, Fan Yang, Keren Zhu (Fudan University)
Keywords
Physically aware Synthesis, Technology Mapping, Monte Carlo tree search
Abstract
Physically aware synthesis aims to address power, performance, and area (PPA) challenges, but existing methods are suboptimal because they fail to predict the true post-route critical path. We argue that the objective should shift from pursuing absolute timing accuracy to ensuring correct critical path identification. In this paper, we propose PigMap3, a novel framework for incremental physically aware technology mapping. We introduce a lightweight placement oracle that dynamically estimates cell locations for accurate interconnect delay modeling. Furthermore, we propose a hybrid mapping algorithm: a Monte Carlo tree search algorithm robustly handles critical paths, while a dynamic programming algorithm optimizes the area of non-critical paths. Evaluated using the OpenROAD tool flow, PigMap3 demonstrates significant improvements over state-of-the-art methods. On the IWLS'05 sequential benchmarks, PigMap3 improves the WNS by 32.7% and reduces power consumption by 15.4%. On the EPFL combinational benchmarks, our framework achieves an average 7.0% reduction in the post-route PPA product.
5F-5
15:10-15:35

CombRewriter: Enabling Combinational Logic Simplification in MLIR-Based Hardware Compiler

*Haisheng Zheng (Shanghai AI Laboratory), Zhuolun He, Shuo Yin (The Chinese University of Hong Kong), Yuzhe Ma (The Hong Kong University of Science and Technology (Guangzhou)), Bei Yu (The Chinese University of Hong Kong)
Keywords
Combinational Logic Simplification, Hardware Compiler, Logic Synthesis
Abstract
Modern Hardware Description Languages (HDLs) play a pivotal role in enabling swift and adaptable hardware development. A hardware compiler translates high-level designer intents into a concrete hardware implementation, the quality of which directly determines ultimate circuit performance. However, current hardware compilers may overlook opportunities for combinational logic simplification, leading to RTL code that contains redundant logic and degrades the Quality of Results (QoR) of the synthesized netlist. This paper presents CombRewriter, a novel approach that incorporates compilation-level optimization techniques into combinational logic simplification. Experimental results demonstrate that the proposed method effectively reduces netlist area.

Session 6A

(T3-C) Circuit- and Device-Aware Design for CIM
15:55-18:00 | Wednesday, January 21, 2026 | Snow White 1
Chair(s):
Hiromitsu Awano (Kyoto University)
Arindam BASU (City University of Hong Kong)
6A-1
15:55-16:20

SCION: A Comprehensive Simulation Framework for Charge-based In-Memory Computing for Rapid Evaluation of Hardware Non-idealities and DNN Accuracy

*Doug Hyun Kim, Akul Malhotra, Sumeet Kumar Gupta (Purdue University)
Keywords
Charge-based computing, Deep neural networks (DNNs), Hardware non-idealities, In-memory computing (IMC), Simulation framework
Abstract
As artificial intelligence advances at a rapid pace, the demand for computational resources has grown significantly, dominated by matrix-vector multiplications (MVMs). In-memory computing (IMC) is a promising approach that addresses the major data transfer bottleneck in these computations. Among various IMC designs, charge-based sensing offers robustness against challenges like IR drops that severely affect common current-based IMC approaches. However, charge-based IMC is vulnerable to its own non-idealities, particularly parasitic capacitive coupling, which can degrade computational accuracy. Accurately integrating the effects of these non-idealities into DNN inference evaluation requires time-intensive transient SPICE simulations, making comprehensive design space exploration and hardware-software co-optimization impractical. To overcome these limitations, we propose a comprehensive framework, SCION, which rigorously models the hardware non-idealities in charge-based IMC designs, rapidly predicts the IMC output charge and directly integrates the effect of hardware non-idealities in PyTorch-based DNN inference simulations. We show that our framework predicts the IMC output charge with more than 99% accuracy with respect to SPICE while offering more than 2000x speedup. We demonstrate the capability of our framework by conducting experiments on a 9T-SRAM-based IMC design, showing how and to what extent the parasitic capacitive coupling impairs the inference accuracy. Furthermore, we propose mitigation techniques based on layout optimizations and array biasing that alleviate the impact of these non-idealities. Our results show that the mitigation techniques significantly improve the sense margin and restore inference accuracy on CIFAR-100 for both ResNet-50 and ViT-small. We also show how the technology/circuit-level knobs affect the system-level accuracy. This general framework is compatible with other charge-based IMC designs and DNN workloads, highlighting its potential to enable cross-layer design of charge-based IMC accelerators.
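Illustrative sketch (not from the paper): one way non-ideality models are folded into PyTorch-based inference is to wrap a layer so that its ideal per-column MAC result is perturbed by a coupling term before being returned. The toy wrap-around coupling model and all parameters below are assumptions, not SCION's calibrated model.

```python
# Assumed, illustrative injection of a column-coupling non-ideality into a
# PyTorch forward pass; NOT SCION's actual model.
import torch
import torch.nn as nn

class CouplingAwareLinear(nn.Linear):
    def __init__(self, in_features, out_features, coupling=0.02):
        super().__init__(in_features, out_features, bias=False)
        self.coupling = coupling   # assumed fraction of charge exchanged with adjacent columns

    def forward(self, x):
        ideal = super().forward(x)                     # ideal per-column MAC "charge"
        left = torch.roll(ideal, shifts=1, dims=-1)    # neighboring columns (wraps at edges: toy only)
        right = torch.roll(ideal, shifts=-1, dims=-1)
        return ideal + self.coupling * (left + right - 2 * ideal)

layer = CouplingAwareLinear(128, 64)
print(layer(torch.randn(4, 128)).shape)                # torch.Size([4, 64])
```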
6A-2
16:20-16:45

SABCIM: Self-Adaptive Biasing Scheme for Accurate and Efficient Analog Compute-in-Memory

*Yashvardhan Biyani, Abhairaj Singh, Rajendra Bishnoi, Said Hamdioui (CE, TU Delft)
Keywords
Computation-in-Memory (CIM), In Memory Computing, Edge AI, Neural Networks, Vector-Vector Multiplication (VVM), Multiply-and-Accumulate (MAC), Vector-Matrix Multiplication (VMM), Analog CIM, CIM Architecture, Array-Periphery co-design, Low-power, Emerging non-volatile memories, Memristor, Resistive Random Access Memory (RRAM), Voltage-to-Time Converter (VTC), Analog-to-Digital Converter (ADC), VTC-based ADC
Abstract
Analog Compute-in-Memory (CIM), leveraging non-volatile memristive devices to perform in-place computations in the analog domain, holds great potential to efficiently accelerate vector-matrix multiplications (VMM) and realize AI (Artificial Intelligence) at the edge. However, the data converters in such architectures often trade off accuracy against high energy and area overheads, practically limiting the benefits of CIM. In this work, we present SABCIM, an array-periphery co-design approach for CIM that enables accurate computation as well as digitization of analog VMM outputs with high energy efficiency and competitive area overhead. By leveraging complementary input activations and data storage, each crossbar column generates a differential analog output corresponding to the vector-vector multiplication (VVM) result, while inherently addressing underlying non-idealities. This output is digitized using a compact, dual-ramp voltage-to-time converter (VTC)-based analog-to-digital converter (ADC). Benchmark results indicate that our work achieves up to 19.6x higher energy efficiency compared to the state-of-the-art (SOTA), while maintaining comparable accuracies.
6A-3
16:45-17:10

CDACiM: A Charge-Domain Compute-in-Memory Macro for FP/INT MAC Operations with Reconfigurable Capacitor Digital-Analog-Converter

*Jinting Yao, Zeyu Yang, Yuxiao Jiang, Yuxiao Yang, Zheyu Yan, Cheng Zhuo, Xunzhao Yin (Zhejiang University)
Keywords
Compute-in-Memory(CiM), floating point multiply-and-accumulate(MAC), dual-mode macro, static random access memory(SRAM)
Abstract
Advanced artificial intelligence (AI) edge chips need to balance flexible computation, high energy efficiency, and sufficient inference accuracy across diverse workloads. Many compute-in-memory (CiM) designs enable efficient neural network acceleration but focus solely on integer (INT) multiply-and-accumulate (MAC) operations, limiting precision. Some CiM macros add extra circuitry to support floating point (FP) MACs, but these dedicated exponent-handling blocks often waste area when running INT workloads. In this paper, we propose CDACiM, a charge-domain CiM macro that supports both FP and INT MAC operations with minimal overhead. CDACiM introduces a reconfigurable capacitor digital-to-analog converter (RCDAC) that performs both exponent summation and bitwise AND for mantissa multiplication. To calculate exponent offsets, we develop a shared single-slope ADC (SS-ADC) that finds the maximum exponent and computes the differences in the time domain simultaneously. Our design includes a sparsity-aware computation scheme with tunable thresholds that skips low-importance input-weight pairs, boosting energy efficiency through higher input sparsity. We also introduce a multi-bit input accumulation method that leverages ADC redundancy during quantization and normalization to improve performance. Fabricated in a 40nm CMOS process, CDACiM demonstrates excellent flexibility and a favorable trade-off between accuracy and resource usage. Notably, it is the first CiM design to reconfigure a capacitor-based INT macro for parallel exponent computation. CDACiM achieves 16.2 TOPS/W for INT MACs and 15.9 TFLOPS/W for FP MACs. It delivers a 1.36-1.48x improvement in energy efficiency with minimal accuracy loss compared to recent FP CiM macros.
6A-4
17:10-17:35

Learnable Center-Based Quantization for Efficient Analog PIM with Reduced ADC Precision

*Sangheum Yeon, Jonghwan Ko (Sungkyunkwan University)
Keywords
Quantization, In-Memory Computing, Analog-to-Digital Converter
Abstract
Processing-in-memory (PIM) architectures have shown significant potential for accelerating deep neural network (DNN) inference by performing matrix-vector multiplications directly within memory. However, achieving high precision often requires high-resolution analog-to-digital converters (ADCs), which can increase energy consumption and limit overall efficiency. To address this, we propose a learnable center-based quantization (LCQ) technique that minimizes the range of partial sums in PIM arrays. This reduction in the range of partial sums decreases the ADC resolution requirements, enabling accurate low-bit quantization while maintaining energy efficiency. Our framework directly models ADC precision constraints within the training process without requiring extensive retraining. Experimental results on DNN models such as ResNet20 and ResNet18 with CIFAR-10/ImageNet datasets demonstrate that LCQ significantly enhances energy efficiency while maintaining competitive accuracy compared to previous techniques for efficient analog PIM. LCQ improves both accuracy and energy efficiency, reducing ADC resolution requirements and enabling practical low-bit quantization.
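Illustrative sketch (not from the paper): the quantity LCQ constrains is the per-array partial sum seen by the ADC. The toy routine below simulates a low-resolution ADC clamping and quantizing those partial sums during a crossbar-mapped matrix multiplication; the array size, range, and bit-width are assumptions, not LCQ's learned values.

```python
# Assumed, illustrative ADC model applied to crossbar partial sums; NOT LCQ's method.
import torch

def pim_matmul(x, w, rows_per_array=64, adc_bits=4, psum_range=8.0):
    """Matrix multiply mapped onto crossbar arrays of `rows_per_array` rows, with
    each array's partial sum clipped and quantized by a low-bit ADC."""
    levels = 2 ** adc_bits - 1
    step = 2 * psum_range / levels
    out = torch.zeros(x.shape[0], w.shape[1])
    for start in range(0, w.shape[0], rows_per_array):
        psum = x[:, start:start + rows_per_array] @ w[start:start + rows_per_array]
        psum = psum.clamp(-psum_range, psum_range)   # ADC input range; LCQ's training narrows this
        out += torch.round(psum / step) * step       # uniform ADC quantization
    return out

x = torch.randint(0, 2, (2, 256)).float()   # toy binary activations
w = torch.randn(256, 32).sign()             # toy +/-1 weights
print(pim_matmul(x, w).shape)               # torch.Size([2, 32])
```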
6A-5
17:35-18:00

RL-Guided Thermal-Aware Quantization for Efficient and Robust ReRAM CIM Systems

*Lihua An, Jiayi Li, Pingqiang Zhou (ShanghaiTech University)
Keywords
Neural Network, Computing in Memory, Thermal Effect, Reinforcement Learning, Quantization, ReRAM
Abstract
Resistive RAM (ReRAM)-based Computing-in-Memory (CIM) systems present significant advantages in energy efficiency and computational throughput for neural network acceleration. However, their performance is highly constrained by thermal-induced conductance drift, especially under aggressive quantization strategies. This work presents a reinforcement learning (RL)-guided thermal-aware layer-wise quantization framework optimized for ReRAM-based CIM systems. The proposed method encourages sparse and low-magnitude weight representations, while adaptively exploring layer-wise bit-width configurations guided by direct hardware evaluation feedback and a thermal-aware reward. Experiments on CIFAR-10 and ImageNet benchmarks with ResNet and VGG show that the proposed method achieves up to a 10.2% peak temperature reduction and an average top-1 accuracy improvement of 46.37% over fixed 8-bit baselines. Compared to prior layer-wise quantization methods without thermal considerations, our approach improves accuracy by 6.36%-15.06% on average.

Session 6B

(T1-B) Accelerator and Mapping Innovations for LLMs and Neural Networks
15:55-18:00 | Wednesday, January 21, 2026 | Snow White 2
Chair(s):
Zhe Lin (Sun Yat-sen University)
Jianwang Zhai (Beijing University of Posts and Telecommunications)
6B-1
15:55-16:20

SnipSnap: A Joint Compression Format and Dataflow Co-Optimization Framework for Efficient Sparse LLM Accelerator Design

*Junyi Wu, Chao Fang, Zhongfeng Wang (Nanjing University)
Keywords
Design Space Exploration (DSE), Sparse Accelerators, Compression Format, Large Language Model (LLM)
Abstract
The growing scale of large language models (LLMs) has intensified demands on computation and memory, making efficient inference a key challenge. While sparsity can reduce these costs, existing design space exploration (DSE) frameworks often overlook compression formats, a key factor for leveraging sparsity on accelerators. This paper proposes SnipSnap, a joint compression format and dataflow co-optimization framework for efficient sparse LLM accelerator design. SnipSnap introduces: (1) a hierarchical compression format encoding to expand the design space; (2) an adaptive compression engine for selecting formats under diverse sparsity; and (3) a progressive co-search workflow that jointly optimizes dataflow and compression formats. SnipSnap achieves 18.24% average memory energy savings via compression format optimization, along with 2248.3x and 21.0x speedups over Sparseloop and DiMO-Sparse frameworks, respectively.
6B-2
16:20-16:45

ALMA: Adaptive Co-optimization of Loop-Memory Uneven Mappings and Architectures for DNN Accelerators

*Xiaodong Liu, Zhihui Wang, Xiangcong Kong, Weixin Zhou, Xiaofei Xia, Yuqi Jiang (Vivo Mobile Communication (Shenzhen) Co., Ltd)
Keywords
Transformers, DNN, design space exploration, genetic algorithm, multi-objective optimization, uneven mapping
Abstract
Efficient design of Deep Neural Network (DNN) accelerators requires the joint optimization of hardware architecture and dataflow mapping, a task complicated by a vast, non-convex, and tightly-coupled design space. Existing methods typically decouple these two stages or restrict mapping flexibility, leading to suboptimal solutions. This paper presents ALMA (Adaptive Loop-Memory Uneven Mappings and Architecture co-optimization), a unified framework that holistically co-optimizes hardware architectures and flexible, uneven mapping strategies. ALMA's core contributions are threefold: 1) A unified search space that integrates hardware parameters with an uneven mapping strategy on unbalanced memory hierarchies and a novel mixed-dimensional parallelism approach. 2) Two specialized genetic operators, Factor Compensation and Position-Sensitive Loop Crossover (PSLC), specifically designed to maintain solution validity within this complex, unified chromosome structure. 3) A Reinforcement Learning (RL) agent that dynamically adapts the genetic algorithm's hyperparameters, improving search efficiency and robustness. We evaluate ALMA on a variety of Convolutional Neural Networks (CNNs) and Transformer models. Experimental results demonstrate that ALMA identifies solutions with superior Performance, Power, and Area (PPA) trade-offs, significantly outperforming state-of-the-art frameworks like MEDEA and SENNA MO. On key benchmarks, ALMA reduces latency by up to 95% while achieving comparable or superior energy efficiency and area costs.
6B-3
16:45-17:10

MARCO: Hardware-Aware Neural Architecture Search for Edge Devices with Multi-Agent Reinforcement Learning and Conformal Filtering

Arya Fayyazi, Mehdi Kamal, *Massoud Pedram (University of Southern California)
Keywords
Hardware/Software Co-Design, Conformal Prediction, Multi Agent Reinforcement Learning, Network Architecture Search, ARM, Edge AI
Abstract
We present MARCO (Multi-Agent Reinforcement learning with Conformal Optimization), a hardware-aware neural architecture search (NAS) framework for resource-constrained edge devices. MARCO combines multi-agent reinforcement learning (MARL) with Conformal Prediction (CP) to efficiently explore architectures under strict memory and latency budgets. Unlike once-for-all (OFA) supernets that require expensive pretraining, MARCO separates the NAS task into a Hardware Configuration Agent and a Quantization Agent, coordinated via a centralized-critic, decentralized-execution (CTDE) paradigm. A calibrated CP surrogate model offers distribution-free guarantees to filter low-reward candidates before costly training or simulation, significantly accelerating the search. Experiments on MNIST, CIFAR-10, and CIFAR-100 show MARCO achieves 3-4x faster search than OFA while maintaining accuracy within 0.3% and reducing latency. Validation on the MAX78000 confirms simulator fidelity with less than 5% error.
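Illustrative sketch (not from the paper): split conformal prediction supplies a distribution-free residual bound for a cheap surrogate, which can then be used to discard low-reward candidates before any costly training. The surrogate, data, and threshold below are hypothetical stand-ins, not MARCO's calibrated model.

```python
# Assumed, illustrative split-conformal filtering of candidate architectures.
import numpy as np

rng = np.random.default_rng(0)
def surrogate(z):              # hypothetical cheap reward predictor
    return 0.95 - 0.1 * z
def measure(z):                # hypothetical "expensive", noisy measured reward
    return 1.0 - 0.1 * z + rng.normal(0.0, 0.05, size=np.shape(z))

# Calibration: residuals between measured and predicted reward on known candidates.
z_cal = rng.uniform(0, 5, size=200)
residuals = np.abs(measure(z_cal) - surrogate(z_cal))
alpha = 0.1                    # target 90% coverage
n = len(residuals)
q_hat = np.quantile(residuals, np.ceil((1 - alpha) * (n + 1)) / n)

# Filtering: discard candidates whose conformal upper bound misses a reward target.
z_new = rng.uniform(0, 5, size=10)
keep = surrogate(z_new) + q_hat >= 0.6
print(np.column_stack([np.round(z_new, 2), keep]))
```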
6B-4
17:10-17:35

NetTLM-DSE: Design Space Exploration for DNN Layer-Pipeline Spatial Mappings

*Shinyoung Kim, Junsu Heo, Hyeseong Shin, Jaesuk Lee (Konkuk University), Sungkyung Park (Pusan National University), Chester Sungchung Park (Konkuk University)
Keywords
Deep Neural Networks Accelerator, Spatial Mapping, Design Space Exploration
Abstract
This paper presents NetTLM-DSE, a novel framework for exploring the design space of layer-pipeline spatial mapping (LP-SM) for a large-scale accelerator network based on SystemC transaction-level modeling (TLM). In detail, the proposed SystemC-TLM simulator, NetTLMSim, is used to predict the network delay and optimize the mapping of deep neural network (DNN) layers to multiple accelerator cores. Conventional analytical delay models fail to accurately predict the delay of a large-scale accelerator network because they lack the ability to capture the dynamic traffic of an accelerator network, such as network contention. Such inaccurate delay prediction makes it difficult to find the optimal or near-optimal LP-SM in the corresponding design space exploration (DSE). In contrast, NetTLMSim reduces the prediction error by up to 52.8% through its cycle-accurate modeling of dynamic traffic in accelerator networks. When applied to the DSE for Transformer LP-SM, the proposed cycle-accurate simulator can find the optimal or near-optimal design options more efficiently than the conventional analytical delay models. Specifically, the proposed DSE framework, NetTLM-DSE, is shown to achieve up to 37.8% lower delay than the conventional DSE framework. In addition, a set of simulated annealing (SA) operators using dynamic traffic profiles is newly proposed to improve the efficiency of DSE. In detail, it is shown that, when applied to the DSE for Transformer LP-SM, the proposed SA operators improve the DSE speed by up to 684x over conventional SA operators. The NetTLM-DSE framework is open-sourced at https://github.com/SDL-KU/NetTLM-DSE.
6B-5
17:35-18:00

BalanceGS: Algorithm-System Co-design for Efficient 3D Gaussian Splatting Training on GPU

*Junyi Wu, Jiaming Xu, Jinhao Li, Yongkang Zhou, Jiayi Pan, Xingyang Li, Guohao Dai (Shanghai Jiao Tong University)
Keywords
Algorithm, system, efficiency
Abstract
3D Gaussian Splatting (3DGS) has emerged as a promising 3D reconstruction technique. The traditional 3DGS training pipeline follows three sequential steps: Gaussian densification, Gaussian projection, and color splatting. Despite its promising reconstruction quality, this conventional approach suffers from three critical inefficiencies: (1) Skewed density allocation during Gaussian densification. The adaptive densification strategy in 3DGS makes skewed Gaussian allocation across dense and sparse regions. The number of Gaussians in dense regions can be 100x that of sparse regions, leading to Gaussian redundancy. (2) Imbalanced computation workload during Gaussian projection. The traditional one-to-one allocation mechanism between threads and pixels results in execution time discrepancies between threads, leading to ~20% latency overhead. (3) Fragmented memory access during color splatting. Discrete storage of colors in memory fails to take advantage of data locality with fragmented memory access, resulting in ~2.0x color memory access time. To tackle the above challenges, we introduce BalanceGS, an algorithm-system co-design for efficient training in 3DGS. (1) At the algorithm level, we propose heuristic workload-sensitive Gaussian density control to automatically balance point distributions - removing 80% redundant Gaussians in dense regions while filling gaps in sparse areas. (2) At the system level, we propose Similarity-based Gaussian sampling and merging, which replaces the static one-to-one thread-pixel mapping with adaptive workload distribution - threads now dynamically process variable numbers of Gaussians based on local cluster density. (3) At the mapping level, we propose a reordering-based memory access mapping strategy that restructures RGB storage and enables batch loading in shared memory. Extensive experiments demonstrate that, compared with 3DGS, our approach achieves a 1.44x training speedup on an NVIDIA Tesla A100 GPU with negligible quality degradation.

Session 6C

(T12-B) Logic Locking and Hardware Trojan Detection
15:55-18:00 | Wednesday, January 21, 2026 | Snow White 3
Chair(s):
Amin Rezaei (California State University Long Beach)
Qiaoyan Yu (University of New Hampshire)
6C-1
15:55-16:20

LumiLock: LUT-based Multi-Key Logic Locking

*Tsunato Nakai, Takuya Higashi (Mitsubishi Electric Corporation)
Keywords
logic locking, SAT, multi-key logic locking
Abstract
Logic locking, which ensures that a logic circuit operates correctly only when the correct key is input, has gained attention as a countermeasure against the infringement of hardware intellectual property and reverse engineering threats. However, many existing methods have been compromised by SAT attacks. Recent research has reported that multi-key logic locking methods are effective as a fundamental countermeasure against SAT attacks. However, these methods mainly face challenges such as limitations in their application scope and increased overhead when applied. In this paper, we propose a new logic locking method, LUT-based multi-key logic locking, to address these challenges. The proposed method was evaluated through ten SAT attacks on three benchmark circuits. The results demonstrate that, compared to existing methods, our method achieves both flexibility in its applicability to diverse circuits and robust resistance against SAT attacks, even with a smaller key size.
6C-2
16:20-16:45

SCPrompt: Semantic Compression and Prompt-Guided LLM Reasoning for RTL Trojan Detection

Jiaji He, *Jiansheng Chen (Tianjin University), Fei Zhao (Peking University), Yaohua Wang (National University of Defense Technology), Yongqiang Lyu (Tsinghua University)
Keywords
Hardware security, RTL features compression, Large language models, Prompt engineering
Abstract
The increasing scale and complexity of integrated circuit (IC) designs present significant challenges to hardware security. Hardware Trojans (HTs)—stealthy, malicious alterations to hardware logic—are especially difficult to detect at the register-transfer level (RTL), where both structural and semantic understanding are essential. While recent advances have explored the use of large language models (LLMs) for RTL analysis, their performance is often constrained by token limits and a lack of targeted semantic abstraction. To overcome these limitations, we propose SCPrompt, a novel and extensible framework that, for the first time, integrates RTL instance-level compression with prompt-guided LLM reasoning for hardware Trojan detection. SCPrompt compresses RTL inputs by over 90%, significantly reducing prompt length while retaining critical control and data flow semantics. This enables LLMs to perform accurate Trojan analysis without task-specific fine-tuning. Unlike prior binary classification approaches, SCPrompt supports instance-grained localization of suspicious logic components, identifying both trigger and payload modules within a design. Evaluated on 51 RTL designs that had hardware Trojans inserted, our method achieves up to 100% precision, 90% recall, and an F1 score of 94.74%, demonstrating strong generalization across diverse circuits. These results validate SCPrompt as the first effective and scalable framework to unify RTL feature engineering with general-purpose language model reasoning for advanced hardware security analysis.
6C-3
16:45-17:10

Can Large Language Models Unlock Logic Locking?

*Takuya Higashi, Tsunato Nakai (Mitsubishi Electric Corporation)
Keywords
Logic Locking, SAT Attack, Multi-Key Locking, Large Language Model
Abstract
In the realm of integrated circuit (IC) manufacturing, the industry faces security threats such as the theft of intellectual property and reverse engineering. In response to these threats, logic locking has gained significant attention as a viable countermeasure. Logic locking is a technique that obfuscates the operation of an IC by integrating specific gates or components into the circuit design, allowing the IC to function only under certain conditions. Traditional logic locking methods have been vulnerable to SAT attacks; however, recent research has shown that multi-key logic locking serves as an effective countermeasure against these attacks. Currently, there is a growing body of research in the field of security that explores sophisticated attacks utilizing large language models (LLMs), with numerous studies documented. Nevertheless, to the best of the authors' knowledge, there have been no reports on attack methodologies that utilize LLMs to target logic locking specifically. This study presents, for the first time, an attack using LLMs against circuits employing logic locking, particularly those enhanced with multi-key logic locking that is resistant to SAT attacks. Through the proposed attack methodology utilizing LLMs, we successfully executed an attack against the state-of-the-art multi-key logic locking scheme known as K-Gate Lock.
6C-4
17:10-17:35

GALA: An Explainable GNN-based Approach for Enhancing Oracle-Less Logic Locking Attacks Using Functional and Behavioral Features

Yeganeh Aghamohammadi, Henry Jin (University of California Santa Barbara), *Amin Rezaei (California State University Long Beach)
Keywords
Logic Locking, Logic Encryption, Machine Learning, Graph Neural Networks, Explainability
Abstract
With the rise of fabless manufacturing, the risks of piracy and overproduction in integrated circuits have become more pressing, making it crucial to analyze and prevent hardware-based attacks. Although existing machine learning oracle-less attacks on logic-locked circuits are able to report approximate keys, they often struggle to produce operationally effective keys because they focus mainly on the structural topology of the circuits. This paper addresses this limitation by incorporating both functional features, such as output corruptibility, and behavioral features, like power consumption and area overhead, into graph neural network-based circuit modeling attacks. With the help of both subgraph-level and graph-level attack strategies, we achieve notable improvements in rendering a meaningful key compared to existing oracle-less methods. In addition, our graph-level model is explainable, providing insights into the learning process and how the attack is executed. These findings are critical for chip design houses looking to identify and address security vulnerabilities, ultimately safeguarding hardware intellectual property.
6C-5
17:35-18:00

SAND: A Self-supervised and Adaptive NAS-Driven Framework for Hardware Trojan Detection

*Zhixin Pan (Florida State University), Ziyu Shu (Stony Brook University), Linh Nguyen, Amberbir Alemayoh (Florida State University)
Keywords
Hardware Trojan, Machine Learning, Self-supervised Learning, Neural Architecture Search
Abstract
The globalized semiconductor supply chain has made Hardware Trojans (HT) a significant security threat to embedded systems, necessitating the design of efficient and adaptable detection mechanisms. Despite promising machine learning-based HT detection techniques in the literature, there are three major limitations of existing works: ad hoc feature selection, vulnerability to obfuscation, and lack of generalizability, all of which hinder their effectiveness across diverse HT attacks. In this paper, we propose SAND, a self-supervised and adaptive NAS-driven framework for efficient HT detection. Specifically, this paper makes three key contributions. (1) We leverage self-supervised learning (SSL) to enable automated feature extraction, eliminating the dependency on manually engineered features and improving the robustness. (2) SAND integrates neural architecture search (NAS) to dynamically optimize the downstream classifier, allowing for seamless adaptation to unseen benchmarks with minimal fine-tuning. (3) Experimental results show that SAND achieves a significant improvement in detection accuracy (up to 18.3%) over state-of-the-art methods, exhibits high resilience against evasive Trojans, and demonstrates strong generalization.

Session 6D

6D (Designer Forum 2) AI in Production EDA: Digital, Custom, and Manufacturing Use Cases
15:55-17:35 | Wednesday, January 21, 2026 | Sleeping Beauty 1/2
Chair(s):
Min Li (Southeast University)
6D-1
15:55-16:20

How AI is Supercharging Digital Implementation

*Ko-Lung Yuan (Siemens)
Biography
Ko-Lung Yuan is a Senior Manager of Software Engineering at Siemens EDA, currently working on Aprisa, a digital implementation solution. With around 11 years of experience in the electronic design automation (EDA) industry, his expertise includes RTL optimization, placement optimization, clock tree synthesis, design exploration, and machine learning-based optimization. In recent years, Ko-Lung has focused on integrating generative AI, AI agents, and intelligent design automation techniques into production-grade EDA workflows, bridging traditional algorithms with emerging AI capabilities. He is also the author of "It's Django", a practical guide to web design using Python—recognized as the first native Chinese-language book on the topic.
Abstract
In addition to meeting power, performance, and area (PPA) targets, semiconductor design teams are currently challenged to accelerate design cycle time significantly.
Today, AI in EDA has delivered results on productivity and scalability for RTL-to-GDS, enabling digital implementation teams to achieve better results within much shorter schedules.
This presentation explores how AI-driven techniques, ranging from machine learning and reinforcement learning-based optimization to generative AI-powered natural language interfaces and the streamlining and automation of design tasks with AI agents, can significantly scale productivity and engineering effort to achieve PPA targets faster and more consistently.
6D-2
16:20-16:45

Shaping a Full-custom Design Ecosystem with Industry-Academia-Research Collaboration

*Michael Liu (Empyrean Technology)
Biography
Michael Liu is the senior product director of Empyrean Technology. He has more than 10 years of experience in ASIC chip design, manufacturing, and packaging EDA software product development and management, focusing on the planning, development, and promotion of EDA products. He has helped Empyrean build a mature analog/mixed-signal design flow and expand it to other full-custom design fields such as flat-panel display, signal chain, memory, RF, and optoelectronics. He is building a reliability design methodology for design-manufacturing collaboration and a PPAC-oriented design-manufacturing-packaging collaborative design solution. These solutions are widely adopted by leading national and international design houses.
Abstract
As the complexity of integrated circuit design continues to grow, industry-academia-research collaborative innovation has emerged as a critical pathway to overcome core EDA challenges and accelerate technology adoption. Empyrean Technology is building an ecosystem around a full-custom design platform powered by Python APIs—leveraging Python’s widespread use in algorithm development, script extensibility, and the academic ecosystem—to equip researchers with flexible, efficient tools and support for secondary development. We have already launched AI-driven design automation initiatives on this platform, spanning use cases such as circuit topology optimization and layout generation. These early efforts have demonstrated the potential of intelligent tools to significantly boost design efficiency and quality. We sincerely invite academic partners to join us in growing this ecosystem and injecting new momentum into design innovation.
6D-3
16:45-17:10

AI-driven Analog and Custom Design Solution

*Yutao Ma (PRIMARIUS)
Biography
Dr. Yutao Ma received his bachelor's degree in 1996 and his Ph.D. in 2001, both with honors, from Tsinghua University, majoring in Microelectronics. Since then, Dr. Ma has worked in the EDA industry for over 20 years, from Celestry to Cadence and then Primarius. His expertise is in semiconductor device modeling, circuit simulation, and yield analysis. Dr. Ma is now VP of R&D at Primarius Technologies, leading technology innovation and new product development as well as the R&D engineering infrastructure team. He also serves as the director of the EDA Innovation Key Laboratory of Shandong Province and the co-director of the Primarius-Peking University DTCO innovation laboratory. Dr. Ma has published dozens of technical papers in leading journals and conferences and holds multiple patents in device modeling, simulation algorithms, yield analysis, and hardware acceleration.
Abstract
The rapid convergence of artificial intelligence and electronic design automation is reshaping analog and custom circuit design. This talk discusses how AI—ranging from traditional machine learning to large foundation models—can enhance modeling accuracy, design efficiency, and reliability analysis across advanced technology nodes. It highlights emerging research directions, evolving tool architectures, and practical design workflows that integrate AI into circuit synthesis, layout, and verification. Emphasis is placed on methodological innovation, cross-domain collaboration, and the pathway toward intelligent, data-driven design ecosystems enabling faster development cycles and improved design quality in the era of AI-empowered semiconductor engineering.
6D-4
17:10-17:35

Exploring the Application of Machine Learning in Accelerating Model and Mask Optimization

*Xiaodong Meng (Amedac)
Biography
Meng Xiaodong is currently the Vice President of Advanced Manufacturing EDA Co., Ltd. He graduated from the Department of Physics of Tsinghua University and holds a master's degree from the Institute of Physics at the Chinese Academy of Sciences. As an expert in computational lithography, he has over 20 years of extensive experience in its R&D and application. Mr. Meng Xiaodong joined Semiconductor Manufacturing International Corporation (SMIC) in 2001 and held positions such as lithography process engineer and senior process integration engineer. In 2005, he joined Synopsys (Shanghai) and held positions such as senior engineer, project manager, senior manager, and the head of OPC in China, managing the OPC, precise lithography simulation, and mask data preparation for customers in the Asia-Pacific region. In 2019, Mr. Meng Xiaodong joined Advanced Manufacturing EDA Co., Ltd. as the vice president of industrial promotion, responsible for promoting the deep integration of industry, academia, and research, facilitating the transformation of scientific and technological achievements, and building a cooperation platform between enterprises and universities and research institutions.
Abstract
With the increasing demand for chips at 7nm and smaller technology nodes, the requirements on the accuracy of chip process models and mask designs in chip fabs are becoming ever more stringent. At the same time, machine learning algorithms, big data processing technologies, and the related computing power have made rapid progress in recent years, providing powerful new tools for optimizing each step of the chip manufacturing process. Both the academic and industrial communities have made initial attempts to apply machine learning in this field. In this report, we will introduce our roadmap and progress in this area.

Session 6E

(T9-C) Cell Placement and Generation for Advanced Technologies
15:55-18:00 | Wednesday, January 21, 2026 | Sleeping Beauty 3
Chair(s):
Pingqiang Zhou (ShanghaiTech University)
Yibo Lin (Peking University)
6E-1
15:55-16:20

An Effective Placement Framework for Designs with Half-Row-Extended Cells

Po-Yi Wu (National Tsing Hua University), *Tzu-Sheng Hung (National Tsing Hua University), Wai-Kei Mak, Ting-Chi Wang (National Tsing Hua University)
Keywords
Physical design, Half-row-extended cell, Quadratic placement, Multi-row-height legalization
Abstract
In advanced technology nodes, it is becoming common to mix standard cells with different heights in the same design to optimize timing, power, and routability simultaneously. In this paper, we explore the placement problem of mixed-cell-height designs that include half-row-extended (HRE) cells and conventional single-row-height cells. An HRE cell is like a conventional single-row-height cell extended by half a row both upward and downward, resulting in a double-row-height cell. The advantage of an HRE cell is its superior driving strength compared to a conventional double-row-height cell, where both the top and bottom cell boundaries are aligned with power or ground rails. But mixing HRE cells with conventional single-row-height cells may result in a less compact placement, which adversely affects the wirelength and/or the chip size. To address the placement challenges introduced by HRE cells, we propose a two-level placement framework: (1) HRE cell-aware global placement and (2) half-row-fragment-aware legalization. During the global placement, we estimate the potential area overhead caused by the HRE cells and identify groups of HRE cells that should be placed close to one another to control the dead space distribution while minimizing the wirelength. Then, we invoke FragmentSaver, a legalizer that leverages dead-space cost curves to reduce row fragmentation with minimal displacement. Experimental results show that, compared to the state-of-the-art mixed-cell-height placer RePlAce [8], our method resulted in significant reductions in routed wirelength and cell displacement.
6E-2
16:20-16:45

DUALPlace: Reinforcement Learning based Mixed-size Placement with Multi Modal Cross Attention

Yue Wu, *Hu Liu, Jiayi Ding, Junwei Li, Tieming Han, Xiaoyan Yang (Hangzhou Dianzi University)
Keywords
Reinforcement learning, Physical design, Mixed-size placement, Electronic design automation
Abstract
As modern circuits scale up, chip placement becomes increasingly challenging. Reinforcement learning (RL) algorithms have shown promise in achieving high-performance placement outcomes. However, existing RL approaches struggle with mixed-size placement, primarily due to their inability to model interdependencies between heterogeneous components during co-optimization. This study introduces an RL framework specifically designed for mixed-size placement called DUALPlace. DUALPlace leverages cross-attention for multi-modal fusion, combining a multi-scale cross-attention vision transformer (Cross-ViT) and a graph attention network (GAT) for feature extraction from the different modalities. DUALPlace introduces pin density as part of the global vision feature to enhance congestion analysis, providing a detailed view of layout congestion. Additionally, to reduce diffusion iterations, the netlist graph is partitioned with fixed macro vertices and initially placed together by an RL agent. Compared to the state-of-the-art online RL-based placement method DeepTH-finetune, DUALPlace demonstrates significant improvements of 93.58% and 64.50% in iteration and runtime efficiency, along with 4.01% and 3.65% improvements in half-perimeter wirelength (HPWL) and rectangular uniform wire density (RUDY) metrics.
6E-3
16:45-17:10

SMT-Based Optimal Transistor Folding and Placement for Standard Cell Layout Generation

*Junghyun Yoon, Heechun Park (Ulsan National Institute of Science and Technology)
Keywords
standard cell layout generation, Satisfiability Modulo Theories, transistor placement
Abstract
As semiconductor technology continues to scale, standard cell layout generation becomes increasingly challenging, yet remains critical for Design Technology Co-Optimization (DTCO). This paper presents a Satisfiability Modulo Theories (SMT)-based methodology that tightly integrates transistor folding and placement to optimize standard cell layouts. Our approach offers two key contributions: (1) a unified SMT-based framework for simultaneous transistor folding and placement, which explores folding configurations beyond mandatory constraints to achieve globally optimal placement with improved area and routability; and (2) a mono-dummy insertion strategy based on the Longest Common Subsequence (LCS) algorithm, which aligns PMOS and NMOS transistor chains to enhance diffusion sharing and further reduce layout area. Experimental results using the ASAP7 7nm PDK show that our SMT-based placement framework produces fully routable layouts with 4.12% smaller average cell area compared to manually crafted counterparts. Moreover, when combined with heuristic in-cell routing, our method successfully generates layouts for all 172 standard cells, including complex cells such as high fan-in gates and flip-flops with asynchronous reset, with placement optimization completing in 836.70 seconds. This work demonstrates that our simultaneous optimization of transistor folding and placement enables both higher layout quality and practical automation for advanced standard cell design.
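Illustrative sketch (not from the paper): the flavor of an SMT placement model can be seen in the toy Z3 program below, which places a few (possibly folded) transistors on a row without overlap while minimizing cell width. The paper's folding choices, diffusion sharing, and routability constraints are omitted, and all data are made up.

```python
# Toy Z3 (SMT) model in the spirit of transistor placement; NOT the paper's formulation.
from z3 import Int, Optimize, Or, sat

widths = {"M1": 2, "M2": 1, "M3": 2}                 # toy transistor (or folded-finger) widths
x = {name: Int(f"x_{name}") for name in widths}      # left edge of each device
cell_w = Int("cell_width")

opt = Optimize()
for name, w in widths.items():
    opt.add(x[name] >= 0, x[name] + w <= cell_w)     # stay inside the cell
names = list(widths)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        a, b = names[i], names[j]
        opt.add(Or(x[a] + widths[a] <= x[b],         # no horizontal overlap
                   x[b] + widths[b] <= x[a]))
opt.minimize(cell_w)
assert opt.check() == sat
m = opt.model()
print({n: m[x[n]].as_long() for n in names}, "width =", m[cell_w].as_long())
```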
6E-4
17:10-17:35

Synthesis of CFET Standard Cells Utilizing Backside Interconnects Towards Improving Pin Accessibility

*Hyunbum Park, Taewhan Kim (Seoul National University)
Keywords
standard cells, Complementary FET (CFET), backside metals, automatic cell layout generation
Abstract
Complementary FET (CFET), which stacks an n-FET (or p-FET) on top of a p-FET (or n-FET), is known to be one of the most promising technologies for next-generation transistors. However, one inherent limitation of CFET-based standard cells is reduced pin accessibility due to the reduced footprint. As a result, synthesizing CFET-based standard cells that exploit the backside metals on the wafer is essential for enhancing pin accessibility. In this regard, this work addresses the problem of automatic synthesis of CFET standard cells with maximal use of the backside metals, so that the pin accessibility on the frontside is maximally improved. Specifically, we propose optimal solutions to two subproblems: (1) pruning partial solutions in the transistor placement phase and (2) in-cell routing. For (1), we employ a graph-based exhaustive exploration with fast front/backside metal connectivity checking for pruning, while for (2), we develop a satisfiability modulo theory (SMT)-based formulation with maximal use of backside metals as the utmost priority. Through experiments, it is shown that CFET cells generated by our method use on average 24.4% less frontside metal than conventional CFET cells, while significantly reducing runtime by employing a pruning technique based on our fast metal accessibility checking. Furthermore, when chip placement and routing are performed using our CFET cells, we produce final implementations with on average 67.5% fewer DRVs and 2.1% less wirelength than those produced using conventional CFET cells.
6E-5
17:35-18:00

Standard Cell Layout Synthesis for Dual-Sided 3D-Stacked Transistors

*Kairong Guo, Haoran Lu, Rui Guo, Jiarui Wang, Chunyuan Zhao, Heng Wu, Runsheng Wang, Yibo Lin (Peking University)
Keywords
standard cell, layout synthesis, transistor-level placement and routing
Abstract
As transistor scaling approaches physical limits, Flip FET (FFET) emerges as a promising 3D-stacked transistor architecture, featuring back-to-back-stacked N/P transistors and dual-sided interconnects. This unique structure demands novel design solutions, including drain/gate merge for dual-side connectivity and flexible frontside/backside I/O pin assignment. In this paper, we propose a standard cell synthesis framework for dual-sided 3D-stacked transistors (FFETs) comprising SMT-based merge-aware placement that ensures dual-side connectivity via dynamic field drain merge insertion, and SAT-based dual-side routing supporting automated or specified I/O pin assignment. Experimental results show that our flow achieves, on average, a 4% reduction in cell area, a 4% reduction in via usage, and a 7% reduction in M0 metal usage compared to previous 3.5T FFET designs, while efficiently generating all 2^n pin assignment variants for each cell. The support for multi-row placement and FDM insertion in our flow allows it to identify layouts surpassing manual designs, such as an AOI22xp5 variant with 6.3% better performance and 4.3% lower power than the manual design. At the chip level, our generated library with all 2^n pin assignment variants can further reduce wirelength by 10% and eliminate DS-nets. These results demonstrate the effectiveness and flexibility of our framework for advanced FFET cell design.

Session 6F

(T5-E) Hardware-Software Co-Design and Optimization Frameworks
15:55-18:00 | Wednesday, January 21, 2026 | Sleeping Beauty 5
Chair(s):
Quan Chen (The Southern University of Science and Technology)
Yueting Li (Hangzhou International Innovation Institute, Beihang University)
6F-1
15:55-16:20

pHNSW: PCA-Based Filtering to Accelerate HNSW Approximate Nearest Neighbor Search

*Zheng Li, Guangyi Zeng (South China University of Technology), Paul Delestrac (KTH Royal Institute of Technology, Sweden), Enyi Yao, Simei Yang (South China University of Technology)
Keywords
HNSW, PCA Filtering, Nearest Neighbor Search, Algorithm-Hardware Co-optimization, QPS, Energy Efficiency
Abstract
Hierarchical Navigable Small World (HNSW) has demonstrated impressive accuracy and low latency for high-dimensional nearest neighbor searches. However, its high computational demands and irregular, large-volume data access patterns present significant challenges to search efficiency. To address these challenges, we introduce pHNSW, an algorithm-hardware co-optimized solution that accelerates HNSW through Principal Component Analysis (PCA) filtering. On the algorithm side, we apply PCA filtering to reduce the dimensionality of the dataset, thereby lowering the volume of neighbor accesses and decreasing the computational load of distance calculations. On the hardware side, we design the pHNSW processor with custom instructions to optimize search throughput and energy efficiency. In the experiments, we synthesized the pHNSW processor RTL design with a 65nm technology node and evaluated it using DDR4 and HBM1.0 DRAM standards. The results show that pHNSW boosts Queries per Second (QPS) by 14.47x - 21.37x on a CPU and 5.37x - 8.46x on a GPU, while reducing energy consumption by up to 57.4% compared to a standard HNSW implementation.
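Illustrative sketch (not from the paper): PCA filtering estimates distances in a low-dimensional projection to prune most candidates cheaply and re-ranks only the survivors in full dimension. The NumPy toy below shows the idea; the projection size and prune ratio are chosen arbitrarily rather than taken from pHNSW.

```python
# PCA-based candidate filtering for nearest-neighbor search (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
base = rng.normal(size=(10000, 128)).astype(np.float32)   # toy database vectors
query = rng.normal(size=(128,)).astype(np.float32)

# Fit PCA on the base set and keep the top-16 principal components.
mean = base.mean(axis=0)
_, _, vt = np.linalg.svd(base - mean, full_matrices=False)
proj = vt[:16].T                                          # 128-D -> 16-D projection

base_low = (base - mean) @ proj
q_low = (query - mean) @ proj

# Cheap filtering pass in 16-D, then exact re-ranking of the survivors in 128-D.
approx = np.linalg.norm(base_low - q_low, axis=1)
survivors = np.argsort(approx)[:200]                      # keep the 200 best-looking candidates
exact = np.linalg.norm(base[survivors] - query, axis=1)
top10 = survivors[np.argsort(exact)[:10]]
print(top10)
```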
6F-2
16:20-16:45

Boosting Scalability and Performance: Macro Placement for Flexible 3D-Stacked ML Accelerators

*Jiawei Hu, Canlin Zhang (Georgia Institute of Technology), Pruek Vanna-Iampikul (Burapha University), Tushar Krishna, Sung Kyu Lim (Georgia Institute of Technology)
Keywords
Flexible ML accelerators, 3D integration, Physical design, Macro placement, Convex optimization, Energy-delay product, Scalability, Memory access latency
Abstract
Flexible ML accelerators promise significant advantages in mapping efficiency over rigid architectures; however, fully harnessing these benefits remains challenging due to scalability bottlenecks in physical design. In this paper, we propose a unified multi-tier 3D integration optimization framework to systematically identify and address key architectural constraints limiting scalability in area, performance, and power (PPA). Building upon these insights, our approach strategically partitions memory macros based on their architectural characteristics, leverages convex optimization for efficient macro placement, and employs simulated annealing for overlap resolution and legalization. Experimental results demonstrate that our framework achieves substantial improvements, including up to 3.4x faster runtime, a 4.4x reduction in energy-delay product, significantly reduced footprint, and decreased memory access latency, thus fully unlocking the scalability potential of flexible ML accelerators.
6F-3
16:45-17:10

AutoCT: Hybrid Compressor Tree Optimization via Reinforcement Learning with Graph Modeling

Shangshang Yao (National University of Defense Technology), *Kunlong Li (Fudan University), Li Shen, Ruoxi Wang, Jingyu Liu (National University of Defense Technology)
Keywords
Compressor Tree, Multiplier Design, Reinforcement Learning, Graph Neural Networks, Hardware Optimization
Abstract
Compressor tree optimization is a critical step in the design of high-performance arithmetic circuits, such as multipliers or Multiply-Accumulate (MAC) units, where the goal is to minimize area and delay. Traditional methods often rely on heuristic rules or manual design, which struggle to achieve optimal results across diverse multiplier scales. In this paper, we propose AutoCT, a novel framework for hybrid compressor tree optimization that leverages reinforcement learning (RL) with graph neural networks (GNNs) to model and optimize compressor trees. By representing the compressor tree as a graph and employing a deep Q-network (DQN) enhanced with graph attention networks (GAT), AutoCT dynamically selects compressor types and configurations to minimize a combined cost of area and delay. Integrated with design automation tools for synthesis feedback, AutoCT outperforms conventional approaches in terms of performance and adaptability. Experimental results demonstrate significant improvements in cost metrics, with reductions in area-delay product by up to 15% compared to baseline designs.
6F-4
17:10-17:35

Chiplet-NAS: Chiplet-aware Neural Architecture Search for Efficient AI Inference on 2.5D Integration

Cheng Guo (Arizona State University), Pragnya Sudershan Nalla, Nikhil Kumar Cherukuri (University of Minnesota Twin Cities), Rui Xue (University of Chinese Academy of Sciences), Sachin S. Sapatnekar (University of Minnesota Twin Cities), Chaitali Chakrabarti (Arizona State University), *Yu Cao (University of Minnesota Twin Cities), Jeff Zhang (Arizona State University)
Keywords
Chiplet-based systems, Neural architecture search, AI workloads
Abstract
The co-design of neural network architectures and their target chiplet-based hardware systems presents a significant challenge due to the vast and combinatorial design space. Identifying solutions that are Pareto-optimal across competing objectives of task accuracy, system latency, and power consumption requires approaches beyond manual design and brute-force methods. This paper proposes a closed-loop chiplet-aware neural architecture search (Chiplet-NAS) framework to automate the exploration and discover hardware-optimized models for efficient AI inference on 2.5D chiplet-based systems. Our framework integrates a Tree-structured Parzen Estimator (TPE) for sample-efficient search with CLAIRE, a chiplet-based library and fast performance benchmarking tool, to provide direct hardware feedback on latency and energy consumption, along with accuracy optimization. We evaluate our framework's versatility by co-designing ResNet-based model architectures. Compared to a baseline NAS that optimizes the task accuracy only, our Chiplet-NAS achieves significant power and performance benefits at iso-accuracy.
6F-5
17:35-18:00

MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts

*Yushu Zhao, Yubin Qin, Yang Wang, Xiaolong Yang, Huiming Han, Shaojun Wei, Yang Hu, Shouyi Yin (Tsinghua University)
Keywords
Large language model, Mixture-of-experts, Offloading, Algorithm-system co-design
Abstract
Mixture-of-Experts (MoE) models have recently demonstrated exceptional performance across a diverse range of applications. The principle of sparse activation in MoE models facilitates an offloading strategy, wherein active experts are maintained in GPU HBM, while inactive experts are stored in CPU DRAM. The efficacy of this approach, however, is fundamentally constrained by the limited bandwidth of the CPU-GPU interconnect. To mitigate this bottleneck, existing approaches have employed prefetching to accelerate MoE inference. These methods attempt to predict and prefetch the required experts using specially trained modules. Nevertheless, such techniques are often encumbered by significant training overhead and have shown diminished effectiveness on recent MoE models with fine-grained expert segmentation. In this paper, we propose MoBiLE, a plug-and-play offloading-based MoE inference framework with mixture of big-little experts. It reduces the number of experts for unimportant tokens to half for acceleration while maintaining full experts for important tokens to guarantee model quality. Further, a dedicated fallback and prefetching mechanism is designed for switching between little and big experts to improve memory efficiency. We evaluate MoBiLE on four typical modern MoE architectures and challenging generative tasks. Our results show that MoBiLE achieves a speedup of 1.60x to 1.72x compared to the baseline on a consumer GPU system, with negligible degradation in accuracy.

Keynote Session III

Keynote Addresses
08:20-09:50 | Thursday, January 22, 2026 | Cinderella Ballroom 1/6/7/8
Jim Chang
TSMC Academician/Deputy Director
3DIC Design Methodology Development, TSMC
08:20-09:05
Keynote Address

Unlocking Hyper-Scale AI: Navigating the Future of 3DIC Design Solutions

Biography
Dr. Jim Chang leads the 3DIC design methodology development efforts in TSMC. With over two decades of semiconductor experience, Jim is a recognized expert in synthesis, physical optimization, detailed routing, and timing analysis. Prior to his current role, he spearheaded TSMC's design flow development and EDA certification program for advanced 7nm to 3nm technology nodes. His extensive background also includes R&D leadership positions at prominent EDA companies such as Plato, Cadence, Extreme DA, and Synopsys. Dr. Chang holds a Ph.D. in Electrical and Computer Engineering from the University of California, Santa Barbara.
Abstract
The era of hyper-scale AI demands a radical rethinking of 3DIC design, a paradigm shift unlocking unprecedented opportunities for architectural innovation and superior system performance. Yet, this explosion of possibility brings an exponential surge in design complexity, challenging even the most seasoned engineers.
This presentation will delve into the forefront of this revolution. We begin with a review of the TSMC 3DFabric™ family of solutions, specifically engineered to power the most advanced AI systems on the market today. We will then pivot to a comprehensive exploration of the critical 3DIC design challenges that emerge at this bleeding edge: from intricate 3D integration and feasibility assessment to robust implementation, power integrity, physical verification, thermal analysis, and substrate design optimization.
Join us to discover how a cohesive suite of solutions is forming the foundation for designing the AI systems of tomorrow - systems that will redefine what's possible. This is your essential guide to conquering complexity and harnessing the full potential of 3DIC for the next generation of intelligent machines.
Takefumi Miyoshi
Director at e-trees.Japan, Inc.
Adjunct Professor, The University of Osaka
Founder, QuEL, Inc.
09:05-09:50
Keynote Address

Design and Implementation of Control System for Quantum Computers

Biography
Dr. Takefumi Miyoshi received his Ph.D. from the Interdisciplinary Graduate School of Science and Engineering at Tokyo Institute of Technology in 2007. He is a Director at e-trees.Japan, Inc. and an Adjunct Professor at the Center for Quantum Information and Quantum Biology at The University of Osaka. Dr. Miyoshi is also one of the founders of QuEL, Inc., where he works as the CTO. His research interests include reconfigurable systems, computer architecture, compilers, and quantum computing.
Abstract
Quantum computing has advanced rapidly in recent years, raising strong expectations for its transformative impact on information processing. As quantum processors scale in size and complexity, progress in their control systems becomes increasingly essential.
This talk first introduces the roles of control systems in quantum computing and then presents our efforts in developing scalable and precise quantum computer controllers featuring high-accuracy microwave transceivers and synchronization mechanisms for reliable qubit manipulation and measurement. As we approach fault-tolerant quantum computing (FTQC), the scalability and efficiency of control electronics emerge as major challenges.
To address these challenges, compact, high-performance, and energy-efficient LSI-based control systems are required, supported by advanced design methodologies and electronic design automation (EDA) tools. Promising directions such as cryogenic CMOS integration may also open new possibilities for co-design between quantum and classical electronics. The talk will highlight how innovations in digital and analog integrated circuit design and system integration can accelerate the realization of large-scale quantum computing.

Session 7A

(T4-A) Efficient AI Model Design and Training
10:20-12:00 | Thursday, January 22, 2026 | Snow White 1
Chair(s):
Masanori Hashimoto (Kyoto University)
Chen Wu (Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo)
7A-1
10:20-10:45

StabiFreeze: Early Stopping for Training Binary Neural Networks via Internal Dynamics Stabilization

*Jisu Kim, Tae-Hwan Kim (Korea Aerospace University)
Keywords
efficient training, binary neural networks, early stopping, low resource, edge computing
Abstract
Binary neural networks (BNNs) offer significant efficiency in memory and computation during inference, making them promising for on-device applications. However, training BNNs remains computationally intensive, which limits their primary use to inference rather than on-device training. This paper proposes StabiFreeze, a layer-wise freezing framework that detects convergence in convolution and batch normalization layers using a stability-based criterion. On CIFAR-100, StabiFreeze achieves up to 67.63% reduction in total training computation, completing training with 56.75% fewer epochs, while incurring accuracy degradation of only 0.30% and 0.54% for BinaryNet and XNOR-Net, respectively. Evaluation results on Raspberry Pi 4 and Jetson Nano show that StabiFreeze reduces training time by 80.46% and 80.63%, with accuracy degradation of 0.20% and 0.25%, respectively. This demonstrates that StabiFreeze enables efficient and practical BNN training in resource-constrained edge scenarios.
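As a rough illustration of layer-wise freezing driven by a stability criterion, the sketch below stops updating a layer's parameters once their relative change stays under a threshold for several consecutive epochs. The metric, threshold, and patience window are assumptions for illustration, not the StabiFreeze criterion itself.

```python
# Minimal sketch of stability-based layer-wise freezing (illustrative assumptions only).
import torch
import torch.nn as nn

class FreezeMonitor:
    def __init__(self, model, threshold=1e-3, patience=3):
        self.prev = {n: p.detach().clone() for n, p in model.named_parameters()}
        self.stable = {n: 0 for n in self.prev}
        self.threshold, self.patience = threshold, patience

    def step(self, model):
        """Call once per epoch; freezes parameters whose relative change stays small."""
        for name, param in model.named_parameters():
            if not param.requires_grad:
                continue
            change = (param.detach() - self.prev[name]).norm() / (self.prev[name].norm() + 1e-12)
            self.prev[name] = param.detach().clone()
            self.stable[name] = self.stable[name] + 1 if change < self.threshold else 0
            if self.stable[name] >= self.patience:
                param.requires_grad_(False)        # stop training this converged parameter

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())
monitor = FreezeMonitor(model)
monitor.step(model)                                # invoked after each training epoch
```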
7A-2
10:45-11:10

Constrained NAS via Symbolic Expressions in Declarative Hierarchical Search Spaces

*Moritz Reiber, Christoph Gerum, Oliver Bringmann (University of Tübingen)
Keywords
neural architecture search, search space design, design space exploration, constraints, hardware-aware NAS
Abstract
Neural Architecture Search (NAS) automates the design of deep neural networks (DNNs), but the design of the search space remains crucial: manually designed spaces require significant engineering effort, while overly flexible designs often lead to invalid or inefficient architectures. This paper introduces a novel NAS search space design that centers on symbolic constraint modeling, enabling fine-grained parametrization while ensuring architecture validity and resource efficiency. By representing parameter dependencies as symbolic expressions, our method supports automatic resolution of interdependent attributes and the specification of hard constraints, such as limits on parameter count or MAC operations, directly within the search space. This mechanism allows efficient exploration of valid architectures under strict deployment budgets. The search space itself is constructed declaratively using hierarchical, composable topology patterns, drawing from common DNN motifs and enabling intuitive and scalable definition. We demonstrate the effectiveness of our approach through evolutionary NAS under multiple resource constraints, showing that symbolic constraint enforcement improves search efficiency and robustness without sacrificing accuracy.
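The snippet below shows, in a much simplified form, how symbolic expressions can tie architecture parameters together and enforce a hard MAC budget during search. The symbols and the 50M-MAC limit are illustrative assumptions, not the paper's declarative search-space format.

```python
# Sketch of symbolic constraint checking for a NAS search space (illustrative only).
import sympy as sp

c_in, c_out, k, h, w = sp.symbols("c_in c_out k h w", positive=True, integer=True)

params = c_in * c_out * k**2          # parameter count of one convolution
macs = params * h * w                 # MACs for that convolution on an h x w feature map
budget = sp.Le(macs, 50_000_000)      # hard deployment constraint

candidate = {c_in: 64, c_out: 128, k: 3, h: 56, w: 56}
print("MACs:", macs.subs(candidate))
print("within budget:", bool(budget.subs(candidate)))
```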
7A-3
11:10-11:35

HERO: Hardware-Efficient RL-based Optimization Framework for NeRF Quantization

*Yipu Zhang, Chaofang Ma, Jinming Ge (The Hong Kong University of Science and Technology), Lin Jiang (Northeastern University), Jiang Xu (The Hong Kong University of Science and Technology (Guangzhou)), Wei Zhang (The Hong Kong University of Science and Technology)
Keywords
Neural Radiance Field (NeRF), Quantization, Hardware-software co-design, Reinforcement learning
Abstract
Neural Radiance Field (NeRF) has emerged as a promising 3D reconstruction method, delivering high-quality results for AR/VR applications. While quantization methods and hardware accelerators have been proposed to enhance NeRF's computational efficiency, existing approaches face crucial limitations. Current quantization methods operate without considering hardware architecture, resulting in sub-optimal solutions within the vast design space encompassing accuracy, latency, and model size. Additionally, existing NeRF accelerators heavily rely on human experts to explore this design space, making the optimization process time-consuming, inefficient, and unlikely to discover optimal solutions. To address these challenges, we introduce HERO, a reinforcement learning framework performing hardware-aware quantization for NeRF. Our framework integrates a NeRF accelerator simulator to generate real-time hardware feedback, enabling fully automated adaptation to hardware constraints. Experimental results demonstrate that HERO achieves 1.31-1.33x better latency, 1.29-1.33x improved cost efficiency, and a more compact model size compared to CAQ, a previous state-of-the-art NeRF quantization framework. These results validate our framework's capability to effectively navigate the complex design space between hardware and algorithm requirements, discovering superior quantization policies for NeRF implementation. Our code will be available upon the paper's acceptance.
7A-4
11:35-12:00

Gundam: A Generalized Unified Design and Analysis Model for Matrix Multiplication on Edge

Quan Cheng, Haoyuan Li, Weirong Dong (Kyoto University), Mingqiang Huang, Longyang Lin (Southern University of Science and Technology), *Masanori Hashimoto (Kyoto University)
Keywords
Resource Analysis, PE array modeling, AI accelerator configuration, Matrix Multiplication
Abstract
Matrix multiplication is the core operation in many edge AI applications, yet its efficient implementation requires balancing compute throughput with strict area and resource constraints. To address this, we propose Gundam, a generalized unified design and analysis model that enables agile and structured estimation and configuration of AI accelerator architectures. Gundam provides analytical modeling of matrix operations, supporting rapid evaluation of processing element counts, data reuse patterns, buffer sizing, and computation latency under diverse hardware constraints. Unlike conventional models, Gundam jointly captures both hardware mapping and dataflow behaviors within a unified model, facilitating fast and resource-aware design space exploration. To validate its accuracy and utility, we apply Gundam to guide accelerator generation across 16nm, 22nm, and 28nm process nodes. Results show that Gundam's estimated configurations differ by less than 6% from post-layout implementations, while automatically identifying optimal AI accelerator configurations under fixed resource constraints. Gundam offers a lightweight yet powerful tool for early-stage deployment and optimization of matrix processors for edge-AI applications.
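For a flavor of this kind of analytical estimation, the snippet below gives a first-order cycle count for a tiled matrix multiply on a PE array. The one-MAC-per-PE-per-cycle assumption and simple tiling are illustrative; Gundam's model additionally covers data reuse patterns, buffer sizing, and memory behavior.

```python
# First-order analytical latency estimate for a tiled matrix multiply on a PE array.
import math

def matmul_latency(M, N, K, pe_rows, pe_cols, freq_ghz=1.0):
    """Estimate cycles for C[M,N] = A[M,K] @ B[K,N] on a pe_rows x pe_cols array."""
    tiles = math.ceil(M / pe_rows) * math.ceil(N / pe_cols)   # output tiles to compute
    cycles = tiles * K                                        # K accumulation steps per tile
    return cycles, cycles / (freq_ghz * 1e9)                  # cycles and seconds

cycles, seconds = matmul_latency(M=512, N=512, K=512, pe_rows=16, pe_cols=16)
print(f"{cycles} cycles, about {seconds * 1e6:.1f} us at 1 GHz")
```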

Session 7B

(T2-B) Designing Predictable and Reliable Real-Time Systems
10:20-12:00 | Thursday, January 22, 2026 | Snow White 2
Chair(s):
Zhenyu Yan (The Chinese University of Hong Kong)
Zhiding Liang (The Chinese University of Hong Kong)
7B-1
10:20-10:45

TTI: An Instruction Set Supporting Priority-Inversion-Free Time-Triggered Preemptive Scheduling in Real-Time Embedded Systems

*Yinkang Gao, Yixuan Zhu, Bo Zhang, Haoyuan Ren, Cheng Tang, Xi Li (University of Science and Technology of China)
Keywords
Time-Triggered Scheduling, Priority Inversion, Instruction Set Architecture
Abstract
In real-time embedded systems, traditional timer-based time-triggered preemptive scheduling suffers from priority inversion, where the release of a lower-priority task at its activation time interferes with the execution of a higher-priority task. This not only complicates the worst-case response time (WCRT) analysis for higher-priority tasks but also increases their actual WCRT, thereby degrading predictability and schedulability. Addressing this challenge requires enabling the processor to be aware of the priority of tasks reaching their activation time relative to the currently executing task and to make scheduling decisions accordingly. Since the instruction set serves as the software/hardware interface, the most direct approach to support this capability is to incorporate timing- and priority-aware semantics into the instruction set architecture. In this paper, we propose a novel instruction set, the Time-Triggered Instruction Set (TTI), which introduces priority-aware timed operations to allow the processor to perform operations based on time and priority. We then design a TTI-supported hardware microarchitecture and develop TTI-based priority-inversion-free time-triggered preemptive scheduling. Experimental results demonstrate that, compared to timer-based scheduling, TTI-based scheduling achieves lower and more stable WCRT for high-priority tasks, improving predictability and schedulability.
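The pseudocode-style sketch below conveys the intended release semantics: a task reaching its activation time is admitted to execution only if its priority exceeds that of the running task, so lower-priority releases cannot disturb a higher-priority task. The queue layout and the smaller-value-is-higher-priority convention are assumptions for illustration, not the TTI instruction encoding.

```python
# Priority-aware timed release decision (conceptual sketch, not the TTI ISA).
import heapq

def on_timer_tick(now, release_queue, ready_queue, running):
    """release_queue: heap of (activation_time, priority, task_id); running: (priority, task_id) or None."""
    while release_queue and release_queue[0][0] <= now:
        _, prio, task = heapq.heappop(release_queue)
        if running is None or prio < running[0]:
            if running is not None:
                heapq.heappush(ready_queue, running)      # preempt only for a higher-priority release
            running = (prio, task)
        else:
            heapq.heappush(ready_queue, (prio, task))     # defer the release: no priority inversion
    return running

releases = [(5, 2, "tau_low"), (5, 0, "tau_high")]
heapq.heapify(releases)
print(on_timer_tick(now=5, release_queue=releases, ready_queue=[], running=(1, "tau_mid")))
```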
7B-2
10:45-11:10

DRAPP: An end-to-end Latency Evaluation Tool for Containerized ROS Applications

*Sicheng Guan (Northeastern University), Haiyang Wang, Jinghao Sun (Dalian University of Technology), Qingxu Deng (Northeastern University)
Keywords
Container-Crossed Evaluation, Edge Computing, ROS 2
Abstract
The advent of Software-Defined Vehicles (SDVs) has revolutionized the automotive industry by enabling rapid innovation through the integration of software-driven functionalities. The Scalable Open Architecture for Embedded Edge (SOAFEE) framework, an innovative open software architecture, integrates cloud-native methodologies into in-vehicle environments. By leveraging this framework, the containerization of Robot Operating System (ROS) applications has gained widespread adoption for deploying and maintaining autonomous driving applications. However, real-time performance remains a critical challenge, particularly in addressing end-to-end (E2E) latency within ROS systems to ensure responsiveness and timely task execution. Existing tools, such as the Chain-Aware ROS Evaluation Tool (CARET), have contributed significantly to real-time performance evaluation. Nevertheless, these tools exhibit limitations when assessing E2E latency in containerized environments involving multiple containers. To address this gap, we introduce the Distributed ROS Application Performance Profiler (DRAPP), a solution that builds on CARET with enhancements, designed specifically to evaluate E2E latency in ROS-based applications. Our approach includes experiments using Autoware, deployed across multiple containers within the SOAFEE framework, with a primary focus on analyzing E2E processing latency. Preliminary empirical results demonstrate that DRAPP outperforms existing tools in both usability and effectiveness for latency evaluation, representing a significant step forward in the performance assessment of in-vehicle software for autonomous systems.
7B-3
11:10-11:35

LaPOD: Latency Prediction for Real-Time LiDAR Object Detection

Wenjing Xie, Tianchi Ren (City University of Hong Kong), Jen-Ming Wu (Hong Hai Research Institute), Chun Jason Xue (Mohamed bin Zayed University of Artificial Intelligence), *Nan Guan (City University of Hong Kong)
Keywords
Latency prediction, LiDAR point cloud, Object detection
Abstract
LiDAR-based object detection is widely used in real-time systems such as autonomous driving vehicles and robotics. In these applications, it is essential to achieve high accuracy while ensuring detection is completed within the timing constraints. However, the latency of LiDAR-based object detection is highly variable and unpredictable, making it difficult to meet the timing constraints. To address this challenge, an essential capability is accurately predicting the inference latency of the LiDAR-based object detection algorithm before execution begins. Intuitively, the processing time of LiDAR-based object detection depends on the input point cloud size. However, our experiments reveal that even with the same number of input points, the processing time can still vary significantly. We analyze this phenomenon and find that the spatial distribution of points also plays a critical role in determining the inference latency. Based on these insights, we propose LaPOD, a lightweight latency predictor that rapidly estimates inference latency based on point cloud inputs. Specifically, we develop a representation to capture key features of the point cloud that influence inference latency. Moreover, LaPOD can be quickly adapted to different hardware platforms through fine-tuning with a limited number of samples. We evaluate LaPOD on the widely-used KITTI dataset, demonstrating its effectiveness and robustness.
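As an illustration of the kind of input representation such a predictor can use, the snippet below derives both a size feature (point count) and simple spatial-distribution features (occupied-voxel count, bounding-box extent) from a point cloud. The specific features and voxel size are assumptions, not LaPOD's representation.

```python
# Sketch of size and spatial-distribution features for latency prediction
# from a LiDAR point cloud (illustrative feature choice only).
import numpy as np

def cloud_features(points, voxel=0.5):
    """points: (N, 3) array of x/y/z coordinates in meters."""
    n_points = len(points)
    voxel_ids = np.floor(points / voxel).astype(np.int64)
    occupied = len(np.unique(voxel_ids, axis=0))          # how widely the points spread
    extent = points.max(axis=0) - points.min(axis=0)      # bounding-box size per axis
    return np.array([n_points, occupied, *extent])

rng = np.random.default_rng(1)
cloud = rng.uniform(-40.0, 40.0, size=(20000, 3))
print(cloud_features(cloud))
# A lightweight regressor (e.g., a small MLP) would map such features to latency.
```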
7B-4
11:35-12:00

Flowmap Overapproximation for Linear Time-Varying Stochastic Systems

*Xin Chen (University of New Mexico)
Keywords
Formal verification, Reachability analysis, Time-varying system, Stochastic differential equation
Abstract
We present an approach to compute overapproximate abstractions for cyber-physical systems defined by linear time-varying stochastic differential equations. The abstractions are represented by a set of univariate Taylor models that only overapproximate the reachability mapping of the system and are independent of the system's initial state and constant parameters. Based on the Taylor model abstractions, a guaranteed upper bound for the probability of reaching an unsafe set can be obtained efficiently, and the approach can be used to investigate the probabilistic safety and robustness of the system. We evaluate the effectiveness of our approach based on a group of challenging benchmarks and compare it with the state of the art.

Session 7C

(T11-B) Scaling up Physical Design Optimization to Heterogeneous Systems
10:20-11:35 | Thursday, January 22, 2026 | Snow White 3
Chair(s):
Senling Wang (Ehime University)
Michihiro Shintani (Kyoto Institute of Technology)
7C-1
10:20-10:45

Timing-Aware Optimization of Die-Level Routing and TDM Assignment for Multi-FPGA Systems

*Yijun Chen (Beijing University of Posts and Telecommunications), Haoyuan Li, Chunyan Pei (Tsinghua University), Jianwang Zhai, Kang Zhao (Beijing University of Posts and Telecommunications), Wenjian Yu (Tsinghua University)
Keywords
Timing path, Timing-aware, Die-level, System routing, TDM assignment
Abstract
The escalating scale and complexity of modern circuits demand multi-FPGA emulation platforms that incorporate multi-die architectures. However, most existing routers remain FPGA-level, optimizing wire-length or total Time-Division Multiplexing (TDM) ratios while disregarding die-level load imbalance and path-level slack. The result is suboptimal performance and timing violations. This paper introduces the first timing-aware co-optimization framework for die-level routing and TDM assignment, explicitly linking physical constraints to critical path slack. The proposed flow features a timing-aware load-balanced die-level router with timing path compression and a timing graph-based TDM assignment. Experiments on industrial designs show that the proposed flow improves the worst-path slack by 98% over existing methods.
7C-2
10:45-11:10

Φ-BO: Physics-Informed Bayesian Optimization for Multi-Port Decoupling Capacitor Placement in 2.5-D Chiplets

Quansen Wang (Beihang University), *Yuchuan Lin (Wuhan University of Technology), Zhuohua Liu (Beihang University), Wei Xing (The University of Sheffield), Wei Zhang (The Hong Kong University of Science and Technology), Ning Xu (Wuhan University of Technology), Yuanqing Cheng (Beihang University)
Keywords
Power Distribution Network Optimization, Decoupling Capacitor Placement, 2.5D Chiplet, Bayesian Optimization, Multi-Port Aware Transformation
Abstract
Power distribution network (PDN) optimization in 2.5-D chiplet architectures represents a critical bottleneck as designs scale to 100+ integrated chiplets, where decoupling capacitor placement becomes a multi-port optimization challenge requiring millions of expensive electromagnetic (EM) simulations. Current state-of-the-art (SOTA) methods—from genetic algorithms (GA) to reinforcement learning (RL)—treat PDN as black-box functions, failing to exploit inherent physical structure and scaling exponentially with problem complexity. We introduce Φ-BO, the first physics-informed Bayesian optimization (BO) framework specifically designed for multi-port decoupling capacitor placement in 2.5-D chiplet PDN. Our key innovation systematically integrates EM field theory into machine learning (ML) optimization through novel spatial feature transformations and Multi-Port Aware Transformation (MPAT), enabling a paradigm shift from black-box to physics-aware optimization. This approach captures spatial dependencies and port coupling effects, dramatically reducing effective problem dimensionality while enabling intelligent exploration of discrete placement configurations. Demonstrated on a 22-chiplet RISC-V processor design, Φ-BO achieves a 23% impedance improvement and 3x faster convergence compared to SOTA methods.
7C-3
11:10-11:35

Automated Parameter Tuning for Multi-FPGA Partitioning: A Preference-Guided Approach

*Yutao Dai (Beihang University), Shengbo Tong, Chunyan Pei (Tsinghua University), Zhuohua Liu (Beihang University), Wei Xing (University of Sheffield), Yi Liu (Beihang University), Rui Wang (Beihang University), Wenjian Yu (Tsinghua University)
Keywords
Preference Bayesian Optimization, Multi-FPGA partitioning, Parameter tuning, Design automation
Abstract
Parameter tuning for multi-FPGA partitioning algorithms represents a bottleneck in modern chip emulation and verification workflows. Current multilevel partitioning tools require manual configuration of various parameters, where each evaluation can take tens of seconds to minutes, making exhaustive search impractical and expert-driven tuning both time-consuming and suboptimal. To automate this process, we propose a preference-guided Bayesian optimization framework specifically designed for industrial FPGA partitioning parameter tuning under limited evaluation budgets. Our approach maximizes the minimum timing slack by incorporating domain-specific insights: we exploit the strong correlation between cutsize and timing performance through a priority-based ranking scheme that guides a pairwise Gaussian process to learn configuration preferences. Additionally, we introduce a kernel input transformation that properly handles the mixed discrete-continuous parameter space typical in EDA tools. Our method converges faster with fewer evaluations and achieves the best timing slack in 60-70% of cases on industrial circuit benchmarks compared to existing methods including standard Bayesian optimization, quasi-random sampling, and state-of-the-art preference learning techniques. The proposed framework reduces parameter tuning from days of manual effort to hours of automated optimization, offering practitioners a deployment-ready solution that improves both design quality and engineering productivity.
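A toy version of the cutsize-to-preference idea is sketched below: observed cutsizes are turned into (winner, loser) pairs that a pairwise surrogate, such as a preference Gaussian process, can learn from. The simple all-pairs rule and the parameter names are assumptions, not the paper's priority-based ranking scheme.

```python
# Toy construction of preference pairs from cutsize observations (illustrative only).
def preference_pairs(num_configs, cutsizes):
    """Return (winner, loser) index pairs; a lower cutsize is preferred."""
    pairs = []
    for i in range(num_configs):
        for j in range(i + 1, num_configs):
            if cutsizes[i] != cutsizes[j]:
                winner, loser = (i, j) if cutsizes[i] < cutsizes[j] else (j, i)
                pairs.append((winner, loser))
    return pairs

configs = [{"coarsen_ratio": 0.4, "refine_iters": 10},
           {"coarsen_ratio": 0.6, "refine_iters": 20},
           {"coarsen_ratio": 0.5, "refine_iters": 15}]
print(preference_pairs(len(configs), cutsizes=[1320, 1180, 1245]))
```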

Session 7D

(SS-5) From Solvers to Layout: Cross-Layer Approaches for Reliable and Scalable IC Design
10:20-12:00 | Thursday, January 22, 2026 | Sleeping Beauty 1/2
Chair(s):
Tony Geng (Rice University)
7D-1
10:20-10:45

Advancing General Sparse Linear-Equation Solvers via Nested-Dissection-Based Parallel Scheduling and Randomized Linear Algebra

*Wenjian Yu, Jiawen Cheng, Baiyu Chen (Tsinghua University)
Keywords
Circuit Simulation, Sparse Matrix, Parallel LU Factorization, Random Embedding, GMRES Algorithm
Abstract
Sparse linear-equation solvers are indispensable to circuit simulation. They provide the mathematical engine that faithfully forecasts the dynamic response of analog circuits. In this invited paper, we present two novel techniques to speed up these solvers. The first is a parallel LU factorization driven by a new task scheduling strategy. Based on the nested dissection approach for matrix reordering, we derive a task assignment/scheduling strategy which largely reduces synchronization and exposes more parallelism. Thus, a more efficient parallel sparse LU factorization algorithm (named SubtreeLU) is obtained. It outperforms both PARDISO and CKTSO in computational speed while maintaining similar robustness. The second technique is a practical randomized GMRES algorithm. By implementing the Gram-Schmidt process with an extremely efficient random sketched linear-least-squares kernel, we obtain a fast randomized Arnoldi procedure that orthonormalizes the Krylov subspace basis. Coupled with on-the-fly residual-error estimates, this yields a practical randomized GMRES that is provably stable and runs remarkably faster than the standard GMRES on a wide range of circuit and field simulation benchmarks.
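The core randomized ingredient, a sketched linear-least-squares solve, can be illustrated in a few lines: a random embedding compresses the tall problem before an ordinary solve. The dense Gaussian sketch below is purely for clarity; the paper's kernel and its use inside the Arnoldi process are considerably more efficient.

```python
# Minimal illustration of a randomly sketched least-squares solve (toy Gaussian sketch).
import numpy as np

def sketched_lstsq(A, b, sketch_rows, seed=0):
    """Approximately solve min ||Ax - b|| via a random embedding S."""
    m = A.shape[0]
    S = np.random.default_rng(seed).standard_normal((sketch_rows, m)) / np.sqrt(sketch_rows)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)     # solve the much smaller sketched problem
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((5000, 50))
b = rng.standard_normal(5000)
x_sketch = sketched_lstsq(A, b, sketch_rows=400)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b))  # close to 1
```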
7D-2
10:45-11:10

HeteroSTA: A CPU-GPU Heterogeneous Static Timing Analysis Engine with Holistic Industrial Design Support

Zizheng Guo, Haichuan Liu, Xizhe Shi, Shenglu Hua, Zuodong Zhang, Chunyuan Zhao (Peking University), Runsheng Wang, *Yibo Lin (Peking University; Beijing Advanced Innovation Center for Integrated Circuits)
Keywords
HeteroSTA
Abstract
In this paper, we introduce HeteroSTA, the first CPU-GPU heterogeneous timing analysis engine that efficiently supports: (1) a set of delay calculation models providing versatile accuracy-speed choices without relying on an external golden tool, (2) robust support for industry formats, especially .sdc constraints containing all common timing exceptions, clock domains, and case analysis modes, and (3) end-to-end GPU acceleration for both graph-based and path-based timing queries, all exposed as a zero-overhead flattened heterogeneous application programming interface (API). HeteroSTA is publicly available as both a standalone binary executable and an embeddable shared library targeting ubiquitous academic and industry applications. Example use cases as a standalone tool, a timing-driven DREAMPlace 4.0 integration, and a timing-driven global routing integration have all demonstrated remarkable runtime speed-up and comparable quality.
7D-3
11:10-11:35

IR Drop-Aware ECO: A Fast Approach to Minimize Layout and Timing Disturbance

Jingchao Hu (Zhejiang University), Yibo Lin (Peking University), Hao Yu, *Quan Chen (Southern University of Science and Technology), Zhou Jin, Cheng Zhuo (Zhejiang University)
Keywords
IR-Drop, ECO, Timing, Cell Displacement
Abstract
Ensuring power integrity in advanced IC design is increasingly challenging, as excessive IR drop can severely impact circuit performance and reliability, especially during the late-stage Engineering Change Order (ECO) process. In this work, we propose a novel IR drop-aware ECO framework that addresses IR drop violations through targeted cell displacement while minimizing timing and layout disruption. Our approach incorporates vertical IR drop mitigation and horizontal timing fix, and employs a rail severity scoring mechanism that combines current correlation and spatial proximity to evaluate IR drop severity. Experimental results on three post-routed benchmark designs demonstrate that our method achieves significant reductions in worst-case dynamic voltage drop for certain designs and mitigates local timing degradation. Additionally, the proposed severity score accurately reflects trends in IR drop risk, providing valuable guidance for ECO optimization.
7D-4
11:35-12:00

iPCL: Pre-training for Chip Layout

*Xingquan Li, Weiguo Li (Pengcheng Laboratory), Xinhua Lai (University of Chinese Academy of Sciences), Junfeng Liu (Pengcheng Laboratory), Rui Wang (Shenzhen University)
Keywords
Chip layout, pre-training, foundation model, symbol representation, placement and routing
Abstract
As chip complexity continues to grow, traditional rule-based EDA tools face increasing challenges in optimizing power, performance, and area (PPA). There is a pressing need for a foundation model in chip layout to leverage large-scale historical design data for unified and intelligent physical design. We propose iPCL (pre-training for chip layout), a comprehensive framework that integrates placement and routing generation, metric evaluation and optimization. iPCL consists of layout symbolization, multimodal pre-training, solution generation and selection, and post-processing stages, forming a scalable and automated design pipeline. iPCL reduces design iteration time, supports multi-layout generation, and automatically selects optimal solutions through lightweight ECO refinement. Two versions are developed: iPCL-R, which generates routing layouts with performance comparable to commercial tools while reducing design time by 55.7%; and iPCL-M, which delivers 336x faster and more accurate metric evaluation than open-source EDA, achieving about 5% better optimization results than commercial tools.

Session 7E

(T7-B) Emerging Computing Architectures and Learning Systems
10:20-12:00 | Thursday, January 22, 2026 | Sleeping Beauty 3
Chair(s):
Ke Chen (NUAA)
Hao Geng (ShanghaiTech University)
7E-1
10:20-10:45

NEURAL: An Elastic Neuromorphic Architecture with Hybrid Data-Event Execution and On-the-fly Attention Dataflow

*Yuehai Chen, Farhad Merchant (Bernoulli Institute and CogniGron, University of Groningen)
Keywords
Spiking neural network, elastic computing, sparsity-aware, knowledge distillation, spiking transformer
Abstract
Spiking neural networks (SNNs) have emerged as a promising alternative to artificial neural networks (ANNs), offering improved energy efficiency by leveraging sparse and event-driven computation. However, existing hardware implementations of SNNs still suffer from inherent spike sparsity and multi-timestep execution, which significantly increase latency and reduce energy efficiency. This study presents NEURAL, a novel neuromorphic architecture based on a hybrid data-event execution paradigm that decouples sparsity-aware processing from neuron computation and uses elastic first-in-first-out (FIFO) buffers. NEURAL supports on-the-fly execution of spiking QKFormer by embedding its operations within the baseline computing flow without requiring dedicated hardware units. It also integrates a novel window-to-time-to-first-spike (W2TTFS) mechanism to replace average pooling and enable full-spike execution. Furthermore, we introduce a knowledge distillation (KD)-based training framework to construct single-timestep SNN models with competitive accuracy. NEURAL is implemented on a Xilinx Virtex-7 FPGA and evaluated using ResNet-11, QKFResNet-11, and VGG-11. Experimental results show that NEURAL achieves accuracy improvements of up to 3.2% and 5.13% on the VGG-11 model using CIFAR-10 and CIFAR-100, respectively, while reducing resource usage by 50% and delivering up to 1.97x higher energy efficiency compared to existing SNN accelerators.
7E-2
10:45-11:10

Mask-based Meta-Learning for Stuck-at Faults Tolerance in ReRAM Computing Systems

*Zhan Shen, Yi Qi Zhou, Tian Yi Xu, Shan Shen, Zhen Mei, Da Ying Sun (Nanjing University of Science and Technology), Zi Chen Zhang (Beihang University)
Keywords
ReRAM crossbar array, Stuck-at Faults (SAFs), Meta-Learning, Robustness, Generalization, Neural Network
Abstract
ReRAM crossbar-based computing-in-memory (CIM) systems offer computational efficiency but suffer from significant accuracy degradation under stuck-at faults (SAFs). Conventional approaches like retraining-based methods fail to effectively generalize across diverse SAFs ratios. To address this challenge, we propose the Mask-based Meta Learning (MML) framework, leveraging meta-learning's multi-task generalization capability to achieve robust performance across varying SAFs scenarios. First, within the MML framework, a SAFs mask-based task formulation creates meta-learning tasks from SAFs masks without dataset dependency. Second, we define a new meta-learning objective by integrating different SAFs masks into the meta loss. Finally, we develop a SAFs-sensitivity-guided weight importance search algorithm that dynamically expands the adjustment range of crucial weights using the ReRAM array's redundant cells to further enhance model performance. Experimental results show that our method demonstrates superior robustness and generalization performance over state-of-the-art robust training approaches. Moreover, our method maintains high accuracy across various SAFs ratios, whereas the accuracy of other approaches fluctuates widely.
7E-3
11:10-11:35

LMESN: A Leakage-Driven MOSFET Reservoir for Scalable and Ultra-Low-Power Temporal Inference

*Haoyuan Li (Xi'an Jiaotong University, Kyoto University), Masami Utsunomiya, Ryuto Seki, Quan Cheng, Weirong Dong (Kyoto University), Feng Liang (Xi'an Jiaotong University), Takashi Sato (Kyoto University)
Keywords
Reservoir Computing, Echo State Network, Hardware Reservoir, Leakage-Current MOSFETs, Hardware-Software Co-Optimization
Abstract
Edge-based temporal inference demands energy-efficient and scalable computing architectures, but existing analog reservoir computing models often face high energy costs and limited reconfigurability. We present LMESN, a leakage-current-driven, pulse-based reservoir computing architecture that exploits intrinsic threshold-voltage variation in standard CMOS to realize ultra-low-power stochastic dynamics. To overcome physical array size constraints, we propose a Shift-Multi-Mask (SMM) technique that emulates large virtual reservoirs through cyclic mask shifts, reducing update energy by over 100x and enabling single-cycle reconfiguration. To further boost task-level performance, we develop a hardware-software co-optimization framework that jointly tunes the ADC quantization range and reservoir mask structure via a discrete genetic algorithm. Post-layout simulations in 22 nm CMOS and evaluations on eight time-series datasets demonstrate up to 13.7% accuracy improvement and 5x variance reduction over unoptimized LMESN baselines. Compared to prior analog and neural network models, LMESN achieves 3-7 orders of magnitude lower energy consumption while delivering competitive or superior accuracy. Together, these innovations make LMESN a scalable, energy-efficient, and task-adaptive platform for edge temporal processing, setting a new direction in physical reservoir computing.
7E-4
11:35-12:00

BIHDC: A Retrainable Fully-Binary Hyperdimensional Computing Accelerator for Edge FPGAs

*Changzhen Han, Ke Chen (Nanjing University of Aeronautics and Astronautics), Bi Wu (Nanjing University of Aeronautics and Astronautics (NUAA)), Chenggang Yan, Weiqiang Liu (Nanjing University of Aeronautics and Astronautics)
Keywords
Hyperdimensional Computing, Image Classification, Edge Computing, Hardware Acceleration
Abstract
Hyperdimensional Computing (HDC) is a lightweight machine learning paradigm characterized by low complexity, efficient learning, and strong interpretability, making it well suited for edge intelligence. However, its accuracy on 2D image tasks still lags behind deep neural networks (DNNs), and existing HDC hardware often relies on in-memory computing or high-end FPGAs, which fail to meet the strict low-power and small-area requirements of edge devices and generally lack complete retraining capabilities. To address these limitations, we propose BIHDC, a lightweight HDC framework that introduces a learnable preprocessing scheme to enhance feature extraction and implements an edge FPGA-based fully binary accelerator supporting end-to-end processing, including preprocessing, encoding, training, retraining, and testing. Experimental results show that BIHDC improves classification accuracy by up to 5% compared to baseline HDC with binarized encoding and by 1.5% over baseline HDC with original encoding across four image datasets. Implemented on the Xilinx Zynq-7000 platform, BIHDC reduces resource utilization and power consumption by 90% and 70%, respectively, providing an efficient and scalable solution for resource-constrained edge applications.

Session 7F

(T6-A) RF and Photonic IC
10:20-12:00 | Thursday, January 22, 2026 | Sleeping Beauty 5
Chair(s):
Yuanqing Cheng (Beihang University)
Yibo Lin (Peking University)
7F-1
10:20-10:45

MOTIF-RF: Multi-template On-chip Transformer Synthesis Incorporating Frequency-domain Self-transfer Learning for RFIC Design Automation

Houbo He, Yizhou Xu, Lei Xia, Yaolong Hu (Rice University), Fan Cai (Keysight Technologies), *Taiyun Chi (Rice University)
Keywords
inverse design, surrogate model, transfer learning, impedance matching, transformer (XFMR), UNet, CNN, graph transformer
Abstract
This paper presents a systematic study on developing multi-template machine learning (ML) surrogate models and applying them to the inverse design of transformers (XFMRs) in radio-frequency integrated circuits (RFICs). Our study starts with benchmarking four widely used ML architectures, including MLP-, CNN-, UNet-, and GT-based models, using the same datasets across different XFMR topologies. To improve modeling accuracy beyond these baselines, we then propose a new frequency-domain self-transfer learning technique that exploits correlations between adjacent frequency bands, leading to ~30%-50% accuracy improvement in the S-parameters prediction. Building on these models, we further develop an inverse design framework based on the covariance matrix adaptation evolutionary strategy (CMA-ES) algorithm. This framework is validated using multiple impedance-matching tasks, all demonstrating fast convergence and trustworthy performance. These results advance the goal of AI-assisted “specs-to-GDS” automation for RFICs and provide RFIC designers with actionable tools for integrating AI into their workflows.
7F-2
10:45-11:10

Efficient RF Passive Components Modeling with Bayesian Online Learning and Uncertainty Aware Sampling

*Huifan Zhang, Pingqiang Zhou (ShanghaiTech University)
Keywords
RF Modeling, Bayesian Neural Networks, Online Learning, Vector Fitting
Abstract
Conventional radio frequency (RF) passive component modeling based on machine learning requires extensive electromagnetic (EM) simulations to cover geometric and frequency design spaces, creating computational bottlenecks. In this paper, we introduce an uncertainty-aware Bayesian online learning framework for efficient parametric modeling of RF passive components, which includes: 1) a Bayesian neural network with reconfigurable heads for joint geometric-frequency domain modeling while quantifying uncertainty; 2) an adaptive sampling strategy that simultaneously optimizes training data sampling across geometric parameters and the frequency domain using uncertainty guidance. Validated on three RF passive components, the framework achieves accurate modeling while using only 2.86% of the EM simulation time of a traditional ML-based flow, achieving a 35x speedup.
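The loop below gives a toy picture of uncertainty-guided adaptive sampling: a bootstrap ensemble stands in for the Bayesian neural network, and the next EM simulation is requested at the frequency where the ensemble disagrees most. The polynomial surrogate, the pool size, and the sin-based stand-in for the EM solver are illustrative assumptions only.

```python
# Toy uncertainty-guided adaptive sampling loop (bootstrap ensemble as a stand-in
# for a Bayesian neural network; all components here are illustrative).
import numpy as np

rng = np.random.default_rng(0)

def simulate(f):
    """Stand-in for an expensive EM simulation at frequency f."""
    return np.sin(f) + 0.05 * rng.standard_normal()

freq_pool = np.linspace(0.0, 10.0, 200)                    # candidate frequency points
sampled_f = list(rng.choice(freq_pool, 8, replace=False))
sampled_y = [simulate(f) for f in sampled_f]

for _ in range(20):                                        # adaptive sampling loop
    preds = []
    for _ in range(16):                                    # bootstrap ensemble of cubic fits
        idx = rng.integers(0, len(sampled_f), len(sampled_f))
        coef = np.polyfit(np.array(sampled_f)[idx], np.array(sampled_y)[idx], deg=3)
        preds.append(np.polyval(coef, freq_pool))
    uncertainty = np.std(preds, axis=0)
    f_next = freq_pool[np.argmax(uncertainty)]             # query the most uncertain frequency
    sampled_f.append(f_next)
    sampled_y.append(simulate(f_next))
print(f"collected {len(sampled_f)} samples")
```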
7F-3
11:10-11:35

PrometheusFree: Concurrent Detection of Laser Fault Injection Attacks in Optical Neural Networks

*Kota Nishida, Yoshihiro Midoh, Noriyuki Miura (The University of Osaka), Satoshi Kawakami (Kyushu University), Alex Orailoglu (University of California, San Diego), Jun Shiomi (The University of Osaka)
Keywords
Silicon Photonics-based AI Accelerator (SPAA), Optical Neural Network (ONN), Laser Fault Injection Attack
Abstract
Silicon Photonics-based AI Accelerators (SPAAs) have been considered promising AI accelerators achieving high energy efficiency and low latency. While many researchers focus on improving SPAAs' energy efficiency and latency, their physical security has only recently received attention. While it is essential to deliver strong optical neural network inferencing approaches, their success and adoption are predicated on their ability to deliver a secure execution environment. This paper first presents the threat of laser fault injection attacks on SPAAs, which are capable of subjecting the optical neural network to misclassifications. To address this threat, the paper proposes PrometheusFree, an optical neural network framework capable of concurrent detection of such attacks. Furthermore, this paper introduces a novel application of the Wavelength Division Perturbation (WDP) technique, where wavelength-dependent Vector Matrix Multiplication (VMM) results are utilized to boost fault attack detection accuracy. Simulation results show that PrometheusFree achieves over 96% attack-caused misprediction recall, while the use of the WDP technique reduces the attack success rate by 38.6% on average. Compared with prior art, PrometheusFree achieves an average attack success ratio of 0.019, corresponding to a 95.3% reduction. The experimental results confirm the superiority of the concurrent detection and the boost in fault detection abilities imparted by the WDP approach.
7F-4
11:35-12:00

BEAM: Bidirectional MEEF-Driven Mask Optimization for Curvilinear Photonic Design

*Xiaoxiao Liang, Yang Luo (HKUST(GZ)), Bei Yu (The Chinese University of Hong Kong), Yuzhe Ma (The Hong Kong University of Science and Technology (Guangzhou))
Keywords
Photonic integrated circuit, Curvilinear photonic design, Computational lithography, Optical proximity correction
Abstract
The photonic integrated circuit (PIC) is a promising direction for future computing and interconnect, which involves many curvilinear geometries to modulate and transmit signals. To ensure the functionality, the PIC manufacturing requires very meticulous optimization to refrain from geometry distortion resulting from lithography process. While conventional optical proximity correction (OPC) methods can handle curvilinear features, they face challenges in mask manufacturability, computational cost, and the ability to correct any-angle edge placement error (EPE). This paper proposes BEAM, a native framework designed for photonic designs with curvilinear patterns, including lossless curvilinear pattern representation and a powerful OPC solver. BEAM uses control points to represent curvilinear mask shapes directly, avoiding Manhattanization and approximation errors. Instead of manually specifying movement directions, control points are bidirectionally updated along two orthogonal basis directions, ensuring versatile corrections. To further enhance efficiency, we propose a fast batch-based sensitivity measurement strategy that effectively guides the movement of control points while substantially reducing the computational overhead. The effectiveness of BEAM is demonstrated on multiple fundamental layout components of photonic designs, achieving state-of-the-art correction performance in terms of mask quality and computational efficiency.

Session 8A

(T3-A) Hybrid CIM Architectures and Flexible Dataflows
13:30-15:35 | Thursday, January 22, 2026 | Snow White 1
Chair(s):
Ren-Shuo Liu (National Tsing Hua University)
Atsutake Kosuge (University of Tokyo)
8A-1
13:30-13:55

OAH-CIM: Outlier-Aware Hybrid RRAM-SRAM CIM Accelerator with Variation-Robust Sparsity

*Zhiwei Zhou, Tong Hu, Han Bao, Houji Zhou, Yuyang Fu (School of Integrated Circuits, Huazhong University of Science and Technology), Jiancong Li (Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology), Jia Chen (AI Chip Center for Emerging Smart Systems (ACCESS) University of Science and Technology), Yi Li, Xiangshui Miao (School of Integrated Circuits, Huazhong University of Science and Technology)
Keywords
RRAM-SRAM hybrid architecture, Computing-in-memory, Transformer, outlier-aware quantization, sparsity
Abstract
Hybrid RRAM-SRAM Compute-in-Memory (CIM) architectures offer a promising solution for accelerating Transformers. However, their efficiency is severely constrained by a fundamental challenge: the prevalence of outlier values inherent to these models. These outliers create a severe quantization dilemma: hardware-friendly Block Floating-Point (BFP) suffers catastrophic accuracy loss, while Integer (INT) quantization incurs prohibitive hardware overhead. Furthermore, poor quantization directly undermines sparsity exploitation. The resulting numerical imprecision renders analog RRAM-CIM highly vulnerable to device variation noise, which is amplified during naive sparse computation. Addressing these deeply intertwined challenges requires a holistic co-design focused on robust data representation. We propose an outlier-aware hybrid CIM (OAH-CIM) accelerator built on a synergistic hardware-algorithm co-design. First, we introduce an outlier-aware BFP quantization that achieves near-FP32 accuracy at 5-bit by efficiently representing both outliers and non-outliers. Second, leveraging this high-fidelity representation, we propose balanced bit sparsity, a hardware scheduling technique that equalizes workloads to ensure reliable, low-noise sparse computation in variation-prone RRAM. Evaluations on ViT-base show OAH-CIM improves 5-bit accuracy by 5.21% over BFP and 2.94% over INT. Furthermore, OAH-CIM achieves up to 8.6 TOPS/W, yielding a 2.3-39.1x energy efficiency improvement over state-of-the-art CIM accelerators.
8A-2
13:55-14:20

FlexMem: High-Parallel Near-Memory Architecture for Flexible Dataflow in Fully Homomorphic Encryption

*Shangyi Shi, Husheng Han, Jianan Mu, Xinyao Zheng (Institute of Computing Technology), Ling Liang (Peking University), Hang Lu, Zidong Du, Xiaowei Li, Xing Hu (Institute of Computing Technology)
Keywords
Fully Homomorphic Encryption (FHE), accelerator, Near-Memory Processing (NMP), DRAM
Abstract
Fully Homomorphic Encryption (FHE) imposes substantial memory demands, presenting significant challenges for efficient hardware acceleration. Near-Memory Processing (NMP) has emerged as a promising architectural solution to alleviate the memory bottleneck. However, the irregular memory access patterns and flexible dataflows inherent to FHE limit the effectiveness of existing NMP accelerators, which fail to fully utilize the available near-memory bandwidth. In this work, we propose FlexMem, a near-memory accelerator featuring high-parallel computational units with varying memory access strides and interconnect topologies to effectively handle irregular memory access patterns. Furthermore, we design polynomial- and ciphertext-level dataflows to efficiently utilize near-memory bandwidth under varying degrees of polynomial parallelism and enhance parallel performance. Experimental results demonstrate that FlexMem achieves 1.26x performance improvement over the state-of-the-art near-memory architectures in end-to-end benchmarks, with on average 95.7% of near-memory bandwidth utilization.
8A-3
14:20-14:45

Data Flow-Aware Weight Remapping for Efficient Fault Tolerance in ReRAM-Based Accelerators

*Hyeonsu Bang (Sungkyunkwan University), Kang Eun Jeon, Jong Hwan Ko (Sungkyunkwan University)
Keywords
In-memory computing (IMC), ReRAM, Stuck-at fault (SAF)
Abstract
Resistive random-access memory (ReRAM)-based in-memory computing (IMC) systems offer significant advantages for efficient neural network inference. However, these systems are vulnerable to stuck-at faults (SAFs), which degrade inference accuracy—a challenge that becomes more pronounced in multi-level cell (MLC) configurations. A conventional fault mitigation technique, array-wise weight remapping (AWR), addresses SAFs but incurs significant hardware overhead. To overcome these limitations, we propose pseudo-array-wise weight remapping (PAWR), a novel method that integrates mux-wise weight remapping (MWR) and mux group remapping (MGR) to achieve cost-efficient fault tolerance. Experimental results demonstrate that even at a high 20% SAF rate, PAWR achieves accuracies within 0.7% of AWR, while significantly reducing area overhead by 86.5% and energy overhead by 72.4% compared to AWR.
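A greedy toy version of fault-aware weight remapping is shown below: within a small group, the logical weight columns carrying the most significant values are steered onto the physical columns with the fewest stuck-at cells, and the permutation is undone digitally at readout. This is a conceptual stand-in, not the proposed MWR/MGR hardware scheme.

```python
# Greedy toy sketch of fault-aware weight remapping within a small column group.
import numpy as np

def remap_group(weights, fault_mask):
    """weights, fault_mask: (rows, cols) arrays; fault_mask is True where a cell is stuck."""
    significance = np.abs(weights).sum(axis=0)            # how much each logical column matters
    fault_count = fault_mask.sum(axis=0)                  # faults per physical column
    logical_order = np.argsort(-significance)             # most significant logical columns first
    physical_order = np.argsort(fault_count)              # least faulty physical columns first
    perm = np.empty(weights.shape[1], dtype=int)
    perm[physical_order] = logical_order                  # pair them up greedily
    return weights[:, perm], perm

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8))
faults = rng.random((8, 8)) < 0.2
remapped, perm = remap_group(w, faults)
print(perm)
```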
8A-4
14:45-15:10

MIREDO: MIP-Driven Resource-Efficient Dataflow Optimization for Computing-in-Memory Accelerator

*Xiaolin He, Cenlin Duan, Yingjie Qi, Xiao Ma, Jianlei Yang (Beihang University)
Keywords
Computing-in-Memory, Dataflow Optimization, Mixed-Integer Programming, DNN Accelerator
Abstract
Computing-in-Memory (CIM) architectures have emerged as a promising solution for accelerating Deep Neural Networks (DNNs) by mitigating data movement bottlenecks. However, realizing the potential of CIM requires specialized dataflow optimizations, which are challenged by an expansive design space and strict architectural constraints. Existing optimization approaches often fail to fully exploit CIM accelerators, leading to noticeable gaps between theoretical and actual system-level efficiency. To address these limitations, we propose the MIREDO framework, which formulates dataflow optimization as a Mixed-Integer Programming (MIP) problem. MIREDO introduces a hierarchical hardware abstraction coupled with an analytical latency model designed to accurately reflect the complex data transfer behaviors within CIM systems. By jointly modeling workload characteristics, dataflow strategies, and CIM-specific constraints, MIREDO systematically navigates the vast design space to determine the optimal dataflow configurations. Evaluation results demonstrate that MIREDO significantly enhances performance, achieving up to 3.2x improvement across various DNN models and hardware setups.
8A-5
15:10-15:35

Hy2S-CIM: Hybrid-Cache-LUT FP/INT-CIM with 2-Stage Alignment and Area-efficient LUT for High Precision Vision AI Tasks

*Xiang Li (Tsinghua Shenzhen International Graduate School), Wenbin Jia (Tsinghua University), Sheng Zhang (Tsinghua Shenzhen International Graduate School), Yongpan Liu (Tsinghua University)
Keywords
Computing in memory, Vision AI, LUT, Cache
Abstract
Computing-in-Memory (CIM) is a promising solution for Deep Neural Network (DNN) accelerators, improving performance and energy efficiency by addressing the "memory wall" challenge. Recent Look-Up Table (LUT)-based CIM and Cache-based CIM architectures leverage weight-stationary dataflow and feature-map data locality for energy efficiency. However, applying existing Cache-CIM to high bit-width computation faces two main challenges: (1) High Bit-Width Data Incompatibility: Directly processing high bit-width inputs (e.g., INT12, FP16) lowers cache hit rates and energy efficiency. Maintaining hit rates would require an exponential, and thus prohibitive, increase in cache lines. Bit-serial approaches are also inefficient due to poor data locality in LSBs. (2) Cache-Unfriendly Floating-Point Alignment: Essential floating-point alignment operations inherently degrade data locality or prevent resource sharing across output channels. To overcome these, we propose Hy2S-CIM, featuring three key contributions: (1) A Hybrid-Cache-LUT CIM architecture that efficiently handles high bit-width computations by decoupling the multiplication of MSBs and LSBs to MSB Cache-based and LSB LUT-based multipliers, respectively. (2) A Cache-Friendly 2-Stage Alignment Scheme that preserves data locality and enables controller sharing. (3) An Area-efficient LUT multiplier that achieves a better trade-off between the overhead of LUT and decoder. Experimental results show that Hy2S-CIM achieves 1.4x-2.4x energy efficiency improvement over prior CIM architectures on typical vision AI tasks.

Session 8B

(T1-A) Exploring interconnect design, power modeling, and system prototyping
13:30-15:35 | Thursday, January 22, 2026 | Snow White 2
Chair(s):
Chester Park (Konkuk University)
Fitzgerald Sungkyung Park (Pusan National Univeristy)
8B-1
13:30-13:55

DOME: A Domain-Orchestrated Multi-GPU Optical Network for Rack-Scale Systems

*Chongyi Yang, Peiyu Chen, Bohan Hu (The Hong Kong University of Science and Technology (Guangzhou)), Yinyi Liu, Wei Zhang (The Hong Kong University of Science and Technology), Jiang Xu (The Hong Kong University of Science and Technology (Guangzhou))
Keywords
Optical interconnection networks, GPUs, path reservation, rack-scale systems
Abstract
Modern data centers increasingly use multi-GPU systems for AI and high-performance computing, where growing data transfer demands lead to high energy consumption and performance bottlenecks in electrical networks. Optical interconnects offer compelling advantages to address these challenges, including high bandwidth, distance-independent latency, and better energy efficiency. This paper presents DOME, a rack-scale optical interconnection network that connects multiple GPUs using high-radix optical switches and extends optical interfaces into GPU packages close to memories and multiprocessors, forming distinct in-GPU and GPU-to-GPU network domains. To efficiently manage the paths across network domains, we develop a time-slotted path reservation scheme that quickly identifies the earliest time when the path is available in all network domains, reducing unnecessary reservation retries. Evaluations reveal that DOME achieves 14% speedup while maintaining comparable energy consumption compared to the state-of-the-art preemptive chain feedback control scheme.
8B-2
13:55-14:20

H²NoC: A Hybrid NoC Architecture for FPGAs with Hardened Interconnects

Jinhyeong Park, Younggil Jeong, *Jeongwoo Park (Sungkyunkwan University)
Keywords
Hybrid Noc with Hardened Interconnect(H²NoC), Field Programmable Gate Array(FPGA), High-Bandwidth Memory (HBM), Network-on-Chip (NoC), Butterfly-Fat Tree (BFT), Out-of-Order Generating Buffers (OGB), Quad-cluster Partitioning
Abstract
Modern FPGA platforms with high-bandwidth memory (HBM) subsystems often incorporate hardened interconnects in their static regions to alleviate routing congestion in programmable logic. However, these interconnects typically suffer from up to 67% bandwidth degradation in platforms such as the Alveo U280 under specific access patterns due to their inherent structural limitations. Previous works introduce custom Networks-on-Chip (NoC) in FPGA programmable logic but fail to fully utilize all 32 HBM channels due to excessive resource overhead and severe routing congestion. In this paper, we propose H²NoC, a hybrid NoC architecture that integrates the hardened interconnect embedded in the static HBM region with a custom Butterfly-Fat Tree (BFT) NoC in programmable logic, enabling full 32-channel utilization with minimum routing congestion. The custom 3-stage NoC is partitioned into four clusters, enabling flexible floorplanning strategies that facilitate optimal implementation across SLRs. The out-of-order generating buffer (OGB) mitigates read bandwidth loss by stage reduction from five to three through lightweight AXI reordering, thereby restoring bandwidth and sustaining performance. Compared to prior state-of-the-art NoCs on Alveo U280, H²NoC demonstrates an improvement of 134.8% in bandwidth-to-resource efficiency. H²NoC is evaluated using a set of synthetic benchmarks designed to emulate various traffic patterns. It achieves a peak bandwidth of 391 GB/s under bit permutation traffic and sustains over 200 GB/s across most traffic patterns. The proposed hybrid architecture delivers performance comparable to a full custom 5-stage BFT network while requiring significantly fewer FPGA resources.
8B-3
14:20-14:45

CommILP: Synthesizing Communication Infrastructure for Domain Computing Platforms

*Qucheng Jiang, Jacob Ginesin, Oscar Kellner (Northeastern University), Tianrui Ma (Washington University in St. Louis), Gunar Schirner (Northeastern University)
Keywords
Domain-specific platform, communication synthesis, integer linear programming (ILP), accelerator-rich system, interconnect design, design space exploration (DSE), streaming application, hardware-software co-design
Abstract
Domain-specific accelerator-rich platforms have emerged as a promising solution for edge applications with high-throughput and low-power requirements. Prior design-space exploration (DSE) methods efficiently allocate processing elements (PEs) and bind applications but largely assume simplified, centralized communication models. This oversimplification limits scalability and wastes resources in hardware tiles with sparse, statically known communication patterns. We propose CommILP, an Integer Linear Programming (ILP)-based framework for synthesizing intra-tile communication infrastructure in domain platforms. Given a platform allocation and PE-to-PE communication graphs for multiple applications, CommILP instantiates modular Communication Elements (CEs), determines a minimal interconnect topology, and assigns exact routing paths per connection. It guarantees that application throughput is preserved while minimizing area and energy. CommILP introduces a hierarchical ILP formulation across platform, application, and hop-stack levels, and supports both App-Level and Domain-Aggregated modeling modes to balance solution quality and runtime. Evaluated on ASIC and FPGA backends using both real (OpenVX-40) and synthetic (RNDComm-100) domains, CommILP achieves up to 69.4% area and 73% energy savings over centralized baselines, while remaining scalable to over 100 applications. It complements existing platform DSE tools by bridging the gap between computation binding and hardware-efficient communication synthesis.
8B-4
14:45-15:10

ReadyPower: A Reliable, Interpretable, and Handy Architectural Power Model Based on Analytical Framework

*Qijun Zhang, Shang Liu, Yao Lu, Mengming Li, Zhiyao Xie (The Hong Kong University of Science and Technology)
Keywords
Power model, CPU modeling, Data-driven model
Abstract
Power is a primary objective in modern processor design, requiring accurate yet efficient power modeling techniques. Architecture-level power models are necessary for early power optimization and design space exploration. However, classical analytical architecture-level power models (e.g., McPAT) suffer from significant inaccuracies. Emerging machine learning (ML)-based power models, despite their superior accuracy in research papers, are not widely adopted in the industry. In this work, we point out three inherent limitations of ML-based power models: unreliability, limited interpretability, and difficulty in usage. This work proposes a new analytical power modeling framework named ReadyPower, which is ready-for-use by being reliable, interpretable, and handy. We observe that the root cause of the low accuracy of classical analytical power models is the discrepancies between the real processor implementation and the processor’s analytical model. To bridge the discrepancies, we introduce architecture-level, implementation-level, and technology-level parameters into the widely adopted McPAT analytical model to build ReadyPower. The parameters at three different levels are decided in different ways. In our experiment, averaged across different training scenarios, ReadyPower achieves >20% lower mean absolute percentage error (MAPE) and >0.2 higher correlation coefficient R compared with the ML-based baselines, on both BOOM and XiangShan CPU architectures.
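The abstract does not state how the implementation-level and technology-level parameters are determined, so the following is only a generic illustration of the broader idea of calibrating an analytical power model against measured data, not ReadyPower's actual procedure: per-component scaling factors fitted by ordinary least squares with NumPy (all numbers are made up).

    import numpy as np

    # Analytical per-component power estimates (rows = workloads,
    # columns = components such as fetch, ALU, LSU, cache).
    analytical = np.array([[0.8, 1.5, 0.6, 2.1],
                           [0.5, 2.2, 0.9, 1.8],
                           [1.1, 1.0, 0.4, 2.5],
                           [0.9, 1.8, 0.7, 2.0]])
    measured_total = np.array([6.1, 6.4, 5.8, 6.3])  # measured total power per workload

    # Fit one calibration factor per component so that the scaled analytical
    # components best explain the measured totals.
    factors, *_ = np.linalg.lstsq(analytical, measured_total, rcond=None)
    calibrated = analytical @ factors
    mape = np.mean(np.abs(calibrated - measured_total) / measured_total) * 100
    print(factors, f"MAPE = {mape:.2f}%")

Because the fitted factors attach to named architectural components, a calibration of this general kind keeps the model interpretable, which is the property the abstract emphasizes over black-box ML regressors.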
8B-5
15:10-15:35

A RISC-V CHERI VP: Enabling System-Level Evaluation of the Capability-Based CHERI Architecture

*Manfred Schlägl, Andreas Hinterdorfer (Johannes Kepler University Linz), Daniel Große (Johannes Kepler University Linz & DFKI Bremen)
Keywords
Virtual Prototyping, System-level Evaluation, SystemC, CHERI, RISC-V
Abstract
Despite decades of mitigation efforts, memory corruption bugs remain a dominant source of security vulnerabilities. CHERI, a capability-based architecture, directly targets this problem by replacing traditional pointers with Capabilities that encode bounds, permissions, and tamper-protection tags. However, CHERI represents a significant architectural intervention that impacts not only the processor core, but the entire Hardware/Software platform. System-level evaluation methods, such as Virtual Prototypes (VPs), have proven highly valuable for exploring, validating, and optimizing such complex Hardware/Software systems. This paper introduces the first open-source SystemC/TLM-based CHERI-enhanced RISC-V VP. The VP comes with support for Virtual Memory Management (VMM) and is capable of executing complex software stacks, such as the general-purpose and memory-safe CheriBSD operating system. A verification using TestRIG demonstrates the VP's robustness, passing 2.15 million test cases. A case study with CheriBSD and 10 representative, demanding benchmark workloads highlights the VP's capability to simulate complex CHERI-enabled systems and to provide valuable insights for Hardware/Software co-design. The CHERI-enhanced VP, along with the software used in our case study, will be released as open source on GitHub.

Session 8C

(T12-A) Privacy-Preserving & Secure AI Computation
13:30-15:35 | Thursday, January 22, 2026 | Snow White 3
Chair(s):
Danella Zhao (University of Arizona)
Song Bian (Beihang University)
8C-1
13:30-13:55

EP-HDC: Hyperdimensional Computing with Encrypted Parameters for High-Throughput Privacy-Preserving Inference

Jaewoo Park, Chenghao Quan, *Jongeun Lee (Ulsan National Institute of Science and Technology)
Keywords
PPML, HDC, FHE
Abstract
While homomorphic encryption (HE) provides strong privacy protection, its high computational cost has restricted its application to simple tasks. Recently, hyperdimensional computing (HDC) applied to HE has shown promising performance for privacy-preserving machine learning (PPML). However, when applied to more realistic scenarios such as batch inference, HDC-based HE still incurs very high compute time as well as high encryption and data transmission overheads. To address this problem, we propose HDC with encrypted parameters (EP-HDC), a novel PPML approach featuring client-side HE, i.e., inference is performed on a client using a homomorphically encrypted model. EP-HDC effectively mitigates the encryption and data transmission overhead and scales to many clients, while providing strong protection for user data and model parameters. In addition to application examples for our client-side PPML, we also present a design space exploration involving quantization, architecture, and HE-related parameters. Our experimental results using the BFV scheme and the Face/Emotion datasets demonstrate that our method can improve the throughput and latency of batch inference by orders of magnitude over previous PPML methods (36.52~1068x and 6.45~733x, respectively) with <1% accuracy degradation.
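Independent of the encrypted-parameter scheme above, the HDC primitives it builds on (binding, bundling, and similarity search over high-dimensional vectors) are compact enough to sketch in NumPy. The following is a generic, unencrypted illustration with made-up codebooks, not the authors' pipeline.

    import numpy as np

    D = 10_000                                                # hypervector dimensionality
    rng = np.random.default_rng(0)
    feat_hv = {f: rng.choice([-1, 1], D) for f in range(4)}   # random HV per feature id
    val_hv = {v: rng.choice([-1, 1], D) for v in range(8)}    # random HV per quantized value

    def encode(sample):
        """Bind each feature id with its quantized value, then bundle (sum + sign)."""
        bound = [feat_hv[f] * val_hv[v] for f, v in enumerate(sample)]
        return np.sign(np.sum(bound, axis=0))

    def classify(query, prototypes):
        """Return the class whose prototype hypervector is most similar (dot product)."""
        return max(prototypes, key=lambda c: np.dot(query, prototypes[c]))

    train = {"A": [[0, 1, 2, 3], [0, 1, 2, 2]], "B": [[7, 6, 5, 4], [7, 7, 5, 4]]}
    protos = {c: np.sign(np.sum([encode(s) for s in xs], axis=0)) for c, xs in train.items()}
    print(classify(encode([0, 1, 2, 3]), protos))             # expected: "A"

Because inference reduces to element-wise products, additions, and dot products, it maps naturally onto SIMD-style homomorphic operations, which is what makes the HDC-plus-HE combination attractive in the first place.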
8C-2
13:55-14:20

RTL Verification for Secure Speculation Using Cascaded Two-Phase Information Flow Tracking

*Yuhao Liu (Institute of Microelectronics of the Chinese Academy of Sciences, University of Chinese Academy of Sciences), Ying Li, Yinhao Zhou, Chang Guo, Zhenfeng Li, Yanyun Lu (Institute of Microelectronics of the Chinese Academy of Sciences)
Keywords
Speculative Execution Attack, Hardware Security, Information Flow Tracking, Formal Property Checking
Abstract
Modern out-of-order processors have exposed critical vulnerabilities to speculative execution attacks such as Spectre and Meltdown. Despite various countermeasures taken in recent years, new attacks continue to emerge, necessitating a formal understanding of the underlying security flaws. Existing formal approaches face fundamental limitations: those targeting abstract models overlook crucial microarchitectural details, while RTL-level methods struggle with scalability issues. This paper proposes Cascaded Two-Phase Information Flow Tracking (CTP-IFT), an RTL-level verification technique towards a formal security evaluation of speculative execution. We design a novel cascade-structured IFT model to track data flows in the access phase and timing flows in the transmission phase of speculative execution attacks. Specifically, we propose a new tracking model for timing flows, called the shadow pipeline, to improve the tag-based tracking model. Consequently, the CTP-IFT model markedly reduces the complexity of checking the noninterference-based property used for secure speculation, with minimal expertise in formal methods required. We evaluate CTP-IFT on three out-of-order processors. The results show that CTP-IFT exhibits a significant advantage in detecting attacks on insecure designs and obtaining proofs on secure designs. Notably, in the BOOM processor, CTP-IFT can achieve over 50x speedup in finding attacks over Contract Shadow Logic, a state-of-the-art verification scheme, and uncovers a new contention-based attack variant.
8C-3
14:20-14:45

Exploiting Feature-driven Approximation to Preserve Privacy in Machine Learning based Health Monitoring Systems

*Nishanth Chennagouni (University of New Hampshire), Wei Lu (Keene State College), Qiaoyan Yu (University of New Hampshire)
Keywords
Approximation, Privacy Preservation, Health monitoring, sensor, human activity recognition, pain detection
Abstract
Real-time health-monitoring systems generate massive biometric sensor data, placing substantial demands on memory, computation, and power resources. Additionally, the sensitive nature of such data raises critical privacy concerns related to attributes like gender, age, and body mass index (BMI). To address these challenges, this work proposes FAxC—a novel feature-driven approximation framework that performs multi-dimensional data reduction by leveraging the biometric features of sensor signals. Unlike prior approximation techniques that operate on isolated signals or uniformly sample data, FAxC intelligently selects and masks segments to preserve human activity recognition (HAR) performance while enhancing user privacy. Our case study shows that, compared to existing privacy-aware approaches, FAxC reduces the disparity in gender-distinguishable biometric features in human activity recognition, decreasing Z-direction peak acceleration from 70% to 22% and stride root mean square (RMS) from 29% to 3%. Our FAxC enhances the privacy-preserving rate by up to 5.6x over the baseline and outperforms existing methods by a factor ranging from 1.2x to 5.4x. The proposed method has also been validated on the TAME pain dataset for a voice-based pain level detection system. FAxC reduces the risk of gender leakage by 50% compared to using raw data, while maintaining pain level detection accuracy comparable to existing methods.
8C-4
14:45-15:10

Safeguarding Neural Network IPs from Scan Chain based Model Extraction Attacks

*E Bhawani Eswar Reddy, Gundameedi Sai Ram Mohan, Sukanta Bhattacharjee, Chandan Karfa (Indian Institute of Technology Guwahati)
Keywords
Scan Chain, Neural Network Model Extraction Attacks, SAT Attack
Abstract
This work addresses the vulnerability of trained neural network models to scan-chain-based attacks, which exploit the accessibility of activation-function outputs to illicitly acquire model information such as weights and biases. To counter this threat, we propose a novel defense mechanism inspired by logic locking. The idea is to obfuscate the interconnections among the layers of the ML model, controlled by secret keys stored in tamper-proof memory. Our scheme ensures normal functionality with the correct key while disrupting data flow and producing erroneous outputs with an incorrect key. Extensive experiments demonstrate a significant drop in accuracy when incorrect keys are used, with the accuracy drop being more pronounced when the later layers of the model are locked. The protection technique incurs 0.3% area overhead and 0.1% latency overhead.
8C-5
15:10-15:35

FedBit: Accelerating Privacy-Preserving Federated Learning via Bit-Interleaved Packing and Cross-Layer Co-Design

Xiangchen Meng, *Yangdi Lyu (The Hong Kong University of Science and Technology (Guangzhou))
Keywords
Federated Learning, Bit-Interleaved Packing, Homomorphic Encryption, Cross-Layer Co-Design
Abstract
Federated Learning (FL) with Fully Homomorphic Encryption (FHE) effectively safeguards data privacy during model aggregation by encrypting local model updates before transmission, mitigating threats from compromised or untrusted servers. However, the computational burden and ciphertext expansion inherent to homomorphic encryption significantly increase resource and communication overhead. To address this, we propose FedBit, a hardware/software co-designed framework optimized for the Brakerski-Fan-Vercauteren (BFV) scheme. FedBit employs bit-interleaved data packing to embed multiple model parameters into a single ciphertext slot, minimizing ciphertext expansion and maximizing computational parallelism. Additionally, we integrate a dedicated FPGA accelerator to offload cryptographic operations, significantly outperforming software-only implementations. Experimental evaluation shows that FedBit delivers a two-orders-of-magnitude speedup on encryption, reduces communication overhead by 29%, and improves model accuracy by 10% compared to existing HE-based FL frameworks.
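FedBit's exact packing layout is not given in the abstract; the sketch below only illustrates the general idea behind bit-level packing for additive aggregation: several quantized parameters share one wide integer, with guard bits per field so that summing the packed words of many clients never overflows a field. All widths and values are illustrative.

    def pack(params, field_bits, guard_bits):
        """Pack non-negative quantized parameters into one wide integer,
        leaving guard bits per field so sums of many packed words stay separable."""
        stride = field_bits + guard_bits
        word = 0
        for i, p in enumerate(params):
            assert 0 <= p < (1 << field_bits)
            word |= p << (i * stride)
        return word

    def unpack(word, n, field_bits, guard_bits):
        """Extract the n accumulated fields (e.g., after summing packed words)."""
        stride = field_bits + guard_bits
        mask = (1 << stride) - 1
        return [(word >> (i * stride)) & mask for i in range(n)]

    # Aggregate three clients' updates: 4 parameters per word, 8-bit fields,
    # 4 guard bits (headroom for up to 16 clients without field overflow).
    clients = [[10, 20, 30, 40], [11, 19, 29, 41], [12, 21, 31, 39]]
    aggregate = sum(pack(c, 8, 4) for c in clients)  # this addition is what the FHE scheme performs homomorphically
    print(unpack(aggregate, 4, 8, 4))                # -> [33, 60, 90, 120]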

Session 8D

8D (Designer Forum 3) Chiplets Go Mainstream: Design Automation for 2.5D/3D Systems
13:30-15:35 | Thursday, January 22, 2026 | Sleeping Beauty 1/2
Chair(s):
Hailong Yao (University of Science & Technology Beijing)
8D-1
13:30-13:55

Implementation of Shift-Left Integrated STCO Design Method for Advanced Packaging

*Kai Zhao (Huatian Technology (Jiangsu) Co., Ltd.)
Biography
Kai Zhao is Director of Design & Simulation at Huatian Technology (Jiangsu) Co., Ltd. He earned his PhD in Microelectronics and Solid-State Electronics from Peking University in 2012 and holds the academic title of Professor. His research focuses on 2.5D advanced packaging technology and multi-physics collaborative design and simulation. He has led national-level scientific research projects, including the Key R&D Program of the Ministry of Science and Technology and projects funded by the National Natural Science Foundation of China (NSFC). He has published more than 50 SCI/EI-indexed papers and filed over 30 invention patent applications.
Abstract
With AI hardware aggressively driving the evolution of advanced packaging toward larger form factors, 2.5D packaging based on heterogeneous materials (e.g., polyimide and Si bridge) is emerging as the mainstream. Due to innovations in the material system, traditional rectilinear routing rules are no longer applicable, while more complex structures and material combinations impose higher requirements on System Technology Co-Optimization (STCO). From the perspective of backend packaging, this talk presents a multi-physics coupling-aware automatic routing tool for advanced packaging built through shift-left integration: stress, delamination, and warpage constraints from large-size composite material systems are incorporated into the routing phase upfront, and octilinear routing rules are adopted. A comparison with two mainstream commercial tools demonstrates that the developed shift-left integrated design tool offers comprehensive advantages.
8D-2
13:55-14:20

Integrated Full-custom 3DIC Design Methodology and Flow

*Michael Liu (Empyrean Technology)
Biography
Michael Liu is the Senior Product Director of Empyrean Technology. He has more than 10 years of experience in EDA software product development and management for ASIC chip design, manufacturing, and packaging, focusing on the planning, development, and promotion of EDA products. He has helped Empyrean build a mature Analog/Mixed-Signal design flow and expand it to other full-custom design fields such as flat panel display, signal chain, memory, RF, and optoelectronics. He is building a reliability design methodology for design-manufacturing collaboration and a PPAC-oriented design-manufacturing-packaging collaborative design solution. These solutions are widely adopted by leading national and international design houses.
Abstract
Emerging application scenarios such as AI, data centers, 5G communications, and autonomous driving have imposed comprehensive and stringent requirements on the functions, computing power, and cost of chips. The advanced 3D IC packaging technology represented by Chiplet has become one of the best solutions to address this challenge. The Chiplet 3D IC technology has given rise to new business models and design-manufacturing paradigms. In the process of implementing related technologies, EDA tools that carry design methodologies play a very important role. This report will introduce the latest advancements and practices in 3D IC design methodologies and provide a vision for their future development trends.
8D-3
14:20-14:45

AI for EDA: Chiplets Simulation New Era

*Jimin Wen (Xpeedic)
Biography
Dr. Jimin Wen is the R&D Director at Xpeedic, where he leads advanced initiatives in circuit and system-level simulation, High-Performance Computing (HPC), and the application of AI in EDA. With 17 years of R&D management experience, he was recruited to Shanghai as a high-level overseas talent to drive innovation in next-generation simulation technologies. Dr. Wen is a recipient of global technology innovation awards from both Cadence and Ansys. He holds a Ph.D. from the Chinese Academy of Sciences, has authored over 40 IEEE papers—including a HOST Best Paper award—and holds 5 international patents.
Abstract
As the semiconductor industry moves decisively into the "Beyond Moore" era, 2.5D/3D Chiplet technology has emerged as the primary driver for performance scaling. However, the transition from monolithic SoCs to heterogeneous 3D interconnected systems introduces unprecedented challenges in multi-physics simulation, particularly regarding thermal management, mechanical stress, and high-speed die-to-die (D2D) signal integrity. Traditional simulation methodologies struggle to balance accuracy and convergence speed when facing the massive parameter spaces of advanced packaging.
In this session, Xpeedic presents a new era of Chiplet simulation powered by XAI, a full-stack multi-agent AI platform designed to break the simulation bottleneck in 2.5D/3D system design. We will demonstrate how XAI.Sim, leveraging Physics-Informed AI and reinforcement learning, achieves 100x-1000x acceleration in thermal and multi-physics analysis—critical for predicting heat dissipation in dense 3D stacks without sacrificing high-fidelity results.
Furthermore, we will discuss XAI.Opt, which utilizes surrogate modeling to optimize Signal Integrity (SI) for complex RF and high-speed links, reducing iteration cycles significantly compared to traditional methods. By integrating these specialized agents with XAI.Copilot for automated workflow orchestration, we propose a pathway toward intelligent System Technology Co-Optimization (STCO), enabling designers to efficiently navigate the complexities of next-generation Chiplet architectures.
8D-4
14:45-15:10

Breakthroughs in Chiplet-Based Electromagnetic Designs: Efficient Simulation and AI-Driven Optimization

*Peng Zhao (Faraday Dynamics)
Biography
Peng Zhao is Chief Scientist of Faraday Dynamics Co., Ltd. He received his B.Eng. and M.Phil. degrees in electronic engineering from Zhejiang University and earned his Ph.D. in electronic engineering from the City University of Hong Kong. He joined Hangzhou Dianzi University in 2014. His research interests include Electronic Design Automation (EDA), Radio-Frequency (RF) circuits, computational electromagnetics, and antennas. In 2017, he co-founded Faraday Dynamics Co., Ltd., which is mainly engaged in the development of efficient EDA software. He is a member of a leading innovation and entrepreneurship team in Zhejiang Province. He has led multiple EDA research projects, including grants from the National Natural Science Foundation of China and subprojects of the National Key R&D Program, and has published over 60 research papers and obtained over 20 authorized patents.
Abstract
With the slowing down of Moore's Law, Chiplet technology has emerged as a core pathway to break through the performance bottlenecks of traditional monolithic chips, enabling the heterogeneous integration of multi-functional chips. However, the dense interconnections and heterogeneous integration in Chiplet architectures give rise to complex electromagnetic (EM) effects, such as signal crosstalk, impedance mismatch, and multi-physics field coupling, which severely restrict the reliability and efficiency of system operation. Conventional electromagnetic simulation methods face the dilemma of poor efficiency when dealing with large-scale Chiplet systems, while traditional optimization methods need to continuously call rigorous time-consuming electromagnetic simulations during the optimization process, further deteriorating the design efficiency. To address these challenges, this talk focuses on breakthroughs in rapid electromagnetic simulation and AI-driven optimization for Chiplet electromagnetic designs. Firstly, a fast integral equation electromagnetic simulation algorithm based on the layered Green's function is proposed. This algorithm has been implemented in a commercial EDA tool, called UltraEM, achieving a computational complexity of O(NlogN) for full-wave electromagnetic simulation and enabling large-scale Chiplet-based designs. Secondly, an AI-based parametric modeling technique is introduced, leveraging deep learning and multi-objective optimization, which has been implemented in another commercial EDA tool, called EMOptimizer, significantly improving the efficiency of Chiplet-based chip design and optimization. This research provides a solid foundation to overcome electromagnetic design challenges in Chiplet systems, paving the way for widespread adoption of Chiplet technology.
8D-5
15:10-15:35

A Shift-Left Design Methodology for Chiplet Architecture: Automation, Multi-Physics Modeling and Convergence

*Chen Wu (BTD Technology/EIT)
Biography
Chen Wu received his PhD from UCLA. He is now an assistant researcher at the Ningbo Institute of Digital Twin, Eastern Institute of Technology, and the director of the Chiplet team at BTD. His research interests include AI chips, covering architecture, instruction-set, and compiler designs, and EDA for Chiplets, covering shift-left design methodologies, AI-driven multi-physics evaluation, and early-stage design space exploration for chiplet-based design. He has published 30 papers, received one best paper award and two best paper nominations, and holds 10 patents.
Abstract
Chiplet technology has emerged as a key architectural pathway in the post-Moore era, enabling continued scaling of functionality and performance through heterogeneous integration of multiple specialized dies. Despite this promise, the design and verification of chiplet-based systems remain constrained by significant methodological and tooling limitations.
Automation Challenges: Existing EDA solutions, both domestic and international, exhibit limited automation capability and poor routability, often requiring extensive manual tuning. There is an urgent need for next-generation automated tools that support system-level generation, physical planning, placement, and routing tailored for chiplet architectures.
Multi-Physics Simulation Bottlenecks: Traditional field-solver-based multi-physics tools demand layout truncation and long simulation runtimes, and struggle to operate on incomplete designs. This incompatibility severely constrains early-stage design exploration. Accelerated analytical and AI-enhanced methods for signal integrity (SI), power integrity (PI), and electro-thermal coupling are critically needed.
Design-Verification Disconnect: Fragmented workflows between design and verification tools result in long and inefficient iteration cycles (design, verify, redesign), often with uncertain convergence. A "shift-left" verification paradigm is essential to provide early guidance and reduce overall design latency.
This talk presents an integrated workflow that directly addresses these challenges and advances the state of chiplet physical design and verification.
Physical Planning: We introduce a systematic methodology for physical-level planning that co-optimizes thermal management, power delivery, and routing feasibility while incorporating SI/PI constraints.
AI-Driven Multi-Physics Evaluation: A data-driven and AI-accelerated multi-physics evaluation framework is proposed to rapidly predict electrical, thermal, and mechanical behaviors, enabling fast design-space exploration without reliance on time-consuming field solvers.
Convergent Design Flow: By unifying the proposed planning and evaluation models, we establish a convergent, shift-left-oriented design-verification flow that supports early validation, reduces the number of design iterations, and markedly improves time-to-closure.

Session 8E

(T9-B) Advances in Routing for Chips and Advanced Packaging
13:30-15:35 | Thursday, January 22, 2026 | Sleeping Beauty 3
Chair(s):
Siting Liu (The Chinese University of Hong Kong)
Hung-Ming Chen (National Yang Ming Chiao Tung University)
8E-1
13:30-13:55

DPO-3D: Differentiable Power Delivery Network Optimization via Flexible Modeling for Routability and IR-Drop Tradeoff in Face-to-Face 3D ICs

*Zhen Zhuang (The Chinese University of Hong Kong), Zheng Yang (Georgia Institute of Technology), Yuxuan Zhao (The Chinese University of Hong Kong), Jiawei Hu (Georgia Institute of Technology), Bei Yu (The Chinese University of Hong Kong), Sung Kyu Lim (Georgia Institute of Technology), Tsung-Yi Ho (The Chinese University of Hong Kong)
Keywords
3D IC, PDN optimization, Differentiable optimization, GPU acceleration, IR drop, Routability
Abstract
With the advancement of technology nodes, increasing device density leads to serious routability challenges, especially for the promising face-to-face 3D ICs, because most metal-layer resources near the inter-die bonding layer are reserved for the stripes of power delivery networks (PDNs). Consequently, signal routing performance degrades as numerous nets must connect devices on both the bottom and top dies through power-reserved metal layers. There are two primary challenges for 3D IC PDN optimization: flexible power-stripe optimization with global search capacity, and inter-die routability. Traditional 3D IC optimization methods using unified power stripes limit the search space, while single-die optimization methods cannot maintain inter-die coherence. To address these challenges, we propose a differentiable PDN optimization framework, DPO-3D, for the routability and IR-drop tradeoff in 3D ICs. First, we propose a novel modeling method that supports flexible 3D IC PDN optimization without compromising global search capability; this modeling method also effectively reduces design complexity. We further formulate the problem as an integer linear programming (ILP) model to address inter-die routability, achieving a good tradeoff between routability and IR-drop. Finally, we propose a GPU-accelerated differentiable method to solve the problem, overcoming the scalability issue of the ILP model. Compared with the state-of-the-art baseline, DPO-3D achieves a 7.10% DRC-violation reduction, an 11.61% IR-drop reduction, and a 7.30x speedup on average.
8E-2
13:55-14:20

Enhancing Pin Accessibility Through Pin Pattern Migration and Optimization Across Cell Boundaries

*Hyunbae Seo, Sehyeon Chung, Taewhan Kim (Seoul National University)
Keywords
pin accessibility, cell layout generation, routing
Abstract
This paper presents a new approach to the problem of improving pin accessibility, which has become an important task for physical design at advanced technology nodes. To this end, we propose a new concept, called disclosed-pins, which deliberately exposes some pins of standard cells across the cell boundary; otherwise, there would be no way to boost pin accessibility further. However, two key issues must be resolved to make the disclosed-pin concept fully effective: (1) how can we systematically synthesize diverse structures of standard cells with disclosed-pins? and (2) how can we effectively exploit those cells in the course of chip implementation? For issue 1, we propose a new in-cell routing method combined with disclosed-pin generation, developed on the basis of an SMT (satisfiability modulo theories) formulation, while for issue 2, we develop a post-place optimization in which we optimally replace, on a row basis, the cell instances with low pin accessibility by cells with disclosed-pins, formulating it as an instance of dynamic programming (DP). Through experiments with benchmark circuits, it is shown that our proposed methodology utilizing the synthesized cells with disclosed-pins can considerably relieve the burden of block-level routing, reducing the number of design rule violations by 26.05% and 22.80% on average over the state-of-the-art prior methods of (internal) pin pattern extension and dummy poly insertion, respectively.
8E-3
14:20-14:45

Topological Optimization-Based Layer Assignment Method for Fan-Out Wafer-Level Packaging

Haoying Wu, *Guanxian Zhu (Wuhan University of Technology), Xu He (Hunan University), Mingyu Liu (Huazhong University of Science and Technology)
Keywords
Fan-Out wafer-level packaging (FOWLP), inter-chip connections, multi-directional projection circular model, layer assignment
Abstract
Fan-Out Wafer-Level Packaging (FOWLP) technology supports the integration of heterogeneous chips within a single package and enables interchip connections through redistribution layers (RDLs). This provides critical support for constructing highly integrated heterogeneous systems. However, the rapid increase in interconnect density has significantly heightened the topological complexity of net crossings. In light of manufacturing cost constraints and limited routing resources, it has become crucial to effectively minimize the number of topologically conflicting nets to achieve high-quality layer assignment solutions. State-of-the-art layer assignment methods predominantly utilize a circular model based on fixed access points, onto which net connections are projected to evaluate topological crossings. However, existing circular models have limitations in identifying topologically compatible net combinations and often fail to encompass the full range of possibilities. Moreover, the use of fixed access points limits the reduction of topological conflicts, which adversely affects overall routability. To remedy these disadvantages, we propose a topological optimization-based layer assignment method. The proposed method includes the following key techniques: (1) a multi-directional projection circular (MDPC) model that expands the solution space; (2) a global topology optimization strategy based on flexible access points to significantly reduce net crossings; and (3) a wirelength-driven access point allocation algorithm aimed at minimizing total wirelength. Experimental results indicate that, compared to Wen's layer assignment algorithm, the proposed method achieves a 15.7% reduction in the number of routing layers when the number of layers is not fixed, and improves the net assignment ratio by 31.6% under fixed-layer conditions.
8E-4
14:45-15:10

Au-MEDAL: Adaptable Grid Router with Metal Edge Detection And Layer Integration

Andrew B. Kahng (University of California, San Diego), Seokhyeong Kang, Sehyeon Kim, *Jakang Lee (Pohang University of Science and Technology), Dooseok Yoon (University of California, San Diego)
Keywords
Standard Cell Design Automation, SMT-based Router, Design Technology Co-optimization
Abstract
Standard cell layout routing faces several challenges due to stringent design rules. In this paper, we propose Au-MEDAL, a new SMT-based standard cell router that supports various design rules at the nanometer scale. Au-MEDAL implements extended design rules to enable bidirectional routing and proposes methods that: (i) ensure at least one pin-access approach for block-level routing; (ii) integrate Middle-of-Line (MOL) and Back-End-of-Line (BEOL) routing; (iii) support variable routing grid spacings; and (iv) perform off-grid design rule checks. As a result, based on the ASAP7 PDK, Au-MEDAL not only achieves cell-level DRC-clean routing, a feat unattained by existing in-cell routers, but also improves pin accessibility, reduces obstructive routing length, and minimizes the use of vias and M2 in cell design. Across the evaluated block designs, Au-MEDAL achieves improvements of 0.87%, 1.19%, 2.34%, and 25.05% in total core area, total power, effective clock, and total negative slack, respectively. Its block-level DRC performance is comparable to that of manually designed layouts and shows significant improvements over previous in-cell router results.
8E-5
15:10-15:35

GPU-Accelerated Global Routing with Balanced Timing and Congestion Optimization

*Jinghui Zhou, Fuxing Huang, Lixin Chen, Xinglin Zheng, Ziran Zhu (Southeast University)
Keywords
global routing, timing driven, pattern routing, GPU acceleration
Abstract
As integrated circuit (IC) designs continue to scale in complexity, global routing faces increasing challenges in managing timing and congestion simultaneously. This paper proposes a GPU-accelerated global routing framework that effectively balances timing optimization and congestion mitigation. The proposed framework begins with a preprocessing stage that partitions ultra-large nets and decomposes nets based on estimated pin slack to enhance scalability and timing sensitivity. For critical nets, we propose a timing- and congestion-driven GPU-accelerated hybrid 3D pattern routing method. Specifically, an Elmore-based timing weight calculation method is proposed to efficiently capture the timing criticality of routing paths, and the resulting weights are ordered for more targeted and effective timing optimization. Then, a well-designed cost scheme is proposed to better balance timing and congestion. Finally, we develop a GPU-accelerated hybrid 3D pattern routing strategy that combines L-shape and sparse Z-shape patterns to improve routing efficiency. After routing critical nets, the remaining non-critical nets are routed using a congestion-driven GPU-accelerated routing engine that supports flexible detours to alleviate congestion and utilize residual routing resources. Experimental results on the ISPD 2025 contest benchmarks show that, compared with the contest champion, our algorithm achieves 19.4% better weighted scores and 16.2% faster runtime.
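For context on the Elmore-based weights mentioned above: the Elmore delay of a sink in an RC routing tree is the sum, over each resistance on the driver-to-sink path, of that resistance multiplied by the total capacitance downstream of it. A minimal sketch of that computation follows; the tree topology and RC values are hypothetical, and this is textbook background rather than the paper's weighting scheme.

    def elmore_delays(children, root, r, c):
        """children: dict node -> list of child nodes; r[n]: resistance of the edge
        into node n; c[n]: capacitance at node n. Returns dict node -> Elmore delay."""
        # Post-order accumulation of downstream capacitance per subtree.
        downstream = dict(c)
        order, stack = [], [root]
        while stack:
            n = stack.pop()
            order.append(n)
            stack.extend(children.get(n, []))
        for n in reversed(order):
            for ch in children.get(n, []):
                downstream[n] += downstream[ch]
        # Delay of a node = parent delay + R(edge into node) * downstream cap of node.
        delay = {root: 0.0}
        for n in order:                      # parents always precede children here
            for ch in children.get(n, []):
                delay[ch] = delay[n] + r[ch] * downstream[ch]
        return delay

    # Illustrative net: driver -> a -> {s1, s2}.
    children = {"drv": ["a"], "a": ["s1", "s2"]}
    r = {"a": 2.0, "s1": 1.0, "s2": 3.0}
    c = {"drv": 0.0, "a": 1.0, "s1": 2.0, "s2": 1.0}
    print(elmore_delays(children, "drv", r, c))  # {'drv': 0.0, 'a': 8.0, 's1': 10.0, 's2': 11.0}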

Session 8F

(T5-D) Memory-Centric and Compute-in-Memory Innovations
13:30-15:35 | Thursday, January 22, 2026 | Sleeping Beauty 5
Chair(s):
Zhengwu Liu (The University of Hong Kong)
Hongwu Jiang (The Hong Kong University of Science and Technology (Guangzhou))
8F-1
13:30-13:55

Fine-Grained Parallelization of FHE Workloads in Multi-GPU Systems

*Homer Gamil (New York University), Michail Maniatakos (New York University Abu Dhabi)
Keywords
multi-GPU, FHE, Hardware Acceleration
Abstract
Fully Homomorphic Encryption (FHE) enables computation on encrypted data but remains computationally expensive, especially for large-scale workloads. Multi-GPU systems offer significant acceleration potential, but performance is highly sensitive to factors such as the FHE parameter set, chosen algorithms, number and type of GPUs, and the communication medium. In this work, we propose a methodology to automatically parallelize and distribute FHE operations across GPUs to maximize performance. Our methodology adapts execution based on system configuration, workload characteristics and FHE parameters. Experimental results across varied FHE parameters and hardware setups demonstrate efficient scaling, achieving up to 95.6%, 91.3%, and 82.4% of ideal speedup with 2, 4, and 8 GPUs respectively. Furthermore, compared to coarse grained parallelization, our approach exhibits up to a 4.2x speedup, emphasizing the importance of fine-grained parallelization in efficient FHE execution. Lastly, compared to state-of-the-art multi-GPU parallelization methodologies, our work achieves up to a 2.81x improvement for comparable GPUs.
8F-2
13:55-14:20

Exploiting the Irregular Input Sparsity in Systolic Array-based DNN Accelerators via Local Soft Pooling

Desheng Fu (College of Informatics, Huazhong Agricultural University), Jingbo Jiang, Xiaomeng Wang (The Hong Kong University of Science and Technology), Jingyang Zhu (NVIDIA Corporation), *Xizi Chen (College of Informatics, Huazhong Agricultural University), Chi-Ying Tsui (The Hong Kong University of Science and Technology)
Keywords
Neural network, Input Sparsity, Soft Pooling
Abstract
One promising approach to mitigating the computational complexity of deep neural networks is to leverage the sparsity of input activations that results from the application of the ReLU function. However, the irregular distribution of zero-valued inputs poses a challenge for efficient implementation in existing regular architectures, such as systolic arrays. Previous works usually depend on specialized architectures to bypass the dummy computations during runtime. In contrast to these prior strategies, we propose a local soft pooling method to efficiently exploit the irregular input sparsity in systolic array-based architectures. Through local soft pooling, adjacent input rows can be safely merged at runtime, compressing the sparse input matrix into a compact format that is only 1/3 to 1/2 of its original size. The compact matrix can then be directly fed into the systolic array for computation. A computation saving of over 67.78% is achieved across various networks on both CIFAR-10 and ImageNet with negligible accuracy loss. As a result, the throughput and energy efficiency are improved by 2.72 and 2.07 times, respectively.
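The exact merge rule of local soft pooling is not spelled out in the abstract; the NumPy sketch below only illustrates the underlying observation that adjacent sparse input rows whose non-zero columns do not collide can be folded into one row before being streamed into a systolic array. The bookkeeping needed to attribute partial sums back to their original rows is omitted, and the example is hypothetical rather than the paper's algorithm.

    import numpy as np

    def merge_adjacent_rows(x):
        """Greedily fold each row into the previous merged row when their
        non-zero column sets are disjoint; otherwise start a new merged row."""
        merged = [x[0].copy()]
        for row in x[1:]:
            if not np.any((merged[-1] != 0) & (row != 0)):   # no column collision
                merged[-1] += row
            else:
                merged.append(row.copy())
        return np.array(merged)

    x = np.array([[0, 3, 0, 0],
                  [5, 0, 0, 0],    # disjoint with row 0 -> merged
                  [0, 0, 7, 0],    # still disjoint      -> merged again
                  [2, 0, 0, 0]])   # collides in column 0 -> new merged row
    print(merge_adjacent_rows(x))  # 2 rows instead of 4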
8F-3
14:20-14:45

Input Reuse, Weight-Stationary Dataflow and Mapping Strategy for Depthwise Convolution in Computing-in-Memory Neural Network Accelerators

*Chia-Chun Wang (National Tsing-Hua University), Yu-Chih Tsai, Ren-Shuo Liu (National Tsing Hua University)
Keywords
AI accelerators, Accelerator architectures, In-memory computing, Convolutional neural networks
Abstract
Computing-in-Memory (CIM) is a promising solution to address the data movement bottleneck of the traditional von Neumann architecture by performing in-situ computation in the memory. However, naively mapping depthwise convolution onto general CIM leads to excessive computation latency or data storage requirements due to the limited reuse of input features. Unlike previous works that focus on designing CIM hardware to support depthwise convolution effectively, we propose an input-reuse dataflow and mapping strategy that supports depthwise convolution without modifying the macro, achieving a good balance between computation latency and storage size. The key aspect of our dataflow and mapping is to maximize the reuse of input features by allowing the CIM to produce partial sums of output features in the same cycle. Additionally, we propose a detailed hardware design of a partial-sum processing unit to handle the reconstruction of output features from the partial sums. The experimental results show that our approach achieves up to 5.19x speedup with less than 1.74x required storage size compared to the baseline design, and only 6.3% area overhead compared to the CIM macro. Our work also achieves up to 2.87x better model-wise energy efficiency.
8F-4
14:45-15:10

A Unified Compute-In-Memory Framework for Multisensory Emotion Recognition

*He Xiao (The University of Hong Kong), Yue Zhou, Dirui Xie, Qi Cheng, Xiaofang Hu (Southwest University), Zhengwu Liu, Ngai Wong (The University of Hong Kong)
Keywords
Speech emotion recognition, textual emotion recognition, multisensory, Transformer, RRAM, edge devices
Abstract
Transformer-based emotion recognition is becoming increasingly vital for human-computer interaction systems, yet faces dual challenges: software models frequently overlook the hierarchical interaction between localized semantic cues and global contextual dependencies, while hardware implementations struggle with the computational demands of parameter-heavy architectures in resource-constrained edge environments. To address these co-dependent limitations, we propose MEMO, a resistive random access memory (RRAM)-based Multisensory EMOtion recognition system featuring adaptive resource allocation through integrated hardware-software co-design. Our software framework, LiteERN, enables unified cross-modal feature processing by encoding modality-specific characteristics into hardware-compatible representations, eliminating redundant processing pipelines for heterogeneous sensory input (speech/text). Complementing this, our hardware architecture leverages RRAM-based compute-in-memory (CIM) to efficiently implement the proposed computational methods, resolving the inflexibility of existing accelerators that require modality-specific redesign. Benchmark validation demonstrates MEMO's significant efficiency improvements, achieving 14.9x to 53.5x energy reduction compared to GPU and CPU implementations while maintaining state-of-the-art accuracy (73.06% SER, 0.79 TER F1-score).
8F-5
15:10-15:35

Scope: A Scalable Merged Pipeline Framework for Multi-Chip-Module NN Accelerators

*Zongle Huang, Hongyang Jia (Tsinghua University), Kaiwei Zou (Capital Normal University), Yongpan Liu (Tsinghua University)
Keywords
NN inference, Multi-Chip-Module, Scheduling, Design space exploration
Abstract
Neural network (NN) accelerators with multi-chip-module (MCM) architectures enable the integration of massive computation capability; however, they face challenges of computing resource underutilization and off-chip communication overheads. Traditional parallelization schemes for NN inference on MCM architectures, such as intra-layer parallelism and inter-layer pipelining, fail to overcome both challenges simultaneously, limiting the scalability of MCM architectures. We observe that existing works typically deploy layers separately rather than considering them jointly. This underexploited dimension leads to compromises between system computation and communication, thus hindering optimal utilization, especially as hardware and software scale. To address this limitation, we propose Scope, a merged pipeline framework that incorporates this overlooked multi-layer dimension, thereby achieving improved throughput and scalability by relaxing the tradeoffs between system computation, communication, and memory costs. This new dimension, however, adds to the complexity of design space exploration (DSE). To tackle this, we develop a series of search algorithms that achieve an exponential-to-linear complexity reduction while identifying solutions that rank in the top 0.05% of performance. Experiments demonstrate that Scope achieves up to 1.73x throughput improvement while maintaining similar energy consumption for ResNet-152 inference compared to state-of-the-art approaches.

Session 9A

(T2-A) Advanced Embedded System Design and Optimization
15:55-18:00 | Thursday, January 22, 2026 | Snow White 1
Chair(s):
Xin Chen (University of New Mexico)
Caiwen Ding (University of Minnesota - Twin Cities)
9A-1
15:55-16:20

UltraMalloc: Efficient FPGA-based Memory Allocation Framework Optimized for HBM

*Yuwei Qu, Yiqing Mao, Jin Yanxing, Wai-Shing Luk, Lingli Wang (State Key Laboratory of Integrated Chips and Systems, Fudan University)
Keywords
Hardware acceleration, High Bandwidth Memory (HBM), Compiler, Memory Management
Abstract
Memory allocation efficiency remains a significant challenge in High-Level Synthesis (HLS) frameworks. Current dynamic memory management (DMM) techniques suffer from issues such as inefficiency, fragmentation, considerable hardware overhead, and difficulties in handling complex workloads. Conversely, existing static memory approaches often exhibit poor efficiency when addressing large-scale applications. Moreover, both dynamic and static methods lack sufficient support for High Bandwidth Memory (HBM), thereby limiting their effectiveness in complex neural network scenarios. To overcome these challenges, we propose transforming dynamic memory allocation into an optimized static memory allocation strategy specifically designed for FPGA systems. Our approach leverages the computational characteristic of neural network applications, which typically exhibit cyclic and fixed-bound behaviors. We tightly integrate an MLIR-based front-end compiler with an efficient static memory allocator, eliminating the need for dedicated allocator hardware while ensuring efficient runtime memory access. Furthermore, we introduce a customized AXI bus distribution mechanism and an address mapping strategy optimized for the multi-port and multi-bank architecture of HBM. This design significantly enhances bandwidth utilization and reduces latency. Experimental results confirm that our proposed methodology substantially improves allocation efficiency, achieves superior spatial utilization, effectively manages complex memory scenarios, and fully exploits HBM bandwidth, thereby outperforming existing state-of-the-art solutions.
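The central move described above is replacing dynamic allocation with static offsets derived from statically known buffer lifetimes. As a generic illustration of that transformation (not UltraMalloc's allocator or its HBM bank mapping), a greedy first-fit over live intervals assigns offsets so that buffers with overlapping lifetimes never overlap in space; all sizes and lifetimes below are invented.

    def static_allocate(buffers):
        """buffers: list of (name, size, start, end) with half-open live interval [start, end).
        Returns {name: offset} such that lifetime-overlapping buffers never overlap in space."""
        placed, offsets = [], {}
        for name, size, start, end in sorted(buffers, key=lambda b: b[2]):
            # Space already taken by buffers whose lifetimes overlap this one.
            conflicts = sorted((o, s) for o, s, s0, e0 in placed if start < e0 and s0 < end)
            offset = 0
            for o, s in conflicts:           # first-fit: take the first gap that is large enough
                if offset + size <= o:
                    break
                offset = max(offset, o + s)
            offsets[name] = offset
            placed.append((offset, size, start, end))
        return offsets

    bufs = [("act0", 1024, 0, 2), ("w0", 512, 0, 5), ("act1", 1024, 1, 3), ("act2", 2048, 3, 5)]
    print(static_allocate(bufs))  # act1 and act2 share an offset because their lifetimes are disjoint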
9A-2
16:20-16:45

Viper: An ILP-Based Vectorization Framework for Fully Homomorphic Encryption

*Weidong Yang, Xinmo Li, Xiangmin Guo, Jianfei Jiang, Naifeng Jing, Qin Wang, Zhigang Mao, Weiguang Sheng (Shanghai Jiao Tong University)
Keywords
Homomorphic Encryption, SIMD, Vectorization, Integer Linear Programming, Compiler
Abstract
Fully Homomorphic Encryption (FHE) enables computation on encrypted data, facilitating the offloading of computation tasks to untrusted servers. However, it suffers from extremely high computational overhead. Fortunately, modern FHE schemes support packing multiple data elements into a single ciphertext, enabling a SIMD-style computation paradigm that amortizes the heavy computing overhead, leading to a growing demand for efficient vectorization methods. Nevertheless, existing vectorization approaches focus on regular programs or exhibit suboptimal performance due to local and limited optimization. In this paper, we present Viper, a novel vectorization framework for FHE that leverages a global view of the program to enable intelligent scheduling and efficient rotation schemes. First, we identify the critical role of data replication in FHE vectorization and integrate it into the problem definition to unlock new optimization opportunities. Viper formulates vectorization as an integer linear programming (ILP) model, enabling global optimization of instruction packing and data layout. To improve solving efficiency, we propose a series of optimizations including schedule window pruning and rotation-aware lane assignment. Experimental results demonstrate that Viper achieves a geometric mean speedup of 3.05x on CPU and 4.19x on GPU compared to the state-of-the-art method Coyote.
9A-3
16:45-17:10

WPU: A Pipelined WebAssembly Processing Unit for Embedded IoT Systems

*Jinyeol Kim, Chaebin Lee, Jongwon Oh, Joungmin Park, Seung Eun Lee (Seoul National University of Science and Technology)
Keywords
Processor, WASM, Embedded System, IoT
Abstract
WebAssembly (WASM) defines a portable, stack-based virtual machine designed to execute code compiled from high-level languages such as C, C++, and Rust on modern computing platforms. However, its performance on embedded IoT systems remains significantly limited due to software interpretation overhead and resource constraints. This paper presents the WebAssembly Processing Unit (WPU), a dedicated hardware accelerator tailored for WASM execution in embedded environments. The WPU features a five-stage pipelined processor integrated with a Cortex-M0 core, supporting 94 WASM instructions across integer, floating-point, and control flow operations. The architecture supports both 32-bit and 64-bit data types, and includes LEB128 decoding with a specialized register file structure to match WASM's stack-based execution model. To address pipeline hazards, a custom forwarding mechanism and branch resolution logic are introduced, optimized for WASM's control structure. The design is implemented on an FPGA and achieves an average 2.9x speedup over interpreter-based runtimes, and up to 95x acceleration over JIT-based runtimes in selected benchmarks. Evaluation across diverse benchmark applications confirms that the WPU provides reliable, low-latency performance. This work establishes the WPU as a scalable and practical solution for accelerating WASM workloads in embedded systems.
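LEB128 is WASM's standard variable-length integer encoding, so the hardware decoding mentioned above replaces loops of the following shape. The snippet is a plain software reference for unsigned LEB128, not the WPU's hardware design.

    def decode_uleb128(data, pos=0):
        """Decode an unsigned LEB128 integer from `data` starting at `pos`.
        Each byte carries 7 payload bits; the MSB flags a continuation byte.
        Returns (value, next_pos)."""
        result, shift = 0, 0
        while True:
            byte = data[pos]
            pos += 1
            result |= (byte & 0x7F) << shift
            if (byte & 0x80) == 0:
                return result, pos
            shift += 7

    # The classic example: 624485 is encoded as the bytes E5 8E 26.
    print(decode_uleb128(bytes([0xE5, 0x8E, 0x26])))  # -> (624485, 3)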
9A-4
17:10-17:35

SARA: A Stall-Aware Memory Allocation Strategy for Mixed-Criticality Systems

*Meng-Chia Lee, Wen Sheng Lim, Yuan-Hao Chang, Tei-Wei Kuo (National Taiwan University)
Keywords
Memory Allocation, Resource Allocation, Mixed-Criticality Systems, Memory-Constrained Systems
Abstract
The memory capacity in edge devices is often limited due to constraints on cost, capacity, and power. Consequently, memory competition leads to inevitable page swapping in memory-constrained mixed-criticality edge devices, causing slow storage I/O and thus performance degradation. In such scenarios, inefficient memory allocation disrupts the performance balance between applications, causing soft real-time (soft RT) tasks to miss deadlines or preventing non-real-time (non-RT) applications from optimizing throughput. Meanwhile, we observe unpredictable, long system-level stalls (called long stalls) under high memory and I/O pressure, which further degrade performance. In this work, we propose a Stall-Aware Real-Time Memory Allocator (SARA), which discovers opportunities for performance balance by allocating just enough memory to soft RT tasks to meet deadlines and, at the same time, optimizing the remaining memory for non-RT applications. To determine this minimal required memory, SARA leverages our insight into how latency, caused by memory insufficiency and measured by our proposed PSI-based metric, affects the execution time of each soft RT job, where a job runs per period and a soft RT task consists of multiple periods. Moreover, SARA detects long stalls using our definition and proactively drops affected jobs, minimizing stalls in task execution. Experiments show that SARA achieves an average of 97.13% deadline hit ratio for soft RT tasks and improves non-RT application throughput by up to 22.32x over existing approaches, even with memory capacity limited to 60% of peak demand.
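SARA's latency metric is the paper's own contribution; for background only, the Linux pressure-stall information (PSI) it builds on is exposed under /proc/pressure/memory, and a minimal reader of those counters could look like the sketch below. Field names follow the kernel's PSI interface, while the sampling and thresholding policy around it is entirely hypothetical.

    def read_memory_psi(path="/proc/pressure/memory"):
        """Parse Linux PSI memory pressure into {'some': {...}, 'full': {...}},
        each with avg10/avg60/avg300 percentages and the cumulative stall
        time 'total' in microseconds."""
        psi = {}
        with open(path) as f:
            for line in f:  # e.g. "full avg10=1.23 avg60=0.80 avg300=0.15 total=123456"
                kind, *fields = line.split()
                psi[kind] = {k: float(v) for k, v in (fld.split("=") for fld in fields)}
        return psi

    # Differencing the cumulative 'full' total between two samples of a job's
    # execution window gives the stall time that window spent waiting on memory.
    if __name__ == "__main__":
        print(read_memory_psi())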
9A-5
17:35-18:00

Zone-aware metadata placement in B-tree filesystem

*Ming-Feng Wei (National Taiwan University), Yun-Chih Chen (National Tsing Hua University), Yuan-Hao Chang, Tei-Wei Kuo (National Taiwan University)
Keywords
ZNS SSDs, Zoned Namespaces, copy-on-write file systems, B-tree filesystem
Abstract
While copy-on-write file systems like Btrfs might appear well-suited for zoned namespace (ZNS) SSDs due to their out-of-place data updates, we find that Btrfs actually exacerbates the performance interference that ZNS aims to mitigate. Btrfs's proactive zone space reclamation introduces substantial CPU and I/O overhead, leading to severe performance degradation under heavy workloads. To address these issues, we propose a new space management design for Btrfs that includes a metadata marking strategy to predict near-term metadata deletions, a recycling urgency metric to prioritize block groups for reclamation, and a space allocation algorithm that dynamically places metadata based on its marked state and block group urgency. Our approach achieves an average performance improvement of over 22% compared to the original Btrfs under sustained write conditions. Additionally, it delays the onset of performance degradation from a write volume of 23% to 31% of the device capacity. Our approach even delivers a 65% performance gain compared to Btrfs with an aggressive reclamation configuration during the entire performance deterioration period.

Session 9C

(T10-C) Advanced Thermal Analysis for Advanced Chips
15:55-18:00 | Thursday, January 22, 2026 | Snow White 3
Chair(s):
Zhiyao Xie (HKUST)
Tianshu Hou (The Chinese University of Hong Kong)
9C-1
15:55-16:20

Full-Chip Thermal Map Estimation by Multimodal Data Fusion via Denoising Diffusion

*Jing Li (Beihang University), Wei Xing (The University of Sheffield), Yuquan Sun, Yuanqing Cheng (Beihang University)
Keywords
Thermal Map Estimation, Thermal Sensors, Performance Counters, Multimodal Data Fusion, Diffusion, Probabilistic Model
Abstract
As on-chip integration density increases, thermal challenges become more prominent, and the key to addressing them lies in effective and efficient thermal analysis. Current thermal estimation methods face a fundamental paradigm limitation: they rely on a single data source that captures only partial aspects of complex thermal behavior. Performance-counter-based methods capture computational activity but lack accurate thermal readouts, while sensor-based approaches provide local temperature measurements but lack comprehensive spatial coverage. This single-source paradigm has created an insurmountable accuracy ceiling in thermal map estimation, limiting the effectiveness of modern thermal management systems. We introduce the first multimodal data fusion framework for full-chip thermal estimation, leveraging denoising diffusion models to synergistically combine performance counters and thermal sensors. Our approach treats thermal mapping as a conditional generation problem, where complementary data modalities guide the reconstruction process through progressive denoising. The key insight is that thermal behavior is inherently multimodal, requiring both activity context (performance counters) and temperature ground truth (thermal sensors) for accurate estimation. Our multimodal fusion delivers transformative results: over 90% improvement over single-source methods (0.347 vs. 4.862-6.302 average RMSE), a 41.8% average RMSE improvement over state-of-the-art approaches, and robust performance across diverse sensor configurations (9-25 sensors). Furthermore, our framework effectively captures hotspot locations, with errors typically within 0.5 K. More importantly, this work establishes multimodal data fusion as a new paradigm for thermal analysis, opening new research directions and enabling next-generation thermal management systems with unprecedented accuracy and reliability, fundamentally changing how we approach thermal analysis in modern processors.
9C-2
16:20-16:45

Graph Attention-Based Current Crowding Analysis at TSV Interfaces in 3D Power Delivery Networks

*Zheng Yang (Georgia Institute of Technology), Zhen Zhuang (The Chinese University of Hong Kong), Tsung-Yi Ho (National Tsing Hua University), Sung Kyu Lim (Georgia Institute of Technology)
Keywords
3D Integrated Circuits, Through Silicon Via, Graph Attention Network
Abstract
The increased device density and chip stacking in 3D Integrated Circuits (ICs) result in significantly higher currents in the Power Delivery Network (PDN) than in 2D ICs, posing greater challenges for maintaining power integrity and reliability. The currents in the connections between Through Silicon Via (TSV) and power wires tend to exhibit crowding phenomena. Consequently, current crowding phenomena induce excessive IR-drop and cause electromigration in 3D IC PDNs. However, to the best of our knowledge, there is no PDN analysis flow specifically considering the current crowding effect for rapid 3D IC design iterations. To address these challenges, this paper first proposes a commercial tool-based framework for 3D IC PDN current crowding analysis. Additionally, a Graph Attention Network-based (GAT-based) framework with novel PDN and TSV graph modeling methods is proposed to accelerate the commercial tool-based framework for rapid 3D IC design iterations. Both flows can capture current crowding effects and detailed current distribution analysis in TSVs for 3D ICs. For instance-level IR-drop analysis, the proposed GAT framework achieves an R^2 score of 0.971, while for TSV input current predictions, it reaches an R^2 score of 0.997. Overall, our GAT framework demonstrates a speedup of 211x over the commercial tool-based flow.
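The PDN and TSV graph modeling is the paper's contribution and is not reproduced here; for readers unfamiliar with graph attention itself, the standard single-head GAT update that such frameworks build on (Veličković et al.) is

\[
\alpha_{ij} = \frac{\exp\big(\mathrm{LeakyReLU}\big(\mathbf{a}^{\top}[\mathbf{W}\mathbf{h}_i \,\|\, \mathbf{W}\mathbf{h}_j]\big)\big)}{\sum_{k \in \mathcal{N}(i)} \exp\big(\mathrm{LeakyReLU}\big(\mathbf{a}^{\top}[\mathbf{W}\mathbf{h}_i \,\|\, \mathbf{W}\mathbf{h}_k]\big)\big)},
\qquad
\mathbf{h}_i' = \sigma\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\,\mathbf{W}\mathbf{h}_j\Big),
\]

where \(\mathcal{N}(i)\) is the neighborhood of node \(i\) (here, for example, electrically adjacent PDN nodes or TSV connections), \(\mathbf{W}\) and \(\mathbf{a}\) are learned parameters, and \(\|\) denotes concatenation. Attention lets the network weight neighboring PDN nodes unevenly, which is a natural fit for localized effects such as current crowding.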
9C-3
16:45-17:10

Efficient Simulation of IC Packages with TEC Based on Adaptive Segmented Method and Spatially-Aware Thermal Neural Network

*Longfei Li, Xinyue Li, Shunxiang Lan (Shanghai Jiao Tong University), Liang Chen (Shanghai University), Haibao Chen, Min Tang (Shanghai Jiao Tong University)
Keywords
integrated circuit package, thermoelectric cooler, spatially-aware thermal neural network, thermal simulation
Abstract
In this paper, an efficient method is proposed for the simulation of integrated circuit (IC) packages with thermoelectric coolers (TECs). The computational domain is first partitioned into two distinct regions: the thermoelectric device region (TDR) and the non-thermoelectric region (NTR). For the TDR, a novel adaptive segmented method (ASM) is proposed to model the TEC legs with a 1-D model. The computational domain is adaptively meshed into multiple segments, and the temperature-dependent parameters are approximated as constants within each segment, which enables efficient solution of the temperature distribution within the TDR. For the NTR, a spatially-aware thermal neural network (SATNN) is presented to predict the temperature at the interface between the NTR and the TDR. A dual-branch architecture is proposed for extracting the spatial heat flux feature, and a fully connected branch is introduced for global feature coupling in the SATNN. Finally, the TDR and NTR are coupled via the continuity of temperature and heat flux, and the temperature distribution within the NTR is reconstructed from the resulting heat flux density. Experimental results show that the proposed method achieves a 434x computational speedup over the commercial software while maintaining good accuracy.
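The segment-wise freezing of temperature-dependent parameters can be illustrated with a plain 1-D finite-difference sketch (hypothetical material fits; fixed rather than adaptively chosen segments; Peltier and Thomson junction terms omitted for brevity):

    # Minimal 1-D sketch of a segmented TEC-leg solve: within each segment the
    # temperature-dependent properties are frozen at the segment's mean temperature,
    # and the piecewise-constant problem is re-solved until the profile settles.
    import numpy as np

    def solve_leg(T_hot=350.0, T_cold=300.0, J=1e6, L=1e-3, n_seg=8, pts=10, iters=6):
        n = n_seg * pts
        dx = L / (n - 1)
        T = np.linspace(T_hot, T_cold, n)                       # initial guess
        k_of_T   = lambda T: 1.5 + 1e-3 * (T - 300.0)           # W/(m*K), hypothetical fit
        rho_of_T = lambda T: 1e-5 * (1.0 + 2e-3 * (T - 300.0))  # ohm*m, hypothetical fit
        for _ in range(iters):
            seg_T = T.reshape(n_seg, pts).mean(axis=1)          # freeze per segment
            k   = np.repeat(k_of_T(seg_T), pts)
            rho = np.repeat(rho_of_T(seg_T), pts)
            # k_i * (T_{i-1} - 2*T_i + T_{i+1}) / dx^2 = -J^2 * rho_i   (Joule heating)
            A = np.diag(-2.0 * k) + np.diag(k[1:], -1) + np.diag(k[:-1], 1)
            b = -(J ** 2) * rho * dx ** 2
            A[0, :], A[-1, :] = 0.0, 0.0                        # Dirichlet ends
            A[0, 0] = A[-1, -1] = 1.0
            b[0], b[-1] = T_hot, T_cold
            T = np.linalg.solve(A, b)
        return T

    T_profile = solve_leg()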
9C-4
17:10-17:35

Domain Transformation and Decomposition Method for Composable Thermal Modeling and Simulation of Chiplet-Based 2.5D Integrated System

*Shunxiang Lan, Min Tang (Shanghai Jiao Tong University)
Keywords
2.5D integration, chiplet, thermal modeling and simulation, domain transformation and decomposition method
Abstract
Dynamic thermal analysis is imperative to ensure the reliability of chiplet-based 2.5D integrated systems. However, conventional simulation techniques typically incur high computational cost, especially in transient scenarios. In this paper, we propose a domain transformation and decomposition method (DTDM) to perform composable thermal modeling and simulation with high efficiency and accuracy. Temporally, the DTDM expands the time-variant temperature response into a sum of weighted Laguerre polynomials (WLPs). By leveraging Galerkin’s method and the orthogonality of WLPs, the heat conduction equation is transformed from the time domain to the Laguerre domain, thus breaking through the bottleneck of conventional marching-on-in-time approaches in terms of stability and efficiency. Spatially, the DTDM decomposes the system into multiple composable modules (CMs) according to the structural and functional characteristics. For each CM, a novel Laguerre-based macromodel is constructed to represent its internal heat transfer properties by mapping the relationship between the temperature and heat flux at the interface. Finally, the CMs are assembled together as a complete system by treating the macromodels as equivalent thermal boundary conditions, which reduces the computational complexity in both the temporal and spatial domains and thus facilitates efficient transient simulation. Moreover, since the resulting Laguerre-based macromodels are reusable in different system configurations, the DTDM is particularly suitable for composable thermal modeling and simulation. Numerical experiments on typical 2.5D chiplet-based integrated systems show that the proposed method delivers a 485x speedup compared to the commercial software, while maintaining a maximum absolute error below 0.02 K.
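For reference, the temporal part of the transformation can be summarized as follows (illustrative notation; the paper's scaling and weighting conventions may differ). With weighted Laguerre basis functions and their orthogonality,

    \varphi_q(st) = e^{-st/2} L_q(st), \qquad
    \int_0^{\infty} \varphi_p(\bar t)\,\varphi_q(\bar t)\,\mathrm{d}\bar t = \delta_{pq}, \qquad
    T(\mathbf{r},t) \approx \sum_{q=0}^{Q} T_q(\mathbf{r})\,\varphi_q(st),

Galerkin testing of the heat conduction equation \rho c\,\partial T/\partial t = \nabla\cdot(\kappa\nabla T) + g yields one time-independent equation per expansion order,

    \nabla\cdot\bigl(\kappa\nabla T_q(\mathbf{r})\bigr)
      - \rho c\, s\Bigl(\tfrac{1}{2}T_q(\mathbf{r}) + \sum_{p=0}^{q-1} T_p(\mathbf{r})\Bigr)
      = -\,g_q(\mathbf{r}), \qquad q = 0,1,\dots,Q,

where s is the time-scaling factor and g_q is the Laguerre coefficient of the heat source. The coefficients T_q are obtained order by order (marching on in degree), which is what removes the time-step stability constraint of conventional marching-on-in-time schemes.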
9C-5
17:35-18:00

ThPA: Thermal Simulation for Advanced ICs

Bo-Wen Chen, *Yong-Han Lin, Chien-Yu Lin, Yu-Min Lee (National Yang Ming Chiao Tung University)
Keywords
Thermal simulation, Finite difference method, Algebraic multigrid method, Advanced integrated circuits
Abstract
With the continuous scaling of advanced integrated circuits, thermal management has emerged as a critical design challenge due to increasing power densities and structural complexity. To ensure system reliability, especially during the sign-off stage, thermal simulation tools must provide both high spatial resolution and computational efficiency. In this work, we present ThPA, a high-performance thermal simulation framework based on finite-difference discretization and an efficient fine-grained iterative solver. Specifically, ThPA leverages an aggregation-based algebraic multigrid (AgAMG) preconditioned conjugate gradient (PCG) solver, which incorporates a novel double-pairing aggregation strategy to reduce the AgAMG setup overhead and accelerates convergence by using Krylov-subspace-based multigrid cycles as the preconditioner. Experimental results demonstrate that ThPA achieves up to a 46x speedup over commercial solvers, while maintaining a mean absolute temperature difference below 0.08 ℃ and a root-mean-square temperature difference under 0.11 ℃. These results validate ThPA as a fast and accurate simulator for advanced IC designs.
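An analogous, much simpler aggregation-based AMG-preconditioned CG can be set up with the open-source pyamg/SciPy stack; this is a stand-in for illustration, not ThPA's double-pairing aggregation or its Krylov-cycle preconditioner:

    # Aggregation-based AMG-preconditioned CG for a thermal finite-difference system,
    # using pyamg's smoothed-aggregation solver as a stand-in for AgAMG.
    import numpy as np
    import scipy.sparse.linalg as spla
    import pyamg

    def solve_thermal(G, p):
        """G: sparse SPD thermal conductance matrix; p: power (heat source) vector."""
        ml = pyamg.smoothed_aggregation_solver(G.tocsr())  # aggregation-based AMG hierarchy
        M = ml.aspreconditioner(cycle='V')                 # one V-cycle as preconditioner
        T, info = spla.cg(G, p, M=M, maxiter=200)
        assert info == 0, "CG did not converge"
        return T

    # small demo: 2-D Laplacian as a stand-in thermal stencil
    A = pyamg.gallery.poisson((200, 200), format='csr')
    b = np.ones(A.shape[0])
    T = solve_thermal(A, b)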

Session 9D

Panel: AI Accelerators at a Crossroads: Who Will Power the Next Decade of AI?
15:55-18:00 | Thursday, January 22, 2026 | Sleeping Beauty 1/2
Moderator:
Li Jiang (Shanghai Jiao Tong University)
Panelists:
15:55-16:20

Yuan Xie (The Hong Kong University of Science and Technology)
Yu Wang (Tsinghua University)
Gang Chen (Shanghai Houmo Technology Co.,Ltd.)
Shaodi Wang (Hangzhou Zhicun (Witmem) Technology Co., Ltd.)
Guangyu Sun (Peking University)

Session 9E

(T9-A) Machine Learning and Optimization in Physical Design
15:55-18:00 | Thursday, January 22, 2026 | Sleeping Beauty 3
Chair(s):
Heechun Park (UNIST)
Rupesh Karn (NYU)
9E-1
15:55-16:20

DRLPlace: A Deep Reinforcement Learning-based Irregular and High-Density Printed Circuit Board Placement Method

*Lei Cai, Ke Cheng, Jixin Zhang (Hubei University of Technology), Haiyun Li (Tsinghua University), Zhiwei Ye (Hubei University of Technology)
Keywords
Printed Circuit Board, Placement, Deep Reinforcement Learning, Proximal Policy Optimization
Abstract
The placement of components on a printed circuit board (PCB) plays a critical role in modern PCB design workflows. Modern PCBs are characterized by irregular board outlines, significant variations in component sizes, and high component density, which make it challenging for traditional methods to accomplish the placement task effectively. To address these challenges, we propose a deep reinforcement learning-based PCB placement method capable of handling irregular, high-density PCBs with various real-world placement constraints. Because accurate reward signals are difficult to obtain before a placement is complete, which hinders effective optimization guidance, we propose a proximal policy optimization (PPO)-based framework that predicts the reward at each step of the placement process. Because the limited training samples constrain reward prediction accuracy in the PPO process, we extract layout features, net features, and density maps and feed them into a pre-trained ResNet to train a policy-value model that improves this accuracy. To speed up training convergence, we introduce a chip-centric clustering method for better initialization. We also propose a push-and-reflection-based legalization method to ensure legality within irregularly shaped boundaries. Experimental results show that the proposed method decreases half-perimeter wirelength by up to 16% compared to manual designs while handling multiple PCB placement constraints, significantly exceeding state-of-the-art PCB placement methods.
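The PPO update at the heart of such a framework uses the standard clipped-surrogate objective; the sketch below shows only that generic loss (the paper-specific reward predictor, ResNet features, clustering, and legalization are omitted):

    # Generic PPO clipped-surrogate loss used to train a placement policy
    # (standard PPO; not the paper's full training pipeline).
    import torch

    def ppo_loss(logp_new, logp_old, advantages, values, returns, eps=0.2, c_v=0.5):
        ratio = torch.exp(logp_new - logp_old)                 # pi_new / pi_old per action
        clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
        policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
        value_loss = torch.nn.functional.mse_loss(values, returns)
        return policy_loss + c_v * value_loss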
9E-2
16:20-16:45

CNN-Assisted Low-Power Clock Tree Synthesis for 3D ICs

*Chenbo Xi, Jindong Zhou, Pingqiang Zhou (ShanghaiTech University)
Keywords
Clock tree synthesis, 3D-IC, Convolutional neural network
Abstract
In this work, a CNN-assisted low-power clock topology generation method for 3D clock tree synthesis (3D-CTS) is proposed. Our approach considers both local and global costs in each merging step to avoid getting stuck in local optima. To explore the trade-off between local and global costs, we use a CNN to set two important weighting factors, resulting in an optimized clock tree with reduced power consumption. Experimental results on the ISPD09 benchmarks show that, compared with conventional methods, our approach reduces wirelength by up to 33.33% and power consumption by 20.28%. We also demonstrate the transferability of our method on the larger-scale ISPD10 benchmarks.
9E-3
16:45-17:10

DALI-PD: Diffusion-based Synthetic Layout Heatmap Generation for ML in Physical Design

*Bing-Yue Wu, Vidya Chhabria (Arizona State University)
Keywords
Physical Design, Diffusion Model, Synthetic Data Generation
Abstract
Machine learning (ML) has demonstrated significant promise in various physical design (PD) tasks. However, model generalizability remains limited by the availability of high-quality, large-scale training datasets. Creating such datasets is often computationally expensive and constrained by IP restrictions. The few public datasets that exist are typically static, slow to generate, and in need of frequent updates. To address these limitations, we present DALI-PD, a scalable framework for generating synthetic layout heatmaps to accelerate ML research in PD. DALI-PD uses a diffusion model to generate diverse layout heatmaps via fast inference in seconds. The heatmaps include power, IR drop, congestion, macro placement, and cell density maps. Using DALI-PD, we created a dataset comprising over 20,000 layout configurations with varying macro counts and placements. These heatmaps closely resemble real layouts and improve accuracy on downstream ML tasks such as IR drop and congestion prediction.
9E-4
17:10-17:35

Partitioning-free 3D-IC Floorplanning

*Shuo Ren, Zhen Zhuang, Rongliang Fu, Leilei Jin, Libo Shen, Bei Yu, Tsung-Yi Ho (The Chinese University of Hong Kong)
Keywords
3DIC, 3D Floorplanning, Physical Design, Partitioning-free, Semi-definite Programming
Abstract
While 3D IC integration offers a promising path to alleviate interconnect bottlenecks in 2D designs, efficient 3D floorplanning remains challenging due to its increased spatial complexity. Prior approaches that directly extend 2D representations into 3D suffer from exponential solution spaces, while pre-partitioning strategies constrain the global optimization landscape by fixing block-to-die assignments early. We propose Great3D, a partitioning-free 3D floorplanning framework that jointly optimizes block placement and die assignment without relying on predefined die partitioning. By iteratively selecting critical subsets of blocks based on hybrid-aware connectivity and refining them through local 3D SDP backtracking optimization, Great3D effectively minimizes inter-die wirelength while satisfying outline constraints. Experimental results on GSRC benchmarks show that Great3D consistently achieves lower wirelength than all baseline combinations, with up to 60% reduction on large-scale designs. Furthermore, the method maintains competitive runtime performance while demonstrating better scalability and robustness across diverse benchmark sizes. These results establish Great3D as a scalable and effective partitioning-free solution for high-quality 3D IC floorplanning.
9E-5
17:35-18:00

A Heterogeneous Graph-based Gate Sizer Integrating Graph Attention Network and Transformer

*Jinmo Ahn, Jinoh Cho, Jaemin Seo, Jaeseung Lee, Jakang Lee, Seokhyeong Kang (Pohang University of Science and Technology)
Keywords
Gate Sizing, AI-EDA, Post Placement Optimization
Abstract
Gate sizing is a critical step in achieving the target power, performance, and area (PPA) in chip design. In recent years, machine learning (ML) methods have emerged as a new paradigm for gate sizing. Their promising results have gained attention; however, the practical applicability and performance of existing works are limited by at least one of the following factors: (1) long runtime due to test-time optimization or autoregressive prediction; (2) limited exploration of architectural choices; and (3) an overly simplified data representation, a homogeneous graph, which merges pins and cells into a single node. To improve both practicality and performance, we introduce a novel ML-based gate sizer, dubbed DPH-Sizer, which directly predicts appropriate gate sizes using a heterogeneous graph that separates cells and their pins into distinct nodes. This heterogeneous graph explicitly captures the relationships between different circuit elements, leading to enhanced performance. We further propose Inter-Cell and Intra-Cell GAT blocks to explicitly capture inter-cell and intra-cell information, followed by transformer blocks at the end of the GAT stack to capture global path-level features. In our experiments, we validate each of the proposed components and demonstrate that DPH-Sizer keeps power consumption within 2.0% on average while achieving improvements of 54.3% in timing (WNS) and 1.3% in area.
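A minimal example of a pin/cell heterogeneous graph, expressed with PyTorch Geometric's HeteroData (node features, counts, and edge types here are placeholders, not the paper's construction):

    # Illustrative heterogeneous circuit graph with separate 'cell' and 'pin' nodes.
    import torch
    from torch_geometric.data import HeteroData

    data = HeteroData()
    data['cell'].x = torch.randn(1000, 16)     # e.g., size, drive strength, slack
    data['pin'].x  = torch.randn(4000, 8)      # e.g., capacitance, slew, arrival time
    # pin -> cell membership edges and pin -> pin net-connectivity edges
    pin2cell = torch.stack([torch.arange(4000), torch.randint(0, 1000, (4000,))])
    data['pin', 'of', 'cell'].edge_index = pin2cell
    data['pin', 'net', 'pin'].edge_index = torch.randint(0, 4000, (2, 8000))

Type-specific message passing (e.g., torch_geometric.nn.HeteroConv or to_hetero) can then apply separate attention blocks to pin-cell and pin-pin edges, in the spirit of the Intra-Cell and Inter-Cell GAT blocks described above.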

Session 9F

(T5-C) Ultra-Low Precision and Quantization Techniques
15:55-18:00 | Thursday, January 22, 2026 | Sleeping Beauty 5
Chair(s):
Ngai Wong (The University of Hong Kong)
Yuzhe Ma (The Hong Kong University of Science and Technology (Guangzhou))
9F-1
15:55-16:20

Platinum: Path-Adaptable LUT-Based Accelerator Tailored for Low-Bit Weights Matrix Multiplication

*Haoxuan Shan, Cong Guo, Chiyue Wei, Feng Cheng, Junyao Zhang, Hai Li, Yiran Chen (Duke University)
Keywords
Ultra-low-bit quantization, Ternary weights matrix multiplication, Lookup Table-based accelerator, LLM
Abstract
The rapid scaling of large language models demands increasingly efficient hardware. Quantization offers a promising trade-off between efficiency and performance. Ultra-low-bit quantization creates abundant opportunities for result reuse, which can be exploited through lookup-table (LUT)-based acceleration. However, existing LUT-based methods suffer from computation and hardware overheads for LUT construction and rely solely on bit-serial computation, which is suboptimal for ternary-weight networks. We propose Platinum, a lightweight ASIC accelerator for mixed-precision matrix multiplication (mpGEMM) using LUTs. Platinum reduces LUT construction overhead via offline-generated construction paths and supports both general bit-serial and optimized ternary-weight execution through adaptive path switching. On BitNet b1.58-3B, Platinum achieves up to 73.6x, 4.09x, and 2.15x speedups over SpikingEyeriss, Prosperity, and 16-thread T-MAC (CPU), respectively, along with energy reductions of 32.4x, 3.23x, and 20.9x, all within a 0.96 mm^2 chip area. This demonstrates the potential of LUT-based ASICs as efficient, scalable solutions for ultra-low-bit neural networks on edge platforms.
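The result-reuse idea behind LUT-based mpGEMM can be sketched in a few lines: activations are grouped, all 2^g partial sums per group are tabulated once, and each ternary weight group then needs only two table lookups (one for its +1 positions, one for its -1 positions). This is a functional illustration with an assumed group size, not a model of Platinum's datapath or path switching:

    # LUT-based ternary dot product: tabulate subset sums once, then two lookups/group.
    import numpy as np

    G = 4  # activation group size

    def build_luts(x):
        """x: (K,) activations with K % G == 0. Returns (K//G, 2**G) subset-sum tables."""
        groups = x.reshape(-1, G)
        masks = np.arange(2 ** G)
        sel = ((masks[:, None] >> np.arange(G)) & 1).astype(x.dtype)  # subset indicators
        return groups @ sel.T                                         # all partial sums

    def ternary_dot(w, luts):
        """w: (K,) ternary weights in {-1, 0, +1}; two lookups per group."""
        wg = w.reshape(-1, G)
        bit = 1 << np.arange(G)
        pos = ((wg > 0).astype(int) * bit).sum(axis=1)   # index of +1 positions
        neg = ((wg < 0).astype(int) * bit).sum(axis=1)   # index of -1 positions
        rows = np.arange(luts.shape[0])
        return luts[rows, pos].sum() - luts[rows, neg].sum()

    x = np.random.randn(16).astype(np.float32)
    w = np.random.choice([-1, 0, 1], size=16)
    assert np.isclose(ternary_dot(w, build_luts(x)), float(w @ x), atol=1e-5)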
9F-2
16:20-16:45

ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM

Wenqiang Wang, *Yijia Zhang (Shanghai Jiao Tong University), Zikai Zhang, Guanting Huo (Peking University), Hao Liang (Shanghai Jiao Tong University), Shijie Cao (Microsoft Research Asia), Ningyi Xu (Shanghai Jiao Tong University)
Keywords
LLM, Accelerator, ROM, SRAM
Abstract
As large language models (LLMs) demonstrate powerful capabilities, deploying them on edge devices has become increasingly crucial, offering advantages in privacy and real-time interaction. QLoRA has emerged as the standard approach for on-device LLMs, leveraging quantized models to reduce memory and computational costs while utilizing LoRA for task-specific adaptability. In this work, we propose ROMA, a QLoRA accelerator with a hybrid storage architecture that uses ROM for quantized base models and SRAM for LoRA weights and KV cache. Our insight is that the quantized base model is stable and converged, making it well-suited for ROM storage. Meanwhile, LoRA modules offer the flexibility to adapt to new data without requiring updates to the base model. To further reduce the area cost of ROM, we introduce a novel B-ROM design and integrate it with the compute unit to form a fused cell for efficient use of chip resources. ROMA can effectively store both a 4-bit 3B and a 2-bit 8B LLaMA model entirely on-chip, achieving a notable generation speed exceeding 20,000 tokens/s without requiring external memory.
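The storage split that ROMA exploits follows directly from the QLoRA computation pattern: a frozen, quantized base weight (ROM-resident in ROMA) plus small trainable low-rank adapters (SRAM-resident). The sketch below uses a simple per-channel 4-bit quantizer purely for illustration; it is not the paper's quantization scheme or hardware datapath:

    # QLoRA-style forward pass: frozen quantized base path plus LoRA adapter path.
    import numpy as np

    def quantize_4bit(W):                      # per-output-channel symmetric 4-bit
        scale = np.abs(W).max(axis=1, keepdims=True) / 7.0
        q = np.clip(np.round(W / scale), -8, 7).astype(np.int8)
        return q, scale

    def qlora_forward(x, Wq, scale, A, B):     # y = dequant(Wq) @ x + B @ (A @ x)
        base = (Wq.astype(np.float32) * scale) @ x      # frozen base path (ROM)
        lora = B @ (A @ x)                              # task-specific adapter (SRAM)
        return base + lora

    W = np.random.randn(64, 64).astype(np.float32)
    Wq, s = quantize_4bit(W)
    A = 0.01 * np.random.randn(8, 64).astype(np.float32)   # rank-8 adapters
    B = 0.01 * np.random.randn(64, 8).astype(np.float32)
    y = qlora_forward(np.random.randn(64).astype(np.float32), Wq, s, A, B)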
9F-3
16:45-17:10

Noise-Agnostic One-Shot Training and Retraining for Robust DNN Inferencing on Analog Compute-in-Memory Systems

Ashish Reddy Bommana (Arizona State University), Ben Feinberg, Patrick Xiao, Christopher Bennet (Sandia National Laboratories), Matthew Marinella, *Krishnendu Chakrabarty (Arizona State University)
Keywords
Hardware-Aware Training, Analog Compute-in-Memory, One-Shot Training, Analog Noise Tolerance, ADC Error Tolerance
Abstract
Analog Compute-in-Memory (ACiM) architectures are promising alternatives to traditional von Neumann-based systems for accelerating deep neural networks (DNNs), as they alleviate the memory bottleneck by performing in-situ matrix-vector multiplications. However, the analog nature of computation in ACiM makes DNNs highly susceptible to noise and process variations. To mitigate the effects of analog noise, existing approaches rely on variation-aware or noise-aware training, retraining, or fine-tuning. These methods, however, are not scalable, as they require chip-specific retraining and typically involve separate training runs for different levels of noise tolerance. Moreover, they overlook the inherent fault tolerance of analog-to-digital converters (ADCs). To address these limitations, we propose a one-shot training and retraining strategy for robust DNN inferencing on ACiM platforms. Our method is guided by a detailed analysis of error propagation through ADCs, which reveals that robustness can be enhanced by strategically reshaping the weight distribution to better align with ADC resilience characteristics. Simulation results and experiments on fabricated chips show that the proposed method improves inference accuracy by 70%-90% for ResNet-18 and DenseNet-121 under 70% noise injection on CIFAR-10 and SVHN, and by 50%-80% for VGG-16 under 50% noise. These gains are achieved with only a 5% energy overhead due to the modified weight distribution.
9F-4
17:10-17:35

Activation-free Implicit Neural Representation via Finite-State-Machine Based Stochastic Computing

*Xincheng Feng, Wenyong Zhou, Taiqiang Wu (The University of Hong Kong), Meng Li (Peking University), Zhengwu Liu (The University of Hong Kong), Ngai Wong (The University of Hong Kong)
Keywords
Stochastic Computing, Finite-State-Machine, Implicit Neural Representation, Activation Free
Abstract
Implicit neural representations (INRs) have revolutionized signal encoding by using neural networks to map coordinates to signal attributes. Despite their success, INRs present significant hardware implementation challenges due to complex activation functions and floating-point operations. Unlike previous efforts, such as model pruning or quantization, we address these challenges by introducing AIRFSC, a novel activation-free stochastic computing (SC) architecture that leverages finite-state machines (FSMs). AIRFSC eliminates complex activation functions and processes data efficiently through stochastic bitstreams. Our approach decomposes the input signal into a series of Fourier basis functions, enabling the FSM-based architecture to learn smooth coordinate-to-attribute mappings for accurate signal reconstruction. Extensive experiments on diverse signal types demonstrate that AIRFSC achieves reconstruction quality comparable to state-of-the-art (SOTA) INRs implemented with multi-layer perceptrons (MLPs), while significantly improving hardware efficiency. Specifically, AIRFSC reduces power and area by 96.8% and 72.8% compared to Sinusoidal Representation Networks (SIREN), and by 97.9% and 81.3% compared to Wavelet Implicit Representation (WIRE).
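For readers unfamiliar with stochastic computing, the sketch below shows the bitstream encoding it relies on, multiplication as a bitwise AND, and a classic saturating up/down-counter FSM (Brown-Card style) that approximates a nonlinear function over the stream; the paper's activation-free FSM blocks and Fourier-basis decomposition are not reproduced here:

    # Basic stochastic-computing primitives underlying FSM-based SC designs.
    import numpy as np

    def to_stream(p, n=4096, rng=None):
        rng = rng or np.random.default_rng()
        return (rng.random(n) < p).astype(np.uint8)   # unipolar encoding of p in [0,1]

    def from_stream(s):
        return s.mean()

    a, b = 0.6, 0.3
    prod = from_stream(to_stream(a) & to_stream(b))   # approximately a * b = 0.18

    def fsm_saturating_counter(stream, n_states=16):
        # Classic FSM element: output 1 while the state is in the upper half;
        # over a bipolar-encoded input this approximates a tanh-like function.
        s, out = n_states // 2, np.empty_like(stream)
        for i, bit in enumerate(stream):
            s = min(s + 1, n_states - 1) if bit else max(s - 1, 0)
            out[i] = 1 if s >= n_states // 2 else 0
        return out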
9F-5
17:35-18:00

dLLM-OPU: An FPGA Overlay Processor for Accelerated Diffusion Large Language Models

*Yangbo Wei, Shaoqiang Lu (Shanghai Jiao Tong University), Junhong Qian (Southeast University), Lei He (Eastern Institute of Technology, Ningbo), Qin Dongge, Xiao Shi (Southeast University), Chen Wu (Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo), Linfeng Zhang (Shanghai Jiao Tong University)
Keywords
dLLM, LLaDA, FPGA Accelerator
Abstract
Large Language Models (LLMs) achieve unprecedented performance across diverse tasks, benefiting from autoregressive generation. However, this left-to-right decoding paradigm inherently limits contextual understanding. Diffusion-based LLMs (dLLMs) offer a promising alternative by iteratively refining sequences via denoising, enabling stronger bidirectional context modeling and improved generation quality. Nevertheless, dLLMs face two main challenges: redundant computation and memory overhead in multi-step denoising, and excessive inference cost from over-denoising under fixed-step schedules. To address these issues, we propose dLLM-OPU, an FPGA overlay processor that accelerates dLLMs. Our solution features two key innovations: (1) a Region-Adaptive Caching for Dynamic Column Sparsity framework that exploits temporal locality for selective recomputation without model retraining, and (2) a Token-Entropy-based Early Stopping strategy that dynamically terminates the denoising process based on token-level convergence metrics. We implement these innovations through a specialized sparse processing element (PE) array that maximizes top-k sparsity utilization by minimizing idle cycles via row-column concatenation, complemented by an efficient cache management system that reduces memory access latency and a flexible entropy-based decoding unit. Implemented on a U200 FPGA, dLLM-OPU achieves a 2.2x-5.1x speedup and 7.6x-20.3x higher energy efficiency than an RTX 4090 on LLaDA.
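The token-entropy-based early-stopping idea can be expressed compactly: denoising halts once every token's predictive entropy falls below a threshold (the threshold and metric here are assumptions, and the hardware decoding unit is not modeled):

    # Sketch of token-entropy-based early stopping for iterative denoising.
    import torch

    def should_stop(logits, tau=0.1):
        """logits: (seq_len, vocab). Stop when every token's predictive entropy is low."""
        p = torch.softmax(logits, dim=-1)
        entropy = -(p * torch.log(p.clamp_min(1e-12))).sum(dim=-1)   # per-token entropy
        return bool((entropy < tau).all())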
Agenda Overview