ASP-DAC 2026 Full Program

31st Asia and South Pacific Design Automation Conference
January 19-22, 2026

Tuesday, January 20, 2026
08:05-08:20
Cinderella Ballroom 1/6/7/8
Opening
09:50-10:20
Coffee break
10:20-12:00
10:20-10:45
10:45-11:10
11:10-11:35
11:35-12:00
1A-1
Video-based Visible-Event Cross-modal Person Re-identification for Edge AI Surveillance Systems
1A-2
REDM: Regression-Guided Diffusion Modeling for Universal Soft Sensor Enhancement in Semiconductor Process Control
1A-3
Benchmarking Continual Learning on Netlists with Circuit-Targeted Graph Neural Networks
1A-4
LiveHPS-Lite: A Lightweight LiDAR-based Motion Capture System for Edge Applications
1B-1
ML-driven Design Technology Co-Optimization Framework for Advanced Technology Nodes
1B-2
Standard Cell Layout Generation: Methodological Evolution and Architectural Impacts
1B-3
Fast Timing Library Characterization Through Selective Use of Regression Models
1C-1
HyFault: Targeted Fault Injection Attacks on Hyperdimensional Computing Accelerators
1C-2
PIR-Cache: Mitigating Conflict-Based Cache Side-Channel Attacks via Partial Indirect Replacement
1C-3
An Efficient Defense Method Based on Progressive Fault-Aware Training and JS Divergence-Guided TMR for DNNs against Bit-Flip Attacks
1C-4
X-Matrix Shield: Defeating Tilted FIB and Rerouting Attacks through 3D-Interlaced Protection
1D-1
Fault-tolerant State Preparation for Quantum Error Correction Codes: Leveraging Design Automation
1D-2
Hardware-Efficient Union-Find Decoder Towards Scalable Topological Quantum Codes
1D-3
Reinforcement Learning for Enhanced Advanced QEC Architectures Decoding
1D-4
Quantum Instruction Set Architecture: The Good, the Bad, and the Future
1E-1
Old School Never Die: A Classic Yet Novel Algorithm for Computing RC Current Response in VLSI
1E-2
TargetFuzz: Enabling Directed Graybox Fuzzing via SAT-Guided Seed Generation
1E-3
VeriRAG: A Knowledge Graph-Augmented RAG for Verilog and Assertion Generation
1F-1
Spiking-NeRF: Neural Graphics Acceleration With Spiking Feature Encoding for Edge 3D Rendering
1F-2
FlowQ: Fixed-point Low-precision Post Training Quantization Framework for Efficient and Accurate SNN Inference
1F-3
An Algorithm-Hardware Co-Design for Efficient and Robust Spiking Neural Networks via Sparsity
1F-4
LOKI: a 0.266 pJ/SOP Digital SNN Accelerator with Multi-Cycle Clock-Gated SRAM in 22nm
13:30-15:35
13:30-13:55
13:55-14:20
14:20-14:45
14:45-15:10
15:10-15:35
2A-1
PipeViT: Accelerating Vision Transformers via Intra-Layer Pipelining
2A-2
ConfASR: A Conformer Block Accelerator for Speech Recognition Optimized for Edge Devices
2A-3
LAPA: Log-Domain Prediction-Driven Dynamic Sparsity Accelerator for Transformer Model
2A-4
BitStopper: An Efficient Transformer Attention Accelerator via Stage-fusion and Early Termination
2A-5
JaneEye: A 12-nm 2K-FPS 18.9-μJ/Frame Event-based Eye Tracking Accelerator
2B-1
FET100: Celebrating the Past and Inspiring the Future
2B-2
The FET at 100: Old and Needing Assistance
2B-3
CMOS 2.0: UnFETtering the scaling of CMOS
2B-4
Compact Modeling - A Bridge between Foundry and Circuit Design
2B-5
Nanoelectronic Modeling (NEMO): From Esoteric Quantum Theory to Software that Helps Design Tomorrow’s Atomic-scaled Transistors and Global Impact in nanoHUB
2C-1
Scalable Optimization with GIS-PIM: A Generalized Integer-State Probabilistic Ising Machine
2C-2
A Scalable and High-Quality Qubit Mapping and Shuttling Framework for Neutral Atom Quantum Devices
2C-3
Quantum Oracle Synthesis from HDL Designs via Multi Level Intermediate Representation
2C-4
Survival of the Optimized: An Evolutionary Approach to T-depth Reduction
2C-5
Subgraph-based Qubit Mapping for Noisy Intermediate-Scale Quantum Computing
Sleeping Beauty 1/2
2D - University Design Contest
2D-1
A 100V 86.2% Efficiency Fibonacci-Dickson Hybrid Boost Converter for Acoustic Screen Applications
2D-2
A 5-to-1V DLDO-Hybrid-Sigma Converter Achieving Fast Transient for High-Density Power Delivery
2D-3
Full-Stack System Design and Prototyping for Fully Programmable Electronic-Photonic Neurocomputing
2D-4
Analysis and Design of Oblong Coils and Standard-Cell-Based Receiver for Area-Efficient Edge-Coupled Inductive Coupling Transceiver
2D-5
A RHP-Zero-Free Hybrid Step-Up Converter With 95.1% Peak Efficiency for Fast-Transient Applications
2D-6
A Relaxation Oscillator with 2.93µJ/cycle Energy Efficiency and 0.068% Period Jitter
2D-7
TFLOP: Towards Energy-Efficient LLM Inference: An FPGA-Affinity Accelerator with Unified LUT-based Optimization
2E-1
MF-ECC: Memory-Free Error Correction for Hyperdimensional Computing Edge Accelerators
2E-2
Thermo-NAS: Thermal-resilient ultralow-cost IGZO-based Flexible Neuromorphic Circuits
2E-3
FIawase: A SET Fault Injection Framework Towards Exhaustive System-Level Impact Evaluation
2E-4
MASS: A Masking-aware Search Framework for Reliable QC-LDPC Code Construction in SSDs
2E-5
TIMBER: A Fast Algorithm for Timing and Power Optimization using Multi-bit Flip-flops
2F-1
GENIAL: Generative Design Space Exploration via Network Inversion for Low Power Algorithmic Logic Units
2F-2
REvolution: An Evolutionary Framework for RTL Generation Driven by Large Language Models
2F-3
AC-Refiner: Efficient Arithmetic Circuit Optimization Using Conditional Diffusion Models
2F-4
DeepCut: Structure-Aware GNN Framework for Efficient Cut Timing Prediction in Logic Synthesis
2F-5
Lorecast: Layout-Aware Performance and Power Forecasting from Natural Language
15:35-15:55
Coffee break
15:55-18:00
15:55-16:20
16:20-16:45
16:45-17:10
17:10-17:35
17:35-18:00
3A-1
DeepPiC: xPU-PIM Cluster Architecture with Adaptive Resource-Aware Task Orchestration for DeepSeek-Style MoE Inference
3A-2
MoEA: A Mixture of Experts Accelerator with Direct Token Access and Dynamic Expert Scheduling
3A-3
MoEA: A Mixed-Precision Edge Accelerator for CNN-MSA Models with Fine-Tuning Support
3A-4
Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems
3A-5
BOA-3DGS: Backward-Striding Optimized Accelerator for Reduced Memory Contention in 3D Gaussian Splatting Training
3B-1
Scalarium: A Unified Scala-based Co-Simulation Framework for Agile Chip Development
3B-2
VFlow: Discovering Optimal Agentic Workflows for Verilog Generation
3B-3
SemanticBBV: A Semantic Signature for Cross-Program Knowledge Reuse in Microarchitecture Simulation
3B-4
CausalTuner: Will Causality Help High-Dimensional EDA Tool Parameter Tuning
3B-5
Synergistic Bayesian Optimization and Reinforcement Learning with Bidirectional Interaction for Efficient VLSI Constraint Tuning
3C-1
Mastering the Exponential Complexity of Exact Physical Simulation of Silicon Dangling Bonds
3C-2
Built-In Self-Test for Locating Leakage Defects on Continuous-Flow Microfluidic Chips
3C-3
Accessible Ratio-Specific Mixing: Single-Pressure-Driven Multi-Reagent Mixer Design and Synthesis for 3D-Printed Microfluidics
3C-4
ENLighten: Lighten the Transformer, Enable Efficient Optical Acceleration
3C-5
DCPPC: Digital Computation in Programmable Photonic Circuits
3D-1
SuperSAGA: A Supervisor-Subordinate Agentic workflow for the Generation of Assertions
3D-2
Understanding and Predicting Vmin Failures in Power Delivery Networks through Multi-Order Droop Signatures
3D-3
AssertMiner: Module-Level Spec Generation and Assertion Mining using Static Analysis Guided LLMs
3D-4
LLM-Assisted Circuit Verification: A Comprehensive Survey
3E-1
HeteroLatch: A CPU-GPU Heterogeneous Latch-Aware Timing Analysis Engine
3E-2
Novel Multi-Corner Delay Padding using Path Relationship Analysis and Dual Decomposition
3E-3
GNN-Based Timing Yield Prediction From Statistical Static Timing Analysis
3E-4
MIMIC: Machine Intelligence for Scalable Generation of Synthetic Timing Cone Datasets
3E-5
Differentiable Tier Assignment for Timing and Congestion-Aware Routing in 3D ICs
3F-1
Automatic Recursion Elimination using Recurrence Relations for Synthesis of Stack-free Hardware
3F-2
FIFOAdvisor: A DSE Framework for Automated FIFO Sizing of High-Level Synthesis Designs
3F-3
HLS-Timer: Fine-Grained Path-Level Timing Estimation for High-Level Synthesis
3F-4
FESTAL: Dataflow Accelerator Synthesis Framework with Graph-Based Fusion for FPGA
3F-5
ARCS: Architecture-Responsive CGRA Scheduling
18:15-20:15
Cinderella Ballroom 2/3/5: ACM SIGDA SRF
Wednesday, January 21, 2026
09:50-10:20
Coffee break
10:20-12:00
10:20-10:45
10:45-11:10
11:10-11:35
11:35-12:00
4A-1
CoLoRA: A Collaborative Scheduling Framework for Multi-Tenant LoRA LLM Inference
4A-2
AutoVeriFix: Automatically Correcting Errors and Enhancing Functional Correctness in LLM-Generated Verilog Code
4A-3
HDLxGraph: Bridging Large Language Models and HDL Repositories via HDL Graph Databases
4A-4
Chat-A^2: An LLM-aided Design Space Exploration Framework for High-Performance CPU Design
4B-1
BitROM: Weight Reload-Free CiROM Architecture Towards Billion-Parameter 1.58-bit LLM Inference
4B-2
BALANCE: Bit and Layer-Aware Lightweight ECC Design Method for In-Flash Computing Based LLM Inference Accelerator
4B-3
BLADE: Boosting LLM Decoding's Communication Efficiency in DRAM-based PIM
4B-4
SpAct-NDP: Efficient LLM Inference via Sparse Activation on NDP-GPU Heterogeneous Architecture
4C-1
Quantifying Compiler-induced Reliability Loss in Software-Implemented Hardware Fault Tolerance
4C-2
WARP: Workload-Aware Reference Prediction for Reliable Multi-Bit FeFET Readout under Charge-Trapping Degradation
4C-3
PV-ReCAM: Process Variation-Aware Testing for ReRAM-based Content Addressable Memory
4D-1
When Posit Meets Microscaling: Energy Efficient Posit-Based Processing Element for Edge AI Computation
4D-2
When Low-Rank Meets Mixed-Precision: A Unified, Training-Free Framework for Efficient LLM Compression
4D-3
Precision-Scalable Microscaling Datapaths with Optimized Reduction Tree for Efficient NPU Integration
4E-1
Understand and Detect: Lithographic Hotspot Detection by the Interpretable Graph Attention Network
4E-2
Beyond Labels: Data-Efficient Wafer Yield Prediction with TabESA
4E-3
Code, Not Canvas: Multi-Agent Layout Generation Beyond Vision Models
4E-4
Integrated Re-Fragmentation and Curve Correction for Curvilinear Optical Proximity Correction
4F-1
MuaLLM: A Multimodal Large Language Model Agent for Circuit Design Assistance with Hybrid Contextual Retrieval-Augmented Generation
4F-2
MOSTAR: Multi-Stage Hierarchical Bayesian Optimization for Substructure-Aware High-Dimensional Analog Circuit Sizing
4F-3
Effective RC Reduction via Graph Sparsification for Accurate Post-Simulation of Mixed-Signal ICs
13:30-15:35
13:30-13:55
13:55-14:20
14:20-14:45
14:45-15:10
15:10-15:35
5A-1
CADC: Crossbar-Aware Dendritic Convolution for Efficient In-memory Computing
5A-2
M3DKV: Monolithic 3D Gain Cell Memory Enabled Efficient KV Cache & Processing
5A-3
MemSearch: An Efficient Memristive In-memory Search Engine with Configurable Similarity Measures
5A-4
SpANNS: Optimizing Approximate Nearest Neighbor Search for Sparse Vectors Using Near Memory Processing
5A-5
NBCache: An Efficient and Scalable Non-Blocking Cache for Coherent Multi-Chiplet Systems
5B-1
A Low-Power 12-lead Arrhythmia Detection SoC Featuring a Reconfigurable CNN and Mixed-Precision Computing
5B-2
A Precision-Scalable Accelerator for Compressive Hyperspectral Image Reconstruction with a Lightweight DUN
5B-3
Enhancing Trustworthiness Using Mixed Precision: Benchmarking, Opportunities and Challenges
5C-1
GEmFuzz: Uncovering System-Level Vulnerabilities in SoCs via Emulation-Based Grey-Box Fuzzing
5C-2
Pack Defender: Proactive Defense Against Packet Attacks in NoCs Using an XGBoost-RNN Model
5C-3
Silentflow: Leveraging Trusted Execution for Resource-Limited MPC via Hardware-Algorithm Co-design
5C-4
ANIMo: Accelerating Nested Isolation with Monitor-free Domain Transition
5C-5
Reinforced Logic-Based Distributed Routing within Isolated Secure Zones
5D-1
Intelligent Chip Design with Agentic AI EDA
5D-2
The AI Transformation of Semiconductor EDA
5D-3
From Algorithmic Optimization to Autonomous Agents: Redefining EDA Workflows with AI
5D-4
From Generation to Verified Synthesis: Bridging Industrial Reality via C-Guided Agents
5D-5
SLEG: A LLM-based SVA Evaluation and Generation System
5E-1
MdpoPlanner: Mask-Driven Floorplan via Reinforcement Learning-Based Placement Order
5E-2
C3PO: Commercial-Quality Global Placement via Coherent, Concurrent Timing, Routability, and Wirelength Optimization
5E-3
A Timing-Driven Hierarchical Macro Placement Framework for Large-Scale Complex IP Blocks
5E-4
Accelerating Electrostatics-based Global Placement with Enhanced FFT Computation
5E-5
Comprehensive Delay-Aware Net Weighting Framework for Timing-Driven Global Placement
5F-1
DCLOG: Don't Cares-based Logic Optimization using Pre-training Graph Neural Networks
5F-2
SOFA-H: Post-Synthesis Area Optimization via Functionally Encoded, Net-Driven Subgraph Mining and SAT-Based Hypercell Remapping
5F-3
Formalization of Rectification Learning for Economic Design Updates
5F-4
PhyMap: A Physically-Aware Incremental Mapping Framework with On-the-fly Post-Layout Critical Path Tracking
5F-5
CombRewriter: Enabling Combinational Logic Simplification in MLIR-Based Hardware Compiler
15:35-15:55
Coffee break
15:55-18:00
15:55-16:20
16:20-16:45
16:45-17:10
17:10-17:35
17:35-18:00
6A-1
SCION: A Comprehensive Simulation Framework for Charge-based In-Memory Computing for Rapid Evaluation of Hardware Non-idealities and DNN Accuracy
6A-2
SABCIM: Self-Adaptive Biasing Scheme for Accurate and Efficient Analog Compute-in-Memory
6A-3
CDACiM: A Charge-Domain Compute-in-Memory Macro for FP/INT MAC Operations with Reconfigurable Capacitor Digital-Analog-Converter
6A-4
Learnable Center-Based Quantization for Efficient Analog PIM with Reduced ADC Precision
6A-5
RL-Guided Thermal-Aware Quantization for Efficient and Robust ReRAM CIM Systems
6B-1
SnipSnap: A Joint Compression Format and Dataflow Co-Optimization Framework for Efficient Sparse LLM Accelerator Design
6B-2
ALMA: Adaptive Co-optimization of Loop-Memory Uneven Mappings and Architectures for DNN Accelerators
6B-3
MARCO: Hardware-Aware Neural Architecture Search for Edge Devices with Multi-Agent Reinforcement Learning and Conformal Filtering
6B-4
NetTLM-DSE: Design Space Exploration for DNN Layer-Pipeline Spatial Mappings
6B-5
BalanceGS: Algorithm-System Co-design for Efficient 3D Gaussian Splatting Training on GPU
6C-1
LumiLock: LUT-based Multi-Key Logic Locking
6C-2
SCPrompt: Semantic Compression and Prompt-Guided LLM Reasoning for RTL Trojan Detection
6C-3
Can Large Language Models Unlock Logic Locking?
6C-4
GALA: An Explainable GNN-based Approach for Enhancing Oracle-Less Logic Locking Attacks Using Functional and Behavioral Features
6C-5
SAND: A Self-supervised and Adaptive NAS-Driven Framework for Hardware Trojan Detection
6D-1
How AI is Supercharging Digital Implementation
6D-2
Shaping a Full-custom Design Ecosystem with Industry-Academia-Research Collaboration
6D-3
AI-driven Analog and Custom Design Solution
6D-4
Exploring the Application of Machine Learning in Accelerating Model and Mask Optimization
6E-1
An Effective Placement Framework for Designs with Half-Row-Extended Cells
6E-2
DUALPlace: Reinforcement Learning based Mixed-size Placement with Multi Modal Cross Attention
6E-3
SMT-Based Optimal Transistor Folding and Placement for Standard Cell Layout Generation
6E-4
Synthesis of CFET Standard Cells Utilizing Backside Interconnects Towards Improving Pin Accessibility
6E-5
Standard Cell Layout Synthesis for Dual-Sided 3D-Stacked Transistors
6F-1
pHNSW: PCA-Based Filtering to Accelerate HNSW Approximate Nearest Neighbor Search
6F-2
Boosting Scalability and Performance: Macro Placement for Flexible 3D-Stacked ML Accelerators
6F-3
AutoCT: Hybrid Compressor Tree Optimization via Reinforcement Learning with Graph Modeling
6F-4
Chiplet-NAS: Chiplet-aware Neural Architecture Search for Efficient AI inference on 2.5D Integration
6F-5
MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts
18:30-20:30
Banquet
Thursday, January 22, 2026
09:50-10:20
Coffee break
10:20-12:00
10:20-10:45
10:45-11:10
11:10-11:35
11:35-12:00
7A-1
StabiFreeze: Early Stopping for Training Binary Neural Networks via Internal Dynamics Stabilization
7A-2
Constrained NAS via Symbolic Expressions in Declarative Hierarchical Search Spaces
7A-3
HERO: Hardware-Efficient RL-based Optimization Framework for NeRF Quantization
7A-4
Gundam: A Generalized Unified Design and Analysis Model for Matrix Multiplication on Edge
7B-1
TTI: An Instruction Set Supporting Priority-Inversion-Free Time-Triggered Preemptive Scheduling in Real-Time Embedded Systems
7B-2
DRAPP: An end-to-end Latency Evaluation Tool for Containerized ROS Applications
7B-3
LaPOD: Latency Prediction for Real-Time LiDAR Object Detection
7B-4
Flowmap Overapproximation for Linear Time-Varying Stochastic Systems
7C-1
Timing-Aware Optimization of Die-Level Routing and TDM Assignment for Multi-FPGA Systems
7C-2
Φ-BO: Physics-Informed Bayesian Optimization for Multi-Port Decoupling Capacitor Placement in 2.5-D Chiplets
7C-3
Automated Parameter Tuning for Multi-FPGA Partitioning: A Preference-Guided Approach
7D-1
Advancing General Sparse Matrix Solvers via Subtree-Based Task Scheduling and Randomized Linear Algebra
7D-2
HeteroSTA: A CPU-GPU Heterogeneous Static Timing Analysis Engine with Holistic Industrial Design Support
7D-3
IR Drop-Aware ECO: A Fast Approach to Minimize Layout and Timing Disturbance
7D-4
iPCL: Pre-training for Chip Layout
7E-1
NEURAL: An Elastic Neuromorphic Architecture with Hybrid Data-Event Execution and On-the-fly Attention Dataflow
7E-2
Mask-based Meta-Learning for Stuck-at Faults Tolerance in ReRAM Computing Systems
7E-3
LMESN: A Leakage-Driven MOSFET Reservoir for Scalable and Ultra-Low-Power Temporal Inference
7E-4
BIHDC: A Retrainable Fully-Binary Hyperdimensional Computing Accelerator for Edge FPGAs
Sleeping Beauty 5
7F (T6-A) RF and Photonic IC
7F-1
MOTIF-RF: Multi-template On-chip Transformer Synthesis Incorporating Frequency-domain Self-transfer Learning for RFIC Design Automation
7F-2
Efficient RF Passive Components Modeling with Bayesian Online Learning and Uncertainty Aware Sampling
7F-3
PrometheusFree: Concurrent Detection of Laser Fault Injection Attacks in Optical Neural Networks
7F-4
BEAM: Bidirectional MEEF-Driven Mask Optimization for Curvilinear Photonic Design
12:00-13:30
Lunch & Invited Talk (Sponsored by infinigence-ai)
13:30-15:35
13:30-13:55
13:55-14:20
14:20-14:45
14:45-15:10
15:10-15:35
8A-1
OAH-CIM: Outlier-Aware Hybrid RRAM-SRAM CIM Accelerator with Variation-Robust Sparsity
8A-2
FlexMem: High-Parallel Near-Memory Architecture for Flexible Dataflow in Fully Homomorphic Encryption
8A-3
Data Flow-Aware Weight Remapping for Efficient Fault Tolerance in ReRAM-Based Accelerators
8A-4
MIREDO: MIP-Driven Resource-Efficient Dataflow Optimization for Computing-in-Memory Accelerator
8A-5
Hy2S-CIM: Hybrid-Cache-LUT FP/INT-CIM with 2-Stage Alignment and Area-efficient LUT for High Precision Vision AI Tasks
8B-1
DOME: A Domain-Orchestrated Multi-GPU Optical Network for Rack-Scale Systems
8B-2
H²NoC: A Hybrid NoC Architecture for FPGAs with Hardened Interconnects
8B-3
CommILP: Synthesizing Communication Infrastructure for Domain Computing Platforms
8B-4
ReadyPower: A Reliable, Interpretable, and Handy Architectural Power Model Based on Analytical Framework
8B-5
A RISC-V CHERI VP: Enabling System-Level Evaluation of the Capability-Based CHERI Architecture
8C-1
EP-HDC: Hyperdimensional Computing with Encrypted Parameters for High-Throughput Privacy-Preserving Inference
8C-2
RTL Verification for Secure Speculation Using Cascaded Two-Phase Information Flow Tracking
8C-3
Exploiting Feature-driven Approximation to Preserve Privacy in Machine Learning based Health Monitoring Systems
8C-4
Safeguarding Neural Network IPs from Scan Chain based Model Extraction Attacks
8C-5
FedBit: Accelerating Privacy-Preserving Federated Learning via Bit-Interleaved Packing and Cross-Layer Co-Design
8D-1
Implementation of Shift-Left Integrated STCO Design Method for Advanced Packaging
8D-2
Integrated Full-custom 3DIC Design Methodology and Flow
8D-3
AI for EDA: Chiplets Simulation New Era
8D-4
Breakthroughs in Chiplet-Based Electromagnetic Designs: Efficient Simulation and AI-Driven Optimization
8D-5
A Shift-Left Design Methodology for Chiplet Architecture: Automation, Multi-Physics Modeling and Convergence
8E-1
DPO-3D: Differentiable Power Delivery Network Optimization via Flexible Modeling for Routability and IR-Drop Tradeoff in Face-to-Face 3D ICs
8E-2
Enhancing Pin Accessibility Through Pin Pattern Migration and Optimization Across Cell Boundaries
8E-3
Topological Optimization-Based Layer Assignment Method for Fan-Out Wafer-Level Packaging
8E-4
Au-MEDAL: Adaptable Grid Router with Metal Edge Detection And Layer Integration
8E-5
GPU-Accelerated Global Routing with Balanced Timing and Congestion Optimization
8F-1
Fine-Grained Parallelization of FHE Workloads in Multi-GPU Systems
8F-2
Exploiting the Irregular Input Sparsity in Systolic Array-based DNN Accelerators via Local Soft Pooling
8F-3
Input Reuse, Weight-Stationary Dataflow and Mapping Strategy for Depthwise Convolution in Computing-in-Memory Neural Network Accelerators
8F-4
A Unified Compute-In-Memory Framework for Multisensory Emotion Recognition
8F-5
Scope: A Scalable Merged Pipeline Framework for Multi-Chip-Module NN Accelerators
15:35-15:55
Coffee break
15:55-18:00
15:55-16:20
16:20-16:45
16:45-17:10
17:10-17:35
17:35-18:00
9A-1
UltraMalloc: Efficient FPGA-based Memory Allocation Framework Optimized for HBM
9A-2
Viper: An ILP-Based Vectorization Framework for Fully Homomorphic Encryption
9A-3
WPU: A Pipelined WebAssembly Processing Unit for Embedded IoT Systems
9A-4
SARA: A Stall-Aware Memory Allocation Strategy for Mixed-Criticality Systems
9A-5
Zone-aware metadata placement in B-tree filesystem
Snow White 2
9C-1
Full-Chip Thermal Map Estimation by Multimodal Data Fusion via Denoising Diffusion
9C-2
Graph Attention-Based Current Crowding Analysis at TSV Interfaces in 3D Power Delivery Networks
9C-3
Efficient Simulation of IC Packages with TEC Based on Adaptive Segmented Method and Spatially-Aware Thermal Neural Network
9C-4
Domain Transformation and Decomposition Method for Composable Thermal Modeling and Simulation of Chiplet-Based 2.5D Integrated System
9C-5
ThPA: Thermal Simulation for Advanced ICs
9E-1
DRLPlace: A Deep Reinforcement Learning-based Irregular and High-Density Printed Circuit Board Placement Method
9E-2
CNN-Assisted Low-Power Clock Tree Synthesis for 3D ICs
9E-3
DALI-PD: Diffusion-based Synthetic Layout Heatmap Generation for ML in Physical Design
9E-4
Partitioning-free 3D-IC Floorplanning
9E-5
A Heterogeneous Graph-based Gate Sizer Integrating Graph Attention Network and Transformer
9F-1
Platinum: Path-Adaptable LUT-Based Accelerator Tailored for Low-Bit Weights Matrix Multiplication
9F-2
ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM
9F-3
Noise-Agnostic One-Shot Training and Retraining for Robust DNN Inferencing on Analog Compute-in-Memory Systems
9F-4
Activation-free Implicit Neural Representation via Finite-State-Machine Based Stochastic Computing
9F-5
dLLM-OPU: An FPGA Overlay Processor for Accelerated Diffusion Large Language Models