

#### LLSM: LLM-enhanced Logic Synthesis Model with EDA-guided CoT Prompting, Hybrid Embedding and AIG-tailored Acceleration

Shan Huang\*, Jinhao Li\*, Zhen Yu, Jiancai Ye, Jiaming Xu, Ningyi Xu, Guohao Dai

\*Equal contribution

Shanghai Jiao Tong University

Correspondence to: Guohao Dai <daiguohao@sjtu.edu.cn>

ASP-DAC 2025

- Backgrounds and Motivations
- Related Works
- Challenges and Techniques
  - Overview
  - EDA-guided CoT Prompting
  - Text-Circuit Hybrid Embedding
  - EDA-Tailored Acceleration
- Experiment Results
- Extension Works

#### Backgrounds and Motivations

# **Electronic Design Automation (EDA)**

EDA refers to the use of software tools to complete the functional design, synthesis, verification, and physical design of VLSI chips.

- Key objective: optimize the power, performance, and area (PPA) of the chip.



#### **Importance of logic synthesis**

Logic synthesis is time-consuming (50%) and carries a high capital cost (55%) in the EDA process [1].



[1] https://eda360insider.wordpress.com/2012/02/27/system-eda-tools-attack-todays-great-bugaboo-for-soc-realization-the-software-development-overhang/

# **Logic Synthesis**

Logic synthesis is iterative in chip design. Predicting synthesis results can reduce iteration overhead.



#### Related Works

# **GNN-based methods for Logic Synthesis**

GNNs model circuits as graphs and extract graph-level features to predict PPA, but they face inherent problems such as over-squashing [1] and over-smoothing [2].
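As a rough illustration of the graph-based approach, the following pure-Python sketch runs one round of mean-aggregation message passing over a tiny circuit graph and then mean-pools the node features into a single graph-level vector. The toy graph, the feature values, and the helper names (`message_pass`, `readout`) are illustrative assumptions, not the models used in the cited work.

```python
from typing import Dict, List

Vector = List[float]


def message_pass(features: Dict[int, Vector], fanins: Dict[int, List[int]]) -> Dict[int, Vector]:
    """One round of mean aggregation: each node averages itself with its fanins."""
    updated = {}
    for node, feat in features.items():
        neighborhood = [feat] + [features[src] for src in fanins.get(node, [])]
        updated[node] = [
            sum(vec[d] for vec in neighborhood) / len(neighborhood)
            for d in range(len(feat))
        ]
    return updated


def readout(features: Dict[int, Vector]) -> Vector:
    """Mean-pool all node features into one graph-level feature vector."""
    nodes = list(features.values())
    return [sum(vec[d] for vec in nodes) / len(nodes) for d in range(len(nodes[0]))]


# Toy circuit (assumed for illustration): nodes 0 and 1 are inputs,
# node 2 is a gate driven by both of them.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 0.0]}
fanins = {2: [0, 1]}

hidden = message_pass(features, fanins)
graph_embedding = readout(hidden)
```

Calling `message_pass` repeatedly pulls the node features toward one another, which gives an intuition for the over-smoothing problem.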



[1] Akansha S. Over-squashing in graph neural networks: A comprehensive survey[J]. arXiv preprint arXiv:2308.15568, 2023.

[2] Rusch T K, Bronstein M M, Mishra S. A survey on oversmoothing in graph neural networks[J]. arXiv preprint arXiv:2303.10993, 2023.

#### **Transformer-based methods for Logic Synthesis**

Transformers flatten a circuit into a sequence, but they face scalability problems and cannot be applied to large graphs.
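To make the scalability concern concrete: self-attention compares every token with every other token, so its cost grows quadratically with sequence length. The sketch below flattens a toy netlist into a topologically ordered sequence with Python's standard `graphlib` module and counts the pairwise attention scores. The toy netlist and the helper names are illustrative assumptions; the flattening scheme of any particular method may differ.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List


def flatten(fanins: Dict[str, List[str]]) -> List[str]:
    """Return the nodes of a DAG as a sequence in which fanins come first."""
    return list(TopologicalSorter(fanins).static_order())


def attention_pairs(sequence_length: int) -> int:
    """Number of query-key scores in one full self-attention map."""
    return sequence_length * sequence_length


# Toy netlist (assumed): each gate maps to the signals that drive it.
fanins = {"g1": ["a", "b"], "g2": ["g1", "c"]}
sequence = flatten(fanins)

small = attention_pairs(len(sequence))  # 5 tokens
large = attention_pairs(100_000)        # a design with 100,000 nodes
```

At 100,000 tokens a single attention map already holds 10^10 scores, which is why flattening a whole large circuit into one sequence does not scale.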



[1] Xu, Ceyu, Chris Kjellqvist, and Lisa Wu Wills. "SNS's not a synthesizer: a deep-learning-based synthesis predictor." Proceedings of the 49th Annual International Symposium on Computer Architecture. 2022.

[2] https://www.nvidia.cn/data-center/technologies/blackwell-architecture/

[3] https://www.apple.com.cn/newsroom/2024/05/apple-introduces-m4-chip/

#### Challenges and Techniques

#### **Overview**



# **Technique 1: EDA-guided CoT Prompting**

Challenge: LLMs lack the knowledge to analyze RTL code, and training or fine-tuning them is expensive [1][2].
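A minimal sketch of what an EDA-guided chain-of-thought prompt could look like, assuming that statistics reported by an EDA tool are placed in the prompt ahead of step-by-step instructions, so that a general-purpose LLM can be used without fine-tuning. The template wording, the `build_cot_prompt` helper, and the statistics fields are all hypothetical and are not taken from the paper.

```python
from typing import Dict


def build_cot_prompt(rtl_code: str, eda_stats: Dict[str, int]) -> str:
    """Assemble a step-by-step prompt from RTL text and tool-reported statistics."""
    stats_lines = "\n".join(f"- {name}: {value}" for name, value in eda_stats.items())
    steps = [
        "Identify the module's inputs, outputs, and main function.",
        "Relate the tool-reported statistics to structures in the code.",
        "Summarize the features likely to affect area and delay after synthesis.",
    ]
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        "You are analyzing RTL code for logic synthesis.\n\n"
        f"RTL code:\n{rtl_code}\n\n"
        f"Statistics reported by the EDA tool:\n{stats_lines}\n\n"
        f"Think step by step:\n{numbered}\n"
    )


# Hypothetical inputs, for illustration only.
rtl = "module and2(input a, input b, output y); assign y = a & b; endmodule"
prompt = build_cot_prompt(rtl, {"primary_inputs": 2, "primary_outputs": 1, "and_nodes": 1})
```

The resulting string can be sent to any chat-style LLM; the response would then serve as a textual circuit summary.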



[1] Chang, Kaiyan, et al. "Data is all you need: Finetuning LLMs for chip design via an automated design-data augmentation framework." Proceedings of the 61st ACM/IEEE Design Automation Conference. 2024.

[2] Liu, Mingjie, et al. "ChipNeMo: Domain-adapted LLMs for chip design." arXiv preprint arXiv:2311.00176 (2023).




# **Technique 2: Text-Circuit Hybrid Embedding**

Challenge: Closed-source LLMs do not expose feature embeddings, and a textual circuit summary cannot be fed directly into downstream models.
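One way to picture a text-circuit hybrid embedding is to encode the LLM's textual summary with a separate encoder, encode the circuit structure on its own, and concatenate the two vectors for a downstream predictor. The sketch below assumes exactly that. Both encoders are deliberately simple stand-ins (a hashed bag-of-words for the language model, two structural ratios for the graph model), and all function names are hypothetical.

```python
import zlib
from typing import List


def embed_text(summary: str, dim: int = 8) -> List[float]:
    """Stand-in for a small language model: hashed bag-of-words, normalized to sum to 1."""
    vec = [0.0] * dim
    for token in summary.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def embed_circuit(num_nodes: int, num_edges: int, depth: int) -> List[float]:
    """Stand-in for a GNN graph embedding: two normalized structural statistics."""
    return [num_edges / num_nodes, depth / num_nodes]


def hybrid_embedding(text_vec: List[float], circuit_vec: List[float]) -> List[float]:
    """Fuse both views by concatenation (one simple choice among many)."""
    return text_vec + circuit_vec


# Assumed example: a made-up summary plus the spi benchmark statistics
# (N = 4219, E = 8676, D = 35) from the benchmark table.
text_vec = embed_text("serial peripheral interface controller with shift register")
circuit_vec = embed_circuit(num_nodes=4219, num_edges=8676, depth=35)
hybrid = hybrid_embedding(text_vec, circuit_vec)
```

Concatenation keeps the two sources separable for the downstream model; a learned fusion layer would be an alternative design.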










[1] NVIDIA sparse computing library, https://docs.nvidia.com/cuda/cusparse/index.html





#### Experiment Results

# **Experiment Setup**

#### Setup

- GPU: A100, nvcc 11.8, PyTorch 2.0.1, PyG v2.5.3
- Dataset: OpenABC, 23 IPs, 1500 logic synthesis flows
- Baselines:
  - OpenABC
  - LOSTIN
- LM models:
  - Mamba-130m
  - DeBERTa-base
- Training:
  - 20 epochs
  - Learning rate: 0.1 for the LM, 0.01 for the GNN

Characteristics of benchmarks:

| IP | PI | PO | N | E | I | D | Function |
|----|----|----|---|---|---|---|----------|
| spi[18]<br>i2c[18]<br>ss_pcm[18]<br>usb_phy[18]<br>sasc[18]<br>wb_dma[18]<br>simple_spi[18]<br>pci[18] | 254<br>177<br>104<br>132<br>135<br>828<br>164<br>3429 | 238<br>128<br>90<br>90<br>125<br>702<br>132<br>3157 | 4219<br>1169<br>462<br>487<br>613<br>4587<br>930<br>19547 | 8676<br>2466<br>896<br>1064<br>1351<br>9876<br>1992<br>42251 | 5524<br>1188<br>434<br>513<br>788<br>4768<br>1084<br>25719 | 35<br>15<br>10<br>10<br>9<br>29<br>12<br>29 | Communication/Bus Protocol |
| wb_conmax[18]<br>ethernet[18]<br>ac97_ctrl[18] | 2122<br>10731<br>2339 | 2075<br>10422<br>2137 | 47840<br>67164<br>11464 | 97755<br>144750<br>25065 | 42138<br>86799<br>14326 | 24<br>34 | |
| mem_ctrl[18]<br>bp_be[19]<br>vga_lcd[18] | 1187<br>11592 | 962<br>8413 | 16307<br>82514<br>105334 | 37146<br>173441 | 18092<br>109608<br>141037 | 36<br>86 | Controller |
| des3_area[18]<br>aes[18]<br>sha256[20]<br>aes_xcrypt[21]<br>aes_secworks[22] | 303<br>683<br>1943<br>1975<br>3087 | 64<br>529<br>1042<br>1805<br>2604 | 4971<br>28925<br>15816<br>45840<br>40778 | 10006<br>58379<br>32674<br>93485<br>84160 | 4686<br>20494<br>18459<br>36180<br>45391 | 30<br>27<br>76<br>43<br>42 | Crypto |
| fir[20]<br>iir[20]<br>jpeg[18]<br>idft[20]<br>dft[20] | | | | 9467<br>14397<br>234331<br>520523<br>527509 | | 43 | DSP |
| tv80[18]<br>tiny_rocket[15]<br>fpu[23]<br>picosoc[23]<br>dynamic_node[15] | | 361<br>4181<br>409<br>10797<br>2575 | 11328<br>52315<br>29623<br>82945<br>18094 | 23017<br>108811<br>59655<br>176687<br>38763 | 11653<br>67410<br>37142<br>107637<br>23377 | 54<br>80<br>819<br>43<br>33 | Processor |

[1] Chowdhury A B, Tan B, Karri R, et al. Openabc-d: A large-scale dataset for machine learning guided integrated circuit synthesis[J]. arXiv preprint arXiv:2110.11292, 2021.

#### **Evaluation Result**



#### **Speedup Result**



- The AIG-tailored SpMM kernel achieves an average 1.74× speedup over cuSPARSE.
- End to end, the average speedup over PyG is 1.37×.
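The kernel itself is not reproduced here. As a hedged illustration of why an and-inverter graph (AIG) invites a specialized SpMM, the sketch below relies on one structural property: every AND node has exactly two fanins, so each of its rows in the fanin adjacency matrix holds exactly two nonzeros. A generic CSR routine must read row pointers and loop over a variable number of nonzeros, whereas a fixed two-fanin layout needs neither. The function names and the toy graph are assumptions, and edge inversion flags are ignored.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def csr_spmm(row_ptr: List[int], col_idx: List[int], values: List[float], dense: Matrix) -> Matrix:
    """Generic sparse-times-dense product with the sparse matrix in CSR form."""
    out = []
    for row in range(len(row_ptr) - 1):
        acc = [0.0] * len(dense[0])
        for k in range(row_ptr[row], row_ptr[row + 1]):
            src = dense[col_idx[k]]
            for d in range(len(acc)):
                acc[d] += values[k] * src[d]
        out.append(acc)
    return out


def two_fanin_aggregate(fanin_pairs: List[Tuple[int, int]], dense: Matrix) -> Matrix:
    """Assumed AIG layout: one (left, right) fanin pair per AND node, unit weights."""
    return [[a + b for a, b in zip(dense[left], dense[right])] for left, right in fanin_pairs]


# Toy AIG (assumed): nodes 0-2 are inputs, node 3 = AND(0, 1), node 4 = AND(3, 2).
dense = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
row_ptr = [0, 0, 0, 0, 2, 4]   # input rows are empty
col_idx = [0, 1, 3, 2]
values = [1.0, 1.0, 1.0, 1.0]

generic = csr_spmm(row_ptr, col_idx, values, dense)
tailored = two_fanin_aggregate([(0, 1), (3, 2)], dense)  # rows for AND nodes only
```

On a GPU, a fixed fanin count of this kind would give every row the same amount of work, which is the sort of regularity a tailored kernel can exploit.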

#### Extension Works

#### **Extension: AIG-based GAT acceleration**

- Thread workload reallocation and skipping of redundant computation

**1.54×** average speedup and **46.8%** memory usage reduction over PyG
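As an example of redundant work in graph attention, consider the standard GAT score for an edge, LeakyReLU(aᵀ[h_dst ‖ h_src]). The dot product splits into a destination term and a source term, so each term can be computed once per node and reused by every edge that touches that node. The sketch below checks that both formulations agree. It illustrates this well-known decomposition only; whether it matches the redundancy removed in the extension work is an assumption, and all names and values are illustrative.

```python
import math
from typing import List, Tuple


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def scores_per_edge(h, edges: List[Tuple[int, int]], attn: List[float]) -> List[float]:
    """Naive: build and score the concatenated [h_dst || h_src] for every edge."""
    return [leaky_relu(dot(attn, h[dst] + h[src])) for src, dst in edges]


def scores_per_node(h, edges: List[Tuple[int, int]], attn: List[float]) -> List[float]:
    """Score each node once per role, then add two scalars per edge."""
    dim = len(h[0])
    as_dst = [dot(attn[:dim], feat) for feat in h]  # first half of attn
    as_src = [dot(attn[dim:], feat) for feat in h]  # second half of attn
    return [leaky_relu(as_dst[dst] + as_src[src]) for src, dst in edges]


# Toy AIG-like graph (assumed): node 2 has fanins 0 and 1, node 3 has fanins 2 and 1.
h = [[1.0, 0.5], [0.25, -1.0], [2.0, 0.0], [-0.5, 1.5]]  # already-projected features
edges = [(0, 2), (1, 2), (2, 3), (1, 3)]                  # (src, dst)
attn = [0.5, -1.0, 0.25, 2.0]

naive = scores_per_edge(h, edges, attn)
fused = scores_per_node(h, edges, attn)
```

The per-node version performs two dot products per node instead of one double-length dot product per edge, and the saving grows with the number of edges that share a node.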

[1] Zhang, Hengrui, et al. "Understanding gnn computational graph: A coordinated computation, io, and memory perspective." Proceedings of Machine Learning and Systems 4 (2022): 467-484.






Shan Huang, supervised by Prof. Guohao Dai

ironheart@sjtu.edu.cn, daiguohao@sjtu.edu.cn

