# MetRex: A Benchmark for Verilog Code Metric Reasoning using LLMs

Manar Abdelatty, Jingxiao Ma, Sherief Reda

Brown University, Providence, RI



## Early PPA Estimation Of Verilog Designs

• **Motivation**: Give designers early feedback on the quality (power, performance, area) of their designs without paying for expensive synthesis runs.



[Figure: Verilog listings of adder designs in different RTL styles; the visible panels are labeled "RTL Style 3" and "RTL Style 4" and both show a `CarryLookAheadAdder` module.]

**Faster Design Cycles** 



## **Previous Work: Metric Estimation Using Machine Learning**





[1] P. Sengupta, et al. "How Good Is Your Verilog RTL Code? A Quick Answer from Machine Learning," 2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD), San Diego, CA, USA.

[2] Fang, Wenji, et al. "MasterRTL: A Pre-Synthesis PPA Estimation Framework for Any RTL Design." 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD). IEEE, 2023.


- → Intermediate Formats: RTL code must first be converted to an intermediate format such as an Abstract Syntax Tree (AST) or a Simple Operator Graph (SOG).
- → Manual Feature Extraction: Manually engineered features are extracted from the intermediate format; these features form the input to the ML model.

## What Could LLMs Offer?

Process RTL code directly (a <u>lossless</u> representation):

- Eliminate the need for manual feature extraction and conversion into an intermediate format.
- LLMs can autonomously extract relevant patterns from RTL code.
- Provide potentially faster and more insightful analysis.



### Introducing MetRex

• LLM-based Verilog code metric reasoning

## How effectively can Large Language Models (LLMs) reason about post-synthesis PPA metrics of Verilog designs?

## **Dataset Collection & Cleaning**

- Syntax errors and synthesis warnings were fixed automatically by an LLM.
- All 25,868 designs are free of syntax errors and *synthesizable*.

| Source                     | Designs<sup>1</sup> (Count) | Complexity (Code Length) {Min, Median, Max} |
|----------------------------|-----------------------------|---------------------------------------------|
| RTL-Coder<sup>2</sup> [21] | 18,450                      | {3, 29, 918}                                |
| VeriGen [5]                | 7,292                       | {5, 69, 27,025}                             |
| ISCAS'89 [22]              | 29                          | {53, 530, 54,778}                           |
| ISCAS'85 [22]              | 10                          | {17, 1,225, 3,925}                          |
| OpenCores [23]             | 54                          | {1, 103, 2,716}                             |
| NVDLA [24]                 | 33                          | {19, 1,333, 42,051}                         |
| MetRex (ours)              | 25,868 (Train), 138 (Test)  | {3, 35, 54,778}                             |

Table 1. HDL design sources.

#### <u>25,868</u> Designs Syntax-Error Free & Synthesizable



[1] Liu, Shang, et al. "RTLCoder: Outperforming GPT-3.5 in Design RTL Generation with Our Open-Source Dataset and Lightweight Solution." 2024 IEEE LLM Aided Design Workshop (LAD). IEEE, 2024.

[2] Thakur, Shailja, et al. "VeriGen: A Large Language Model for Verilog Code Generation." ACM Transactions on Design Automation of Electronic Systems 29.3 (2024): 1-31.

## **MetRex Dataset**

- 25,868 Verilog Designs
- Synthesized using the SkyWater 130 nm and TSMC 65 nm technologies
- Annotated with their post-synthesis metrics on:
  - Area
  - Delay
  - Static Power



## Chain-of-Thought (CoT) Prompting

Bridge the gap between the input RTL code and the final metrics



Wei, Jason, et al. "Chain-of-thought prompting elicits reasoning in large language models." Advances in neural information processing systems 35 (2022): 24824-24837.

## MetRex Dataset


#### RTL

<pre>
module adder (
  input a, b, cin,
  output cout, sum
);
  assign sum = a ^ b ^ cin;
  assign cout = a &amp; b | a &amp; cin | b &amp; cin;
endmodule
</pre>

#### Synthesis Thought

After synthesis, this design has 1 maj3\_1, 1 xor3\_1.

#### Area Thought

Area of **maj3\_1** is **10.01**. Area of **xor3\_1** is **23.77**. In total, we can compute 1\*10.01 + 1\*23.77 = 33.78. Thus, total area is **33.78**.

#### Delay Thought

The critical path goes through the following gates: **xor3\_1** with a fanout of 2 and a load capacitance of **0.01**. Delay of **xor3\_1** is **0.34**. In total, we can compute **0.34** = **0.34**. Thus, total delay is **0.34**.

#### Static Power Thought

Leakage power of **maj3\_1** is **2.09**. Leakage power of **xor3\_1** is **15.22**. In total, we can compute 1\*2.09 + 1\*15.22 = 17.31. Thus, the total static power is **17.31**.
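The area and static-power thoughts above are sums of per-cell values weighted by gate count. The following Python sketch reproduces that arithmetic and renders an area thought in the same wording. The per-cell numbers are copied from the full-adder example; the function and variable names are hypothetical, and this is not presented as the tooling used to build the dataset.

```python
# Per-cell values copied from the full-adder example above. Units are those
# of the cell library and are not stated on the slide.
CELL_AREA = {"maj3_1": 10.01, "xor3_1": 23.77}
CELL_LEAKAGE = {"maj3_1": 2.09, "xor3_1": 15.22}


def total(gate_counts, per_cell):
    """Sum count * per-cell value over all synthesized gates."""
    return sum(n * per_cell[cell] for cell, n in gate_counts.items())


def area_thought(gate_counts):
    """Render an area reasoning chain in the style of the example (hypothetical helper)."""
    facts = " ".join(f"Area of {c} is {CELL_AREA[c]}." for c in gate_counts)
    terms = " + ".join(f"{n}*{CELL_AREA[c]}" for c, n in gate_counts.items())
    area = total(gate_counts, CELL_AREA)
    return (
        f"{facts} In total, we can compute {terms} = {area:.2f}. "
        f"Thus, total area is {area:.2f}."
    )


gates = {"maj3_1": 1, "xor3_1": 1}  # synthesis result from the example
print(area_thought(gates))
print(f"static power: {total(gates, CELL_LEAKAGE):.2f}")
```

Delay is not a plain sum: the delay thought follows the critical path and depends on fanout and load capacitance, so it is left out of this sketch.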





## **Experimental Results**

• Instruction Tuning



## **Experimental Setup: Evaluation Set**

- Evaluation Set: Derived from the VerilogEval benchmark\*
- Synthesis Tool:
  - Yosys
  - SkyWater 130 nm technology

**Table 2.** Test dataset derived from the VerilogEval benchmark [3], categorized by difficulty level.

| Difficulty   | Description              | #      | Gate Count {Min, Med, Max} |
|--------------|--------------------------|--------|----------------------------|
| Level-1 (L1) | Basic logic gates        | 10     |                            |
|              | Multi-bit gates          | 9      |                            |
|              | 1-bit comb. circuits     | 5      |                            |
|              | **Total #**              | **23** |                            |
| Level-2 (L2) | Adder circuits           | 4      | {2, 6, 15}                 |
|              | Multi-bit comb. circuits | 23     | {1, 3, 11}                 |
|              | Flip-Flop registers      | 14     | {1, 3, 24}                 |
|              | Basic Seq. circuits      | 2      | {8, 8, 8}                  |
|              | **Total #**              | **43** | {1, 3, 24}                 |
| Level-3 (L3) | Finite state machines    | 24     | {3, 11, 57}                |
|              | Counters                 | 9      | {10, 14, 48}               |
|              | Complex comb. logic      | 29     | {1, 7, 580}                |
|              | Advanced Seq. circuits   | 9      | {11, 67, 607}              |
|              | **Total #**              | **72** | **{1, 14, 607}**           |

\*Liu, Mingjie, et al. "VerilogEval: Evaluating Large Language Models for Verilog Code Generation." 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD). IEEE, 2023.

## Experimental Setup: acc@k
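The results that follow report acc@k as the percentage of designs whose relative error falls within a margin t. The Python sketch below illustrates one way to compute it. The slides do not spell out how the k samples are aggregated, so the any-of-k rule used here (by analogy with pass@k) is an assumption, as are the function names and the sample estimates.

```python
def relative_error(estimate, truth):
    """Absolute relative error of one estimate against the synthesis result."""
    return abs(estimate - truth) / abs(truth)


def acc_at_k(estimates, truths, t, k=1):
    """Fraction of designs with at least one of the first k estimates within margin t.

    ASSUMPTION: any-of-k aggregation; the slides only state that accuracy
    counts designs whose error is below the threshold t.
    """
    hits = 0
    for samples, truth in zip(estimates, truths):
        if any(relative_error(s, truth) <= t for s in samples[:k]):
            hits += 1
    return hits / len(truths)


# Hypothetical area estimates (three samples per design) and ground truths.
estimates = [[33.0, 34.5, 40.0], [250.0, 262.0, 300.0], [10.0, 10.5, 12.0]]
truths = [33.78, 265.23, 12.0]

print(acc_at_k(estimates, truths, t=0.10, k=1))
print(acc_at_k(estimates, truths, t=0.10, k=3))
```

With these made-up numbers, the third design misses the 10% margin on its first sample but is within it on a later one, which is the case where acc@1 and acc@3 differ under this rule.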





## Results

acc@k: percentage of designs with MRE less than the threshold value t (MRE ≤ t). ✗ = not finetuned, ✓ = finetuned with CoT, Δ = gain from finetuning. Higher is better (↑).

| Margin (t) | Model               | Finetuned? | Area acc@1 ↑ | Area acc@5 ↑ | Area acc@10 ↑ | Delay acc@1 ↑ | Delay acc@5 ↑ | Delay acc@10 ↑ | Static Power acc@1 ↑ | Static Power acc@5 ↑ | Static Power acc@10 ↑ |
|------------|---------------------|------------|--------------|--------------|---------------|---------------|---------------|----------------|----------------------|----------------------|-----------------------|
| 10%        | Mixtral-MetRex-8x7b | ✗          | 19.6%        | 19.6%        | 21.0%         | 23.9%         | 22.5%         | 22.5%          | 20.3%                | 18.8%                | 19.6%                 |
|            |                     | ✓          | 43.5%        | 46.4%        | 45.7%         | 42.8%         | 45.7%         | 45.7%          | 39.1%                | 39.9%                | 38.4%                 |
|            |                     | Δ          | +23.9%       | +26.8%       | +24.6%        | +18.8%        | +23.2%        | +23.2%         | +18.8%               | +21.0%               | +18.8%                |
| 10%        | LLama3-MetRex-8b    | ✗          | 17.4%        | 18.8%        | 18.1%         | 20.3%         | 23.9%         | 22.5%          | 15.9%                | 15.2%                | 15.2%                 |
|            |                     | ✓          | 58.0%        | 58.0%        | 58.7%         | 47.8%         | 47.1%         | 47.8%          | 42.0%                | 42.8%                | 41.3%                 |
|            |                     | Δ          | +40.6%       | +39.1%       | +40.6%        | +27.5%        | +23.2%        | +25.4%         | +26.1%               | +27.5%               | +26.1%                |
| 20%        | Mixtral-MetRex-8x7b | ✗          | 25.4%        | 26.1%        | 26.1%         | 31.9%         | 29.0%         | 29.7%          | 25.4%                | 26.1%                | 28.3%                 |
|            |                     | ✓          | 58.0%        | 61.6%        | 60.9%         | 50.7%         | 53.6%         | 55.8%          | 53.6%                | 54.3%                | 52.2%                 |
|            |                     | Δ          | +32.6%       | +35.5%       | +34.8%        | +18.8%        | +24.6%        | +26.1%         | +28.3%               | +28.3%               | +23.9%                |
| 20%        | LLama3-MetRex-8b    | ✗          | 22.5%        | 23.9%        | 22.5%         | 25.4%         | 28.3%         | 28.3%          | 22.5%                | 21.7%                | 21.0%                 |
|            |                     | ✓          | 73.2%        | 76.1%        | 74.6%         | 61.6%         | 64.5%         | 63.8%          | 52.2%                | 49.3%                | 47.1%                 |
|            |                     | Δ          | +50.7%       | +52.2%       | +52.2%        | +36.2%        | +36.2%        | +35.5%         | +29.7%               | +27.5%               | +26.1%                |

- → Finetuning improves accuracy by an average of **37.0%**, **25.3%**, and **25.7%** on area, delay, and static power, respectively.
- → The best-performing model achieves **73.2%**, **61.6%**, and **52.2%** on area, delay, and static power, respectively, within an MRE threshold of **20%**.
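The headline improvements appear consistent with averaging the acc@1 gains (the Δ rows of the results table) over both models and both error margins. The Python sketch below checks that arithmetic. The grouping of values is read off the table and is an observation, not a procedure stated on the slides.

```python
from statistics import mean

# acc@1 gains in percentage points, taken from the Δ rows of the results table,
# ordered as [Mixtral @10%, LLama3 @10%, Mixtral @20%, LLama3 @20%].
ACC1_GAIN = {
    "area": [23.9, 40.6, 32.6, 50.7],
    "delay": [18.8, 27.5, 18.8, 36.2],
    "static power": [18.8, 26.1, 28.3, 29.7],
}

# Average gain per metric across the four model/margin combinations.
average_gain = {metric: mean(gains) for metric, gains in ACC1_GAIN.items()}

for metric, gain in average_gain.items():
    print(f"{metric}: average acc@1 gain of {gain:.2f} points")
```

The means land within rounding distance of the reported 37.0, 25.3, and 25.7. Note that the gains are differences between percentages, so they read most naturally as percentage points.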

## **Architectural Choices Evaluation**

**User Instruction**: Estimate area for the given RTL design. Reason about the number and type of gates that would be present after synthesis.

[Figure: three 8-bit adder implementations submitted with this instruction, each followed by the LLM's answer: `BasicAdder` (`assign Sum = A + B;`), `CarryLookAheadAdder` (generate/propagate logic), and a third adder that selects between `sum_high_c1` and `sum_high_c0`. Each answer lists the gates expected after synthesis and sums their areas. The legible totals are 207.71 for `BasicAdder` and 230.24 for `CarryLookAheadAdder`; a ground truth of 265.23 is shown for the third design.]

## Comparison to Regression Based Models: acc@1



- → LLMs outperform regression models under tight error margins (5% and 10%).
- → Regression models perform better under a more relaxed error margin (20%).

## **Comparison to Regression Based Models: Inference Runtime**

#### Conversion to Simple Operator Graph (SOG) and feature extraction



LLM inference is 2x faster than logic synthesis and 1.7x faster than regression-based models.

## Conclusion

• We introduced *MetRex*, an open-source dataset of 25,868 synthesizable designs annotated with their post-synthesis metrics.

• We showed that supervised finetuning improves LLM performance on the metric estimation task by an average of **37.0%**, **25.3%**, and **25.7%** on area, delay, and static power, respectively.

• The best-performing model achieves **73.2%**, **61.6%**, and **52.2%** on area, delay, and static power, respectively, within an MRE threshold of **20%**.

# Thank You!



https://github.com/scale-lab/MetRex



