# **ASP-DAC 2025**

# WITCH: WeIghTed Coding Scheme for Crosstalk Reduction in High Bandwidth Memory

Seoyoon Jang<sup>\*1</sup>, **Sangouk Jeon<sup>\*1</sup>**, Kwanghyun Shin<sup>1</sup>, Dongkwon Lee<sup>1</sup>, Hankyu Chi<sup>2</sup>, Wookjin Shin<sup>2</sup>, Chanhyun Pyo<sup>2</sup>, Jaeha Kim<sup>1</sup>, Dongsuk Jeon<sup>1</sup>

\*(ECA)

<sup>1</sup>Seoul National University, <sup>2</sup>SK hynix

Email: sangouk.jeon@snu.ac.kr



# **Background: HBM-Integrated System Overview**



# **Background: HBM Performance Evolution**

Bandwidth ↑ → I/O speed ↑ and channel density ↑ → Crosstalk ↑



# **Background: Crosstalk**

**Crosstalk:** Caused by capacitive and inductive coupling



Aggressor **transitions** → Victim affected

# **Background: Crosstalk Reduction Methods**

- XTC (Crosstalk Cancellation)
  - Generate anti-crosstalk signals to cancel out the crosstalk
  - Significant crosstalk reduction
  - Large area overhead
    - 100 to 300% area overhead

- CAC (Crosstalk Avoidance Code)
  - Uses a different number system to remove worst-case transition patterns
    - Ex. Fibonacci number system (FNS-based)
  - Crosstalk reduced only to a certain level, with low bit efficiency
  - Small area overhead

# **Background: Crosstalk Level**

Capacitive coupling: Case I < Case II < Case III

**Inductive** coupling: Case I > Case II > Case III





# **Channel Structure**

Grid Pattern: Shielding and Signal Channels Placed Alternately



# **Channel Structure**

Binary system  $\rightarrow$  Each channel has a **base of**  $2^n$ 



# **Data Transmission Example**

Ex) 0000\_0000\_0111 is transmitted on the channel for the value 7



# **Concept of WITCH**

Crosstalk level varies with the position within the channel

Worst case: Victim and 4 aggressors undergo transition simultaneously → Always in group A



Our approach: Eliminate such worst cases through coding!

# **Overall Architecture**



# **Proposed Coding Method Example**

Design the base of the number system so that a specific pattern triggers a carry



Suppose we want to eliminate the simultaneous transition of three channels.

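Make the base of the next channel 7 (= 1 + 2 + 4), so that the value 7 triggers a carry into that channel instead of being written as [1, 1, 1].

Since encoding proceeds from the MSB side, the pattern [1, 1, 1] never occurs on those three channels.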
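The carry idea above can be checked with a small sketch. It assumes a hypothetical four-channel layout in which the three low channels keep weights 1, 2, 4 and a single next channel has base 7. The names `WEIGHTS`, `encode_msb_first`, and `codewords`, as well as the value range 0 to 13, are illustrative choices and not taken from the slides.

```python
# Hypothetical 4-channel layout (assumption): the three low channels keep the
# binary weights 1, 2, 4 and the next channel is given base 7 = 1 + 2 + 4.
WEIGHTS = [7, 4, 2, 1]  # MSB first


def encode_msb_first(value, weights=WEIGHTS):
    """Greedy conversion: take each weight, largest first, if it still fits."""
    bits = []
    for w in weights:
        if value >= w:
            bits.append(1)
            value -= w
        else:
            bits.append(0)
    if value != 0:
        raise ValueError("value out of range for these weights")
    return bits


# With one base-7 channel on top, the usable range is 0..13 (2 * 7 values).
codewords = {v: encode_msb_first(v) for v in range(14)}

for v, cw in codewords.items():
    # cw[1:] holds the three low channels (weights 4, 2, 1).
    assert cw[1:] != [1, 1, 1], v
    print(v, cw)
```

In this toy layout the value 7 is sent as [1, 0, 0, 0] rather than [0, 1, 1, 1], so the three low channels never carry 1 at the same time. Only those three channels are guarded here.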

# **Proposed Coding Method Example**



Actual data sent to the channel:  $Z_i = Enc(X_i) \oplus Z_{i-1}$ 

Absence of [1,1,1] in  $Enc(X_i)$  is guaranteed  $\rightarrow Z_i$  and  $Z_{i-1}$  can't differ in all three channels

 $\rightarrow$  No case of all three channels transitioning simultaneously.
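A minimal sketch of why the XOR step turns "no [1, 1, 1] codeword" into "no triple transition": the toggle pattern between consecutive line states equals the codeword itself. The function names and the all-zero initial line state are assumptions made for illustration.

```python
def transmit(codewords, width=3):
    """Apply Z_i = Enc(X_i) XOR Z_{i-1}, starting from an all-zero line state."""
    z_prev = [0] * width
    sent = []
    for enc in codewords:
        z = [e ^ p for e, p in zip(enc, z_prev)]
        sent.append(z)
        z_prev = z
    return sent


def transitions(sent, width=3):
    """Return, per symbol time, which channels toggled (1 = transition)."""
    z_prev = [0] * width
    toggles = []
    for z in sent:
        toggles.append([a ^ b for a, b in zip(z, z_prev)])
        z_prev = z
    return toggles


# Every 3-bit pattern except [1, 1, 1], i.e. what the encoder is allowed to emit.
allowed = [[(v >> 2) & 1, (v >> 1) & 1, v & 1] for v in range(7)]
stream = allowed + allowed[::-1]

sent = transmit(stream)
toggles = transitions(sent)

assert toggles == stream          # toggles reproduce Enc(X_i) exactly
assert [1, 1, 1] not in toggles   # so all three channels never switch together
```

The line state $Z_i$ itself may still be [1, 1, 1]; only the transition pattern is constrained.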

Goal: eliminate worst cases in group A



Our number system eliminates the case where all five channels are 1



Group (A, B) → one symbol; the two channels share a single base

Each symbol $p_i$ can represent {0, 1, 2, 3}: channel A has weight 2 and channel B has weight 1

A carry occurs when $p_{i+1} = 3$ and $p_i \ge 2$

$\rightarrow$ Avoids $A_{i+1}$, $B_{i+1}$, and $A_i$ all being 1



Simultaneous transition of  $A_{i+1}$ ,  $B_{i+1}$ ,  $A_i$  eliminated

→ Worst case (victim and four surrounding channels transition) is also eliminated
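The carry rule can be exercised with a short sketch. The slides give the rule but not the symbol weights, so the weights below are an assumption. They are obtained by counting the symbol strings over {0, 1, 2, 3} that never contain a 3 followed by a 2 or 3, which gives $w_n = 3w_{n-1} + 2w_{n-2}$ with $w_0 = 1$, $w_1 = 4$. All function names are hypothetical.

```python
def witch_weights(n_symbols):
    """Symbol weights, LSB first (assumed, derived from the carry rule).

    w_n counts the valid length-n strings over {0,1,2,3} that never contain
    a 3 immediately followed by a 2 or 3: w_n = 3*w_(n-1) + 2*w_(n-2).
    """
    w = [1, 4]
    while len(w) < n_symbols:
        w.append(3 * w[-1] + 2 * w[-2])
    return w[:n_symbols]


def capacity(n_symbols):
    """Number of distinct values representable with n_symbols symbols."""
    return witch_weights(n_symbols + 1)[-1]


def encode_symbols(value, n_symbols):
    """Greedy MSB-first conversion; returns [p_(n-1), ..., p_0]."""
    if not 0 <= value < capacity(n_symbols):
        raise ValueError("value out of range")
    symbols = []
    for w in reversed(witch_weights(n_symbols)):
        digit = min(3, value // w)
        symbols.append(digit)
        value -= digit * w
    return symbols


def to_channels(symbols):
    """Split each symbol into (A, B): channel A has weight 2, channel B weight 1."""
    return [(p >> 1, p & 1) for p in symbols]


n = 4
for v in range(capacity(n)):
    chans = to_channels(encode_symbols(v, n))
    for (a_hi, b_hi), (a_lo, _) in zip(chans, chans[1:]):
        # A_(i+1), B_(i+1) and A_i are never all 1 at once.
        assert not (a_hi and b_hi and a_lo)
```

Under this assumed weighting, four symbols (eight channels) cover 178 distinct values, and the greedy conversion never produces the forbidden pattern.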

Worst case pattern removed



# WITCH – AS (Additional Shielding)

Further reduce crosstalk level with additional shielding



$$
\begin{aligned}
s_1 &= 1, \\
s_{2i-1} &= s_{2i-2} + 7\,s_{2i-3}, \\
s_{2i} &= 16\,s_{2i-1}
\end{aligned}
$$
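To make the recurrence concrete, the sketch below simply evaluates it as written. The function name `witch_as_weights` is illustrative, and the odd-index rule is applied from $s_3$ onward, since $s_1$ is given directly.

```python
def witch_as_weights(n):
    """Return [s_1, ..., s_n] from the recurrence on the slide."""
    s = [0, 1]  # index 0 is a placeholder so that s[k] is s_k
    for k in range(2, n + 1):
        if k % 2 == 0:
            s.append(16 * s[k - 1])             # s_{2i} = 16 * s_{2i-1}
        else:
            s.append(s[k - 1] + 7 * s[k - 2])   # s_{2i-1} = s_{2i-2} + 7 * s_{2i-3}
    return s[1:n + 1]


print(witch_as_weights(6))  # [1, 16, 23, 368, 529, 8464]
```

Substituting the even rule into the odd rule gives $s_{2i+1} = 16\,s_{2i-1} + 7\,s_{2i-1} = 23\,s_{2i-1}$, so the odd-indexed weights grow by a factor of 23 per step.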

# WITCH – AS (Additional Shielding)

A carry occurs when the victim and three or more surrounding channels are all 1





# Eye Diagram Simulation




# **Bit Efficiency**

WITCH has the highest bit efficiency: 4.2% and 20.8% improvement



Encoder's long critical path  $\rightarrow$  bottleneck for high-speed operation



Compare & Reduce (number system conversion) → Serial operation

- Perform the number system conversion without such compare & reduce processes
- The optimized encoder performs encoding in three steps:
  - Step 1: Bit-by-Bit direct encoding into a new number system
  - Step 2: Add them symbol by symbol
  - Step 3: Round up
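The three steps can be sketched in software as follows. This is an illustration only: it uses a plain radix-7 number system in place of the WITCH number system, and it reads "round up" as pushing any per-symbol overflow into the next symbol. `BASE`, `N_BITS`, `N_DIGITS`, and the function names are all assumptions.

```python
BASE = 7       # hypothetical uniform radix, chosen only for illustration
N_BITS = 8     # input word width (assumption)
N_DIGITS = 4   # 7**4 = 2401 > 255, so four symbols are enough


def to_digits(value, n_digits=N_DIGITS, base=BASE):
    """Reference conversion, LSB symbol first (used to build the table)."""
    digits = []
    for _ in range(n_digits):
        digits.append(value % base)
        value //= base
    return digits


# Step 1 table: the symbol pattern of every power of two, computed once offline.
BIT_TABLE = [to_digits(1 << k) for k in range(N_BITS)]


def encode_three_step(x):
    # Step 1: look up the pattern of each set input bit (independent per bit).
    patterns = [BIT_TABLE[k] for k in range(N_BITS) if (x >> k) & 1]
    # Step 2: add the patterns symbol by symbol, without propagating carries.
    sums = [sum(p[d] for p in patterns) for d in range(N_DIGITS)]
    # Step 3: round up, i.e. push the overflow of each symbol into the next.
    out, carry = [], 0
    for s in sums:
        s += carry
        out.append(s % BASE)
        carry = s // BASE
    return out


# The three-step result matches the direct conversion for every 8-bit input.
assert all(encode_three_step(x) == to_digits(x) for x in range(1 << N_BITS))
```

Steps 1 and 2 have no dependency between input bits, so only the final carry pass is sequential. That is the property that shortens the critical path compared with a serial compare-and-reduce conversion.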



# **Hardware Optimization Result**

- Shorter critical path, area and power benefits at high frequencies
  - Synthesized in 28nm CMOS



# **Interleaved Architecture**

Multiple encoders can be interleaved to support higher data rates



| Frequency (GHz)           | 1     | 2     | 3     | 4     | 5     |
|---------------------------|-------|-------|-------|-------|-------|
| Critical path (ns)        | 0.68  | 0.48  | 0.31  | 0.24  | 0.18  |
| Area (μm<sup>2</sup>)     | 222   | 241   | 283   | 390   | 420   |
| Power (mW)                | 0.185 | 0.431 | 0.789 | 1.362 | 1.921 |

Operates at high frequencies
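As a quick sanity check on the table, the sketch below computes the timing slack and the energy per clock cycle for each column. It assumes each column is one synthesis run targeting the listed clock frequency, which is a reading of the table rather than something the slides state. The variable names are illustrative.

```python
freq_ghz = [1, 2, 3, 4, 5]
critical_path_ns = [0.68, 0.48, 0.31, 0.24, 0.18]
power_mw = [0.185, 0.431, 0.789, 1.362, 1.921]

rows = []
for f, cp, p in zip(freq_ghz, critical_path_ns, power_mw):
    period_ns = 1.0 / f            # clock period at f GHz
    rows.append({
        "freq_ghz": f,
        "slack_ns": period_ns - cp,  # positive slack means timing is met
        "energy_pj": p / f,          # mW / GHz = pJ per clock cycle
    })

for r in rows:
    print(f"{r['freq_ghz']} GHz: slack {r['slack_ns']:.3f} ns, "
          f"{r['energy_pj']:.3f} pJ/cycle")
```

Under that reading, every listed critical path fits within its clock period, and the energy per cycle rises with the target frequency.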

# Conclusion

- Proposed a new coding scheme
  - Achieves the highest bit efficiency with comparable crosstalk reduction
- Evaluated the performance of the coding scheme
  - Eye opening using real channel models
  - Theoretical analysis
- Optimized the encoder architecture
  - Significantly reduces overhead when applied in practice

# Thank you

For Your Attention