



Agency for Science, Technology and Research

SINGAPORE



### NANYANG TECHNOLOGICAL UNIVERSITY

# SonicFFT: A system architecture for ultrasonic-based FFT acceleration

Darayus Adil Patel<sup>1</sup>, Viet Phuong Bui<sup>2</sup>, Kevin Tshun Chuan Chai<sup>3</sup>, Amit Lal<sup>4</sup>, Mohamed M. Sabry Aly<sup>1</sup>

<sup>1</sup> Nanyang Technological University, Singapore
 <sup>2</sup> A\*STAR Institute of High Performance Computing, Singapore
 <sup>3</sup> A\*STAR Institute of Microelectronics, Singapore
 <sup>4</sup> Cornell University, USA

#### ASP-DAC 2022

# **FFT Applications**




# **Current FFT Implementations**












- 02 Compact-modelling & System Architecture
- 03 SonicFFT Data Mapping methodology
- 04 Evaluation Framework
- 05 Results & Analysis







# In-silica Ultrasonic FFT Computation



- Receiver plane intensity = $\mathcal{F}$(EM wave distribution at input plane)



# **In-silica Ultrasonic FFT Computation**


- Ultrasonic wave propagation in silicon for FFT computation
- 2D FFT computational complexity of $O(N)$ instead of $O(N^2 \log N)$
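To give a feel for the gap between the two complexity terms, the sketch below evaluates their ratio for a few sizes. It assumes that $N$ is the side length of an $N \times N$ input and drops all constant factors, so the numbers are ratios of asymptotic terms only, not predicted speedups. The function names are illustrative and not part of SonicFFT.

```python
import math


def digital_fft2_term(n):
    """Asymptotic term N^2 * log2(N) for a digital 2D FFT (constants dropped)."""
    return n * n * math.log2(n)


def wavefront_term(n):
    """Asymptotic term N for the ultrasonic wavefront computation (constants dropped)."""
    return n


def asymptotic_ratio(n):
    """Ratio of the two asymptotic terms for an N x N input."""
    return digital_fft2_term(n) / wavefront_term(n)


if __name__ == "__main__":
    for n in (64, 256, 1024, 4096):
        print(n, asymptotic_ratio(n))
```

The ratio grows as $N \log N$, which is why the benefit of the wavefront approach is expected to show up mainly at large input sizes.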
















# **Compact Model of Wavefront Computing (WFC) Accelerator**



- <u>Data Buffers</u>: SRAM-based local accelerator memory
- <u>DAC & ADC</u>: Number of DACs and ADCs increases linearly with array size
- <u>Transducer Array</u>: Array size ($\delta \times \delta$) determines WFC accelerator computation capacity
- <u>Transmission Medium</u>: Fused silica
- <u>Lens</u>: Ideal / multi-phase Fresnel lens

# WFC Accelerator Latency: Constituent components



- Accelerator latency is dominated by the transmission medium for large array sizes



# WFC Accelerator Power: Constituent components



- Accelerator power is dominated by the transducers for large array sizes



# **System Architecture**





\* PCIe 5.0 Data Interface
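As a rough illustration of what the PCIe 5.0 interface implies for moving an $N \times N$ input between host and accelerator, the sketch below estimates the raw link transfer time. The 32 GT/s per-lane rate and 128b/130b encoding are PCIe 5.0 figures. The lane count, the bytes per sample, and the function names are assumptions made here for illustration, and protocol overhead is ignored.

```python
PCIE5_TRANSFERS_PER_S = 32e9   # PCIe 5.0 raw rate per lane (32 GT/s)
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def pcie5_bandwidth_bytes_per_s(lanes):
    """Raw one-direction link bandwidth in bytes/s (protocol overhead ignored)."""
    return PCIE5_TRANSFERS_PER_S * ENCODING_EFFICIENCY / 8 * lanes


def transfer_time_s(n, bytes_per_sample=8, lanes=16):
    """Time to move an n x n array over the link.

    bytes_per_sample and lanes are assumed values, not taken from the slides.
    """
    return n * n * bytes_per_sample / pcie5_bandwidth_bytes_per_s(lanes)


if __name__ == "__main__":
    print(pcie5_bandwidth_bytes_per_s(16) / 1e9, "GB/s")
    print(transfer_time_s(4096) * 1e3, "ms")
```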














# SonicFFT: Data Mapping Methodology




- Each stage of the Cooley-Tukey (CT) algorithm computes a DFT of a different size
- Preliminary stages are mapped to the WFC accelerator
- Final-stage twiddle multiplications and additions are mapped to the host processor
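The split described in the bullets above can be sketched as a single Cooley-Tukey decomposition step in 2D. The $N \times N$ input is divided into interleaved $\delta \times \delta$ sub-arrays whose DFTs go to the accelerator, and the host then combines them with twiddle multiplications and additions. This is a minimal sketch under stated assumptions, not the SonicFFT mapping software: `accelerator_dft2` is a hypothetical stand-in that computes a naive DFT in software, and the single-step radix of $N/\delta$ is chosen here for clarity.

```python
import cmath


def accelerator_dft2(tile):
    """Hypothetical stand-in for the WFC accelerator: naive 2D DFT of a square tile."""
    d = len(tile)
    w = [cmath.exp(-2j * cmath.pi * t / d) for t in range(d)]
    return [[sum(tile[m1][m2] * w[(m1 * k1 + m2 * k2) % d]
                 for m1 in range(d) for m2 in range(d))
             for k2 in range(d)] for k1 in range(d)]


def mapped_fft2(x, delta):
    """2D DFT of an n x n list-of-lists using delta x delta accelerator calls."""
    n = len(x)
    if n % delta != 0:
        raise ValueError("n must be a multiple of delta")
    r = n // delta  # number of interleaved sub-arrays per dimension
    wn = [cmath.exp(-2j * cmath.pi * t / n) for t in range(n)]
    out = [[0j] * n for _ in range(n)]
    for a in range(r):
        for b in range(r):
            # Preliminary stage: sub-array x[a::r, b::r] goes to the accelerator.
            sub = [row[b::r] for row in x[a::r]]
            f = accelerator_dft2(sub)
            # Final stage on the host: twiddle multiplication and accumulation.
            for k1 in range(n):
                for k2 in range(n):
                    twiddle = wn[(a * k1 + b * k2) % n]
                    out[k1][k2] += twiddle * f[k1 % delta][k2 % delta]
    return out


if __name__ == "__main__":
    x = [[complex(i + j, i - j) for j in range(8)] for i in range(8)]
    y = mapped_fft2(x, 2)
    print(abs(y[0][0] - sum(sum(row) for row in x)) < 1e-9)
```

The host loop here is written for readability rather than efficiency. It shows only where the boundary between accelerator work and host work falls.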
















### **Evaluation Framework**




# System parameters for SonicFFT evaluation



#### **Baseline**



- <u>Hardware</u>: Octa-core Processor + 1GB main memory
- <u>Software</u>: FFTW Library





#### **SonicFFT**

- <u>Hardware</u>: WFC accelerator interfaced with Octa-core Processor + 1GB main memory
- <u>Software</u>: Custom mapping software














### **Results**



| ↓ N×N Config | WFC Config →         | 4×4  | 8×8  | 16×16 | 32×32 | 64×64 | 128×128 | 256×256 | 512×512 | 1024×1024 | 2048×2048 | 4096×4096 |
|--------------|----------------------|------|------|-------|-------|-------|---------|---------|---------|-----------|-----------|-----------|
| 64×64        | Speedup (×)          | 0.77 | 0.95 | 1.02  | 1.04  | 1.05  | -       | -       | -       | -         | -         | -         |
|              | Energy Reduction (×) | 0.77 | 0.95 | 1.02  | 1.04  | 1.05  | -       | -       | -       | -         | -         | -         |
|              | EDP Gain (×)         | 0.59 | 0.90 | 1.03  | 1.08  | 1.10  | -       | -       | -       | -         | -         | -         |
| 128×128      | Speedup (×)          | -    | 0.83 | 1.04  | 1.12  | 1.15  | 1.17    | -       | -       | -         | -         | -         |
|              | Energy Reduction (×) | -    | 0.83 | 1.03  | 1.12  | 1.15  | 1.17    | -       | -       | -         | -         | -         |
|              | EDP Gain (×)         | -    | 0.69 | 1.07  | 1.26  | 1.33  | 1.37    | -       | -       | -         | -         | -         |
| 256×256      | Speedup (×)          | -    | -    | 1.02  | 1.31  | 1.45  | 1.52    | 1.58    | -       | -         | -         | -         |
|              | Energy Reduction (×) | -    | -    | 1.02  | 1.30  | 1.44  | 1.52    | 1.57    | -       | -         | -         | -         |
|              | EDP Gain (×)         | -    | -    | 1.04  | 1.69  | 2.09  | 2.31    | 2.49    | -       | -         | -         | -         |
| 512×512      | Speedup (×)          | -    | -    | -     | 1.66  | 2.18  | 2.57    | 2.86    | 3.22    | -         | -         | -         |
|              | Energy Reduction (×) | -    | -    | -     | 1.63  | 2.15  | 2.52    | 2.80    | 3.11    | -         | -         | -         |
|              | EDP Gain (×)         | -    | -    | -     | 2.71  | 4.68  | 6.48    | 7.99    | 10.02   | -         | -         | -         |
| 1024×1024    | Speedup (×)          | -    | -    | -     | -     | 3.41  | 4.61    | 5.97    | 7.52    | 10.8      | -         | -         |
|              | Energy Reduction (×) | -    | -    | -     | -     | 3.33  | 4.46    | 5.67    | 6.91    | 9.06      | -         | -         |
|              | EDP Gain (×)         | -    | -    | -     | -     | 11.34 | 20.55   | 33.85   | 51.95   | 97.69     | -         | -         |
| 2048×2048    | Speedup (×)          | -    | -    | -     | -     | -     | 5.76    | 7.78    | 11      | 16        | 39.2      | -         |
|              | Energy Reduction (×) | -    | -    | -     | -     | -     | 5.66    | 7.48    | 9.98    | 12.96     | 19.4      | -         |
|              | EDP Gain (×)         | -    | -    | -     | -     | -     | 32.60   | 58.20   | 109.3   | 207.91    | 761.08    | -         |
| 4096×4096    | Speedup (×)          | -    | -    | -     | -     | -     | -       | 7.67    | 10.2    | 14.9      | 23.8      | 117.6     |
|              | Energy Reduction (×) | -    | -    | -     | -     | -     | -       | 7.44    | 9.50    | 12.46     | 15.38     | 19.69     |
|              | EDP Gain (×)         | -    | -    | -     | -     | -     | -       | 57.06   | 97.08   | 185.5     | 366.53    | 2317.3    |
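Since the energy-delay product is energy multiplied by delay, the EDP gain in each cell should be the product of the speedup and the energy reduction in the same column, up to rounding of the tabulated values. A minimal check, using the 4096×4096 input on the 256×256 WFC configuration (the helper name is illustrative):

```python
def edp_gain(speedup, energy_reduction):
    """EDP = energy x delay, so the EDP gain is the product of the two gains."""
    return speedup * energy_reduction


if __name__ == "__main__":
    # Speedup 7.67 and energy reduction 7.44 are read from the table above.
    print(round(edp_gain(7.67, 7.44), 2))  # 57.06, matching the tabulated EDP gain
```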

# **Results: Constituent Components**





# Latency comparison with State of the Art




- H. Cılasun et al., "CRAFFT: High Resolution FFT Accelerator In Spintronic Computational RAM"
- X. Chen et al., "A Variable-Size FFT Hardware Accelerator Based on Matrix Transposition"
- Xiaohui Li and Ellen Blinka, "Very large FFT for TMS320C6678 processors," Texas Instruments White Paper

# **Energy comparison with State of the Art**




- H. Cılasun et al., "CRAFFT: High Resolution FFT Accelerator In Spintronic Computational RAM"
- X. Chen et al., "A Variable-Size FFT Hardware Accelerator Based on Matrix Transposition"

# **EDP comparison with State of the Art**




- H. Cılasun et al., "CRAFFT: High Resolution FFT Accelerator In Spintronic Computational RAM"
- X. Chen et al., "A Variable-Size FFT Hardware Accelerator Based on Matrix Transposition"





Speaker Contact Details: Dr. Darayus Adil Patel : dpatel@ntu.edu.sg