

22<sup>nd</sup> Asia and South Pacific Design Automation Conference (ASP-DAC), Chiba, Japan, Jan. 16-19, 2017



## Towards Scalable and Efficient GPU-Enabled Slicing Acceleration in Continuous 3D Printing

Aosen Wang<sup>1</sup>, Chi Zhou<sup>2</sup>, Zhanpeng Jin<sup>3</sup> and Wenyao Xu<sup>1</sup> <sup>1</sup>CSE of SUNY Buffalo <sup>2</sup>ISE of SUNY Buffalo <sup>3</sup>ECE of SUNY Binghamton





#### Continuous 3D Printing

Continuous 3D printing is a recent technical breakthrough in additive manufacturing (Carbon3D, 2015).



\* Image source: https://techcrunch.com/2015/08/20/with-100m-in-funding-carbon3d-will-make-3d-manufacturing-a-reality/

## Principle of Continuous 3D Printing (Carbon 3D)



## Speedup of Carbon 3D is mainly from Manufacturing (wet part)

- ✓ Dry part (prefabrication): computing the unit slices, i.e., the layer images.
- ✓ Wet part (manufacturing): mechanical operations that fabricate the 3D object from liquid materials.

#### Prefabrication vs. Manufacturing





## Background (Slicing)

- □ Prefabrication consists of three sequential procedures: slicing, path planning and support generation. Slicing dominates the running time of the "dry part".
- □ Continuous 3D printing employs an image-mask-projection-based slicing algorithm. Because each pixel is processed independently, the algorithm lends itself to massively parallel acceleration.



## Methodology (Slicing Algorithm Analysis)



- Ray-triangle intersection computes the intersection points between the ray through each image pixel center and the triangles from the STL file.
- Trunk sorting arranges the out-of-order intersection points of each pixel's trunk in ascending order, using bubble sort.
- Layer extraction determines the binary value of each pixel on the projected images by incremental updating, which recovers the topology information needed for the binary slicing images.
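The three modules can be sketched for a single pixel as follows. This is a minimal illustration, not the authors' code: it assumes the ray runs along the z axis, and the function names, the tetrahedron test mesh and the layer heights are all hypothetical. Rays that graze a shared edge are not handled specially.

```python
def ray_triangle_z(px, py, tri):
    """Return z where the vertical ray through (px, py) hits tri, else None."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = tri
    det = (y1 - y2) * (x0 - x2) + (x2 - x1) * (y0 - y2)
    if det == 0:  # triangle is edge-on to the ray: no single hit point
        return None
    # Barycentric coordinates of (px, py) in the triangle's xy projection.
    a = ((y1 - y2) * (px - x2) + (x2 - x1) * (py - y2)) / det
    b = ((y2 - y0) * (px - x2) + (x0 - x2) * (py - y2)) / det
    c = 1.0 - a - b
    if a < 0 or b < 0 or c < 0:
        return None
    return a * z0 + b * z1 + c * z2


def bubble_sort(values):
    """Sort the trunk in place in ascending order and return it."""
    n = len(values)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if values[j] > values[j + 1]:
                values[j], values[j + 1] = values[j + 1], values[j]
    return values


def extract_layers(sorted_hits, layer_heights):
    """Incrementally toggle inside/outside while walking up the layers."""
    bits, inside, k = [], 0, 0
    for h in layer_heights:  # layer heights must be ascending
        while k < len(sorted_hits) and sorted_hits[k] <= h:
            inside ^= 1  # each surface crossing flips the state
            k += 1
        bits.append(inside)
    return bits


def slice_pixel(px, py, triangles, layer_heights):
    """Run the three modules in sequence for one pixel."""
    hits = (ray_triangle_z(px, py, t) for t in triangles)
    trunk = [z for z in hits if z is not None]
    return extract_layers(bubble_sort(trunk), layer_heights)


# Hypothetical test mesh: a tetrahedron with its base on z = 0.
A, B, C, D = (0, 0, 0), (4, 0, 0), (0, 4, 0), (0, 0, 4)
TETRA = [(B, C, D), (A, B, C), (A, B, D), (A, C, D)]

column = slice_pixel(1.0, 1.0, TETRA, [0.5, 1.5, 2.5])  # [1, 1, 0]
```

The ray through (1, 1) enters the solid at z = 0 and leaves at z = 2, so the pixel is set in the first two layers and cleared in the third.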






## GPU-Enabled Slicing-I (Pixelwise Parallel Slicing)

- ➢ Based on the analysis of the sequential algorithm, we exploit pixelwise parallelism on the GPGPU architecture.
- ➢ The entire processing for one pixel, across all three functional modules, is assigned to a single thread.
- ➢ The scarce shared memory on the GPU is fully used to reduce time-consuming global memory accesses.
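The one-thread-per-pixel mapping can be emulated on a CPU as a rough sketch. Everything here is an assumption for illustration: the row-major thread-to-pixel mapping, the names `pixel_thread` and `pixelwise_parallel_slice`, the tetrahedron mesh, and the use of a Python thread pool in place of GPU threads. The thread-private `trunk` list stands in for a trunk held in shared memory, and the built-in sort replaces the per-trunk sort for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def pixel_thread(tid, width, pitch, triangles, layer_heights):
    """Run all three modules for the pixel owned by thread `tid`."""
    # Assumed row-major mapping from thread index to pixel center.
    px = (tid % width + 0.5) * pitch
    py = (tid // width + 0.5) * pitch

    trunk = []  # thread-private buffer, standing in for shared memory
    # Module 1: intersect the vertical ray through (px, py) with each triangle.
    for (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) in triangles:
        det = (y1 - y2) * (x0 - x2) + (x2 - x1) * (y0 - y2)
        if det == 0:
            continue  # triangle is edge-on to the ray
        a = ((y1 - y2) * (px - x2) + (x2 - x1) * (py - y2)) / det
        b = ((y2 - y0) * (px - x2) + (x0 - x2) * (py - y2)) / det
        c = 1.0 - a - b
        if a >= 0 and b >= 0 and c >= 0:
            trunk.append(a * z0 + b * z1 + c * z2)

    # Module 2: sort this pixel's trunk in ascending order.
    trunk.sort()

    # Module 3: toggle inside/outside at each hit while walking up the layers.
    column, inside, k = [], 0, 0
    for h in layer_heights:
        while k < len(trunk) and trunk[k] <= h:
            inside ^= 1
            k += 1
        column.append(inside)
    return column


def pixelwise_parallel_slice(width, height, pitch, triangles, layer_heights):
    """Launch one logical thread per pixel; no thread touches another's data."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(
            lambda tid: pixel_thread(tid, width, pitch, triangles, layer_heights),
            range(width * height)))


# Hypothetical test mesh: a tetrahedron with its base on z = 0.
A, B, C, D = (0, 0, 0), (4, 0, 0), (0, 4, 0), (0, 0, 4)
TETRA = [(B, C, D), (A, B, C), (A, B, D), (A, C, D)]

image = pixelwise_parallel_slice(4, 4, 1.0, TETRA, [0.5, 1.5, 2.5])
```

Because each logical thread reads the mesh and writes only its own column, no synchronization is needed, which matches the conflict-free property claimed for this scheme.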




## GPU-Enabled Slicing-II (Fully Parallel Slicing)



- ➢ PPS still has serial computing components.
- ➢ FPS exploits massive thread concurrency.
- ➢ This method increases the number of global memory accesses, but it scales to large problem sizes.
- ➢ Write conflicts between threads arise and can be resolved with critical sections built on atomic operations.
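One plausible reading of the conflict and its fix is sketched below for the intersection stage, with one logical thread per (pixel, triangle) pair. This is an assumption, not the authors' design: the slides do not say how work is divided in FPS. The names `AtomicCounter`, `hit_z`, `fully_parallel_intersect` and the capacity `MAX_HITS` are hypothetical, and a lock-protected counter stands in for a GPU atomic add.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_HITS = 8  # hypothetical per-pixel trunk capacity


class AtomicCounter:
    """Stand-in for a GPU atomic add: return the old value, then increment."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def fetch_add(self):
        with self._lock:  # the critical section
            old = self.value
            self.value += 1
            return old


def hit_z(px, py, tri):
    """Return z where the vertical ray through (px, py) hits tri, else None."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = tri
    det = (y1 - y2) * (x0 - x2) + (x2 - x1) * (y0 - y2)
    if det == 0:
        return None
    a = ((y1 - y2) * (px - x2) + (x2 - x1) * (py - y2)) / det
    b = ((y2 - y0) * (px - x2) + (x0 - x2) * (py - y2)) / det
    c = 1.0 - a - b
    return a * z0 + b * z1 + c * z2 if min(a, b, c) >= 0 else None


def fully_parallel_intersect(pixels, triangles):
    """One logical thread per (pixel, triangle) pair fills shared trunks."""
    trunks = [[None] * MAX_HITS for _ in pixels]  # stands in for global memory
    counts = [AtomicCounter() for _ in pixels]

    def pair_thread(job):
        p, t = job
        z = hit_z(*pixels[p], triangles[t])
        if z is not None:
            # Several threads may hit the same pixel: reserve a unique slot.
            slot = counts[p].fetch_add()
            trunks[p][slot] = z

    jobs = [(p, t) for p in range(len(pixels)) for t in range(len(triangles))]
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(pair_thread, jobs))
    # Slots are filled in arrival order, so each trunk still needs sorting.
    return [sorted(trunk[:c.value]) for trunk, c in zip(trunks, counts)]


# Hypothetical test mesh: a tetrahedron with its base on z = 0.
A, B, C, D = (0, 0, 0), (4, 0, 0), (0, 4, 0), (0, 0, 4)
TETRA = [(B, C, D), (A, B, C), (A, B, D), (A, C, D)]

trunks = fully_parallel_intersect([(1.0, 1.0), (0.5, 0.5)], TETRA)
```

Without the atomic slot reservation, two threads that hit the same pixel could read the same counter value and overwrite each other's intersection point.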

| Platform   | Pixelwise Parallel Slicing                                 | Fully Parallel Slicing                                     |
|------------|------------------------------------------------------------|------------------------------------------------------------|
| Host (CPU) | STL file, triangle mesh statistics                         | STL file, triangle mesh statistics                         |
| GPU        | Ray-triangle intersection, trunk sorting, layer extraction | Ray-triangle intersection, trunk sorting, layer extraction |
| Host (CPU) | Output slicing images                                      | Output slicing images                                      |









- ➢ PPS: all tasks run in fast shared memory, with fewer global memory accesses and no multi-thread conflicts.
- ➢ FPS: recycle-free processing, with critical sections based on atomic operations to resolve write conflicts.



#### **Experimental Setup**

- ➢ We use cycle-accurate simulators for the CPU and GPU computing platforms.
- Sniper is a typical simulator for the x86 architecture, and GPGPU-Sim is a simulation tool for collecting statistics on GPGPU architectures.
- Sniper is configured as an Intel Xeon X5550 at 2.66 GHz, while GPGPU-Sim is configured as an Nvidia GeForce GTX480 at 700 MHz.
- ➢ We choose four representative 3D objects: Club, Android, Ring and Bunny. Their triangle meshes contain 3290, 10926, 33730 and 69664 triangles, respectively.









#### **Experiment:** Time Efficiency



Fully parallel slicing achieves the best performance among the three schemes.

After accounting for the clock frequency difference between the two platforms, PPS gains one order of magnitude and FPS gains two orders of magnitude in speed.
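One plausible way to account for the frequency difference is to compare clock cycles instead of wall-clock time. The sketch below only uses the two clock rates from the experimental setup; the function name `cycle_speedup` is hypothetical, and the slides do not state that the authors normalized in exactly this way.

```python
CPU_HZ = 2.66e9  # Intel Xeon X5550 clock from the experimental setup
GPU_HZ = 700e6   # Nvidia GeForce GTX480 clock from the experimental setup


def cycle_speedup(t_cpu_s, t_gpu_s, f_cpu=CPU_HZ, f_gpu=GPU_HZ):
    """Speedup measured in clock cycles rather than in seconds."""
    cpu_cycles = t_cpu_s * f_cpu
    gpu_cycles = t_gpu_s * f_gpu
    return cpu_cycles / gpu_cycles


# The CPU clock is 3.8 times faster than the GPU clock.
clock_ratio = CPU_HZ / GPU_HZ
```

Under this reading, equal wall-clock times on both platforms would already correspond to a 3.8x advantage for the GPU in cycles, so any wall-clock speedup is scaled by the same factor.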

#### **Experiment: Scalability**





#### Conclusions

- □ We investigated slicing algorithm acceleration on GPGPU architecture for continuous 3D printing.
- □ We developed pixelwise parallel slicing and fully parallel slicing implementations.
- □ Experiments demonstrate the effectiveness and scalability of our implementation.

In the future:

- ✤ We will design new implementations for other hardware platforms, such as FPGAs or more powerful GPUs.
- ✤ We will exploit pipelining between prefabrication and manufacturing.

# Q & A



Address comments to wenyaoxu@buffalo.edu