Tutorials

ASP-DAC 2018 offers attendees a set of intensive two-hour introductions to specific topics. Each tutorial is presented twice during the tutorial day so that attendees can cover multiple topics. Tutorial registrants may select three of the six topics.

  • Date: Monday, January 22, 2018 (9:30 - 17:15)
  • 9:30 - 11:30
    • Room 302: Tutorial-1 (Machine Learning and Deep Learning)
    • Room 401: Tutorial-4 (Harnessing Data Science for the HW Verification Process)
    • Room 402A: Tutorial-5 (Flash Memory Design and Management)
    • Room 402B: Tutorial-6 (IC Design and Technology Co-Optimizations in Extreme Scaling)
  • 12:45 - 14:45
    • Room 302: Tutorial-1 (Machine Learning and Deep Learning)
    • Room 401: Tutorial-2 (Accelerating Deep Neural Networks on FPGAs)
    • Room 402A: Tutorial-3 (Machine Learning for Reliability Monitoring, Mitigation and Adaptation)
    • Room 402B: Tutorial-6 (IC Design and Technology Co-Optimizations in Extreme Scaling)
  • 15:15 - 17:15
    • Room 302: Tutorial-4 (Harnessing Data Science for the HW Verification Process)
    • Room 401: Tutorial-2 (Accelerating Deep Neural Networks on FPGAs)
    • Room 402A: Tutorial-3 (Machine Learning for Reliability Monitoring, Mitigation and Adaptation)
    • Room 402B: Tutorial-5 (Flash Memory Design and Management)

Tutorial-1: Machine Learning and Deep Learning

Speaker:
Jinjun Xiong (IBM T. J. Watson Research Center)

Abstract:

Machine learning and deep learning have attracted a lot of attention from industry, media, academia and government alike, and their impact on business and industry can hardly be overstated. The subject is broad, with much ongoing research and development. In this two-hour tutorial, I plan to give the DA community an effective introduction to this broad subject by teaching some of the most important fundamental techniques that are common and universal to many popular machine learning and deep learning algorithms. Covered topics will include the general iterative algorithm for solving unconstrained optimization problems; gradient descent and stochastic gradient descent methods; fundamental machine learning concepts such as training, testing, cross validation, bias, and variance; the differences between machine learning, AI, and data mining; popular machine learning algorithms such as the perceptron, logistic regression, decision trees, and random forests; and deep learning algorithms such as ANNs and CNNs. A central theme of the algorithmic coverage is a common set of techniques that are critical to a deep understanding of these methods. If time permits, I will also share some of my experience in applying these techniques to various industry solutions and how that relates to my deep DA roots.
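As a quick illustration of the gradient descent and stochastic gradient descent methods listed above, the sketch below fits a one-parameter least-squares model in plain Python. It is a minimal, self-contained example for orientation only, not part of the tutorial materials; the data, learning rates, and iteration counts are arbitrary choices.

    import random

    random.seed(0)

    # Toy least-squares problem: fit y ~ w*x (single parameter w) to noisy data.
    xs = [i / 100 for i in range(1, 101)]
    data = [(x, 3.0 * x + random.gauss(0.0, 0.05)) for x in xs]

    def gradient(w, batch):
        """Gradient of the mean squared error 0.5*(w*x - y)^2 over a batch."""
        return sum((w * x - y) * x for x, y in batch) / len(batch)

    # Full-batch gradient descent: every step uses the entire data set.
    w = 0.0
    for _ in range(200):
        w -= 0.5 * gradient(w, data)
    print("gradient descent estimate of w:", round(w, 3))

    # Stochastic gradient descent: every step uses a small random mini-batch.
    w = 0.0
    for _ in range(2000):
        w -= 0.2 * gradient(w, random.sample(data, 5))
    print("stochastic gradient descent estimate of w:", round(w, 3))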

Biography:

Dr. Jinjun Xiong is Program Director for Cognitive Computing Systems Research at the IBM Thomas J. Watson Research Center. He is responsible for defining the scientific agenda and strategic directions for advanced cognitive computing systems research across industries, academia and governmental agencies. In that capacity, he co-directs the IBM-Illinois Center for Cognitive Computing Systems Research (C3SR). Prior to that role, Dr. Xiong was manager of the Smarter Energy group, responsible for IBM Research’s Big Bet Program on Smarter Energy Research, including its strategies and execution. He also pioneered statistical, data-driven methodologies for improving VLSI chip yields and developed many novel variation-aware design optimization techniques that were used in multiple generations of IBM’s high-performance ASIC and server designs. Dr. Xiong received his Ph.D. degree in Electrical Engineering from the University of California, Los Angeles, in 2006 and has since been a Research Staff Member at the IBM Thomas J. Watson Research Center. His research interests include cognitive computing, big data analytics, deep learning, smarter energy, and the application of cognitive computing to industrial solutions.


Tutorial-2: Accelerating Deep Neural Networks on FPGAs

Speakers:
Zhiru Zhang, School of ECE, Cornell University
Deming Chen, Dept. of ECE, University of Illinois at Urbana‐Champaign

Abstract:

Deep neural networks (DNNs) are now the state-of-the-art for solving a rich variety of problems in artificial intelligence, from computer vision and image analysis to speech recognition, machine translation, and game playing. DNNs outperform older methods in accuracy, but require vast amounts of computation and memory. As a result, the training and inference of modern deep learning algorithms are almost exclusively done on large clusters of CPUs and GPUs. One additional advantage of such platforms is the availability of compatible deep learning frameworks such as Caffe, Theano, or TensorFlow, which allow users to make use of the latest models or to train a custom network with reasonable engineering effort.
  While CPU and GPU clusters are currently the go-to platforms for DNNs and many other machine learning applications, a customized hardware solution on an FPGA can offer significant improvements in energy efficiency without losing adaptability to changing applications. Along this line, Microsoft has employed FPGAs at datacenter scale for cost-effective acceleration of deep learning. More recently, they have built an earth-scale FPGA-based network, enabling network flows to be programmably transformed at line rate [Caulfield16]. There are several other major efforts using FPGAs in the cloud, including the IBM SuperVessel Cloud [IBM15] and the new Intel CPU-FPGA deep learning accelerator card DLIA. Intel estimates that FPGAs will run 30% of data center servers by the year 2020 [Intel16]. Recent studies have shown that FPGAs can outperform GPUs for large neural networks [Han17, Zhang17]. For example, in [Zhang17], our implementation on a Xilinx VC709 FPGA achieved a 3.1x speedup over an NVIDIA K80 GPU with 17x higher energy efficiency. However, for designers who are not familiar with FPGA optimization opportunities, or who mainly use low-level RTL code to design an FPGA solution, a sizable gap remains between GPU and FPGA platforms in programming effort. This is especially distressing given the rate of algorithmic innovation in deep learning. Fortunately, we have seen steady improvement in FPGA design automation tools over the past decade. In particular, high-level synthesis (HLS) tools such as Xilinx Vivado HLS and the Intel FPGA SDK for OpenCL enable a user to write code in a high-level programming language. These tools have the potential to substantially reduce time-to-market for new accelerator designs and thus narrow the aforementioned innovation gap. We believe there is a critical need now to demonstrate the power of FPGA-specific design optimizations and HLS-centric design methodologies for deep learning to the design automation community and to inspire more researchers and students to explore this exciting and fast-growing topic.
  This tutorial will encompass two segments. The first segment will survey recent attempts at closing the gap between GPU and FPGA platforms in both DNN performance and design effort. The second segment will discuss the challenges and opportunities in domain-specific optimization and synthesis for FPGA-based DNN acceleration.

  • Segment 1 (Zhiru Zhang): We will begin with a brief overview of modern FPGA architectures and tool flows. Afterwards, we will survey a number of recent FPGA implementations of deep learning algorithms, including regular floating-point CNNs as well as the more recent low-precision neural nets [Zhao17] (a small software sketch of such a binarized kernel follows the segment descriptions). We will also cover the high-level synthesis design methodology for productive development of deep learning on FPGAs.
  • Segment 2 (Deming Chen): This segment will demonstrate how to use IPs and domain-specific HLS techniques developed for deep neural networks to meet optimization goals for large neural nets. Along the way, we will discuss several related critical issues, such as effective IP integration, high-level modeling, and design space exploration for large neural network designs on FPGAs. We will also discuss several important factors that define deep neural networks as a domain for FPGAs and evaluate their specific advantages.
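To make the low-precision idea concrete, the sketch below shows, in plain Python and purely as a conceptual illustration (not FPGA code or the speakers' implementation), how a binarized dot product reduces to XNOR plus popcount, the formulation used by binarized CNN accelerators such as [Zhao17]:

    # Binarized dot product: with weights and activations constrained to {-1, +1}
    # and encoded as bits (1 -> +1, 0 -> -1), multiply-accumulate reduces to an
    # XNOR followed by a popcount, which maps efficiently onto FPGA LUT fabric.

    N = 8  # vector length

    def to_bits(vec):
        """Pack a {-1,+1} vector into an integer bit mask (bit i set if vec[i] == +1)."""
        return sum(1 << i for i, v in enumerate(vec) if v == 1)

    def binary_dot(a_bits, w_bits):
        """Dot product of two {-1,+1} vectors given as bit masks."""
        xnor = ~(a_bits ^ w_bits) & ((1 << N) - 1)   # bit set where the signs agree
        matches = bin(xnor).count("1")               # popcount
        return 2 * matches - N                       # agreements minus disagreements

    activations = [1, -1, 1, 1, -1, -1, 1, -1]
    weights     = [1, 1, -1, 1, -1, 1, 1, -1]

    reference = sum(a * w for a, w in zip(activations, weights))
    print(binary_dot(to_bits(activations), to_bits(weights)), "==", reference)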

References:

  • [IBM15] “New OpenPOWER Cloud Boosts Ecosystem for Innovation and Development,” https://www-03.ibm.com/press/us/en/pressrelease/47082.wss
  • [Intel16] “The First Chip from Intel’s Altera Buy Will Be out in 2016,” https://fortune.com/2015/11/18/intel-xeon-fpga-chips
  • [Caulfield16] A. Caulfield et al., “A Cloud-Scale Acceleration Architecture,” IEEE/ACM Int’l Symp. on Microarchitecture (MICRO), Dec. 2016.
  • [Han17] S. Han, J. Kang, H. Mao, Y. Li, D. Xie, H. Luo, Y. Wang, H. Yang, and W. J. Dally, “ESE: Efficient Speech Recognition Engine for Sparse LSTM on FPGA,” Int’l Symp. on Field-Programmable Gate Arrays (FPGA), Feb. 2017.
  • [Zhang17] X. Zhang, X. Liu, A. Ramachandran, C. Zhuge, O. Peng, Z. Cheng, K. Rupnow, and D. Chen, “High-Performance Video Content Recognition with Long-term Recurrent Convolutional Network for FPGA,” Int’l Conf. on Field-Programmable Logic and Applications (FPL), Sep. 2017.
  • [Zhao17] R. Zhao, W. Song, W. Zhang, T. Xing, J.-H. Lin, M. Srivastava, R. Gupta, and Z. Zhang, “Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs,” Int’l Symp. on Field-Programmable Gate Arrays (FPGA), Feb. 2017.

Biographies:

Deming Chen is a professor in the ECE department of the University of Illinois at Urbana-Champaign. He is a research professor in the Coordinated Science Laboratory and an affiliate professor in the CS department. His current research interests include system-level and high-level synthesis, computational genomics, GPU and reconfigurable computing, and hardware security. He has given about 90 invited talks sharing these research results worldwide. Dr. Chen is a technical committee member for a series of top conferences and symposia on EDA, FPGA, low-power design, and VLSI systems design. He has also served as Program or General Chair, TPC Track Chair, Session Chair, Panelist, Panel Organizer, or Moderator for some of these conferences. He is an associate editor for several IEEE and ACM journals. He received the NSF CAREER Award in 2008 and six Best Paper Awards at ASPDAC'09, SASP'09, FCCM'11, SAAHPC'11, CODES+ISSS'13, and ICCAD'15. He received the ACM SIGDA Outstanding New Faculty Award in 2010 and IBM Faculty Awards in 2014 and 2015. He is the Donald Biggar Willett Faculty Scholar and is included in the List of Teachers Ranked as Excellent in 2008. He was previously involved in two startup companies, both of which were acquired. In 2016, he co-founded a new startup, Inspirit IoT, Inc., for design and synthesis for machine learning targeting the IoT industry. Inspirit IoT recently received an NSF SBIR (Small Business Innovation Research) Award from the US government.

Zhiru Zhang is an assistant professor in the School of ECE at Cornell University and a member of the Computer Systems Laboratory. His current research focuses on high-level design automation for heterogeneous computing. His work has been recognized with a best paper award from TODAES (2012), the Ross Freeman Award for Technical Innovation from Xilinx (2012), an NSF CAREER Award (2015), a DARPA Young Faculty Award (2015), and the IEEE CEDA Ernest S. Kuh Early Career Award (2015). He co-founded AutoESL Design Technologies, Inc. to commercialize his PhD research on high-level synthesis. AutoESL was acquired by Xilinx in 2011, and the AutoESL tool is now known as Vivado HLS.


Tutorial-3: Machine Learning for Reliability Monitoring, Mitigation and Adaptation

Speaker:
Mehdi B. Tahoori, Karlsruhe Institute of Technology (KIT)

Abstract:

With the increasing complexity of digital systems and the use of advanced nanoscale technology nodes, various process and runtime variabilities threaten the correct operation of these systems. The interdependence of these reliability detractors, and their dependence on circuit structure as well as on the running workload, makes it very hard to derive simple deterministic models to analyze and target them. As a result, machine learning techniques can be used to extract information that effectively monitors and improves the reliability of digital systems. These learning schemes are typically performed offline on large data sets in order to obtain various regression models, which are then used during runtime operation to predict the health of the system and guide appropriate adaptation and countermeasure schemes. The purpose of this tutorial is to discuss and evaluate various learning schemes for analyzing the reliability of a system with respect to runtime failure mechanisms that originate from process and runtime variabilities, such as thermal and voltage fluctuations, device and interconnect aging mechanisms, and radiation-induced soft errors.

  • Overview of important unreliability sources in advanced nano-scale technology nodes (15 min)
  • Modeling of unreliability sources (15 min): the dependence of unreliability sources on parameters such as temperature, supply voltage, and the running workload, and why modeling alone is not enough for reliability prediction.
  • Monitoring unreliability sources (10 min): overview of different monitoring approaches and their pros and cons:
    • In-situ sensors
    • Replica circuits
    • Machine learning techniques
  • Machine-learning-based monitoring (40 min):
    • Monitoring aging effects: employing machine learning to find a small set of so-called Representative Critical Gates (RCG) or Representative Timing-critical Flip-Flops (RTFF) whose workload is correlated with the degradation of the entire circuit (a small sketch of such a learned monitor follows this outline).
    • Monitoring soft errors: employing machine learning to predict the soft-error vulnerability of circuits/memories based on monitoring the signal probabilities (SPs) of a small set of flip-flops.
    • Monitoring voltage droop: employing machine learning to predict voltage droop and its effect on circuit timing based on monitoring a sequence of circuit inputs.
  • Learning-based adaptation and mitigation techniques (40 min): with proactive monitoring approaches, reliability degradation can be predicted before an error happens, so mitigation and adaptation actions can be applied in a timely manner. This part gives an overview of these adaptation and mitigation techniques.
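As a loose illustration of the representative-monitor idea above, the sketch below trains a linear regression model offline that predicts circuit-level delay degradation from the switching activity of a few monitored flip-flops, and then uses it at runtime. This is a generic toy example, not the tutorial's actual flow; the synthetic data, feature choice, and plain least-squares model are simplifying assumptions.

    import numpy as np

    rng = np.random.default_rng(1)

    # Offline training data: each row holds the observed switching activity of a
    # few monitored (representative) flip-flops for one workload phase; the target
    # is the measured/simulated delay degradation of the entire circuit.
    n_phases, n_monitors = 200, 4
    activity = rng.uniform(0.0, 1.0, size=(n_phases, n_monitors))
    true_weights = np.array([0.8, 0.3, 1.1, 0.5])           # hidden "ground truth"
    degradation = activity @ true_weights + rng.normal(0.0, 0.02, n_phases)

    # Offline: fit a linear regression model (least squares) to the training data.
    X = np.column_stack([activity, np.ones(n_phases)])       # add an intercept term
    coef, *_ = np.linalg.lstsq(X, degradation, rcond=None)

    # Runtime: predict circuit-level degradation from cheap monitor readings only.
    new_activity = rng.uniform(0.0, 1.0, size=n_monitors)
    predicted = np.append(new_activity, 1.0) @ coef
    print("predicted delay degradation (arbitrary units):", round(float(predicted), 3))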

Biography:

Mehdi Tahoori is a full professor and Chair of Dependable Nano-Computing (CDNC) at the Institute of Computer Science & Engineering (ITEC), Department of Computer Science, Karlsruhe Institute of Technology (KIT), Germany. He received his Ph.D. and M.S. degrees in Electrical Engineering from Stanford University in 2003 and 2002, respectively, and a B.S. in Computer Engineering from Sharif University of Technology, Iran, in 2000. In 2003, he joined the Electrical and Computer Engineering Department at Northeastern University as an assistant professor, where he was promoted to the rank of associate professor with tenure in 2009. From August to December 2015, he was a visiting professor at the VLSI Design and Education Center (VDEC), University of Tokyo, Japan. From 2002 to 2003, he was a Research Scientist with Fujitsu Laboratories of America, Sunnyvale, CA, engaged in advanced computer-aided research on reliability issues in deep-submicrometer mixed-signal very large-scale integration (VLSI) designs.
  Prof. Tahoori was a recipient of the National Science Foundation Faculty Early Career Development (CAREER) Award. He has been a program committee member, organizing committee member, track and topic chair, as well as workshop, panel, and special session organizer of various conferences and symposia in the areas of VLSI design automation, testing, reliability, and emerging nanotechnologies, such as ITC, VTS, DAC, ICCAD, DATE, ETS, ICCD, ASP-DAC, GLSVLSI, and VLSI Design. He is currently an associate editor of IEEE Design & Test (D&T), coordinating editor of the Springer Journal of Electronic Testing (JETTA), associate editor of the VLSI Integration Journal, and associate editor of IET Computers and Digital Techniques. He was an associate editor of the ACM Journal on Emerging Technologies in Computing Systems. He has received a number of best paper nominations and awards at various conferences and journals, including ICCAD 2015 and TODAES 2017. He is the Chair of the ACM SIGDA Technical Committee on Test and Reliability.


Tutorial-4: Harnessing Data Science for the HW Verification Process

Speakers:
Avi Ziv (IBM Research)
Raviv Gal (IBM Research)

Abstract:

Modern verification is a highly automated process that involves many tools and subsystems. These verification tools, which we sometimes refer to as data sources, produce large amounts of data that are essential for understanding the state and progress of the verification process. The growing complexity of verification, the amount of data it produces, and the complex relations between the data sources call for data science techniques such as statistics, data visualization, data mining, and machine learning to extract the essence of the input data and present it to the users in a simple and clear manner.
  The goal of the tutorial is to teach the audience how to harness the power of data science in tools and systems that improve the verification process and assist verification teams in understanding and managing it. The tutorial begins with a brief overview of the challenges and benefits of building a system that stores, processes, and analyzes the verification data of verification projects. We then describe various components and aspects of such a system, starting from the decision of what data to collect, through methods and schemas for storing the data, to techniques for analyzing the data and displaying the analysis results. The main focus of the tutorial is on specific analysis techniques, accompanied by concrete examples. We conclude the tutorial with open research problems that can lead to the “holy grail” of cognitive verification.
  The tutorial uses, as an example, the IBM Verification Cockpit, a platform for collecting, processing, and analyzing verification data. The IBM Verification Cockpit is used by all major hardware system design projects in IBM.
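As a toy illustration of the kind of analysis such a platform automates (a generic sketch, not the IBM Verification Cockpit; the event names and data layout are invented for illustration), the snippet below aggregates per-test functional coverage records and reports coverage holes and rarely hit events:

    from collections import Counter

    # Invented example data: for each test, the functional coverage events it hit.
    coverage_db = {
        "test_001": ["fifo_full", "cache_hit", "cache_hit"],
        "test_002": ["cache_miss", "fifo_full"],
        "test_003": ["cache_hit", "retry"],
    }
    all_events = {"fifo_full", "fifo_empty", "cache_hit", "cache_miss", "retry"}

    # Aggregate hit counts across the whole regression.
    hits = Counter(ev for events in coverage_db.values() for ev in events)

    holes = sorted(all_events - hits.keys())                  # never hit
    rare = sorted(ev for ev, n in hits.items() if n == 1)     # hit exactly once

    print("coverage holes:", holes)
    print("rarely hit events:", rare)
    print("overall coverage: {:.0%}".format(len(hits) / len(all_events)))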

Biography:

Dr. Avi Ziv is a Research Staff Member in the Hardware Verification Technologies Department at the IBM Research Laboratory in Haifa, Israel. Since joining IBM in 1996, Avi has been developing technologies and methodologies for various aspects of simulation-based functional verification, including stimuli generation, checking, functional coverage, and coverage-directed generation. In recent years, the main focus of Avi’s work has been bringing data science to the hardware verification world.
  Avi received the B.Sc. degree in Computer Engineering from the Technion-Israel Institute of Technology in 1990 and the M.Sc. and Ph.D. degrees in Electrical Engineering from Stanford University in 1992 and 1995, respectively. He is the author of more than 50 papers and the inventor of more than 10 patents, mostly in the area of functional verification.

Raviv Gal joined the Verification and Analytics area at the IBM Haifa Research Lab in 2011 and is a project leader in utilizing data science for hardware verification. Before that, Raviv spent 12 years at Marvell Israel, where he held several roles in formal and dynamic verification, led project verification, and was the verification leader of the packet processor product line.
  Raviv received his B.A. in Mathematics and Computer Science and his M.A. in Computer Science from Tel Aviv University.


Tutorial-5: Flash Memory Design and Management

Speaker:
Yuan-Hao Chang (Institute of Information Science, Academia Sinica)

Abstract:

Embedded systems, especially battery-powered consumer electronics and mobile computing systems such as smartphones and IoT devices, usually adopt flash-memory devices as their storage systems. Due to shrinking fabrication processes and advances in manufacturing technology, the density and capacity of flash-memory chips have grown dramatically in recent years. In particular, flash memory has moved from single-level cells to multi-level cells with shrinking cell sizes, and the flash organization has moved from 2D to 3D. However, this design trend brings performance, reliability, and endurance issues to flash memory, especially as it is adopted in more and more application scenarios. To resolve these issues, various flash management strategies have been proposed and investigated in the past decade.
  In this tutorial, we will start with a basic introduction to the organization/structure and the development history of flash memory. Next, the application domains and challenges of flash memory will be introduced, along with the best-known management strategies for tackling its design issues. Finally, future development trends and challenges of flash memory will be discussed, with some design examples illustrating future research directions. The target audience of this tutorial consists of (1) young students and researchers who are interested in system design, and (2) system researchers who are interested in non-volatile memory technology, with a prime focus on the embedded software management/design of memory/storage subsystems in embedded systems.
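To give a concrete flavor of one such management strategy (a highly simplified sketch, not taken from the tutorial; real flash translation layers also handle garbage collection, bad blocks, and mapping persistence), the snippet below models page-level address mapping with out-of-place updates and wear-aware page allocation:

    PAGES_PER_BLOCK = 4
    NUM_BLOCKS = 8

    # Simplified flash model: one erase counter per block, one free-page cursor.
    erase_count = [0] * NUM_BLOCKS
    next_free_page = [0] * NUM_BLOCKS
    mapping = {}          # logical page number -> (block, page) of the live copy

    def allocate_page():
        """Wear-aware allocation: pick a free page in the least-erased block."""
        candidates = [b for b in range(NUM_BLOCKS) if next_free_page[b] < PAGES_PER_BLOCK]
        block = min(candidates, key=lambda b: erase_count[b])
        page = next_free_page[block]
        next_free_page[block] += 1
        return block, page

    def write(lpn, data):
        """Out-of-place update: a write always goes to a fresh physical page."""
        mapping[lpn] = allocate_page()     # the old copy (if any) becomes stale
        print(f"LPN {lpn} -> block {mapping[lpn][0]}, page {mapping[lpn][1]}: {data!r}")

    # Repeated writes to the same logical page land on different physical pages.
    write(0, "v1")
    write(0, "v2")
    write(7, "hello")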

Biography:

Yuan-Hao Chang (Johnson Chang) received his Ph.D. degree from National Taiwan University in 2009. He joined the Institute of Information Science, Academia Sinica, Taipei, Taiwan, as an assistant research fellow in Aug. 2011 and was promoted to associate research fellow in Mar. 2015. Previously, he was a software/firmware design engineer at VeriFone, a division of Hewlett-Packard, Taipei, Taiwan.
   Dr. Chang has published more than one hundred research papers in highly reputable international journals and conferences. They were mainly published in top journals (e.g., IEEE TC, IEEE TVLSI, IEEE TCAD, ACM TECS, ACM TODAES, and ACM TOS) and conferences (e.g., ACM/IEEE DAC, ACM/IEEE ICCAD, ACM/IEEE ISLPED, and ACM/IEEE CODES). He has been granted dozens of US and Taiwan patents. His research received best paper nominations from top conferences (ACM/IEEE DAC 2016, ACM/IEEE DAC 2014, ACM/IEEE CODES 2014, and ACM/IEEE DAC 2007) as well as from ACM/IEEE ASP-DAC 2016. He also received the Ta-You Wu Memorial Award from the Ministry of Science and Technology (MOST) in 2015, the Outstanding Youth Electrical Engineer Award from the Chinese Institute of Electrical Engineering (CIEE) in 2014, and the Excellent Research Project Award from the National Program for Intelligent Electronics (NPIE) of the Ministry of Science and Technology (MOST) in 2014.
   Dr. Chang also serves as a program committee member for many important conferences and workshops (e.g., ACM/IEEE DAC, ACM/IEEE ASP-DAC, and IEEE ICDCS) and as a reviewer for premier journals (e.g., IEEE TC/TCAD/TKDE/TMSCS/TR/TVLSI and ACM TECS/TODAES/TOS). He served as program co-chair of the IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA) 2017 and as local co-chair of the ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED) 2017. His research interests include memory/storage systems, operating systems, embedded systems, and real-time systems. He is a Senior Member of IEEE and a Lifetime Member of ACM.


Tutorial-6: IC Design and Technology Co-Optimizations in Extreme Scaling

Speaker:
David Z. Pan (The University of Texas at Austin)

Abstract:

As the semiconductor industry enters the era of extreme scaling, IC design and manufacturing challenges are exacerbated, which calls for increasing design and technology co-optimization (DTCO). DTCO requires cross-layer information feed-forward and feed-back to enable overall design and manufacturing closure and optimization. This tutorial will present key challenges and practices in enabling such DTCO, from mask synthesis, to standard cell design, to physical design. Topics covered include machine-learning-based lithography hotspot detection and mask synthesis, standard cell pin access and routing planning/optimization, and placement under multiple patterning lithography (MPL). As new process technologies are proposed (e.g., new transistor structures, new materials) and new design requirements emerge (e.g., hardware security), we expect to see many new opportunities for synergistic design and technology co-optimization. We will show some case studies on these emerging DTCO issues and conclude with future research directions. The outline of the tutorial is as follows (a brief illustrative sketch of learning-based hotspot detection appears after the outline).

  • Introduction
  • Lithography hotspot detection using machine learning and deep learning
  • Machine learning based mask synthesis (OPC and SRAF insertion)
  • Standard cell pin access and routing planning/optimization under MPL
  • MPL aware detailed placement
  • DTCO with selective etching and DSA
  • DTCO for hardware security
  • Conclusion
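As a minimal illustration of the learning-based hotspot detection idea from the outline above (a generic sketch with invented features and synthetic data, not the methods presented in the tutorial), the snippet below fits a nearest-centroid classifier on two layout-clip features and uses it to flag potentially hotspot-prone clips:

    import numpy as np

    rng = np.random.default_rng(7)

    # Invented training data: each layout clip is summarized by two features
    # (local pattern density, minimum spacing in nm); label 1 = known hotspot.
    hotspots     = rng.normal(loc=[0.78, 46.0], scale=[0.04, 3.0], size=(40, 2))
    non_hotspots = rng.normal(loc=[0.55, 70.0], scale=[0.06, 6.0], size=(160, 2))
    X = np.vstack([hotspots, non_hotspots])
    y = np.array([1] * 40 + [0] * 160)

    # Normalize the features, then compute one centroid per class.
    mean, std = X.mean(axis=0), X.std(axis=0)
    Xn = (X - mean) / std
    centroids = {label: Xn[y == label].mean(axis=0) for label in (0, 1)}

    def predict(clip_features):
        """Label a new clip by the nearest class centroid in normalized space."""
        z = (np.asarray(clip_features) - mean) / std
        return min(centroids, key=lambda label: np.linalg.norm(z - centroids[label]))

    # Screen two new clips; the first resembles the hotspot population.
    print(predict([0.80, 45.0]))   # expected: 1 (likely hotspot)
    print(predict([0.50, 75.0]))   # expected: 0 (likely non-hotspot)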

Biography:

David Z. Pan received his B.S. degree from Peking University in 1992 and his Ph.D. degree in Computer Science from UCLA in 2000. He was a Research Staff Member at the IBM T. J. Watson Research Center from 2000 to 2003. He is currently a full professor and holder of the Engineering Foundation Endowed Professorship #1 in the Department of Electrical and Computer Engineering at the University of Texas at Austin. He has published over 290 refereed journal/conference papers and 8 US patents, and has graduated over 20 PhD students. He has served on many premier journal editorial boards and conference committees, including various leadership roles. He has received a number of awards, including the SRC Technical Excellence Award, 14 Best Paper Awards, the DAC Top 10 Author Award in Fifth Decade, the ASP-DAC Frequently Cited Author Award, Communications of the ACM Research Highlights, the ACM/SIGDA Outstanding New Faculty Award, the NSF CAREER Award, the NSFC Overseas and Hong Kong/Macau Scholars Collaborative Research Award, the SRC Inventor Recognition Award three times, the IBM Faculty Award four times, the UCLA Engineering Distinguished Young Alumnus Award, the UT Austin RAISE Faculty Excellence Award, and many international CAD contest awards, among others. He is a Fellow of IEEE and SPIE.

Last Updated: Jan 18, 2018