Special Sessions
- Date: January 21-23, 2014
- Place: Room 302
Session | Date/Time | Title |
1S-1 | 10:40 - 11:05 Tuesday, January 21, 2014 | Normally-Off Computing: Towards Zero Stand-by Power Management |
1S-2 | 11:05 - 11:30 Tuesday, January 21, 2014 | Novel Nonvolatile Memory Hierarchies to Realize "Normally-Off Mobile Processors" |
1S-3 | 11:30 - 11:55 Tuesday, January 21, 2014 | Normally-Off MCU Architecture for Low-Power Sensor Node |
1S-4 | 11:55 - 12:20 Tuesday, January 21, 2014 | Normally-Off Technologies for Healthcare Appliance |
2S-1 | 13:50 - 14:20 Tuesday, January 21, 2014 | Applying VLSI EDA to Energy Distribution System Design |
2S-2 | 14:20 - 14:50 Tuesday, January 21, 2014 | A Model-Based Design of Cyber-Physical Energy Systems |
2S-3 | 14:50 - 15:20 Tuesday, January 21, 2014 | The Data Center as a Grid Load Stabilizer |
3S-1 | 15:50 - 16:20 Tuesday, January 21, 2014 | A Silicon Nanodisk Array Structure Realizing Synaptic Response of Spiking Neuron Models with Noise |
3S-2 | 16:20 - 16:50 Tuesday, January 21, 2014 | Energy Efficient In-Memory Machine Learning for Data Intensive Image-Processing by Non-Volatile Domain-Wall Memory |
3S-3 | 16:50 - 17:20 Tuesday, January 21, 2014 | Lessons from the Neurons Themselves |
4S-1 | 10:10 - 10:40 Wednesday, January 22, 2014 | SDG2KPN: System Dependency Graph to Function-Level KPN Generation of Legacy Code for MPSoCs |
4S-2 | 10:40 - 11:10 Wednesday, January 22, 2014 | Low Power Design of the Next-Generation High Efficiency Video Coding |
4S-3 | 11:10 - 11:40 Wednesday, January 22, 2014 | Mapping Complex Algorithm into FPGA with High Level Synthesis |
4S-4 | 11:40 - 12:10 Wednesday, January 22, 2014 | Leveraging Parallelism in the Presence of Control Flow on CGRAs |
5S-1 | 13:50 - 14:20 Wednesday, January 22, 2014 | Soft Error Resiliency Characterization on IBM BlueGene/Q Processor |
5S-2 | 14:20 - 14:50 Wednesday, January 22, 2014 | Resiliency for Many-Core System on a Chip |
5S-3 | 14:50 - 15:20 Wednesday, January 22, 2014 | Rethinking Error Injection for Effective Resilience |
6S-1 | 15:50 - 16:20 Wednesday, January 22, 2014 | Accurate and Inexpensive Performance Monitoring for Variability-Aware Systems |
6S-2 | 16:20 - 16:50 Wednesday, January 22, 2014 | Quantifying Workload Dependent Reliability in Embedded Processors |
6S-3 | 16:50 - 17:20 Wednesday, January 22, 2014 | QED Post-Silicon Validation and Debug: Frequently Asked Questions |
7S-1 | 10:10 - 10:40 Thursday, January 23, 2014 | Spiking Brain Models: Computation, Memory and Communication Constraints for Custom Hardware Implementation |
7S-2 | 10:40 - 11:10 Thursday, January 23, 2014 | Advanced Technologies for Brain-Inspired Computing |
7S-3 | 11:10 - 11:40 Thursday, January 23, 2014 | GPGPU Accelerated Simulation and Parameter Tuning for Neuromorphic Applications |
7S-4 | 11:40 - 12:10 Thursday, January 23, 2014 | A Scalable Custom Simulation Machine for the Bayesian Confidence Propagation Neural Network Model of the Brain |
8S-1 | 13:50 - 14:15 Thursday, January 23, 2014 | An Overview of Spin-Based Integrated Circuits |
8S-2 | 14:15 - 14:40 Thursday, January 23, 2014 | Advances in Spintronics Devices for Microelectronics - from Spin-Transfer Torque to Spin-Orbit Torque |
8S-3 | 14:40 - 15:05 Thursday, January 23, 2014 | Hybrid CMOS/Magnetic Process Design Kit and SOT-Based Non-Volatile Standard Cell Architectures |
8S-4 | 15:05 - 15:30 Thursday, January 23, 2014 | Architectural Aspects in Design and Analysis of SOT-Based Memories |
9S-1 | 15:50 - 16:30 Thursday, January 23, 2014 | The Role of Photons in Cryptanalysis |
9S-2 | 16:30 - 17:10 Thursday, January 23, 2014 | SPADs for Quantum Random Number Generators and Beyond |
9S-3 | 17:10 - 17:50 Thursday, January 23, 2014 | Quantum Key Distribution with Integrated Optics |
Session 1S
Special Session: Normally-Off Computing: Towards Zero Stand-by Power Management
Time: 10:40 - 12:20 Tuesday, January 21, 2014
Organizer: Hiroshi Nakamura (University of Tokyo, Japan)
Title |
(Invited Paper) Normally-Off Computing Project: Challenges and Opportunities |
Author |
*Hiroshi Nakamura, Takashi Nakada, Shinobu Miwa (The University of Tokyo, Japan) |
Abstract |
Normally-Off is a way of computing which aggressively powers off components of computer systems when they need not operate. Simple power gating cannot fully exploit the opportunities for power reduction because volatile memories lose data when power is turned off. Recently, new non-volatile memories (NVMs) have appeared, and much attention has been paid to normally-off computing using these NVMs. In this paper, its prospects and challenges are addressed with a brief introduction of our project, which started in 2011. |
Title |
(Invited Paper) Novel Nonvolatile Memory Hierarchies to Realize "Normally-Off Mobile Processors" |
Author |
*Shinobu Fujita, Kumiko Nomura, Hiroki Noguchi, Susumu Takeda, Keiko Abe (Toshiba Corporation, Japan) |
Abstract |
This paper presents a novel processor architecture for high-performance (HP) processors with nonvolatile/volatile hybrid cache memory. Simulations of an HP processor using MTJs show that the total power of an HP processor using perpendicular (p-)STT-MRAM can be reduced by over 90% with little degradation of processor performance. The presented architecture with a nonvolatile memory hierarchy will realize “normally-off computers”. |
Title |
(Invited Paper) Normally-Off MCU Architecture for Low-Power Sensor Node |
Author |
*Masanori Hayashikoshi, Yohei Sato, Hiroshi Ueki, Hiroyuki Kawai, Toru Shimizu (Renesas Electronics Corporation, Japan) |
Abstract |
The production volume of sensor nodes is increasing greatly with the development of cyber-physical systems. It is therefore becoming important to reduce the power consumption of huge numbers of sensor nodes. In this work, a normally-off microcontroller architecture for future low-power sensor nodes is proposed. To realize true low-power effects with normally-off computing technology, co-design of hardware and software is very important. In this work, the power consumption of sensor nodes can be reduced by around 70%. |
Title |
(Invited Paper) Normally-Off Technologies for Healthcare Appliance |
Author |
*Shintaro Izumi, Hiroshi Kawaguchi, Yoshimoto Masahiko (Kobe University, Japan), Yoshikazu Fujimori (Rohm, Japan) |
Abstract |
Battery mass and power consumption of wearable systems must be reduced because the key factors affecting wearable system usability are miniaturization and weight reduction. This report describes a wearable biosignal monitoring system using normally-off technologies to minimize the power consumption. In particular, we focus on daily-life monitoring and an electrocardiograph processor. Our system employs FeRAM and Near Field Communication (NFC). A robust heart rate monitor and a Cortex M0 core are used for on-node processing to reduce the logged data. |
Session 2S
Special Session: EDA for Energy
Time: 13:50 - 15:30 Tuesday, January 21, 2014
Organizer: Fadi Kurdahi (University of California, Irvine, U.S.A.), Sani Nassif (IBM Austin Research Lab, U.S.A.), Mohammad Al Faruque (University of California, Irvine, U.S.A.)
2S-1 (Time: 13:50 - 14:20)
Title |
(Invited Paper) Applying VLSI EDA to Energy Distribution System Design |
Author |
*Sani Nassif, Gi-Joon Nam, Jerry Hayes (IBM Austin Research Laboratory, U.S.A.), Sani Fakhouri (University of California, Irvine, U.S.A.) |
Abstract |
Energy distribution networks refer to that part of the electricity network that delivers power to homes and businesses. It is reported that significant amounts of energy are being wasted simply due to inefficiencies in this network. Further, this domain is rapidly changing with new types of loads such as electric vehicles or the spread of new types of energy sources such as photo-voltaic and wind. In this paper, we demonstrate a comprehensive design automation capability for energy distribution networks leading to a much more flexible yet effective system. The new system's capabilities include power load distribution and transfers, equipment upgrading, geospatial-aware network optimization, outage identification, contingency planning and loss analysis/reduction. These features are enabled by advanced simulation, analysis and optimization engines that are adapted from those available in the traditional VLSI design automation area. The paper will conclude with potential future research directions that require further innovations in energy distribution networks. |
Title |
(Invited Paper) A Model-Based Design of Cyber-Physical Energy Systems |
Author |
Mohammad Abdullah Al Faruque, *Fereidoun Ahourai (University of California, Irvine, U.S.A.) |
Abstract |
Cyber-Physical Energy Systems (CPES) are an amalgamation of power grid technology and the intelligent communication and co-ordination between the supply and the demand side through distributed embedded computing. Through this combination, CPES are intended to deliver power efficiently, reliably, and economically. The design and development work needed to either implement a new power grid network or upgrade a traditional power grid to a CPES-compliant one is both challenging and time consuming due to the heterogeneous nature of the associated components/subsystems. The Model Based Design (MBD) methodology has been widely seen as a promising solution to address the associated design challenges of creating a CPES. In this paper, we demonstrate an MBD method and its associated tool for the purpose of designing and validating various control algorithms for a residential microgrid. Our presented co-simulation engine GridMat is a MATLAB/Simulink toolbox; its purpose is to co-simulate the power systems modeled in GridLAB-D as well as the control algorithms that are modeled in Simulink. We present various use cases to demonstrate how different levels of control algorithms may be developed, simulated, debugged, and analyzed by using our GridMat toolbox for a residential microgrid. |
Title |
(Invited Paper) The Data Center as a Grid Load Stabilizer |
Author |
Hao Chen, Michael C. Caramanis, *Ayse K. Coskun (Boston University, U.S.A.) |
Abstract |
To accommodate the increasing presence of volatile and intermittent renewable energy sources in power generation, independent system operators (ISOs) offer opportunities for demand side regulation service (RS) so as to stabilize the grid load. These power market features allow the demand side to earn monetary credits by modulating its power consumption dynamically following an RS signal broadcast by the ISO. This paper studies the capacities and benefits of a major potential demand side, the data center, to provide RS. We propose a dynamic control policy that modulates the data center power consumption in response to ISO requests by leveraging server power capping techniques and various server power states. Results demonstrate that using our policy, data centers can provide fast reserves in quantities that are substantial proportions (around 50%) of their average energy consumption, with no major deterioration in quality of service (QoS). By doing so, data centers decrease their energy costs by around 50%, while providing the ISOs and society in general with cost-effective demand side reserves that render massive renewable generation adoption affordable. |
Session 3S
Special Session: Neuron Inspired Computing using Nanotechnology
Time: 15:50 - 17:30 Tuesday, January 21, 2014
Organizer: Kevin Cao (Arizona State University, U.S.A.), Sarma Vrudhula (Arizona State University, U.S.A.)
Title |
(Invited Paper) A Silicon Nanodisk Array Structure Realizing Synaptic Response of Spiking Neuron Models with Noise |
Author |
*Takashi Morie, Haichao Liang, Yilai Sun, Takashi Tohara (Kyushu Institute of Technology, Japan), Makoto Igarashi, Seiji Samukawa (Tohoku University, Japan) |
Abstract |
In the implementation of spiking neuron models, which can achieve realistic neuron operation, generation of post-synaptic potentials (PSPs) is an essential function. We have already proposed a new nanodisk array structure for generating PSPs using delay in electron hopping among nanodisks. Generated PSPs have fluctuation caused by stochastic electron movement. Noise or fluctuation is effectively used in neural processing. In this paper, we review our proposed structure and show fluctuation controllability based on single-electron circuit simulation. |
Title |
(Invited Paper) Energy Efficient In-Memory Machine Learning for Data Intensive Image-Processing by Non-Volatile Domain-Wall Memory |
Author |
*Hao Yu, Yuhao Wang, Shuai Chen, Wei Fei (Nanyang Technological University, Singapore), Chuliang Weng, Junfeng Zhao, Zhulin Wei (Huawei Shannon Laboratory, China) |
Abstract |
Image processing in conventional logic-memory I/O-integrated systems will incur significant communication congestion at memory I/Os because of excessively large image data at exa-scale. This paper explores in-memory machine learning on a neural network architecture, called DW-NN, that utilizes the newly introduced domain-wall nanowire. We show that all operations involved in machine learning on a neural network can be mapped to a logic-in-memory architecture by non-volatile domain-wall nanowire. Domain-wall nanowire based logic is customized for in-memory machine learning within image data storage. As such, both neural network training and processing can be performed locally within the memory. The experimental results show that system throughput in DW-NN is improved by 11.6x and the energy efficiency is improved by 92x when compared to a conventional image processing system. |
Title |
(Invited Paper) Lessons from the Neurons Themselves |
Author |
*Louis Scheffer (Howard Hughes Medical Institute, U.S.A.) |
Abstract |
Natural neural circuits, optimized by millions of years of evolution, are fast, low power, and robust, all characteristics we would love to have in systems we ourselves design. Recently there have been enormous advances in understanding how neurons implement computations within the brains of living creatures. Can we use this new-found knowledge to create better artificial systems? What lessons can we learn from the neurons themselves that can help us create better neuromorphic circuits? |
Session 4S
Special Session: Design Automation Methods for Highly-Complex Multimedia Systems
Time: 10:10 - 12:15 Wednesday, January 22, 2014
Organizer: Sri Parameswaran (University of New South Wales, Australia)
4S-1 (Time: 10:10 - 10:40)
Title |
(Invited Paper) SDG2KPN: System Dependency Graph to Function-Level KPN Generation of Legacy Code for MPSoCs |
Author |
Jude Angelo Ambrose, Jorgen Peddersen (University of New South Wales, Australia), Alvin Labios, Yusuke Yachide (Canon Information Systems Research Australia (CiSRA), Australia), *Sri Parameswaran (University of New South Wales, Australia) |
Abstract |
The Multiprocessor System-on-Chip (MPSoC) paradigm as a viable implementation platform for parallel processing has expanded to encompass embedded devices. The ability to execute code in parallel gives MPSoCs the potential to achieve high performance with low power consumption. In order for sequential legacy code to take advantage of the MPSoC design paradigm, it must first be partitioned into data flow graphs (such as Kahn Process Networks --- KPNs) to ensure the data elements can be correctly passed between the separate processing elements that operate on them. Existing techniques are inadequate for use in complex legacy code. This paper proposes SDG2KPN, a System Dependency Graph to KPN conversion methodology targeting the conversion of legacy code. By creating KPNs at the granularity of the function-/procedure-level, SDG2KPN is the first of its kind to support shared and global variables as well as many more program patterns/application types. We also provide a design flow which allows the creation of MPSoC systems utilizing the produced KPNs. We demonstrate the applicability of our approach by retargeting several sequential applications to the Tensilica MPSoC framework. Our system parallelized AES, an application of 950 lines, in 4.8 seconds, while H.264, of 57896 lines, took 164.9 seconds to parallelize. |
Title |
(Invited Paper) Low Power Design of the Next-Generation High Efficiency Video Coding |
Author |
*Muhammad Shafique, Jörg Henkel (Karlsruhe Institute of Technology, Germany) |
Abstract |
This paper provides a comprehensive analysis of the computational complexity, temperature, and memory access behavior for the next-generation High Efficiency Video Coding (HEVC) standard. We highlight the associated design challenges and present several low-power algorithmic and architectural techniques for developing power-efficient HEVC-based multimedia systems. We explore the interplay between the algorithms and architectures to provide high power efficiency while leveraging the application-specific knowledge and video content characteristics. |
Title |
(Invited Paper) Mapping Complex Algorithm into FPGA with High Level Synthesis |
Author |
*Kazutoshi Wakabayashi, Takashi Takenaka, Hiroaki Inoue (NEC Corp., Japan) |
Abstract |
This presentation discusses the comparison between “Reconfigurable Chip with High Level Synthesis” and “CPU, GPGPU with compiler such as CUDA” from the compiler perspective. Initially, we introduce several demands for acceleration with FPGAs to achieve low-latency calculation and control. As an application example, we show high-frequency trading. We accelerate it with an FPGA NIC using C-based and SQL-based HLS, and show the necessity of a reconfigurable chip that is customizable with a high-level language. Then, we illustrate the difference between FPGAs and processors (CPU, GPGPU) with the “FSM+Datapath” model and examine how the architectural difference affects the delay and parallelism of operations. Next, we discuss parallelization of operations and threads with High Level Synthesis for FPGAs and with software compilers for processors. The main advantage of the former method is that it is able to parallelize operations beyond control dependencies, while the latter method has to obey control dependencies. Finally, some experimental results show that “FPGA and HLS” delivers better performance than a processor for control-intensive algorithms. |
Title |
(Invited Paper) Leveraging Parallelism in the Presence of Control Flow on CGRAs |
Author |
Jihyun Ryoo, Kyuseung Han, *Kiyoung Choi (Seoul National University, Republic of Korea) |
Abstract |
Coarse-Grained Reconfigurable Architectures (CGRAs) are suitable for accelerating data-intensive applications in embedded systems due to their high performance and power efficiency. However, as application programs become more complex and contain more control flows, it becomes harder to accelerate such programs on CGRAs. Previous research on this issue has focused on correct execution of control flows rather than their acceleration. This paper reveals how control flows degrade the performance of programs and proposes software approaches to accelerating control flows by exploiting the parallelism residing in each conditional as well as among conditionals. Experiments show that our proposed techniques improve performance by 2.51 times on average. |
Session 5S
Special Session: Billion Chips of Trillion Transistors
Time: 13:50 - 15:30 Wednesday, January 22, 2014
Organizer: Chen-Yong Cher (IBM TJ Watson Research Center, U.S.A.)
5S-1 (Time: 13:50 - 14:20)
Title |
(Invited Paper) Soft Error Resiliency Characterization on IBM BlueGene/Q Processor |
Author |
*Chen-Yong Cher, K. Paul Muller, Ruud A. Haring, David L. Satterfield, Thomas E. Musta, Thomas M. Gooding, Kristan D. Davis, Marc B. Dombrowa, Gerard V. Kopcsay, Robert M. Senger, Yutaka Sugawara, Krishnan Sugavanam (IBM T. J. Watson Research Center, U.S.A.) |
Abstract |
Fault injection through accelerated irradiation is an effective way to evaluate the overall soft error resiliency of microprocessors. In this work, we report on irradiation experiments on a Blue Gene/Q (BG/Q) compute processor chip running selected applications. Blue Gene/Q is the third generation of IBM’s massively parallel, energy efficient Blue Gene series of supercomputers. In the experiments, we found 26 code fails that are relevant for the calculation of the mean-time-between-failures (MTBF) for a 20 PetaFLOP, 96 rack system running a comparable workload mix. The expected MTBF for check-stops due to cosmic radiation and alpha particles from chip packaging materials is calculated to be 51 days for sea-level at New York City running the application mix studied. If the most vulnerable application is run exclusively, the projected MTBF is 35 days. These are outstanding results for a machine of this magnitude. The beaming experiment and projected MTBF validate the necessity to include autonomous hardware detection and recovery at the cost of design effort, silicon area and power. |
Title |
(Invited Paper) Resiliency for Many-Core System on a Chip |
Author |
*Tanay Karnik, James Tschanz, Nitin Borkar, Jason Howard, Sriram Vangal, Vivek De, Shekhar Borkar (Intel Corporation, U.S.A.) |
Abstract |
Resilient techniques are commonly employed for dynamic and static variation tolerance. In this paper, we present an adaptive clocking technique that achieves 31% throughput increase with 15% energy reduction, and an adaptive interconnect fabric technique that increases bandwidth by 63% with 14.6% energy reduction. We also discuss variations in many-core microprocessors and some techniques to enable a resilient many-core system on a chip. |
Title |
(Invited Paper) Rethinking Error Injection for Effective Resilience |
Author |
Shahrzad Mirkhani (University of Texas, U.S.A.), Hyungmin Cho, Subhasish Mitra (Stanford University, U.S.A.), *Jacob Abraham (University of Texas, U.S.A.) |
Abstract |
Soft errors, caused by radiation, have become a major challenge in today’s computer systems and networking equipment, making it imperative that systems be designed to be resilient to errors. Error injection is a powerful approach to evaluate system resilience, and current practice is to inject errors in architectural registers of processors, program variables of applications, or storage elements in the hardware model. This paper, using answers to frequently asked questions, discusses the need for rethinking conventional approaches to error injection, showing data from recent research and our simulation results. Approaches to improving current error injections are also suggested. |
Session 6S
Special Session: Overcoming Major Silicon Bottlenecks: Variability, Reliability, Validation and Debug
Time: 15:50 - 17:30 Wednesday, January 22, 2014
Organizer: Subhasish Mitra (Stanford University, U.S.A.)
Title |
(Invited Paper) Accurate and Inexpensive Performance Monitoring for Variability-Aware Systems |
Author |
Liangzhen Lai, *Puneet Gupta (UCLA, U.S.A.) |
Abstract |
Designing reliable integrated systems has become a major challenge with shrinking geometries, increasing fault rates and devices which age substantially in their usage life. The proposed research is motivated by the observation that many of the in-field failures are delay failures and several variability signatures are also delay-related. The origins of temporal delay fluctuations include manufacturing variability, voltage/temperature changes, negative or positive bias temperature instability-related Vth degradation, etc. Since the actual delay changes depend on process variations as well as workload, on-chip monitoring may be the best way of predicting them. There is a need to monitor circuit performance during manufacturing as well as at runtime to predict achievable performance and warn against impending failures. Adaptive mechanisms in hardware and/or software can optimize the trade-off between errors, energy and performance based on the feedback from runtime circuit performance monitors. This paper presents approaches for automated synthesis of design-dependent performance monitors. These monitors can be used to predict impending delay failures relatively inexpensively. For low-overhead monitoring, we propose multiple design-dependent ring oscillators (DDROs) as smart canary structures which can reliably predict achievable chip frequency but with margins for local variations. Early silicon results indicate that DDROs can reduce delay monitoring error by 35% compared to conventional ring oscillators. To further improve the prediction (albeit at a higher overhead), we propose in-situ slack monitors (SlackProbe) which can match local variations as well, at overheads much smaller than monitoring all sequential elements. SlackProbe reduces the number of monitors required by over 15X with 5% additional delay margin in several commercial processor benchmarks. Finally, we show an example of a software testbed that demonstrates a variability-aware system that utilizes the hardware monitors and operates with both hardware and software adaptation. |
Title |
(Invited Paper) Quantifying Workload Dependent Reliability in Embedded Processors |
Author |
*Vikas Chandra (ARM, U.S.A.) |
Abstract |
With nearly three decades of continued CMOS scaling, the devices have now been pushed to their physical and reliability limits. Scaling to sub-20nm technology nodes changes the nature of reliability effects from abrupt functional problems to progressive degradation of the performance characteristics of devices and system components. The impact of unreliability results in time-dependent variability, directly translating into design uncertainty in manufactured chips. Further, application workloads can significantly affect the overall system reliability. In this work, we have analyzed aging effects on various design hierarchies of an embedded processor in 28nm running real-world applications. We have also quantified the dependencies of aging effects on switching-activity and power-state of workloads. Implementation results show that the processor timing degradation can vary from 2% to 11%, depending on the workload. |
Title |
(Invited Paper) QED Post-Silicon Validation and Debug: Frequently Asked Questions |
Author |
David Lin, *Subhasish Mitra (Stanford University, U.S.A.) |
Abstract |
During post-silicon validation and debug, one or more manufactured integrated circuits (ICs) are tested in actual system environments to detect and fix design flaws (bugs). According to several industrial reports, the costs of post-silicon validation and debug are rising faster than design costs. Hence, new systematic techniques are essential to overcome the rising costs of existing post-silicon validation and debug techniques. QED, an acronym for Quick Error Detection, is such a technique that effectively overcomes several post-silicon validation and debug challenges. QED systematically creates a wide variety of validation tests to quickly detect bugs, not only inside processor cores, but also in uncore components (i.e., components in an SoC that are neither processor cores nor co-processors) of multi-core system-on-chips. In this paper, we present a brief overview of QED through a series of frequently asked questions. |
Session 7S
Special Session: Brain Like Computing: Modelling, Technology, and Architecture
Time: 10:10 - 12:15 Thursday, January 23, 2014
Chair: Ahmed Hemani (KTH, Sweden)
Title |
(Invited Paper) Spiking Brain Models: Computation, Memory and Communication Constraints for Custom Hardware Implementation |
Author |
*Anders Lansner, Ahmed Hemani, Nasim Farahini (KTH, Sweden) |
Abstract |
We estimate the computational capacity required to simulate in real time the neural information processing in the human brain. We show that the computational demands of a detailed implementation are beyond reach of current technology, but that some biologically plausible reductions of problem complexity can give performance gains between two and six orders of magnitude, which put implementations within reach of tomorrow’s technology. |
Title |
(Invited Paper) Advanced Technologies for Brain-Inspired Computing |
Author |
*Fabien Clermidy, Rodolphe Heliot, Alexandre Valentian (CEA-LETI, France), Christian Gamrat, Olivier Bichler, Marc Duranton (CEA-LIST, France), Bilel Blehadj, Olivier Temam (INRIA, France) |
Abstract |
This paper aims at presenting how new technologies can overcome classical implementation issues of Neural Networks. Resistive memories such as Phase Change Memories and Conductive-Bridge RAM can be used for obtaining low-area synapses thanks to programmable resistance also called Memristors. Similarly, the high capacitance of Through Silicon Vias can be used to greatly improve analog neurons and reduce their area. The very same devices can also be used for improving connectivity of Neural Networks as demonstrated by an application. Finally, some perspectives are given on the usage of 3D monolithic integration for better exploiting the third dimension and thus obtaining systems closer to the brain. |
Title |
(Invited Paper) GPGPU Accelerated Simulation and Parameter Tuning for Neuromorphic Applications |
Author |
Kristofor D. Carlson, Michael Beyeler, *Nikil Dutt, Jeffrey L. Krichmar (UC Irvine, U.S.A.) |
Abstract |
Neuromorphic engineering takes inspiration from biology to design brain-like systems that are extremely low-power, fault-tolerant, and capable of adaptation to complex environments. The design of these artificial nervous systems involves both the development of neuromorphic hardware devices and the development of neuromorphic simulation tools. In this paper, we describe a simulation environment that can be used to design, construct, and run spiking neural networks (SNNs) quickly and efficiently using graphics processing units (GPUs). We then explain how the design of the simulation environment utilizes the parallel processing power of GPUs to simulate large-scale SNNs and describe recent modeling experiments performed using the simulator. Finally, we present an automated parameter tuning framework that utilizes the simulation environment and evolutionary algorithms to tune SNNs. We believe the simulation environment and associated parameter tuning framework presented here can accelerate the development of neuromorphic software and hardware applications by making the design, construction, and tuning of SNNs an easier task. |
Title |
(Invited Paper) A Scalable Custom Simulation Machine for the Bayesian Confidence Propagation Neural Network Model of the Brain |
Author |
Nasim Farahini, *Ahmed Hemani, Anders Lansner (KTH, Sweden), Fabien Clermidy (CEA-LETI, France), Christer Svensson (Linköping University, Sweden) |
Abstract |
A multi-chip custom digital super-computer called eBrain for simulating the Bayesian Confidence Propagation Neural Network (BCPNN) model of the human brain has been proposed. It uses Hybrid Memory Cube (HMC) 3D stacked DRAM memories for storing synaptic weights, integrated with a custom-designed logic chip that implements the BCPNN model. In a 22nm node, eBrain executes BCPNN in real time at 740 TFlop/s while accessing 30 TB of synaptic weights with a bandwidth of 112 TB/s and consuming less than 6 kW of power in the typical case. This efficiency is three orders of magnitude better than general-purpose supercomputers in the same technology node. |
Session 8S
Special Session: Design Flow for Integrated Circuits using Magnetic Tunnel Junction Switched by Spin Orbit Torque
Time: 13:50 - 15:30 Thursday, January 23, 2014
Organizer: Mehdi Tahoori (Karlsruhe Institute of Technology, Germany)
8S-1 (Time: 13:50 - 14:15)
Title |
(Invited Paper) An Overview of Spin-Based Integrated Circuits |
Author |
Wang Kang (Beihang University, China/IEF, Université Paris-Sud, France), *Weisheng Zhao, Zhaohao Wang, Jacques-Olivier Klein, Yue Zhang, Djaafar Chabi (IEF, Université Paris-Sud, France), Youguang Zhang (Beihang University, China), Dafiné Ravelosona, Claude Chappert (IEF, Université Paris-Sud, France) |
Abstract |
Conventional CMOS integrated circuits suffer from severe power and scalability challenges as technology scales into ultra-deep-submicron nodes. Among alternative approaches beyond charge-only based circuits, spin-based devices and integrated circuits in particular show promising merits to overcome these issues by adding the spin freedom of electrons to electronic circuits. Spintronics has now become a hot topic in both academia and industry. This paper gives an overview of the status and prospects of spin-based integrated circuits under intense investigation and addresses in particular their merits and challenges for practical applications. |
Title |
(Invited Paper) Advances in Spintronics Devices for Microelectronics - from Spin-Transfer Torque to Spin-Orbit Torque |
Author |
*Shunsuke Fukami, Hideo Sato, Michihiko Yamanouchi, Shoji Ikeda, Fumihiro Matsukura, Hideo Ohno (Tohoku University, Japan) |
Abstract |
Recent advances in spintronics devices make it possible to open a new era of microelectronics. In this paper, we review the spintronics devices utilizing spin-transfer torques (STTs) and spin-orbit torques (SOTs) developed in recent years. Progress on the two-terminal STT device with a CoFeB-MgO based magnetic tunnel junction (MTJ), the three-terminal magnetic domain wall (DW) motion device with a Co/Ni multilayer, and the three-terminal SOT device with a Cu-based channel is described. Integrated circuits built with the developed spintronics devices are also reviewed. |
Title |
(Invited Paper) Hybrid CMOS/Magnetic Process Design Kit and SOT-Based Non-Volatile Standard Cell Architectures |
Author |
*Gregory Di Pendina, Kotb Jabeur, Guillaume Prenat (Spintec Laboratory, CEA-INAC/CNRS/UJF/G-INP, France) |
Abstract |
This paper gives an overview of hybrid CMOS/magnetic logic circuit design. We describe the magnetic devices, the expected advantages of using them alongside CMOS to help circumvent the upcoming limits of VLSI circuits, and the tools required to design such circuits, including the Process Design Kit (PDK) and Standard Cells (SC). As a case study, we focus in particular on a new and promising device technology based on the Spin Orbit Torque (SOT) effect. |
Title |
(Invited Paper) Architectural Aspects in Design and Analysis of SOT-Based Memories |
Author |
Rajendra Bishnoi, Mojtaba Ebrahimi, Fabian Oboril, *Mehdi Tahoori (Karlsruhe Institute of Technology, Germany) |
Abstract |
Magnetic Random Access Memory (MRAM), and in particular SOT-MRAM, is a promising emerging memory technology because of its various advantages. In this work, we provide an analysis of SOT-MRAM at the circuit and architecture levels, and compare SOT-MRAM with several other technologies. Our architecture-level analysis shows that a hybrid combination of SRAM for the L1 cache and SOT-MRAM for the L2 cache can significantly reduce area and energy while slightly increasing performance. |
Session 9S
Special Session: The Role of Photons in Harming or Increasing Security
Time: 15:50 - 17:30 Thursday, January 23, 2014
Organizer: Francesco Regazzoni (University of Lugano, Switzerland), Edoardo Charbon (Delft University of Technology, Netherlands)
9S-1 (Time: 15:50 - 16:30)
Title |
(Invited Paper) The Role of Photons in Cryptanalysis |
Author |
*Juliane Krämer (University Berlin, Germany), Michael Kasper (Fraunhofer Institute for Secure Information Technology, Germany), Jean-Pierre Seifert (University Berlin, Germany) |
Abstract |
Photons can be exploited to reveal secrets of security ICs such as smartcards, secure microcontrollers, and cryptographic coprocessors. One such secret is the secret key of a cryptographic algorithm. This work gives an overview of current research on revealing these secret keys by exploiting the photonic side channel. Different analysis methods are presented. It is shown that the analysis of photonic emissions also helps to gain knowledge about the attacked device and thus poses a threat to modern security ICs. The presented results illustrate the differences between the photonic side channel and other side channels, which do not provide fine-grained spatial information. It is shown that the photonic side channel has to be addressed both by software engineers and during chip design. |
Title |
(Invited Paper) SPADs for Quantum Random Number Generators and Beyond |
Author |
Samuel Burri (EPFL, Switzerland), Damien Stucki (ID Quantique, Switzerland), Yuki Maruyama (Delft University of Technology, Netherlands), Claudio Bruschini (EPFL, Switzerland), Edoardo Charbon (Delft University of Technology, Netherlands), *Francesco Regazzoni (ALaRI - USI, Switzerland) |
Abstract |
This paper explores the design of a quantum random number generator (QRNG) based on a massively parallel array of single-photon avalanche diodes (SPADs). The matrix comprises 512x128 independent cells that convert photons into a raw stream of random bits. The sequences are read out over a 128-bit parallel bus, concatenated, and pipelined into a de-biasing filter. Reported results, achieved on the manufactured devices, show that the architecture can reach up to 5 Gbit/s while consuming 25 pJ/bit, demonstrating scalability and performance for any RNG based on SPADs. |
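As an illustration of the de-biasing stage mentioned in the abstract above, the following is a minimal Python sketch. The abstract does not say which filter the authors use; the classic von Neumann extractor shown here is an assumption chosen only because it is the simplest well-known de-biasing scheme. The function names are hypothetical. The second helper multiplies the reported peak throughput (5 Gbit/s) by the reported energy per bit (25 pJ/bit), which gives roughly 0.125 W, assuming both figures hold at the same operating point.

```python
from typing import Iterable, List


def von_neumann_debias(bits: Iterable[int]) -> List[int]:
    """Von Neumann extractor (assumed filter, not taken from the paper).

    Reads the raw stream in non-overlapping pairs: the pair (0, 1) emits 0,
    the pair (1, 0) emits 1, and equal pairs are discarded. A trailing
    unpaired bit is dropped.
    """
    it = iter(bits)
    out: List[int] = []
    for a, b in zip(it, it):  # consecutive, non-overlapping pairs
        if a != b:
            out.append(a)
    return out


def average_power_watts(bits_per_second: float, joules_per_bit: float) -> float:
    """Average power implied by a throughput and an energy-per-bit figure."""
    return bits_per_second * joules_per_bit


# Figures quoted in the abstract.
CELLS = 512 * 128           # independent SPAD cells in the matrix
THROUGHPUT = 5e9            # bit/s (reported peak)
ENERGY_PER_BIT = 25e-12     # J/bit

if __name__ == "__main__":
    raw = [0, 1, 1, 0, 1, 1, 0, 0, 0, 1]
    print(von_neumann_debias(raw))
    print(CELLS, average_power_watts(THROUGHPUT, ENERGY_PER_BIT))
```

The von Neumann scheme removes bias from independent bits at the cost of discarding at least half of the raw stream, so a high-throughput design may well use a different filter.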
Title |
(Invited Paper) Quantum Key Distribution with Integrated Optics |
Author |
*Mirko Lobino (Griffith University, Australia), Anthony Laing (University of Bristol, U.K.), Pei Zhang (Xi'an Jiaotong University, China), Kanin Aungskunsiri, Enrique Martin-Lopez (University of Bristol, U.K.), Joachim Wabnig (Nokia Research Centre, U.K.), Richard W. Nock, Jack Munns, Damien Bonneau, Pisu Jiang (University of Bristol, U.K.), Hong Wei Li (Nokia Research Centre, U.K.), John G. Rarity (University of Bristol, U.K.), Antti O. Niskanen (Nokia Research Centre, U.K.), Mark G. Thompson, Jeremy L. O'Brien (University of Bristol, U.K.) |
Abstract |
We report on a quantum key distribution (QKD) experiment in which a client with an on-chip polarisation rotator accesses a server through a telecom-fibre link. Bulky resources such as the photon source and detectors are located on the server side. We employ a reference-frame-independent QKD protocol for polarisation qubits and show that it overcomes the detrimental effects of drifting fibre birefringence in a polarisation-maintaining fibre. |