Title | Allocation of FPGA DSP-Macros in Multi-Process High-Level Synthesis Systems |
Author | *Benjamin Carrion Schafer (The Hong Kong Polytechnic University, Hong Kong) |
Page | pp. 616 - 621 |
Keyword | High-Level Synthesis, Design Space Exploration, DSP-macros, FPGAs |
Abstract | High-Level Synthesis (HLS) is a single-process synthesis method that has been shown to produce very good results compared to hand-coded RTL, especially for DSP-related applications. At the same time, FPGAs are reaching capacities that allow entire systems to be implemented on them. Most of these systems are also DSP-related and make intensive use of the FPGAs’ embedded hardmacros (e.g. DSP-blocks). This work presents a method to efficiently allocate DSP-macros in multi-process systems created using HLS in order to minimize the overall area. The proposed method calculates the area sensitivity of each process when its multiply-accumulate (MAC) operations are mapped either onto the FPGA’s hardmacros or onto its configurable resources, and allocates the available hardmacros across all processes. Experimental results show that our method achieves very good results compared to the optimal solution with negligible running time. |
Slides |
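As a rough illustration of the allocation idea in the abstract above, the following Python sketch ranks processes by an area sensitivity and hands out a limited DSP budget greedily. It is not the paper's algorithm: the `Process` fields, the sensitivity formula (area saved per DSP macro), the greedy rule, and all numbers are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    area_logic: int  # area if MACs use configurable logic (hypothetical units)
    area_dsp: int    # area if MACs use DSP macros (hypothetical units)
    dsp_needed: int  # DSP macros consumed when mapped to hardmacros (> 0)


def sensitivity(p):
    """Assumed metric: area saved per DSP macro spent on this process."""
    return (p.area_logic - p.area_dsp) / p.dsp_needed


def allocate_dsps(processes, dsp_budget):
    """Greedy sketch: serve the most sensitive processes first while they fit."""
    chosen = []
    remaining = dsp_budget
    for p in sorted(processes, key=sensitivity, reverse=True):
        if p.dsp_needed <= remaining:
            chosen.append(p.name)
            remaining -= p.dsp_needed
    return chosen


def total_area(processes, chosen):
    """Overall area given which processes received DSP macros."""
    return sum(p.area_dsp if p.name in chosen else p.area_logic
               for p in processes)


procs = [
    Process("A", area_logic=500, area_dsp=200, dsp_needed=2),
    Process("B", area_logic=900, area_dsp=500, dsp_needed=4),
    Process("C", area_logic=300, area_dsp=200, dsp_needed=1),
]
picked = allocate_dsps(procs, dsp_budget=3)
print(picked, total_area(procs, picked))
```

A greedy pass like this is fast but not guaranteed to be optimal; the abstract itself compares its method against the optimal solution.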
Title | Array Scalarization in High Level Synthesis |
Author | Preeti Ranjan Panda, *Namita Sharma (Indian Institute of Technology Delhi, India), Arun Kumar Pilania, Gummidipudi Krishnaiah, Sreenivas Subramoney, Ashok Jagannathan (Intel Technology India Pvt. Ltd., India) |
Page | pp. 622 - 627 |
Keyword | High level synthesis, Behavioral Synthesis, Array Scalarization |
Abstract | Parallelism across loop iterations present in behavioral specifications can typically be exposed and optimized using well-known techniques such as loop unrolling. However, since behavioral arrays are usually mapped to memories (SRAM) during synthesis, performance bottlenecks arise due to memory port constraints. We study array scalarization, the transformation of an array into a group of scalar variables. We propose a technique for selectively scalarizing arrays to improve the performance of synthesized designs, taking into consideration the latency benefits as well as the area overhead caused by using discrete registers for storing array elements instead of denser SRAM. Our experiments on several benchmark examples indicate promising speedups of more than 10x for several designs due to scalarization. |
Slides |
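The transformation named in the abstract above can be pictured with a small Python sketch. It shows a 4-element dot product written once over arrays and once over scalars, plus a toy read-latency model. The function names and the cost model (one read per port per cycle) are assumptions for illustration and are not taken from the paper.

```python
from math import ceil


def dot4_array(a, b):
    """Array form: in a synthesized design each a[i], b[i] is a memory read
    that competes for a limited number of SRAM ports."""
    total = 0
    for i in range(4):
        total += a[i] * b[i]
    return total


def dot4_scalarized(a0, a1, a2, a3, b0, b1, b2, b3):
    """Scalarized form: every element is its own variable (a register),
    so all operands can be read in the same cycle."""
    return (a0 * b0 + a1 * b1) + (a2 * b2 + a3 * b3)


def read_cycles(num_elements, ports):
    """Toy model (assumption): each port serves one read per cycle."""
    return ceil(num_elements / ports)


print(dot4_array([1, 2, 3, 4], [5, 6, 7, 8]))
print(dot4_scalarized(1, 2, 3, 4, 5, 6, 7, 8))
print(read_cycles(8, ports=1), read_cycles(8, ports=8))
```

In this toy model, a single-port memory needs eight cycles to fetch the eight operands, while registers behave like eight ports. The price is the area overhead of discrete registers that the abstract mentions, which is why scalarization is applied selectively.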
Title | Data Compression via Logic Synthesis |
Author | *Luca Amaru, Pierre-Emmanuel Gaillardon (EPFL-LSI, Switzerland), Andreas Burg (EPFL-TCL, Switzerland), Giovanni De Micheli (EPFL-LSI, Switzerland) |
Page | pp. 628 - 633 |
Keyword | Logic Synthesis, Data Compression |
Abstract | Nowadays, most software and hardware applications are committed to reducing the footprint and resource usage of data. In this general context, lossless data compression is a beneficial technique that encodes information using fewer (or at most an equal number of) bits compared to the original representation. A traditional compression flow consists of two phases: data decorrelation and entropy encoding. Data decorrelation, also called entropy reduction, aims at reducing the autocorrelation of the input data stream to be compressed in order to enhance the efficiency of entropy encoding. Entropy encoding reduces the size of the previously decorrelated data by using techniques such as Huffman coding, arithmetic coding, and others. When the data decorrelation is optimal, entropy encoding produces the strongest lossless compression possible. While efficient solutions for entropy encoding exist, data decorrelation is still a challenging problem limiting ultimate lossless compression opportunities. In this paper, we use logic synthesis to remove redundancy in binary data, aiming to unlock the full potential of lossless compression. Embedded in a complete lossless compression flow, our logic-synthesis-based methodology is capable of identifying the underlying function correlating a data set. Experimental results on data sets deriving from different causal processes show that the proposed approach achieves the highest compression ratio compared to state-of-the-art compression tools such as ZIP, bzip2 and 7zip. |
Slides |
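The two-phase flow described in the abstract above (decorrelation, then entropy encoding) can be illustrated with a small Python sketch. Note that it uses plain delta coding as a stand-in decorrelation step, not the logic synthesis approach of the paper, and the byte ramp used as input is an invented example.

```python
from collections import Counter
from math import log2


def entropy_bits_per_symbol(data):
    """Zeroth-order Shannon entropy: a lower bound (in bits per symbol) for
    an entropy coder that treats symbols independently."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * log2(c / n) for c in counts.values())


def delta_encode(data):
    """Stand-in decorrelation: store each byte as the difference (mod 256)
    from the previous byte."""
    out, prev = [], 0
    for x in data:
        out.append((x - prev) % 256)
        prev = x
    return out


def delta_decode(deltas):
    """Inverse of delta_encode, showing that the step is lossless."""
    out, prev = [], 0
    for d in deltas:
        prev = (prev + d) % 256
        out.append(prev)
    return out


ramp = list(range(256))       # strongly correlated input: 0, 1, 2, ...
deltas = delta_encode(ramp)   # becomes 0, 1, 1, 1, ...
print(entropy_bits_per_symbol(ramp))
print(entropy_bits_per_symbol(deltas))
assert delta_decode(deltas) == ramp
```

The ramp uses every byte value once, so a symbol-wise entropy coder cannot shrink it, whereas the decorrelated stream is almost constant and compresses well. This is the sense in which better decorrelation raises the ceiling for the entropy-encoding phase.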
Title | Synthesis of Power- and Area-Efficient Binary Machines for Incompletely Specified Sequences |
Author | *Nan Li, Elena Dubrova (Royal Institute of Technology, Sweden) |
Page | pp. 634 - 639 |
Keyword | LFSR, NLFSR, binary machine, LBIST, PRNG |
Abstract | Binary Machines (BMs) are a generalization of Linear Feedback Shift Registers (LFSRs) in which the current state is a nonlinear function of the previous state. It is known how to construct a BM generating a given completely specified binary sequence. In this paper, we present an algorithm which can efficiently handle the case of incompletely specified sequences. Our experimental results show that it significantly outperforms the approaches based on all-0 or random fill in both area and power dissipation. On average, it reduces dynamic power dissipation by a factor of two compared to the all-0 fill approach and by a factor of six compared to the random fill approach. The presented algorithm can potentially be useful for many applications, including Logic Built-In Self Test (LBIST). |
Slides |
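To make the relationship between LFSRs and binary machines in the abstract above concrete, here is a minimal Python sketch. It models a binary machine as a list of per-bit next-state functions, which is an assumed representation chosen for illustration; the synthesis algorithm of the paper is not shown. The 4-bit register and its tap positions are likewise just an example.

```python
def lfsr_step(state, taps, n):
    """One step of an n-bit Fibonacci LFSR: shift left by one and insert the
    XOR of the tapped bits at bit 0."""
    feedback = 0
    for t in taps:
        feedback ^= (state >> t) & 1
    return ((state << 1) | feedback) & ((1 << n) - 1)


def binary_machine_step(state, next_bit_fns):
    """One step of a binary machine: bit i of the next state is
    next_bit_fns[i] applied to the list of current state bits."""
    bits = [(state >> i) & 1 for i in range(len(next_bit_fns))]
    nxt = 0
    for i, fn in enumerate(next_bit_fns):
        nxt |= (fn(bits) & 1) << i
    return nxt


def cycle_length(step, start, limit=1 << 16):
    """Steps until the state returns to start, or None if it does not
    return within limit steps."""
    state, steps = step(start), 1
    while state != start and steps < limit:
        state, steps = step(state), steps + 1
    return steps if state == start else None


# The 4-bit LFSR with taps at bits 3 and 2, written as a binary machine.
lfsr_as_bm = [
    lambda b: b[3] ^ b[2],  # bit 0: linear feedback
    lambda b: b[0],         # bits 1..3: plain shift
    lambda b: b[1],
    lambda b: b[2],
]

# A binary machine may also use nonlinear update functions (example only).
nonlinear_bm = [
    lambda b: b[3] ^ (b[1] & b[2]),
    lambda b: b[0],
    lambda b: b[1],
    lambda b: b[2],
]

print(cycle_length(lambda s: lfsr_step(s, (3, 2), 4), start=1))
print(binary_machine_step(0b0110, nonlinear_bm))
```

The first list reproduces the LFSR exactly, which is what "generalization" means here: fixing every update function except the feedback to a plain shift, and the feedback to an XOR, recovers the linear case.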
Title | Multi-Mode Trace Signal Selection for Post-Silicon Debug |
Author | *Min Li, Azadeh Davoodi (University of Wisconsin - Madison, U.S.A.) |
Page | pp. 640 - 645 |
Keyword | post-silicon debug, trace buffers |
Abstract | Trace buffers are used during post-silicon debug to allow restoring the internal signals of a chip via online tracing of a few state elements within a capture window. In this work, we show that the quality of restoration corresponding to a set of trace signals, selected for a single operating mode, may significantly degrade over the remaining operating modes of a design. This is the first work to study the multi-mode trace signal selection problem in order to maximize the restoration over all the operating modes. |
Slides |