# Session 8B: Design Space Exploration

## 8B-1 (Time: 13:30 - 13:55)

 Title ReSP: A Non-Intrusive Transaction-Level Reflective MPSoC Simulation Platform for Design Space Exploration Author Giovanni Beltrame (European Space Agency, The Netherlands), Cristiana Bolchini, *Luca Fossati, Antonio Miele, Donatella Sciuto (Politecnico di Milano, Italy) Abstract This paper presents ReSP, a multi-processor simulation platform based on SystemC and Python (which provides the platform with reflective capabilities). The designer has an easy way to specify the architecture of a system, simulate and perform automatic analysis on it. The overhead associated with Python intermediate layer is around 1%. The advantages of our approach are: (a) easy integration of external IPs (b) fine grain simulation control (c) effortless integration of tools for system analysis and design space exploration. Slides

## 8B-2 (Time: 13:55 - 14:20)

 Title Collaborative Hardware/Software Partition of Coarse-Grained Reconfigurable System Using Evolutionary Ant Colony Optimization Author *Dawei Wang, Sikun Li, Yong Dou (Nat’l Univ. of Defense Tech., China) Abstract The flexibility, performance and cost effectiveness of reconfigurable architectures have lead to its widespread use for embedded applications. Reconfigurable system design is very complex for multi-fields experts to collaborate on application algorithm design, hardware/software co-design and system decision. However, existing reconfigurable system design methods and environments can only support hardware/software co-design, ignoring the collaboration between multi-field experts. This paper presents a collaborative partition approach of coarse-grained reconfigurable system design using evolutionary ant colony optimization. We create a distributed collaborative design environment for system decision engineers, software designers, hardware designers and application algorithm developers. The method not only utilizes the advantages of ant colony optimization for searching global optimal solutions, but also provides a framework for multi-field experts to work collaboratively. Experimental results show that the method improves the quality and speed of hardware/software partition for coarse-grained reconfigurable system design. Slides

## 8B-3 (Time: 14:20 - 14:45)

 Title Design Space Exploration for a Coarse Grain Accelerator Author *Farhad Mehdipour, Hamid Noori (Kyushu Univ., Japan), Morteza Saheb Zamani (Amirkabir Univ. of Technology, Iran), Koji Inoue, Kazuaki Murakami (Kyushu Univ., Japan) Abstract In the design process of a reconfigurable accelerator employing in an embedded system, multitude parameters may result in remarkable complexity and a large design space. Design space exploration as an alternative to the quantitative approach can be employed to find a right balance between the different design parameters. In this paper, a hybrid approach is introduced to analytically explore the design space for a coarse grain accelerator and determine a wise design point exploiting data extracted from applications, quantitatively. It also provides flexibility for taking into account new design constraints as well as new characteristics of applications. Furthermore, this approach is a methodological approach which reduces the design time and results in a point which satisfies the design goals. Slides

## 8B-4 (Time: 14:45 - 15:10)

 Title Efficient Symbolic Multi–Objective Design Space Exploration Author *Martin Lukasiewycz, Michael Glaβ, Christian Haubelt, Jürgen Teich (Univ. of Erlangen-Nuremberg, Germany) Abstract Nowadays many design space exploration tools are based on Multi–Objective Evolutionary Algorithms (MOEAs). Beside the advantages of MOEAs, there is one important drawback as MOEAs might fail in design spaces containing only a few feasible solutions or as they are often afflicted with premature convergence, i.e., the same design points are revisited again and again. Exact methods, especially Pseudo Boolean solvers (PB solvers) seem to be a solution. However, as typical design spaces are multi–objective, there is a need for multi–objective PB solvers. In this paper, we will formalize the problem of design space exploration as multi–objective 0–1 ILP. We will propose (1) a heuristic approach based on PB solvers and (2) a complete multi–objective PB solver based on a backtracking algorithm that incorporates the non–dominance relation from multi–objective optimization and is restricted to linear objective functions. First results from applying our novel multi–objective PB solver to synthetic problems will show its effectiveness in small sized design spaces as well as in large design spaces only containing a few feasible solutions. For non–linear and large problems, the proposed heuristic approach is outperforming common MOEA approaches. Finally, a real world example from the automotive area will emphasize the efficiency of the proposed algorithms. No Slides

## 8B-5 (Time: 15:10 - 15:23)

 Title Scalable Unified Dual-Radix Architecture for Montgomery Multiplication in GF(P) and GF(2n) Author *Kazuyuki Tanimura, Ryuta Nara, Shunitsu Kohara, Kazunori Shimizu, Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki (Waseda Univ., Japan) Abstract Modular multiplication is the most dominant arithmetic operation in elliptic curve cryptography (ECC), which is a type of public-key cryptography. Montgomery multiplication is commonly used as a technique for the modular multiplication and required scalability since the bit length of operands varies depending on the security levels. Also, ECC is performed in $GF(P)$ or $GF(2^n)$, and unified architectures for $GF(P)$ and $GF(2^n)$ multiplier are needed. However, in previous works, changing frequency or dual-radix architecture is necessary to deal with delay-time difference between $GF(P)$ and $GF(2^n)$ circuits of the multiplier because the critical path of $GF(P)$ circuit is longer. This paper proposes a scalable unified dual-radix architecture for Montgomery multiplication in $GF(P)$ and $GF(2^n)$. The proposed architecture unifies $4$ parallel radix-$2^{16}$ multipliers in $GF(P)$ and a radix-$2^{64}$ multiplier in $GF(2^n)$ into a single unit. Applying lower radix to $GF(P)$ multiplier shortens its critical path and makes it possible to compute the operands in the two fields using the same multiplier at the same frequency so that clock dividers to deal with the delay-time difference are not required. Moreover, parallel architecture in $GF(P)$ reduces the clock cycles increased by dual-radix approach. Consequently, the proposed architecture achieves to compute $GF(P)$ $256$-bit Montgomery multiplication in $0.23\mu s$. Slides