# Improving XOR-Dominated Circuits by Exploiting Dependencies between Operands #### Ajay K. Verma and Paolo Ienne Processor Architecture Laboratory (LAP) & Centre for Advanced Digital Systems (CSDA) Ecole Polytechnique Fédérale de Lausanne (EPFL) # Logic Synthesis: Limited Dependency Exploitation Logic synthesis tools are extremely good at optimizing the Boolean expressions containing AND, OR and NOT gates. $$ab(a+b)+ca+c \longrightarrow a+c$$ In case of XOR gate either expand the XOR gate in terms of AND and OR gate, or replace the XOR expression by a new variable. Expanding XOR gates might increase the expression size exponentially, and the expansion cannot be restored due to shortcomings of algebraic factoring. Only dependencies among the operands of AND and OR gates are utilized. # Multiplier: XOR Gates with Correlated Operands # Multiplier: XOR Gates with Correlated Operands 4 #### **Outline** - Related work. - Problem formulation. - Basic Idea. - Expanding XOR gates selectively. - Analysis and improvement of basic idea. - Broadening of selection criteria. - Results. - Conclusions. #### Related Work - Optimization of general XOR-dominated circuits. - Optimization of Reed-Muller form [Mishchenko01, ...] - 2-SPP form optimization [Bernasconi06] - BDD-based logic optimization [Sasao93, Yang00, ...] - Optimization of column compressors. - Carry-save addition [Wallace64] - Optimization using various size counters [Song91] - Proper scheduling of counters (TGA) [Oklobdzija96, Stelling98, ...] 6 ## Why Not Expand All XOR's $$P = a_0b_3a_3b_0 + a_0b_3a_1b_2 + a_1b_2a_3b_0$$ $$Q = a_2b_1(a_0b_3 \oplus a_1b_2 \oplus a_3b_0)$$ $$R = P \oplus Q \longrightarrow \overline{PQ}(P + Q)$$ Before expansion : 0.37 ns (138.2 µm²) After expansion : 0.26 ns (146.9 µm²) $$P = a_0a_1 + a_1a_2 + a_2a_3 + a_3a_4$$ $$Q = b_0b_1 + b_1b_2 + b_2b_3 + b_3b_4$$ $$R = P \oplus Q$$ Before expansion: 0.22 ns (58.8 µm<sup>2</sup>) After expansion : 0.27 ns (221.2 $\mu\text{m}^2$ ) Selective Expansion of XOR's is the key. #### **Problem Statement** Given a circuit consisting of AND, OR, NOT, and XOR gates, find a list of XOR gates which should be expanded to achieve smallest critical path delay after logic synthesis. Correlation factor: XOR gates with sufficient correlation between its operands must be expanded in order to reduce delay. ## Selective Expansion $$AB = 0 \to A \oplus B = A + B$$ Extremely correlated operands - Small expression: An expression that can be computed quickly. - XOR expansion: $$A \oplus B = \overline{AB}(A+B)$$ Relative sizes of (AB) and (A+B) are a good measure of correlation between A and B. # Selective Expansion Contd. $$P = a_0b_3a_3b_6 + a_0b_1a_1b_2 + a_1b_2a_3b_0$$ $$Q = a_2b_1(a_0b_2) a_1b_2 \oplus a_3b_0)$$ $$R = P \oplus Q$$ Quickly computable compared to P and Q Before expansion : 0.37 ns (138.2 $\mu$ m<sup>2</sup>) After expansion : 0.26 ns (146.9 $\mu$ m<sup>2</sup>) $$P = a_0 a_1 + a_1 a_2 + a_2 a_3 + a_3 a_4$$ $$Q = b_0 b_1 + b_2 b_2 + b_2 b_3 + b_3 b_4$$ $$R = P \oplus Q$$ $$P + Q = a_0 a_1 + a_1 a_2 + a_2 a_3 + a_3 a_4 + a_1 b_2 + b_2 b_3 + b_3 b_4$$ $$b_0 b_1 + b_1 b_2 + b_2 b_3 + b_3 b_4$$ $$PQ = \sum_{i=0}^{i=3} \sum_{j=0}^{j=3} a_i a_{i+1} b_j b_{j+1}$$ Slow computation compared to P and Q Before expansion : 0.22 ns (58.8 $\mu$ m<sup>2</sup>) After expansion : 0.27 ns (221.2 $\mu$ m<sup>2</sup>) ## Criteria for XOR Expansion ``` isExpansionUseful (operand A, operand B) \varepsilon = 1 - \frac{\min(D_{AB}, D_{A+B})}{\max(D_A, D_B)}; // correlation factor. \Delta = \frac{AR(D_{AB} + D_{A+B})}{AR(D_A + D_B)} - 1; // area penalty. // correlation between the operands must be significant as well as // expansion should not have huge area overhead. if (\varepsilon < \varepsilon_{\text{threshold}} \text{ or } \Delta > \Delta_{\text{threshold}}) return false: // area penalty per unit correlation must be small. if (\Delta / \epsilon > \kappa) return false; return true; ``` # Correlation between XOR-Operands and Rest of the Function Can Be Useful Local correlation: correlation between the operands of XOR. $$A \oplus B = \overline{AB}(A+B)$$ Global correlation: correlation between the rest of the expression and the operands of XOR. $$(A \oplus B) + C = (AB \to C) \cdot (A + B + C)$$ $$(A \oplus B) + C = (\overline{A} \cdot \overline{B} \to C) \cdot (\overline{AB} + C)$$ ## Global Correlation Example #### **Comparator function** (1) $$(A > B) \equiv (a_n > b_n) + (a_n = b_n)(a_{n-1} > b_{n-1}) + (a_n = b_n)(a_{n-1} = b_{n-1})(a_{n-2} > b_{n-2}) + \dots$$ $(A > B) \equiv (a_n \overline{b_n}) + (a_n \oplus \overline{b_n})(a_{n-1} \overline{b_{n-1}}) + (a_n \oplus \overline{b_n})(a_{n-1} \oplus \overline{b_{n-1}})(a_{n-2} \overline{b_{n-2}}) + \dots$ (2) $$(A > B) \equiv (a_n > b_n) + (a_n \ge b_n)(a_{n-1} > b_{n-1}) + (a_n \ge b_n)(a_{n-1} \ge b_{n-1})(a_{n-2} > b_{n-2}) + \dots$$ $(A > B) \equiv (a_n \overline{b_n}) + (a_n + \overline{b_n})(a_{n-1} \overline{b_{n-1}}) + (a_n + \overline{b_n})(a_{n-1} + \overline{b_{n-1}})(a_{n-2} \overline{b_{n-2}}) + \dots$ carry $$(A + \overline{B} + 1) = carry (A - B)$$ Global correlation converts a user friendly implementation into synthesis friendly implementation. # Overall Algorithm The expression tree is optimized iteratively by expanding XOR gates based on local and global correlation criteria. If there are more than XOR gates which satisfy the expansion criteria, a proper ordering of expansion is decided using a greedy heuristic. In order to measure local and global correlation, an estimator function is used to estimate the delay and area values. ### **Experimental Setup** # Results | ADPCM Decoder | | | |--------------------------------------------|----------------------|---------| | Commercial Optimizations (BOA) | 6556 μm² | 1.06 ns | | Selective Expansion | 6934 μm² | 0.90 ns | | 8 x 8-bit Multiplier | | | | DesignWare | 4488 μm² | 1.60 ns | | Three Greedy Approach (TGA) | 5996 μm² | 1.28 ns | | Selective Expansion | <b>7262</b> μm² | 1.02 ns | | Constant Multiplication (A x 7) | | | | A x 7 | 2587 μm² | 0.85 ns | | A + 2A + 4A | 3155 µm² | 0.72 ns | | 8A - A | 1941 µm² | 0.56 ns | | Selective Expansion (A + 2A + 4A as input) | 3018 µm <sup>2</sup> | 0.50 ns | | Selective Expansion (8A – A as input) | 2822 μm² | 0.52 ns | | 15-bit Comparator | | | | Commercial Optimizations | 515 μm² | 0.40 ns | | Selective Expansion | 466 µm² | 0.33 ns | ### Results Contd. #### 8x8-bit Multiplier: Delay Comparison ### Appropriate choice of Parameters Delay and hardware area values for different implementations of a 8 x 8-bit multiplier generated by Selective Expansion for different values of $\epsilon_{threshold}$ , $\Delta_{threshold}$ , $\kappa$ . #### Results Contd. $$p_{3} = (a_{0}b_{2}a_{2}b_{0} + a_{0}b_{2}a_{1}b_{1} + a_{1}b_{1}a_{2}b_{0}) \oplus a_{0}b_{3} \oplus a_{3}b_{0} \oplus a_{1}b_{2} \oplus a_{2}b_{1} \oplus a_{0}b_{0}a_{1}b_{1}(a_{0}b_{2} \oplus a_{1}b_{1} \oplus a_{2}b_{0})$$ #### Results Contd. $$\begin{array}{c} a_1b_2 \oplus a_2b_1 \oplus a_0b_0a_1b_1(a_0b_2 \oplus a_1b_1 \oplus a_2b_0) \\ & \begin{array}{c} \text{Optimizations based on simple dependency} \\ \hline (a_1b_2 \oplus a_2b_1) \oplus (a_0b_0a_1b_1(a_2 \oplus \overline{b_2}) \\ & \begin{array}{c} (a_1b_2 \oplus a_2b_1)a_0b_0a_1b_1(a_2 \oplus \overline{b_2}) \\ & a_0b_0a_1b_1a_2\overline{b_2} \rightarrow (a_1b_2 \oplus a_2b_1) \end{array} \end{array} \end{array} \right\} \begin{array}{c} \text{Local correlation} \\ \hline (a_1b_2 \oplus a_2b_1) + a_0b_0a_1b_1(a_2 \oplus \overline{b_2}) \\ & a_0b_0a_1b_1a_2\overline{b_2} \rightarrow (a_1b_2 \oplus a_2b_1) \end{array} \right\} \begin{array}{c} \text{Global correlation} \\ \hline (a_1b_2 \oplus a_2b_1) + a_0b_0a_1b_1(a_2 + \overline{b_2}) \\ & \begin{array}{c} \text{Optimum implementation} \\ \end{array} \end{array}$$ ### Conclusion We have shown that logic synthesis escapes certain kind of optimizations on XOR-dominated circuits. We present an algorithm which works as a front end to logic synthesis tool and transforms a given circuit into synthesis friendly circuit. Selective Expansion improves the speed of some arithmetic circuits such as 8 x 8-bit multiplier by 20% over state of art techniques.