

# A Routability Constrained Scan Chain Ordering Technique for Test Power Reduction

#### X.-L. Huang, and J.-L. Huang

Laboratory of Dependable System Graduate Institute of Electronic Engineering National Taiwan University

#### Outline

#### Introduction

- Preliminaries
- The Proposed Technique
- Experimental Results
- Conclusion

# Introduction

- Scan-based testing is the most popular Design-for-Testability (DfT) technique for digital sequential circuits.
- Problems with scan-based testing
  - Elevated Power Consumption:
    - ATPG patterns usually cause higher logic switching activities.
    - Scan cells toggle at a higher frequency during the scan chain shifting operation.
  - Routability Degradation:
    - Extra scan cell connections.

# Test Power Management Techniques

- Reported approaches can be divided into following categories:
  - 1. Power-aware test pattern generation (for ATE or BIST).
  - 2. Test pattern and/or scan chain ordering
  - 3. Primary input control to suppress logic transitions
  - 4. Scan chain and/or clock scheme modification to suppress logic transitions.
- One may adopt a hybrid approach to maximize the test power reduction.



- The proposed technique is based on scan chain ordering.
- The goal is to find a scan cell chaining order to minimize the power dissipation during scan chain shifting operation without violating any user-specified routing constraint.
- Advantages:
  - No negative impact on the test time and fault coverage
  - Impacts less on the design flow
  - Can be easily combined with other power reduction techniques.

# Main Contributions

- Allows the user to explicitly set the routing constraints.
- The achievable power reduction is much less sensitive to the routing constraints.
- Can be easily integrated into the conventional design flow.

#### Outline

- Introduction
- <u>Preliminaries</u>
- The Proposed Technique
- Experimental Results
- Conclusion

# **Definitions of Power Consumption**

- Total Power (Energy)
  - The sum of power consumption during test application.
- Peak Power
  - The highest value of power at any given instance.

# Power Dissipation of Scan-Based Testing

- Power dissipation of scan-based testing is expensive, especially during the scan chain shifting operations.
- To reduce the overall test power, it is crucial to reduce the scan shifting power.
- The scan shifting power can be divided into two parts:
  - 1. The scan cell switching activities.
  - 2. The induced logic switching activities.

- Exact evaluation of the total logic switching activities during scan shifting is time-consuming.
- Studies have shown that the number of scan chain transitions and the induced logic element transition are fairly closely related.
- The number of scan chain switching activities is a good indication of the overall power consumption during scan shifting operations.

#### **Notations**

- **f**: The number of scan cells in the scan chain
- (*c*<sub>1</sub>, *c*<sub>2</sub>, ...*c*<sub>*f*</sub>) : The *f* scan cells
- $O = (o_1, o_2, \dots, o_f)$ : The scan chain ordering
- *V* = (*v*<sub>1</sub>,*v*<sub>2</sub>...,*v*<sub>f</sub>) : *f*-bit test pattern
- $R = (r_1, r_2, \dots, r_f)$ : *f*-bit test response

$$f = 4$$

$$V = 1 \ 0 \ 1 \ 1$$

$$R = 0 \ 1 \ 1 \ 0$$
Scan-In  $\longrightarrow C_2 \longrightarrow C_4 \longrightarrow C_1 \longrightarrow C_3 \longrightarrow$  Scan-Out
$$O = (2, 4, 1, 3)$$
Scan-In  $\rightarrow 0 \ 1 \ 1 \ 1$ 

$$1 \ 0 \ 0 \ 1 \rightarrow$$
Scan-Out

#### Transition Estimation of a Test Pattern

• WT(V) : Weighted transition of a test pattern V.

$$WT(V) = \int_{i=1}^{f-1} i \bar{7}(v_{oi} \dagger v_{oi+1})$$

- $\oplus$  detects if there is a transition between two bits  $v_{oi}$  and  $v_{oi+1}$ .
- "i" is the weight of the transition between  $v_{oi}$  and  $v_{oi+1}$ .
- A transition between  $v_{oi}$  and  $v_{oi+1}$  will cause *i* scan cells start from Scan-In pin to change their states

Scan-In 
$$\leftarrow C_1 \leftarrow C_2 \leftarrow C_3 \leftarrow C_4 \leftarrow$$
 Scan-Out  
 $V = 1 \qquad 0 \qquad 1 \qquad 1$   
 $v_{oi} \oplus v_{oi+1} = 1 \qquad 1 \qquad 0$   
 $Weight i = 1 \qquad 2 \qquad 3$   
 $WT(V) = 1*1 + 2*1 + 3*0 = 3$ 

## Transition Estimation of a Test Response

• *WT(R)* : Weighted transition of a test response R.

$$WT(R) = \int_{i=1}^{f-1} (f - i)\bar{7}(r_{oi} \dagger r_{oi+1})$$

- $\oplus$  detects if there is a transition between two bits  $r_{oi}$  and  $r_{oi+1}$ .
- "f-i" is the weight of the transition between  $r_{oi}$  and  $r_{oi+1}$ .
- A transition between  $r_{oi}$  and  $r_{oi+1}$  will cause *f-i* scan cells start from Scan-Out pip to change their states  $\Box \Box \Box \Box \Box \Box \Box \Box \Box \Box \Box$

Scan-In 
$$\leftarrow C_1$$
  $\leftarrow C_2$   $\leftarrow C_3$   $\leftarrow C_4$   $\rightarrow$  Scan-Out  
 $R = 0 \qquad 1 \qquad 0 \qquad 1$   
 $v_{oi} \oplus v_{oi+1} = \qquad 1 \qquad 1 \qquad 1$   
 $f - i = \qquad 3 \qquad 2 \qquad 1$   
 $WT(R) = \qquad 3 \qquad + \qquad 2 \qquad + \qquad 1 \qquad = \qquad 6$ 

## **Total Weighted Transitions**

• Weighted transitions of a set of *m* test vectors,  $V^1, V^2, ..., V^m$ 

$$WT(\{V^{1}, V^{2}, \dots, V^{m}\}) = \bigotimes_{j=1}^{m} \sum_{i=1}^{f-1} i \bar{7}(v_{oi}^{j} \dagger v_{oi+1}^{j})$$

• Weighted transitions of a set of *m* output responses,  $R^1, R^2, \ldots, R^m$ , where  $R^i = (r_1^i, r_2^i, ..., r_f^i)$ 

$$WT(\{R^{1}, R^{2}, \dots, R^{m}\}) = \bigotimes_{j=1}^{m} \int_{i=1}^{f-1} (f-i)\overline{7}(r_{oi}^{j} \dagger r_{oi+1}^{j})$$

- The transition weight assignment is different in these two equations.
  - One is to scan in and the other is to scan out of the scan chain.

# Effectiveness of Scan Chain Reordering

• Proper scan-chain ordering can significantly lower the test power consumption (66% reduction in the example).



#### Outline

- Introduction
- Preliminaries

### • <u>The Proposed Technique</u>

- Experimental Results
- Conclusion

## Integration to Conventional Design Flow



# Inputs of the Proposed Technique

- Test information
  - The sets of test patterns  $V^i$ 's and responses  $R^i$ 's.
- Flip-flop information
  - The locations of each scan cell  $(x_i, y_i)$ .
  - The power consumption factor  $p_i$ 
    - Dependent on the flip-flop size, load and induced switching activities in its fanout cone.
- Routing constraints
  - $\mathcal{L}_{max}$ , the maximum allowable scan chain length
  - $\ell_{max}$ , the maximum allowable Manhattan distance between two successive scan cells.

### Flow Chart of the Proposed Technique



- 1. For each scan cell  $c_i$ , add a vertex  $n_i$  to G.
- 2. For each pair of scan cells  $(c_i, c_j)$ ,  $i \neq j$ , add an edge  $e_{ij}$ between  $(n_i, n_j)$  if the Manhattan distance between  $c_i$  and  $c_j$  is less than  $\ell_{max}$ .
- 3. Associate with each edge  $e_{ij}$  the transition frequency  $T_{ij}$  and Manhattan distance  $D_{ij}$  between  $c_i$  and  $c_j$ .



The Manhattan distance D<sub>ii</sub> between c<sub>i</sub> and c<sub>i</sub> is defined as

$$D_{ij} = |x_i - x_j| + |y_i - y_j|$$

• The transition frequency  $T_{ij}$  is defined as  $T_{ij} = \frac{H(B_i, B_j)}{27m}$ 

- *m* is the number of test vectors. Multiples by 2 is because we have *m* test patterns and *m* test responses.
- **H** denotes the Hamming distance<sup>2</sup>  $(v_i, r_i, v_i^2, r_i^2, ..., v_i^m, r_i^m)$  **B**<sub>i</sub> consists of the corresponding bits in the test patterns and responses of  $c_i$ , i.e.,  $B_i =$

## Example of Cost Graph Construction

*B*<sub>1</sub>=010011

*B*<sub>2</sub>=110011

- Scan cell information
  - $c_1: (8, 9), p_1=1.2$
  - $-c_2$ : (17, 11),  $p_2$ =1.5
  - $-c_3$ : (21, 20),  $p_3$ =1.5
- Test information
  - $-(V_1, R_1)=(010, 110)$
  - $-(V_2, R_2)=(001, 000)$
  - $-(V_3, R_3)=(110, 111)$
- $\ell_{\rm max} = 16$



- For each scan cell add a vertex in the cost graph.
- Manhattan Distance

$$- D_{12} = |8 - 17| + |9 - 11| = 11$$
  

$$- D_{23} = |17 - 21| + |11 - 20| = 13$$
 Violation  

$$- D_{13} = |8 - 21| + |9 - 20| = 24$$

$$- T_{12} = H (010011, 110011)/6 = 1/6$$

$$- T_{23} = H (110011, 001001)/6 = 2/3$$

 $\Box (001001 01001 1) = 1/2$ 



# Finding Next Scan Cell

- A greedy heuristic is used to determine the next scan cell
  - 1. Construct the candidate set.
    - Scan cells that are adjacent to the current flip-flop and not ordered.
  - 2. Calculate the cost of each scan cell in the candidate set.
    - Cost is a function of the Manhattan distance and the transition frequency.
  - 3. The lowest cost one is selected as the next scan cell.
- Termination Condition
  - Empty candidate set
  - Append the selected scan cell to O may violate the routing constraint  $\mathcal{L}_{max}$ .

## **Cost Function**

• Let *c<sub>i</sub>* be the current flip-flop and *c<sub>j</sub>* be the scan cell in the candidate set. The cost function is defined as:

$$\cos t(i,j) = \frac{\alpha \ \overline{7}T_{ij} \ \overline{7}p_j}{p_{\max}} + \frac{\beta \ \overline{7}D_{ij}}{l_{\max}}$$

- $p_{max}$ : maximum scan cell power consumption factor.
- $-\alpha$  : Fixed at 100.
- $-\beta$ : Automatically adjusted by the algorithm.

## Example of Cost Graph Construction

- Scan cell information
  - $c_1$ : (8, 9),  $p_1$ =1.2 -  $c_2$ : (17, 11),  $p_2$ =1.5 pmax
  - $-c_3$ : (21, 20),  $p_3 = 1.5$
- Test information
  - $(V_1, R_1) = (010, 110)$
  - $(V_2, R_2) = (001, 000)$
  - $(V_3, R_3) = (110, 1^{1})$
- $\ell_{\text{max}} = 16$



- Suppose current flop-flop is  $c_2$  and  $\beta$ =5.
- Cost computation

$$c_{1}: \cos t(2,1) = \frac{\alpha \,\overline{7}T_{21}\overline{7}p_{1}}{p_{\max}} + \frac{\beta \,\overline{7}D_{21}}{l_{\max}} = 16.7$$



# Bias Adjustment

- $\alpha$  is fixed at 100.
- $\beta$  is set to 0 at the beginning.
  - To totally ignore the routing constraint at the first iteration.
- If the ordering process is terminated due to routing constraint violation, β will be increased by 1.
- Underlying Idea
  - To ignore the routing constraints unless necessary.

## Extension to Multiple Scan Chains

- The scan cells belonging to the same clock domain are usually located in the same region in the layout.
  - 1. Group the scan cells belonging to the same clock domain into clusters.
  - 2. Apply the proposed technique to each cluster for test power optimization.

#### **Outline**

- Introduction
- Preliminaries
- The Proposed Technique
- Experimental Results
- Conclusion

# Experiment Configuration

- Experiments are performed on 6 industrial designs.
- Each design contains only one scan chain with an initial scan chain ordering.
- Sets of test patterns and responses, FF information and routing constraints for these designs are given.
- Test patterns contain no don't care bits.

## **Circuit Statistics**

- **X-span & Y-span** Width and height of the smallest rectangle that encloses all of the flip-flops.
- ℓ<sub>max</sub> Maximum allowable Manhattan distance between two successive scan cells.
- $\mathcal{L}_{max}$  Maximum allowable scan chain length

| Design | # cell | X-span | Y-span | # pattern | <b>l</b> <sub>max</sub> | $\mathcal{L}_{max}$ |
|--------|--------|--------|--------|-----------|-------------------------|---------------------|
| 1      | 596    | 643    | 1252   | 150       | 648                     | 147124              |
| 2      | 596    | 990    | 1188   | 150       | 748                     | 130269              |
| 3      | 8755   | 2810   | 2338   | 150       | 1146                    | 1745662             |
| 4      | 6398   | 3781   | 1261   | 150       | 1536                    | 4158421             |
| 5      | 5994   | 1670   | 1774   | 150       | 1016                    | 1571524             |
| 6      | 53964  | 5670   | 5774   | 100       | 2693                    | 22638289            |

## **Experimental Results**

- Comparison between original and optimized ordering.
- Total power reduction is in the range of 37-48%.
- Peak power reduction is in the range of 10-22%.
- $\ell$  the maximum distance between two successive scan cells.
- $\mathcal{L}$  the total scan chain length.

| Design | Total Power<br>Reduction(%) | Peak Power<br>Reduction(%) | ℓ<br>Reduction(%) | <b>£</b><br>Reduction(%) | β  | CPU(sec) |
|--------|-----------------------------|----------------------------|-------------------|--------------------------|----|----------|
| 1      | 37.99                       | 11.94                      | 60.98             | 65.77                    | 11 | 2        |
| 2      | 39.57                       | 10.88                      | 60.99             | 54.34                    | 9  | 2        |
| 3      | 43.69                       | 16.37                      | 72.06             | 89.17                    | 12 | 368      |
| 4      | 44.69                       | 15.20                      | 67.63             | 85.05                    | 11 | 229      |
| 5      | 44.08                       | 15.14                      | 65.31             | 75.38                    | 12 | 280      |
| 6      | 48.19                       | 22.09                      | 77.20             | 94.21                    | 23 | 16,005   |

# Further Analysis – Impact of $\ell_{max}$

- Experiments are performed on design 2.
- $\ell_{\text{max}}$  varied from 2,000 to 400.
- $\mathcal{L}_{max}$  fixed at 240,000.
- No apparent degradation of power reduction is observed until *l*<sub>max</sub>=400, a rather stringent constraint.

| ℓ <sub>max</sub> | Total(%) | Peak(%) | l(%)  | L(%)  | β  |
|------------------|----------|---------|-------|-------|----|
| 2,000            | 41.85    | 13.93   | 6.59  | 16.42 | 7  |
| 1,600            | 41.69    | 13.67   | 16.34 | 23.60 | 6  |
| 1,200            | 41.44    | 10.06   | 37.06 | 30.77 | 6  |
| 800              | 41.35    | 12.62   | 57.83 | 21.18 | 2  |
| 400              | 34.18    | 6.58    | 79.70 | 77.66 | 24 |

# Further Analysis – Impact of $\mathcal{L}_{max}$

- Experiments are also performed on design 2.
- $\mathcal{L}_{max}$  varied from 240,000 to 80,000.
- $\ell_{\rm max}$  fixed at 800.
- Again, no apparent degradation of power reduction observed until  $\mathcal{L}_{max}$ =80,000.

| $\mathcal{L}_{max}$ | Total(%) | Peak(%) | l(%)  | L(%)  | β  |
|---------------------|----------|---------|-------|-------|----|
| 240,000             | 41.35    | 12.62   | 57.83 | 21.18 | 2  |
| 200,000             | 40.91    | 7.29    | 58.14 | 34.32 | 4  |
| 160,000             | 39.56    | 10.70   | 58.14 | 54.07 | 9  |
| 120,000             | 39.00    | 10.55   | 58.14 | 57.82 | 11 |
| 80,000              | 36.22    | 6.86    | 60.57 | 71.94 | 28 |

#### Outline

- Introduction
- Preliminaries
- The Proposed Technique
- Experimental Results
- Conclusion

## Conclusion

- A simple yet efficient routability constrained scan chain ordering technique for test power reduction is proposed.
- Experimental results on six industrial designs show significant power reduction.
- Furthermore, the algorithm is rather insensitive to routing constraints.

## Future Work

- Use commercial tools to get more accurate power consumption information.
- Improve the CPU time on larger designs by reducing the complexity of the proposed algorithm.
- Extend the algorithm to handle the patterns with don't care bits.

# Thanks for your attention!