

Jonas Krautter, Mahta Mayahinia, Dennis R. E. Gnad, Mehdi B. Tahoori | 2022-01-20

INSTITUTE OF COMPUTER ENGINEERING - CHAIR OF DEPENDABLE NANO COMPUTING



KIT - University of the State of Baden-Wuerttemberg and National Research Centre of the Helmholtz Association

J. Krautter, M. Mayahinia, D. Gnad, M. Tahoori

#### Motivation



- Memristive memory close to reaching SRAM/DRAM performance
- Major benefits: Power, density, efficiency, non-volatility
- Different emerging technologies: STT-MRAM, ReRAM, PCM, ...
- Aside from challenges for manufacturability: Security a major concern!
- Rowhammer is still a problem in DRAM...<sup>1</sup>

<sup>1</sup> Frigo et al., "TRRespass: Exploiting the Many Sides of Target Row Refresh", S&P 2020


#### **Motivation**

- Reliable write is asymmetric
- $0 \rightarrow 1$ vs. $1 \rightarrow 0$ have different delay/power
- This is the case for almost all memristive memory technologies
- $\Rightarrow$ Self-terminated write<sup>1</sup> proposed for performance benefits
- $\Rightarrow$ Data-dependent timing can be exploited by an attacker!

<sup>1</sup> Suzuki et al., "Cost-Efficient Self-Terminated Write Driver for Spin-Transfer-Torque RAM and Logic", IEEE Trans. Magn. 2014


#### Outline

Background and Related Work



Results

Discussion and Conclusion




### **Memristive Memory Technologies**



- Spin Transfer Torque Magnetic RAM (STT-MRAM): parallel magnetization = low-resistance state (LRS)
- Resistive RAM (ReRAM): oxygen vacancies; conducting filament = LRS
- Phase Change Memory (PCM)












#### Self-Terminated Write Schemes




| Transition | Timing   | STT-MRAM/ReRAM encoding | PCM encoding |
|------------|----------|-------------------------|--------------|
| 0→0        | $t_{ns}$ | LRS→LRS                 | HRS→HRS      |
| 0→1        | $t_{ss}$ | LRS→HRS                 | HRS→LRS      |
| 1→0        | $t_{fs}$ | HRS→LRS                 | LRS→HRS      |
| 1→1        | $t_{ns}$ | HRS→HRS                 | LRS→LRS      |

- Different transitions have different timing (technology-dependent)
  - $t_{ns} < t_{fs} < t_{ss}$
- Terminating the write after successful transition
  - $\Rightarrow$ Energy and performance benefits
- Performance benefit: Propagating the write time to architecture level
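The transition table can be read as a lookup from (old bit, new bit) to a latency class. The following is a minimal C sketch of that lookup; the names `write_class_t` and `classify_transition` are hypothetical and do not come from the slides.

```c
/* Latency classes from the table; the enum order mirrors t_ns < t_fs < t_ss. */
typedef enum { T_NS = 0, T_FS = 1, T_SS = 2 } write_class_t;

/* Hypothetical helper: map a logical bit transition to its latency class.
 * Both arguments must be 0 or 1. */
static write_class_t classify_transition(int old_bit, int new_bit)
{
    if (old_bit == new_bit)
        return T_NS;            /* 0->0 or 1->1: stored value is unchanged */
    return new_bit ? T_SS       /* 0->1 */
                   : T_FS;      /* 1->0 */
}
```

Because the classification is on logical bit values, it holds for both encodings in the table; only the underlying LRS/HRS direction differs between STT-MRAM/ReRAM and PCM.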

### **Related Work**

- First technology-specific attacks: Cold-boot attacks<sup>1</sup> (exploit non-volatility)
- Asymmetric read and write power can reveal data<sup>2</sup>
- Power side-channel attacks on STT-MRAM<sup>3</sup>
- Bit-cell design to mitigate data-dependent leakage<sup>4</sup>
- Cache-timing attacks with a similar threat model<sup>5,6</sup>
  - $\Rightarrow$  But they rely on distinguishing **cached** vs. **uncached** access!



<sup>1</sup> Halderman et al., "Lest we remember: cold-boot attacks on encryption keys", Comm. of the ACM 2009

<sup>2</sup> Iyengar et al., "Side channel attacks on STTRAM and low-overhead countermeasures", DFT 2016

<sup>3</sup> Khan et al., "Side-Channel Attack on STTRAM Based Cache for Cryptographic Application", ICCD 2017

<sup>4</sup> Dodo et al., "Secure STT-MRAM bit-cell design resilient to differential power analysis attacks", TVLSI 2019

<sup>5</sup> Osvik et al., "Cache attacks and countermeasures: the case of AES", RSA Conference 2006

<sup>6</sup> Yarom et al., "FLUSH+RELOAD: A high resolution, low noise, L3 cache side-channel attack", USENIX 2014


### **Attack Principle**

- Basic principle: Write-After-Write
- Victim process has secret data (e.g. encryption key)
- Data resides in victim process address space, inaccessible to attacker
- Goal: Force victim to **overwrite** attacker data in cache
  - $\Rightarrow$ Timing side-channel for bitwise data extraction
  - Can be a **cache miss** after the attacker filled the cache...
  - ... or overwriting the attacker data directly (cache hit)...
  - ... or many other variants (not exclusive to caches)!

- Attacker wants to know bit $b_i$ with $i \in \{0, 1, \ldots, 511\}$ (a 64-byte cache line holds 512 bits)
- Attacker fills the cache line with 1s except for position $i$, which is set to 0
- Victim overwrites the cache line
  - $\Rightarrow$ Write latency is $t_{ns}$, $t_{fs}$ or $t_{ss}$ depending on $b_i$
- Victim execution time is measured using cycle counters (e.g. rdtsc)
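The probe pattern and the decoding rule can be sketched in C as follows. This is an illustration under assumptions, not the authors' implementation: the helper names (`probe_byte`, `make_probe_line`, `decode_bit`) are hypothetical, and numbering bits LSB-first within each byte is an arbitrary choice.

```c
#include <stdint.h>

#define LINE_BYTES 64u   /* 64-byte cache line = 512 bits, indices 0..511 */

/* Hypothetical helper: value of byte `byte_idx` in the attacker's probe line
 * for target bit i. Every bit is 1 except bit i, which is 0.
 * LSB-first bit numbering within a byte is an assumption. */
static uint8_t probe_byte(unsigned byte_idx, unsigned i)
{
    if (byte_idx == i / 8u)
        return (uint8_t)(0xFFu & ~(1u << (i % 8u)));
    return 0xFFu;
}

/* Fill a whole probe line for target bit i (0..511). */
static void make_probe_line(uint8_t line[LINE_BYTES], unsigned i)
{
    for (unsigned b = 0; b < LINE_BYTES; b++)
        line[b] = probe_byte(b, i);
}

/* Decode: against this probe, a 0->1 transition can only occur at bit i,
 * so a write taking t_ss cycles means the victim's bit i is 1.
 * Anything faster (t_ns or t_fs) means it is 0. */
static int decode_bit(unsigned observed_cycles, unsigned t_ss_cycles)
{
    return observed_cycles >= t_ss_cycles;
}
```

The decoding works because $t_{ss}$ is the largest of the three latencies: every other position in the probe holds a 1, so those positions can contribute at most $t_{fs}$.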


#### Simulation



- Array level (NVSim<sup>1</sup>) = bit-cell level + address decoding + routing
- Architecture level (gem5<sup>2</sup>): Syscall Emulation and Full System modes
- Two ISAs: ARMv8 and x86\_64
- 2-level cache architecture, both caches with self-terminated write
- Cache line width is 64 bytes
- Write latency for a line of 64 × 8 bits is the maximum of the per-bit write latencies (all bits are written in parallel)

<sup>1</sup> Dong et al., "NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory", TCAD 2012

<sup>2</sup> Binkert et al., "The Gem5 Simulator", SIGARCH 2011
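The "maximum over all bits" rule for a cache-line write can be sketched as below. This is a simplified model for illustration, not the gem5 modification used in the work; `write_timing_t` and `line_write_cycles` are hypothetical names, and the per-transition cycle counts are passed in as parameters.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-bit write latencies in clock cycles; the values are technology-dependent. */
typedef struct {
    unsigned t_ns;  /* 0->0 and 1->1 */
    unsigned t_fs;  /* 1->0 */
    unsigned t_ss;  /* 0->1 */
} write_timing_t;

/* Hypothetical model: cycles needed to overwrite old_line with new_line
 * (n bytes each) when all bits are written in parallel and the write
 * terminates once the slowest bit has switched. */
static unsigned line_write_cycles(const uint8_t *old_line, const uint8_t *new_line,
                                  size_t n, write_timing_t t)
{
    unsigned worst = t.t_ns;
    for (size_t b = 0; b < n; b++) {
        uint8_t falling = (uint8_t)(old_line[b] & ~new_line[b]);  /* bits going 1->0 */
        uint8_t rising  = (uint8_t)(~old_line[b] & new_line[b]);  /* bits going 0->1 */
        if (falling && t.t_fs > worst) worst = t.t_fs;
        if (rising  && t.t_ss > worst) worst = t.t_ss;
    }
    return worst;
}
```

Taking the maximum rather than returning early keeps the sketch correct even if a technology did not satisfy $t_{fs} < t_{ss}$.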

### **Attack Variant 1**



- Attacker fills the cache with a known pattern
- Victim overwrites attacker data when *secret* is loaded into the cache

```
/* secret is the victim's private buffer, declared elsewhere */
void victim_code_v1(void) {
    secret[0] &= 0xFF;  /* write access that brings the secret's line into the cache */
}
```

- Attacker measures data-dependent victim execution time
- $\Rightarrow$ Attacker learns bit $b_i$ of secret
- Improved variant: Fill only the cache set where the secret data resides
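Filling only one cache set requires addresses that all map to the set holding the secret. The sketch below shows the usual index computation for a set-associative cache. The number of sets is an assumed parameter (the slides only state the 64-byte line width), both helper names are hypothetical, and a real attacker would additionally need to know how the addresses it controls map to the indexed address bits.

```c
#include <stdint.h>

#define LINE_BYTES 64u   /* cache line width from the simulation setup */

/* Hypothetical helper: set index of an address in a set-associative cache
 * with `num_sets` sets (num_sets is an assumed parameter). */
static unsigned cache_set_index(uintptr_t addr, unsigned num_sets)
{
    return (unsigned)((addr / LINE_BYTES) % num_sets);
}

/* Address of the k-th line, counted from `base`, that maps to `target_set`.
 * `base` is assumed to be aligned to num_sets * LINE_BYTES. */
static uintptr_t kth_line_in_set(uintptr_t base, unsigned target_set,
                                 unsigned num_sets, unsigned k)
{
    return base + ((uintptr_t)k * num_sets + target_set) * LINE_BYTES;
}
```

Writing the probe pattern to `kth_line_in_set(base, s, num_sets, k)` for `k = 0 .. ways - 1` touches one line per way of set `s` instead of the whole cache, which is the idea behind the set-fill variant.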


### **Attack Variant 2**

- Attacker provides a known pattern as input (e.g. chosen plaintext)
- Victim overwrites the attacker data directly in the cache

```
#include <stdint.h>

/* secret and SIZE are defined elsewhere in the victim */
void victim_code_v2(uint8_t *ptr) {
    for (unsigned int i = 0; i < SIZE; i++)
        ptr[i] ^= secret[i];  /* in-place write: a bit of ptr[i] flips where secret[i] is 1 */
}
```

- Attacker measures data-dependent victim execution time
- $\Rightarrow$ Attacker learns bit $b_i$ of secret
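For the XOR victim, the stored value changes from the attacker's input `p` to `p ^ s`, so a slow $0 \rightarrow 1$ transition occurs exactly at positions where `p` is 0 and the secret `s` is 1. The following C sketch models that timing oracle for a single byte and recovers the byte with eight chosen inputs. It is a simplified model, not the measured attack: the function names are hypothetical, and in a real attack the call to `xor_write_is_slow` would be replaced by a timing measurement of the victim.

```c
#include <stdint.h>

/* Model of the timing oracle for one byte of the XOR victim: the cell
 * holding p is overwritten with p ^ s. A bit switches 0->1 (slow, t_ss)
 * exactly where p is 0 and s is 1. */
static int xor_write_is_slow(uint8_t p, uint8_t s)
{
    return (uint8_t)(~p & s) != 0;
}

/* Hypothetical attacker loop: probe bit k with an input that is all ones
 * except for bit k. The parameter s stands in for the hidden secret. */
static uint8_t recover_secret_byte(uint8_t s)
{
    uint8_t guess = 0;
    for (unsigned k = 0; k < 8; k++) {
        uint8_t p = (uint8_t)(0xFFu & ~(1u << k));
        if (xor_write_is_slow(p, s))   /* real attack: compare measured time */
            guess |= (uint8_t)(1u << k);
    }
    return guess;
}
```

All other input bits are 1, so wherever the secret is 1 they switch $1 \rightarrow 0$ and contribute at most $t_{fs}$, which keeps the probed bit distinguishable.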




### **Array Level Timings**



Array-level timing (from NVSim):

| Technology   | Ref.  | $t_{fs}$ ($1 \rightarrow 0$) | $t_{ss}$ ($0 \rightarrow 1$) |
|--------------|-------|------------------------------|------------------------------|
| STT-MRAM (1) | [1,2] | $\sim$6.3 ns (7 cycles)      | $\sim$7.6 ns (8 cycles)      |
| STT-MRAM (2) | [3]   | $\sim$4.5 ns (5 cycles)      | $\sim$9.1 ns (10 cycles)     |
| PCM          | [4]   | $\sim$50.5 ns (51 cycles)    | $\sim$100.5 ns (101 cycles)  |
| ReRAM        | [5]   | $\sim$25.5 ns (26 cycles)    | $\sim$125.5 ns (126 cycles)  |

- Cycles reported for a 1 GHz clock (as simulated in gem5)
- $t_{ns}$ ($0 \rightarrow 0$ and $1 \rightarrow 1$) is one clock cycle for all technologies

<sup>1</sup> Dong et al., "A 1Mb 28nm STT-MRAM with 2.8ns read access time at 1.2V VDD ...", ISSCC 2018

<sup>2</sup> Sato et al., "14ns write speed 128Mb density Embedded STT-MRAM with endurance > 10<sup>10</sup> and 10yrs retention...", IEDM 2018

<sup>3</sup> Bishnoi et al., "Avoiding unnecessary write operations in STT-MRAM for low power implementation", ISQED 2014

<sup>4</sup> Fong et al., "Phase-Change Memory—Towards a Storage-Class Memory", TED 2017

<sup>5</sup> Chen et al., "A 16Mb dual-mode ReRAM macro with sub-14ns computing-in-memory and memory functions...", IEDM 2017
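The cycle counts in the timing table are consistent with rounding each array-level latency up to whole clock cycles at 1 GHz. A minimal C sketch of that conversion is given below; the helper name and the picosecond/MHz units are illustrative choices, and the rounding rule is inferred from the table values rather than stated in the slides.

```c
#include <stdint.h>

/* Hypothetical helper: number of whole clock cycles covering a latency,
 * i.e. ceil(latency * frequency), computed in integer arithmetic.
 * latency_ps is in picoseconds, freq_mhz in MHz. */
static uint64_t latency_to_cycles(uint64_t latency_ps, uint64_t freq_mhz)
{
    const uint64_t ps_mhz_per_cycle = 1000000u;  /* 1e6 ps*MHz equals one cycle */
    return (latency_ps * freq_mhz + ps_mhz_per_cycle - 1u) / ps_mhz_per_cycle;
}
```

For example, `latency_to_cycles(6300, 1000)` corresponds to the 6.3 ns entry at 1 GHz.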

Data Leakage through Self-Terminated Write Schemes in Memristive Caches


#### **Attack Byte-Transfer Rates**



Attack transfer rates (kB/s):

| Technology/ISA             | Variant 1 (L1 write miss), cache fill | Variant 1 (L1 write miss), set fill | Variant 2 (L1 write hit) |
|----------------------------|---------------------------------------|-------------------------------------|--------------------------|
| **Syscall Emulation Mode** |                                       |                                     |                          |
| STT-MRAM(1)/x86            | 0.050                                 | 17.5                                | 18.8                     |
| STT-MRAM(2)/x86            | 0.049                                 | 17.0                                | 18.3                     |
| PCM/x86                    | 0.022                                 | 7.4                                 | 6.8                      |
| ReRAM/x86                  | 0.019                                 | 6.6                                 | 6.1                      |
| **Full System Mode**       |                                       |                                     |                          |
| STT-MRAM(1)/x86            | 0.048                                 | ×\*                                 | 2.0                      |
| STT-MRAM(1)/ARM            | ×\*                                   | ×\*                                 | 2.8                      |

\* No conclusive results were acquired, but more effort could lead to a successful attack.
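To put the rates in perspective, here is a worked example under two assumptions that are not from the slides: the secret is a 128-bit (16-byte) key, and 1 kB is taken as $10^3$ bytes. With the fastest and slowest Syscall Emulation rates for STT-MRAM(1)/x86:

$$
t_{\text{hit}} = \frac{16\ \text{B}}{18.8 \times 10^{3}\ \text{B/s}} \approx 0.85\ \text{ms},
\qquad
t_{\text{cache fill}} = \frac{16\ \text{B}}{0.050 \times 10^{3}\ \text{B/s}} = 0.32\ \text{s}
$$

Under these assumptions, even the slowest listed variant would extract such a key in well under a second.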

#### Discussion



- Attack depends on high-resolution timing measurement
  - $\Rightarrow$ Statistical methods if not available
- Attack Variant 2 is possible in systems without a cache
- Self-terminated write can be disabled as a countermeasure

| Benchmark    | Runtime, self-terminated write enabled | Runtime, self-terminated write disabled | Performance loss |
|--------------|----------------------------------------|-----------------------------------------|------------------|
| blackscholes | 0.277 s                                | 0.282 s                                 | ≈ 1.8%           |
| bodytrack    | 1.388 s                                | 1.424 s                                 | ≈ 2.6%           |
| canneal      | 1.448 s                                | 1.493 s                                 | ≈ 0.3%           |
| dedup        | 4.600 s                                | 5.071 s                                 | ≈ 10.2%          |

- Alternatively: More aggressive (but balanced) write time optimization<sup>1</sup>
- Asymmetric power still enables power side-channel attacks (but those are much harder to exploit remotely)

<sup>1</sup> Sayed et al., "Opportunistic write for fast and reliable STT-MRAM", DATE 2017
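The performance-loss column can be reproduced from the two runtime columns if the loss is taken relative to the runtime with self-terminated write enabled. That definition is an assumption; it matches the listed values for blackscholes, bodytrack and dedup. A minimal C sketch with a hypothetical helper name:

```c
/* Relative slowdown in percent when self-terminated write is disabled,
 * measured against the runtime with self-terminated write enabled. */
static double performance_loss_percent(double t_enabled_s, double t_disabled_s)
{
    return 100.0 * (t_disabled_s - t_enabled_s) / t_enabled_s;
}
```

For canneal, the listed runtimes (1.448 s and 1.493 s) give roughly 3.1% under this formula rather than the listed 0.3%, so one of those values may have been garbled in the slides.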


#### Conclusion




- Memristive memories soon to be adopted in many devices
- Self-terminating write schemes proposed for energy/performance
- We showed a security flaw introduced by self-terminating write
- Attackers can read secret data at up to 20 kB/s
- $\Rightarrow$ Keep security in mind when optimizing performance/power!




## Thank you for your attention!

Questions? Write us an email!

{jonas.krautter,mahta.mayahinia,dennis.gnad,mehdi.tahoori}@kit.edu