## Memory Access Reconstruction Based on Memory Allocation Mechanism for Source-Level Simulation of Embedded Software

Kun Lu, Daniel Müller-Gritschneder, Ulf Schlichtmann

Lehrstuhl für Entwurfsautomatisierung, Technische Universität München



Funded by the BMBF project SANITAS (grant number 01 M 3088)

Kun Lu, EDA, TUM

## Outline

- Background and motivation
- Related work
- Our approach
  - Based on memory allocation principles
  - Reconstruct memory accesses for cache simulation
- Experimental results

## **Background – Use VPs for SW development**



HDL, e.g., SystemC

early development, integration, verification

## **Background – SW simulation with VP**




## **Background - Host-compiled SW simulation**



Problem: the addresses of data memory reads and writes are unknown!
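To see why the missing addresses matter, consider a minimal data cache model. The sketch below is not taken from the slides: the direct-mapped organization, the geometry (64 lines of 16 bytes), and the names `dcache_t` and `dcache_access` are assumptions for illustration. The point is that the hit/miss decision depends only on the target address, which the host-compiled code does not have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical geometry: 64 lines of 16 bytes each, direct-mapped. */
#define LINE_BITS 4u
#define NUM_LINES 64u

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
    unsigned accesses;
    unsigned misses;
} dcache_t;

/* Look up one target address; returns true on a hit, false on a miss.
 * On a miss the line is filled, so a later access to it hits. */
static bool dcache_access(dcache_t *c, uint32_t target_addr)
{
    assert(c != NULL);
    uint32_t line = target_addr >> LINE_BITS;  /* drop the byte offset */
    uint32_t idx  = line % NUM_LINES;          /* cache line index */
    uint32_t tag  = line / NUM_LINES;          /* remaining upper bits */

    c->accesses++;
    if (c->valid[idx] && c->tag[idx] == tag)
        return true;

    c->valid[idx] = true;
    c->tag[idx]   = tag;
    c->misses++;
    return false;
}
```

Feeding this model the host address of a variable (for example `&x` on the simulation host) would index different lines than the target address would, which is why the address has to be reconstructed.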



## **Related work**

- Some approaches sidestep the problem by omitting data cache simulation
- Random cache miss for each memory access [Hwang, DATE'08; Lin, ASP-DAC'10]
- Consider only the global/static data [Pedram, IESS'09]
- Use host-machine address emulation [Kempf, DATE'06; Posadas, ASP-DAC'10]
  - Not all memory accesses can be emulated (e.g., those related to register spilling)
  - Data locality differs between the host machine and the target machine => inaccurate data cache simulation
- Worst-case address range [Stattelmann, DATE'12]

## **Problem still UNSOLVED**

**Our solution:** 

## **Exploit the Memory Allocation Mechanism**

- Hit the nail on the head

## **Basic memory allocation principles**
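The slides name stack, heap, and data as the three cases. For the heap, one way to picture the idea is to replay the allocation on a simulated target heap, so that every allocated block gets a target address. The sketch below assumes a simple bump allocator as a stand-in; the target's real allocator will differ, and `SIM_HEAP_BASE`, `SIM_ALIGN`, and `sim_malloc` are hypothetical names and values.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_HEAP_BASE 0x20008000u  /* hypothetical start of the target heap */
#define SIM_ALIGN     8u           /* assumed allocation alignment in bytes */

static uint32_t sim_heap_top = SIM_HEAP_BASE;

/* Stand-in for the target allocator: returns the target address that a
 * block of `size` bytes would receive and advances the simulated heap.
 * The host-side malloc() result is kept separately for the actual data. */
static uint32_t sim_malloc(uint32_t size)
{
    assert(size > 0);
    uint32_t taddr = sim_heap_top;
    /* round the size up to a multiple of SIM_ALIGN */
    sim_heap_top += (size + SIM_ALIGN - 1u) & ~(SIM_ALIGN - 1u);
    return taddr;
}
```

Global and static data need no such bookkeeping, because their target addresses are fixed at link time and can be read from the target binary.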







## Memory access address reconstruction: handle each case



# Stack Heap Data

- When life is simple – the address is **sp-explicit**
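In the sp-explicit case the target address follows directly from the simulated stack pointer and a constant offset. A minimal sketch, assuming a 32-bit target with 4-byte `int` and an offset taken from the target binary (the helper name `local_elem_addr` and all values are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define TARGET_INT_SIZE 4u  /* assumed size of int on the target */

/* Target address of element i of a local int array that the target
 * binary places at [sp + base_offset]. */
static uint32_t local_elem_addr(uint32_t sp, uint32_t base_offset, uint32_t i)
{
    /* int slots are assumed to be word-aligned on the stack */
    assert(base_offset % TARGET_INT_SIZE == 0);
    return sp + base_offset + i * TARGET_INT_SIZE;
}
```

With `sp = 0x1000` and an array at offset 16, element 3 resolves to `0x101C`.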





# Stack Heap Data

#### "**sp**" simulation
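Simulating `sp` can be sketched as bookkeeping at function entry and exit: the instrumented code lowers a simulated stack pointer by the frame size of the function and restores it on return. The frame sizes (32 and 16 bytes), the offset, and the names `sim_cpu_t`, `sim_enter`, and `sim_leave` below are assumptions for illustration; in practice the frame sizes would come from the target binary.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t sp; } sim_cpu_t;

/* The stack grows downward: entering a function allocates its frame. */
static void sim_enter(sim_cpu_t *cpu, uint32_t frame_size)
{
    assert(cpu->sp >= frame_size);
    cpu->sp -= frame_size;
}

/* Leaving the function releases the frame again. */
static void sim_leave(sim_cpu_t *cpu, uint32_t frame_size)
{
    cpu->sp += frame_size;
}

static uint32_t inner_addr;  /* target address observed inside inner() */

static int inner(sim_cpu_t *cpu, int x)
{
    sim_enter(cpu, 16);        /* hypothetical frame size of inner() */
    inner_addr = cpu->sp + 4;  /* a local assumed to live at [sp + 4] */
    int r = x * 2;
    sim_leave(cpu, 16);
    return r;
}

static int outer(sim_cpu_t *cpu, int x)
{
    sim_enter(cpu, 32);        /* hypothetical frame size of outer() */
    int r = inner(cpu, x) + 1;
    sim_leave(cpu, 32);
    return r;
}
```

Because `inner` runs below the frame of `outer`, its local resolves to a different target address than it would if `inner` were called from another call depth.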









## Handle pointers – 1/2

Pointers used as function arguments
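When a pointer crosses a function boundary, the callee only sees a host address. One possible instrumentation, shown here as an assumption rather than the authors' exact scheme, passes the target address of the pointee as an extra argument. `log_read` is a hypothetical stand-in for the call into the cache model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TARGET_INT_SIZE 4u  /* assumed size of int on the target */

static uint32_t last_read_addr;

/* Stand-in for the data cache model: remembers the last read address. */
static void log_read(uint32_t target_addr) { last_read_addr = target_addr; }

/* Original signature: int sum(const int *a, size_t n).
 * Instrumented version: a_taddr carries the target address of a[0]. */
static int sum(const int *a, uint32_t a_taddr, size_t n)
{
    assert(a != NULL);
    int s = 0;
    for (size_t i = 0; i < n; i++) {
        log_read(a_taddr + (uint32_t)i * TARGET_INT_SIZE);
        s += a[i];
    }
    return s;
}
```

The caller knows where the array lives (stack, heap, or data section) and supplies `a_taddr` accordingly, so the callee needs no knowledge of the allocation.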



## Handle pointers – 2/2

#### Pointer arithmetic
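Pointer arithmetic can be mirrored on the target address: whenever the host pointer moves by `n` elements, the tracked target address moves by `n` times the target element size. A minimal sketch, with the pairing struct `tptr_t` and the helper `tptr_add` as hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TARGET_INT_SIZE 4  /* assumed size of int on the target */

/* A host pointer paired with the target address it corresponds to. */
typedef struct {
    int     *host;
    uint32_t target;
} tptr_t;

/* Mirror of `p + n`: advance both views by n elements (n may be negative).
 * The target address is unsigned, so the update wraps modulo 2^32. */
static tptr_t tptr_add(tptr_t p, ptrdiff_t n)
{
    assert(p.host != NULL);
    p.host   += n;
    p.target += (uint32_t)(n * TARGET_INT_SIZE);
    return p;
}
```

Scaling by the target element size rather than the host `sizeof` matters when host and target disagree on type sizes.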



## An example of the instrumented source code
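The code shown on the original slide is not reproduced here. As a rough illustration of what instrumented source can look like, the sketch below annotates a small function with cycle counts and data cache calls. Every address, frame size, cycle count, and hook name (`dcache_read`, `dcache_write`, `sim_cycles`, `COEF_TADDR`) is a made-up assumption, not a value from the authors' tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COEF_TADDR 0x20000100u  /* hypothetical target address of coef[] */

static unsigned      n_read, n_write;       /* d-cache access counters */
static unsigned long sim_cycles;            /* accumulated cycle estimate */
static uint32_t      sim_sp = 0x20010000u;  /* simulated stack pointer */

/* Hooks that would call the cache model; here they only count. */
static void dcache_read(uint32_t taddr)  { (void)taddr; n_read++; }
static void dcache_write(uint32_t taddr) { (void)taddr; n_write++; }

static const int coef[4] = { 1, 2, 3, 4 };  /* lives in the data section */

static int dot(const int *x, uint32_t x_taddr)
{
    assert(x != NULL);
    sim_sp -= 8;                  /* assumed frame size of dot() */
    int acc = 0;
    dcache_write(sim_sp + 0);     /* acc assumed to be stored at [sp + 0] */
    sim_cycles += 3;              /* assumed cost of the entry block */
    for (int i = 0; i < 4; i++) {
        dcache_read(COEF_TADDR + 4u * (uint32_t)i);  /* data section */
        dcache_read(x_taddr + 4u * (uint32_t)i);     /* via pointer argument */
        acc += coef[i] * x[i];
        sim_cycles += 5;          /* assumed cost of the loop body */
    }
    sim_sp += 8;
    return acc;
}
```

The functional code is unchanged; only the annotations are added, so the simulation still runs natively on the host.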



## **Experimental results – benchmark simulation**

| SW         | Cycles (ISS) | Cycles (SLS) | Error (%) | I-cache Naccess/Nmiss (ISS) | I-cache Naccess/Nmiss (SLS) | D-cache Nread/Nwrite/Nmiss (ISS) | D-cache Nread/Nwrite/Nmiss (SLS) |
|------------|--------------|--------------|-----------|-----------------------------|-----------------------------|----------------------------------|----------------------------------|
| fir        | 233939       | 233448       | -0.2      | 189980/13                   | 189731/13                   | 30254 / 1903 / 424               | 30254 / 1902 / 427               |
| iir        | 98590        | 98481        | -0.1      | 76229/13                    | 76226/13                    | 15257 / 5256 / 71                | 15257 / 5256 / 69                |
| jpegdct    | 97418        | 97475        | 0.06      | 66877/45                    | 66895/45                    | 16261 / 11256 / 99               | 16261 / 11203 / 100              |
| isort      | 89910        | 89996        | 0.1       | 73139/6                     | 73177/6                     | 8146 / 7953 / 26                 | 8152 / 7953 / 27                 |
| r2y_malloc | 134159       | 132919       | -0.92     | 114772/19                   | 114772/18                   | 2057 / 2063 / 517                | 2057 / 2061 / 518                |
| aes        | 12896        | 12867        | -0.22     | 7551/71                     | 7564/70                     | 1665 / 1160 / 49                 | 1676 / 1159 / 51                 |
| edgeDetect | 1050008      | 1050527      | 0.05      | 879263/16                   | 880443/15                   | 155019 / 8397 / 281              | 155081 / 8396 / 279              |
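The slides do not state how the error column is computed. The reported values are consistent with the relative deviation of the source-level simulation (SLS) cycle count from the instruction set simulator (ISS) cycle count, worked here for `fir`:

$$
\text{error} = \frac{N_{\text{SLS}} - N_{\text{ISS}}}{N_{\text{ISS}}} \times 100\%,
\qquad
\text{fir: } \frac{233448 - 233939}{233939} \times 100\% \approx -0.21\%
$$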

#### isort – ISS simulation



#### isort – Host-compiled simulation



#### **Rgb2Yuv – ISS simulation**



#### **Rgb2Yuv – Host-compiled simulation**



#### JpegDCT – ISS simulation



#### JpegDCT – Host-compiled simulation



#### **EdgeDetect – ISS simulation**



#### **EdgeDetect – Host-compiled simulation**



## Conclusion

- For host-compiled SW simulation: memory access addresses are reconstructed by exploiting the memory allocation mechanism
  - makes data cache simulation possible
  - enables TLM simulation
  - ensures overall cycle accuracy

# Thank you!