#### DDR Memory Interference Optimization on Heterogeneous Multicore Platforms

CONCORDE PROJECT

Alfonso Mascareñas González 17/02/2022





## DDR3 SDRAM Device: DDR3 Addressing

- Minimum division: Columns, rows, banks and ranks.
- Banks can treat commands independently from each other



8 banks for







 $\tau_0$   $\tau_1$   $\tau_2$   $\tau_3$   $\tau_4$   $\tau_5$ 



Task Set



Task Set



Task Set

### Task and Memory Mapping Representation

#### The previous kind of maps can be represented as follows:

| Bank | 7 | 0 | 5 | 4 | 4 | 1 | 2 | 1 | 3 | 6 | 6  | 7  |
|------|---|---|---|---|---|---|---|---|---|---|----|----|
| Core | 0 | 7 | 4 | 5 | 5 | 1 | 3 | 1 | 2 | 6 | 6  | 0  |
| Task | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |

#### Alternatively, in one single row:



#### Task and Memory Mapping Representation

C = core $\tau = task$ 

#### Or even:



# Multi-objective Optimization

How do we map tasks to cores and cores to banks? Is there a logic?

Objectives:

- <u>Increase Task Execution Parallelism</u> → *Minimize(Workload Variance)*
- <u>Decrease Maximum DDR Interference</u> → *Minimize(Maximum Interference Cost)*



 $\begin{aligned} &Minimize(Workload Variance) \Rightarrow \underline{Maximize}(Maximum Interference Cost) \\ &Minimize(Maximum Interference Cost) \Rightarrow \underline{Maximize}(Workload Variance) \end{aligned}$ 

#### Multi-objective Optimization

 $WCET_i$  = Worst Case Execution Time of  $\tau_i$  in isolation IC( $\tau_i$ ) = DDR memory interference cost for  $\tau_i$ PE = Processing element = Core

• Cost functions:

Workload Variance = 
$$\sqrt{\frac{\sum_{i=0}^{n-1} (WCET_i - avg(WCET))^2}{n}}$$

*Maximum Interference Cost* =  $max([IC(\tau_0), ..., IC(\tau_{n-1})])$ 

• Constraints:

 $Deadline_i > WCET_i + IC(\tau_i) + (WCET_j | PE_j = PE_i)$ 

# Meta-heuristics Multi-objective Optimization: Genetic algorithm



• Of all candidate offspring, the worst are discarded

# Meta-heuristics Multi-objective Optimization: Genetic algorithm



• Of all candidate offspring, the worst are discarded

#### Task/Memory Pareto Front – Keystone II

# At the end of the algorithms execution, we obtain different Pareto front

Task mapping on platform cores

The dominated solutions from all the algorithms are removed. Another Pareto front is obtained

Task mapping on platform cores



#### Task/Memory Map Evaluation – Keystone II

Some solutions are tagged for their evaluation

# The maximum interference is measured for each solution



#### Task mapping on platform cores

#### Keystone II Task Mapping Validation



Task Mapping Solutions

#### Task/Memory Pareto Front – Sitara AM5728

# At the end of the algorithms execution, we obtain different Pareto front

Task mapping on platform cores

The dominated solutions from all the algorithms are removed. Another Pareto front is obtained

Task mapping on platform cores



#### Task/Memory Map Evaluation – Sitara AM5728

Some solutions are tagged for their evaluation

#### Maximum Interference (Cycles) 10000 8000 7000 7000 0 0 **Best Solutions** >II III τv 0.5 1.5 2.0 2.5 0.0 1.0 3.0 1e9 Load Variance

#### Task mapping on platform cores

#### The maximum interference is measured for each solution

#### Sitara AM5728 Task Mapping Validation



**Task Mapping Solutions** 

## Task/Memory Mapping Logic

#### • Task-Core mapping:

- Stack most intensive tasks on the same core (sequential execution, i.e., no interference) → <u>Drawback</u> = Less parallelism
- Equally spread the tasks among the cores → <u>Drawback</u> = memory Interference
- Increase the number of active cores (more parallelism) → <u>Drawback</u>
  = Generally memory Interference
- Select the correct core type for the task (ARM,DSP): the execution time and interference vary → <u>Drawback</u> = None
- Core-bank mapping: Always try with a private bank and, when all of these are occupied (N°C > N°B), share bank

### Why should we go for the private banks?

• For instance, let's compare the worst-case read transmission time cost, for the intra-bank and inter-bank case:





# Why should we go for the private banks?

- Another example is the DDR memory PREcharge command worst-case interference cost (PRE):
  - $PRE^{intra} = t_{RP}$
  - $PRE^{inter} = 1$

where  $t_{RP} \gg 1$ 

- And yet another example related to the DDR memory ACTive command worst-case interference cost (ACT):
  - $ACT^{intra} = t_{RCD}$ •  $ACT^{inter} = t_{RRD}$

where  $t_{RCD} > t_{RRD}$ 

# Thank you for your attention