Cache Memory

$Word = 4 bytes$
Cache memory consists of $2^{n}$ sets, each holding $K$ entries called blocks (or lines); in a direct-mapped cache ($K = 1$) that is $2^{n}$ blocks
$Main-Memory-Size = 2^{Address-bits}$ bytes (in MIPS, $Main-Memory-Size = 2^{32} bytes = 4 GB$ )
$Address-bits = Tag-bits + n + m + 2$ (in MIPS, $Address-bits = 32$ )
- $Tag-bits = Address-bits - \log_{2} (\frac{Cache-Size_{Data} [bytes]}{K}) = Address-bits - n - m - 2$
  - $Tag-bits \geq 1$
- $Index-bits = n$ (used to choose the set)
- $Block-offset-bits = m$ (aka word offset; used to choose the word inside the block)
  - $m \geq 0$
- $Byte-offset-bits = 2$ (used for the byte part of the address)
Cache
- Cache size:
  - $data Cache-Size = K \times 2^{n + m + 2} bytes = K \times 2^{n + m + 5} bits$
  - $data + metadata Cache-Size = K \times 2^{n} \times (1 + tag-bits + 2^{m + 5}) bits$
  - $Compression Ratio = \frac{data + metadata Cache-Size}{data Cache-Size} = \frac{1 + tag-bits + 2 ^{m + 5}}{2 ^{m + 5}}$
- $Block-Size = 2^{m} words = 2^{m + 2} bytes = 2^{m + 5} bits$
- $Total-Blocks-in-Cache = 2^{n} \times K = \frac{data Cache-Size}{Block-Size}$
  - (this is also the number of valid-bits in the cache)
- $Total-Sets-in-Cache = 2^{n}$
- $Valid-bit = 1$
$K$ is the number of ways (for blocks) per set
- $K = \frac{data Cache-Size}{2 ^{n + m + 2}}$ (where cache size is in bytes)
- $K = 1$ is a direct-mapped cache
- $K = 2$ is a 2-way set-associative cache
When $n = 0$ the cache is fully associative, and the number of sets is $2^{0} = 1$
For a given cache size, when $K$ increases, $n$ (index-bits) decreases and the number of tag-bits increases
The ratio between the main memory size and the cache size (data) is $\frac{2 ^{tag-bits}}{K}$
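
The relations above can be checked numerically. The following is a minimal sketch, assuming 4-byte words, power-of-two sizes, and 32-bit addresses as in the notes; the function name `cache_geometry` and the 16 KiB configuration are illustrative assumptions, not part of the original material.

```python
def cache_geometry(data_size_bytes, ways, block_words, address_bits=32):
    """Field widths and total size of a K-way set-associative cache.

    Assumes 4-byte words and power-of-two sizes, as in the notes above.
    """
    block_bytes = block_words * 4                    # 2^(m+2) bytes per block
    total_blocks = data_size_bytes // block_bytes    # K * 2^n
    sets = total_blocks // ways                      # 2^n
    m = block_words.bit_length() - 1                 # block-offset bits
    n = sets.bit_length() - 1                        # index bits
    tag_bits = address_bits - n - m - 2
    bits_per_block = 1 + tag_bits + block_bytes * 8  # valid + tag + data
    return {
        "index_bits": n,
        "block_offset_bits": m,
        "tag_bits": tag_bits,
        "total_blocks": total_blocks,
        "total_bits": total_blocks * bits_per_block,
    }


# Hypothetical configuration: 16 KiB of data, 4-word blocks.
direct_mapped = cache_geometry(16 * 1024, ways=1, block_words=4)
four_way = cache_geometry(16 * 1024, ways=4, block_words=4)
print(direct_mapped)
print(four_way)
```

For the same data size, going from 1 to 4 ways lowers the index from 10 to 8 bits and raises the tag from 18 to 20 bits, which matches the trend stated above.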

Mapping Process

in general:

by taking modulo of a number with $2^{p}$ we get its $p$ LSB bits

by dividing a number with $2^{p}$ we get the number without its $p$ LSB bits (i.e. shifting right by $p$ bits)

NOTE

When $x$ is a word address we use $m$, but when $x$ is a byte address we use $m + 2$

Given we have a word address $x$ in main memory

The range of addresses in a block in main memory that contains $x$ is:
- $base address = ⌊ \frac{x}{2 ^{m}} ⌋ \times 2^{m}$
- $ending address = base address + 2^{m} - 1$
$⌊ \frac{x}{2 ^{m}} ⌋$ is the memory block number of $x$
$⌊ \frac{x}{2 ^{m}} ⌋ mod 2^{n}$ is the index (set number) to which $x$ will be mapped in the cache
$⌊ \frac{x}{2 ^{n + m}} ⌋$ is the tag of $x$
Alternatively, we can calculate the index as $⌊ \frac{x mod 2 ^{n + m}}{2 ^{m}} ⌋$ when $x$ is a word address, or as $⌊ \frac{x mod 2 ^{n + m + 2}}{2 ^{m + 2}} ⌋$ when $x$ is a byte address
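
The word-address formulas can be sketched in a few lines of Python. This is an illustration only; the function name `map_word_address` and the sample address 300 are assumptions chosen for the sketch.

```python
def map_word_address(x, n, m):
    """Locate word address x in a cache with 2^n sets and 2^m words per block."""
    block_number = x // 2**m            # memory block number of x
    base = block_number * 2**m          # first word address in the block
    end = base + 2**m - 1               # last word address in the block
    index = block_number % 2**n         # set number
    tag = x // 2**(n + m)               # remaining high-order bits
    return {"block": block_number, "base": base, "end": end,
            "index": index, "tag": tag}


# Hypothetical input: word address 300 with n = 6 and m = 2.
mapping = map_word_address(300, n=6, m=2)
print(mapping)

# The alternative index formula for word addresses gives the same set.
alt_index = (300 % 2**(6 + 2)) // 2**2
print(alt_index)
```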

EXAMPLE

$m = 2$ (thus, $2^{4} = 16$ bytes per block)

$n = 6$ (thus, $2^{6} = 64$ sets, which is also the number of blocks because $K = 1$ )

$x = 1200 = 0b10010110000$ is a byte address in main memory

(shift by m+2) by dividing $x$ by $2^{m + 2} = 2^{4} = 16$ , we get $75 = 0b1001011$ which is the block number in memory (index+tag)

$1200 \div 16 = 75$ (ignore the remainder, if there were any)

by taking $75$ modulo $2^{n} = 2^{6} = 64$ , we get the index (set number, $n = 6$ bits) which is $11 = 0b001011$

$75 mod 64 = 11$

by dividing $75$ by $2^{n} = 2^{6} = 64$ , we get the tag which is $1$

$75 \div 64 = 1$ (ignore the remainder, if there were any)

another approach (take modulo $2^{n + m + 2} = 2^{10} = 1024$ first, then divide by $2^{m + 2} = 16$ ):

$1200 mod 1024 = 176$

$176 \div 16 = 11$
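
The same example can be reproduced with shifts and masks, which is how the division and modulo by powers of two are typically implemented. A minimal sketch, assuming a byte address and 4-byte words; the function name `split_byte_address` is hypothetical.

```python
def split_byte_address(addr, n, m):
    """Split a byte address into (tag, index, offset-in-block)."""
    offset_bits = m + 2                            # block offset + byte offset
    offset = addr & ((1 << offset_bits) - 1)       # addr mod 2^(m+2)
    index = (addr >> offset_bits) & ((1 << n) - 1) # block number mod 2^n
    tag = addr >> (offset_bits + n)                # block number div 2^n
    return tag, index, offset


tag, index, offset = split_byte_address(1200, n=6, m=2)
print(tag, index, offset)

# The second approach from the example: modulo first, then divide.
alt_index = (1200 % 1024) // 16
print(alt_index)
```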

Performance

Computer Architecture: A Quantitative


10. Average memory-access time = Hit time + Miss rate × Miss penalty
11. Misses per instruction = Miss rate × Memory accesses per instruction
12. Cache index size: $2^{index}$ = Cache size / (Block size × Set associativity)

Cache CPI = Misses per instruction × Miss penalty

$Memory-Accesses = Hits + Misses$
$Hit Rate = \frac{Hits}{Memory-Accesses}$
$Miss Rate = 1 - Hit Rate$
The hit time is the time required to access a level of the memory hierarchy, including the time needed to determine whether the access is a hit or a miss.
The miss penalty is the time required to fetch a block into a level of the memory hierarchy from the lower level, including (1) the time to access the block, (2) transmit it from one level to the other, (3) insert it in the level that experienced the miss, and then (4) pass the block to the requestor.
$Miss Penalty = \frac{Memory-Access Time}{CCT} = Memory-Access Time \times CR$ (in cycles, where $CCT$ is the clock cycle time and $CR = \frac{1}{CCT}$ is the clock rate)
$CPI = Base CPI + Miss Rate \times Miss Penalty$
(Average memory-access time) $AMAT = Hit time + Miss rate \times Miss penalty$
Ex.
- $Hit time = 2 cycles$
- $Hit rate = 92\%$ (so $Miss rate = 8\%$)
- $Miss penalty = 100 cycles$
- $CCT = 1.5 ns$
- $AMAT = 2 + 0.08 \times 100 = 10 cycles$
- $AMAT = 10 cycles \times 1.5 ns = 15 ns$
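
The AMAT example can be sketched as follows. The helper name `amat_cycles` is an assumption for illustration; the numbers are the ones given in the example, and floating-point results are only approximately equal to the hand-computed values.

```python
def amat_cycles(hit_time_cycles, miss_rate, miss_penalty_cycles):
    """Average memory-access time in cycles: hit time + miss rate * miss penalty."""
    return hit_time_cycles + miss_rate * miss_penalty_cycles


hit_rate = 0.92
cct_ns = 1.5                                   # clock cycle time

amat = amat_cycles(2, 1 - hit_rate, 100)       # about 10 cycles
amat_ns = amat * cct_ns                        # about 15 ns
print(round(amat, 3), round(amat_ns, 3))
```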
Ex.
- Given:
  - $CR = 4 GHz$
  - $Memory-Access Time = 100 ns$
  - $Base CPI = 1$
  - $miss rate = 2\%$
- Answer:
  - $miss penalty = 100 ns \times 4 GHz = 400 cycles$
  - $CPI = 1 + 0.02 \times 400 = 9$
- Assume we add L2 cache:
  - $access time = 5 ns$
  - $miss rate = 0.5\%$ (global miss rate to main memory)
  - $miss penalty = 5 ns \times 4 GHz = 20 cycles$ (for an L1 miss that hits in L2)
  - an access that also misses in L2 pays the extra main-memory penalty of 400 cycles
  - $CPI = 1 + (0.02 \times 20) + (0.005 \times 400) = 3.4$
  - speedup: $9/3.4 = 2.6$
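
The L2 example can be reproduced with a short sketch. The helper names `penalty_cycles` and `cpi_two_level` are hypothetical; the model assumes, as the example does, that every L1 miss pays the L2 access and that the global miss rate additionally pays the main-memory penalty.

```python
def penalty_cycles(access_time_ns, clock_rate_ghz):
    """Convert an access time to clock cycles (ns * GHz = cycles)."""
    return access_time_ns * clock_rate_ghz


def cpi_two_level(base_cpi, l1_miss_rate, l2_penalty,
                  global_miss_rate, memory_penalty):
    """CPI with an L2 cache between L1 and main memory."""
    return (base_cpi
            + l1_miss_rate * l2_penalty
            + global_miss_rate * memory_penalty)


memory_penalty = penalty_cycles(100, 4)        # 400 cycles
l2_penalty = penalty_cycles(5, 4)              # 20 cycles

cpi_l1_only = 1 + 0.02 * memory_penalty        # about 9
cpi_with_l2 = cpi_two_level(1, 0.02, l2_penalty, 0.005, memory_penalty)
speedup = cpi_l1_only / cpi_with_l2            # about 2.6
print(round(cpi_l1_only, 2), round(cpi_with_l2, 2), round(speedup, 2))
```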
Given:
- $miss-rate = \begin{cases} 4\% & data \\ 2\% & instructions \end{cases}$
- $miss-penalty = \begin{cases} 100 cycles & data \\ 200 cycles & instructions \end{cases}$
- $load/store frequency = 36\%$ (instructions that access data memory)
- base CPI = 2
miss cycles per instruction
- data: $0.04 \times 0.36 \times 100 = 1.44$
- ins.: $0.02 \times 200 = 4$
total number of memory-stalls cycles per ins.: $1.44 + 4 = 5.44$
new CPI (with memory stalls): $2 + 5.44 = 7.44$
speedup: $\frac{2}{7.44} = 0.2688$ (a slowdown: an ideal cache with no memory stalls would be $\frac{7.44}{2} = 3.72$ times faster)
given:
- $CR = 4 GHz$
- $miss-rate = \begin{cases} 4\% & data \\ 2\% & instructions \end{cases}$
- $miss-penalty = \begin{cases} 800 cycles & data \\ 600 cycles & instructions \end{cases}$
- $load/store frequency = 40\%$ (instructions that access data memory)
- $Base-CPI = 2.5 cc/ins$
find the CPU time of a program with $IC = 2 \times 10^{9}$ instructions
- miss cycles per instruction:
  - data: $0.04 \times 0.4 \times 800 = 12.8$
  - ins.: $0.02 \times 600 = 12$
  - total: $12.8 + 12 = 24.8$
- new CPI: $2.5 + 24.8 = 27.3$
- $CPU-Time = \frac{2 \times 10^{9}}{4 \times 10^{9}} \times 27.3 = 13.65 sec$
in general:
- $CPU Time = \frac{IC}{CR} \times (CPI_{base} + (MR_{data} \times f_{LS} \times MP_{data}) + (MR_{ins} \times MP_{ins}))$
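
The general formula can be sketched as two small functions and checked against both worked examples. The function and parameter names are assumptions that mirror the symbols in the formula; floating-point results are approximate.

```python
def memory_stall_cpi(mr_data, f_ls, mp_data, mr_ins, mp_ins):
    """Memory-stall cycles per instruction (data misses + instruction misses)."""
    return mr_data * f_ls * mp_data + mr_ins * mp_ins


def cpu_time_seconds(ic, clock_rate_hz, base_cpi,
                     mr_data, f_ls, mp_data, mr_ins, mp_ins):
    """CPU time = IC / CR * (base CPI + memory-stall CPI)."""
    stalls = memory_stall_cpi(mr_data, f_ls, mp_data, mr_ins, mp_ins)
    return ic / clock_rate_hz * (base_cpi + stalls)


# First example: 36% loads/stores, penalties of 100 and 200 cycles.
stalls_first = memory_stall_cpi(0.04, 0.36, 100, 0.02, 200)    # about 5.44

# Second example: 4 GHz clock, 2 * 10^9 instructions, base CPI 2.5.
time_second = cpu_time_seconds(2e9, 4e9, 2.5,
                               0.04, 0.4, 800, 0.02, 600)      # about 13.65 s
print(round(stalls_first, 2), round(time_second, 2))
```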

N-level

$T_{avg} = \sum_{i = 1}^{N} [(\prod_{j = 1}^{i - 1} (1 - h_{j})) h_{i} (\sum_{k = 1}^{i} t_{k})]$

$N$ is the number of levels in the hierarchy, where level $1$ is the fastest and level $N$ is the slowest.
$h_{i}$ is the hit rate at level $i$ given that all previous levels were misses
- $h_{N} = 1$ (an access to the last level always hits)
$1 - h_{i}$ is the miss rate at level $i$
$t_{i}$ is the access time at level $i$

example:

$N = 3$
$h_{1} = 0.95, h_{2} = 0.99, h_{3} = 1$
$t_{1} = 2 ns, t_{2} = 10 ns, t_{3} = 10^{7} ns$
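
The example values can be plugged into the N-level formula with a short loop. This is a sketch under the definitions above (each $h_{i}$ is a local hit rate, and a hit at level $i$ costs the sum of the access times of levels $1$ to $i$); the function name `average_access_time` is an assumption.

```python
def average_access_time(hit_rates, access_times):
    """Average access time of an N-level hierarchy (level 1 is the fastest)."""
    total = 0.0
    reach_probability = 1.0     # probability that all earlier levels missed
    cumulative_time = 0.0       # t_1 + ... + t_i
    for h, t in zip(hit_rates, access_times):
        cumulative_time += t
        total += reach_probability * h * cumulative_time
        reach_probability *= 1.0 - h
    return total


# Terms: 0.95 * 2 + 0.05 * 0.99 * 12 + 0.05 * 0.01 * 1 * 10000012
t_avg = average_access_time([0.95, 0.99, 1.0], [2, 10, 1e7])
print(round(t_avg, 3))          # about 5002.5 ns
```

Although only 0.05% of accesses reach level 3, its $10^{7}$ ns access time dominates the average.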
