Cache Memory
There's a big problem that we have been carefully ignoring for quite a while now:
In our examples we have 200 ps as the clock cycle time, and both tick 1 and tick 4 access memory,
which has a latency on the order of 15 ns.
That's right, we have 200 ps to perform an operation that takes 75 times that long.
That's obviously a problem.
Our answer to this problem is cache memory, which is a small amount of really fast memory that
keeps copies of the memory locations that are actually in use at any given time.
- When the program needs to access memory, it looks first in the cache.
- If the required value is there, we call it a cache hit.
- When we have a cache hit, the program just keeps on truckin'.
- When we have a cache miss, we have to stall the CPU while we go out and get the value from main memory.
- When this happens, we also install the value into the cache.
- This exploits a property of programs known as temporal locality.
- This is just a name for the observed fact that, when a particular memory address is accessed,
the probability that it will be accessed again soon goes up.
- So putting it into the cache when it's first used means that it will already be there
if it's used again soon.
- A surprisingly small cache can perform almost as well as having all the main memory be fast enough
that no cache is needed. A hit rate upwards of 95% is common.
- Having a main memory that fast would be prohibitively expensive if it were possible. Issues involving
physical size and the speed of light probably make it impossible, anyway.
- Modern processors have the cache right on the same chip as the CPU, which reduces the distances
that signals have to travel between CPU and cache.
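The hit/miss bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not a real cache: it assumes unlimited capacity (every address seen is kept), and the function name and addresses are made up for the example.

```python
# A minimal sketch of cache hit/miss bookkeeping, assuming a cache
# with no capacity limit; names and addresses are illustrative.
def run_accesses(addresses):
    cache = set()            # addresses currently held in the cache
    hits = misses = 0
    for addr in addresses:
        if addr in cache:
            hits += 1        # cache hit: keep on truckin'
        else:
            misses += 1      # cache miss: stall, fetch from main memory...
            cache.add(addr)  # ...and install the value in the cache
    return hits, misses

# A loop that touches the same few addresses over and over (temporal
# locality) misses only on the first touch of each address.
hits, misses = run_accesses([0x100, 0x104, 0x108] * 10)
print(hits, misses)  # 27 hits, 3 misses
```

Thirty accesses, but only three distinct addresses, so only three misses: that is temporal locality doing its job.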
The simplest organization for a cache is called direct mapping.
- In this organization, each memory location has only one place in the cache that it might be.
- The cache is much, much smaller than main memory; this means that many, many memory locations
are assigned to the same cache location.
- Only one of them can be there at a time. That's obvious.
- The secret is that, however much memory a program occupies,
it is using only a small part of it at any given time.
- By having the contents of the cache closely track the parts of memory that are actually in use,
we get performance approaching that of really, really fast main memory.
- Here's how the mapping between memory addresses and cache addresses works:
- divide the memory address by the size of the cache and use the remainder as the cache address
- the remainder will be a number in the range 0 .. (cache size - 1), which exactly corresponds
to the range of cache addresses
- since the cache size is a power of 2, doing the division requires no work from the CPU
- One detail that we need to talk about is that caches don't do bytes;
if the CPU needs to fetch a byte, it gets the whole word from the cache and discards the other bytes.
- Because of this, caches never use the 2 rightmost bits in an address.
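The divide-and-keep-the-remainder mapping can be sketched in Python. The 256-entry cache size here is an illustrative assumption, not anything required by the scheme.

```python
# A sketch of the direct-mapped address calculation; the cache size
# (256 entries) is an illustrative assumption.
CACHE_SIZE = 256                       # must be a power of 2

def cache_address(memory_address):
    # Divide by the cache size and use the remainder.
    return memory_address % CACHE_SIZE

def cache_address_fast(memory_address):
    # Because CACHE_SIZE is a power of 2, the remainder is just the
    # low bits of the address, so no actual division is needed.
    return memory_address & (CACHE_SIZE - 1)

for addr in (7, 263, 1031):            # all leave remainder 7
    assert cache_address(addr) == cache_address_fast(addr) == 7
```

The bit-mask version is why the division "requires no work from the CPU": with a power-of-2 cache size, the hardware just wires up the low bits of the address.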
Some More Terminology
- What we have been calling the cache address up to now is the index
- When we discard the 2 rightmost bits, we get numbers that increase by 1 (not 4)
as we go from one word to the next; we will call these numbers word numbers.
- I'd like to call them word addresses, but that risks confusion, which we don't need.
- A word's address is 32 bits, with the rightmost 2 of them always 0.
When we discard those 2 rightmost bits, we get a 30-bit number that is one fourth of the 32-bit address.
- That is, a word's address is 4 times its word number.
- The 2 rightmost bits in a memory address (the ones that caches ignore)
are known as the byte offset.
- Let's discuss the term offset; it will come back soon.
- the byte offset makes a good example since we already know about words and bytes
- the byte offset can have one of these values: 0, 1, 2, 3
- that tells us the byte's position relative to the beginning of the word
- the address of the byte is the sum of the word's address and the byte's offset
- we can compute a byte's offset by taking the remainder when we divide the byte's address by 4
(the number of bytes in a word)
- The Fun-with-Address-Spaces pages apply to all this.
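The word/byte arithmetic above can be sketched as follows; 4-byte words are assumed, and the sample address is hypothetical.

```python
BYTES_PER_WORD = 4

def word_number(byte_address):
    # Dividing by 4 is the same as discarding the 2 rightmost bits.
    return byte_address // BYTES_PER_WORD

def byte_offset(byte_address):
    # The remainder is the byte's position within its word: 0, 1, 2, or 3.
    return byte_address % BYTES_PER_WORD

def word_address(byte_address):
    # A word's address is 4 times its word number.
    return word_number(byte_address) * BYTES_PER_WORD

addr = 0x1007                  # a hypothetical byte address
assert word_number(addr) == 0x401
assert byte_offset(addr) == 3
assert word_address(addr) + byte_offset(addr) == addr
```

The last assertion restates the rule above: the byte's address is the sum of the word's address and the byte's offset.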
How it works
- This is pretty simple since there's only one place in the cache that a word can go.
- Take the word's address, throw away the 2 bits on the right; now we have its word number.
- Divide that by the cache size; the remainder gives us the cache index,
which tells us where to put the word in the cache.
- There's another question here: once it's in the cache, how do we know where it came from?
- That is, how do we distinguish it from all the other addresses that map to the same cache index?
- That's explained in the next page: The Tag Field.
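The two steps above (discard the 2 rightmost bits, then keep the remainder) can be sketched end to end; the 8-word cache size is an illustrative assumption.

```python
CACHE_WORDS = 8                       # illustrative cache size (a power of 2)

def cache_index(byte_address):
    word_num = byte_address >> 2      # throw away the 2 bits on the right
    return word_num % CACHE_WORDS     # the remainder picks the cache slot

# Two different addresses can land on the same index, which is exactly
# why we need some way to tell them apart once they're in the cache:
assert cache_index(0x0040) == cache_index(0x00C0) == 0
```

That collision is the question the tag field answers.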