Cache Memory
There's a big problem that we have been carefully ignoring for quite a while now:
In our examples we have 200 ps as the clock cycle time, and both tick 1 and tick 4 access memory,
which has a latency on the order of 15 ns.
That's right, we have 200 ps to perform an operation that takes 75 times that long.
That's obviously a problem.
Our answer to this problem is cache memory, which is a small amount of really fast memory that
keeps copies of the memory locations that are actually in use at any given time.
- When the program needs to access memory, it looks first in the cache.
- If the required value is there, we call it a cache hit.
- When we have a cache hit, the program just keeps on truckin'.
- When we have a cache miss, we have to stall the CPU while we go out and get the value from main memory.
- When this happens, we also install the value into the cache.
- This exploits a property of programs known as temporal locality.
- This is just a name for the observed fact that, when a particular memory address is accessed,
the probability that it will be accessed again soon goes up.
- So putting it into the cache when it's first used means that it will already be there
if it's used again soon.
- A surprisingly small cache can perform almost as well as having all the main memory be fast enough
that no cache is needed. A hit rate upwards of 95% is common.
- Having a main memory that fast would be prohibitively expensive if it were possible. Issues involving
physical size and the speed of light probably make it impossible, anyway.
- Modern processors have the cache right on the same chip as the CPU, which reduces the distances
that signals have to travel between CPU and cache.
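The hit/miss bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not a real cache: it assumes unlimited capacity (every address seen is kept), and the function name and addresses are made up for the example.

```python
# A minimal sketch of cache hit/miss bookkeeping, assuming a cache
# with no capacity limit; names and addresses are illustrative.
def run_accesses(addresses):
    cache = set()            # addresses currently held in the cache
    hits = misses = 0
    for addr in addresses:
        if addr in cache:
            hits += 1        # cache hit: keep on truckin'
        else:
            misses += 1      # cache miss: stall, fetch from main memory...
            cache.add(addr)  # ...and install the value in the cache
    return hits, misses

# A loop that touches the same few addresses over and over (temporal
# locality) misses only on the first touch of each address.
hits, misses = run_accesses([0x100, 0x104, 0x108] * 10)
print(hits, misses)  # 27 hits, 3 misses
```

Thirty accesses, but only three distinct addresses, so only three misses: that is temporal locality doing its job.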
The simplest organization for a cache is called direct mapping.
- In this organization, each memory location has only one place in the cache that it might be.
- The cache is much, much smaller than main memory; this means that many, many memory locations
are assigned to the same cache location.
- Only one of them can be there at a time. That's obvious.
- The secret is that, however much memory a program occupies,
it is using only a small part of it at any given time.
- By having the contents of the cache closely track the parts of memory that are actually in use,
we get performance approaching that of really, really fast main memory.
- Here's how the mapping between memory addresses and cache addresses works:
- divide the memory address by the size of the cache and use the remainder as the cache address
- the remainder will be a number in the range 0 .. (cache size - 1), which exactly corresponds
to the range of cache addresses
- since the cache size is a power of 2, doing the division requires no work from the CPU
- One detail that we need to talk about is that caches don't do bytes;
if the CPU needs to fetch a byte, it gets the whole word from the cache and discards the other bytes.
- Because of this, caches never use the 2 rightmost bits in an address.
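The divide-and-keep-the-remainder mapping can be sketched in Python. The 256-entry cache size here is an illustrative assumption, not anything required by the scheme.

```python
# A sketch of the direct-mapped address calculation; the cache size
# (256 entries) is an illustrative assumption.
CACHE_SIZE = 256                       # must be a power of 2

def cache_address(memory_address):
    # Divide by the cache size and use the remainder.
    return memory_address % CACHE_SIZE

def cache_address_fast(memory_address):
    # Because CACHE_SIZE is a power of 2, the remainder is just the
    # low bits of the address, so no actual division is needed.
    return memory_address & (CACHE_SIZE - 1)

for addr in (7, 263, 1031):            # all leave remainder 7
    assert cache_address(addr) == cache_address_fast(addr) == 7
```

The bit-mask version is why the division "requires no work from the CPU": with a power-of-2 cache size, the hardware just wires up the low bits of the address.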
Some More Terminology
- What we have been calling the cache address up to now is the index
- When we discard the 2 rightmost bits, we get numbers that increase by 1 (not 4)
as we go from one word to the next; we will call these numbers word numbers.
- I'd like to call them word addresses, but that risks confusion, which we don't need.
- A word's address is 32 bits, with the rightmost 2 of them always 0.
When we discard those 2 rightmost bits, we get a 30-bit number that is one fourth of the 32-bit address.
- That is, a word's address is 4 times its word number.
- The 2 rightmost bits in a memory address (the ones that caches ignore)
are known as the byte offset.
- Let's discuss the term offset; it will come back soon.
- the byte offset makes a good example since we already know about words and bytes
- the byte offset can have one of these values: 0, 1, 2, 3
- that tells us the byte's position relative to the beginning of the word
- the address of the byte is the sum of the word's address and the byte's offset
- we can compute a byte's offset by taking the remainder when we divide the byte's address by 4
(the number of bytes in a word)
- The Fun-with-Address-Spaces pages apply to all this.
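The word/byte arithmetic above can be sketched as follows; 4-byte words are assumed, and the sample address is hypothetical.

```python
BYTES_PER_WORD = 4

def word_number(byte_address):
    # Dividing by 4 is the same as discarding the 2 rightmost bits.
    return byte_address // BYTES_PER_WORD

def byte_offset(byte_address):
    # The remainder is the byte's position within its word: 0, 1, 2, or 3.
    return byte_address % BYTES_PER_WORD

def word_address(byte_address):
    # A word's address is 4 times its word number.
    return word_number(byte_address) * BYTES_PER_WORD

addr = 0x1007                  # a hypothetical byte address
assert word_number(addr) == 0x401
assert byte_offset(addr) == 3
assert word_address(addr) + byte_offset(addr) == addr
```

The last assertion restates the rule above: the byte's address is the sum of the word's address and the byte's offset.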
How it works
- This is pretty simple since there's only one place in the cache that a word can go.
- Take the word's address, throw away the 2 bits on the right; now we have its word number.
- Divide that by the cache size; the remainder gives us the cache index,
which tells us where to put the word in the cache.
- There's another question here: once it's in the cache, how do we know where it came from?
- That is, how do we distinguish it from all the other addresses that map to the same cache index?
- That's explained in the next page: The Tag Field.
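The two steps above (discard the 2 rightmost bits, then keep the remainder) can be sketched end to end; the 8-word cache size is an illustrative assumption.

```python
CACHE_WORDS = 8                       # illustrative cache size (a power of 2)

def cache_index(byte_address):
    word_num = byte_address >> 2      # throw away the 2 bits on the right
    return word_num % CACHE_WORDS     # the remainder picks the cache slot

# Two different addresses can land on the same index, which is exactly
# why we need some way to tell them apart once they're in the cache:
assert cache_index(0x0040) == cache_index(0x00C0) == 0
```

That collision is the question the tag field answers.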