
Re: dacaw post# 65166

Sunday, November 06, 2005 8:52:53 PM

Post# of 97827
Re: Actually the data you present shows him to be absolutely right.

Though the graph you present doesn't show any set-associativity > 8-way, it can be seen (to my eye) that the fully associative miss ratio is essentially flat from 64K to infinite cache, and the 8-way is flat from 128K onwards. 16-way and 1,023-way must fall between the direct-mapped and the 8-way.


Like CJ said, you're reading the graph wrong. Pete presented two hypothetical scenarios:

- 512KB, 16-way SA
- 64KB, 1,023-way SA

At 64KB, 1,024-way is fully associative: with 64B cache lines, a 64KB cache holds exactly 64KB / 64B = 1,024 entries, so every line can go anywhere. I would think 1,023 is a typo, as you speculated (unless there is a subtlety I'm missing...).
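To make the arithmetic concrete, here's a minimal C sketch (my own illustration, not anything from Pete's post) showing how the set count falls out of cache size, line size, and associativity; one set means fully associative:

#include <stdio.h>

/* Number of sets = (total lines) / (ways per set). */
static unsigned num_sets(unsigned size_bytes, unsigned line_bytes, unsigned ways)
{
    return (size_bytes / line_bytes) / ways;
}

int main(void)
{
    /* Pete's two scenarios, assuming 64B lines as discussed above. */
    printf("512KB, 16-way:  %u sets\n", num_sets(512 * 1024, 64, 16));   /* 512 sets */
    printf("64KB, 1024-way: %u sets\n", num_sets(64 * 1024, 64, 1024)); /* 1 set = fully associative */
    return 0;
}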

At any rate, if you look at the value of the yellow line at 64KB (call it node 1), and then look at the value between 256K and 1M where the yellow, purple, and blue lines converge (node 2), you'll see that node 2's value is lower than node 1's. Node 2 is actually the same miss rate for 4-way SA as well as everything in between, so even a 4-way 512K cache will perform as well as a fully associative cache of the same size, at least as far as SPEC_CPU2k goes.

Now here's the thing that Pete doesn't admit. The graphs assume a perfect eviction algorithm at every associativity, and that is not a realistic case. The higher the associativity, the greater the lookup penalty.

With a direct-mapped cache (1-way associativity, basically), you don't have to examine the entries at all: you simply evict whatever line occupies that location when new data comes in. In a 2-way cache, you have to decide which of two cache lines to evict. Preferably, you evict the one that hasn't been used for the longest time, which is what an LRU (Least Recently Used) algorithm does. For a 4-way cache, you need to look at all 4 entries before deciding which one to evict. As you can see, the more associativity you have, the longer the LRU algorithm takes to make its decision. In a fully associative cache (such as the 1,024-way one Pete mentions), the LRU lookup is going to take an ETERNITY. So in reality, fully associative caches can actually be SLOWER the more entries they have.

Of course, there are other algorithms besides LRU, but with those you often end up evicting the cache lines you really need, which means your miss rate goes UP! So you can see there are many implementation issues in large caches, and you'll find that few if any cache designers opt for more than 16-way SA. I think AMD does it because they can afford a longer cycle delay in their L2, since their main memory access latency is so much lower than Intel's. Larger caches can usually afford the longer lookup time anyway, while smaller caches tend to use very low associativity. Pete is totally wrong in his arguments, but as usual, he'll never admit it.
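To illustrate the point about eviction cost, here's a rough C sketch of LRU victim selection (my own model, not how any actual chip implements it; real hardware compares the ways in parallel, but the state and logic it needs still grow with associativity):

#include <stddef.h>

/* One way of a cache set: a tag plus the time of its last access. */
struct way {
    unsigned long tag;
    unsigned long last_used;  /* smaller = older */
};

/* Return the index of the least recently used way in the set.
 * The work grows with associativity: 1 comparison for 2-way,
 * 1,023 for the fully associative 1,024-way case Pete mentions. */
static size_t lru_victim(const struct way *set, size_t n_ways)
{
    size_t victim = 0;
    for (size_t i = 1; i < n_ways; i++)
        if (set[i].last_used < set[victim].last_used)
            victim = i;
    return victim;
}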