Re: > Otherwise, you would expect the Xeon MP with 8M of cache
> to be a much stronger competitor to Opteron. But it's not.
In the particular case of Xeon, you have a much longer
pipeline, with the result that stalls can be quite expensive.
You can correct me if I'm wrong but I don't believe that the
Xeon has the prefetching that Core 2 Duo has. And the Xeon
is a server chip which typically means that you have a much
bigger number of processes running concurrently which would
tend to flush the cache more frequently. On the desktop, 4 MB,
for typical desktop applications and a lot of math benchmarks,
should contain the application. On a server benchmark with
a database running along with maybe twenty to thirty processes,
you're going to flush 8 MB pretty easily over and over and
over again.
Most people have a very naive view of cache and the memory hierarchy, thinking that it can solve all problems, but you seem to have a much more enlightened approach. Clearly, the Xeon MP does not perform well for many reasons, but the cache (however big) is not a saving grace. It will help in some situations, but it's very workload dependent, and certainly there are better things to do these days with silicon real estate than to fill it all with cache.

If you look at Core 2 Duo performance, I think you will find that very little of it comes from the doubling of cache (4M vs. 2M), and rather that most of it comes from the many design changes that went into the core. Similarly for K8 wrt K7, the IMC was the right solution for many reasons (especially the scalability it gives to memory bandwidth in a multi-socket system), but for the broad range of workloads on the single-socket client, I don't think you should expect more than a 5-10% performance gain from it, with the rest coming from the many optimizations made to the K8 core.
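To make the "flush 8 MB over and over" point concrete, here is a rough toy model in Python. It is only a sketch under loud assumptions, not a model of any real Xeon or Opteron: a single fully associative LRU cache with 64-byte lines, shared by processes that run round-robin, each walking its own private working set once per time slice. The function name `simulated_hit_rate` and all the sizes are made up for illustration.

```python
from collections import OrderedDict

KB = 1024
MB = 1024 * KB
LINE_BYTES = 64  # assumed cache line size


def simulated_hit_rate(cache_bytes, n_procs, working_set_bytes, rounds=3):
    """Hit rate of a toy fully associative LRU cache shared by n_procs.

    Assumptions (all hypothetical): processes run round-robin, and each
    one sweeps its whole private working set once per time slice.
    The first round is treated as warm-up and is not counted.
    """
    capacity = cache_bytes // LINE_BYTES
    lines_per_proc = working_set_bytes // LINE_BYTES
    cache = OrderedDict()  # keys are (process, line); order tracks recency
    hits = total = 0

    for rnd in range(rounds):
        measured = rnd > 0
        for proc in range(n_procs):
            for line in range(lines_per_proc):
                key = (proc, line)
                if measured:
                    total += 1
                if key in cache:
                    cache.move_to_end(key)  # mark as most recently used
                    if measured:
                        hits += 1
                else:
                    cache[key] = True
                    if len(cache) > capacity:
                        cache.popitem(last=False)  # evict least recently used
    return hits / total


if __name__ == "__main__":
    # One desktop-style process whose 2 MB working set fits in 4 MB.
    print("desktop:", simulated_hit_rate(4 * MB, 1, 2 * MB))
    # 25 server-style processes, 512 KB each (12.5 MB total), on 8 MB.
    print("server: ", simulated_hit_rate(8 * MB, 25, 512 * KB))
```

Under these assumptions the desktop case hits on every access after warm-up, while the server case never hits at all: by the time a process gets scheduled again, the other 24 have pushed its lines out. Real workloads degrade more gradually than this strict sweep pattern, but the direction is the same, and doubling the cache does nothing until the combined working set actually fits.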