News Focus

mas

05/10/06 12:35 PM

#4816 RE: wbmw #4812

Conroe has 4MB of cache compared to 1MB for K8. By the usual rule of thumb, 4 times the cache means roughly half the cache miss rate, i.e. a doubling of cache effectiveness, which is not negligible. I have just given you a link which confirms, from Fred Weber himself, that the lion's share of the performance improvement, 'first order of magnitude', is the improved memory latency. Yet you still persist in believing, simplistically, that all uarch improvements give the same improvement! That is so ridiculous that not even a newb would claim it; only a fool drowning in his stupidity would. Here's another big friggin clue for you: numerous benchmarks have shown that K7 and K8 perform virtually identically in cache-bound work. What would that tell an intelligent person?
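
The "4 times the cache, half the miss rate" arithmetic follows from the common square-root rule of thumb for caches. A minimal sketch, assuming miss rate scales as a power law in cache size; the function name and the 0.5 exponent are illustrative assumptions, not measurements of Conroe or K8:

```python
def relative_miss_rate(size_ratio: float, exponent: float = 0.5) -> float:
    """Miss rate relative to the baseline cache when capacity grows by size_ratio.

    Assumes the rule-of-thumb power law: miss_rate ~ size ** -exponent.
    Real workloads can deviate a lot from this.
    """
    return size_ratio ** -exponent


# 1MB -> 4MB is a 4x increase in capacity.
ratio = relative_miss_rate(4.0)
print(f"relative miss rate at 4x cache: {ratio:.2f}")
```

Under that assumption the 4MB cache misses about half as often as the 1MB one; a workload that already fits in 1MB would see no benefit at all.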

mmoy

05/10/06 12:35 PM

#4817 RE: wbmw #4812

> It's funny that you guys are willing to explain a 60%
> discrepancy by claiming it has to do with the cache,

I take it that you're talking about Conroe vs K8. In the
case of Conroe vs K8, we have a big bump in cache and
many microarchitectural improvements. I do not know what
the mix of improvements is, but I would guess that the
large cache is a pretty important factor. I'm not claiming
that the 60% is due to the cache. It's not clear to me that
anyone else is making that claim either.

> rather than the more obvious explanation that the IMC does
> not contribute 30% to performance, but rather the
> micro-architectural enhancements of the K8 do.

You appear to be switching back to the discussion of K8 vs
K7 here. As I said before, I'm not that familiar with the
microarchitectural improvements from K7 to K8 because I got
interested in AMD processors with K8 and K7 was uninteresting
to me other than a number of people asking me to optimize for
it (which I declined to do).

The case for the IMC is pretty easy to make: compilation is
memory-intensive, and the improvements in compilation from K7
to K8 (from what I've heard from other builders) and from P4
to K8 (from my personal experience) are substantial. If you
want to make a case for the microarchitectural improvements,
it may help to enumerate them so that I can analyze where they
would help with the process of compilation.
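
To show why memory latency matters so much for a memory-intensive job, here is a rough sketch using the textbook average memory access time formula (hit time + miss rate x miss penalty). All of the numbers are made up for illustration; they are not measured K7, K8, or P4 figures:

```python
def amat_ns(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time: hit time plus the expected cost of a miss."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical workload with a 5% cache miss rate and a 2 ns cache hit time.
slow_memory = amat_ns(2.0, 0.05, 120.0)  # assumed latency without an IMC
fast_memory = amat_ns(2.0, 0.05, 70.0)   # assumed latency with an IMC

print(f"AMAT with slow memory: {slow_memory:.2f} ns")
print(f"AMAT with fast memory: {fast_memory:.2f} ns")
```

The higher the miss rate, the more of the total comes from the miss penalty term, which is the part an IMC shortens.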

> Otherwise, you would expect the Xeon MP with 8M of cache
> to be a much stronger competitor to Opteron. But it's not.

In the particular case of Xeon, you have a much longer
pipeline, with the result that stalls can be quite expensive.
You can correct me if I'm wrong, but I don't believe that the
Xeon has the prefetching that Core 2 Duo has. And the Xeon
is a server chip, which typically means that you have a much
larger number of processes running concurrently, which would
tend to flush the cache more frequently. On the desktop, 4 MB
should contain the application for typical desktop
applications and a lot of math benchmarks. On a server
benchmark with a database running along with maybe twenty to
thirty processes, you're going to flush 8 MB pretty easily,
over and over and over again.
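
A back-of-the-envelope way to picture the desktop vs server difference, as a crude sketch only: it assumes every process touches its whole working set and that they all compete evenly for one shared cache. The function name and the working-set sizes are hypothetical, not taken from any benchmark:

```python
def resident_fraction(cache_mb: float, working_sets_mb: list[float]) -> float:
    """Fraction of the combined working set that can sit in the cache at once."""
    total = sum(working_sets_mb)
    if total <= 0:
        return 1.0  # nothing to cache, so nothing gets evicted
    return min(1.0, cache_mb / total)


# Desktop: one application with an assumed 3 MB working set in a 4 MB cache.
desktop = resident_fraction(4.0, [3.0])

# Server: 25 processes with an assumed 2 MB working set each in an 8 MB cache.
server = resident_fraction(8.0, [2.0] * 25)

print(f"desktop resident fraction: {desktop:.2f}")
print(f"server resident fraction:  {server:.2f}")
```

With these made-up sizes the desktop application fits entirely, while the server cache can hold only a small slice of what the processes want, so it gets refilled on nearly every context switch.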