News Focus

fastpathguru

05/11/06 4:26 PM

#4906 RE: wbmw #4902

It will be obvious to everyone very soon with Core 2 Duo that it's possible to increase the amount of ILP in an x86 processor to even exceed the benefit of an IMC by 20-40%. And it's not going to all be due to a large shared cache, either.

I think you mean "IPC", not "ILP" above. Having an IMC or a larger cache has nothing to do with ILP, but can have a great effect on IPC. (Improving the extraction of ILP from an x86 instruction flow can/will also improve IPC.)
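To put rough numbers on that distinction, here is a back-of-the-envelope sketch using the textbook stall model (effective CPI = core CPI + misses per instruction x miss penalty). Everything in it is an assumption for illustration: the function name `effective_ipc` is made up, the inputs are hypothetical and not measurements of k8 or Conroe, and the model ignores overlapping misses. The point is only that the core's ILP extraction (`core_ipc`) can stay fixed while delivered IPC moves with memory latency.

```python
def effective_ipc(core_ipc, misses_per_instr, miss_penalty_cycles):
    """Delivered IPC once memory stall cycles are added to the core's CPI.

    core_ipc            -- IPC the core sustains with a perfect memory system
                           (a stand-in for how much ILP it extracts)
    misses_per_instr    -- last-level cache misses per instruction
    miss_penalty_cycles -- cycles lost per miss (main memory latency)
    """
    core_cpi = 1.0 / core_ipc
    stall_cpi = misses_per_instr * miss_penalty_cycles
    return 1.0 / (core_cpi + stall_cpi)


# Hypothetical inputs, chosen only to show the direction of the effect.
core_ipc = 2.0    # same core, same ILP extraction in both cases
misses = 0.005    # 5 misses per 1000 instructions

higher_latency = effective_ipc(core_ipc, misses, 200)  # e.g. off-chip controller
lower_latency = effective_ipc(core_ipc, misses, 120)   # e.g. integrated controller

print(f"IPC at 200-cycle miss penalty: {higher_latency:.2f}")
print(f"IPC at 120-cycle miss penalty: {lower_latency:.2f}")
```

Lowering `misses` instead of `miss_penalty_cycles` models the bigger-cache route: in both cases `core_ipc` is untouched, which is why neither one is an ILP improvement.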

And, though I'm not going to get down in the mud with you about it, I think it's intuitively obvious that reduced memory latency was a big fat juicy piece of fruit that got plucked with the addition of k8's IMC. It's something that helps EVERY SINGLE NON-CACHE-BOUND application (you've claimed it doesn't help, "on average"), and its benefit increases with the size of the workload. Yes, AMD's engineers worked hard <cue violins> on some aspects of k8's microarchitecture, but significant ILP "low-hanging-fruit" had already been plucked by k7.

That further gains in ILP are increasingly difficult is no secret: That's why there's a big shift towards applying silicon to parallelism, vs. extracting ILP from a single thread. (That's not to say there aren't gains to be made, they're just going to become more expensive and/or more specialized.)

I think (i.e. it's my opinion) that it's YOU who will be surprised when it turns out that more of Conroe's gains WILL come from the large cache than from the uarch improvements. IMHO.

fpg