Re: So the Torrenza code-name is for a processor line, which is now actual silicon. And that silicon is called Barcelona. Got it.
Not even close. Torrenza is AMD's codename for its initiative to support socketed accelerators. It has nothing to do with Barcelona.
Re: No mention of the fact that Core 2 already does 128-bit SSE. I'll give AMD one uarch advantage when it comes to a separate FP issue pipeline. Does anyone know if Opteron already has separate FP schedulers? Prescott had that, and is one way the IPC was maintained versus Northwood despite the much longer pipeline. Unfortunately, going back to a P6 derivative for Core caused this feature to get dropped (for now).
Core 2 has not one but THREE SSE units, and it can issue instructions to all three in the same cycle. That is at least equivalent to AMD's current approach, which has separate FADD, FMUL, and FMISC units. Some of the enthusiast press has learned from Intel that Intel's units aren't fully symmetric, but at the very least I think it's safe to assume they can sustain the same FADD + FMUL + FMISC instruction stream (and possibly more).
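To make the "same instruction stream" point concrete, here is a toy issue-rate model. It is an idealized sketch, not a model of either microarchitecture: it assumes each FP/SSE unit is dedicated to one op class (FADD, FMUL, or FMISC), accepts one op per cycle, and ignores dependencies, latencies, and decode limits. The names `K8_STYLE_UNITS`, `min_issue_cycles`, and `ops_per_cycle` are hypothetical.

```python
from math import ceil

# Hypothetical, idealized unit layout: one dedicated unit per op class.
# Each unit accepts one op per cycle; dependencies and latencies are ignored.
K8_STYLE_UNITS = {"fadd": 1, "fmul": 1, "fmisc": 1}


def min_issue_cycles(mix, units):
    """Lower bound on cycles needed to issue `mix` (op class -> count)
    onto `units` (op class -> number of units accepting that class)."""
    return max((ceil(count / units[op]) for op, count in mix.items() if count),
               default=0)


def ops_per_cycle(mix, units):
    """Average ops issued per cycle under the idealized model."""
    cycles = min_issue_cycles(mix, units)
    return sum(mix.values()) / cycles if cycles else 0.0


balanced = {"fadd": 100, "fmul": 100, "fmisc": 100}
add_heavy = {"fadd": 200, "fmul": 50, "fmisc": 50}

print(ops_per_cycle(balanced, K8_STYLE_UNITS))   # 3.0
print(ops_per_cycle(add_heavy, K8_STYLE_UNITS))  # 1.5
```

Under these assumptions, three ops per cycle is only reachable when the mix is balanced across the three classes; an add-heavy stream is capped by the single FADD unit. Units that accept more than one op class would need a real scheduling model rather than this per-class bound.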
Re: Will nested page tables really have that much of an impact on the everyday user? I admit I am unfamiliar with much of the hype surrounding virtualization.
It depends on the virtualization workload. Some workloads have almost no virtualization overhead, so nested page tables will bring little or no benefit: there is nothing to save. Others have as much as 40% overhead, and nested page tables are supposed to cut that to roughly 10%.
So, from equal baselines, the chip with nested page tables comes out ahead. From different baselines (e.g., Clovertown outperforms Barcelona in a given non-virtualized workload), nested page tables won't necessarily make Barcelona come out ahead when the same workload is virtualized.
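The baseline argument is simple arithmetic. The sketch below assumes "overhead" means the fraction of native performance lost when virtualized (so 40% overhead leaves 60% of native), and every score in it is a made-up number chosen for illustration, not a benchmark result. `virtualized_score` is a hypothetical helper.

```python
def virtualized_score(native_score, overhead):
    """Performance left after virtualization, treating `overhead` as the
    fraction of native performance lost (an assumption; overhead is
    sometimes defined as extra run time instead)."""
    return native_score * (1.0 - overhead)


# Overheads from the post: ~40% without nested page tables, ~10% with them.
NO_NPT, WITH_NPT = 0.40, 0.10

# Equal baselines (hypothetical score of 100): nested page tables win.
equal_a = virtualized_score(100, NO_NPT)    # about 60
equal_b = virtualized_score(100, WITH_NPT)  # about 90

# Different baselines (hypothetical scores): a chip far enough ahead
# natively can still win virtualized, even without nested page tables.
small_gap = virtualized_score(130, NO_NPT)  # about 78, loses to equal_b
large_gap = virtualized_score(160, NO_NPT)  # about 96, beats equal_b
```

With these overhead figures, break-even is a native lead of (1 - 0.10) / (1 - 0.40) = 1.5x: below that the nested-page-table chip wins the virtualized run, above it the faster native chip still wins.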
Re: So Barcelona is the first x86 CPU to implement heavy clock-gating? Pure, unabashed fabrication. What a crock of sh*t.
He isn't talking about clock gating here, but about Barcelona's ability to use a separate PLL for each core (something I don't think Core 2 can do). On Core 2, I believe both cores run at the same frequency, or one core enters Stop Grant while the other runs at a given frequency. AMD's approach would allow, for example, some cores to run at half speed while others run at full speed. I'm not sure how much of a benefit this will actually bring.
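For a rough sense of what per-core frequency control can buy, the usual first-order estimate for CMOS dynamic power is P ≈ C·V²·f. The sketch below applies it per core, assuming identical switched capacitance on every core and, unless told otherwise, a single shared voltage (so only frequency drops); it ignores leakage and the uncore entirely. `relative_dynamic_power` is a hypothetical helper, and the scales are illustrative, not Barcelona figures.

```python
def relative_dynamic_power(freq_scales, volt_scales=None):
    """Chip dynamic power relative to every core at full frequency and
    voltage, using P ~ C * V**2 * f per core with identical capacitance.
    Leakage (static) power is ignored."""
    if volt_scales is None:
        # Assumption: all cores share one supply voltage held at nominal.
        volt_scales = [1.0] * len(freq_scales)
    per_core = [v * v * f for f, v in zip(freq_scales, volt_scales)]
    return sum(per_core) / len(per_core)


# Lock-step clocking: the busiest core sets the frequency for all four.
lock_step = relative_dynamic_power([1.0, 1.0, 1.0, 1.0])   # 1.0
# Per-core clocks: two lightly loaded cores drop to half frequency.
per_core_f = relative_dynamic_power([1.0, 1.0, 0.5, 0.5])  # 0.75
# Only if the slow cores could also drop to 80% voltage (not assumed above).
per_core_fv = relative_dynamic_power([1.0, 1.0, 0.5, 0.5],
                                     [1.0, 1.0, 0.8, 0.8])  # about 0.66
```

Treat this as an upper bound on the saving: if the slower cores would otherwise be idle and clock-gated anyway, the real difference is smaller.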
Re: There is nothing inherently better about a shared L2 cache versus a separate L2 cache for each core. It's simply a tradeoff between optimizing for multi-core operation versus single-core. A shared L2 cache will allow any single-core application to have more cache available, thus increasing performance. Now consider that Intel's manufacturing advantage allows it to have a shared cache that is almost as big as AMD's 4 separate caches combined, and I'd say Intel is sitting pretty here.
Actually, I would go so far as to say Intel's shared-cache approach is superior. I don't see how anyone could be naive enough to call AMD's private caches the better solution. After all, AMD itself went to the trouble of sharing the L3...
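The single-thread side of the shared-versus-private tradeoff can be shown with a toy LRU simulation. Everything here is hypothetical: a fully associative cache counted in lines, a 16-line total split either as one shared pool or as four private 4-line slices, and one thread looping over a 6-line working set while the other cores sit idle. Real L2/L3 caches are set-associative and far larger, so only the qualitative result carries over.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache with LRU replacement, sized in lines."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
            self.lines[addr] = True


def hit_rate(capacity_lines, trace):
    cache = LRUCache(capacity_lines)
    for addr in trace:
        cache.access(addr)
    return cache.hits / len(trace)


# Hypothetical sizes: 16 lines total, shared or split across 4 cores.
TOTAL_LINES = 16
CORES = 4

# One busy thread looping over a 6-line working set; other cores idle.
trace = list(range(6)) * 10
shared = hit_rate(TOTAL_LINES, trace)
private = hit_rate(TOTAL_LINES // CORES, trace)
print(f"shared: {shared:.2f}, private: {private:.2f}")  # shared: 0.90, private: 0.00
```

With LRU, a loop even slightly larger than its private slice misses on every access, while the shared pool keeps the whole working set after the first pass. The flip side, which this single-thread trace doesn't exercise, is that with several busy threads a shared cache lets one thread evict another's lines, which private slices prevent.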
Re: Core 2 still has more advanced Load/Store reordering techniques. Core 2 is still 4-wide issue versus Barcelona's 3-wide. Core 2 is available now, for a very reasonable price. Barcelona is still a powerpoint slide. That is the reality.
Expect a large amount of FUD leading up to the Barcelona launch, with 'Droids around the world proclaiming how many times faster than Clovertown it will be. 2x... 3x... oh, shoot, why not 10x? I predict we'll see at least one post from Pete Gerassi claiming some ridiculous performance number for Barcelona.
And if, in the end, Intel wins the majority of benchmarks, I think it will be a huge blow to AMD's customer base. Personally, I don't know what the final tally will look like; my guess is that it will be close enough to go either way. Barcelona does have some inherent advantages, but its low clock speeds and slow ramp will hurt AMD even if everything else goes according to plan. I think Intel has an opportunity to retain leadership. It will be difficult, but if I were Intel I would make sure Clovertown supplies are plentiful, and even if it takes a 150W "benchmark special" quad core running at 3.33GHz, I think they should consider doing it, especially since benchmark results for the flagship products shape perceptions of the rest of the product line.