News Focus

Re: wbmw post# 38279

Wednesday, 02/07/2007 12:15:07 PM

Here's my take on that hack-job of an article...

AMD’s next-generation processor line, code-named Torrenza, has gone from a block diagram to living, breathing silicon. The first incarnation of AMD’s redesigned x86 CPU is Barcelona, that which your non-co-readers will call quad-core Opteron.

So the Torrenza code-name is for a processor line, which is now actual silicon. And that silicon is called Barcelona. Got it.

Barcelona is genius, a genuinely new CPU that frees itself entirely of the millstone of the Pentium legacy. It’ll do the same for you.

Oh give me a f***ing break....

Each of Barcelona’s four cores incorporates a new vector math unit referred to as SSE128 (128-bit streaming single-instruction-multiple-data extensions). I am aware that you only do quantum physics on weekends, but the potential for hardcore IT tasks such as encryption, compression, real-time analysis of high volumes of streaming business transactions, and wire-speed packet analysis is also the stuff of science fiction. Barcelona gives floating point operations their own schedulers (checkout lanes) and runs them twice as fast as 64-bit SSE did. AMD claims that Barcelona’s per-core floating point performance is more than 80 percent faster than the present Opteron. Benchmark that. And separating integer and floating-point schedulers also accelerates this thing called virtualization, which you may notice is a recurring theme for Barcelona.

No mention of the fact that Core 2 already does 128-bit SSE.
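
To put rough numbers on the "twice as fast" claim: a 128-bit SSE register holds four single-precision or two double-precision values, and a core with only a 64-bit execution datapath needs two passes per instruction where a full 128-bit datapath needs one. Here's a toy sketch of that arithmetic. The function names are my own, and it ignores scheduling, latency, and memory bandwidth entirely.

```python
import math

SSE_REGISTER_BITS = 128  # width of one SSE (XMM) register


def lanes(element_bits):
    """Packed elements that fit in one 128-bit SSE register."""
    return SSE_REGISTER_BITS // element_bits


def passes_per_instruction(datapath_bits):
    """Passes needed to push one 128-bit op through the datapath (toy model)."""
    return math.ceil(SSE_REGISTER_BITS / datapath_bits)


def peak_elements_per_cycle(element_bits, datapath_bits):
    """Idealized peak: elements per instruction divided by passes per instruction."""
    return lanes(element_bits) / passes_per_instruction(datapath_bits)


if __name__ == "__main__":
    for width in (64, 128):
        print(width, "bit datapath:",
              peak_elements_per_cycle(32, width), "single-precision elements/cycle")
```

In this model the doubling comes purely from the wider datapath, which is the same thing Core 2 already has.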

I'll give AMD one uarch advantage when it comes to a separate FP issue pipeline. Does anyone know if Opteron already has separate FP schedulers? Prescott had them, and they are one way its IPC was maintained versus Northwood despite the much longer pipeline. Unfortunately, going back to a P6 derivative for Core caused this feature to get dropped (for now).

Nested paging tables is a per-core feature that will light the afterburners on x86 hardware virtualization. A paging table holds the map that translates virtual memory addresses to physical memory addresses, and each CPU core has only one. Virtual machines have to load and store their page tables as they get and lose their slice of the CPU. AMD solved the problem with nested paging tables. Simplified, each VM maintains its own paging table that stays fixed in place. Instead of loading and saving paging tables as your system flips from VM to VM, your system just supplies Barcelona with the ID of the virtual machine being activated. The CPU core flips page tables automatically and transparently. This is another feature that’s implemented for each core.

Will nested page tables really have that much of an impact on the everyday user? I admit I am unfamiliar with much of the hype surrounding virtualization.
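
For anyone else trying to picture what the article is describing, here's a toy model of the idea as I read it: each VM keeps two tables that stay in place (guest-virtual to guest-physical, and guest-physical to host-physical), and a VM switch just changes which ID the core uses to pick them. All of the class and method names are made up for illustration. Real hardware walks multi-level tables and caches results in the TLB, none of which is modeled here.

```python
PAGE_SIZE = 4096  # bytes per page (assumed for the sketch)


class VM:
    def __init__(self, guest_table, nested_table):
        self.guest_table = guest_table    # guest virtual page -> guest physical page
        self.nested_table = nested_table  # guest physical page -> host physical page


class Core:
    def __init__(self, vms):
        self.vms = vms          # vm_id -> VM; the tables are never copied or reloaded
        self.active_id = None

    def activate(self, vm_id):
        # A VM switch only records which VM's tables to use from now on.
        self.active_id = vm_id

    def translate(self, guest_virtual_addr):
        vm = self.vms[self.active_id]
        page, offset = divmod(guest_virtual_addr, PAGE_SIZE)
        guest_phys_page = vm.guest_table[page]             # first lookup: guest's own table
        host_phys_page = vm.nested_table[guest_phys_page]  # second lookup: nested table
        return host_phys_page * PAGE_SIZE + offset


if __name__ == "__main__":
    core = Core({1: VM({0: 5}, {5: 40}), 2: VM({0: 7}, {7: 90})})
    core.activate(1)
    print(hex(core.translate(0x10)))
    core.activate(2)
    print(hex(core.translate(0x10)))
```

The same guest address lands on different host pages depending on which VM is active, and nothing gets saved or restored on the switch. Whether that matters to an everyday user is a separate question.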

Much fuss has been made about power efficiency, but the best of x86 power saving schemes is crude. They adjust the clock speed and the operating voltage of the entire CPU, and the selection of set points is small. Barcelona keeps this technique, but builds on it with inspiration from IBM and Transmeta. Barcelona blacks out power to individual portions of the chip that are idled, from in-core execution units to on-die bus controllers. This hasn’t made it into PCs before because it’s very difficult to manage light switches for several “rooms” individually and to make sure that, like a refrigerator light, whenever a door is opened, the light is on as if it’s been burning the whole time. Power savings from these schemes are dramatic. If Barcelona lacked this feature, it would still be a green CPU.

So Barcelona is the first x86 CPU to implement heavy clock-gating? Pure, unabashed fabrication. What a crock of sh*t.

I do admit, one concern I have is that AMD may one day expend the engineering resources to implement as much clock-gating as Intel has been doing since Willamette. If they ever do, they could greatly close the existing power gap.

Unlike Intel’s Core, Barcelona gives each core dedicated L2 cache

There is nothing inherently better about a shared L2 cache versus a separate L2 cache for each core. It's simply a tradeoff between optimizing for multi-core operation versus single-core. A shared L2 cache will allow any single-core application to have more cache available, thus increasing performance. Now consider that Intel's manufacturing advantage allows it to have a shared cache that is almost as big as AMD's 4 separate caches combined, and I'd say Intel is sitting pretty here.
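
Here's the tradeoff as back-of-the-envelope arithmetic. The sizes in the example are placeholders I picked, not real part specs, and the even split among threads for a shared cache is an idealization (real sharing depends on each thread's access pattern).

```python
def l2_per_thread_kb(total_l2_kb, cores, active_threads, shared):
    """Idealized L2 capacity seen by each active thread, in KB."""
    if not 1 <= active_threads <= cores:
        raise ValueError("active_threads must be between 1 and cores")
    if shared:
        # One pool: whatever threads are running split the whole cache.
        return total_l2_kb / active_threads
    # Private caches: each thread is stuck with its own core's slice.
    return total_l2_kb / cores


if __name__ == "__main__":
    total_kb, cores = 2048, 4  # hypothetical chip, same total L2 either way
    for threads in (1, 4):
        print(threads, "thread(s):",
              "shared", l2_per_thread_kb(total_kb, cores, threads, True),
              "private", l2_per_thread_kb(total_kb, cores, threads, False))
```

With one thread running, the shared design gives it the whole cache; with every core busy, the two designs come out even in this model.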

Barcelona incorporates a redesign that reduces cache latency (access delays).

Wasn't the cache latency increased on Brisbane, to account for future larger cache sizes? Why do this if it won't be the same on Barcelona?

Barcelona adds Level 3 cache, a newcomer to the x86 and a page out of IBM’s POWER playbook. All four CPU cores in a Barcelona socket will share a single master catalog of recently-retrieved data. A three-level cache is a must-have for a multicore CPU, and that becomes obvious when you get a demo that switches L3 on and off.

With only 2MB shared between all 4 cores, don't expect this to have much impact on multithreaded workloads.

Barcelona is a new CPU, not a doubling of cores and not extensions strapped on here and there. Get ready to be blown away long before its release, which is scheduled for midyear.

Core 2 still has more advanced load/store reordering techniques. Core 2 is still 4-wide issue versus Barcelona's 3-wide. Core 2 is available now, for a very reasonable price. Barcelona is still a PowerPoint slide. That is the reality.