
wbmw

08/28/08 3:07 AM

#85847 RE: pgerassi #85846

Re: Sure you are lying, because the EE C2Q CPU power usage didn't include the chipset, VRMs, 2GB DDR3/1600 memory, and all the other stuff on the MB running 3DMark06 HDR/SM3.0 in software. That is the CPU on its "board". They didn't look at just the 2 RV770 GPUs; they looked at the total power of everything on the GPU board.

I know that, Pete, but it really doesn't make a difference to the end user. They can configure a system without a graphics card, but they certainly can't configure one without a processor.

From an end user's perspective, adding a >250W video card adds an incremental >250W to the system. Choosing a 130W CPU over a 65W CPU, by contrast, adds only an incremental 65W.
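
To put numbers on it, here's a quick Python back-of-envelope (the wattages are just the nominal TDP figures above):

    def incremental_watts(chosen_w, baseline_w):
        # Extra watts a configuration choice adds over the baseline option.
        return chosen_w - baseline_w

    # A discrete card is optional, so the baseline is no card at all:
    print(incremental_watts(250, 0))    # 250+ W added to the system

    # A CPU is mandatory, so the baseline is the 65 W alternative:
    print(incremental_watts(130, 65))   # only 65 W added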

That's been my point - ATI seems to be increasing power haphazardly to chase the highest performance. That may be ok for some niche enthusiast users, but it's not ok for the vast majority of PC users. And their more reasonable <100W cards, like the 3850, come with far less computational capability than the high-end cards everybody likes to talk about.

Re: Absolute power is meaningless unless you look at performance too

More power is ok up to a point, as long as it brings more performance. But past a certain point, the cost of cooling and delivering power to such an extreme device makes it unsuitable for most of the market.
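
Here's a rough Python sketch of what I mean (the GFLOPS and wattages are illustrative ballpark figures, not measurements, and the power ceiling is my own assumption):

    # name: (peak single-precision GFLOPS, board power in W) - ballpark only
    cards = {
        "sub-100W midrange (3850-class)": (430, 95),
        "1 TFLOPS single-GPU flagship":   (1000, 236),
        "2.4 TFLOPS dual-GPU board":      (2400, 300),
    }

    MARKET_POWER_LIMIT_W = 150   # assumed ceiling for a typical PSU/case/cooler

    for name, (gflops, watts) in cards.items():
        ratio = gflops / watts
        fits = "fits" if watts <= MARKET_POWER_LIMIT_W else "too hot for"
        print(f"{name}: {ratio:.1f} GFLOPS/W at {watts} W ({fits} a mainstream box)")

    # The big boards can win on GFLOPS/W, but past the power ceiling
    # the ratio stops mattering for most of the market.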

Re: But the 4870x2 will only be used for GPU and GPGPU work, read: HPC with very parallel loads. They don't check highly serial loads that push against Amdahl's Law. In that area the EE C2Q doesn't do well either, given its high power usage. It loses against a fast single-core Athlon 64 using far less power, because of the high latency to chipset-attached memory over the FSB.

Only in your alternate reality, I'm afraid. In serial workloads, the Core 2 architecture and its higher frequencies allow the C2Q to blow past any K8 or K10 core in existence. That's why Intel implemented large caches and a prefetching system that actually works and delivers results: latency to memory is minimized, and FSB traffic is reduced. Intel made it work, and years later you still haven't noticed the reviews showing Intel 50% ahead of AMD in the benchmarks...
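
A toy model of why that works, in Python (the latencies and hit rates are assumed round numbers, not measurements):

    L2_HIT_NS   = 5      # large on-die cache
    MEM_MISS_NS = 110    # trip over the FSB to chipset-attached memory

    def effective_latency_ns(hit_rate):
        # Average latency the core sees for a given cache hit rate.
        return hit_rate * L2_HIT_NS + (1 - hit_rate) * MEM_MISS_NS

    for hr in (0.90, 0.97, 0.99):   # prefetching pushes the hit rate up
        print(f"hit rate {hr:.0%}: {effective_latency_ns(hr):5.1f} ns average")

    # At 99% hits the FSB penalty barely shows - which is how the C2Q
    # stays fast on serial code despite not having an on-die controller.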

Re: There, a 2.4 Tflops processor, even at 300W, is quite efficient on a Tflops-per-watt basis. Even the 1 Tflops GTX280 is efficient, just not as high. The 25.6 Gflops EE C2Q is nowhere even close at over 200W (all using SP).

LOL, your peak FLOPs numbers are meaningless. There are parallel workloads where a Core 2 Quad will run circles around a 4870 X2. Why? Because the CPU was designed to get as close to its peak FLOPs as possible, while the GPU has all those math units sitting behind a very SLOW PCI-Express link that all the data has to cross, at only 8GB/s per direction. And the GPUs don't have caches to mitigate this, either!
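
You can put a ceiling on this with one line of arithmetic. A Python sketch, assuming PCIe 2.0 x16 at the 8GB/s per direction I mentioned, and a hypothetical workload that streams every operand in over the link:

    PCIE_BYTES_PER_SEC = 8e9    # one direction, PCIe 2.0 x16
    BYTES_PER_ELEMENT  = 4      # single-precision float

    def pcie_bound_gflops(flops_per_element):
        # Max sustained GFLOPS if every element must stream in over PCIe.
        elements_per_sec = PCIE_BYTES_PER_SEC / BYTES_PER_ELEMENT   # 2e9/s
        return elements_per_sec * flops_per_element / 1e9

    for intensity in (1, 10, 100, 1200):
        print(f"{intensity:>5} FLOPs/element -> "
              f"{pcie_bound_gflops(intensity):6.0f} GFLOPS ceiling")

    # You need ~1200 FLOPs of work per float just to keep 2.4 TFLOPS fed.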

In order to use 2.4 TFLOPs of compute, you need to provide the dataset, and that can mean tens of gigabytes of data, which will never fit in a video card's local memory. You have to copy the data over from system memory, which slows down your entire simulation. And even if you do it asynchronously, you will still never make full use of your peak FLOPs if you are always limited by the data pipeline.
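
Even with perfect asynchronous overlap, the slower stage sets the pace. Another hedged Python sketch - the dataset size and arithmetic intensity below are assumed purely for illustration:

    DATASET_BYTES      = 20e9   # assumed working set, too big for card memory
    PCIE_BYTES_PER_SEC = 8e9    # transfer rate, one direction
    PEAK_FLOPS         = 2.4e12
    FLOPS_PER_BYTE     = 2      # assumed arithmetic intensity of the kernel

    transfer_s = DATASET_BYTES / PCIE_BYTES_PER_SEC           # 2.5 s
    compute_s  = DATASET_BYTES * FLOPS_PER_BYTE / PEAK_FLOPS  # ~0.017 s

    step_s = max(transfer_s, compute_s)   # perfect overlap: slower stage wins
    sustained = DATASET_BYTES * FLOPS_PER_BYTE / step_s

    print(f"transfer {transfer_s:.2f}s vs compute {compute_s:.3f}s per pass")
    print(f"sustained: {sustained/1e9:.0f} GFLOPS out of a "
          f"{PEAK_FLOPS/1e12} TFLOPS peak")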

Face it, Pete. Outside of games, workloads aren't composed of thousands of independent variables like pixels on a render target. You can get away with it at certain problem sizes, but at some point all those peak FLOPs go to waste.

And things get even worse when you consider how vulnerable graphics cards are to soft errors. GDDR5 is the greatest of all offenders: it has a high bit error rate, and no error correction (or even detection). You probably won't notice if one pixel in a single frame of your game is the wrong color, but if you have hundreds of these cards running large scientific simulations, you'll be getting errors every day, in far more noticeable places. No one has figured out a solution for that issue yet.
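
To give a feel for the scale, a last Python sketch - the per-bit upset rate is an ASSUMED illustrative figure, since I haven't seen published numbers for GDDR5, so only the shape of the math matters:

    FIT_PER_MBIT = 100   # ASSUMED: upsets per billion device-hours per Mbit
    GB_PER_CARD  = 2     # e.g. a dual-GPU board with 2 x 1 GB
    CARDS        = 300   # "hundreds of these cards"

    mbit_total     = CARDS * GB_PER_CARD * 8 * 1024    # GB -> Mbit
    flips_per_hour = mbit_total * FIT_PER_MBIT / 1e9

    print(f"~{flips_per_hour * 24:.0f} silent bit flips per day across the farm")
    # With no detection, every one of those lands somewhere in your results.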