
Re: Elmer Phud post# 86273

Friday, November 21, 2008 10:44:29 PM

Post# of 97720
Elmer:

Your reliance on SPECxxx_rate_base2006 is not how most people compare systems for HPC work. Because of the way this benchmark has evolved over time, it is fairly meaningless for most HPC users, who will test each platform in their own environment and see which is both fast and, usually, cheap. For most others, even "base" is far from what they see in the real world when they use their production compiler (gcc or MS C and its variants) and measure how the resulting software performs (which was the original goal of SPEC in the first place, before it was co-opted by OEM marketing departments). So "base" overstates what a normal developer gets in a typical environment, and "peak" still falls short of the most an advanced, savvy developer can get out of a system.

The more typical server buyer does what Dan3 does on SI: he gets each marketed system in for testing in his own environment with his own applications, measures the performance, and tallies what the system would set him back in both money and power. After testing samples from each supplier under consideration (or each one that wanted the sale), buyers choose the one that delivers the most performance for their budget; a back-of-the-envelope version of that ranking is sketched below. Frankly, almost everyone agrees that is the best way to do it. SPEC CPU2006 might help make the initial cut of whom to bring in for testing, just as TPC or SAP benchmarks do, but it is part of the beginning and nowhere near the end.
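For illustration only, here is the kind of perf-per-dollar and perf-per-watt ranking I mean, in Python. The system names and numbers are made up, and the throughput metric would be whatever the buyer's own application reports:

# Hypothetical in-house test results -- illustrative numbers, not real data.
systems = [
    # (system, throughput in jobs/hour, price in $, average power in W)
    ("Vendor A 2P", 1180.0, 7500.0, 450.0),
    ("Vendor B 2P", 1050.0, 6200.0, 380.0),
    ("Vendor C 4P", 1900.0, 14200.0, 820.0),
]

for name, perf, price, watts in systems:
    print(f"{name}: {perf / price:.3f} jobs/hr per $, "
          f"{perf / watts:.2f} jobs/hr per W")

# Pick by performance per dollar; a real buyer would also fold the cost
# of power over the system's service life into the price.
best = max(systems, key=lambda s: s[1] / s[2])
print("Best perf/$:", best[0])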

For the rest of us, it's a debating point. Because of Intel's "cheating" on flag handling, "base" is more meaningless than ever (at times base and peak scores matched exactly, which shouldn't happen in the real world). The "peak" result is closer to what savvy users would get, but given that SPEC does not allow the use of typical numeric libraries like BLAS, it is still far from what beginning HPC developers would get, much less savvy ones (see the sketch below). So "peak" gets closer, but isn't really indicative of real-world results; that is the one I, reluctantly, use. And if you really check back, I have always used "peak" instead of "base" to compare CPUs.
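As a rough sketch of why the library restriction matters, compare a naive triple-loop matrix multiply against NumPy's matmul, which on most installs (though it's not guaranteed) dispatches to an optimized BLAS:

import time
import numpy as np

n = 200
a = np.random.rand(n, n)
b = np.random.rand(n, n)

def naive_matmul(a, b):
    """Triple loop -- roughly what untuned, library-free code does."""
    n = a.shape[0]
    c = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i, k] * b[k, j]
            c[i, j] = s
    return c

t0 = time.perf_counter()
naive_matmul(a, b)
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
a @ b  # dispatches to whatever BLAS dgemm NumPy was built against
t_blas = time.perf_counter() - t0

print(f"naive: {t_naive:.2f}s  BLAS: {t_blas:.5f}s  "
      f"speedup ~{t_naive / t_blas:.0f}x")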

The point is that Nehalem isn't out yet, and the scores it posts have to be taken with a large dose of salt. Just look at SPECint_2006: at 24 cores, the top Xeon gets 25.5; at 16 cores, 25.0 (going from a 2.67GHz Dunnington to a 2.93GHz Tigerton); at 12, there isn't one; at 8, 30.3 (3.33GHz X5470 Harpertown); at 4, 30.2 (3.5GHz Wolfdale; the 3.2GHz i7 965 gets 33.6); at 2, 26.3 (3.13GHz E3120 Wolfdale); at 1, 17.4 (3GHz Woodcrest 5160). Generally the score drops as the core count goes up (and the clock drops with it). So extrapolating from 1P to 4P does not give 4x: the clock runs slower, some memory contention appears, and cache-coherency checks slow things down further (a toy model of this is below). Otherwise there would be no slowdown as the core count goes up.
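Here is a toy model of that effect; the formula is my own simplification and the efficiency figure is an illustrative guess, not anything SPEC publishes:

def scaled_score(score_1p, sockets, clock_1p, clock_np, efficiency):
    """Naive throughput extrapolation from a 1P score to an N-socket box.

    efficiency < 1.0 stands in for memory contention and cache-coherency
    overhead; the values below are guesses, not measurements.
    """
    return score_1p * sockets * (clock_np / clock_1p) * efficiency

score_1p = 17.4   # the 1-core 3GHz Woodcrest score quoted above
clock_1p = 3.0    # GHz

ideal = scaled_score(score_1p, 4, clock_1p, 3.0, 1.0)    # the naive 4x
real = scaled_score(score_1p, 4, clock_1p, 2.67, 0.6)    # lower clock, guessed 60% efficiency
print(f"naive 4P extrapolation: {ideal:.1f}, "
      f"with clock drop and contention: {real:.1f}")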

Given the above, Nehalem will not be 2x Shanghai when they actually meet, not in typical server loads and not in HPC work. In fact, savvy HPC developers would likely get a bigger boost from moving to a GPGPU than from a faster CPU: a $500 GPGPU crunches numbers (480 DP Gflop/s, or 2.4 SP Tflop/s) far faster than a $500 CPU can (12 DP Gflop/s). The per-dollar arithmetic is below.
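The arithmetic behind that claim, using the peak figures above (peak numbers that real codes rarely sustain, on either side):

# Peak double-precision throughput per dollar, from the figures above.
gpu_dp_gflops, gpu_price = 480.0, 500.0
cpu_dp_gflops, cpu_price = 12.0, 500.0

print(f"GPU: {gpu_dp_gflops / gpu_price:.2f} DP Gflop/s per $")
print(f"CPU: {cpu_dp_gflops / cpu_price:.3f} DP Gflop/s per $")
print(f"ratio: {gpu_dp_gflops / cpu_dp_gflops:.0f}x on paper")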

Pete