
wbmw

09/10/05 10:27 PM

#61951 RE: j3pflynn #61935

Re: Conroe article

Interesting performance expectations:

Thus, we can expect the following performance from a single CPU core of a 3 GHz Conroe with a 1066 MHz FSB and 2 MB of L2 cache (half of the 4 MB, per core): >2500 in SPECint_base2000 and >2200 in SPECfp_base2000. A further floating-point gain is possible if the FPU is a full-clock 128-bit unit, or if it has two 64-bit FMA units that fuse multiplication and addition. In that case, SPECfp_base2000 could reach 2400-2500. The published integer figures imply roughly a 1.4-fold gain over a 3.8 GHz Pentium 4, which agrees with the advertised estimates.
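Taking the quoted projections at face value, a little arithmetic shows what they imply: the SPECint_base2000 score a 3.8 GHz Pentium 4 would need for the 1.4-fold claim to hold, and how much of the gain is per-clock rather than clock speed. This is a sketch under that assumption only; the constant names are hypothetical, and ">2500" is treated as exactly 2500.

```python
# Figures taken from the article excerpt above (projections, not measurements).
CONROE_CLOCK_GHZ = 3.0
CONROE_SPECINT = 2500      # ">2500" SPECint_base2000, treated as exactly 2500
P4_CLOCK_GHZ = 3.8
CLAIMED_SPEEDUP = 1.4      # "approximately the 1.4-fold performance gain"


def implied_baseline(score: float, speedup: float) -> float:
    """Score the baseline chip must have for `score` to be `speedup` times higher."""
    return score / speedup


def per_ghz(score: float, clock_ghz: float) -> float:
    """Score divided by clock speed: a rough per-clock comparison."""
    return score / clock_ghz


p4_implied = implied_baseline(CONROE_SPECINT, CLAIMED_SPEEDUP)
conroe_per_ghz = per_ghz(CONROE_SPECINT, CONROE_CLOCK_GHZ)
p4_per_ghz = per_ghz(p4_implied, P4_CLOCK_GHZ)

print(f"Implied P4 3.8 GHz SPECint_base2000: {p4_implied:.0f}")
print(f"Conroe per GHz: {conroe_per_ghz:.0f}")
print(f"P4 per GHz:     {p4_per_ghz:.0f}")
print(f"Per-clock ratio (Conroe / P4): {conroe_per_ghz / p4_per_ghz:.2f}")
```

Because Conroe runs at a lower clock than the 3.8 GHz Pentium 4, the implied per-clock advantage (about 1.77x) is larger than the headline 1.4x.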

pgerassi

09/10/05 11:17 PM

#61955 RE: j3pflynn #61935

Dear J3pflynn:

The article has some glaring errors in the decode section. The 4-1-1 P-III/PM decoder does not issue 6 MOPs per cycle, but only 3. Only one decode path can handle complex instructions, but even it issues only one MOP per cycle. K7/K8 can perform 3 complex decodes per cycle, generating 6 MOPs per cycle (3 executing and 3 load/store). In K8, an execute-only MOP pair (an executing MOP with no load/store MOP) can be combined with a load/store-only pair (a load/store MOP with no executing MOP) into a single MOP pair.
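The K8 pairing rule in the paragraph above can be sketched as a simple counting model. This is a hypothetical simplification, not AMD's actual logic: it assumes any execute-only pair can merge with any load/store-only pair and ignores ordering, dependencies, and per-cycle limits. `MopPair` and `pair_slots_needed` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MopPair:
    """A K7/K8-style MOP pair slot: an optional execute half and an optional load/store half."""
    executes: bool
    loads_or_stores: bool


def pair_slots_needed(pairs: list[MopPair]) -> int:
    """Pair slots left after merging execute-only pairs with load/store-only pairs.

    Simplified model: every execute-only pair can combine with any one
    load/store-only pair; ordering and dependency constraints are ignored.
    """
    exec_only = sum(1 for p in pairs if p.executes and not p.loads_or_stores)
    mem_only = sum(1 for p in pairs if p.loads_or_stores and not p.executes)
    # Each merge turns two half-empty pairs into one full pair, saving one slot.
    return len(pairs) - min(exec_only, mem_only)


example = [MopPair(True, False), MopPair(False, True), MopPair(True, True)]
print(pair_slots_needed(example))  # 2
```

Here the execute-only and load/store-only pairs merge into one full pair, so three decoded pairs occupy two slots.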

So, using their terminology, K7/K8 has a 4-4-4 decoder generating 2-2-2 MOPs per clock. P6 is supposed to have a 4-4-4-4 decoder, but it generates only 1-1-1-1 MOPs per clock.
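The "x-y-z" notation can be read as the number of MOPs each decoder slot emits per clock, so the peak decoder output is just the sum. The sketch below tabulates the readings discussed in this post, including the article's 6-MOP reading of 4-1-1 that Pete disputes. It assumes every slot emits its maximum in the same clock, which real code rarely achieves; `DECODER_READINGS` and `peak_mops_per_clock` are hypothetical names.

```python
# MOPs each decoder slot can emit per clock, per the readings discussed above.
DECODER_READINGS = {
    "P6 4-1-1 (article's reading)": [4, 1, 1],
    "P6 4-1-1 (Pete's reading)": [1, 1, 1],
    "K7/K8 4-4-4 decoder": [2, 2, 2],
    "P6 4-4-4-4 decoder": [1, 1, 1, 1],
}


def peak_mops_per_clock(mops_per_slot: list[int]) -> int:
    """Peak decoder output: every slot emits its maximum MOPs in the same clock."""
    return sum(mops_per_slot)


for name, slots in DECODER_READINGS.items():
    notation = "-".join(str(n) for n in slots)
    print(f"{name}: {notation} -> {peak_mops_per_clock(slots)} MOPs/clock")
```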

As to the performance estimates, a 2.8 GHz K8 already beats the projected SPECfp2000 score. A 3 GHz K8 would likely still outrun Conroe in SPECfp2000, and would also likely beat Conroe's SPECint2000 score using the same compiler. Of course, K10 may be out by then with an additional FPadd and FPmul unit to execute SSE2 packed instructions at 1 FPadd_pair and 1 FPmul_pair per cycle, or 4 DP flops per cycle. This will likely push K10 far out of Conroe's reach in FP, and even exceed Power and Itanium SPECfp2000 scores.
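The "4 DP flops per cycle" figure follows from one packed add and one packed multiply per cycle, each operating on two 64-bit doubles in a 128-bit SSE2 register. The sketch below turns that into a theoretical peak. Two assumptions are labeled here and in the comments: K8 is modeled as one 64-bit add plus one 64-bit multiply per cycle (the baseline the post implies K10 would double), and 3 GHz is a hypothetical clock borrowed from the comparison above. Peak flops say nothing about SPECfp2000 scores, which also depend on memory, latency, and the compiler.

```python
DOUBLES_PER_SSE2_REG = 2  # a 128-bit SSE2 register holds two 64-bit doubles


def dp_flops_per_cycle(add_ops: int, mul_ops: int, doubles_per_op: int) -> int:
    """Peak double-precision flops per cycle from independent add and multiply pipes."""
    return (add_ops + mul_ops) * doubles_per_op


def peak_gflops(flops_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak: flops per cycle times clock in GHz gives GFLOP/s."""
    return flops_per_cycle * clock_ghz


# Assumption: K8 sustains one 64-bit add and one 64-bit multiply per cycle.
k8 = dp_flops_per_cycle(add_ops=1, mul_ops=1, doubles_per_op=1)
# K10 as described above: 1 FPadd_pair + 1 FPmul_pair per cycle on packed SSE2 data.
k10 = dp_flops_per_cycle(add_ops=1, mul_ops=1, doubles_per_op=DOUBLES_PER_SSE2_REG)

CLOCK_GHZ = 3.0  # hypothetical clock, matching the 3 GHz comparison point
print(f"K8:  {k8} DP flops/cycle, {peak_gflops(k8, CLOCK_GHZ):.1f} GFLOP/s peak")
print(f"K10: {k10} DP flops/cycle, {peak_gflops(k10, CLOCK_GHZ):.1f} GFLOP/s peak")
```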

Pete