Advanced Micro Devices Inc (AMD): Chipguy: No wonder you are so afraid...

pgerassi

Re: chipguy post# 42004

Thursday, 08/12/2004 12:00:05 PM

Chipguy:

No wonder you are so afraid of the real length of the pipelines! You are trying to make Prescott look good and it fails. You want Dothan to be faster than it truly is. Well, they both fail in that regard.

From instruction load to instruction retirement, the NW P4 pipeline is 28 stages. Everyone agrees on that except the Intel boosters. They want to use the trace cache as the starting point, but the trace cache does not contain x86 instructions; it contains decoded micro-ops (uops). And when a mispredict lands outside the trace cache, as it frequently does in normal code, the P4 must decode those pesky x86 instructions all over again. We go by what the pipeline actually does, not by the arbitrary points you want to use to make the P4 look good. Either use the execution pipeline of the P3 (6 stages) or use the full length of the P4's x86-to-retirement pipeline (28 stages).

Either way, when you compare apples to apples or oranges to oranges, the P4 gets a speed boost of sqrt(P4/P3 pipeline length ratio) over the P3 on the same process. And you want to make the same mistake Mas makes: using the floating point pipeline in Banias instead of the far more heavily used integer pipeline (since by all accounts, relative to Athlon, Banias is trounced in floating point). Otherwise, Banias doesn't look like the breakthrough you want, and Dothan doesn't look much better. They just get back to what the P3 does well with respect to the P4.
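
To put numbers on that square-root rule of thumb, here is a minimal Python sketch. The function name and the 14-stage figure are placeholders of mine for illustration only, not a claim about any real chip; the only number taken from this post is the 28-stage P4 count. Plug in whichever matched pair of stage counts you believe in.

```python
import math

def clock_boost(long_stages: int, short_stages: int) -> float:
    """Rule of thumb from the post: boost ~ sqrt(pipeline length ratio).

    Both counts must be measured the same way (full pipeline vs. full
    pipeline, or execution pipeline vs. execution pipeline).
    """
    if long_stages <= 0 or short_stages <= 0:
        raise ValueError("stage counts must be positive")
    return math.sqrt(long_stages / short_stages)

# 28 is the full P4 count cited above; 14 is a HYPOTHETICAL comparison
# value chosen only to make the arithmetic easy to follow.
print(f"{clock_boost(28, 14):.2f}x")  # prints 1.41x
```

The point of the sketch is that the boost grows slowly: doubling the pipeline length buys only about 41% more clock, and mixing a full-length count with an execution-only count inflates the ratio.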

Leakage is making long pipelines counterproductive, and that really hurts speed demons like the P4 (especially Prescott). And with Dothan not being much better than a P3 pulled down to 90nm, Intel boosters have lost a lot of ammunition for covering Intel's other failings. Many on SI's original AMD thread remember the Willamette performance and manufacturability debates. Those who argued that the P4's overly long pipeline, "double speed" ALUs and limited decoder resources would "bite them on the a.." as we went to smaller process sizes are completely vindicated after just two process step-downs (180nm->130nm->90nm).

And the truth really hurts, doesn't it?

Pete