Advanced Micro Devices Inc (AMD): Wbmw: You post shows just how when pr...

pgerassi

Re: wbmw post# 42023

Thursday, 08/12/2004 10:04:28 PM

Wbmw:

Your post shows how, when presented with facts, you start with personal attacks. Intel itself stated the P3 pipeline length. They do not want the public to know how long the Banias/Dothan pipelines are; they just beat around the bush. The P4 length is presented as shorter than it really is (for clock rate/pipeline length calculations). You can't compare two pipelines where one does a larger task than the other.

In every other CPU, the pipeline length is well defined from instruction fetch (x86) to instruction retirement (x86 or uop). Only because Intel pairs a trace cache with a less powerful decoder does Intel want you to look at the uop execution pipeline in comparisons. But all other CPUs decode to uops first and then execute them, and the latter pipeline segment can be scheduled much later in time. So in pipeline length versus clock rate calculations, the P4 has to use the entire pipeline (x86 fetch to retirement) for valid comparisons to other CPUs. You can use the other figures if you want to look at pipeline stalls or something else like mispredicted branches; then the shorter 20-stage pipeline would be relevant. But that was not what was asked; clock rate was.

Dothan and Banias are slow because they needed to reduce power. This causes design decisions to go different ways than when performance is the goal. They had to add logic to shut down execution units that were or will be idle. They had to redesign the cache to limit leakage (and even at 130nm, leakage in the cache was a problem). They left out units that speed things up very little (any enhancement or unit that increases power by 1/3 more than it increases overall performance; for example, they left out floating point units that increase performance by less than 3% for each 1% increase in power).

AMD has lower power to begin with and thus could use standard parts to cover the same bases. Its desktop parts used little enough power that they could be used in mobiles.

As to benchmarks, Pentium 4C, P4E, Banias and Dothan all lose against one line, K8. P4C and P4E are slower, and even at lower power, Banias and Dothan lose to K8s. And that is before K8 goes to 90nm. The fact remains that P4C, P4E, Banias, Dothan and P3 all adhere to the proper application of sqrt(total pipeline length ratio) ~= clock ratio on the same process. Going to a different process just invalidates the results because it violates that last condition. And we all know that AMD's 130nm process is different from Intel's 130nm process.
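
To make the rule of thumb concrete, here is a minimal sketch of sqrt(total pipeline length ratio) ~= clock ratio. The function name and the stage counts in the usage line are hypothetical placeholders chosen for illustration, not measured figures for any CPU named here, and the rule only claims to apply when both designs are on the same process.

```python
import math


def predicted_clock_ratio(stages_a: int, stages_b: int) -> float:
    """Clock ratio (A over B) suggested by the rule of thumb:
    clock ratio ~= sqrt(total pipeline length of A / total pipeline length of B).

    Both counts must be total pipeline lengths (fetch to retirement),
    and both designs are assumed to be on the same process.
    """
    if stages_a <= 0 or stages_b <= 0:
        raise ValueError("stage counts must be positive")
    return math.sqrt(stages_a / stages_b)


# Hypothetical example: a design with twice the total stages of another.
print(f"{predicted_clock_ratio(20, 10):.2f}")  # prints 1.41
```

In other words, doubling the total pipeline length would only be expected to buy about a 41% higher clock under this rule, which is why the choice of which pipeline length to plug in matters so much.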

Protesting that just goes to show how you wish to invalidate the results because they go against your presumptions.

Pete