upc

08/11/04 5:54 PM

#41962 RE: chipguy #41958

That's the dumbest thing I have heard all week.

I find it difficult to imagine you've been silent for an entire week.

upc

Petz

08/11/04 6:05 PM

#41963 RE: chipguy #41958

A Dothan clocks at 56% of the rate of Prescott with a pipeline perhaps 40% of the length of Prescott's pipeline and you think Dothan's transistors can be 10 or 20% slower?

The clock rate increase from a longer pipeline models much closer to the sqrt of the length ratio. So a pipeline 2.5 times as long should allow Prescott to clock sqrt(2.5) = 1.58 times as fast, i.e. 58% faster.

That would put it at 2x1.58 or 3.16 GHz. 3.6/3.16 is in the 10 to 20% range, but reality is closer to 10%, since 3.6 GHz Prescott is even rarer than a 2 GHz Dothan.
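For anyone who wants to plug in their own numbers, here is a minimal sketch of the arithmetic above. It assumes the square-root rule of thumb holds and uses the 2 GHz Dothan, 3.6 GHz Prescott and 2.5x pipeline ratio from this post; the function and variable names are made up for illustration.

```python
import math

def sqrt_clock_gain(pipeline_ratio: float) -> float:
    """Rule-of-thumb clock gain from a pipeline `pipeline_ratio` times as long."""
    return math.sqrt(pipeline_ratio)

dothan_ghz = 2.0       # Dothan clock used in the post
prescott_ghz = 3.6     # fastest Prescott mentioned in the post
pipeline_ratio = 2.5   # Prescott pipeline length / Dothan pipeline length (1 / 0.4)

gain = sqrt_clock_gain(pipeline_ratio)
predicted_ghz = dothan_ghz * gain          # clock the rule predicts for Prescott
shortfall = prescott_ghz / predicted_ghz   # what faster transistors must make up

print(f"gain {gain:.2f}, predicted {predicted_ghz:.2f} GHz, remaining factor {shortfall:.2f}")
```

The remaining factor comes out near 1.14, which is where the "10 to 20%" figure comes from.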

Petz

pgerassi

08/11/04 8:08 PM

#41975 RE: chipguy #41958

Chipguy:

You do not know how many logic delays are needed to get things done. Various sources have claimed that Northwood's 20-stage execution units use, worst case, 8 unit delays of logic per stage plus 2 unit delays for the latching required for synchronous operation (I don't know where they got that information or even if it is correct. Only the CPU designers know). Northwood has 28 stages from decode to retirement. 10 * 28 = 280 total delays, of which 8 * 28 = 224 are for logic. Dothan, like P3 before it, has 10 stages with worst-case logic of 13 unit delays and latching of 2 unit delays, for 150 total delays and 130 logic delays (Dothan has less need for a big scheduling window and no need of a trace cache, which adds complexity, and some stages of Northwood are for signal transport). The frequency of Northwood is 3.4GHz and Dothan's is 2.0GHz. That's 1.7 to 1.
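The delay budget above can be tallied with a small sketch like the one below. The per-stage figures are the unconfirmed estimates quoted in this post, and the `Pipeline` class and its field names are just an illustration, not anything from Intel.

```python
from dataclasses import dataclass

@dataclass
class Pipeline:
    name: str
    stages: int
    logic_delays: int   # worst-case unit delays of logic per stage (estimate)
    latch_delays: int   # unit delays lost to latching per stage (estimate)
    clock_ghz: float

    @property
    def total_delays(self) -> int:
        return self.stages * (self.logic_delays + self.latch_delays)

    @property
    def total_logic(self) -> int:
        return self.stages * self.logic_delays

northwood = Pipeline("Northwood", stages=28, logic_delays=8, latch_delays=2, clock_ghz=3.4)
dothan = Pipeline("Dothan", stages=10, logic_delays=13, latch_delays=2, clock_ghz=2.0)

for cpu in (northwood, dothan):
    print(cpu.name, cpu.total_delays, cpu.total_logic)

clock_ratio = northwood.clock_ghz / dothan.clock_ghz
print(f"clock ratio {clock_ratio:.1f} to 1")
```

Swap in different stage counts or per-stage delays to see how the totals move.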

Assuming equal logic delays (which are probably in Dothan's favor due to it being 90nm and Northwood at 130nm), Dothan's Ft should be 15/(10*1.7) = 0.88 of Northwood's Ft. Prescott uses 31 stages plus the 8 of the decode pipeline for a total of 39 stages. Assuming the same complexity (Prescott is likely worse), Prescott must have no more than (8+2)*28/39 - 2 = 5 unit delays of logic and 2 for latching per stage using the same Ft. Unfortunately, Prescott requires more delays than that (5*39 = 195, which is less than Northwood's 224, so it likely needs at least 6 logic delays per stage), and it was aimed at a higher speed (4.5GHz was the likely target vs 3.2GHz for Northwood; the 3.4GHz NW was required because of Prescott's power usage). Prescott's Ft therefore more likely needs to be 61% higher than Northwood's. About 33% is the likely increase from 130nm to 90nm alone (straight shrink), so Prescott needs an Ft about 21% higher at the same size.
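Here is a rough sketch of the checkable parts of that arithmetic. All inputs are this post's estimates; the 61% and 33% figures are taken as given rather than derived, and the constant names are invented for the example.

```python
# Per-stage budgets in unit delays; estimates from the post, not measured data.
NW_STAGES, NW_LOGIC, LATCH = 28, 8, 2
DOTHAN_LOGIC = 13
PRESCOTT_STAGES = 39            # 31 stages + 8 decode stages
NW_TO_DOTHAN_CLOCK = 1.7        # 3.4GHz / 2.0GHz

# Dothan's required gate speed (Ft) relative to Northwood's: 15 / (10 * 1.7)
dothan_ft_rel = (DOTHAN_LOGIC + LATCH) / ((NW_LOGIC + LATCH) * NW_TO_DOTHAN_CLOCK)

# Logic delays per stage left for Prescott if Northwood's 280 total delays
# are spread over 39 stages and each stage still pays 2 delays for latching.
prescott_logic_budget = (NW_LOGIC + LATCH) * NW_STAGES / PRESCOTT_STAGES - LATCH

prescott_logic_total = 5 * PRESCOTT_STAGES     # logic available at 5 delays per stage
northwood_logic_total = NW_LOGIC * NW_STAGES   # logic Northwood actually uses

# Ft gain left for the design once the assumed 33% shrink gain is factored
# out of the assumed 61% total.
design_gain = 1.61 / 1.33

print(f"{dothan_ft_rel:.2f} {prescott_logic_budget:.2f} "
      f"{prescott_logic_total} {northwood_logic_total} {design_gain:.2f}")
```

The budget works out to a little over 5 logic delays per stage, which is short of the 224 delays of logic Northwood spends.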

If you have different numbers, please plug them in and look at what you get. In any case, do not forget that latching between stages is required, and as the number of stages rises, the amount lost to latching becomes that much greater. And it can be uneven, with some stages having fewer logic delays than others. This is why Dothan, Athlon, K8 and other short-pipeline designs can get more work done overall.

As to your simplistic view, you are incorrect, at least as to what the designers wanted to achieve. Dothan currently runs at 2.0GHz and Prescott was to reach 4.5GHz, so Dothan has to get to 44% of the frequency of a CPU with a pipeline that is 26% as long. However, much of the difference comes from how inefficient such a long pipeline is compared to a short one. A pipeline twice as long doesn't allow the clock to run twice as fast at the same Ft. Look at the P3 and Willamette P4 on the same process: 1.1GHz/10 stages (110MHz per stage) vs 2GHz/28 stages (roughly 70MHz per stage), both desktop designs at the time.
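The ratios in that paragraph can be checked with a few lines. This is only a sketch using the clocks and stage counts quoted above; "MHz per stage" is the crude yardstick from this post, not a standard metric, and the helper name is made up.

```python
def mhz_per_stage(clock_mhz: float, stages: int) -> float:
    """Crude yardstick from the post: clock divided by pipeline depth."""
    return clock_mhz / stages

p3 = mhz_per_stage(1100, 10)          # P3: 1.1GHz over 10 stages
willamette = mhz_per_stage(2000, 28)  # Willamette P4: 2GHz over 28 stages

freq_fraction = 2.0 / 4.5    # Dothan clock as a fraction of Prescott's 4.5GHz target
length_fraction = 10 / 39    # Dothan pipeline length as a fraction of Prescott's

print(f"P3 {p3:.0f} MHz/stage, Willamette {willamette:.1f} MHz/stage")
print(f"Dothan: {freq_fraction:.0%} of the clock with {length_fraction:.0%} of the pipeline")
```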

Pete