Advanced Micro Devices Inc (AMD): Chipguy: You do not know how many log...

pgerassi

Re: chipguy post# 41958

Wednesday, 08/11/2004 8:08:00 PM

Chipguy:

You do not know how many logic delays are needed to get things done. Various sources have claimed that Northwood's 20-stage execution pipeline uses 8 unit delays of logic per stage plus 2 unit delays for the latching that synchronous operation requires, worst case. (I don't know where they got that information or even whether it is correct. Only the CPU designers know.) Northwood has 28 stages from decode to retirement, so 10 * 28 = 280 total delays, of which 8 * 28 = 224 are for logic. Dothan, like the P3 before it, has 10 stages with worst-case logic of 13 unit delays and latching of 2 unit delays per stage, for 150 total delays and 130 logic delays. (Dothan has less need for a big scheduling window and no need for a trace cache, which adds complexity, and some of Northwood's stages are only for signal transport.) Northwood runs at 3.4GHz and Dothan at 2.0GHz. That's 1.7 to 1.
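For anyone who wants to plug in their own numbers, here is a rough Python sketch of the delay bookkeeping above. The function name `delay_budget` is just something made up for illustration, and the per-stage figures are the unverified ones quoted in this post, not designer data.

```python
def delay_budget(stages, logic_per_stage, latch_per_stage=2):
    """Return (total_delays, logic_delays) for a pipeline.

    Assumes every stage has the same worst-case logic depth, which is
    the simplification used in the post.
    """
    per_stage = logic_per_stage + latch_per_stage
    return stages * per_stage, stages * logic_per_stage

# Figures as quoted in the post (unverified).
northwood = delay_budget(stages=28, logic_per_stage=8)
dothan = delay_budget(stages=10, logic_per_stage=13)

# Clock ratio, Northwood 3.4GHz vs Dothan 2.0GHz.
freq_ratio = 3.4 / 2.0

print("Northwood (total, logic):", northwood)
print("Dothan (total, logic):", dothan)
print("Frequency ratio:", round(freq_ratio, 2))
```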

Assuming equal logic delays (which probably favors Dothan, since it is built on 90nm and Northwood on 130nm), Dothan's Ft should be 15/(10*1.7) = 0.88 of Northwood's Ft. Prescott uses 31 stages plus the 8 of the decode pipeline, for a total of 39 stages. Assuming the same complexity (Prescott's is likely worse), Prescott can have no more than (8+2)*28/39 - 2 = 5.2, or about 5 unit delays of logic per stage, plus 2 for latching, at the same Ft. Unfortunately, Prescott requires more than that: 5*39 = 195 logic delays is less than Northwood's 224, so it likely needs at least 6 logic delays per stage. It was also meant for a higher speed: 4.5GHz was the likely target, versus 3.2GHz for Northwood (the 3.4GHz Northwood was required because of Prescott's power usage). So Prescott's Ft more likely needs to be about 61% higher than Northwood's. About 33% is the likely increase from going from 130nm to 90nm alone (a straight shrink), so Prescott needs an Ft about 21% higher at the same process size (1.61/1.33 = 1.21).
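The same arithmetic as a short Python sketch, in case you want to swap in other numbers. All variable names are made up for illustration. The 61% and 33% figures are taken as given from this post and are not derived by the code.

```python
# Per-stage delays (logic + latching), as quoted in the post.
northwood_per_stage = 8 + 2
dothan_per_stage = 13 + 2
freq_ratio = 3.4 / 2.0  # Northwood vs Dothan clock

# Dothan's required Ft relative to Northwood's.
dothan_ft_ratio = dothan_per_stage / (northwood_per_stage * freq_ratio)

# Spread Northwood's 280 total delays over Prescott's 39 stages.
northwood_total = northwood_per_stage * 28
prescott_stages = 31 + 8
prescott_per_stage = northwood_total / prescott_stages  # about 7.18
prescott_logic = int(prescott_per_stage) - 2            # whole delays left for logic

# Assumed inputs from the post: 61% higher Ft needed, 33% from the shrink.
required_ft_gain = 1.61
shrink_gain = 1.33
design_gain = required_ft_gain / shrink_gain

print("Dothan Ft / Northwood Ft:", round(dothan_ft_ratio, 2))
print("Prescott logic delays per stage:", prescott_logic)
print("Prescott total logic delays:", prescott_logic * prescott_stages)
print("Ft gain needed beyond the shrink:", round(design_gain, 2))
```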

If you have different numbers, please plug them in and see what you get. In any case, one must not forget that latching between stages is required, and that as the number of stages rises, the share of each cycle lost to latching grows with it. The split can also be uneven, with some stages having fewer logic delays than others. This is why Dothan, Athlon, K8 and other short-pipeline designs can get more work done overall.
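To put a rough number on the latching loss, here is a small Python sketch. It assumes a fixed amount of logic (Northwood's 224 unit delays from above) split evenly across the stages with 2 latch delays per stage. The even split and the function name `latch_overhead` are assumptions for illustration only; real designs will not divide so cleanly.

```python
def latch_overhead(total_logic, stages, latch=2):
    """Fraction of each cycle spent on latching rather than logic.

    Assumes the logic is divided evenly across all stages.
    """
    logic_per_stage = total_logic / stages
    return latch / (logic_per_stage + latch)

# Same 224 logic delays, cut into more and more stages.
for stages in (10, 20, 28, 39):
    share = latch_overhead(224, stages)
    print(f"{stages} stages: {share:.0%} of each cycle is latching")
```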

As to your simplistic view, you are incorrect, at least as to what the designers wanted to achieve. Dothan currently runs at 2.0GHz and Prescott was meant to reach 4.5GHz. In other words, Dothan reaches 44% of that frequency with a pipeline only 26% as long (10 stages vs 39). Much of that difference comes from how inefficient such a long pipeline is compared to a short one. A pipeline twice as long doesn't allow the clock to run twice as fast at the same Ft. Look at the P3 and Willamette P4 on the same process: 1.1GHz/10 stages (110MHz per stage) vs 2GHz/28 stages (about 71MHz per stage), both desktop designs at the time.
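The ratios in that comparison can be checked with a few lines of Python. The helper name `mhz_per_stage` is made up for illustration, and the clock speeds and stage counts are the ones quoted in this post.

```python
def mhz_per_stage(freq_mhz, stages):
    """Clock frequency divided by pipeline depth, a crude efficiency measure."""
    return freq_mhz / stages

# P3 vs Willamette P4, same process, figures as quoted in the post.
p3 = mhz_per_stage(1100, 10)
willamette = mhz_per_stage(2000, 28)

# Dothan (2.0GHz, 10 stages) vs the Prescott target (4.5GHz, 39 stages).
freq_share = 2.0 / 4.5
length_share = 10 / 39

print(f"P3: {p3:.0f} MHz per stage")
print(f"Willamette: {willamette:.1f} MHz per stage")
print(f"Dothan frequency vs Prescott target: {freq_share:.0%}")
print(f"Dothan pipeline length vs Prescott: {length_share:.0%}")
```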

Pete