dougSF30

01/23/04 8:47 PM

#24188 RE: UpNDown #24186

UpNDown, there is no explanation, because the original claim was ridiculous.

Think about it-- The claim is the following: No matter WHAT the design/layout is, regardless of any process technology in existence now or in the future, as long as the feature size is the same and the voltage is the same, a given x86 instruction will require roughly the same amount of power to execute.

I admire its sheer audacity, I suppose. That's about all you can say for it.


Doug





Jerry R

01/23/04 11:19 PM

#24193 RE: UpNDown #24186

UpNDown - But wouldn't a longer pipeline imply more need for temporary storage, rename registers, etc.? Thus, a longer pipeline would need more logic gates, but with smaller per-cycle work units, allow higher frequencies?

Not necessarily. Shorter pipeline designs, by their nature, will have much more logic switching per clock cycle because each pipestage is allowed more levels of logic.

One of the most basic exercises performed during the early phases of chip design is the circuit feasibility study, which determines how many levels of logic can be implemented per pipestage for a given target frequency of operation. For example, let's say you were designing a 2 GHz CPU. This yields a 500ps clock period. Let's say the process you are using yields an average CMOS logic delay of 30ps (e.g. for a 3-input NAND gate). The typical sequential element (e.g. a flip-flop) has an output delay of 50ps and an input setup time of 30ps. Assuming there is no clock "skew" (divergences from one clock net to another), that leaves 500 - 50 - 30 = 420ps for logic, so at 30ps per gate your design should contain, on average, 14 logic levels per pipestage.
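The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope budget, not a real timing tool: the function name and parameters are made up for illustration, and the optional skew term is an added assumption (the example above assumes zero skew).

```python
def logic_levels_per_stage(freq_ghz, gate_delay_ps, clk_to_q_ps, setup_ps, skew_ps=0.0):
    """Estimate how many gate delays fit in one pipestage.

    All names here are hypothetical; the model is simply
    (clock period - flop output delay - flop setup time - skew) / gate delay.
    """
    period_ps = 1000.0 / freq_ghz                    # 2 GHz -> 500 ps
    usable_ps = period_ps - clk_to_q_ps - setup_ps - skew_ps
    return int(usable_ps // gate_delay_ps)           # whole logic levels only


# Numbers from the post: 2 GHz, 30 ps gate, 50 ps flop output, 30 ps setup.
print(logic_levels_per_stage(2.0, 30, 50, 30))       # 14
```

Adding any skew eats directly into the 420ps budget, which is why the zero-skew figure is a best case.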

Knowing this, you now have to design your microarchitecture to determine what can be accomplished (e.g. register file read, addition operation, etc.) in a given pipestage. From this, a design team can then proceed to determine how long the pipeline needs to be to accomplish everything from instruction fetch to operation writeback and retirement.

Switching capacitance per pipestage is just as much a determining factor as frequency in the power consumption equation, since dynamic power scales with both. You are basically trading off two factors that are equally influential.
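To make the tradeoff concrete, here is a rough sketch using the standard CMOS dynamic power approximation P = a * C * V^2 * f (activity factor, switched capacitance, supply voltage, frequency). The formula is the textbook one, not something spelled out in this post, and the capacitance, voltage, and activity numbers are invented purely for illustration.

```python
def dynamic_power_watts(activity, cap_farads, vdd_volts, freq_hz):
    """Textbook CMOS dynamic power: P = a * C * V^2 * f."""
    return activity * cap_farads * vdd_volts ** 2 * freq_hz


# Hypothetical designs at the same voltage and activity factor:
# a short pipeline switching more capacitance per cycle at a lower clock,
# and a long pipeline switching half as much per cycle at twice the clock.
short_pipe = dynamic_power_watts(0.5, 40e-9, 1.5, 2e9)
long_pipe = dynamic_power_watts(0.5, 20e-9, 1.5, 4e9)
print(short_pipe, long_pipe)
```

With these made-up inputs the two designs land at the same dynamic power, because halving the capacitance switched per cycle exactly cancels doubling the clock.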

chipguy

01/24/04 12:58 AM

#24202 RE: UpNDown #24186

It seems the lower IPC/higher frequency route leads to more power usage -- more logic gates driven at a higher speed. Please explain where I'm wrong?

To go down the path of higher IPC your processor needs
to be able to issue and execute more instructions in
parallel, i.e. wider issue.

Going wider means geometrically more complicated decode,
control, and datapath bypass logic (the three-way parallel
x86 decoder in K7/K8 is probably at least an order of
magnitude more complex than the single decoder in P4).
It also means that more signals have to travel around
the chip farther and that implies more wires, longer
wires, and far more repeaters. Going wider also has its
own inefficiencies. If the parallelism isn't present in
the code then that complicated logic runs for the sake
of finding reasons to keep execution units idle.

The power scaling differs in minor ways between the
speed racer route and the brainiac approach, which is why
I said the power wouldn't vary greatly for a given level
of performance, not that it wouldn't vary. Also, the
growing issue of device leakage changes the picture
somewhat. Leakage isn't very sensitive to clock rate
but it is to transistor count. That would normally tend
to favor the speed racer, but Intel is adding complexity
to its x86 cores at far too high a rate to benefit
from this effect.
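A crude sketch of the leakage point, under loud assumptions: dynamic power is modeled as C * V^2 * f, leakage as (transistor count) * (average leakage current per transistor) * V, and every number below is hypothetical. Real leakage also depends heavily on temperature, threshold voltage, and device sizing, none of which is modeled here.

```python
def power_breakdown(switched_cap_f, vdd_volts, freq_hz,
                    transistors, leak_amps_per_transistor):
    """Return (dynamic_watts, leakage_watts) for a toy power model."""
    dynamic = switched_cap_f * vdd_volts ** 2 * freq_hz
    leakage = transistors * leak_amps_per_transistor * vdd_volts
    return dynamic, leakage


# Same hypothetical core at two clock rates: dynamic power doubles,
# leakage does not move.
print(power_breakdown(20e-9, 1.5, 2e9, 50e6, 1e-7))
print(power_breakdown(20e-9, 1.5, 4e9, 50e6, 1e-7))

# Same clock, twice the transistors: leakage doubles instead.
print(power_breakdown(20e-9, 1.5, 2e9, 100e6, 1e-7))
```

In this toy model the clock only touches the dynamic term, so a lean high-frequency core pays less leakage than a wide one, unless its transistor count grows just as fast.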