dacaw

11/20/04 10:56 AM

#47782 RE: jhalada #47765

Effects of Compiler, #regs & OOO on fp performance

I've watched this piece of FUD about hidden regs vs. visible regs with some amusement.

With itanic, the optimization is done by the compiler. Since there is no out-of-order facility in the proc, there is little opportunity to optimize at run time.

I experienced this when I moved my heavily hand-tweaked assembler from the K6-III to the Athlon. Like the itanic, the K6-III fp unit had no opportunity to do OOO execution on fp code. In a complex section, just changing the order of a couple of lines could result in pretty large speed changes. What a pain! You even had to put nops in to align code on boundaries.

In contrast, the Athlon optimizes the fp code at run time by moving the ops around as resources become available. Thus micro-optimization by the coder is pointless - you really don't see any difference by tweaking the odd line here or there, or by sprinkling nops around.
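To make the in-order vs OOO point concrete, here is a minimal toy sketch in Python. It is not a model of the K6-III, Athlon or itanic: the single-issue machine, the 4-cycle latencies, the window size and the names `run`, `NAIVE` and `INTERLEAVED` are all assumptions made up for illustration. It only shows why instruction order matters when the hardware cannot look past a stalled op.

```python
def run(program, window):
    """Cycles needed to finish `program` on a toy single-issue machine.

    program: list of (dest, sources, latency); each dest is written once.
    window:  how many of the oldest unissued ops the scheduler may choose
             from. window=1 is strict in-order issue; larger values are a
             crude stand-in for out-of-order issue (hypothetical model).
    """
    produced = {dest for dest, _, _ in program}
    ready_at = {}              # register -> cycle its result is available
    pending = list(program)
    cycle = finish = 0
    while pending:
        # Issue the oldest op in the window whose inputs are all ready.
        for i, (dest, srcs, lat) in enumerate(pending[:window]):
            if all(s not in produced or ready_at.get(s, float("inf")) <= cycle
                   for s in srcs):
                ready_at[dest] = cycle + lat
                finish = max(finish, cycle + lat)
                del pending[i]
                break
        cycle += 1             # one issue slot per cycle, used or not
    return finish


# Two independent load -> multiply chains feeding one add (assumed latencies).
NAIVE = [
    ("a", [], 4), ("b", ["a"], 4),      # chain 1
    ("c", [], 4), ("d", ["c"], 4),      # chain 2
    ("e", ["b", "d"], 4),               # combine
]
# Same ops, hand-scheduled so the two loads overlap.
INTERLEAVED = [NAIVE[0], NAIVE[2], NAIVE[1], NAIVE[3], NAIVE[4]]

if __name__ == "__main__":
    for name, prog in (("naive", NAIVE), ("interleaved", INTERLEAVED)):
        print(name, "in-order:", run(prog, 1), "OOO:", run(prog, 4))
```

In this toy model the in-order machine takes 17 cycles for the naive ordering and 13 for the hand-interleaved one, while the windowed machine takes 13 cycles for both. That is the effect described above: with no run-time reordering, swapping a couple of lines changes the timing; with it, the swap makes no difference.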

It looks to me, from articles on the Athlon64, that the fp OOO has been improved quite a bit. Of course, just having 16 SSE regs is the bee's knees.

Saying the itanic's registers are "better" because they are visible is just silly. They have to be visible, or else the proc can do nothing worthwhile with them. It's the compile-time optimizations that are the whole basis of the EPIC design. I'd rather have intelligent run-time hardware that maximises use of the resources available.

There are lots and lots of studies that analyze the benefit of increasing the # of regs. Of course, it's diminishing returns. 16 seems optimal right now, given software tech and hardware design.

In so many ways x86 is broken. AMD64 makes it worthwhile for the first time.