
Petz

11/05/04 12:53 PM

#46884 RE: dacaw #46876

Although the current FP unit has 3 pipelines, it really only has one FP multiplier and one FP adder. SSE3 has some special instructions for fast FFTs and DCTs, which are transforms used in image processing and video compression (MPEG-x), among other things. These instructions do x+y and x-y simultaneously. A second FP adder would help a lot, or a special combined adder/subtractor might take a little less silicon real estate.
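To make the x+y / x-y point concrete, here is a rough sketch in Python rather than SSE3 code. The function name `butterfly` is my own label for illustration, not anything from an instruction set manual; it just shows the add/subtract pair that sits at the core of each FFT or DCT stage.

```python
def butterfly(x, y):
    """Return (x + y, x - y): the add/subtract pair used in each FFT/DCT stage.

    With a single FP adder the two results need two passes through it;
    a second adder (or a combined adder/subtractor) could produce both at once.
    """
    return x + y, x - y


# Works for real values...
print(butterfly(3.0, 1.0))

# ...and for complex values, as in an FFT. A 2-point DFT is exactly one butterfly.
print(butterfly(1 + 2j, 3 - 1j))
```

Every stage of a radix-2 FFT is built from pairs like this, which is why the adder, not the multiplier, tends to be the busy unit.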

IMO, adding a second FP adder would eliminate the advantage that Power 5 (as you said) and Itanium now have in floating-point horsepower. They can each do 4 FLOPs per clock, but 2 adds + 2 multiplies is hardly any better than 2 adds + 1 multiply in most applications. And, of course, Opteron is way ahead of either Itanium or Power 5 in clock speed, and way below them in die size.
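A back-of-the-envelope way to see the 2 adds + 2 multiplies versus 2 adds + 1 multiply argument is sketched below. This is a toy model, not a simulator: it assumes every unit is fully pipelined, ignores dependencies and memory, and the 8-add / 4-multiply mix is a hypothetical add-heavy workload I picked for illustration, not a measurement. The name `cycles_needed` is made up as well.

```python
from math import ceil


def cycles_needed(adds, muls, adders, multipliers):
    """Toy estimate: the busier kind of unit sets the cycle count.

    Assumes one operation per unit per clock and no stalls of any kind.
    """
    return max(ceil(adds / adders), ceil(muls / multipliers))


# Hypothetical add-heavy instruction mix (an assumption, not measured data).
ADDS, MULS = 8, 4

configs = {
    "1 adder  + 1 multiplier": (1, 1),
    "2 adders + 1 multiplier": (2, 1),
    "2 adders + 2 multipliers": (2, 2),
}

for name, (adders, multipliers) in configs.items():
    print(name, "->", cycles_needed(ADDS, MULS, adders, multipliers), "cycles")
```

With a mix like this the second adder halves the cycle count, while the second multiplier adds nothing on top of it. A multiply-heavy mix would of course tip the other way, so plug in your own numbers.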

Petz


DDB

11/05/04 6:25 PM

#46900 RE: dacaw #46876

dacaw - re: SSE3 and execution units

AMD's patents show interesting CPU architectures which may even hold a little truth regarding K10 or later CPU designs. So there might be a nice surprise for us in the future.

And I agree with you that a higher SIMD execution bandwidth would help a lot.

Unfortunately, a change such as adding execution units or modifying the FPU so that it can work on 128-bit data at once is a very complex task. Just a hint: if it were that easy, these changes would very likely have been made in going from K7 to K8. The actual changes are rather small, with the 32 additional registers being the biggest one. Other modifications for 128-bit-wide logical registers had already taken place in the K7's reorder buffer, decoders, retirement unit etc. Extending the FPU datapaths to 128 bits and modifying the rest of the CPU (decoders, ROB etc.) to support this kind of data and execution would require too much time and money for an already existing CPU design, while the next one (containing such improvements) is already waiting behind the door.

Dropping the MMX, 3DNow! and x87 support in XP64 helps increase performance. MMX instructions are available within SSE2, and for the other stuff (excluding complex x87 instructions) SSE/SSE2 can be used as well. Although 3DNow! is somewhat more elegant, it no longer offers higher performance than SSE.