InvestorsHub

DDB

10/18/04 10:45 AM

#46001 RE: dacaw #46000

dacaw - I had similar observations while optimizing Prime95. Such code depends on how the macro-ops produced by decoding the x86 code are grouped together, and on whether enough low-latency (reg, reg) variants of those instructions are available. Pipeline stalls can arise when instructions bound to just one of the three FP pipes significantly outnumber the rest: 20 FADD-pipe ops mixed with 10 FMUL-pipe ops, for example, will be packed into 20 groups scheduled to the FPU reservation station, 10 of them with an empty FMUL slot. With MMX code this should be somewhat easier, because many MMX ops can go through either the FADD or the FMUL pipe.
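The grouping effect described above can be sketched with a toy model. This is only an illustration of the counting argument (one FADD-pipe slot and one FMUL-pipe slot per group), not a model of the real K7 scheduler, and the function name is made up for this sketch:

```python
def schedule_groups(fadd_ops, fmul_ops, flexible_ops=0):
    """Toy model: each group holds at most one FADD-pipe op and one
    FMUL-pipe op. Flexible ops (e.g. MMX ops that can use either pipe)
    fill whichever slot is free."""
    # Dedicated ops pair up first; each pair shares one group.
    paired = min(fadd_ops, fmul_ops)
    leftover = max(fadd_ops, fmul_ops) - paired
    # Flexible ops fill the empty slots of leftover groups,
    # then pair among themselves, two per group.
    filled = min(flexible_ops, leftover)
    flexible_ops -= filled
    return paired + leftover + (flexible_ops + 1) // 2

# The example from the post: 20 FADD-pipe ops + 10 FMUL-pipe ops
# need 20 groups, 10 of them with an empty FMUL slot.
print(schedule_groups(20, 10))      # 20 groups
# If 10 more ops are also FADD-bound, the count grows to 30 groups;
# if they are flexible (MMX-style), they fill the empty slots instead.
print(schedule_groups(30, 10))      # 30 groups
print(schedule_groups(20, 10, 10))  # still 20 groups
```

The contrast between the last two calls is the point: a pipe-balanced (or pipe-flexible) instruction mix packs the same work into fewer scheduling groups.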



chipguy

10/18/04 11:10 AM

#46003 RE: dacaw #46000

"Recall that when I was developing the DCT code I ran it through AMD's CodeAnalyst (then in beta form). My DCT did a lot of its work in floating point on normal 8x8 macroblocks. MMX was used a lot too - I did all I could to make it fast but above all accurate."

If your code "did a lot of its work" in FP, then why do
you say "MMX was used a lot too"?

mmoy

10/18/04 11:43 AM

#46007 RE: dacaw #46000

I didn't know such a tool existed. I've compared IDCT
routines in FP and scaled integer and found that scaled
integer performs much better on SSE and SSE2 machines. You
can lose some accuracy, but this is for displaying JPEGs
in Mozilla, and I haven't heard any reports of degradation
from the code, nor have I seen any myself.
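The scaled-integer idea mentioned here can be sketched briefly: DCT cosine constants are pre-scaled to integers so the inner loop needs only integer multiplies and shifts (which map well to SSE2), at the cost of a small rounding error. The constant names and the 13-bit scale below are illustrative assumptions, not taken from the actual Mozilla JPEG code:

```python
import math

SCALE_BITS = 13  # illustrative fixed-point precision
# Pre-scale one DCT cosine constant to an integer.
C2 = round(math.cos(2 * math.pi / 16) * (1 << SCALE_BITS))

def scaled_mul(x, const):
    """Multiply sample x by a pre-scaled constant, then round the
    product back down by SCALE_BITS (add half an LSB before shifting)."""
    return (x * const + (1 << (SCALE_BITS - 1))) >> SCALE_BITS

# Compare against the straightforward FP multiply.
exact = 100 * math.cos(2 * math.pi / 16)
approx = scaled_mul(100, C2)
print(exact, approx)  # the integer result is within one unit of the FP one
```

This is the accuracy trade-off in miniature: the error per multiply stays below one unit at this scale, but it can accumulate across the butterfly stages of a full IDCT, which is why accuracy reports matter for such code.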

I'll have to take a look at this tool. I've done a lot of
work on it, but I know improvements are still possible, and
I didn't focus much on pipeline behavior.

Does the tool recommend code changes or reorder instructions
for you, or do you tweak, test, and repeat?