
Re: mmoy post# 46703

Monday, 11/01/2004 9:34:33 PM

Parallel execution


"The second example has to execute sequentially as they all modify EAX."


Accessing memory is much slower than waiting on a register dependency, by a factor of 10 or more. By increasing code size you are creating extra memory accesses. I think you'll find that the load unit can queue these accesses anyway.

Download CodeAnalyst and look at the (simulated) code execution.

If you want a real speedup, think about the ja opcode: it tests both CF and ZF in one instruction (the jump is taken only when both flags are clear). One of my favorite tricks!
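
For anyone who wants to see what that can look like from C, here is a minimal sketch of one common use of an unsigned "above" test: folding a two-sided range check into a single comparison. This is only my guess at the kind of trick meant, not necessarily the exact one. The function names are made up for illustration, and whether a compiler actually emits cmp/ja for the second function depends on the compiler and its flags, so check the generated assembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Straightforward version: two comparisons, which typically means
   two conditional jumps. (Hypothetical name, for illustration only.) */
bool out_of_range_two_tests(uint32_t x, uint32_t lo, uint32_t hi)
{
    return x < lo || x > hi;
}

/* Single-comparison version. If x < lo, the unsigned subtraction wraps
   around to a very large value, so one unsigned "above" test covers
   both bounds at once. (Hypothetical name, for illustration only.) */
bool out_of_range_one_test(uint32_t x, uint32_t lo, uint32_t hi)
{
    assert(lo <= hi);              /* the trick assumes a valid range */
    return (x - lo) > (hi - lo);   /* one unsigned comparison */
}
```

Both functions should agree for any lo <= hi; the second just gives the compiler the chance to do the job with one compare and one jump.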

