- General Purpose Registers (GPRs) extended out to 64 bits
- Number of GPRs doubled from 8 to 16
- Number of SIMD (MMX, SSE, SSE2, and 3DNow!) registers doubled from 8 to 16
- Integrated DDR memory controller
- Integrated HyperTransport interface
- Improved branch prediction
The pipeline went from 10 to 12 stages.
I think the benchmarks in this discussion were all 32-bit, so the 64-bit stuff is not really applicable. That leaves improved branch prediction and the longer pipeline. Were there any other microarchitectural changes made? I imagine that improved branch prediction helps, but this is already a short-pipeline machine, so the misprediction penalty it saves is small to begin with.
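To put rough numbers on the branch prediction question, here is a back-of-the-envelope sketch. It uses the usual first-order model, where the stall cycles per instruction are the fraction of instructions that are branches, times the misprediction rate, times the misprediction penalty. The branch fraction and both misprediction rates are made-up illustrative values, not measurements of either chip, and treating the penalty as equal to the pipeline depth (10 and 12 stages) is a simplifying assumption.

```python
def branch_stall_cpi(branch_fraction, mispredict_rate, penalty_cycles):
    """Average stall cycles per instruction lost to branch mispredictions."""
    return branch_fraction * mispredict_rate * penalty_cycles


# ASSUMPTION: all values below are illustrative, not measured.
BRANCH_FRACTION = 0.20      # assumed share of instructions that are branches
OLD_MISPREDICT_RATE = 0.05  # assumed rate for the older predictor
NEW_MISPREDICT_RATE = 0.04  # assumed rate for the improved predictor
OLD_PENALTY = 10            # assumed penalty ~ 10-stage pipeline depth
NEW_PENALTY = 12            # assumed penalty ~ 12-stage pipeline depth

old_stall = branch_stall_cpi(BRANCH_FRACTION, OLD_MISPREDICT_RATE, OLD_PENALTY)
new_stall = branch_stall_cpi(BRANCH_FRACTION, NEW_MISPREDICT_RATE, NEW_PENALTY)

print(f"10-stage pipeline, older predictor: {old_stall:.3f} stall cycles/instr")
print(f"12-stage pipeline, newer predictor: {new_stall:.3f} stall cycles/instr")
```

With these assumed inputs the better predictor and the two extra stages very nearly cancel out, which is the point of the question above: on a pipeline this short, prediction improvements alone probably cannot account for a large performance difference.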