This test also lets us compare memory latency once the working set exceeds the L2 cache. The Opteron memory subsystem shows a latency of 120 cycles, whereas the Athlon XP shows 360 cycles. Since the clock speeds are the same, the Athlon XP is three times slower. Here we see the effect of the integrated memory controller, which minimizes the request time to memory and so drastically reduces read latency.
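For anyone wondering how a latency test of this kind is usually built, here is a rough sketch of the pointer-chasing idea. It is not the tool used in the review. Each load depends on the result of the previous one, so the hardware cannot overlap them, and the time per step approaches the latency of whichever level of the hierarchy the working set falls into. The names `build_chain`, `chase`, and `ns_per_load` are made up for this illustration. In Python the interpreter overhead swamps the actual cache effects, so treat this as a picture of the structure only; a real measurement would be written in C or assembly and reported in cycles.

```python
import random
import time

def build_chain(n, seed=0):
    """Return a list nxt where nxt[i] is the index to visit after i.

    Sattolo's algorithm produces a permutation made of one single cycle,
    so a walk starting anywhere touches all n slots before repeating.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j is always strictly less than i
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt

def chase(nxt, steps):
    """Follow the chain for `steps` dependent loads; return the final index."""
    i = 0
    for _ in range(steps):
        i = nxt[i]  # the next address is only known once this load completes
    return i

def ns_per_load(n, steps=200_000):
    """Average wall-clock time per chained load for a working set of n slots."""
    nxt = build_chain(n)
    start = time.perf_counter()
    chase(nxt, steps)
    return (time.perf_counter() - start) / steps * 1e9

if __name__ == "__main__":
    # Sweep the working-set size; in a native-code version the time per load
    # steps up each time the set outgrows a cache level.
    for n in (1 << 10, 1 << 14, 1 << 18):
        print(f"{n:>8} slots: {ns_per_load(n):.1f} ns per load")
```

The random single-cycle order is there to defeat hardware prefetching, which would otherwise hide the latency of a simple sequential walk.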
- General Purpose Registers (GPRs) extended out to 64 bits
- Number of GPRs doubled from 8 to 16
- Number of SIMD (MMX, SSE, SSE2, and 3DNow!) registers doubled from 8 to 16
- Integrated DDR memory controller
- Integrated HyperTransport interface
- Improved branch prediction
The pipeline went from 10 to 12 stages.
I think the benchmarks in this discussion centered on 32-bit code, so the 64-bit features are not really applicable. That leaves improved branch prediction and the longer pipeline. Were there any other microarchitectural changes? I imagine improved branch prediction helps, but this is already a short-pipeline machine.