alan,
I think the issue is floating point, and in particular Linpack. I think the fused macro-ops will allow four floating-point operations per cycle, bringing floating-point performance above the P4... when code is compiled to take advantage of the "virtual FMAC".
I thought these were 4 integer pipes...
Yeah, on floating point, you can have any performance you want. It is just a question of how many resources you throw at it. But I doubt Intel is going to make Conroe unbalanced in favor of floating point.
On integer, it is a lot trickier to extract parallelism, which is why additional pipes give diminishing returns. But Intel may have found some new tricks...
Joe