fpg, Re: This dual-core architecture will be able to approach the 100% speedup ideal closer than a regular 2-way Opteron system, as the cores will be able to communicate with each other at full speed without going past the SRQ, whereas the latter system must traverse each chip's SRQ and HTT interface.
For data access local to the CPU, you would be correct, which would make a single-CPU dual-core system attractive versus a dual-CPU SMP system. However, with two cores behind each chip's HyperTransport links, remote accesses will put twice as much traffic on those links, so remote bandwidth per core will be reduced. A 4-way system built from two dual-core (CMP) CPUs may be a wash against a regular 4-way SMP system.
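To put rough numbers on the remote-bandwidth point, here is a back-of-the-envelope sketch in Python. It assumes the cores on a chip split that chip's HyperTransport bandwidth evenly and that every core is generating remote traffic at once. The bandwidth is normalized to 1.0 rather than taken from any real part, and the function name `remote_bw_per_core` is just something I made up for illustration.

```python
def remote_bw_per_core(link_bw: float, cores_per_chip: int) -> float:
    """Remote bandwidth available to each core on one chip.

    Assumption: all cores on the chip share its HyperTransport
    bandwidth equally and all are issuing remote accesses.
    """
    if cores_per_chip < 1:
        raise ValueError("cores_per_chip must be at least 1")
    return link_bw / cores_per_chip


# Normalized link bandwidth (1.0 = whatever one chip's links can carry).
# This is a placeholder, not a measured or published figure.
LINK_BW = 1.0

single_core = remote_bw_per_core(LINK_BW, 1)  # regular SMP: one core per chip
dual_core = remote_bw_per_core(LINK_BW, 2)    # CMP: two cores per chip

print(f"single-core chip: {single_core:.2f} of link bandwidth per core")
print(f"dual-core chip:   {dual_core:.2f} of link bandwidth per core")
```

This ignores hop counts, the fraction of accesses that are actually remote, and the fast on-chip path between the two cores, so it only shows the direction of the effect, not whether the 4-way comparison really ends up a wash.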