All new microarchitectures have their own idiosyncrasies and their own best code sequences for common tasks. It takes a while for assembly language coders and non-Intel toolchain folks to catch on. This was felt most distinctly with Willamette/NetBurst, but it was still an important factor for the 486, Pentium, and P6.
Probably the biggest opportunities for NGMA-related optimizations are in SIMD/streaming/FP-heavy apps. A 3 GHz NGMA chip has a peak FP performance (both cores combined) of 24 DP or 48 SP GFLOP/s. There is no way most current apps are coded to even begin tapping that potential. Even most scalar integer code out there is probably far from optimal for a four-issue-wide x86 chip with three integer units.
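For anyone wondering where the 24/48 figures come from, here is a rough back-of-the-envelope sketch. It assumes (my assumption, not stated above) that each NGMA core can issue one 128-bit packed SSE add and one 128-bit packed SSE multiply every cycle. The helper name `peak_gflops` is just for illustration. It also shows the ceiling for scalar SSE code (one lane per operation), which is part of why most current code leaves so much on the table.

```python
def peak_gflops(clock_ghz: float, cores: int, flops_per_cycle: int) -> float:
    """Theoretical peak = clock (GHz) x cores x FP ops per core per cycle."""
    return clock_ghz * cores * flops_per_cycle

# Assumption: per core, one 128-bit packed add pipe and one 128-bit
# packed multiply pipe, each able to start a new operation every cycle.
SSE_BITS = 128
PIPES = 2                    # add pipe + multiply pipe
DP_LANES = SSE_BITS // 64    # 2 doubles per 128-bit register
SP_LANES = SSE_BITS // 32    # 4 floats per 128-bit register

dp_per_cycle = PIPES * DP_LANES      # 4 DP FLOPs/cycle/core
sp_per_cycle = PIPES * SP_LANES      # 8 SP FLOPs/cycle/core
scalar_per_cycle = PIPES * 1         # scalar SSE: 1 lane per op

clock_ghz, cores = 3.0, 2
print("packed DP peak:", peak_gflops(clock_ghz, cores, dp_per_cycle), "GFLOP/s")
print("packed SP peak:", peak_gflops(clock_ghz, cores, sp_per_cycle), "GFLOP/s")
print("scalar SSE peak:", peak_gflops(clock_ghz, cores, scalar_per_cycle), "GFLOP/s")
```

Under these assumptions, purely scalar SSE code tops out at half the DP peak and a quarter of the SP peak before you even consider memory bandwidth or dependency chains.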