And the reality is that memory transactions are pipelined, so you aren't waiting the ~70ns for a first-to-load transaction, but rather 2-4ns for a bus turn-around for the next transfer to occur.
All this substantially reduces the best-case win of 20% to something more like 2-4%. So the earlier comment that it is the IMC, rather than the micro-architecture, that is aiding K8 performance is plainly wrong.
You really are funny if you believe the above fairy story ;-). A little knowledge is a very dangerous thing ;-). When you have a cache miss, that data or instruction will experience the full memory latency, regardless of whether the request is pipelined with other cache misses. It can't pinch the data from an earlier in-flight memory request just because that request was issued sooner, and that data may not be what it wants anyway. You are just too absurd for words.
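To put rough numbers on the latency-versus-throughput distinction, here is a toy model in Python. It is only a sketch under stated assumptions: the 70 ns figure is the one quoted above, the 3 ns spacing is just a midpoint of the quoted 2-4ns, the function names are made up for illustration, and it ignores DRAM page hits, bank conflicts and prefetching.

```python
# Toy timing model for cache misses; not a simulation of any real memory controller.
MISS_LATENCY_NS = 70.0    # full memory latency for one miss (figure quoted in the thread)
ISSUE_INTERVAL_NS = 3.0   # assumed spacing between pipelined transfers (midpoint of 2-4 ns)


def independent_misses(n):
    """Return (issue_time, completion_time) for n misses issued back to back.

    The requests do not depend on each other, so they overlap in the pipeline.
    """
    return [(i * ISSUE_INTERVAL_NS, i * ISSUE_INTERVAL_NS + MISS_LATENCY_NS)
            for i in range(n)]


def dependent_misses(n):
    """Return (issue_time, completion_time) for n misses in a dependency chain.

    Each request needs the result of the previous one (e.g. pointer chasing),
    so nothing can be overlapped.
    """
    return [(i * MISS_LATENCY_NS, (i + 1) * MISS_LATENCY_NS) for i in range(n)]


def per_request_latency(schedule):
    """Latency seen by each individual request, in ns."""
    return [done - issued for issued, done in schedule]


if __name__ == "__main__":
    for name, schedule in (("independent", independent_misses(4)),
                           ("dependent", dependent_misses(4))):
        total = schedule[-1][1]
        print(name, "total:", total, "ns, per request:", per_request_latency(schedule))
```

With these assumed figures, four independent misses all complete by 79 ns while four dependent ones take 280 ns, yet in both cases every individual request still sees the full 70 ns. Pipelining improves throughput when misses can overlap; it does not shorten the wait for any one miss.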