Yep, that and the "memory disambiguation" (allowing better load reordering), particularly for int performance.
Determining whether a load and a store share the same address is called memory disambiguation. Allowing loads to move ahead of stores gives a big performance boost. In some snippets of benchmarking code, Intel saw up to a 40% performance boost, solely the result of the more flexible way Loads get reordered. It is pretty clear that we won't see this in most real applications, but it is nevertheless impressive and it should show tangible (10-20%) performance boosts together with the fast L2 and L1 cache.
Register for free to join our community of investors and share your ideas. You will also get access to streaming quotes, interactive charts, trades, portfolio, live options flow and more tools.