
Tenchu

07/07/08 3:13 AM

#64534 RE: mmoy #64529

Michael,

> This will make life easier for programmers (yes, programmers are lazy) so that they don't have to align structures but the best performance will probably come from aligned structures.

Why wouldn't programmers want to align SSE data?

Seems like an awful waste to store 16-byte chunks of data on non-aligned addresses. I mean, come on, when cachelines are 64 bytes in size, isn't it obvious to pack four 16-byte chunks into a cacheline? Otherwise, you'll end up with 1 out of every 4 of your chunks crossing cacheline boundaries, and that's bad for performance.
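The 1-in-4 figure can be checked with plain address arithmetic. The sketch below is only an illustration: it assumes 64-byte cachelines and 16-byte chunks stored back to back, and the helper names (`crosses_cacheline`, `count_crossings`) are made up for this example.

```python
CACHELINE = 64  # cacheline size in bytes, as stated in the post
CHUNK = 16      # size of one SSE-sized chunk in bytes


def crosses_cacheline(addr, size=CHUNK, line=CACHELINE):
    """Return True if the byte range [addr, addr + size) spans two cachelines."""
    first_line = addr // line
    last_line = (addr + size - 1) // line
    return first_line != last_line


def count_crossings(base, n_chunks, size=CHUNK, line=CACHELINE):
    """Count how many of n_chunks contiguous chunks starting at base cross a line."""
    return sum(
        crosses_cacheline(base + i * size, size, line) for i in range(n_chunks)
    )


# 64 chunks packed from a 16-byte-aligned base: none straddle a cacheline.
aligned = count_crossings(base=0, n_chunks=64)

# The same 64 chunks shifted by 8 bytes: every fourth chunk straddles a line.
unaligned = count_crossings(base=8, n_chunks=64)

print(aligned, unaligned)
```

With an aligned base, four chunks fill each cacheline exactly. With any base that is not a multiple of 16, the chunk that starts in the last 16 bytes of a line spills into the next one.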

Tenchu

ChipGeek

07/07/08 1:23 PM

#64546 RE: mmoy #64529

Re: Unaligned access improvements

There may be other improvements in this area, but the big one I'm aware of is that Nehalem is once again able to do store-forwards to unaligned loads across 1, 2, 3, 4, 5, 6, 7, 8, and 12-byte boundaries (i.e., it can forward data to unaligned loads without waiting for that store to complete its write to the cache).

This was implemented in P4 thanks to the additional pipestages that were available in that uarch, but Merom took a step back to the P6 days of only being able to forward to loads that are aligned to 8-byte boundaries (they may have added 4 and 12-byte alignment, I'm not sure).
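To make the difference concrete, here is a toy model of the forwarding decision. It is not the actual hardware rule for any of these chips. It assumes only two conditions: the load's bytes must lie entirely inside the earlier store's bytes, and the load address must meet some alignment requirement. The function names and the `required_alignment` parameter are hypothetical, chosen for this sketch.

```python
def load_within_store(store_addr, store_size, load_addr, load_size):
    """True if every byte the load reads was written by the store."""
    return (
        store_addr <= load_addr
        and load_addr + load_size <= store_addr + store_size
    )


def can_forward(store_addr, store_size, load_addr, load_size, required_alignment=1):
    """Toy forwarding rule (an assumption, not a documented hardware spec).

    required_alignment=8 mimics the restriction described above, where only
    loads on 8-byte boundaries are forwarded; required_alignment=1 mimics an
    alignment-agnostic design.
    """
    contained = load_within_store(store_addr, store_size, load_addr, load_size)
    return contained and load_addr % required_alignment == 0


STORE_ADDR, STORE_SIZE = 0x1000, 16  # one 16-byte store

# 4-byte load at an odd offset inside the stored data.
restricted = can_forward(STORE_ADDR, STORE_SIZE, 0x1003, 4, required_alignment=8)
agnostic = can_forward(STORE_ADDR, STORE_SIZE, 0x1003, 4, required_alignment=1)

# 4-byte load on an 8-byte boundary forwards under either rule.
on_boundary = can_forward(STORE_ADDR, STORE_SIZE, 0x1008, 4, required_alignment=8)

# A load that runs past the end of the store cannot be forwarded at all.
overrun = can_forward(STORE_ADDR, STORE_SIZE, 0x100E, 4, required_alignment=1)

print(restricted, agnostic, on_boundary, overrun)
```

In this model, the load that fails to forward would have to wait for the store to reach the cache, which is the stall described above.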

Nehalem once again gets close to an alignment-agnostic state of being, which should close some performance glass jaws that exist out there today for Core 2 chips.