What I've noticed on Win32 is that the executable is 60% larger with PGO. PGO gives about a 7% performance improvement in my application, so I think it's well worthwhile.
But a larger binary must increase usage of the L1 and L2 caches. That is apparently fine for your application (since you see a gain), but when the user has other applications running at the same time, the increased cache usage will have an adverse effect on them...