In this case I'm sure Intel could claim that they're performing a legitimate optimization. Frankly, I doubt it; this kind of optimization would be difficult to recognize and apply in generic code. It would also offer little benefit, because I've never seen anyone use code like this to set or clear huge runs of bits. That's the catch: this optimization would make the code slower if the run lengths weren't long enough. In nbench's case they are, but there's no way the compiler could have known that on its own.
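To make the trade-off concrete, here is a rough sketch of the two shapes of code involved. It is not nbench's source and not ICC's actual transformation. The first function is a per-bit loop loosely modeled on the kind of inner loop a bitfield benchmark uses. The second is a hand-written "run-aware" version that does roughly what such an optimization would have to do: handle the partial words at each end bit by bit, then fill every whole word in between with `memset`. All function names here are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Number of bits in one word of the bitmap. */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Naive form (hypothetical): set (val != 0) or clear (val == 0)
   nbits bits starting at bit index start, one bit per iteration. */
void toggle_run_bitwise(unsigned long *map, size_t start, size_t nbits, int val)
{
    for (size_t i = start; i < start + nbits; i++) {
        unsigned long mask = 1UL << (i % WORD_BITS);
        if (val)
            map[i / WORD_BITS] |= mask;
        else
            map[i / WORD_BITS] &= ~mask;
    }
}

/* Run-aware form (hypothetical): same result, but whole words are
   written with a single memset instead of WORD_BITS separate updates. */
void toggle_run_wordwise(unsigned long *map, size_t start, size_t nbits, int val)
{
    assert(map != NULL || nbits == 0);
    size_t end = start + nbits;

    /* Leading partial word: round start up to the next word boundary. */
    size_t head_end = (start + WORD_BITS - 1) / WORD_BITS * WORD_BITS;
    if (head_end > end)
        head_end = end;
    toggle_run_bitwise(map, start, head_end - start, val);

    /* Whole words in the middle: all-ones or all-zeros bytes. */
    size_t words = (end - head_end) / WORD_BITS;
    if (words > 0)
        memset(&map[head_end / WORD_BITS], val ? 0xFF : 0x00,
               words * sizeof map[0]);

    /* Trailing partial word. */
    size_t tail = head_end + words * WORD_BITS;
    toggle_run_bitwise(map, tail, end - tail, val);
}
```

The run-aware version pays for extra arithmetic, branches and a library call on every invocation. That overhead is likely to be recovered only when a run spans many whole words. For runs of a few bits, the plain loop is probably cheaper. Since `nbits` is normally a runtime value, the compiler can't tell which case it is in.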
What's more, this optimization wasn't present in ICC until a recent release. Somehow I don't think they only just discovered that it has general-purpose value. More likely, they discovered that they could manipulate AnTuTu's scores. That seems to coincide nicely with the appearance of this third-party report showing how amazing Atom's perf/W is, using nothing but AnTuTu. The same goes for the leaked scores for CloverTrail+ and now BayTrail, which are also AnTuTu. Is this really a coincidence?