Well, I've read about the benefits of compact code, but what I'm actually seeing is that a thousand lines of assembler in memmove can beat the stuffing out of a small copy loop.
That's the way the compilers are going as well, with more and more inlining, and I assume they know what they're doing.
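
For anyone who wants to try the comparison themselves, here's a rough sketch of how I'd time it. The names `naive_copy`, `memmove_copy` and `time_copy` are made up for illustration, the byte loop is only my guess at what a "small copy loop" looks like, and `clock()` is a coarse timer, so treat any numbers as indicative at best. Also note that an optimizing compiler may itself turn the loop into a library call, which would hide the difference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical "small copy loop": one byte per iteration.
   Unlike memmove, it is only correct for non-overlapping buffers. */
void naive_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}

/* Thin wrapper so both copies share one function-pointer type. */
void memmove_copy(void *dst, const void *src, size_t n)
{
    memmove(dst, src, n);
}

typedef void (*copy_fn)(void *, const void *, size_t);

/* Returns the CPU time, in seconds, spent doing `reps` copies of `n` bytes. */
double time_copy(copy_fn fn, void *dst, const void *src, size_t n, int reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        fn(dst, src, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call `time_copy(naive_copy, ...)` and `time_copy(memmove_copy, ...)` on the same pair of buffers with a large `reps`, and try a few different sizes, since I'd expect the gap to depend a lot on buffer size, alignment and optimization level.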