re: The ONLY reason to use a non-x86 part is if it really can run the workloads faster
Performance per dollar has been discussed to death, but what seems to be missed is the bit about 'the workloads'. There wouldn't be much of an impetus to move to ARM even if it were slightly better on performance per dollar, so it isn't general-purpose processor performance that would win out. It may, however, be possible to add specialized support for a particular workload to a slower processor and get much better value for money.
For instance, IBM mainframes have hardware support for string operations on UTF strings, compression and decompression, decimal arithmetic, and sorting. They don't have SIMD. That is a reasonable mix for a commercial workload and a web server. If they want to do good floating point, they can stick Intel or POWER chips into their mainframes.
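
To show why decimal support matters for commercial work, here is a rough sketch in Python. It is my own illustration and has nothing to do with how mainframe decimal instructions are actually programmed; the function names are made up. Binary floating point cannot represent most decimal fractions exactly, so money sums drift, whereas decimal arithmetic keeps them exact.

```python
from decimal import Decimal

# Hypothetical helpers for illustration only; not any vendor's API.

def sum_binary(amount: float, count: int) -> float:
    """Add `amount` to itself `count` times using binary floating point."""
    total = 0.0
    for _ in range(count):
        total += amount
    return total

def sum_decimal(amount: str, count: int) -> Decimal:
    """Add `amount` to itself `count` times using decimal arithmetic."""
    total = Decimal("0")
    step = Decimal(amount)  # built from a string, so 0.10 is stored exactly
    for _ in range(count):
        total += step
    return total

binary_total = sum_binary(0.10, 10)      # ten payments of 0.10
decimal_total = sum_decimal("0.10", 10)  # same payments, decimal arithmetic

print(binary_total == 1.0)               # False: rounding error has crept in
print(decimal_total == Decimal("1.00"))  # True: the sum is exact
```

Software decimal like this is much slower than native binary arithmetic, which is the sort of gap that specialized hardware support is meant to close.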