They can each do 4 FOPS per clock, but 2 adds + 2 multiplies is hardly any better than 2 adds + 1 multiply, in most applications.
Wrong. Most of the important HPC/technical computational
kernels are well balanced between adds and multiplies: matrix
multiply, matrix decomposition, dot product, polynomial
evaluation, digital filter evaluation, etc. Their inner loops
pair each multiply with an add, so the second multiplier gets
used.
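
To make the balance concrete, here is a rough sketch that counts the operations in two of those kernels. The function names are mine, made up for illustration, and the counts cover only the floating-point multiplies and adds in the inner loop, not loads, stores or loop overhead.

```python
def dot_product_op_counts(x, y):
    """Return (value, multiplies, adds) for a plain dot product."""
    acc = 0.0
    muls = adds = 0
    for xi, yi in zip(x, y):
        acc += xi * yi  # one multiply and one add per element
        muls += 1
        adds += 1
    return acc, muls, adds


def horner_op_counts(coeffs, x):
    """Evaluate a polynomial by Horner's rule.

    coeffs are ordered from the highest-degree term down to the constant.
    Returns (value, multiplies, adds).
    """
    acc = coeffs[0]
    muls = adds = 0
    for c in coeffs[1:]:
        acc = acc * x + c  # one multiply and one add per coefficient
        muls += 1
        adds += 1
    return acc, muls, adds


if __name__ == "__main__":
    print(dot_product_op_counts([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(horner_op_counts([2.0, 3.0, 4.0], 2.0))
```

In both loops the multiply and add counts come out equal, which is the 1:1 mix being argued for. Matrix multiply and FIR filters are built from the same multiply-then-accumulate step, so they should show the same ratio.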