The same way the P4 is just an x86 CPU with SSE2 extensions that are, BTW, absolutely required for it to run faster than an Athlon.
No kidding? The benefit of SSE2 is primarily its double-precision floating-point (DP FP) SIMD instructions. A P4/3.2C yields 1252 SPECfp_base2k while an XP 3200+ yields 873 SPECfp_base2k, so the P4 is about 43% faster (1252/873 ≈ 1.43). Intel has disclosed that the use of SSE2 instructions improves the SPECfp2k score of the P4 by about 5%. Take that 5% back out (1.43/1.05 ≈ 1.37) and the P4 would still beat the Athlon by roughly 36-37%, even without SSE2.
Perhaps your use of "absolutely required" above was a bit overstated. :-P
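For anyone who wants to check the arithmetic, here is a quick sketch in Python. The two scores and the "about 5%" figure are the ones quoted above; treating the 5% as an exact multiplicative gain is my assumption, and the variable and function names are just illustrative.

```python
# Scores quoted in the post (higher is better).
P4_SCORE = 1252.0      # P4/3.2C
ATHLON_SCORE = 873.0   # Athlon XP 3200+

# "About 5%" SSE2 benefit from the post; assumed here to be exactly 5%
# and to apply multiplicatively to the whole score.
SSE2_GAIN = 0.05


def percent_lead(a: float, b: float) -> float:
    """Return how much higher score a is than score b, in percent."""
    return (a / b - 1.0) * 100.0


# Lead with SSE2 enabled, straight from the quoted scores.
lead_with_sse2 = percent_lead(P4_SCORE, ATHLON_SCORE)

# Hypothetical P4 score with the SSE2 contribution divided back out.
p4_without_sse2 = P4_SCORE / (1.0 + SSE2_GAIN)
lead_without_sse2 = percent_lead(p4_without_sse2, ATHLON_SCORE)

print(f"P4 lead with SSE2:    {lead_with_sse2:.1f}%")
print(f"P4 lead without SSE2: {lead_without_sse2:.1f}%")
```

Under those assumptions the second figure lands between 36% and 37%, which is where the "still beats the Athlon" estimate comes from.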