OC Detective, the "experiment" was just for Dan, and just to debunk some false assumptions he made. Of course DMA has some impact on CPU utilization, since the CPU has to poll a "done" flag in main memory (or somewhere else, but usually it's in main memory).
As for TCP/IP offload engines, the main problem they are solving is packet-switching, which involves moving blocks of data around in memory. The TOE stuff is one level above DMA and gets somewhat complicated, but what Dan is talking about is the old PIO (programmed I/O) method of data transfer, where the CPU moves every word itself, which is one step behind DMA. Not only that, but the way Dan applies PIO is so idiotic that a programmer would have to try really hard just to emulate what Dan thinks is normal behavior.
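To make the PIO vs. DMA difference concrete, here is a toy model in Python. It is only a sketch, not real driver code: `pio_read`, `dma_read` and `ToyDmaEngine` are names I made up, a background thread stands in for the DMA hardware, and a `threading.Event` stands in for the "done" flag in main memory. The point is just who moves the data and what the CPU is left doing.

```python
import threading


def pio_read(device_words):
    """PIO: the CPU itself moves every word out of the device."""
    buffer = []
    cpu_word_moves = 0
    for word in device_words:
        # Each iteration stands for one CPU read of the device's data
        # register plus one CPU store into the buffer.
        buffer.append(word)
        cpu_word_moves += 1
    return buffer, cpu_word_moves


class ToyDmaEngine:
    """Hypothetical DMA engine: copies in the background, then sets a flag."""

    def __init__(self):
        # Stand-in for the "done" flag the CPU polls.
        self.done = threading.Event()
        self._worker = None

    def start(self, device_words, buffer):
        def transfer():
            buffer.extend(device_words)  # the engine moves the data, not the CPU
            self.done.set()              # signal completion

        self.done.clear()
        self._worker = threading.Thread(target=transfer)
        self._worker.start()

    def wait_idle(self):
        self._worker.join()


def dma_read(device_words):
    """DMA: the CPU sets up the transfer, then only polls the done flag."""
    buffer = []
    engine = ToyDmaEngine()
    engine.start(device_words, buffer)

    polls = 0
    while not engine.done.is_set():  # the only per-transfer CPU work
        polls += 1

    engine.wait_idle()
    cpu_word_moves = 0  # the CPU never touched the data itself
    return buffer, cpu_word_moves, polls


data = list(range(1024))
pio_buf, pio_moves = pio_read(data)
dma_buf, dma_moves, polls = dma_read(data)
print(pio_moves, dma_moves)
```

Both paths end up with the same buffer contents. With PIO the CPU does one move per word; with DMA it does none and just checks the flag, so the cost to the CPU is the polling, not the copying. The poll count varies from run to run, which is why it is not printed.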
Tenchu