Elmer, re: ECC can correct an error
We have to keep the bus usages separate. aHT is used for general transfers, and I assume you don't think we should be sending out ECC on all our video writes. cHT (coherent HT) is used for cache probes and memory transfers, and it might be more useful to use ECC for those.
ECC can correct some errors, provided you're willing to assume that what looks like a single-bit error isn't really a burst of errors that falsely appears to be one. CRC can detect a higher percentage of errors.
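To make the miscorrection point concrete, here is a minimal sketch using a textbook Hamming(7,4) code as a stand-in for ECC and zlib's CRC-32 as a stand-in for a link CRC. Both are assumptions chosen for illustration; neither is the actual code or polynomial used on memory buses or HT links, and the helper names are made up for this example.

```python
import zlib

def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Return (data bits, syndrome). Any non-zero syndrome is assumed to be
    a single-bit error at that position and is flipped back."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome

def flip(bits, positions):
    """Flip the bits at the given 1-based positions."""
    out = list(bits)
    for p in positions:
        out[p - 1] ^= 1
    return out

data = [1, 0, 1, 1]
code = hamming74_encode(data)

# One flipped bit: the decoder repairs it.
fixed, _ = hamming74_decode(flip(code, [5]))
print("single-bit error corrected:", fixed == data)

# Two flipped bits: the syndrome still points at "a" single bit,
# so the decoder confidently "corrects" the wrong one.
wrong, syn = hamming74_decode(flip(code, [1, 2]))
print("double-bit error, syndrome", syn, "-> data intact:", wrong == data)

# A CRC over the same bits does not try to correct, but it does notice.
crc_good = zlib.crc32(bytes(code))
crc_bad = zlib.crc32(bytes(flip(code, [1, 2])))
print("CRC flags the double-bit error:", crc_good != crc_bad)
```

The ECC decoder has no way to tell the two-bit case from a one-bit case, which is why detection plus retransmit is still wanted behind it.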
A mechanism for retransmitting the data is desirable whether you use ECC or CRC. For memory reads, we use ECC because the memory is a stupid device and must reconstruct the data. If it can't, we still want to be able to figure out what it should have been. But all aHT transfers are between intelligent controllers, so there is no reason to go with ECC when a retransmit would be required anyway if the data could not be reconstructed with the ECC bits delivered. In other words, even if we had ECC coding for aHT or cHT transfers, we'd still want CRC and retransmit as well in case the ECC wasn't sufficient.
You say "On a shared bus ECC is in parallel with the data". I believe the ECC bits only get as far as the memory controller; they don't make it to the system bus (FSB). For Opteron, the ECC bits get to the on-chip memory controller. From then on, if the memory line must be transmitted to another Opteron, cHT is used. So real memory errors are corrected on-the-fly by the host Opteron. It's only when another CPU requests the data that cHT is used.
The statement: "Corrupted data must be retransmitted, further consuming bandwidth" is just FUD. We're talking about something that might happen less than once in a million, a billion, who cares how many transfers, so any increase in required bandwidth would be infinitesimal.
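For a rough feel of the numbers, here is a back-of-the-envelope sketch. It assumes each transfer fails independently with some probability and is simply retried until it gets through; the error rates are placeholders, not measured figures for any real link.

```python
def retransmit_overhead(p_error):
    """Expected extra transfers per delivered transfer, assuming each
    attempt fails independently with probability p_error and is retried
    until it succeeds (expected attempts = 1 / (1 - p_error))."""
    if not 0.0 <= p_error < 1.0:
        raise ValueError("p_error must be in [0, 1)")
    return p_error / (1.0 - p_error)

# Placeholder error rates, purely illustrative.
for p in (1e-6, 1e-9, 1e-12):
    print(f"error rate {p:g}: extra bandwidth fraction ~ {retransmit_overhead(p):.3g}")
```

At rates like these the extra bandwidth is essentially the error rate itself.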
Finally, we get to "CRCs must be attached to the data packet and therefore always consume bandwidth even when there is no data corruption." Yes, and I don't think anybody would want it any other way. For the bulk transfers -- disk blocks, data acquisition devices, etc. -- that are needed to saturate the aHT links, the percentage overhead decreases as the size of the blocks increases, so the bandwidth effect is also minimal.
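The shrinking overhead is easy to see with a little arithmetic. This sketch assumes a fixed 4-byte CRC appended to each block, which is a simplification for illustration and not a description of how HT actually frames its CRC; the block sizes are arbitrary.

```python
def crc_overhead(payload_bytes, crc_bytes=4):
    """Fraction of the bytes on the wire spent on the CRC, assuming one
    fixed-size CRC is appended per block (crc_bytes is an assumption)."""
    return crc_bytes / (payload_bytes + crc_bytes)

# Arbitrary block sizes, from a cache line up to a large bulk transfer.
for size in (64, 512, 4096, 65536):
    print(f"{size:6d}-byte block: {100 * crc_overhead(size):.3f}% overhead")
```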
What might be interesting to discuss is the effect of waiting for a CRC check on a full cache-line transfer between processors. The Opteron can no longer use a "most needed data first" optimization and must wait for the whole cache line plus CRC to arrive. This will show up as an increase in memory latency when accessed from other Opterons.
[Edit: I see Tenchu has taken up the banner here.]