Your quote about transactional memory: "guess that this is one way to improve performance on single-threaded programs without any programming."
I think that you meant "improve performance on multi-threaded programs without significant programming".
Transactional memory makes for a simpler and more transparent implementation of semaphore locks and other software constructs needed for multi-threaded code. If this is indeed automated by the compiler, it will allow better and more widespread use of multiple cores.
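
To make the idea concrete, here is a toy software sketch of the transactional style in Python. It is only an illustration under my own assumptions, not how hardware transactional memory or any particular compiler implements it: the names `TVar`, `atomically`, and `transfer` are made up for this example. Each transaction runs optimistically, records the versions it read, and retries if another thread committed in between.

```python
import threading


class TVar:
    """Hypothetical transactional variable: a value plus a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0


# Guards only the short snapshot and commit steps, not the transaction body.
_commit_lock = threading.Lock()


def atomically(transaction):
    """Run transaction(read, write) optimistically, retrying on conflict."""
    while True:
        reads = {}   # TVar -> version observed on first read
        writes = {}  # TVar -> value to publish at commit time

        def read(tvar):
            if tvar in writes:
                return writes[tvar]
            with _commit_lock:
                version, value = tvar.version, tvar.value
            reads.setdefault(tvar, version)
            return value

        def write(tvar, value):
            writes[tvar] = value

        result = transaction(read, write)

        with _commit_lock:
            # Commit only if nothing we read has changed since we read it.
            if all(tvar.version == seen for tvar, seen in reads.items()):
                for tvar, value in writes.items():
                    tvar.value = value
                    tvar.version += 1
                return result
        # Another thread committed first: discard our writes and retry.


def transfer(src, dst, amount):
    """Move amount from src to dst as one atomic step, with no explicit locks."""

    def tx(read, write):
        write(src, read(src) - amount)
        write(dst, read(dst) + amount)

    atomically(tx)


a, b = TVar(100), TVar(0)
threads = [threading.Thread(target=transfer, args=(a, b, 1)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = a.value + b.value
```

The point is that `transfer` contains no locking logic of its own. The programmer only marks what must happen atomically, and the runtime (or, in the hardware case, the processor and compiler) handles conflicts.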