Replies to Post #24853 on Advanced Micro Devices Inc (AMD)

01/29/04 11:00 PM

#24859 RE: yourbankruptcy #24853

Here's a DMT page of an article from 2001:

http://www.slcentral.com/articles/01/6/multithreading/page11.php

Applications Of Multithreading: Dynamic Multithreading

While programmers can write code to be multithreaded, it is time-consuming to do so. Considering that time to market is rather important, many programs may forgo the whole multithreading phase, even programs that are CPU intensive and would benefit from the addition of another logical processor.

Yet, not all CPU-intensive tasks are multithreaded. For these, having the hardware rather than the programmer be responsible for creating threads would be a great boon to performance. In fact, such processing paradigms do exist, and some don't even require compiler support! (So much for the RISC approach of simplifying everything on the hardware side…) One approach to this is found in the Dynamic Multithreading Architecture (DMT), proposed by Haitham Akkary, now at Intel Corp.

DMT makes use of a traditional SMT pipeline and adds onto it. Increasing the size of the classic reorder buffers and register files (beyond that of a traditional SMT processor) does not make sense, because for it to be effective, the temporal locality of the instructions must be fairly close. Rather than increase them to disproportionate sizes and massively increase latencies, another level outside the pipeline, called Trace Buffers, is included for every thread that is supported. The optimal size for the Trace Buffers is 200 instructions per thread; 300 resulted in only a relatively minor boost in IPC over 200.[14]

One way (among four) in which Dynamic Multithreading will break a sequential program into multiple threads is to search through a program for a loop, and when one is found, to look beyond the loop for an additional thread. If there is sufficient work beyond the loop boundary that does not depend on the work done in the loop, it will create another thread and speculatively execute it. Generally, the idea is to look ahead through the program and run as many portions of it as possible by speculatively creating new threads.
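
To make the loop case above concrete, here is a toy sketch in Python. It is not Akkary's hardware mechanism, only an illustration under assumptions: the `Instr` record, the `find_loop`, `independent_work_after_loop`, and `spawn_point` helpers, and the `min_work` threshold are all hypothetical names invented for this example. It finds a loop via a backward branch, then counts how many instructions past the loop exit do not depend on registers written inside the loop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    """Toy instruction (hypothetical): one destination register, a set of sources."""
    dest: Optional[str]
    srcs: frozenset = frozenset()
    branch_target: Optional[int] = None  # index jumped to, if this is a branch


def find_loop(program):
    """Return (start, end) of the first loop, detected as a backward branch, or None."""
    for i, ins in enumerate(program):
        if ins.branch_target is not None and ins.branch_target <= i:
            return ins.branch_target, i
    return None


def independent_work_after_loop(program, loop):
    """Count post-loop instructions that do not depend on values produced in the loop."""
    start, end = loop
    tainted = {ins.dest for ins in program[start:end + 1] if ins.dest}
    independent = 0
    for ins in program[end + 1:]:
        if ins.srcs & tainted:
            # Reads a loop result (directly or transitively): its output is dependent too.
            if ins.dest:
                tainted.add(ins.dest)
        else:
            independent += 1
            if ins.dest:
                # Register now holds a loop-independent value.
                tainted.discard(ins.dest)
    return independent


def spawn_point(program, min_work=2):
    """Return the index where a speculative thread could start, or None."""
    loop = find_loop(program)
    if loop is None:
        return None
    if independent_work_after_loop(program, loop) >= min_work:
        return loop[1] + 1  # first instruction past the loop boundary
    return None


# Hypothetical sample program
program = [
    Instr("r1"),                                       # 0: r1 = constant
    Instr("r2", frozenset({"r1", "r2"})),              # 1: loop body: r2 += r1
    Instr(None, frozenset({"r2"}), branch_target=1),   # 2: backward branch to 1
    Instr("r3", frozenset({"r1"})),                    # 3: independent of the loop
    Instr("r4", frozenset({"r3"})),                    # 4: independent of the loop
    Instr("r5", frozenset({"r2", "r4"})),              # 5: needs the loop's r2
]
```

In the sample program, instructions 3 and 4 read nothing the loop writes, so with `min_work=2` the sketch picks index 3 (the first instruction after the loop) as the point to start a speculative thread.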

The last little trick that the MAJC architecture reveals is the same general idea as the above form of spawning new threads from a single thread. They've chosen to call it "Space Time Computing," [6] but the effect is the same - it spawns a new thread from an older one. The difference is that, because MAJC is not based on an SMT architecture (rather a hybrid between CMT and CMP), the newly created thread will instead be executed on another processor on the die.

What about Jackson Technology? Could it too be a form of Dynamic Multithreading? By using a Trace Cache, the Pentium 4 architecture, in a sense, forms quasi-threads, each of which is simply the path execution took the last time that code was run. If different areas of the trace cache could be scanned for "threads," then a DMT processor might make use of the trace cache for the formation of threads.

Akkary's thesis didn't come out until 1998, and who knows if the P4 was so far along in its design that they couldn't reorganize it so as to include DMT. SMT, on the other hand, was out around 1995, perhaps earlier (my earliest SMT related source is 1995), which is just after the introduction of the Pentium Pro - the P6 core still found in the Pentium III.

On the other hand, as Jackson Technology has still not appeared, perhaps Intel incorporated DMT in an unfinished state and disabled it so that they could finish it for later revisions. Either way, it seems that the timing works out in favor of SMT, or perhaps even DMT over CMP, which would be extremely expensive to produce (though not beyond Intel's abilities).

DMT starts from a base SMT processor, whose pipeline is already lengthened by 2 stages (pipelined register read and register write). Even so, the possibility is still open that an additional stage might have to be added so as not to significantly impact cycle time. However, even if this is the case, the additional stage showed only ~5% performance loss over a DMT architecture that lacked the additional stage.[15]

Overall, DMT was shown to increase performance of SPECInt 95 programs by 15% without changing the number of fetch ports or functional units, and by 30% with one additional fetch port. DMT, like SMT, shows more potential for speeding up integer applications than floating point applications. This is because integer programs tend to have more branches, and thus more times when having multiple threads is beneficial in hiding long latencies.

The DMT architecture described in Akkary's thesis is a form of speculative multithreading that operates on a single threaded program. It reaches far into a program, and achieves higher performance by running later parts of a program on a base SMT pipeline. More recent research has shown that running multiple programs (or preexisting threads from a multithreaded program) using a traditional SMT approach with the additional support of a DMT architecture (called Dynamic Simultaneous Multithreading, or DSMT) improves performance over a pure SMT processor by 5-15% depending upon the number and type of applications. This works by spawning new threads via DMT protocols when there are fewer threads than the processor has support for. [12]
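
The DSMT policy in the last sentence above can be sketched in a few lines of Python. This is an illustration only: `assign_contexts`, its parameters, and the `spec:` labels are hypothetical, and a real DSMT design makes this decision in hardware.

```python
def assign_contexts(program_threads, num_contexts, speculative_candidates):
    """Fill hardware thread contexts: real threads first, speculative ones in what is left.

    All names here are hypothetical; this only models the policy, not the hardware.
    """
    running = list(program_threads[:num_contexts])
    free = num_contexts - len(running)
    # Only spawn DMT-style speculative threads when contexts would otherwise sit idle.
    running += ["spec:" + c for c in speculative_candidates[:free]]
    return running


# Two real threads on a 4-context processor leave room for two speculative threads.
half_full = assign_contexts(["A", "B"], 4, ["A@loop_exit", "B@return", "A@call"])
# Four real threads use every context, so nothing is spawned.
full = assign_contexts(["A", "B", "C", "D"], 4, ["A@loop_exit"])
```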

Milo Morai

01/29/04 11:53 PM

#24866 RE: yourbankruptcy #24853

clip from EE Times story

....

Nathan Brookwood, a microprocessor analyst at Insight64 (Saratoga, Calif.), said leakage is the likely culprit behind delayed product introductions at Intel, which is locked in a close race with Advanced Micro Devices Inc. to bring new process technology to the microprocessor market.

Intel is sampling a 90-nm version of its desktop Prescott P4, which heads for market this quarter, as well as the 90-nm Dothan version of the Banias mobile processor, which Brookwood said is being pushed out slightly to the second quarter. If AMD is able to execute on its promise to ramp 90-nm versions of its Opteron processor in the second quarter, Brookwood said, then "AMD will only be one quarter behind Intel, which is kind of amazing when you remember that AMD has been as much as three quarters behind in previous product introductions."

Brookwood also cited a trend that casts a shadow on the industry's continued ability to scale as fast as the ITRS predicts. When Intel, for example, moved from one technology node to the next, it has been able to cut power as it boosted frequency. The 180-nm Pentium running at 2 GHz drew 72 watts; the 130-nm Pentium at 2 GHz drew 52 W.

The shift to 90 nm has not resulted in a similar drop. The 130-nm P4 at 3 GHz draws 82 W, but when shifted to a 90-nm process, it draws more power, "in the high 80s or maybe even 90 W," Brookwood said. "Most of the problem is with static power consumption. These devices leak like crazy."

...
http://www.eetimes.com/story/OEG20040123S0041
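
The power figures quoted in the clip work out as follows. This is just arithmetic on the numbers Brookwood gave; `pct_change` is a hypothetical helper, and 90 W is taken as the upper end of his "high 80s or maybe even 90 W" estimate, which is an assumption.

```python
def pct_change(old_watts, new_watts):
    """Percentage change in power going from the old process to the new one."""
    return (new_watts - old_watts) / old_watts * 100.0


# 180 nm -> 130 nm at 2 GHz: 72 W down to 52 W
shrink_180_to_130 = round(pct_change(72, 52), 1)   # -27.8 (power fell)

# 130 nm -> 90 nm at 3 GHz: 82 W up to an assumed 90 W
shrink_130_to_90 = round(pct_change(82, 90), 1)    # 9.8 (power rose)

print(shrink_180_to_130, shrink_130_to_90)
```

So the earlier shrink cut power by roughly 28 percent at the same clock, while the 90-nm shrink raises it by up to about 10 percent.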
