Table 2 of the Opteron Tech Doc references clock speeds up to 2600MHz - and, of course,
includes 2400MHz. Although it is not entirely clear, it is likely that the max power figure given
in the literature corresponds to the max clock speed in the same literature.
I disagree with this interpretation. The Opteron data sheet only describes three specific
speed grades: 1.4, 1.6, and 1.8 GHz. The maximum power figure corresponds to 1.8 GHz.
The table that includes 2.6 GHz refers to encodings of the clock multiplier value code
applied to the Opteron at reset. It is common for chip documentation of field encodings
to define values for products that don't or won't exist in the current stepping or process,
and sometimes never will.
For example the Pentium 4 documentation has definitions for the cache description
fields in the value returned by the CPUID instruction. Some of the encodings define
cache sizes and configurations that differ radically from Willamette, Northwood, and
what is reported in the works for Prescott. Only a fool would assume Intel would
eventually ship products that encompass all possible values let alone combinations
of these field values just because Intel defined them.
Core-Logic plus 1MB of L2-Cache on a Thoroughbred-size die.
Obviously we see a hybrid-process here: Core-Logic somewhere
between 90 and 130nm and Cache well below 90 nm structures already.
Perhaps you should consider the effect of highly optimized design and layout.
The L3 cache in McKinley packs 3 MB in 175 mm2. That's less than 60 mm2
per MB. That's in a 0.18 um process. I am not sure AMD even got to 60 mm2
per MB in Opteron even though it is in a full process shrink compared to
McKinley (0.13 um vs 0.18 um). BTW, the McKinley L3 uses a highly dense
hand designed sub-block architecture that Intel claimed was denser and
had higher cell efficiency than the best commercial SRAMs.
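The density arithmetic quoted above checks out; here is a quick sketch in Python (the McKinley figures are the ones from the paragraph, not independently measured):

```python
# Back-of-the-envelope check on the cache density claim above.
mckinley_l3_mb = 3      # MB of L3, per the post
mckinley_l3_mm2 = 175   # mm^2 of die area for that L3, per the post

density = mckinley_l3_mm2 / mckinley_l3_mb
print(f"McKinley L3: {density:.1f} mm^2 per MB in a 0.18 um process")
# -> about 58.3 mm^2 per MB, i.e. under the 60 mm^2/MB mark
```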
No doubt a few readers won't be able to accept that Intel engineers came up
with a new and innovative way to do something and will continue to grope
around looking for process-based excuses and explanations.
That would put a single channel of memory controller between 28mm^2 and
46mm^2, or more space than I would expect a dual channel memory controller
to occupy.
I should hope so. At 130 nm you should be able to pack between 50,000 and
150,000 routed logic gates into each square millimeter.
If you look at the AMD web site die photo for Opteron the region marked memory
controller is comparable in size to the FPU. It also appears to contain quite
a large fraction of very regular logic structures and data paths. This suggests
a lot of cache line victim buffers and prefetch buffers etc. A quick estimate
from the photo suggests it is ~4.3% of the die area or between 8 and 9 mm2.
BTW, the memory controller isn't dual channel, it is single channel, 128 bits wide.
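For what it's worth, combining that area estimate with the 50,000-150,000 routed gates per mm^2 figure from earlier gives a rough gate budget. This is a back-of-the-envelope sketch; the ~193 mm^2 die size and 4.3% fraction are the estimates quoted in this thread, not measured values:

```python
# Rough gate budget for the region marked "memory controller".
die_mm2 = 193            # Opteron die size quoted later in the thread
mc_fraction = 0.043      # ~4.3% of die area, from the photo estimate

mc_mm2 = die_mm2 * mc_fraction            # ~8.3 mm^2
gates_low = mc_mm2 * 50_000               # low end of 130 nm gate density
gates_high = mc_mm2 * 150_000             # high end
print(f"controller area ~{mc_mm2:.1f} mm^2, "
      f"roughly {gates_low/1e6:.2f}-{gates_high/1e6:.2f} M routed gates")
```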
Are you completely useless? Look it up yourself. I think it was IEDM 2000 or 2001.
Please list some of the other important aspects that differ ........you know......the ones you are sure of.
Sorry, I don't compete with SI and CW by giving out free process analysis.
The subject is whether or not IBM can supplement AMD production capacity
of 130 nm x86-64 devices. It only takes a single significant process difference
to prevent this and I have already described one.
For starters, the IBM process uses a local interconnect layer, the AMD/Mot process doesn't.
You're sure of that ....are you???
Well I am not 100% sure. I was giving AMD and Mot the benefit of the doubt
that neither would present a fraudulent paper at IEDM.
re: Dell is a joke, screwdriver shop
Would you like a cookie with that Kool-Aid you're enjoying? You just lost any credibility you may have ever had with that post.
AMD enthusiasts have a major bipolar disorder centered around Dell.
When it's strongly expanding market share for its all-Intel, all-the-time, product line
it's alternately the devil incarnate, a screwdriver shop, or Intel's main b*tch.
But the slightest rumour Dell is even looking at an AMD chip makes the faithful
weak-kneed with delirious anticipation. They fall for the rumour no matter how
often or regularly it comes around. Makes Charlie Brown's football routine with Lucy
look shrewd and calculated in comparison.
IBM is the most promising server OEM in the industry, even if it's not necessarily the biggest.
Then no doubt you will be applauding the loudest when it introduces the Itanium 2 based
xSeries 450 in the next month or two.
Dell is a joke, screwdriver shop - they can't make any complicated hardware.
Fan-boys really slay me. If Dell brought out an Opteron box tomorrow they would have
the fastest political rehabilitation since the days of Stalinist Russia and they could do
no wrong in your book.
Why is it that Intel produced a 2 GHz P4 on 0.18 um but can only manage a 1 GHz Itanium?
The server P4 has a 150% longer pipeline than I2 and runs 100% faster.
The Opteron has a 20% longer pipeline than Athlon and runs 20% *slower*.
Hmmm.
For some it's too late. Maybe others will learn from their mistakes.
A demo at the booth showed two boards with the same exact software - a PIII Coppermine
running at 700 MHz and a TM5800 running at 800 MHz. The application was pegged to run the
CPU at the (full flat out constant) 800 MHz, as I understood it. The current draw was measured
and at the time I was there, the 5800 took a little over half as much power as the PIII.
What a joke, comparing the 130 nm 5800 with the 180 nm Coppermine. It's not
like TMTA couldn't locate a Tualatin in time for the show. What's next, a side by
side demo of the 8000 with a Katmai PIII or a Pentium Pro?
If Opteron really takes off, and becomes supply constrained, IBM would be a great source.
Not any time soon. The AMD and IBM 130 nm SOI processes differ in some important aspects.
Presumably AMD's 90 nm process development was well underway when they recently signed
up IBM for joint future process development so big blue may be no help as a potential second
source until 65 nm.
LOL, geek.com is run by an obvious Intel hater. BTW, isn't 100 mm2 a suspiciously
round number?
According to a March 31, 2003 story about Pentium M on the on-line version of MPR
(sorry but it is subscription only so I won't bother giving you the URL), the chip size is
given as 10.56mm x 7.84 mm or 82.8 mm2. In comparison, Tualatin came out at 80.5
mm2 but later had a linear process shrink to 74.1 mm2. So Banias with 1 MB L2 is
only about 10% bigger than a 130 nm Pentium III with 512 KB L2. Not bad eh?
Why would you expect necessarily lower power for a chip with 1Meg of cache as opposed to 256K?
Because it would have half as much off-chip memory traffic on average. Getting data
off-chip is far more power intensive then pulling it out of an on-chip cache.
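One way to see why is the classic rule of thumb that miss rate scales roughly as the inverse square root of cache size. That rule is an assumption I'm layering on here, not something from the posts above; going from 256 KB to 1 MB is a 4x size increase, so roughly half the misses (and thus roughly half the off-chip traffic):

```python
import math

# Rough sqrt rule of thumb: miss rate ~ 1 / sqrt(cache size).
def relative_miss_rate(size_kb, base_kb=256):
    return math.sqrt(base_kb / size_kb)

print(relative_miss_rate(1024))  # 4x the cache -> 0.5x the misses
```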
Why is 193mm^2 such an abomination for a chip with 1Meg of cache?
I wouldn't call Opteron an abomination for being 193 mm2 considering its performance
but as far as size goes, Banias also sports a 1MB L2 and it is ~80 mm2 in a 130 nm
process with one fewer interconnect layer IIRC.
The newer UltraSparcs don't suck too bad. They do pretty well with their StarFire architecture
they bought from SGI, and all but their low end stuff is pretty balanced between processor, memory
and I/O.
The problem is that Sun just brought out a processor for the size of server that Opteron
addresses, the UltraSPARC-IIIi. In fact it is in many ways like Opteron - on chip DDR
controller with 128 bit wide interface, on chip 1 MB L2 cache, 130 nm process, glueless
SMP support.
The US-IIIi is like Opteron in many respects except for one big one - performance. The
first US-IIIi systems run at 1 GHz and these yield 485 SPECintbase2000 and 722
SPECfpbase2000. In contrast the Opteron 244 gets 1095 SPECintbase2000 and 1122
SPECfpbase2000. Let's not even talk about SPEC *rate*.
Sun has just spent the last 2 years and countless hundreds of millions of dollars trying
to bring the US-IIIi to market and it is about a year late. This is why I doubt Sun will pop
out a low end server line based on Opteron - it would unambiguously demonstrate to
the world, and Sun's existing customers, what a sad POS the US-IIIi is in no uncertain
terms. Worse yet Sun board members and shareholders might start to question if all
the money it spends every year on uP development is bringing any value when it can buy
chips like Opteron on the open market. Scott McNealy likes to stick it to Wintel whenever
possible but he knows which side his bread is buttered on.
How much can that 2.25 multiplier be reduced through future optimization?
[...]
Ultimately, the Intel strategy is that this consideration becomes less important at geometries of 65nm and below. Certainly at 90nm there will still be a large cost premium for Itanium over either Opteron
or Xeon.
This is a non-issue. Look at the proliferation of RISC in embedded control where code size is
almost always more important than in computers.
The instruction fetch bandwidth issue is easily handled in the icache at little extra cost. And with
MB levels of on-chip cache by the time you get to the uP pins most of the memory traffic for most
programs is almost entirely data so code traffic is seldom an important factor.
You are right, code bloat in RISC and IA-64 is due to the limitations of the instruction set
and the rigidity of the architecture.
And you are right too, your many errors in fact and logic are indeed mainly due to your
serious shortcomings in intellectual capacity. :-P
BTW, in case you missed the point of my comment above, you completely misrepresented
what I said about code size to fit your own agenda. I was going to address your comments
about limitations and rigidity but realized that it is a waste of time since your fundamental
purpose here is sophistry not reasoned debate.
No. The issue Dan3 raised was the difference in code size for 32 vs 64 bit
operations on so-called traditional 64 bit architectures, basically all RISCs.
And for all intents and purposes there is none.
Code bloat is real for RISC vs x86 and IPF vs RISC but the issue of 32 vs 64
bit operations has nothing to do with it.
According to an acquaintance who designed RISC processors and now designs
IPF processors, for the same level of performance a RISC needs about 1.5 x
the instruction fetch bandwidth as x86 and in turn IPF needs about 1.5 x the
instruction fetch bandwidth as RISC. According to AMD, x86-64 needs about
1.1x the instruction fetch bandwidth as x86 (mostly due to the fact that the
opcode prefix needed to access the extra registers outweighs the reduced
dynamic instruction count from fewer loads/stores).
The size of IPF executables will be larger than RISC or x86 binaries by a larger ratio
than the fetch bandwidth ratios I mentioned before would suggest, because IPF programs
include pieces of "fix up code" that are seldom executed.
Also keep in mind that as IPF compilers improve over time that the code size
will tend to go down proportionately as performance goes up since raising
performance mostly means packing useful code into bundles more efficiently.
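Chaining those ratios together (they are one acquaintance's rough figures, not published measurements) also reproduces the 2.25x IPF-vs-x86 multiplier mentioned earlier in the thread:

```python
# Relative instruction fetch bandwidth, normalized to x86 = 1.0,
# using the rough ratios quoted in the post above.
fetch_bw = {"x86": 1.0}
fetch_bw["x86-64"] = fetch_bw["x86"] * 1.1    # AMD's figure
fetch_bw["RISC"]   = fetch_bw["x86"] * 1.5
fetch_bw["IPF"]    = fetch_bw["RISC"] * 1.5   # -> 2.25x x86

for isa, bw in fetch_bw.items():
    print(f"{isa:7s} {bw:.2f}x")
```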
Leakage from charge tunneling through the gate dielectric is also starting to become
an issue for some types of circuits.
64bit processors have huge advantages in a number of circumstances, but traditional 64bit
processors (particularly IA-64, with its rigid instruction format) bloat the code size in the extremely
frequent circumstances in which 32-bit integer operations take place.
Wrong. Traditional 64 bit processors are RISCs and these use the same 32 bit instruction
size for 32 and 64 bit operations. Various RISCs support 8 to 16 bit immediate data in the
instruction word and any literal data larger than that, whether 32 or 64 bit, is fetched from an
initialized static data segment using the same size code sequence for either 32 or 64 bit
operations.
The situation is similar for IPF except there is additional support for direct 64 bit immediates
through the use of two of three slots in an instruction bundle. This has the effect of reducing
the code size to access a program literal value compared to RISC when the value is too
large to code as an immediate and it takes three instructions or more to fetch it from the
constant data segment for that procedure.
Or you can reduce the resistance as is the case with strained silicon.
Again a simplification for the non specialist. It is more accurately an
increase in effective charge carrier mobility in the transistor channel
region. Resistance implies V = I R. In reality an accurate transistor
model might have a hundred different parameters or more.
True resistance is a concern mainly in long interconnect traces. But
a truly well designed processor will have only a small component
of interconnect delay on speed critical signal paths.
There *should* be a decided advantage to SOI, and all else being equal,
a SOI device should clock higher on lower voltage.
If no screw-ups in the design, then compared to bulk SOI gives
1) higher frequency at same voltage
or
2) same frequency at lower voltage
or
3) a trade-off combination of the 1) and 2)
However, current leakage to a slightly conductive substrate is more of an issue
if I understand it correctly.
There are a lot of issues and it would honestly take far too much time to even scratch the
surface. And it would probably bore most of the people here to tears.
You also said that you thought that AMD abandoned low-k. Where did you hear that?
That was me? I don't remember. A lot of companies got burnt by organic low-Ks at 130
nm though. Good electrically but crap mechanical properties so high packaging failure
rates and trouble with stress and accelerated life testing.
Keep in mind that low-K means different things to different people and different things
at different times. FSG was once considered "low-K".
Wouldn't the SOI benefits be most pronounced under static conditions? As frequency
goes up don't the benefits go down?
The use of SOI reduces some types of parasitic capacitances in the chip. This
allows either a bit higher performance, a bit lower power consumption, or a
little bit of both.
The potential performance benefit of SOI depends on a lot of different factors.
In general, the aggressive design teams that push bulk CMOS CPUs to the
highest performance levels (HP, Intel, AMD) use a lot of dynamic logic. But DL
works in part by reducing the importance of the type of capacitance that SOI
eliminates so the speedup from using SOI is less, say 10-15% assuming full circuit
redesign where necessary. Other design teams (IBM, Sun, most embedded CPU
vendors) tend to favour "text book" or ASIC design styles which use mostly
static logic gates. This means so-so performance CPUs in bulk CMOS but a
bigger speedup when moved to SOI, say 20 to 25% (although still behind
aggressive CPU designs moved to SOI).
Then I simply do not understand the reasoning for using SOI. All of the popular press
says that it is for reduction of leakage. Are you saying that this premise is incorrect?
In simple terms a chip consists of transistor "terminals" tied together with wires.
A signal net consists of a group of transistor terminals and wires directly connected
together and is usually driven to a 0 or 1 state by a single gate or buffer. To make
chip logic operate faster you can either reduce the capacitance (tendency to store
charge as voltage changes) of the driven nets or you can increase the current drive
strength of the controlling gate or buffer. SOI is a process technique that eliminates
most of the parasitic capacitance of two of the three different types of transistor
terminals. This somewhat reduces the overall capacitance of signal nets, slightly
reduces logic delay times, and can allow slightly higher clock frequencies. But
SOI also adds new potential design problems and the devil is in the details.
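As a toy illustration of the trade-off, here is a first-order switching-delay model, t = C*V/I: cutting net capacitance or raising drive current both shorten switching time. All values below are arbitrary illustrative numbers, not Opteron parameters:

```python
# First-order delay model: time to swing a net through a voltage
# at constant drive current is t = C * V / I.
def switch_time(c_farads, v_swing, i_drive):
    return c_farads * v_swing / i_drive

base = switch_time(20e-15, 1.5, 200e-6)   # 20 fF net, 1.5 V swing, 200 uA
soi  = switch_time(17e-15, 1.5, 200e-6)   # ~15% less parasitic capacitance
print(f"delay drops from {base*1e12:.0f} ps to {soi*1e12:.1f} ps")
```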
The popular press often simplifies technical issues to the point of nonsense. The
basic concept of capacitance is hard for most people to understand and in the
context of low K dielectrics for interconnect I have seen capacitance explained as
the tendency of signals to "leak" between adjacent wires. Perhaps this over-simplification
was retained in a discussion of SOI. BTW, for a good laugh I'd like
to see the press explain why process guys need to find both higher K materials
and lower K materials for future generations of chips to keep Moore's Law going.
Running at 1GHz, the Astro chip smoked a Sony GRX650 1.8GHz Pentium 4-M notebook in
brief tests of WinXP application launches and system responsiveness.
Application launching and system responsiveness? How about funkiness and
shagadelicity? Heaven forbid a clear and unequivocal numerical measurement
of Astro's actual CPU performance on a real application or benchmark escape
into the wild.
The best clue for discerning charlatanry from scientific endeavor is an aversion
to quantitative measurement and independent confirmation of experimental
results. I wonder if TMTA will again refuse MicroDesign Resources' request
for a loaner eval system to confirm its claims about its products.
At least you seem to finally accept we are different individuals, LOL.
"That was a great move to sell TMTA at 95 cents. You are so smart!"
Rest assured I'd never say that about anyone who bought into this
dog at $23. ;^)
http://www.investorshub.com/boards/read_msg.asp?message_id=898564
Since SOI should help limit leakage,
You know what they say about assumptions.
Partially depleted SOI devices like Opteron suffer from the same leakage
mechanisms as bulk CMOS devices and add at least two new ones of
their own (pass gate transient leakage and bipolar leakage).
BTW, these new phenomena, along with many other design issues, often
bite design teams new to SOI on the a$$. I wonder if this is one of the
things that delayed Hammer and seems to be limiting its frequency yield?
If you have to increase Vt as a band-aid to compensate for unexpected
degree of SOI specific leakage effects it is going to kill your transistor
performance and keep you from lowering Vdd.
I hadn't gone down this line of thinking before. Thanks spokeshave.
In a very short time there will be more x86-64 chips out than all other 64-bit microprocessors combined.
More than ~10m SPARCs and ~100m MIPS III & IV?
Quite the prediction!
Raluck, I sold my TMTA shares for $0.95. It was much less than I wanted, but I'm glad to be rid of them.
Smart move. Every day TMTA reminds me more and more of TDFX.
I doubt anyone wants the CPU design team but perhaps Apple might
be interested in code morpher IP to improve emulation of x86 apps
on PowerPC.
Using your "logic", I have to conclude that for Intel to make a single Itanium is a mighty struggle.
My logic? I didn't disclose my logic.
Do you design full custom integrated circuits for a living? Have you done
product engineering and yield analysis? What about devising standard
cost models for ICs? If not you don't have the slightest clue about most
of the factors whose consideration went into my comment.
The AMD Opteron processor prices are listed as follows: the 1.4-GHz model 240 will
cost $283, the 1.6-GHz model 242 will be priced at $690, and the 1.8-GHz model 244 chip
will be priced at $794
If true the current process + current stepping yield vs frequency curve peak occurs
around 1.5 GHz or so. I guess we won't be seeing 2.4 GHz parts any time soon.
The problem with Hans's approach is that with a low resolution image of
a highly detailed object like a chip it isn't all that hard to "see" small features
you're convinced are there even if they aren't. Or misinterpret the nature of,
or reason for, a minor physical design detail.
I am not saying Prescott doesn't have latent 64 bit capabilities of some kind
(I'd be surprised if it didn't). But Hans's evidence is less than compelling.
Now that Intel has resumed shipping the 3GHz P4 with 800MHz FSB, the new SPEC targets
that AMD promised Hammer will beat have now risen.
P4 3.00GHz 800MHz FSB Base scores.
SPECint 1152
SPECfp 1201
I am looking forward to the new P4/3200. With an 800 MHz FSB performance
should scale quite nicely. I expect it to score at least 1206 / 1245 SPECint/fp.
Anyone want to bet that Opteron will reach 2 GHz before P4 reaches 3.2 GHz?
wbmw, actually, it depends on how efficient you want your EPIC code. A simple port in
the way you described can be done, but it does not take advantage of EPIC optimizations
based on the order of instructions. Without hand coding of critical paths you get a reliable
but inefficient code with parts of the CPU stalling because instruction queues get underruns.
This is nonsense. Existing Intel and HP compilers perform robust and effective code
transformations and optimizations beyond the capabilities of the majority of asm
programmers to readily understand let alone code correctly.
I have not read anything in connection with Itanium2 to see that the compilers have
improved noticeably.
http://www.specbench.org/cpu2000/results/
With current production compilers Itanium 2 gets higher integer and FP SPECbase2k
scores than any other 0.18 um Al bulk CMOS processor, even the venerable out-of-order
execution Alpha EV68.
Opteron's memory controller has 200% of the bandwidth that Athlon-64's memory controller has.
Theoretical peak, yes. In practice much less. Perhaps you should do some research on
performance characteristics of DDR memory systems under real life conditions as you
vary burst length. Opteron and A64 both use 64 byte cache lines so Opteron's memory
interface will operate with 4 word bursts vs 8 word bursts for A64. Bad news for Opteron.
On some data access patterns it may only get ~50% more memory throughput than A64.
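A crude burst-efficiency model shows how the effective ratio can land near 1.5x rather than the 2x peak: a wider interface fetches a 64 byte line in fewer beats, so fixed per-transfer overhead is amortized over a shorter burst. The overhead-beat count here is an illustrative assumption, not a measured DDR parameter:

```python
# Fraction of bus time spent moving data, given a fixed per-transfer
# overhead (activate/precharge, turnaround, etc.) modeled in beats.
def efficiency(line_bytes, bus_bytes, overhead_beats):
    beats = line_bytes // bus_bytes          # data beats per cache line
    return beats / (beats + overhead_beats)

a64     = efficiency(64,  8, 4)   # 64-bit (8-byte) bus: 8-beat burst
opteron = efficiency(64, 16, 4)   # 128-bit (16-byte) bus: 4-beat burst

# Peak ratio is 2.0x, but the effective ratio is lower:
print(f"effective bandwidth ratio ~{2.0 * opteron / a64:.2f}x")  # -> ~1.50x
```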
Opteron ships only in multiprocessor configurations, in which a minimum of 2 chips are supported.
So Opteron itself is available only in a configuration with 400% of the memory bandwidth of Athlon-64.
Only one Opteron in a 2 way system is active? The data traffic from CPU A performed on
CPU B's local memory won't be offset by data traffic from CPU B performed on CPU A's
local memory?
That arithmetic stuff is really way above your head, eh?
That computer stuff is really way above your head, eh?
wbmw: Your posts are most helpful (IMHO), when you stick to engineering. When you
start talking about financial matters (operating leverage, balance sheet, special charges,
etc.), it becomes a slippery slope. FWIW. Dew
And the opposite applies to you.
64-bit Windows looks like a Q3 or Q4 thing. So desktops can't use 64-bits right away.
64 bit Windows is a Thursday thing. Just not for x86-64.....
Will the model 3200+ compare with the 3.2Ghz P4 with single channel DDR and 533Mhz FSB, or with dual channel DDR and 800Mhz FSB?
I don't think you need amazing powers of prescience to figure this one out.
AMD was making $1+ per share, per quarter, in that period.
With a highly manufacturable part in a mainstream process. I see reasonable
doubt that this description will ever apply to 130 nm Hammer. Unless it is ported
to IBM's SOI process and turned out of their fabs.
Why don't you find a former employee of Bipolar Integrated Technology (BIT) and
ask them how useful it is to have a competitive CPU manufactured in a boutique
process that can't be yielded consistently.