And AMD. Note, if you will, that almost all of the product offered on Pricewatch is boxed product, not tray. So the majority of product is through authorized channels and not gray market.
That's proof? Counterfeiters around the world are sophisticated
enough to fake the holographic stickers used in software packaging
and on credit cards, and to duplicate the laser marking of ICs. The use of
boxes rather than trays means nothing. When excess electronic
component inventory gets sold by an OEM it typically changes
hands quite a few times in a labyrinth of dealers and resellers
before reaching the final buyer.
AMD, where a similar proportion of the die area is cache memory
By my estimation the Opteron L2 is about 42% of the die. This is
a smaller portion of the die than the 57% of the Madison die occupied
by the L3. The non-L3 area of the Madison is about 161 mm2, while
the non-L2 area of the Opteron is about 112 mm2.
When you factor in the higher defectivity associated with the
Opteron's SOI processing and three extra levels of interconnect
it isn't clear which device has lower yield loss from defectivity.
But when you include the higher parametric yield loss from SOI
specific effects the odds are very good the Madison has better
overall yield than Opteron.
Now, get a load of this: Every single one of the 34 servers with better price/performance (on TPC-C transactions) than that Itanium system is a Xeon system, except ONE!
That "other" one is not a Power4 (too expensive), but an Opteron system.
Wow that's incredible! That sure puts a crimp into Intel's plan
to replace x86 with Itanium 2. Wait a second, that isn't Intel's
plan, never mind. :-P
The non-L3 portion of the Madison is about 160 mm2. That is less than the non-L2 portion of the Willamette
P4
LOL. You are comparing a .13u Itanium core with a .18u Willamette core. How about comparing it with .13u cores?
Once again you miss the point. The question was die size, not
feature size. To the extent defectivity is different between
Intel's 180 nm and 130 nm processes, it is probably slightly
lower in the 130 nm process because of the overall industry trend
of declining defect density over time.
If Intel can turn out Willamettes in huge quantities for mass
consumer markets then it can certainly manufacture Madison, which
has less die area unprotected by redundancy than Willamette, with
incredible ease. I really don't think these concepts are all that
hard to grasp.
That's a new one on me but I'll take your word for it.
You can look it up in the I2 L3 paper from ISSCC 2002.
My observation has been that cache arrays yield better than their equivalent sized logic area would on a given process.
Like all modern large memory structures they are protected from
most point defects by the inclusion of redundant circuitry. If
a defect is detected during testing the affected portion of the
cache can in almost all cases be disabled and replaced by one
of the spare elements, a process called repair. This is also the
basis of the memory industry. Without redundancy and repair a
modern DRAM would be economically impossible.
Let's do a simple illustrative example. Let's take a 190 mm2
chip and compare it to a 380 mm2 chip consisting of a 190 mm2
core and a 190 mm2 L3 cache. For the sake of argument let's
say the chance of a defect in a 190 mm2 region is 50%.
The 190 mm2 chip would have a yield of 50%. If the 380 mm2 chip
didn't use redundant repair in its L3 it would have a yield of
25%. Let's assume that any given defect in the L3 has a 95%
chance of being repairable. The yield of the 380 mm2 chip with
L3 redundancy is 49%, almost as good as the 190 mm2 chip.
From the point of view of yield the L3 is basically "free" chip
area. Of course the 380 mm2 chip will still have only about half
the number of potential die sites on a wafer so it will have
about twice the die cost as the 190 mm2 chip, everything else
being equal.
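The arithmetic above can be checked in a few lines of Python. This is just a sketch of the stated assumptions (50% defect chance per 190 mm2 region, 95% of L3 defects repairable), not a fab model:

```python
p_defect = 0.50   # assumed chance of at least one defect in a 190 mm2 region
p_repair = 0.95   # assumed chance a given L3 defect is repairable

y_190 = 1 - p_defect                    # plain 190 mm2 chip
y_380_no_repair = (1 - p_defect) ** 2   # 380 mm2 chip, no L3 redundancy
# 380 mm2 with redundancy: core must be clean, L3 either clean or repaired
y_380_repair = (1 - p_defect) * ((1 - p_defect) + p_defect * p_repair)

print(y_190, y_380_no_repair, y_380_repair)   # 0.5 0.25 0.4875
```

The redundant L3 recovers nearly all of the yield hit from doubling the die area, which is the whole point.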
A more complex and realistic example has to include Poisson
probability of multiple defects and the ability to repair more
than one defect. The Madison L3 can repair up to 4 separate
defects compared to 2 for McKinley.
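That more realistic model is easy to sketch too. Assuming defects land as a Poisson process calibrated so a 190 mm2 region is clean 50% of the time (my calibration, for illustration only):

```python
import math

lam = math.log(2)   # Poisson mean chosen so exp(-lam) = 0.5 per 190 mm2 region

def block_yield(lam, max_repairs):
    # a redundant block survives if its defect count <= its repair budget
    return sum(math.exp(-lam) * lam ** k / math.factorial(k)
               for k in range(max_repairs + 1))

core = math.exp(-lam)                    # non-redundant core region: 0.50
chip_2fix = core * block_yield(lam, 2)   # McKinley-style L3, up to 2 repairs
chip_4fix = core * block_yield(lam, 4)   # Madison-style L3, up to 4 repairs
print(chip_2fix, chip_4fix)              # roughly 0.483 and 0.500
```

With a 4-repair budget the L3 yield loss all but vanishes under these assumptions.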
Chipguy, SGI, Compaq/HP initially announced they were going to 100% conversion of their servers to IPF. However, as you well know, they have all backtracked and have announced dual product line strategies with a possible "eventual" conversion. Translation if the market accepts it we will convert to IPF.
What backtracking? They have stretched their old RISC product
lines to accommodate the slower than expected availability of
IPF software and still give their user base flexibility when
to switch.
There is no turning back. The Alpha EV8 was cancelled and the
design team went to Intel. HP isn't doing a new PA-RISC core,
they are simply shrinking and tweaking their 7 year old design
while most of their engineers work on new IPF processors with
Intel. Likewise, SGI is doing retread after retread of the MIPS
R10k design. That is why IBM is the only company still with a
competitive RISC product line going forward.
Dell also initially supported IPF and then dropped it and now has announced support again. Wonder if they will drop it again?
That is a common misconception. Dell dropped their Merced
workstation line but continued to offer the Merced server on
their web site.
This includes all the results presented for Opteron, as well as some on other platforms. It will be interesting to see what the scores are after they are retested
Zeus had better change their web site. They still claim 3498 on
this benchmark.
http://www.zeus.com/news/articles/030623-001/
Somehow I doubt the new score will be higher.
Here are some new submissions from HP for their new Integrity (Itanium 2) servers.
Hewlett-Packard Company HP Integrity rx2600 Zeus 4.2r2 2 1930
Hewlett-Packard Company HP Integrity rx5670 Zeus 4.2r2 4 3702
Hewlett-Packard Company HP Integrity rx5670 Zeus 4.2r2 2 1914
This is compared to a few new IBM submissions based on 2.8GHz Xeon MP:
IBM eServer xSeries 255 Zeus V4.2r2 4 2110
IBM eServer xSeries 360 Zeus V4.2r2 4 2174
This is even higher than IBM's own Power4 submissions (using 1.45GHz processor):
IBM pSeries 630 Model 6C4 Zeus 4.2r1 4 1988
IBM pSeries 630 Model 6E4 Zeus 4.2r1 4 1988
Thanks for posting the new results. I am surprised POWER4+ isn't
doing better on this, the I2 has nearly twice the throughput per
clock. IBM's deep pipe, 2 cycle ALU latency microarchitecture
seems to have a lot of performance holes. No doubt the 970 will
inherit its quirkiness. We will probably see G5 benchmarks all
over the map - great here, terrible there. I guess Apple will
tell us only about the great ones.
Even at the chip level, the Itanium die size is twice as big, which means AT LEAST 2x higher silicon cost, since 374 mm2 is an extremely large area to have no Si defects, as Elmer has often explained.
No it isn't and anyone who knows even the basics about chip design
practices would immediately realize why. The Madison die is over
half L3 by area and the L3 is protected from defect failure by
sub-block redundancy, a technique superior to and more robust than
the row and column redundancy techniques used in uPs (and memory
chips) prior to I2. The non-L3 portion of the Madison is about
160 mm2. That is less than the non-L2 portion of the Willamette
P4, a desktop processor once made in the tens of millions per
quarter and even sold under the Celeron brand for under $100.
Any claim that I2 die size causes problems for production or adds
excessively to cost is pure nonsense.
But as Barusa correctly pointed out, in this time frame, x86 has been eating the lunch of the RISC chips vendors, and these vendors are one by one dropping out of the market.
And why are these RISC vendors dropping their proprietary line?
Because they are adopting IPF. Compaq, HP, and SGI decided they
couldn't afford to keep their Alpha, PA, and MIPS competitive
in the face of Intel's IPF family. Sun is still in denial with
their horribly obsolete SPARC family and IBM is rich enough to
cover all its bets - i.e. support both their own RISC and IPF.
Itanium is basically pointless, aimed at the market that is disappearing.
RISC based systems represented nearly a $20B market last year in
the teeth of a terrible IT spending slowdown. The RISC market
will continue to disappear in the coming years but that will be
due to IPF taking away market share.
Actually, at SPEC.org, they don't claim to have Madison-core Itanium servers either. ;)
They claim they will be available in six months.
You seem to have problems with either basic arithmetic or telling
the truth. The SPEC submissions for all three HP systems indicate
hardware availability in September.
Do you think Itanium 2 performance is so compelling that this cycle will be broken? Hardly, as far as I can tell. less than 10% lead on SpecInt, and even that is mainly because of > 300 mm^2 die and 6MB of L2, not because of the miraculous Itanium core.
Was the 286 the highest performance processor out there? Not really, but it was picked because it was compatible. You get the picture.
You not only don't have the picture but you are looking in the
completely wrong direction. Intel developed the IPF family to
compete in the high end, high value system market against RISC
processors, not to replace x86 (duh). That is why Intel has
never stopped developing new generations of x86 products. Let's
see how I2 stacks up against the fastest RISCs in SPECbase2k:
(percent faster int / fp)
106% / 253% faster than HP PA-8700+/0.875
106% / 92% faster than Sun US-III/1.2
66% / 89% faster than HP Alpha EV7/1.15
23% / 33% faster than IBM POWER4+/1.7
With the exception of IBM who is hanging tough with POWER (for
now at least) Intel has the RISC market on the run and it is
only a matter of time and a bit stronger economy before IPF
starts taking significant chunks of market share away from RISC
in high end servers and workstations. Did you catch Sun's 4Q
results yesterday? Their product sales were down 20% YOY and
it is only going to get worse for them.
Running gcc compiled code on a Xeon? I am surprised the
Opteron couldn't manage a much larger lead.
Elmer, for SPECint2000, how is HP getting 1322 for a Madison 1.5 GHz, while SGI is only getting 1077?
Maybe it's the compiler; HP is using their own aCC compiler (new version 11.23 vs. 11.22 used for McKinley), while SGI is using Intel's Linux compiler.
Tenchu
The compiler is part of it but a bigger factor is the system
design. The Altix 3000 is a highly expandable HPC system. This
expandability and application orientation means the architecture
and memory system is designed for high sustained bandwidth, not
low latency. OTOH the HP systems use a limited expandability
chipset (1-4P) optimized for low latency and high bandwidth.
BTW, there was a similar difference in SPECint scores between
HP and SGI systems with McKinley.
Once again I find it interesting that Ruiz
is bringing up the "takeover" word. It must be really buzzing around.
Probably by potential Opterons customers looking for a lot
more long term certainty before making a public commitment.
Taking away the northbridge makes the whole design simpler - and the whole product more reliable.
Important item for a server, don't you think?
Perhaps for a third tier server OEM wannabe. But from the
perspective of a first tier OEM like IBM, HP, Unisys, and
SGI this is a major disadvantage because it limits the
ability to achieve product differentiation by innovation.
It is no accident that IBM, SGI, HP and others have their
own chipsets for IPF that have unique capabilities and
feature sets for their intended markets rather than use
Intel's own 8870 chipset. Intel realized that it had to
leave room for OEMs to differentiate their products to be
able to sell IPF against proprietary RISC processors. OTOH
AMD targets lowest common denominator commodity SHV
model, not high end RISC. Different market, different needs.
BTW, what was their prediction of last year? Was it met?
It is hard to tell, the bar is really short.
It looks like 20k to 25k units. Was it met? I dunno. Doesn't
sound too far off reported figures.
BTW, in February MDR named A64 PC processor of the year for
2003. Perhaps they aren't any better than Dataquest or IDC at
estimating the future. :-P
chipguy, I agree, 6 meg of cache gives a lot of bang. Next time they will come with 9 meg of cache, mind you. Not that the huge cache is designed to hide the bad core design and ambiguous instruction set, but more to highlight who is the boss in this town.
There is a bit more to it than that. A 50% larger cache reduces
memory traffic by an average of about 20% for a given level of
performance. Conversely, it allows about 20% higher performance
for a given amount of memory traffic. A 1.8 GHz/9.0 MB I2 has
roughly the same average bus activity as a 1.5 GHz/6.0 MB I2.
In other words, Intel will provide OEMs a performance upgrade
path that preserves their investment in chipset, board, and
system design.
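The 20% figure is consistent with the common rule of thumb that miss rate scales roughly as the inverse square root of cache size. A quick check (an approximation, not a vendor number):

```python
size_ratio = 9.0 / 6.0                 # 9.0 MB L3 versus 6.0 MB L3
traffic_ratio = size_ratio ** -0.5     # sqrt rule of thumb for miss rate
print(1 - traffic_ratio)               # ~0.18, close to 20% less bus traffic
```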
OTOH, the Opteron approach focuses on the memory interface. In
order to support higher performance in the future AMD has said
it would turn to DDR2. That means OEMs, such as there are, have
to redesign their system boards.
The documents that chipguy quoted earlier today show that the OS has to have an awareness of iHT in thread scheduling. That is a significant burden to shift to the OS, and explains why Intel (who tends to be hardware-centric) is confused as to why support from Microsoft on an application basis is not forthcoming. They really don't seem to understand the implications of what they created.
Well MS is a big boy and can decide what it will and won't support
and how well. But if MS doesn't want to provide a good OS solution
for commercial servers based on Xeon processors then there are
others who will happily exploit the opportunity.
http://www-106.ibm.com/developerworks/linux/library/l-htl/
This does put Intel at a disadvantage, however, as Itanium2 is aimed at a higher price, lower volume market than Opteron. If Opteron takes 10% of the Xeon market next year then I think that will be very successful. This is likely to be far more units than Itanium2 will sell.
FWIW, in its DEC 2002 report on Intel's Processor business, MDR
(the guys who publish Microprocessor Report) predicted 2003 sales
of IPF processors of ~150k units and 2004 sales at ~375k.
Excluding A64 sales, is anyone predicting *Opteron* sales of a
similar order? I haven't seen any figures from any of the market
analysis firms.
We'll have to watch the numbers and see what happens.
Indeed.
Well they can't do that and keep the 6M L3 cache! Without it, Madison is below Opteron in most server benchmarks. Wonder what the die size on that sucker is.
Madison is 374 mm2, a little less than twice as large as an
Opteron. Given that it uses bulk CMOS, 3 fewer interconnect
layers, a high fraction of die area protected by redundant
circuit structures, and is built in a 2 year old 130 nm process
bought and paid for by x86 sales I'd bet Madison costs Intel
less to make than it costs AMD to make an Opteron with all
its SOI baggage.
The fact that Intel is willing to laser off 4.5 MB of cache,
as much cache as present in 4 Opterons, from a Madison die
and sell it as a Deerfield for ~$750 later this fall shows
the kind of cost margin Intel has with this device.
I suspect you are considering the move from DDR333 to DDR400 as offering a significant latency improvement, but what you fail to realize is that the latency reduction comes from a single timing, and it does not decrease the overall latency by that much.
Well, Cas 2.5 to 2.0 is 20% a reduction.
ROFL. Why don't you start parking your car 20% closer to
the end of your driveway and see how much that speeds up
your commute to work.
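The point of the analogy: CAS is only one leg of the trip. With assumed, purely illustrative numbers for a DDR333 system:

```python
bus_period_ns = 6.0                  # DDR333 bus clock ~166 MHz, ~6 ns period
cas_saving_ns = 0.5 * bus_period_ns  # CAS 2.5 -> 2.0 saves half a bus clock
total_load_to_use_ns = 90.0          # assumed end-to-end load-to-use latency
print(cas_saving_ns / total_load_to_use_ns)  # ~3% of the total, not 20%
```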
Hey boys and girls, let's have a look at state of the art in
130 nm merchant 64 bit server chips from Intel and AMD and
compare throughput in 2P and 4P systems shall we?
http://www.specbench.org/cpu2000/results/res2003q3/
SPECint_rate_base2k/SPECfp_rate_base2k
2P
HP rx2600 30.5 / 42.4
Einux 4800 25.0 / 24.7
4P
HP rx5670 60.0 / 66.4
Einux 4800 46.1 / 44.2
Madison reportedly gets a speed bump this fall. Anyone think
that Opteron will get to 2 GHz before then? Not that it would
affect leadership on these benchmarks mind you.
So theoretically, something like this can probably be implemented, but it looks like a mess to me.
Have you ever written a multitasking kernel? This is a trivial
change for the scheduler. You want to talk tricky then look at
the exception code for an i860XR or a v7 SPARC.
The idea of the operating system disabling HT for some applications and enabling it for others is self-contradictory, which is why no operating system fulfills your pipe dream.
Sorry but the facts directly contradict your misguided and
simplistic notions:
http://www.intel.com/technology/itj/2002/volume06issue01/art01_hyper/p09_task_modes.htm
"To optimize performance when there is one software thread to execute, there are two modes of operation referred to as single-task (ST) or multi-task (MT). In MT-mode, there are two active logical processors and some of the resources are partitioned as described earlier. There are two flavors of ST-mode: single-task logical processor 0 (ST0) and single-task logical processor 1 (ST1). In ST0- or ST1-mode, only one logical processor is active, and resources that were partitioned in MT-mode are re-combined to give the single active logical processor use of all of the resources. The IA-32 Intel Architecture has an instruction called HALT that stops processor execution and normally allows the processor to go into a lower-power mode. HALT is a privileged instruction, meaning that only the operating system or other ring-0 processes may execute this instruction. User-level applications cannot execute HALT.
On a processor with Hyper-Threading Technology, executing HALT transitions the processor from MT-mode to ST0- or ST1-mode, depending on which logical processor executed the HALT. For example, if logical processor 0 executes HALT, only logical processor 1 would be active; the physical processor would be in ST1-mode and partitioned resources would be recombined giving logical processor 1 full use of all processor resources. If the remaining active logical processor also executes HALT, the physical processor would then be able to go to a lower-power mode.
In ST0- or ST1-modes, an interrupt sent to the HALTed processor would cause a transition to MT-mode. The operating system is responsible for managing MT-mode transitions (described in the next section)."
http://www.intel.com/technology/itj/2002/volume06issue01/art01_hyper/p10_os_apps.htm
"Operating systems manage logical processors as they do physical processors, scheduling runnable tasks or threads to logical processors. However, for best performance, the operating system should implement two optimizations.
The first is to use the HALT instruction if one logical processor is active and the other is not. HALT will allow the processor to transition to either the ST0- or ST1-mode. An operating system that does not use this optimization would execute on the idle logical processor a sequence of instructions that repeatedly checks for work to do. This so-called "idle loop" can consume significant execution resources that could otherwise be used to make faster progress on the other active logical processor.
The second optimization is in scheduling software threads to logical processors. In general, for best performance, the operating system should schedule threads to logical processors on different physical processors before scheduling multiple threads to the same physical processor. This optimization allows software threads to use different physical execution resources when possible."
I told you how the P4 SMT HW
works. If MS does or doesn't support a specific feature in one
or more of its OSes at any given instant in time isn't my concern.
Finally! Even if it is not your concern, I think it would be fair to point out that the feature cannot be used in the current version of Windows, if you were interested in this forum being an information exchange.
LOL, I am certainly being held to high standards here. I say
something about a feature in a microprocessor and instantly
the onus is on me to give a complete rundown on which OSes
support it and which don't. Meanwhile you don't seem to be
overly concerned about even the general accuracy of claims
made on behalf of AMD processors by AMD enthusiasts let
alone the comprehensive and up to date status of SW support.
What a hypocrite! :-P
With that said, there is no need to comment further.
Nice try Joe.
BTW, chipguy knowingly posted this half-truth here, and never corrected it, even after prompting to do so, which gives you an idea about who is posting here in good faith and who does not.
Don't misrepresent others please. I told you how the P4 SMT HW
works. If MS does or doesn't support a specific feature in one
or more of its OSes at any given instant in time isn't my concern.
BTW, do you also think Intel was lying about the 32 bitness of
its 386, 486, and P5 processors before Windows 95 came along? ;^)
I was merely pointing out that what is accepted or not accepted in the RISC world has no relevance to the efficacy of AMD64 vs. IA32.
Sorry to intrude on your delusion but pointers in 32 bit code
occupy 4 bytes in both x86 and RISC software and pointers in 64
bit code occupy 8 bytes in both x86 and RISC software. In both
cases the larger pointer size can increase program data memory
usage by 10 to 30% with an impact on performance ranging from
trivial to significant (5 to 10% or more).
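To see where the 10 to 30% comes from, compare a pointer-heavy record laid out for a 32 bit versus a 64 bit ABI. The struct layout below is my own illustration, not taken from any cited source:

```python
import struct

# a list node: two pointers plus a 4-byte integer payload
node32 = struct.calcsize("=IIi")   # 4-byte pointers: 12 bytes total
node64 = struct.calcsize("=QQi")   # 8-byte pointers: 20 bytes total
print(node64 / node32)  # ~1.67x for this pointer-heavy worst-ish case;
# real programs mix in non-pointer data, so 10-30% overall growth is typical
```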
You are being ludicrous. No RISC has a double speed ALU
either but that doesn't mean it is not a useful feature.
It's obvious that the existence of a single bad port proves absolutely nothing about the average improvement. My guess is that the 32-bit client has a significant amount of assembly language, but the 64-bit client reverted to an un-optimized C++ version so they could do the "port" by just setting a compiler switch.
It was probably slowed down because the code size was 10 to 15%
larger and the program used 15 to 30% more data memory to run
because of larger pointers. This increased cache and TLB misses
relative to the 32 bit binary.
These typical code and data expansion figures are well documented
in "Porting GCC to the AMD64 architecture" by Jan Hubicka in the
proceedings of the 2003 gcc Developers Summit.
Of course, one major disadvantage of P4's hyperthreading is that you either enable it for all programs or you disable it.
Not true. More nonsense from you. First you claim there are SMT
capable RISC processors and now this. Are these deliberate
lies or simply the result of massive ignorance about modern uPs?
It was accepted in the RISC world that SMT didn't always improve performance
LOL, nice try. Name a RISC processor shipping today that uses SMT.
This is not an isolated case. I have seen different reviews
of the Opteron where a fraction of 64 bit versions of apps
were slower, sometimes significantly, than the 32 bit versions.
This is accepted in the RISC world but some AMD enthusiasts
seem to feel that 64 bits is all upside for Hammer. This makes
for a generic knee jerk response of "Sure the Opteron isn't
the fastest on X but that will change when there is a 64 bit
version of X". I simply think it is time to pop this balloon.
Even the "fastest AMD processor on the planet" would have been nice. I guess it's only best suited for server apps, at least until (if) the frequency ramps.
Remember when the critics hammered Intel by claiming that
the new Willy P4 was slower than the PIII? The arrival of P4
optimized code made it clear that this simply wasn't the
case. How many ISVs will go to the effort to optimize 64
bit apps for Opteron, a processor with approximately 0%
market share? Also any effort must surely fall well short of
the huge performance payoffs that P4 optimizations brought.
This reminds me of the old joke about some society matron
confronting a drunken Churchill. She says "you are drunk!"
Winnie replied "So, you are ugly and tomorrow I'll be
sober".
SZ, I ask you - how can Opteron be slower than Athlon?
Glad you asked. Opteron is a 64 bit machine that runs 32 bit code very well. You might want to study the circumstances around the transition from 286 to 386, 16/32 bit respectively.
Few if any 32 bit programs by definition are optimized for Opteron, hence the '32 bit'. Not that they couldn't take advantage of the flat memory addressing. But that would be superfluous effort, since 64 bit applications will have that advantage and much more.
Hold on now, this flip-flopping has me confused. I thought K8
was so great because of its super duper performance on 32 bit
code. Now that is shown to trail P4 and even Athlon the story
changes to oh no, it needs 64 bit software to show its stuff.
Someone on Ace's showed that the 64 bit SETI client app was 9%
slower than a generic 32 bit client on Opteron. The apologists
then say it is the wrong kind of app for 64 bits (doesn't use
enough registers, or SSE yada yada yada). Well it sure the heck
is the right kind of app for IPF style 64 bits because an old
900 MHz McKinley ripped the Opteron a new one on SETI, more
than twice as fast as Opteron running either 32 or 64 bit code.
Even whipping boy Itanium/Merced was about a third faster.
Chipguy: The funniest thing was VANS FUDWARE GUIDE publishing a review showing the latest VIA core beating up on Transmeta. YOU KNOW YOU SUCK when VIA is taking you out behind the wood shed and beating you down
Oh my, the Crusoe is even worse than I'd imagined. I wonder if
Van spiked the test by not training the CMS on the benchmark
code before measuring performance.
You are right, this is quite an ugly collection of cellar
dwellers and Crusoe looks like the worst of the worst.
Actually I searched both Google and Yahoo and was surprised that there really aren't many benchmarks of any kind.
Do you think it is TMTA's interest to spread around samples
of its weak and underperforming processors to enthusiast web
site hardware reviewers to play with? They even turned down
MicroDesign Resources (publishers of Microprocessor Report)
when they wanted to borrow a Crusoe evaluation system to test.
This is a company that avoids open standard benchmarks like
the plague and probably for good reason.
so it seems they're only going to target the 10 lb. "desktop-replacement" behemoths.
I bet AMD could pick up the Kaypro brand name to go with it
for a song. The ideal marketing campaign is to get an ex-USSR
women's shot-put team member in a tank top to pick one up and
heft it into a groaning airplane overhead bin. She would say
in a thick Slavic accent "AMD Kaypro with A64 technology:
strong like tractor but half the weight".
Speaking of TMTA, has anyone seen any SETI at home benchmark
scores for any model of Crusoe? Just curious.
That is surprising. I never thought that Banias ever claimed to be a pro at high bandwidth floating point. It was always supposed to be an integer champ. So what's your theory on how a 1.5GHz Pentium M can outperform a 1.8GHz Opteron by more than 46%?
That's a tough one. The Banias has a special loop detector in its
branch predictor. It can perfectly predict loop branching for
much higher branch counts than conventional schemes. Intel has
disclosed how it optimized accesses to the memory stack. It is
possible that they made similar optimizations to the x87 stack.
It is also hard not to think that the FPU was improved compared
to either the P6 or P4 core, perhaps fully pipelined. Too bad we
don't have a SPECfp2k submission with breakdown across the 14
apps. It might provide a clue to Banias's amazing performance on
SETI at home.