Funny how, in those very same blog entries, Oracle says that they will continue using Opteron for its ETA storage server line and its X4100 and X4200 server lines. It's just more of the same FUD (mostly garbage) from you.
Pete
•500 ATI Radeon™ HD 5970 graphics cards - each with 2.7 TeraFLOPS of processing power
That should read each with 4.64 TeraFLOPS of processing power
Each 5970 contains two Cypress GPUs running at 725MHz, each of which delivers 2.32 TeraFLOPS (SP) or 464 GigaFLOPS (DP). That yields an FRC server with 2.32 PetaFLOPS (SP) of processing power from the GPUs alone. I think they confused the 5970 with the 5870, which runs at 850MHz and gets 2.72 TeraFLOPS SP and 544 GigaFLOPS DP.
The 250 61xx-class CPUs are 12-core MC (Magny-Cours) parts with clocks up to 2.3GHz, which means 3000 cores. Thus 250 12-core CPUs at 2.3GHz yield 27.6 TeraFLOPS of DP and 55.2 TeraFLOPS of SP, which would give a top-end FRC server 2.38 PetaFLOPS of SP or 0.492 PetaFLOPS of DP processing power.
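For anyone who wants to check these figures, here is a quick sketch of the peak-FLOPS arithmetic in Python (the card counts, clocks and per-part peak rates are the numbers quoted above, not vendor-verified):

```python
# Back-of-the-envelope peak throughput for the hypothetical FRC server above.
GPU_CARDS = 500            # Radeon HD 5970 cards
GPUS_PER_CARD = 2          # two Cypress GPUs per 5970
GPU_SP_TFLOPS = 2.32       # per Cypress at 725 MHz (SP)
GPU_DP_TFLOPS = 0.464      # DP is 1/5th of SP on Cypress

CPU_COUNT = 250            # Opteron 61xx parts
CORES = CPU_COUNT * 12     # 12 cores each -> 3000 cores
CLOCK_GHZ = 2.3
DP_FLOPS_PER_CYCLE = 4     # per core
SP_FLOPS_PER_CYCLE = 8

gpu_sp = GPU_CARDS * GPUS_PER_CARD * GPU_SP_TFLOPS           # 2320 TFLOPS
gpu_dp = GPU_CARDS * GPUS_PER_CARD * GPU_DP_TFLOPS           # 464 TFLOPS
cpu_sp = CORES * CLOCK_GHZ * SP_FLOPS_PER_CYCLE / 1000       # 55.2 TFLOPS
cpu_dp = CORES * CLOCK_GHZ * DP_FLOPS_PER_CYCLE / 1000       # 27.6 TFLOPS

total_sp_pflops = (gpu_sp + cpu_sp) / 1000                   # ~2.38 PFLOPS
total_dp_pflops = (gpu_dp + cpu_dp) / 1000                   # ~0.492 PFLOPS
print(f"SP: {total_sp_pflops:.2f} PFLOPS, DP: {total_dp_pflops:.3f} PFLOPS")
```

Which reproduces the 2.38 PFLOPS SP / 0.492 PFLOPS DP totals above.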
Funny how Nehalem still isn't shipping in 4S platforms. And Nehalem sales overall are still less than 2% of the x86 market and aren't planned to go above that this year.
And you still misquote me by stating something I did not say. I stated that Nehalem will have to go through 1 year of validation by server customers after it starts shipping to see meaningful sales. How does that equate to "not shipping"? Frankly it doesn't, except in some Intel Booster Dream you keep having.
Meanwhile, Istanbul is seeing meaningful sales less than a quarter after a launch that shipped on day one.
And that breaks your hearts and dreams.
Elmer:
Then the post should have said phase change, not "chilled air cooled". Neither I nor most anyone else would classify that as air cooled per standard definitions.
In the previous post, I showed an LN2-cooled Ph2 940 getting 5+GHz.
Overclockers at the SF AMD event (12/6/08) competed for the highest overclock, with stock HSFs and with LN2, given 5 random Ph2 940s on a retail 790GX MB (Gigabyte's 790GX IIRC) in just 30 minutes. The best stock-HSF result was 3.94GHz, and the best LN2 result was 5.5GHz, matched by 4 teams. The 6.3GHz LN2-overclocked Ph2 940 ran Primes in 6.33 seconds. The 5.8GHz LN2 overclock on a Ph2 940 was sustained for a long time. People got to play Crysis on that system, so it was likely stable over many minutes at a time (as long as someone kept feeding it LN2).
Pete
Dear Mas:
Here is a 5GHz overclock on LN2 (see post #55):
http://oktabit.foracamp.gr/content/first-look-phenom-ii-x4-940-english-version?page=5
Screenshot of CPU-Z v1.49:
http://oktabit.foracamp.gr/files/Phenom_940_LN2_5GHz_logo.jpg
Pete
Dear Mas:
Wbmw failed to see the phrase "chilled air" for those 4.8GHz "on air" benches. That means something was supplying colder-than-ambient air to the HSF. What did they do, put the test box in a deep freezer? Those can go below -40F. Much colder than a 72F ambient, and not what anyone else would call "air cooled". Most of the others in that area were LN2 or dry ice (frozen CO2).
Pete
Subzero:
Your name looks to be proof of your lack of memory in this instance. Let's see: the JFTC and the KFTC have both charged and found Intel in violation of their antitrust laws. The EC staff has charged Intel with many antitrust violations. The Commissioner hasn't ruled on them yet.
Typical Intel boosters try to "pooh pooh" those away. Elmer, for example, sticks his head in the sand by claiming these systems must work like those in the US. When it's pointed out that these are foreign countries that do things differently, he sticks in ear plugs, closes his eyes, hums real loud and then tries to claim otherwise.
Now Intel is trying to do the same with respect to the European Competition Commission. Likely we will hear the "they don't work like US courts" blather from their boosters. This Commissioner is known to hit companies with hefty fines and to back them up with more if they don't stop what they are doing. I guess Intel is reconciled to losing those cases and being hit with some record-breaking fines.
Pete
Elmer:
You are the one claiming Intel is making money on Itanium. It's your job to prove it, not mine. I just proved that it wasn't making money overall.
Pete
Subzero:
What is noney?
If that is representative of your writing skills, perhaps you should take remedial English first.
Most of that disparity is due to Intel's criminal acts. If those had been stopped in a timely fashion by the FTC and others, Intel's market share would have dropped to 50-60% during the P4 years, and there would not have been any profits to cover Itanium's stink (losses). It would have been dropped as yet another bad idea.
And you denigrate AMD's designers, who did what Intel claimed couldn't be done: add 64-bit capabilities to x86. Their process people run a "World Class Fab" according to the industry. Their managers have done what no others could: compete with Intel, even with all those criminal acts.
Have they made mistakes? Sure they have. Buying ATI too soon was one of them. If they had waited a few months, they could have picked it up for less than half of what they paid. But so has Intel. You forget the dumb things Intel did, like trying to make consumer display chips, selling web site services, the RAMBUS fiasco and many others. They destroyed far more money than AMD ever did in those failed ventures.
The claim of "world's most successful semiconductor company" forgets those businesses that make money off the stuff those semiconductors go into. By that definition, IBM is more successful than Intel. Their Power series of semiconductors made them over $10 billion in 2007 alone. Their market cap is bigger than Intel's as well, $106 billion to $73 billion.
As far as Barcelona is concerned, it's still stomping all over Xeon in 4P. Shanghai just hits harder and wider.
Pete
Chipguy:
IBM pays for its microelectronics division through sales of system integration and software. They could set any price for the CPUs, even below cost, and still make lots of money at the bottom line. That the division is losing money is an artifact of IBM's accounting methods, not its real worth to IBM. Without a unique Power CPU, they couldn't justify the high prices of their proprietary software and services. So it's used as a loss leader. It's just like elevator companies that can give you the elevator for free, including its installation. They make it all up, and lots more, in the required service contracts.
AMD has paid for process development and equipment purchases through AMD64 CPUs. That it is losing money is simply due to Intel's criminal acts. Without those, AMD likely would have been in a duopoly with Intel with a far larger share of the market. In part, those criminal acts helped Itanium. Because without them and the illegal profits procured thereby, Itanium would have died.
A good deal of 32nm process development has been paid for by AMD64 CPUs to date. Without the ill-timed purchase of ATI (a few months later would have halved the purchase price), AMD likely could have funded process development through Malta's production. As it is, this is not much different than what IBM does with its fabs: use outside purchasers of processed wafers to help defray its costs. TFC goes one step further, in having those customers help defray process development costs.
In another few process generations, even Intel will have to go this route. Intel is balking at equipment makers' requests to directly fund 450mm wafer processing equipment development, just like it balked at paying for EUV scanner development. When even a large fabricator like Intel needs only a dozen or so top-of-the-line scanners to equip all its fabs, it's hard for the equipment makers to recoup their development costs.
As for that quote about profitability, I notice he didn't say what expenses he subtracted from revenue. It's just like claiming prison labor is free when the costs to house the prisoners, to transport them to and from their holding cells, to guard them at the jobsite, to feed them and to procure them are all handled by some other agency. Sure, on the company's books it's free, but in real life, it isn't truly free.
Like I said, I'm sure they didn't take into account past development costs in that profitability statement. They don't portion out labor bonuses, options and the like. They likely didn't include any expenses for process development or equipment depreciation. They probably don't portion out the MG&A either.
Without any specifics about what is and is not included, that statement is left with great gobs of vagueness. It's just like the State Governor who says he lowered your taxes. After reading the footnotes, he didn't include the automatic raising of gas taxes, impact fees, expanding the coverage of sales taxes, raising licensing fees, raising permit fees, adding a wheel tax, imposing a waste reduction tax, adding a snow removal fee, boosting the excise fees and raising processing fees. After all that, one knows that the State Governor raised taxes, just not the one on income. Or Congress saying they cut the budget when they really meant that they cut the rate of its planned rise.
I take such unjustified statements as being a load of manure. If you take them at face value, I have the world's "longest" bridge to "sell" you.
As far as your cherry statement goes, it's really a corpse flower stuck on the gross margin. When it opens (blooms), it stinks to high heaven. It always causes the gross margin to sag.
Pete
Elmer:
Easy: just look at what it cost to develop Itanium, plus the multi-billion-dollar commitments made years ago. Sales haven't been high enough to pay for that alone yet.
So "not paying for its development" is relatively easy to prove.
"Not paying for process development" is easy too. By the time they release Itanium at a process node, that process has been in production for 2 to 3 years. By that point, it is a fully matured process; ergo, no development is needed.
Ditto for equipment. Even Intel states this from time to time ("using depreciated equipment on a mature process"). How any reasonable Intel booster could fail to see these points just shows how out of touch they must be.
As for labor bonuses, options, etc., Intel puts them into "other" rather than portion them to each business unit. That is why "other" is such a large money losing "business". As long as the AMD64 (EMT64) side of the business makes lots of money, this is allowed to slide since all figure this is picked up by the "bottom line". Typical thinking that "profits cover many warts".
That is straight from their earnings reports, 8-Ks and 10-Ks. But you, supposedly a long-time viewer of such things, quickly forget such notes, because they go against your rosy view.
If Itanium is making money, then why doesn't Intel separate it into its own business unit? Because it would then become obvious to all that it's a money loser, even after all the breaks given to it. So they use the sweet smell from the AMD64 side to cover Itanium's stink.
Pete
Chipguy:
They didn't beat them at the same time. IPF loses to Xeon, time and time again. IPF loses to Opteron, time and time again. IPF loses to Power, time and time again.
Beating where Xeon, Opteron and Power were two or three years ago isn't much of a feat now. And comparing where Xeon is now against where IPF will be whenever it finally comes out is just more of the same fallacy.
It is mid-Q3 2008. IPF still loses to Power. It still loses versus Xeon or Opteron.
18 months ago, who foresaw the market imploding? Looking that far forward, lots changes by the time it actually arrives. Heck, a few months ago Intel said they would have $10 billion in Q4 sales. Now they wonder if they will make their mid-quarter-update target of $9 billion. They are cutting expenses, delaying products and doing other things to save money.
Only in your biased view is everything rosy for Itanium. The real numbers do not support your view.
Pete
Elmer:
Only in Intel boosters' minds is it profitable. It doesn't pay for any process development, unlike x86, AMD64 or Power. It doesn't pay for its own development, unlike the others. It doesn't pay for the equipment it is made on, unlike the others. It doesn't pay for the indirect labor charges, unlike the others. It doesn't pay for MG&A, unlike the others.
Sure it might be profitable, once you remove tons of expenses. But then a whole bunch of unprofitable products would too. Itanium's ROI is negative and it isn't getting any better.
To show just how unprofitable it is, think: if Intel couldn't make any x86 or AMD64 CPUs anymore (for some reason), would Itanium be able to support Intel all by itself? Anyone with any sense would yell, heck no!
And in a prolonged downturn, most companies would be getting rid of dead weight. Itanium is a likely target.
Pete
Chipguy:
So freaking what! Itanium has to be behind in process because it can't fund process development, equipment purchases or anything like that. Even after that, it needs its development to be covered (written off via) the x86 cash cow. It still isn't making money, even with a portion of its workforce getting bonuses paid by the cash cow x86 line.
That extra time (2-3 years) lets it get more out of the older process by throwing lots of transistors into its huge dies. If process generations slowed down, then Power would also get that time to redesign on a more mature process and would still show Itanium up.
And Tukwila will again be slower than planned, so they will lower the plan by updating their memories of that plan. The point is that at this time, Itanium is slow and way behind Power6, like it was before and will continue to be. Only then would your flawed comparison be at equal clock speed.
Elmer:
AMD lost billions because the ATI purchase was ill-timed. That is not to say it won't be a good purchase in the long run. We shall see, as it normally takes 5 years for such things to bear fruit.
I could say that Itanium was a multi-billion-dollar bust. The only way it's making money is by writing off the large development expenses and having most of its expenses borne invisibly by the x86 cash cow. In any other company, it would have been long gone.
That said, Barcelona did relatively well. It still beats Dunnington in 4P SPECfp_rate2006 (170 vs 156) and blows it away in Virtualization work. Pretty good for a "failed" CPU.
Shanghai is better though. While it can be plugged into any Barcelona server and workstation socket, Nehalem can't do the same with any current C2Q server or workstation. So even when Nehalem servers and workstations come out, there will be a 12-month delay before purchases ramp up, due to customer validation of the platform. Shanghai thus will have a short validation period, as only the "core" change needs to be validated. So while Nehalem might dent Shanghai sales in Q2-Q4 2010, Shanghai has 1-2 years to get better too.
Thus, you are comparing a CPU one year in the future to a currently sold one. That is a constant fallacy both sides are guilty of from time to time. If we reverse the look, what does a 2.7GHz Shanghai do to year-old Xeons (73xx and 53xx)? It beats them badly in 2P SPECint_rate2006 (136 versus 116) and at 4P SPECint_rate2006 (249 versus 214). SPECfp_rate2006 is no contest at 2P (118 versus 67.3) and at 4P (210 versus 119). Should I be saying that Shanghai beats Clovertown by 2X in SPECfp_rate2006, so Clovertown is destined for the junk heap and Intel with it? No, that isn't a good characterization, and neither is yours with Nehalem.
Pete
Elmer:
Your backing of SPECxxx_rate_base2006 is not how most people compare systems for HPC work. Because of the way this benchmark has evolved over time, it is fairly meaningless for most HPC users, who will test each platform in their own environment and see which is both fast and, usually, cheap. For most others, even "base" is far from what they find in the real world when they use their production compiler (gcc or MS C and variants) and see how the resulting software performs (this was the original goal of SPEC in the first place, before it was co-opted by OEM marketing departments). So "base" overstates what a normal developer would get in the typical environment, and "peak" is far from the most an advanced, savvy developer will get from a system.
The more typical server buyer does what Dan3 does on SI: he brings in each marketed system, tests it in his environment with his applications, and sees what the performance is and how much the system would set them back in money and power used. Then, after testing samples from each supplier they look at (or who wanted the sale), they choose the one that gets them the most performance for their budget. Frankly, almost everyone says that is the best way to do it. SPEC CPU2006 might help make the initial cut of who to bring in to test, just like TPC or SAP do, but it's part of the beginning and nowhere near the end.
For the rest of us, it's a debating point. Because of Intel's "cheating" on flag handling, base is more meaningless than ever before (sometimes base and result scores matched exactly, which shouldn't happen in the real world). The standard result is closer to what savvy users would get, but given SPEC's tendency not to allow typical numeric libraries like BLAS, it is far from what starting HPC developers would get, much less the savvy ones. Thus the result score gets closer, but isn't really indicative of real-world results. That is the one I, reluctantly, use. And if you really check back, I have always used "peak" instead of "base" to compare CPUs.
The point is that Nehalem isn't out yet, and the scores it gets have to be taken with a large dose of salt. Just look at SPECint_2006. At 24 cores, the top Xeon gets 25.5; at 16, it's 25.0 (from 2.67GHz Dunnington to 2.93GHz Tigerton); at 12 there isn't one; at 8, it's 30.3 (3.33GHz X5470 Harpertown); at 4, it's 30.2 (3.5GHz Wolfdale; the 3.2GHz i7 965 gets 33.6); at 2, it's 26.3 (3.13GHz E3120 Wolfdale); at 1, it's 17.4 (3GHz Woodcrest 5160). Generally the speed drops as the core count goes up (and the clock as well). Thus scaling from 1P to 4P is not 4x: the clock goes down, memory contention appears, and cache coherency checks slow things down further. Otherwise there would be no slowdown as the core count goes up.
Given the above, Nehalem will not get 2x Shanghai when they actually meet. Not in typical server-type loads and not in HPC work. Actually, savvy HPC developers would likely get a bigger boost going to a GPGPU than going to a higher-speed CPU. A $500 GPGPU crunches numbers (480 DP GFLOP/s, 2.4 SP TFLOP/s) far faster than a $500 CPU can (12 DP GFLOP/s).
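The bang-per-buck claim in that last sentence works out like this (a rough sketch using the post's own round numbers; real prices and attainable, as opposed to peak, FLOPS vary widely):

```python
# Peak DP GFLOPS per dollar, using the figures quoted above.
gpu_price_usd, gpu_dp_gflops = 500, 480    # the $500 GPGPU in the post
cpu_price_usd, cpu_dp_gflops = 500, 12     # the $500 CPU in the post

gpu_per_dollar = gpu_dp_gflops / gpu_price_usd   # 0.96 GFLOPS/$
cpu_per_dollar = cpu_dp_gflops / cpu_price_usd   # 0.024 GFLOPS/$
advantage = gpu_per_dollar / cpu_per_dollar      # ~40x in peak DP
print(f"GPU peak-DP advantage: {advantage:.0f}x")
```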
Pete
Chipguy:
That world is currently here.
At the 4P level, Dunnington only gets to 2.67GHz and Tigerton (the Xeon 7300 series) to 2.93GHz. Funny how AMD's 2.7GHz Shanghai beats both of those Xeons in 4P SPECxxx_rate2006. Only the 4P/24C variant gets higher than 4P/16C Shanghai in SPECint_rate2006, and not by that much (18%) given its 50%-more-cores advantage at roughly the same clock. Of course, 4P Nehalem is nowhere to be found.
Too bad we live in the current real world, not one many months into the future.
Pete
Elmer:
Where are the SPEC submissions then? Intel boosters say 2P i7s will be out in Q1/09, which starts in less than 2 months and ends no more than 4.5 months from now. As of this time, all Intel 2P submissions need 50% more cores and a lot more clock just to get close to AMD in 2P SPECfp_rate2006. And at 16 cores, AMD wins against all Xeons in SPECint_rate2006, 249 (AMD and HP submissions both) to 221 (Intel only; all other OEMs are lower). 4P Nehalems might be higher, but by Q4/09, 4P Shanghai likely will be substantially higher too.
Intel does win in that 4P/24C Dunnington scores 294 (IBM with their Hurricane chipset), but 8P Shanghai scores have not yet been submitted by AMD, HP or Sun. Likely those 8P/32C 2.7GHz Shanghais will beat Dunnington in SPECint_rate2006 (the lowly "failed" 8P/32C 2.3GHz Barcelonas scored 280).
Some trouncing when Intel gets beat by AMD in SPECxxx_rate2006.
Of course that may all change with SPECxxx_rate2009(2010?).
Pete
Ephud:
Let's look at each submission instead. Result comes first and then base below. On my lists I don't display base, as base doesn't mean much, especially given the way Intel's SPEC-special icc and ifc compilers act (although other OEM-built compilers may have the same tendencies); the result scores mean more. Unless you use gcc or another third-party compiler for both, base is typically a poor way to compare systems under normal developer usage of compilers (compile using make, test the application, fix source and make scripts, repeat until the tests pass).
As for the score, the SPECint_rate2006 score of 125(117) for the 3.2GHz i7 965 is worse than the 2P 2.7GHz Shanghai score of 136(113). Using the same third-party compilers as the Shanghai submission (Pathscale's v3.2 and Portland Group's v7.3), the gap would likely be wider for Intel in both base and result. I also notice that Intel's submission was all 32-bit code whereas AMD's was mostly 64-bit. The Pathscale and PGI compilers are far more widely used than Intel's. And compared with the Microsoft, GNU (gcc and variants), HP, Sun and IBM compilers, the Intel compilers' share is tiny.
Of course, that was desktop memory without ECC for the i7, and over one day some 12 bit-flips during reads would occur, whereas the AMD submission using 32GB of ECC memory likely won't have any unfixed bit errors over years of operation (and, far more certainly, no bit error would go unfound). Phenom II will likely have a higher clock and faster PC2-8500 CL3 memory, and no need for the cache coherency the 2P system had to endure on a long-tested OEM platform.
BTW, given that no 2P or 4P i7 scores were submitted to SPEC, it looks like Intel is assuming 2P Nehalem won't arrive for at least 6 months. That puts it into late Q2/09, a full quarter beyond what the Intel boosters would like. I also notice that a good deal of the 77 errata on i7 stepping C0 involve data corruption of some kind or lockups, with a possible BIOS-based "workaround" for some of them! Perhaps they are waiting for a new stepping to fix these nasty errata. Given how BIOS workarounds usually decrease performance substantially (one only need look at Barcelona stepping B2 to see how much), this reported 160 in SPECfp_rate2006 could vastly overstate the performance of a production-CPU-based system.
OTOH, with HT3.x (possibly DDR3), dual HT3.x links between the two CPUs, and the clock boost of the Opteron SE CPUs, the Opteron scores may get a substantial boost as well.
As always, we need to wait and see.
Pete
Ephud:
Base isn't before Result; it's below Result on any single submission. It's also why SPECfp_rate2006 is the Result (previously referred to as Peak) and base gets a different designation, SPECfp_rate_base2006. Thus any discussion of SPECfp_rate2006 necessarily refers to the Result score, not base. How it's put into tables isn't relevant, especially since the configurable query form allows one to omit the base score from any table.
Also, no ECC registered DDR3 DIMM is currently available for purchase. No server would be run on non-ECC memory; a common rule of thumb is one soft error per GB per day. Most servers run 24/7, so a 24GB test server would see 24 errors every day, which would be unacceptable for any mission-critical server or HPC-type work.
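The arithmetic behind that estimate is trivial, but worth writing down, since the rule of thumb itself is the load-bearing assumption (published soft-error rates vary by orders of magnitude depending on altitude, DRAM generation and vendor):

```python
# Expected soft errors for a non-ECC box, per the one-error-per-GB-per-day
# rule of thumb used above (the post's assumption, not a measured rate).
ERRORS_PER_GB_PER_DAY = 1.0
ram_gb = 24                   # the 24GB test server discussed above
uptime_days = 1               # servers run 24/7, so count whole days

expected_errors = ERRORS_PER_GB_PER_DAY * ram_gb * uptime_days
print(expected_errors)        # 24.0 uncorrected errors per day
```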
The three i7 submissions (920, 940 and 965) all used PC3-8500 CL7 memory. Registered ECC memory with the same DRAM would be CL11, and may be slower to boot, such as PC3-6400, making it the same speed as the PC2-6400 CL5 ECC registered DIMMs used in the Shanghai entries. Either of those changes would drop scores, and both together would drop them more. And that is on top of any cache coherency overhead from simply going from 1P to 2P.
I also notice that no system configuration gives the specifics of the memory used in that estimate for the 2P server, other than that it was 24GB (I assume 12 DIMMs of 2GB Samsung DDR3 PC3-8500 CL7, as used in the i7 submissions).
One other note from the i7 submissions: SMT is turned off in 2 of the result subtests (410.bwaves and 450.soplex), with only 4 copies run instead of 8. So they get a higher result in those without one logical core trampling on the other. For the SPECint_rate2006 subtests, they turn SMT off for one there too (429.mcf). Thus SMT isn't a win some of the time.
Pete
Elmer:
As usual your claims leave a lot to be desired on accuracy:
Top current 2P Opteron 2384 SPECfp_rate2006 score is 118/105, not 108/105:
http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20081024-05684.html
Only current 1P i7 965 score is 86.1/82.9:
http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20081024-05710.html
Only current 1P i7 940 score is 82.3/79.2:
http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20081024-05714.html
Those are a 3.2GHz and a 2.93GHz desktop CPU with non-ECC desktop memory; server memory will be slower, and SPECfp does not scale at 100% with CPU count, not even on the much stronger glued IBM Power6 CPUs. i7-based Xeons won't get 2X the 1P 2.8GHz i7 score. But scaling to a 1P 2.8GHz i7 Xeon EP from the above two scores gives 82.3 - (86.1 - 82.3)*((2933 - 2800)/(3200 - 2933)) = 80.4. Using Opteron's scaling from 2P to 4P as a basis, 80.4 * 210 / 118 = 143, less whatever the slower ECC registered memory drops it. I doubt it will show more than 130. Far less than the claimed 160 (likely some silly 2x estimate from a 1P 2.8GHz score).
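Here is the same estimate as a short Python sketch, so the arithmetic can be checked; the linear clock scaling and the use of Opteron's 2P-to-4P ratio as a proxy are the post's assumptions, not measured Nehalem results:

```python
# Estimate a 2.8GHz i7 Xeon EP SPECfp_rate2006 score from the two
# published 1P i7 submissions, then scale by Opteron's socket ratio.

def interpolate(score_lo, clk_lo, score_hi, clk_hi, clk_target):
    """Linear interpolation/extrapolation of score versus clock (MHz)."""
    slope = (score_hi - score_lo) / (clk_hi - clk_lo)
    return score_lo + slope * (clk_target - clk_lo)

# i7 940: 82.3 at 2933 MHz; i7 965: 86.1 at 3200 MHz
est_1p = interpolate(82.3, 2933, 86.1, 3200, 2800)   # ~80.4

# Opteron 2384 scaling from 2P (118) to 4P (210), used as a stand-in
# for how a second Xeon socket might scale the score.
est_2p = est_1p * 210 / 118                          # ~143
print(f"1P estimate: {est_1p:.1f}, 2P estimate: {est_2p:.0f}")
```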
Still, that isn't too bad, 130 versus 118, assuming that getting cache coherency to work doesn't drop scores too much (none is needed with 1P being all on one die): 10% more score for a 4% higher clock. Although by the time the i7 Xeon EP shows up (currently thought to be Q3/09), Opterons will be on a new AMD server chipset, running HT3.x speeds with dual HT links between the CPUs in the 2P configuration. That might make up all (possibly even more) of the difference in SPECfp_rate2006 results. The Opteron SE variants will also be out by then, reaching 2.8GHz (238(6,7) & 838(6,7)) or even 3.0GHz (239(0,1) & 839(0,1)). And AMD might decide to pull DDR3 into 2009. The advantage would be Shanghai well over i7 Xeon by then.
BTW the top 4P Opteron 8384 score is 210/188:
http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20081024-05686.html
Pete
Chipguy:
Have you seen the Anandtech 2P serving benchmarks? They show that Shanghai does better than Harpertown Xeons at performance per watt and even just performance at higher loads. And this from a site that gives most breaks to Intel.
You are the one living in fantasy land. If you would actually read the tests and try to understand what they are measuring, you wouldn't stick your foot in your mouth.
Funny how all 4S quad core Xeons lose in SPECint_rate2006 versus Shanghai. And even with 50% more cores, Dunnington loses against Shanghai in SPECfp_rate2006.
Next, where are the 1S, 2S and 4S Nehalems? 2S, 4S and 8S Shanghai systems have been shipping for weeks prior to launch, while multisocket Nehalems are nowhere to be found.
So oil must drop more than $5 every day and the moon must always be full in your fantasy land. Your ridiculous post is already invalidated.
Chipguy, how wrong you are!
Every Xeon system with 16GB of memory has a lower result than a 2S Opteron 2384. The Opteron 2384 system got 860 while the best Xeon (2.8GHz) only got 854, using IBM's JRE versus the Opteron's BEA JRockit. The highest Xeon (3GHz) using BEA's JRockit was the 778 from HP's DL180. The only higher-scoring Xeons were using a desktop chipset, not a server one, with DDR2 in some form in a 4GB or 8GB configuration.
Of course, to match using the faster, lower-power desktop memory, wait for the Opteron 137x and 138x or Deneb-based Phenoms with PC2-6400 and PC2-8500 memory. As for DP Nehalem, it has slipped into 2009; how far is anyone's guess at this point. When, and if, it arrives, we shall see. At least Shanghai was a hard launch, with systems arriving in customers' hands before the official launch.
That was a nearly month-old post (10/02/08 - 10/30/08). A recent Shanghai report said that HT3(.1) was activated in the Shanghai shown to resellers, coming out real soon. A lot can change in 28 days.
For a supposedly smart guy, there was no mention of percentages of COGS for the various suppliers, and no mention of Intel in those parts of SGI's last 10-K. They did mention that Intel Xeon-based Altixes were sold starting Q4/07 (a year ago according to their FY). They also stated that overall core products were $109 million for FY2008, including both Xeon-based and Itanium-based systems. Of that, $79 million was for shared-memory systems, and 10% of that isn't much. Xeons probably got a much higher ratio of their $30 million, since those were clusters. So Xeon CPU revenue from SGI was likely higher than IPF CPU revenue, especially considering that the chipsets, etc. for the latter might also come from Intel. Certainly none came from the storage products or the service revenue within the overall $354 million total revenue.
Also, they lost $3.03/share in the last quarter and $13.55/share in FY2008. At a $153-million-a-year loss rate, their $142 million of stockholders' equity won't last the year. They will be bankrupt.
Either way, HPC is a drop in the bucket compared to IPF commercial systems.
Again, your numbers and their interpretation are of very poor quality.
Think again, fantasy boy. Of course HP has to say such things, given that they want to keep their old PA-RISC customers. They don't want them going to AMD64 Opterons and Xeons, because they would lose quite a bit of their high-margin service and hardware business.
Every time I hear of a large HPC bid, the customers say Intel gave away (donated) the IPF CPUs for it. So Intel isn't getting your claimed 40% CPU revenue on HPC server contracts; Intel gets 0%. If they didn't donate, the vendors wouldn't sell much IPF into HPC, because Opterons and Xeons give much higher bang for the buck at street and list prices. And Intel needs the volume and mindshare, or else they would sell even less IPF into the commercial world, which is where they get the bulk of their IPF CPU revenue. They also can take 38% of the full list price of the donated CPUs off their taxes. That is likely higher than their processing costs. The implied revenue comes from their cash cow AMD64 business.
In the commercial world, Intel gets less than 10% of system revenue for the CPUs (HP gets a hefty margin on top of that even with 60% discounts), and when bundled software is included, far less than that. HP lives on the high-margin hardware (chipsets, glue, chassis, disk, memory, comm, etc.), high-margin software, and pricey integration and service support. It's there to make HP money, not to pay a lot of it to Intel.
Don't like what that does to your rosy vision of the IPF market? TOUGH! That is the real world, not some ivory-tower view of a wished-for fantasy land.
Yet Intel does give away CPUs to HPC clusters, and you don't take it into account. And it isn't just a few hundred CPUs a quarter, either.
The real money is in the commercial segment, and there they almost always use an RDBMS like Oracle or DB2. Since that is bundled software, the CPU portion of server revenue is tiny. Ask Dan3 at SI. He doesn't want to go to quad-core CPUs because the extra two cores cost his company $30K in software license fees (plus an extra $6K in annual license and service fees). And that is per socket. Now, if the server is virtualized, that might be covered by the extra servers that don't need to be bought and the license costs for them.
However at $30K a populated socket, even a cost of $9-10K a CPU is buried underneath the socket software costs. At $1K a CPU, its the BMOC. Until software licenses are again priced by CPU performance instead of per core, this will be the drawback to going the multicore route. And as of now, likely pushes the use of SQL and database accelerators to go around this overriding cost.
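A minimal sketch of that arithmetic, assuming the roughly $15K/core license and $3K/core annual fee implied by Dan3's figures above (illustrative, not a vendor price list):

```python
# Per-socket cost of adding cores under per-core licensing,
# using the figures quoted above (illustrative, not list prices).
def extra_core_cost(extra_cores, license_per_core, annual_per_core, years):
    """One-time license fees plus recurring support for added cores on one socket."""
    return extra_cores * (license_per_core + annual_per_core * years)

# Going from dual core to quad core on one socket, first year:
cost = extra_core_cost(2, 15_000, 3_000, years=1)
print(cost)  # 36000 -- dwarfs the $1K (or even $9-10K) CPU itself
```

Which is why the CPU's own price barely registers in the total.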
Thinking otherwise puts you into fantasy land, while I am in the real world, complicated and nasty as it is.
No one pays anything to Intel when they donate the CPUs in an HPC cluster. You are the one who thought the CPUs in a blade were 40% of the price, not I. You still have nothing to back that claim up.
When one buys servers in bulk, one gets discounts on all of it and the ratios still stay the same. Ideally for most HPC clusters, one gets refurbished chassis, HDs and adds the cheaper CPUs and 3rd party memory. Thus the CPUs are at the bottom end, yet deliver more performance per buck. Those CPUs still garner only 10% of the system cost because memory is maxed out (usually costs more than the CPUs they are attached to) and a good deal of the HPC server cost is in the ethernet/IB switch.
Besides, if you go for that type of HPC cluster, you are better off with commodity AMD64 hardware. Opterons and Xeons are cheap compared to Itanium on a performance per buck basis. The only real niche Itanium has in the HPC market is in the large sockets in a single image market. And that requires lots of expensive glue. There even you admit, the percentage of total server cost in CPUs is low.
Lastly, when IDC says that bundled software is in their server revenue, that includes the per-core OS licenses and the typical RDBMS licenses and costs as well. The latter is usually far more expensive per year than the CPUs are. Oracle, for example, charges $13K per CPU core for their Enterprise Edition with the typical options. On a dual core Itanium Montvale, that's $26K for a $1K CPU. Annual support payments come to 20% of that, thus $5.2K per year versus a one-time CPU cost of $1K.
Even if the customer has pull with Oracle and HP gives a 60% or so discount on the entire server cost, the CPU portion going to Intel is still quite small. So taking IDC's server revenue numbers, deriving a CPU revenue percentage from HP's list prices, and multiplying them together uses two vastly different bases for your final CPU revenue estimate. Thus your estimate of Intel's IPF CPU revenue is just plain garbage.
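Spelled out (the $13K/core and 20% support figures are the ones quoted above, not a current price list):

```python
# Per-core RDBMS licensing vs. the CPU it runs on, per the figures above.
cores = 2                             # dual core Itanium Montvale
license_total = 13_000 * cores        # Oracle EE with typical options
annual_support = license_total * 20 // 100  # 20% annual support
cpu_cost = 1_000                      # one-time CPU cost

print(license_total)                # 26000
print(annual_support)               # 5200
print(annual_support / cpu_cost)    # 5.2 -- 5.2x the CPU price, every year
```

The recurring software fee alone is several times the CPU's one-time price, which is the whole point about bundled software swamping CPU revenue.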
So you see what it costs a VAR to build a small white-box server. While in the AMD64 world that might be close to the final cost to the customer, VARs have to make money, and they do it on integration costs that the customer still pays. Plus the necessary software add-ins, including the OS and RDBMS; they get a cut of those from the OEM and RDBMS maker. Plus the bundled-in service contract. That all adds to the server cost, and Intel gets none of it. How many VARs have you worked for?
I know how much the VARs I worked for paid for the hardware, how much we got for that hardware, and the total price the customer paid. It was not unusual to pay $500K for hardware we sold to the customer for $1M on a total contract price of $12M. For an HP box with PA-RISC CPUs, the CPUs fetched $38K for 4 of them (server CPUs were expensive back then), and that included the boards they were on, quite a bit of the glue required, and the bundled-in service and support. The CPUs themselves were likely 60-75% of that. The customer paid list price minus 40% and still was charged $60K. Those servers had max memory and quite a bit of disk. Also included were 6 high performance RF transceivers and associated hardware (3 each in two servers). Thus the CPU revenue from that $1M for 2 servers was only 2.5%. I doubt the economics changed much with Itanium.
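As a sanity check on that contract (the $38K bundle and 60-75% split are the figures recalled above):

```python
# CPU revenue as a share of the $1M hardware price on the contract above.
cpu_bundle = 38_000                  # 4 PA-RISC CPUs incl. boards, glue, support
cpu_only_low = 0.60 * cpu_bundle     # 22,800 if CPUs are 60% of the bundle
cpu_only_high = 0.75 * cpu_bundle    # 28,500 if 75%
hardware_price = 1_000_000           # what the customer paid for the 2 servers

midpoint_pct = 100 * (cpu_only_low + cpu_only_high) / 2 / hardware_price
print(round(midpoint_pct, 1))  # 2.6 -- matching the "only 2.5%" quoted above
```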
IPF gets its big SPECint_base2006 scores from its huge caches. Without them, it does poorly. And while base may be OK for Xeon, peak is a better handle when the compilers aren't SPEC specials.
BTW, the SPECint_2006 scores for the top three are 17.0, 14.7 and 11.1 for an 18MB L2 Itanium 2 9140M, an 8MB L2 Xeon E5310 and a 2MB L2/2MB L3 Opteron 2344HE. Of course, the current flagships of each are 17.0 for the Itanium 9140M (1.66GHz/18MB), 30.3 for the Xeon X5470 (3.33GHz/12MB) and 16.2 for the Opteron 2360SE (2.5GHz/4.5MB). The Xeon L7345 (1.86GHz/8MB) gets 17.0 as well. So a 12% faster clock equals 1.5 times the cache. Of course, most of the gains of each over the others come from how well they do on the small-footprint 462.libquantum. Pull out that test and they are much closer together. And the other thing is that relatively few AMD64-class CPUs are tested at the lower clock rates, while Itanium maxes out at 1.66GHz.
Besides, clock doesn't matter as much when you switch between architectures. If you want performance per clock, the current GPUs flatten EPIC to a smear: 1.2 Tflops at 750MHz beats 13.3 Gflops at 1.66GHz. Some would argue that performance per die area is a better look at an architecture. There IPF falls down too. The real reason IPF is 1 or 2 process generations back is that it needs those large caches to do well, and the resulting huge dies need mature processes to have decent yields. Besides, its revenue stream can't fund process development. Not true of the AMD64 CPUs, both Opteron and Xeon.
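To make the per-clock point concrete, here is the arithmetic with the numbers just quoted:

```python
# SP flops per clock cycle implied by the throughput and clock numbers above.
gpu_flops_per_clock = 1.2e12 / 750e6   # GPU: 1.2 Tflops at 750MHz -> 1600
ipf_flops_per_clock = 13.3e9 / 1.66e9  # Itanium: 13.3 Gflops at 1.66GHz -> ~8

print(gpu_flops_per_clock)                        # 1600.0
print(gpu_flops_per_clock / ipf_flops_per_clock)  # ~200x per clock
```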
The 2-socket blades require a blade rack, and the OEM markup on the CPUs themselves is over 100% (more than double the supply price). Thus a 2S rack costs $13K with 4GB of memory and has less than $1K of CPU in it; with two CPUs it's $14K with $2K of CPUs. That is about 7.5% (14% with two). (That's a 9010 at 1.6GHz/3MB L2):
http://www.google.com/products/catalog?hl=en&q=Itanium+2+server+1.6GHz&cid=9113511524447458734#ps-tech-specs
Fairly far from your 40% estimate. And that is before adding HDs, comm boards, service and bundled software. As for the servers, a 4S one goes for $42K (street price) with four 9010s installed, 8GB of memory and a 73GB HD, with the CPUs being less than $4K of that (HP charges $12K for four 9010 upgrades at list). That is under 10%.
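The percentages above, worked out:

```python
# CPU dollars as a share of street price for the configurations above.
def cpu_share_pct(cpu_dollars, server_price):
    return 100 * cpu_dollars / server_price

print(round(cpu_share_pct(1_000, 13_000), 1))  # 7.7  -- 2S blade, one 9010
print(round(cpu_share_pct(2_000, 14_000), 1))  # 14.3 -- 2S blade, two 9010s
print(round(cpu_share_pct(4_000, 42_000), 1))  # 9.5  -- 4S server, four 9010s
```

Nowhere near 40% in any configuration, before software and service are even added.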
Frankly, with numbers of this quality, no wonder your calculations are so far off. You seem to think HP pays Intel list prices for CPUs. Far from the truth.
I have had lunch with a senior HP engineer who worked on Merced and talked with him for an hour about what went wrong on that program. And I went out to dinner with the Intel engineer who, along with another Intel engineer, designed the one-cycle-access L1 dcache in McKinley, a big factor in this chip's large performance advantage over Merced despite being made in the same process.
And still the performance underwhelmed what the AMD64 CPUs did. VLIW processors should have been faster than they turned out to be, according to the ivory tower intellectuals. Trouble was that the same compiler improvements helped the CISC and RISC designs more than enough to beat the statically scheduled VLIW ones.
Yet more fantasy. In 2003 Intel had $100m in IPF MPU revenue as reported by iSupply. IPF server revenue for the year was $479m as reported by IDC. In Q4 2007 IPF server sales were about $1.4B, or about 3 times higher than all of 2003. That implies Intel Q4 IPF MPU sales were close to $300m. In contrast, Mercury's numbers for Q4 2007 show AMD's server MPU revenue as $170m.
As best as I can tell the break down was approximately:
AMD server: 570k MPUs @ $300 ASP = $170m
Intel IPF: 150k MPUs @ $1800 ASP = $270m
Same old tired assumptions of yours that have not proved out in the real world. IPF servers have more of their price devoted to memory, I/O, disk and such infrastructure on a $ basis. Server revenue can't be back-figured into CPU revenue so easily; "implies" isn't proof of MPU revenue and you very well know it. Especially as the server size goes up, since the percentage of the total server price allocated to the CPUs goes down. And that doesn't cover the freebie CPUs given to many HPC installations by Intel.
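A quick sketch of why that back-figuring is worthless: the answer swings by an order of magnitude with the assumed CPU share of server revenue (the shares below are illustrative, not measured):

```python
# Back-figured "CPU revenue" depends entirely on the assumed CPU share
# of server revenue -- which is exactly the problem with the method.
server_revenue = 1.4e9   # Q4 2007 IPF server revenue per IDC, as quoted above
for share in (0.025, 0.05, 0.10, 0.20):
    implied = server_revenue * share
    print(f"{share:.1%} CPU share -> ${implied / 1e6:.0f}m implied CPU revenue")
```

Pick 2.5% and you get $35m; pick 20% and you get $280m. Without knowing the real share, "implies close to $300m" is just an assumption dressed up as a number.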
Here is IDC's fine print:
IDC's Server Taxonomy maps the eleven price bands within the server market into three price ranges: volume servers (servers priced less than $25,000), midrange enterprise servers ($25,000 to $499,999), and high-end enterprise servers ($500,000 or more). The revenue data presented in this release is stated as factory revenue for a server system. IDC presents data in factory revenue to determine market-share position. Factory revenue represents those dollars recognized by multi-user system and server vendors for ISS and upgrade units sold through direct and indirect channels and includes the following embedded server components: Frame or cabinet and all cables, processors, memory, communications boards, operating system software, other bundled software and initial internal and external disk shipments.
That is straight from one of your posts in the Anandtech Forums:
http://aceshardware.freeforums.org/hp-looks-poised-to-overtake-ibm-in-server-market-t295-45.html
From the above, you cannot take IPF CPU revenue from IPF server revenue. You see, as servers get bigger, a larger percentage goes to the areas other than the processors. Opterons, and to a lesser extent Xeons, have the ability to do simple plug-in-and-go upgrades. Those are mostly all CPU and labor revenue, and they are a larger proportion of AMD64 server sales, especially Opterons. Many users simply plugged quad core Barcelonas into dual core Windsor based servers. Boom, an upgraded server with twice (or more) the CPU power for only the cost of the CPUs themselves and the small amount of labor required. That was a large part of Barcelona server sales.
And the last thing about server sales is that the amount allocated to the processor(s) also includes heavy markups on same. It's not atypical for HP to double the cost of the underlying CPU in the CPU price charged to the buyer. This is done less with the AMD64 CPUs because of the heavy competition in that market. So a $2000 HP IPF CPU charge means $700-800 for the IPF CPU itself, $50-100 for the HSF and card used, and the rest in integration charges. So comparing the street price for Opterons against the marked-up OEM IPF CPU price with its added ancillary costs is comparing two vastly different bases. And that is typical for your calculations. Mathematicians, scientists and engineers know the errors inherent in shifting bases (foundations, views) during calculations. It almost always leads to garbage results.
BTW, AMD64 CPUs covers the Xeons too, as EM64T was copied straight from the AMD64 manuals (with the same errors in the older manuals, which is usually proof enough of plagiarism). As I said, AMD64 CPU revenue stomps IPF CPU revenue into thin road veneer. I notice that you don't want to show the x86 server or CPU revenue (which is almost all AMD64 at this point).
Annual IPF server sales have increased by about $1B a year for the last five years and in a few months Intel will release the most competitive IPF processor ever WRT to its RISC and x86 contemporaries. And everyone will finally see what a *well designed* "native" quad core 65 nm processor can really do. :-P
One that comes out well after the competition did? What happened to your claims of 32nm IPF CPUs? Problem is that AMD will be nearly all 45nm or less by the time the 65nm IPF arrives. IPF has to hang on to the tail of the AMD64 CPU cash cow and couldn't make it on its own. And won't for the foreseeable future.
BTW, I notice the multi socket QPI CPUs (Nehalems) are pushed back into H1/09 amidst reports of QPI problems. That will likely include IPF CPUs using QPI. AMD reports that it will be 100% 45nm (or smaller) by the end of Q2.
Face it chipguy, Merced was a failure. It took HP engineers to make a viable Itanium CPU, and even then it wasn't that good compared to the previous hype (including yours). Sure, big pockets can keep it alive, but "lowly" Opteron still beats it in absolute CPU revenue. AMD64 servers (both AMD and Intel) far exceed it in absolute revenue; even their growth in incremental dollars exceeds the entire Itanium CPU revenue. In fact, without the AMD64 servers and their cash cow revenue, Itanium would fail for the inability to keep up in R&D costs, both CPU and process.
Soon the viability of Itanium will be questioned even with Intel backing and it will die off as its niche shrinks to virtual nonexistence as Power eats it from the top, down, and AMD64 keeps chewing on its bottom, up.
Opteron will be in 32nm SOI before you hope it will get there.
Ephud:
90nm was late, because Prescott was late, but you knew that. As usual your posts are worth less than nothing.
Pete
PS: Dothan shipped later than Prescott was supposed to, which showed it was a design flaw with the P4 rather than a problem with the 90nm process. But like so many Intel boosters do, problems are forgotten, like Merced.
VBG:
Easy, the 90nm process slipped (although it was a design problem with the CPU using it).
Pete
VBG, you hope that will happen. But as we all know, things tend to slip and be cancelled, especially a few years out. Saying that it won't happen to Intel belies history. And as you know, those who forget history tend to repeat it. Although with TFC being split off, AMD might not slip anywhere near as much as before, when Intel's destruction of AMD's revenue caused the slips. They are going to 40nm SOI for Fusion next year. The half step down to 32nm won't take as long as you seem to think.
Your predictions are noted. I won't be crass and fling them back at you when they fail (unless you do it first).
Pete
Funny how samples are here and many third parties have them. AMD, unlike Intel, does like to have sufficient quantities available before launching them. Sometimes some retailer jumps the gun and sells them before AMD launches them. I have heard that AMD is shipping Shanghais and Denebs, but Dirk will likely comment on that in the CC at some point.
At this time, no retail house is selling either one.
Pete
Smooth2o:
You hope it will be late. 45nm at AMD is in production now; samples came out in Q3 and will ship this quarter. While Intel may have shown 32nm memory dies on a 300mm wafer at the Fall 2007 IDF, CPUs usually take 2-3 years after that. IBM/AMD/et al. showed 22nm memory dies in August of 2008. Both Intel and IBM/AMD/et al. said that they will have 32nm CPUs in 2009, but it will more likely be early 2010. And since 22nm showed up in 2008, IBM/AMD/et al. will likely be at 22nm in late 2011, a year before Intel.
Especially since Intel said they would go to 450mm wafers for 22nm in the 2012 timeframe. Trouble is, I don't think the equipment makers would do 450mm wafer equipment research without heavy investments by customers. They think 2014 is more likely, given that there are no 450mm scanners currently, even as one-off prototypes. And that was before taking the current financial crisis into account. With it, it's hard to think such investments would be likely, so plans would thus be pushed out.
Besides, AMD has said that 40nm SOI is planned for mid 2009 for Fusion, and 2010 for 32nm. TFC may make it even faster, as production ASML 80-100WPH 13.5nm 0.35NA EUV scanners are slated to go into Fab 4x in Malta, NY in 2010. That means production 22nm CPUs from TFC in 2011. That makes sense: 45nm in 2008, 40nm in 2009, 32nm in 2010 and the jump to 22nm in 2011 (AMD may do 27nm in 2011 with 22nm in 2012 to boost yields, but IBM doesn't need high yields for Power 8(9?)). Of course, such an event as AMD having a smaller process than Intel at any given time would make Intel boosters apostate. That's heresy to them, and evidently to you.
Pete
That's Intel hopes talking. Likely TFC will be on 22nm SOI using EUV in Malta in 2011. That's when the shit hits the fan at Intel Fabs.
Wbmw:
Sure you are lying, because the EE C2Q CPU power usage didn't include the chipset, VRMs, 2GB of DDR3/1600 memory and all the other stuff on the MB while running 3DMark06 HDR/SM3.0 in software. That is the CPU on its "board". They didn't look at just the two R770 GPUs; they looked at the total power of everything on the GPU board.
Absolute power is meaningless unless you look at performance too. The CPU running 3DMark06 HDR/SM3.0 in software uses way more power than a 790GX IGP with a 1GHz Sempron running the same thing, and it performs far slower than the 790GX IGP to boot. That is a big condemnation against EE C2Q CPUs being used for GPU loads. Does it do better on CPU loads? Doesn't matter, as GPUs and their loads were being discussed.
Besides, given the performance, that test is still CPU bound with the 4870x2, as the tests showed that it does far better relative to the GTX280 at 2560x1600 than at the tested 1600x1200. Perhaps both power usages would have been higher, but the performance per watt would have been higher for the 4870x2. Most of the game tests by all the reviewers showed that the 4870x2 didn't really stretch its lead until resolutions went to 2560x1600, where many times it was the only one to produce playable frame rates.
If you've looked at the last few generations of GPUs, they've all been increasing the high end power envelope in order to push the performance bar higher. That's not a sustainable strategy, no matter how much performance it can deliver.
If you have looked at the last few generations of CPUs, they've all been increasing the high power envelope in order to push the performance bar higher. That's not a sustainable strategy, no matter how much performance it can deliver.
While it's true that the CPU is not going to run 3DMark06 as well as a GPU, it's also true that the GPU won't run the majority of general purpose applications as well as the CPU. The CPU is the only general purpose processor in the system, no matter what nVidia and ATI marketing would like you to believe. Amdahl's Law works for the GPU, just as it does for the CPU.
But the 4870x2 will only be used for GPU and GPGPU work, read: HPC with very parallel loads. They don't check highly serial loads that push against Amdahl's Law. In that area the EE C2Q doesn't do well either, given its high power usage. It loses against a fast single core Athlon 64 using far less power, because of the high latency of its FSB-attached chipset memory.
Like I originally stated, we were talking about GPUs and their loads, not GP CPU work. That is your red herring or irrelevancy. In the GPU area, GPUs are the correct tools for the job, even for the highly parallel loads typically found in the HPC area. There a 2.4 Tflops processor, even at 300W, is quite efficient on a Tflops-per-watt basis. Even the 1 Tflops GTX280 is efficient, just not as high. The 25.6 Gflops EE C2Q is nowhere even close at over 200W (all using SP). The CPU closes the gap a little when DP is used, with the 4870x2 getting 480 Gflops below 300W, the GTX280 getting 100 Gflops below 200W, and the EE C2Q getting 12.8 Gflops at over 200W, still far behind.
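The flops-per-watt arithmetic, spelled out (the ~200W SP figure for the GTX280 is my assumption; the text above only bounds its DP power under 200W):

```python
# Gflops per watt from the figures above (SP unless noted).
def gflops_per_watt(gflops, watts):
    return gflops / watts

print(gflops_per_watt(2400, 300))   # 4870x2 SP: 8.0
print(gflops_per_watt(1000, 200))   # GTX280 SP: 5.0 (assumed ~200W)
print(gflops_per_watt(25.6, 200))   # EE C2Q SP: 0.128
print(gflops_per_watt(480, 300))    # 4870x2 DP: 1.6
print(gflops_per_watt(12.8, 200))   # EE C2Q DP: 0.064
```

Roughly a 60x SP efficiency gap and a 25x DP gap between the 4870x2 and the EE C2Q on these loads.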
The well used engineering rule is "use the right tool for the job." For GPU loads, that is the GPU. In the HPC area, it depends on just what the task is, but I suspect many will move to GPGPUs, as they deliver Tflops class performance for under $1K and under 1KW. The CPUs will be used more for control than for the HPC application work, because they are better at it. Long term, I don't know whether loosely coupled (attached on-die APUs) or strongly coupled (VPUs alongside FPUs inside CPU cores) will win out. I suspect that loosely coupled will win over the short (<4 years) and medium term (5-10 years), and strongly coupled will take over slowly after that in areas where APUs are nearly always used.
As for useful performance of GPUs, I think the growth will slow, but much more slowly than CPU performance growth will over time. The resolution and polygon counts need to climb quite a bit, but as time has shown, GPU performance grows rather faster than those demands do. APUs for graphics, construction (fleshing out trees, buildings, etc.), physics, collision detection, simulation, etc. will continue to take loads off the CPU during games to add realism while taking into account the slow growth in available GP CPU performance. We are a long way from artifact-free virtual worlds that we can interact with, both visually and aurally, without disbelief rearing its ugly head and tainting our enjoyment. We have come a long way, though, from the early days of Pong or, more recently, Castle Wolfenstein.
Pete
Wbmw:
You are lying to yourself, in error and in complete fantasy land, since we are looking at GPUs, not CPUs. The loads for GPUs are embarrassingly parallel. When you compare CPUs to GPUs in the GPU's forte, then you had better use GPU loads. And there, EE C2Qs are hopelessly far behind. That is why even a small IGP can blow past the fastest CPUs when it comes to GPU loads. Go ahead, try running most current games under Vista with only a framebuffer for the video side, with the CPU having to do all the work of the IGP. Then play any demanding 3D game, even 2-4 year old ones. Turned into a slow slide show. With older IGPs, it turns into something with motion. With top end IGPs (Intel's aren't good enough), it becomes quite playable. The current flagship GPU, the 4870x2, runs them as fast as the CPU can handle the non-GPU portion. In short, the bottleneck becomes the CPU, not the GPU. For those games where the frame is quite large (2560x1600), HDR is on (48 bit color) and the eye candy is maxed so that even the 4870x2 slows down, the EE C2Q CPU using only a framebuffer likely would not display a frame more than a few times a minute.
Then compare just power usage in watts, which is what your ridiculous post did. And I just showed that comparing power usage without looking at performance, or without including the same components on each side, is incredibly stupid. You either should have known that, or you have no business posting here at all. And the subsequent posts just showed more of that ridiculousness.
Pete
Wbmw:
But HDR/SM3.0 3DMark is an extremely easy parallel load, one that can load up 1600 stream processors, 40 texturing units and 16 ROPs. And that load is far more than 100 EE C2Qs could handle at the performance of a 4870x2. Windows at idle still has to redraw the screen at 60-120 FPS with effects; that load is not that serial either. So Amdahl's Law doesn't apply very much, and that is quite standard for GPU type loads. Besides, even if a vertex or shader program were completely serial, it's applied to each vertex or pixel, and how many of those are on a screen? Far more than 1600. Thus Amdahl's Law doesn't bite. If you didn't know that, you have no business posting about performance, absolute, per watt or per dollar.
Besides, you have to prove that an EE C2Q, including chipset and memory, on a totally serial task uses 1/200th of the energy per task completed of a 4870x2 card. Do know that even on a totally serial task the 4870x2 still runs at 750 million instructions per second; let's see an EE C2Q core get even 75 billion instructions per second, or make up the power difference on that serial task. I extremely doubt that an EE CPU on a totally serial load would have even 5 times the performance of a 4870x2 at less than 1/20th of the idle power of the 4870x2, or about 5W total system power (EE C2Q CPU, memory, chipset, VRMs, etc.). Given that, you made a silly conjecture. Typically silly, as was your previous post.
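The Amdahl's Law point can be made concrete. With per-pixel work the parallel fraction is effectively 1, so 1600 stream processors get nearly their full speedup (the 0.999 fraction below is illustrative):

```python
# Amdahl's Law: speedup of a workload with parallel fraction p on n units.
def amdahl_speedup(p, n):
    serial = 1.0 - p
    return 1.0 / (serial + p / n)

# A shader applied independently to millions of pixels is almost all parallel:
print(round(amdahl_speedup(0.999, 1600)))  # ~616x even with 0.1% serial work
print(round(amdahl_speedup(1.0, 1600)))    # 1600x in the ideal case
```

Even a tenth of a percent of serial work still leaves a speedup in the hundreds, which is why these loads make the GPU the right tool.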
Pete