10,000 ASICs costing $50 each (my WAG) amounts to $500,000. That's not including
R&D, but then again it's pointless to try and break out how much of the total R&D was
dedicated to this ASIC alone because this supercomputer is one big custom job anyway.
Holy sh!t are you kidding me? Have you ever priced ASICs for ~10k lifetime purchase? LSI
Logic told one of my previous employers to get lost when we RFQ'd for an ASIC with higher
minimum volume than this. This ASIC ain't no 845 chipset that sells in the 10's of millions
of units per quarter.
Your figure barely covers mask cost for a 130 nm ASIC. I doubt IBM would charge less than
$500 and probably a heck of a lot more. I know of a telecom firm that did a big router ASIC
through IBM Micro, over 1000 I/O, and that thing cost them over $2000 each, although I
don't know what IP that included.
So the performance of one individual Itanium II is only partially important. Look at slide 31,
where efficiency drops to 20% of CPU.
That is a generic MPP scaling issue.
Please no conspiracy here. Just Opteron is better than Itanium II, plain and simple.
Apparently not for commercial HPC products. Compare the performance and
scalability of existing Opteron systems to their Madison based equivalents. Of
course that's hard when you go over 4P.
The fact that each Opteron talks to its own router ASIC tells us that the glueless MP
features in the Opteron were useless for Cray's application. The only advantage the
Opteron has is the integrated memory controller and lower power dissipation. Future
130 nm IPF processors largely eliminate the Opteron's power advantage but those
either weren't available in time or weren't considered. Speaking of availability, protos
of both Madison and Opteron would have been available to Cray for evaluation at
the same time. It is curious that Madison wasn't in the comparison table.
On balance, I think the integrated memory controller was the feature that won it for
Opteron. Adding the pins for a double channel DDR interface to the router ASIC
might have been quite problematic. In addition, Cray and SGI are competitors.
With SGI offering the IPF based category killer in mid range HPC, Cray
probably thought it wise to get design experience with the *other* merchant
64 bit processor.
IBM has been positioning itself as a competitive fab house for some time now, so I'm
not sure why you think they are more expensive than other quality shops.
Perhaps from knowing the results of vendor replies to RFQs for many IC projects
over the years. Next stupid question?
Also, I think it's interesting to check out slide 53, which lists the relative performance on Sandia's apps. I
find it surprising that Power4 and Itanium 2 both receive lousy scores on this benchmark, but it is nice to
see that Sandia did their due diligence here. Based on other HPC apps that I have seen, I would have
thought that Power4 and Itanium 2 would achieve close to what EV7 could get, but on the other hand,
applications are not all created equal. There are simply some issues in the micro-architecture of a CPU
that will allow it to perform faster or slower in some situations, and Itanium 2 is clearly not the solution for
this project (at least, not without some major fine-tuning of the CTH and Alegra applications).
I was quite frankly shocked by the processor comparison data in this presentation. Clearly there
are factors here, real or artificial, that don't allow all the machines to perform up to their potential.
The EV7 is capable of only 2.3 GFLOP/s peak, the Itanium 2/1000 4.0 GFLOP/s peak. The huge
disparity presented here doesn't pass the smell test. I hate to sound like a conspiracy theorist but
it appears that the Sandia team cooked the tests. Why were I2 tests reported as "about the same as
P4@2GHz" instead of an actual number? Did they even bother to benchmark it? The national labs
are traditionally big Alpha boosters and this machine was architected in the bitterness and
aftershock of Compaq killing off Alpha in favour of IPF. Anyway, this is a one-off custom machine. Far
more HPC work will be done on just the Altix 3000s that sell by the time Red Storm is operational.
Alright, ring topology is out of the question, but there could still be some cross linked connections
between the CPUs. After all, it's kind of silly to take away the advantage of Opteron's on-die interconnect
by putting another piece of hardware between each and every CPU and leaving each one stranded.
Not if you don't like the way multiple Opterons talking directly to each other implement cache coherency.
Presumably by IO count you mean pin count?
Yes. It has 7 high bandwidth bidirectional links.
Why do you think this chip will be so expensive?
1) It is fab'd by IBM Micro.
2) The ASIC uses at least one HT link - that is an expensive and, for now at
least, non-standard thing to test. And the necessary phy macro likely wasn't
already in IBM's IP portfolio. IBM probably designed one for their process as
part of their NRE for this ASIC and that is usually very expensive.
3) Designing and verifying a high performance CC router is a very complex
piece of work. Consider how long the EV7 was in development and they
basically reused the EV6 CPU core. Worse yet, the design, fabrication, and
verification costs are spread over only 10k units plus spares.
As I see it, those Opterons will need to have multiple HT links to meet the bandwidth requirements
of a 10,000 CPU system. It's ridiculous to assume that there is a custom ASIC attaching to each and
every processor in the mesh. That would be silly, and a complete waste of hardware. There will almost
certainly be multiple processors on each ASIC, and they will need to connect either by ring topology or
some other type of cluster.
I think at this point you are being obstinate beyond reason. The architecture of the machine has been
clearly defined as a 3D mesh 27 x 16 x 24. You need at least 6 links in your switching fabric element
to support a 3D mesh organization and the ASIC clearly fits the bill. One of the Red Storm documents
UpNDown referenced clearly states "Each compute node processor has a bidirectional connection
to the primary communication network". There are no rings or clustering as you suggest.
WBMW, No soup for Apple! soup Nazi? Intel :)
You mean the vendor with the best products, so good that its customers will jump
through any hoop and suffer any humiliation just to have? I wouldn't quite go that far. :-P
That puts the cost of CPUs at about $2.5million, ie not all that significant in the context of a $90m project. I
don't think a discount was required, Opteron just naturally isn't all that expensive.
I have to agree with your general point. The large, high I/O count system fabric ASIC made
by IBM Micro is likely many times more expensive than the Opteron it connects to.
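To put rough numbers on that comparison, here is a minimal sketch using only figures quoted in this thread: about $2.5 million for roughly 10,000 CPUs, a $90m project, and the three per-ASIC prices posters threw around ($50, $500, $2000). Those prices are guesses from the posts, not known costs, and the names in the code are made up for illustration.

```python
# Figures quoted in the thread. The per-ASIC prices are posters' guesses,
# not known costs, so treat every result as a what-if scenario.
PROJECT_BUDGET = 90_000_000   # "$90m project"
CPU_TOTAL = 2_500_000         # "about $2.5million" for all the CPUs
UNITS = 10_000                # roughly one ASIC per CPU

ASIC_PRICE_GUESSES = {
    "WAG": 50,
    "IBM floor guess": 500,
    "telecom router ASIC": 2000,
}

def asic_scenarios(units=UNITS):
    """Return {label: (total_cost, ratio_to_cpu_price, share_of_budget)}."""
    cpu_each = CPU_TOTAL / units
    scenarios = {}
    for label, price in ASIC_PRICE_GUESSES.items():
        total = price * units
        scenarios[label] = (total, price / cpu_each, total / PROJECT_BUDGET)
    return scenarios

for label, (total, ratio, share) in asic_scenarios().items():
    print(f"{label:>20}: ${total:>12,.0f}  {ratio:4.1f}x CPU price  {share:6.1%} of budget")
```

At the $50 guess the ASICs cost a fifth of what the CPUs do; at $2000 each they cost eight times as much and take over a fifth of the budget, before any NRE.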
Imagine dividing the N CPUs in a large scale machine into two groups of N/2
CPUs each. The division itself has nothing to do with the physical organization
of the machine, it is a purely arbitrary "dotted line" that divides the CPUs into
two equal groups. Bisection bandwidth is the total bandwidth possible across
the "dotted line" for the worst case division of CPUs.
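That definition can be checked by brute force on toy networks. The sketch below is illustrative only: it tries every equal split of a small mesh or torus and counts the links crossing the "dotted line", keeping the smallest count. Multiplying that count by the per-link bandwidth (not given here) would give the bisection bandwidth. The function names are made up, and the search is only feasible for a handful of nodes, nowhere near 10,000.

```python
from itertools import combinations, product

def mesh_links(dims, wrap=False):
    """Undirected links of a mesh, or of a torus when wrap=True."""
    links = set()
    for node in product(*(range(d) for d in dims)):
        for axis, size in enumerate(dims):
            nxt = node[axis] + 1
            if nxt == size:
                # Wraparound only adds a new link when the ring has 3+ nodes.
                if not wrap or size < 3:
                    continue
                nxt = 0
            neighbour = node[:axis] + (nxt,) + node[axis + 1:]
            links.add(frozenset((node, neighbour)))
    return links

def bisection_width(dims, wrap=False):
    """Fewest links cut by any split into two equal halves (even node count)."""
    nodes = list(product(*(range(d) for d in dims)))
    if len(nodes) % 2:
        raise ValueError("need an even number of nodes")
    links = [tuple(link) for link in mesh_links(dims, wrap)]
    first, rest = nodes[0], nodes[1:]
    best = None
    # Pin the first node to one half so each split is examined only once.
    for picked in combinations(rest, len(nodes) // 2 - 1):
        half = set(picked) | {first}
        crossing = sum((a in half) != (b in half) for a, b in links)
        if best is None or crossing < best:
            best = crossing
    return best

print(bisection_width((8,)))             # line of 8 nodes
print(bisection_width((8,), wrap=True))  # ring of 8 nodes
print(bisection_width((2, 2, 2)))        # 2 x 2 x 2 cube
```

The line of 8 can be cut in half through a single link, the ring needs two, and the little cube needs four, which is why wraparound links and extra dimensions matter for this metric.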
Thanks for the correction, but it seems that the actual implementation probably has a number of node
vertices connected in a cube topology. I'm thinking that six out of the seven links in each ASIC will
connect between such vertices, with the seventh branching out to a cluster of CPUs. Therefore, the
max number of hops is going to be primarily from node to node, not CPU to CPU.
That is a reasonable assumption to control costs in a commercial product but that doesn't seem to be
the case here.
27 * 16 * 24 = 10,368 ASICs
It seems that this bloody great thing has one ASIC per CPU. To paraphrase the billionaire in "Contact",
why would the government buy a computer with 4 CPUs per ASIC when they can buy one with 1
CPU per ASIC at four times the cost in custom silicon. ;^)
YB, you need to stop using milliseconds when the correct measurements are in microseconds. That's us, not ms.
That was confusing. The only time I want to see ms used with relation to computer hardware is disk
performance and DRAM retention time, otherwise something is severely broken.
In the worst case, if the topology
really is a 27 x 16 x 24 cube, then one processor communicating to another on opposite corners of the cube
will need 27 + 16 + 24 = 67 hops in total. As Chipguy said, that is 136us
If the design is a traditional hypercube then the six "faces" connect to their opposite. E.g. top to bottom,
left to right, front to back. That is why the maximum hops is 27/2 + 16/2 + 24/2 or 34 hops. If you are
more than half a dimension from a processor then you go the other way and loop around. My 136 us
figure was for a worst case *round trip* operation (e.g. remote read) assuming 2 us per hop. The worst
case message passing delay is half of that or 68 us.
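The arithmetic in that reply can be reproduced in a few lines. This is only a sketch of the numbers as posted: the 27 x 16 x 24 dimensions and the 2 us per hop come from the thread, the wraparound links are the poster's assumption, and the function names are made up. Note that summing half-dimensions gives 33.5, which the post rounds up to 34; taking whole hops per dimension gives 33.

```python
import math

DIMS = (27, 16, 24)      # mesh dimensions quoted in the thread
HOP_LATENCY_US = 2.0     # per-hop latency assumed in the post

def hops_as_posted(dims):
    """The post's arithmetic: sum of half-dimensions, rounded up."""
    return math.ceil(sum(d / 2 for d in dims))

def torus_diameter(dims):
    """Worst-case hops with wraparound, counting whole hops per dimension."""
    return sum(d // 2 for d in dims)

def mesh_diameter(dims):
    """Worst-case hops with no wraparound: corner to opposite corner."""
    return sum(d - 1 for d in dims)

hops = hops_as_posted(DIMS)
one_way_us = hops * HOP_LATENCY_US
round_trip_us = 2 * one_way_us

print(hops, one_way_us, round_trip_us)               # 34 68.0 136.0
print(torus_diameter(DIMS), mesh_diameter(DIMS))     # 33 64
```

Without wraparound the corner-to-corner distance is 64 hops (one less than each dimension, summed), so the 67 figure overcounts slightly either way.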
Opteron is an example of a processor with a 3 dimensional architecture, Athlon64 has a 1 dimensional architecture.
No. The EV7 supports a 2D mesh because it has 4 interprocessor links (and a fifth auxiliary
link for I/O and bootloader etc) and the necessary router logic on chip. To support a 3D mesh
you need 6 data links and that is what the Cray ASIC has (6 + 1 auxiliary = 7).
On its own the Opteron isn't even true 2D because it only has 3 links.
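The rule of thumb in that reply is two data links per mesh dimension, one for each direction. A tiny sketch, using only the link counts quoted above (the function name and dictionary are made up for illustration):

```python
def max_mesh_dimensions(data_links):
    """Each mesh dimension needs two links (one per direction)."""
    return data_links // 2

# Data link counts as quoted in the post, auxiliary links excluded.
DATA_LINKS = {"EV7": 4, "Cray ASIC": 6, "Opteron": 3}

for chip, links in DATA_LINKS.items():
    print(f"{chip}: {links} data links -> {max_mesh_dimensions(links)}D mesh")
```

By this count the Opteron's three links cover one full dimension with one link left over, which is the sense in which it "isn't even true 2D".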
YB, Re: The single-hop latency is 2 ms!
I think you mean 2us.
Yes and even that is very unimpressive for a 130 nm ASIC. The 180 nm Alpha EV7 single hop latency
is more than a hundred times less - 18 ns. EV7 systems only scale up to 256 CPUs but a factor of
hundred for routing logic to go to 10k CPUs? Give me a break. That gives a worst case P2P round
trip delay of 2 * 2 us * (27/2 + 16/2 + 24/2) = 136 us. So where is the EV7 team led by that Peter
Bannon fellow working now? Can anyone refresh my memory?
Saying that Consistency is "Foolish", sounds to me like a very strange thing for a Process
CONTROL Director to say. Process Control by its very definition is all about Consistency.
My thoughts exactly. It's like the head of NOW saying she approves of
wet T-shirt contests because, after all, boys will be boys.
Hey Semi,
I think you'll get a good laugh out of this:
Critics say the Intel approach is overkill. At Advanced Micro Devices, Intel's
biggest rival, engineers say the copy-exactly process stifles innovation.
"Foolish consistency is the hobgoblin of little minds" says Thomas Sonderman,
director of advanced process control for AMD. He describes his company's
process as "copy intelligently". Mr. Sonderman says AMD prefers to innovate
and introduce cutting-edge equipment right on the factory floor. Copy-exactly
systems "are slow moving beasts. They are like aircraft-carriers", he says.
"We have changed the aircraft carrier into a destroyer. We can quickly change
to what the market demands".
http://www.realworldtech.com/forums/index.cfm?action=detail&PostNum=1495&Thread=1&entryI...
Whatever happens, it's probably the most exciting competition going on in the business world today.
AMD vs Intel is like watching a dog chasing an accelerating car. Even if the Intel car stalled the AMD
dog wouldn't know what to do when it caught up.
More exciting competition in the business world is Boeing vs Airbus, GD vs LM, or even Dreamworks
vs Disney. And in the second half of this decade, the IBM vs Intel showdown.
I don't think AMD cares at all about Itanic - let alone positioning Opteron against it: Which
position would that be?
Itanic? Is that anything like an Itanium 2? Name calling reveals far
more negative about the one who does it than the object of derision.
As far as positioning, the modest FP performance and scarcity of major
OEM support takes it out of running in the RISC replacement market.
AMD tries to position it as a Xeon killer but AMD's precarious financial
situation and lack of track record hinders it greatly.
According to a few individuals here Opteron is the gaming chip of choice
for wealthy teenagers. That's probably the most credible positioning to
achieve short term sales traction given the difficulties the chip faces but
it may exacerbate AMD's image problem in IT.
Intel has a big brass butt. The world is its china shop.
Considering the forum it would be more appropriate to say AMD enthusiasts'
dreams grow like grass but Intel is a hyperactive lawnmower set to mulch.
I'm curious. With this software, will Intel remove the IA-32 functionality from the Itanium core?
That isn't likely to happen until a third generation IPF core at the earliest. The x86 box is seriously
wired into a critical section of the chip and it would be a non-trivial job cutting it out and reclaiming
the area to reduce rectangular die size. Intel would also probably leave it in there anyways as a
backup for OSes that don't support the software based compatibility scheme.
Was there any data on how much die space this occupies?
Based on die photos it appears to occupy about 24 mm2 in McKinley and 15 mm2 in Madison.
Intel's 32-bit on Itanium Preview to Come in Windows 2003 SP1 Beta
That should give MS and Intel plenty of time to shake down any bugs or performance
potholes in it before the low cost, moderate power 90 nm Deerfield follow-on appears
in 2005. See slide 37:
http://www.lanl.gov/orgs/ccn/salishan2003/pdf/golliver.pdf
Seems to me that intel got the ones motivated to continue, and AMD got the pissed off,
disgruntled ones. Which would you rather have?
Or in the case of Jim Keller they got a motivated key designer of the EV6 but then pissed
him off to the extent that he left and joined SiByte where he helped design a dual quad
issue GHz class MIPS device with a big load of peripherals that burns only 10W.
Orwell would have loved that. AMD ministry of truth anyone?
I wonder to what extent this is a blip due to pent up demand for Banias being satisfied
or an indication of a long term trend. If the latter then Intel couldn't have had any better
timing with Banias.
As I have pointed out before, PPC970 has a 16 stage pipeline and will top at 2GHz on their
130nm process, AMD can do at least 1.8GHz on their 130nm process. Ok, you can't really
compare the two processors straight up, but you'd generally expect a processor with close
to twice the number of pipeline stages to clock a bit higher, assuming roughly the same
amount of work per stage.
Keep in mind that about half the processor was created using logic synthesis of HDL
code, the 970 can dispatch and retire up to 5 native instructions per cycle, it is only
118 mm2 despite having a 512 KB L2 cache, and it can run at 1.8 GHz at only 1.3 V.
All that, with only 6M L3 Cache. Or, will it be 12M by then.
6 MB L3. There is a 9M Madison on the roadmap but the extra 3MB
will probably improve SPECint scores by only 3 to 4%.
AMD does a better job on the speed of their SOI process.
LOL. AMD takes an existing basic design, adds two pipeline stages and SOI processing and
loses 400 MHz of clock speed and you claim that AMD does a better job on SOI than
IBM?
IBM has taken a 64 bit PowerPC processor design with 34m transistors (RS64) and
ported it from 0.22 um bulk to 0.22 um SOI and got a 22% clock speedup while keeping
the pipeline length the same.
Also, IBM has taken a 180 nm SOI design (POWER4) and shrunk it to a 130 nm SOI
(POWER4+/PPC970) and apparently got a 54% clock speed up. How much of a clock
rate speedup does Barton or Hammer represent over the fastest 180 nm Athlon? AMD
can't teach IBM anything about process or processor design.
Elmer, that would make some difference, although I'm not sure that those
improved integer scores will make it the top integer processor when released.
Let's revisit in three months, it will be interesting to see updated Itanium and
Opteron scores.
Why wait? The 1.5 GHz Madison is shipping now.
Opteron will need two speed bumps (to 2.2 GHz) and a fair bit of compiler
magic to match the current Madison in SPECint_base2k. Do you think
Opteron will reach 2.2 GHz before the leader moves ahead some more?
A little Fortune 10 IT bird told me that Madison will be bumped to 1.6 GHz
in late 3Q03 and will reach 1.8 GHz next year. IMO the latter device should
put out around 1550 SPECint_base2000. Or in the 1600 range if annual
compiler upgrades continue. It would take roughly a 2.8 GHz Opteron to
match that. Of course AMD has yet to demonstrate it can reach 2.0 GHz. :-P
Also, would *you* want someone doing benches on your beta project?
Intel let anyone buy and run benchmarks on Merced. Ahhh, I see your point.
Different system architecture, OS, and compiler. The same thing happened with
the McKinley. HP got 810 SPECint_base2k while SGI got 683. With Madison HP
gets 1319 SPECint_base2k in the zx6000 workstation and 1322 in their 2 way
server.
I think Mageek was joking there, but the 34% improvement seems a lot more realistic, so
Opteron may beat Madison by nice margin in SpecINT and come sufficiently close in SpecFP.
Hold on now. You seem to be taking an unproven relative performance gain claim for a compiler
that AFAIK has no previous official SPEC submissions and seem to be making assumptions
as if you are taking the Intel compilers as a starting point. For all we know it might need to improve
50% to match the existing Opteron SPECfp score that use the Intel compiler.
I would be a heck of a lot more impressed by the "Portland group" if it didn't hide behind relative
performance gains with previous versions of its own compiler and threw out some real scores.
However, it seems that Centrino has become a much stronger revenue stream than
Intel had originally forecasted, and has allowed them to take market share from AMD
and Transmeta in the mobile space. There seems to be the potential for an upward
surprise.
You might be on to something here.
http://biz.yahoo.com/djus/030702/0952000619_1.html
Dow Jones Business News
Laptop Computer Sales Topped Desktops in
May, Firm Says
Wednesday July 2, 9:52 am ET
By Donna Fuscaldo
NEW YORK -- Notebook computer sales surpassed sales of desktop computers for the
first time in May thanks to an increasing desire for mobility, according to a survey by
market research firm NPD Group.
Notebooks accounted for more than 54% of the nearly $500 million in
retail computer sales in May, the Port Washington, N.Y., concern said
Wednesday. That compares with January 2000, when laptops
represented less than 25% of sales volume.
All I can say is wow. If this is true and it is more than a blip it could spell a nice uptick
in Intel's ASP and margin. The Banias is a category killer in the mobile segment and
at 84 mm2 it is cheaper to make than an Athlon and probably gets 4x or 5x higher ASP
than Athlon.
Because no one is interested in IPF specifically for its x86 performance. I would have
thought the answer is so obvious it would have occurred *even* to you. Apparently not.
The Alpha was a "me too" processor line? Do you know anything about computers?
Alpha was an original and a class act. But engineering elegance and being 3 years
ahead of competitors 5 years ago counts for very little with customers today.
The simple fact was that for the amount of support Compaq was willing to put into
Alpha development it couldn't demonstrate a cost or performance advantage over the
devices on the IPF roadmap. In that sense it was a "me too" processor. Compaq
decided it didn't need to spend several hundred million dollars a year to develop new
processor designs and then spend more money fabbing them at IBM to achieve no
better performance or lower system costs than it could by buying off the shelf IPF
processors.
maybe the reason 246 and its benchmarks are delayed is that a 64-bit version of Quake III or some
other game might become available in a week or two
as Opteron speeds increase well beyond 2 GHz, and more and more games become available
in 64-bit versions.
ROFL. Un-frickin believable! That's exactly the message AMD is trying to get out - Opteron is
a gaming chip. They really hope the message reaches IT types in every Fortune 500 company
and government lab. That will sure sell those $2k 844s.
BTW, explain the business model that could possibly lead EA to develop a 64bit version of a
game with unique capabilities when there are hundreds of millions of powerful 32 bit PCs
out there and approximately zero AMD64 machines bought to play games? Heck, even Xbox
will be a far more interesting platform target than AMD64 for years to come. :-P
NaS, well, many AMD developers will work on IBM payroll, as well as the work of some
people on AMD payroll will contribute back to IBM. Should I call it IMD for simplicity?
Good idea. Calling it ABM will offend the left.
It's factors like this that allow the Opteron to do so well, even on existing 32-bit code.
So the Itanium submission ran a 32-bit JVM, eh? And it still outperformed Opteron? Well,
what does that say about Itanium 32-bit performance?
I doubt that this JVM was an x86 application. The 32 bits probably refers to the fact that
the JVM supported 32 bit addressing internally. Like 64 bit RISC ISAs, IPF does support
32 bit pointers if the designers of a toolchain choose to support it.
Tell me something, why do you think that Alpha failed?
We just found out yesterday. Compaq management knew that EV7 and EV79 couldn't
open any performance lead over McKinley and Madison. The EV8 would have been a
monster but it wasn't realistically due until 2005 at the earliest and was at great risk of
slipping further to the point it wouldn't have any lead over IPF either. Compaq decided
spending hundreds of millions of dollars a year for a me-too processor line was a poor
investment. HP and SGI had already accepted this. Sun is still in serious denial and
is imploding as a result. IBM has its own fabs and is rich enough to stay in the game.
POWERx is another me-too player but the combination of the market's natural desire
for diversity rather than monopoly and IBM's consulting business's ability to recommend
POWERx systems should sustain it for quite a while.
Ever heard of redundant elements in memory arrays? I bet defects in the CPU
core far outweigh unrepairable failures in the I2's L3 cache even though the L3
is 57% of the chip by area and the vast majority of an I2 by transistor count.
The reason for the I2 pricing scheme is simple. Some people might pay $4k
to get 6 MB of L3 instead of 4 MB while others might pay $4k to get 1.5 GHz
instead of 1.4 GHz. Why not satisfy *both* groups of people and maximize the
number of $4k chips sold. The auto and cable TV industry use the same idea
when they group different options into indivisible packages.
374 mm2 eom.