Unless your application is multithreaded or you often run multiple apps at the
same time (and I do mean *run*, not just have multiple apps open), the speedup
from a second processor will only be a few percent at best (from off-loading
minor background system processes to the second processor).
Doing compilers seriously is also a big and expensive effort. Last I heard, Intel's
compiler group was about 150 strong, and that was before the Compaq compiler
people came on board. And Intel still funds outside compiler efforts for its uPs,
like the ORC work at the Chinese Academy of Sciences. Similarly, HP has its own
first-class compiler team yet also funds the IMPACT team at UIUC.
Given AMD's size and profitability track record, I don't see how it could afford a
serious and sustained effort rolling its own compilers.
I agree, big jobs tend to gravitate to the fastest workstations, especially if
the owner is in meetings or on vacation. But no one buys a machine with
a graphics head if it isn't going to be used at someone's desk. And if it
doesn't have a graphics head then it is a server, not a workstation.
We often forget how small the high-end computing (personal and business) market is.
The top-speed $500 chips never sell more than a few tens of thousands of units against tens of
millions of value processors.
Sorry, I don't accept this for one second. Intel's manufacturing side won't allow a
new speed grade for a desktop uP to be released until it consistently bins out at
a certain percentage AC yield. I won't mention the figure I've been told, but I will
say it is orders of magnitude higher than the ~0.1% your numbers imply (a few tens of
thousands out of tens of millions) for top speed grade sales as a fraction of overall sales.
YB, high end workstations are not used like desktop computers. Usually, they exist
in completely different rooms, and users log in through a remote terminal.
Hello? That's a compute server. A workstation must be at your desk to use its
graphics head. You can run some jobs remotely on *other* people's workstations
too and direct the remotely running application to use the X server display
on your machine. But you still need your workstation at your desk.
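For what it's worth, that redirection is a one-liner from the application side. Here is a minimal sketch using the standard Xlib call; "mydesk:0" is a placeholder host name, and it assumes the X server on your workstation accepts the connection (build with -lX11):

```c
/* Sketch of the X redirection described above: a program running on
 * someone else's workstation opens *your* machine's display instead
 * of its local one. "mydesk:0" is a placeholder display name. */
#include <stdio.h>
#include <X11/Xlib.h>

int main(void)
{
    Display *dpy = XOpenDisplay("mydesk:0");  /* your desk, not localhost */
    if (dpy == NULL) {
        fprintf(stderr, "cannot open display mydesk:0\n");
        return 1;
    }
    printf("connected to %s\n", DisplayString(dpy));
    XCloseDisplay(dpy);
    return 0;
}
```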
BTW, I have used engineering workstations from the Sun-2, Apollo, and MicroVAX
days to the present. These are generally well designed from a system perspective,
with air plenums to cool the uP and other hot components without objectionable
noise. It is slapped-together PCs with mix-and-match components that are noisy.
High-end workstations are a relatively small market - less than a million units per year.
Itanium 2 might be good for about 10% this year.
Probably under 400k units a year. If IPF takes 10% this year that will be pretty good.
How does that work? Suppose you have 2 threads: one says it doesn't want HT, then
the other says it wants HT. Suppose the OS runs the HT thread first on one logical CPU, then
starts the non-HT thread. What happens?
When the OS context switches in the second thread to one of the logical CPUs, it
would see that it was flagged for non-HT execution, take the first thread off the
other logical CPU and stick it back onto the runnable thread list, and put the
processor in single-thread mode with the second thread's logical CPU active.
The 2 logical CPUs turn into 1 logical CPU, and HT
is disabled for the entire system while the non-HT thread is active?
HT would be disabled while a non-HT thread was actively running. At every context
switch the OS would pick a new runnable thread and switch the processor into and
out of HT mode as necessary.
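For the curious, here is a toy sketch of that context-switch policy. It is purely illustrative: thread_t, set_smt_mode(), and context_switch() are invented names, not any real OS or processor interface, and the "hardware" here is just a print statement:

```c
/* Toy simulation of the policy described above. All names are
 * invented for illustration; no shipping OS exposed this interface. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    const char *name;
    bool allow_ht;   /* per-thread flag: may it share the physical core? */
} thread_t;

static bool smt_on = true;  /* package starts with both logical CPUs active */

/* Stand-in for the hardware control that moves the package between
 * dual-logical-CPU (HT) mode and single-thread mode. */
static void set_smt_mode(bool enable)
{
    if (enable != smt_on) {
        smt_on = enable;
        printf("  [hw] %s HT mode\n", enable ? "entering" : "leaving");
    }
}

/* Called at every context switch. When the incoming thread is flagged
 * non-HT, a real kernel would also pull the sibling logical CPU's
 * thread and put it back on the run queue, as described above. */
static void context_switch(const thread_t *next)
{
    set_smt_mode(next->allow_ht);
    printf("  running %s\n", next->name);
}

int main(void)
{
    thread_t worker = { "worker (HT ok)",   true  };
    thread_t game   = { "3D game (non-HT)", false };

    context_switch(&worker);  /* HT stays on */
    context_switch(&game);    /* drops to single-thread mode */
    context_switch(&worker);  /* HT restored */
    return 0;
}
```

A real OS would fold this into its scheduler rather than bolt it on, but the mode toggle at context-switch time is the essential idea.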
Just heard from an informed source that Semiconductor International magazine will
name Intel's Fab11x Top Fab for 2003 in its May 2003 issue.
Is the magazine going to publish a floorplan centerfold and a list of Fab11x's turn-ons
and turn-offs?
Chip, you are painting a nightmare scenario if you are saying that the functional groups are
being switched back and forth between K8 and K9 as required.
Not groups; individuals or small temporary teams. I have never worked at a company
where you got to wash your hands of a piece of design work just because you
moved on to the next project. You design something, you *own* it.
Yeah, he's my evil twin, what do you think?
LOL, what a coincidence. Some on the TMTA thread think I am wbmw's
evil twin.
To say that K9 work can be done independently of K8 would be incorrect. I don't think
you are saying this, but I wanted to clarify.
No, but certain specialties have little pullback after a certain point. Architects might
be pulled back from K9, etc., for a short time to evaluate whether to increase the number
of X, Y, or Z resources, and by how much, in the K8 shrink, but for the most part their
involvement with K8 ended years ago. Logic guys would get called back to help with
functional errors, but for the most part the logic design doesn't change in a shrink.
Where K9 will run into resource issues these days is with circuit and layout people.
Circuit guys will be needed to raise K8 frequencies and for circuit tune-up and voltage
related circuit mods in the K8 shrink. Layout, verification, and test guys would be heavily
committed to K8 respins and K8 shrink and proliferation.
Maybe a tapeout in two years or less. But we all know that is still 12 to 18 months from
shipping commercially, depending on the complexity of the new design and how fast
debug goes. My personal expectation is that K9 will be a significant departure from the
K7/K8 mold.
That means K9 is still in the high-level architecture planning stages,
assuming they already settled on ideas and features.
The K8 has been in silicon for more than a year. That means its micro-architecture was
set in stone probably 3 years ago or more. Are you seriously claiming that the "high level
architecture" of K9 is still only in the "planning" stage after 3+ years of development?
AMD shareholders better pray that you are wrong. And Intel shareholders better pray
that Intel management isn't as complacent about AMD's product development
cycle times as you are.
Itanium does seem to be a niche product. The niche is more than scientific work, though,
because HPQ is moving its customer base for workstations and NonStop
servers to Itanium.
You seem to have a gigantic blind spot. HP is moving its big and lucrative
commercial application server family called Superdome to IPF.
Sorry, I don't know which OSes do this or the scheduler policy of the ones
that do. I am just reporting what the silicon can do.
As far as which programs you would want to disable HT for while they run,
single-threaded 3D games would probably be the most common case for
PCs.
Microprocessor development projects tend to overlap (pipelined would be a
good description) even at the smallest vendors simply because of the different
skill specializations involved at different stages and the 3 to 5 year development
cycle time. First come the micro-architects, then the logic guys, then circuit and
layout guys, and then the verification and test guys. The process isn't as linear
as that suggests; that is simply the order of critical tasks. For example, test guys
would generally be involved through the entire process but they would be most
critical after tape-out and during verification, characterization, and release to
manufacturing.
Right now I would presume the micro-architects at AMD have long since completed
the K9 and are busy defining and simulating K10 and kicking around ideas for
K11. The K9 is probably well along in detailed logic and circuit design with some
experimental layouts for critical elements done for feedback to the circuit and logic
people. Some circuit and most layout and test guys are probably still busy cleaning
up odds and ends for 130 nm K8 and getting ready for 90 nm K7 and K8.
One difference in this situation is that HyperThreading is either always on or off. You choose at boot time.
Wrong. It is selectable on a per-process basis. An OS that properly supports HT should allow you
to set an attribute on each application as to whether you want to allow it to run in multi-threaded mode
or force single-thread mode.
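To sketch the shape of such an attribute (entirely hypothetical; the ht_policy enum and set_ht_policy() are invented names, not any OS API of the day):

```c
/* Hypothetical per-process HT attribute, per the claim above. */
#include <stdio.h>

enum ht_policy { HT_ALLOW, HT_FORCE_SINGLE };

struct process {
    int pid;
    enum ht_policy ht;  /* consulted by the scheduler at each context switch */
};

static void set_ht_policy(struct process *p, enum ht_policy pol)
{
    p->ht = pol;
}

int main(void)
{
    struct process game = { 1234, HT_ALLOW };

    /* A single-threaded 3D game gains nothing from sharing the core,
     * so force single-thread mode while it runs. */
    set_ht_policy(&game, HT_FORCE_SINGLE);
    printf("pid %d: %s\n", game.pid,
           game.ht == HT_FORCE_SINGLE ? "force single-thread" : "allow HT");
    return 0;
}
```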
Is it possible to just take the Itanium out of the socket, plug in a Madison, and run?
Yes. This was demonstrated at a recent IDF. In that particular system they didn't
even have to power down the entire box or reboot it; it was hot-swap upgradeable.
This x450 comes in with a 515 2-tier SD SAP score, only 15% slower than the 600 we
expect from a 4-way 1.6 GHz Opteron in the same benchmark. I'm afraid AMD will need
a 1.8 GHz Opteron to beat this server by more than 20%.
The proof is in the pudding, but you are probably right.
But if AMD's new 130 nm copper SOI CMOS server uP in its sweet-spot system size
couldn't match, let alone beat, Intel's soon-to-be-replaced 180 nm aluminum bulk CMOS
server uP on a commercial workload like this, it would be a pretty sad statement indeed.
BTW, how fast an Opteron would you need to hit 860 SD SAP?
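If you assume the score scales linearly with clock (a generous assumption for a commercial workload, since memory and I/O don't speed up with the core), the back-of-envelope arithmetic looks like this, using only the figures quoted in this exchange:

```c
/* Naive linear-scaling estimate; illustrative only. */
#include <stdio.h>

int main(void)
{
    const double base_score = 600.0;  /* expected 4-way 1.6 GHz Opteron SD SAP */
    const double base_ghz   = 1.6;
    const double target     = 860.0;  /* the SD SAP score asked about */

    printf("naive clock needed: %.2f GHz\n", target / base_score * base_ghz);
    /* prints 2.29 GHz -- far above any announced Opteron speed grade */
    return 0;
}
```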
Follow the thread. The McKinley processor was introduced 9 months ago.
I guess the time to beat is then either:
2 years from the introduction of the Opteron family
or
9 months from the introduction of the Opteron2
whichever comes first.
Assuming AMD survives long enough to see either milestone
or
goes under.
Whichever comes first.
IBM's new '450 server is based on the Itanium 2, aka McKinley. It was introduced
about 9 months ago.
I am actually pleasantly surprised by the failure of other major uP/system/compiler vendors
to bother duplicating Sun's stupid compiler trick on 179.art. The nature of the optimization
is apparently well understood. It seems to be just general enough not to run afoul of SPEC
rules governing customized optimization for SPEC source code. I would have loved being
a fly on the wall during the first submission review discussions though.
Everyone who is serious about HPC knows about the invisible asterisk on Sun SPECfp2k
scores and takes it into account. Sun had better hope people don't get too used to discounting
UltraSPARC FP benchmark scores by 20% in the time it takes to bring out SPEC CPU 2005
(or whatever the successor to CPU2k is called) or it will have deservedly backfired on them.
Maybe we should start a list of rumors believed to be facts by the AMD crowd.
#1 Intel threatens customers who consider AMD products.
#2 Intel's compilers are designed only to produce good SPEC scores.
Somebody help me out here. I know there are plenty more...
#3 The Pentium 4 was designed to reach high clock rates at the expense
of absolute performance.
#4 The McKinley was entirely designed by HP (which is why it is so improved)
#5 The geniuses left the Alpha design team for AMD in 1997 or earlier,
leaving only morons behind.
#6 Intel compilers deliberately generate code calculated to run slow on AMD uPs.
#7 FP intensive code optimized for P4 depends entirely on SSE2 and the Hammer
will see similar speedup.
Don't take my word for it, download the design rules for any foundry logic
process and I bet dollars to donuts that they have specific rules for laying
out laser fuses.
Sure. Takes about 10 um2 per fuse. I can't think of the last chip I worked on that
didn't use laser fuses for device configuration, debug, and, when applicable,
redundancy control.
One-time programmable with laser fuses. Can do that in any process without
any changes or cost adders.
I read somewhere that IBM is #1 server vendor, but I guess it depends on how you
count. A mainframe can be considered to be a server, and that is how IBM probably becomes #1.
The number of S/390 and zSeries servers sold annually is probably in the low 4 figures,
and the associated hardware revenue is probably a couple of billion. So I don't think it
affects the server rankings by units at all, or the rankings by revenue by much.
Where IBM makes its big money on mainframes is in highly leveraged drag-on software
and services and I don't think this shows up as server revenue.
Consider the infinitesimal benefit of storing config data in flash. Now consider the
cost of adding flash capability to a performance logic process. Intel doesn't even
want the trouble of thick oxide devices for I/Os in its MPU processes. Flash? Forget
about it.
BTW, go look at the comments made about integrating flash when Intel disclosed
its recent XScale telecom MCM-based component with separate dice for logic, flash,
and, IIRC, RF.
Could you name some other times? The Inq sometimes gets fed duff information, but
I don't think you can name a time when they just made stuff up. They publish their stuff in good faith.
They recently suggested that Intel stores uP configuration info for things like enabling HT
in P4s in on-chip flash. This is an absurd suggestion they seem to have pulled out of
thin air. It is easy for Magee and most alleged contributors to publish in good faith simply
out of a high degree of technical ignorance.
I read the Inquirer and the Register every day because they often catch a whiff of interesting
stuff before anyone else. But both are basically gossip sites with low batting averages and
a propensity to focus on trivia at the expense of the important but not unexpected. These
sites work best for readers with a sensitive BS detector backed by a lot of technical
knowledge and industry experience.
I'm wondering why you don't want to believe that the Opteron was so thoroughly redesigned. It's not
like they haven't had time!
I for one believe that Opteron was thoroughly redesigned - at the circuit level. That is a given when
moving from bulk to SOI. The real issue is at what other levels it was redesigned. Much of the
pipeline, functional unit characteristics, and basic floorplanning seems straight out of Athlon
(which is quite reasonable; Athlon is a sound CPU design that was starved for bandwidth for most of its
life, and AMD ain't exactly rolling in dough enough to change things just for the sake of change).
As far as HDL and logic re-use from Athlon, we can only speculate.
"The quote that you linked seems to suggest a lot of rework for nothing, and I don't buy it."
They increased the number of pipeline stages by two (from 10 to 12), or 20%. That would mean a general re-balancing
of all the resources, including the execution units. If not, they become a major chokepoint and a likely
target for a re-work of the core. I don't think you would find any competent design team that would do
something like that...
They changed the front end to improve IPC and also had to increase the complexity of the x86 decoding
logic for x86-64. These changes are in the part of the pipeline where the two extra stages were added.
Add that to the fact that the execution back end of an x86 processor is unlikely to be the timing critical
part of the device and there is no compelling reason to assume that there was a lot of timing slack
that needed to be re-apportioned to the pipeline past the issue stage to balance the design.
The Opteron's clock lagging behind the Athlon's despite SOI processing seems to confirm that most of the
logic evaluation time of the two new stages was used up rather than redistributed to improve
frequency scalability.
And I posted some pretty definitive links stating that Samsung was shipping Alphas in late 1997.
It was, two variants of EV56, the 21164 and the 21164PC. But the subject
of this thread is the EV6. Samsung did not manufacture the 0.35 um EV6.
It all boils down to my claim that the Alpha design team extended the EV5
microarchitecture with extra functional units and OOO capabilities to create
the EV6 and got it to clock as fast as the EV5 in the same 0.35 um process
with the same number of stages in its basic execution pipeline.
I offered proof in the forms of citations of two ISSCC papers to support my
claim of identical process between 0.35 um EV56 and EV6. To disprove
my claim in favor of yours you have to offer even more authoritative evidence
for all the following points:
1) Samsung manufactured EV6 in 0.35 um at all
2) Samsung manufactured EV6 in a 0.35 um process different from and
faster than DEC's 0.35 um process used in its/Intel's Hudson fab
3) Samsung manufactured EV6 in a 0.35 um process different from and
faster than the 0.35 um process it used to manufacture EV56 since
Samsung never shipped faster EV56 parts than did Hudson.
You haven't shown ANY of these three points to be true, let alone all three.
You have instead gone on a wild goose chase of obfuscation. You are a
sophist and a liar. I will not post on this subject again.
Re: The 0.35 um EV6 was shipping commercially since 4Q97.
From Korea. Your article was talking about the EV7.
Wrong. The EV7 is made by IBM. Samsung has never had a license
for EV7. This would have been the 0.18 um EV68.
Try this on for size, I think it is pretty definitive:
http://news.com.com/2100-1001-204819.html
"Samsung unveils 700-MHz chip
October 30, 1997,"
"Alpha chips shipping from Digital currently run as fast 600 MHz, though Digital is expected to
ratchet up the clock speed of its own Alpha chips in the coming months. Samsung licenses
Digital’s Alpha processor technology."
"Samsung says it plans to begin commercial production of the Alpha chip in the second half of
1998, according to an article in the online edition of Nikkei Business Publications, which cited
a report in the Maeil Business Newspaper of South Korea."
"The Samsung chip is made on an advanced .25-micron production process, a state-of-the-art
production technology now being adopted by many chip makers. The 64-bit chip has a
whopping 15 million transistors."
Now what about all those 0.35 um EV6's made in Hudson for DEC by Intel and shipped in
systems ranging from 500 to 575 MHz (and up to 600 MHz in OEM gear) *before* 2H98?
The Samsung part, EV67, shipped in DEC/Compaq 667 MHz systems.
But after DEC got into trouble, slashed capex, then sold out to Compaq, the Alpha
21264 had to be fabbed by Samsung.
Here:
http://news.com.com/2100-1001-234300.html?legacy=cnet
"By Erich Luening
Staff Writer, CNET News.com
December 13, 1999, 7:00 AM PT
update Compaq Computer has chosen Korea's Samsung Electronics to manufacture its
newest Alpha processor in a deal worth $500 million."
The 0.35 um EV6 was shipping commercially since 4Q97. Where did those EV6 come
from? Did Samsung manufacture them in late 2000 and send them back 3 years in a
time machine? No they came from Hudson, made in DEC's 0.35 um process by Intel.
Samsung manufactured 0.25 and early version 0.18 um EV6s.
The process technology used for the 21264 was not the same as that used
for the 21164, which is what you are claiming.
At 0.35 um the processes were the same.
Look at ISSCC 1997 papers FA 10.6 (EV56 processor) and FA 10.7 (EV6 processor)
and compare the 0.35 um process description in Table 1 of both papers, they are
identical.
Look at Microprocessor Report Nov 17, 1997, article entitled Digital Sells its Chip
Business:
"Under the agreement, Intel will build Alpha chips for Digital, since the latter will
no longer have a fab. Digital will continue to be responsible for developing and
marketing Alpha microprocessors; Intel's role will be solely as a foundry."
"In the short term, the Hudson fab will continue to run Digital's 0.35 um process,
and in the future, DIgital's 0.25 um process."
Find a link that lends credibility to your story or go away.
I have given two independent sources for my claim. Now you go away liar.
Look on slide 2 of the pdf. Focus on the heading
that says, "Foundry Fab."
http://www.decus.de/slides/sy2002/18_04/3H05.pdf
Same fab and same 0.18 um process, different owner.
Intel took ownership of the DEC fab as part of the
IP suit settlement and provided foundry services
back to DEC.
Previously you said:
But the move from EV5 to EV6 was also the move
from DEC's end-of-life FAB to leading edge foundry
technology from Intel and Samsung
That is wrong. Alpha was never fabbed on an Intel process.
I'm sure you just made an honest mistake. I await your apology.
PS - You might look into lining up a link before hurling insults....
Like I said before, are you really badly misinformed or
do you just make this stuff up?
Wrong. The last EV5 (EV56) and first EV6 were both made in DEC's 0.35 um process
in its Massachusetts fab. Samsung made the 0.25 um EV67 and the first generation
of 0.18 um EV68. The second generation EV68 and EV7 are made by IBM.
BTW, are you just badly misinformed or do you deliberately make things up?
Sorry to disagree, but that's not what I was told by the product engineers in Folsom. Maybe they bought
some, and I don't remember the mix, but most were made in house.
Just the Xeon full speed cache SRAM. Intel bought the chips for desktop Pentium IIs and IIIs.
http://www.e-insite.net/ednmag/archives/1998/110598/23cs.htm
"When Intel moved its CPUs from Socket 7 (Pentium) to Slot 1 (Pentium II) packaging, it began buying the
SRAM chips itself for integration onto the CPU modules. Previously, the SRAMs came on separate memory
boards, which PC manufacturers purchased and installed, along with the CPU and other components and
add-in cards, on the motherboard.
Only a few SRAM companies supply Pentium II L2 cache. Intel does not identify them, and nondisclosure
agreements prohibit the vendors from identifying themselves, but an often-repeated rumor indicates that
the companies are Mitsubishi, Motorola, NEC, Samsung, and Toshiba."
That was tongue in cheek. The Klamath, Deschutes, and Katmai processors
used commercial SRAM that Intel bought from specially qualified vendors for
the module level L2.
Intel made its own custom stand-alone SRAM devices for PPro, Merced, and
some models of Xeon but never sold them by themselves, only packaged
with a CPU.
Since the Itanium doesn't support OOO execution and (IIRC) doesn't have to decode instructions into
micro-ops, it doesn't do a lot of the work done by the P4 (and Athlon and PIII). If you remove the P4's
decode stages, and remove the stages that order, re-order, and resolve OOO instructions, how
many stages are left?
That's true. The Alpha guys took the world's fastest (in-order) processor at the
time, the EV5, doubled the number of integer units, added the most aggressive
OOO execution machinery the computer world had ever seen, and kept the clock
rate the same.
And to do this they had to add ZERO more pipe stages.
It already is.
BTW, it probably exceeded the SRAM bit/year shipping rate of Motorola, Cypress, IDT,
and NEC put together sometime in the ramp of Coppermine.
I just hope no one convinces the feds to force Intel to unbundle its SRAMs and CPUs
and sell them separately. D'oh!