This isn't anything like the production systems, with final BIOSes and dual memory channels
that are due to be released next Tuesday.
Of course not. The Opteron will be released on Tuesday. This review was of an
Athlon 64 with a 1 MB cache.
AMD isn't going to release this *monster* until the fall. No doubt Intel execs will
bail out of their INTC shares before the competitive shock wave of a PR 2800
AMD processor with at least a 3x higher manufacturing cost than an Athlon hits
home, ROFLMAO.
Either way, it looks like a year from now AMD plans some 90% of wafer outs to be Hammer.
Why do you say that? About a year ago AMD execs were saying the K7/K8 crossover wouldn't
occur until mid-2004. Since then the intro of A64, the high volume Hammer, has been delayed
for whatever reason by almost half a year. That's hardly compatible with pulling in the K7/K8
crossover as you claim.
I don't see anything wrong with their technology.
Besides inserting an extraneous layer of software between the instruction
set and the transistors?
Someone should scoop them up, but I don't know who.
Hint: the French call them "caninettes".
http://www.dogsinthenews.com/issues/0103/articles/010313a.htm
I am intrigued. If they are skipping 90nm, AMD may be playing a very powerful trump card.
This is nonsense. Moore's law works because the entire industry busts its butt to keep
up with this self-fulfilling prophecy. AMD can't get to 65 nm any faster by skipping 90 nm;
the equipment, masks, resists, etc. won't be there. If AMD chooses to skip 90 nm and go
straight to 65 nm then it will be selling 130 nm devices for twice as long against
competing 90 nm devices, and it won't survive to see 65 nm.
If the power dissipation of Madison is not such a problem, why would they release a castrated and
pretty slow Deerfield?
Pretty slow? I guess that's in the eye of the beholder. Compared to the SunBlade 150 I had on my
desk until recently, I'd take a Deerfield box over it in a millisecond if the right software is there.
As for why, that is quite obvious. Why was the 386SX marketed? Or the 486SX? Or the Celeron?
To establish lower price/performance points to sell more silicon and expand market share. The
Deerfield will allow Intel to provide an IPF processor at a three-figure price for systems that sell at low
four-figure prices while continuing to sell its big brother (probably the same die, that's the funny part)
for four figures.
And why does Sun have this requirement that the CPU used in blades consume no more than 16 W?
Their mech eng and system packaging guys can't design blades worth crap?
And because so much area is SRAM, the normal defect density predictions are likely to be too conservative.
Quite so. The Madison L3 architecture can fully repair up to four severe defects without loss
of capacity. From a susceptibility point of view Madison is comparable in effective area to
Opteron if not a bit smaller. Add in the effects of relative process maturity, BEOL differences
and parametric yield problems unique to SOI and I wouldn't be surprised if Madison yielded
much better than Opteron over most of their respective lifetimes.
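The redundancy argument can be put in rough numbers with a standard Poisson yield model. This is only an illustrative Python sketch; the die area, defect density, and repairable-defect count below are made-up figures for the sake of the arithmetic, not Madison's actual numbers:

```python
import math

def yield_poisson(area_cm2, d0_per_cm2):
    """Classic Poisson yield model: Y = exp(-D0 * A)."""
    return math.exp(-d0_per_cm2 * area_cm2)

def yield_with_repair(logic_area, cache_area, d0, repairable=4):
    """Logic must be defect-free; the cache tolerates up to
    `repairable` defects thanks to row/column redundancy."""
    lam = d0 * cache_area  # expected defect count in the cache
    cache_ok = sum(math.exp(-lam) * lam**k / math.factorial(k)
                   for k in range(repairable + 1))
    return yield_poisson(logic_area, d0) * cache_ok

# Assumed numbers: a 4 cm^2 die that is 60% cache,
# defect density 0.5 defects/cm^2.
die, cache_frac, d0 = 4.0, 0.6, 0.5
naive = yield_poisson(die, d0)
repaired = yield_with_repair(die * (1 - cache_frac), die * cache_frac, d0)
print(f"no redundancy: {naive:.1%}, with cache repair: {repaired:.1%}")
```

With these toy numbers, repairable cache more than triples the effective yield, which is why "mostly SRAM plus redundancy" is such a favorable position for a big die.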
So Opteron will be behind Madison this summer, as we don't expect anything above 2 GHz.
Keep in mind that Madison will also be scaling up in clock rate over its life time too and
a 9.0 MB version is expected next year.
But please pay attention that Madison has a 6 MB L3 cache. It's not a big shame to lose
to such a monster. Intel won't be able to make it in volume, while Opteron may be (not enough
info to claim, but anyway) quite popular. At least the die size is much smaller.
The Madison is a large chip (about 2x bigger than Opteron) but most of the area is L3
cache (~60% IIRC) and is protected by redundancy. And it is manufactured in a mature
130 nm bulk CMOS process, so I have little doubt that Intel can turn these puppies out
with good yield and in whatever quantities the market will absorb. OTOH, the Opteron is
made with an SOI process with more layers of interconnect, so I think the question of
manufacturability is more of an issue for AMD. The slip of A64 to the fall and the purchase
of process help from IBM suggest it is an issue which hasn't been completely resolved yet.
Bottom line is that CPUs either have trouble booting, or they're perfect, or they corrupt data.
LOL. Thanks for providing me all I need to know to judge your level of "knowledge"
of semiconductor engineering. Or any kind of engineering for that matter.
It is more likely a new speedpath recently discovered that needs to be added to the vector set.
I think the fact that this part doesn't set a new high for CPU frequency tends to argue
against this cause. Intel has been shipping a slightly faster part (3.067 GHz) for quite
a while now. I think it is more likely related to signal integrity or long term reliability
(i.e. overshoot or undershoot related stress margins) on the FSB interface. The fact
that it was caught so late suggests the issue only manifests itself on specific OEM
designed motherboards.
I agree it is racist but not in the way you claim. It is in fact an admission
that black fighters have dominated boxing since the 50's and the few
pro white fighters to get top billing since were overhyped tomato cans. In
a sense there is an air of desperation, of cynicism, of being a long shot
with the term great white hope. As such it is an interesting choice for
describing the Opteron.
Can you point to any reports of data corruption involving the 3.0 GHz P4? Once
again you demonstrate your fleeting acquaintance with the truth.
All we know is that Intel halted shipments. There are a variety of reasons that
this may have been done. But at this late stage of the game it is IMO most likely
because the ATE guard band on some minor data sheet spec had been set too
aggressively and some samples showed up that failed this spec for one of the
operating condition corner cases.
If so then the delay should only be a few weeks to rescreen parts. If this is
not the case then we can expect the delay to be much longer. All we can do
is wait and see.
If MSFT sees dollar signs and a possible revenue stream with AMD's
Opteron this could be a strong signal to businesses that Opteron is a
strong competitor and alternative to Itanium.
And this is enough to convince businesses, the vast majority of which
wouldn't touch 3 figure priced Intel compatible AMD-based PCs, to buy
4 or 5 figure priced non-Intel compatible Opteron based small to mid
sized servers? Sorry, I don't see it. AMD faces a long, slow, uphill
battle to gain credibility.
Hmmm. Why is it, do you think, that Intel's attempt to market the Centrino as a premium part has failed?
It is just indicative of the huge pricing discounts Intel provides to Dell to bribe
them away from adopting Opteron and Athlon 64. ROFL.
Couldn't possibly be due to Dell's ability to squeeze margins and efficiencies
beyond any other PC vendor while still making money. :-P
So why doesn't AMD talk it up more?
Perhaps because it is a server chip. When your customer is a business or
institution instead of a teenage male gamer it is pointless to engage in a lot
of silly braggadocio. Assembling a decent list of OEMs and ISVs that support
your chip is far more impressive.
Bull. Products can and will continue to receive speedpath optimizations, process improvements, and minute design changes, all within the lifecycle of the product.
Some vendors like Intel can do that, others don't have the money, time,
design resources, or simply the same economic incentive to continuously
respin a working and released device.
In the case of AMD I agree that it also engages in spin for bin. My comment
wasn't to argue that it doesn't do this but rather that if AMD did have a 2.5 GHz
part in 130 nm IMO it would have *already* been squeezed to death and it
wasn't getting any faster until 90 nm.
Chipguy, your theory may work in a vacuum
It's not "my theory", it is the way business works. Graph the performance vs time of Intel desktop
processors, Sun's fastest server/workstation, or IBM mainframes. You will see a remarkably
smooth slope, not a staircase.
but I don't think AMD will be competitive against Intel with anything less than 2GHz. The notion
that they have 2.5GHz in their back pocket, which is frequently propagated by AMD hopefuls, does
not hold any water in a competitive environment.
What if they did have it and released it now? Realistically it would be the end of the road for
the 130 nm part. It would cause a splash but Intel and others would respond to it immediately
with Xeon price cuts, increased marketing, etc. AMD would then be basically dead in the water
until their 90 nm process ramps. That would provide a stationary target for Intel to tee off against
for months if not years. No company wants to lose the initiative and be seen to be stalled while
others catch up and pass.
Sgolds, everyone releases a processor at the top speed as soon as it's available and tested.
Then, as speed path optimizations and process improvements allow, that speed can be increased
incrementally over time. If 2.5GHz samples are already available without special means, then I see
no reason why they won't launch at this speed. There is absolutely no reason for AMD to hold back
at this point. They absolutely need Opteron at its top speed to compete in this market.
I am surprised to hear this from you. You know as well as I do that major step discontinuities in
performance vs time are hugely disruptive to sales, manufacturing, product positioning, etc. The
practice of holding a new product back early in its life cycle to allow smooth and predictable
"upgrades" and "mid-life kickers" later on has been a critical principle in the computer
industry since long before ICs.
If uPs were released at their maximum potential we would have a situation where the performance
of any vendor's product line would look like a staircase with sloped landings. Each process shrink
would produce a huge step jump while design and process improvements would produce a much
gentler slope until the next shrink. Can you imagine the disruptions and wild swings this would
cause to customer buying cycles, manufacturing process transitions, vendor revenues?
That being said I don't buy the Opteron-is-now-running-at-2.5-GHz rumors. It may or may not approach
2.5 GHz at the end of its lifetime in 130 nm but that is irrelevant for the launch and ramp over the next
few quarters.
Talk about a recipe for disaster! No, they will release at safe speeds and step it up every few months.
Step [Opteron] up every few months? Do you have any idea the kind of evaluation and
qualification exercise a server vendor puts any new uP release, including speed grade
bumps, through before releasing it in production hardware? Practices that work in the
PC world often don't apply to the server world and can actually backfire on the vendor.
Mageek started a rumour that a 2.5 GHz Opteron already exists:
LOL, I wouldn't bet a wooden nickel based on this one. Didn't mad Mike
infamously predict the Opteron would be released a year and a half ago?
Either Magee or Dvorak would make an admirable replacement for the
Iraqi minister of information.
neye_eve, should AMD be successful in cornering 64-bit computing then yes, there will be
books written on the development of Hammer. If they don't then no one will care.
Everyone loves a winner. If you want an insightful history, AMD has to win.
I disagree. Probably the most famous and successful book ever written about computer
engineers is "The Soul of a New Machine" by Tracy Kidder. It was about the development
of the Data General Eclipse MV/8000. Yet both the machine, and DG subsequently,
would have to be considered failures.
In more recent times there was an excellent and detailed book by IEEE press on
the design of the AMD K6 processor. IMO at best K6 was a marginal success either
technically or commercially, but I am sure there are other opinions here on the matter :-P
If AMD succeeds and corners the commodity 64-bit market then all of this will be forgotten. If they don't, it doesn't matter.
Not to be pedantic but people should try to keep a broader view. I think you mean "if AMD succeeds and
corners the 64 bit desktop uP market". MIPS and its licensees and semi partners already have the 64 bit
commodity uP market nicely sewn up and the Hammer family is far too expensive and power hungry to
make even a small dent there.
Why would you ever expect speed degradation? What critical processor timing path would run through a peripheral?
I wouldn't. That's the point.
Yet you still advanced the theory that the on-chip peripherals could be affecting Hammer's
clock rate on several occasions IIRC.
Your acknowledgement that this idea is nonsense seems like a clear admission you were
deliberately and knowingly spreading FUD about Hammer. No surprise considering where
your loyalties obviously lie, but it is nice to get it on record for posterity.
Hammer is a K7 with 2 more pipeline stages, a memory controller and HT ports. The
additional pipeline stages should have added to the frequency.
Good argument if the Hammer *was* a K7 with 2 more pipe stages. But it isn't. The extra
pipe stages are in the front end which have been significantly reorganized to 1) raise IPC
by reducing structural hazards etc, and 2) to handle yet another layer added to the instruction
set. These changes could very well use up most of the logic eval time of the extra two pipe
stages. FWIW my own WAG is Hammer will eventually make it to 2.4 GHz, maybe a bit
higher.
I gave you an example of Intel adding a memory controller to their CPU core plus a
graphics controller and seeing no speed degradation whatsoever compared to their
mainstream cpu.
Why would you ever expect speed degradation? What critical processor timing path
would run through a peripheral?
Please... Don't point me to a textbook, customers don't buy textbooks. Show me a
product on both bulk silicon and SOI where SOI shows an advantage.
Compare the IBM RS-64 PowerPC processor implemented in CMOS7S
(0.22 um bulk CMOS) and CMOS7S SOI (0.22 um SOI). Both contain 34M
transistors, both have the same Leff, both have the same 6 Cu stackup,
and both are 139 mm2 in size.
The bulk version runs at 450 MHz while the SOI version runs at 550 MHz.
The SOI version dissipates only 9% more power than the bulk version
despite clocking 22% faster. If I am not mistaken this 64 bit uP powered
the IBM AS/400 mid range server circa 1999 and more than a few have
been sold.
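The figures quoted above make the SOI advantage easy to quantify. A quick Python check of the bulk vs SOI RS-64 numbers (450 MHz vs 550 MHz, +9% power) shows the SOI part actually spends less energy per clock cycle:

```python
# Figures quoted above for the RS-64 in CMOS7S (bulk) vs CMOS7S SOI.
bulk_mhz, soi_mhz = 450.0, 550.0
power_ratio = 1.09  # SOI dissipates 9% more power than bulk

freq_gain = soi_mhz / bulk_mhz - 1                     # ~22% higher clock
energy_per_cycle = power_ratio / (soi_mhz / bulk_mhz)  # relative to bulk

print(f"clock gain: {freq_gain:.0%}")
print(f"energy per cycle vs bulk: {energy_per_cycle:.2f}x")
# clock gain: 22%; energy per cycle ~0.89x of bulk
```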
Where is the evidence to support this?
LOL, do you deny the fundamental physics that SOI eliminates the PN junction
capacitance associated with the source and drain active areas of a MOSFET?
Look at the capacitance breakdown on a typical net and do the math.
Does this mean that a) AMD didn't redesign their circuits? or b)AMD didn't
know what the hell they were doing?
There are hundreds of possible reasons why K8 isn't yet clocking as fast as
the latest K7s. Have you considered something as simple as this is the first
implementation of the K8 microarchitecture while AMD has re-implemented
the K7 microarchitecture around half a dozen times? Maybe they've learned
something along the way. Engineers tend to do that.
IBM also seems to be saddled with an SOI process that, at the 0.13 micron level, cannot
get speeds beyond 1.4 or possibly 1.8 GHz for their Power4/5/Apple products.
That is an oversimplification. The Power4 processor family used a design methodology
much closer to an ASIC flow with embedded memory than the full custom design flow
used by Intel, HP, and others.
I have talked to circuit designers on a high end uP design team (non-IBM) about their
experience with IBM's bulk and SOI processes. If you redesign all your circuits from
the ground up, and you know what the hell you are doing, then you can realize 10 to
15% higher clock rates with SOI (IBM claims higher speed ups but they use much
less dynamic logic than other design teams and thus start from much further behind).
The thing most people don't realize in these SOI vs bulk religious wars is there isn't a
one size fits all solution. It is not a contradiction when Intel says it can best meet its
goals at say 130 nm with bulk CMOS while IBM says SOI is just the ticket at 130 nm.
The difference is in manufacturing volume. Intel makes so many chips that engineering
costs are spread out thinly and manufacturing cost is a major chunk of overall chip cost.
For IBM, its Power4 chips are manufactured in such small numbers that even if SOI
doubles the silicon cost of each uP, it is still far less expensive than an extra $100M
in design effort and speed path tuning respins (especially for a server chip that has a
long and expensive verification and requal cycle).
TIA for any clarifications/explanations
The "magic memory" I spoke of isn't so much zero latency memory as the removal
of the non-cache memory component of CPI. That not only removes latency overhead
but also bank conflicts, page misses, precharge delays, cache line burst transfer time,
address and data setup time, internal transfer time, and memory controller overhead.
That is why I called it magic memory, not just zero latency memory.
Latency reduction helps integer code a lot more than FP code but most integer code is
fairly cache friendly. If you could find magic zero latency memory most integer apps
would improve in performance by only 40% or so. Going from 112 ns to 87 ns is a
very small change in the bigger scheme of things and probably corresponds to a few
percent higher improvement. I would be very surprised if it was more than 4% gain on
a 256 KB A64 and a 2% gain on a 1MB A64 on integer code. For FP code the effect
would be less than measurement noise.
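The argument above can be sketched with a simple CPI-plus-memory-stalls model in Python. The clock rate and miss rate below are assumed numbers, chosen so that "magic" zero-latency memory buys roughly the 40% quoted above; the point is the relative sizes, not the absolute values:

```python
def cpi(cpi_core, misses_per_instr, mem_latency_cycles):
    """Simple model: core CPI plus a memory stall component."""
    return cpi_core + misses_per_instr * mem_latency_cycles

# Assumed: ~2 GHz core, 1.0 core CPI, 0.0018 DRAM misses/instruction.
freq_hz = 2.0e9
cpi_core = 1.0
miss_rate = 0.0018
lat_112 = 112e-9 * freq_hz   # 112 ns in cycles (224)
lat_87 = 87e-9 * freq_hz     # 87 ns in cycles (174)

base = cpi(cpi_core, miss_rate, lat_112)
faster = cpi(cpi_core, miss_rate, lat_87)
magic = cpi(cpi_core, miss_rate, 0)

print(f"zero-latency gain:    {base / magic - 1:.0%}")
print(f"112 ns -> 87 ns gain: {base / faster - 1:.1%}")
```

With these assumptions the zero-latency ceiling is ~40% while the 112-to-87 ns shave is only a several-percent gain; in practice latency overlap with useful work shrinks it further, toward the few percent stated above.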
If it turns out that Centrino has less than a full speed cache, will you be willing to
refer to it as the Celeron (cache crippled) version of Pentium III-M?
Intel has said that they added a cycle of latency to the L2 cache to allow
a greater amount of dynamic power management. However, considering
the historically great disparity in L2 speed and bandwidth between AMD
and Intel uPs, this will likely still leave the Banias well ahead of Opteron
on both counts.
Then, 1 IPC seems wrong, I think Athlon has something like 3.5 IPC.
No, the 1 IPC is approximately correct although it will vary a lot from code to code.
Also keep in mind that with a modern (decoupled execution) x86 processor there
are actually two IPC figures for a given program run (trace). The first is the native
IPC - the average number of x86 instructions executed and retired per clock cycle.
This averages 0.8 to 1.0 or so in a PIII or K7 class processor. The second is the
uop IPC - the average number of micro-operations executed and retired per cycle.
This number is typically 1.3 to 1.5 times higher than the native IPC.
The only processors that have achieved 3.5 IPC or more on a non-trivial set
of large scale applications are the Alpha EV68 and the Itanium 2, both of which
are six wide issue CPUs matched to tremendous cache and memory systems.
The 3.5 IPC level is achieved on FP codes. With pure integer code IPC is closer
to 2.0 to 2.5.
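The native-vs-uop distinction above is just a multiplication; a minimal Python sketch, using the 1.3-1.5x cracking ratio stated above:

```python
def uop_ipc(native_ipc, uops_per_x86=1.4):
    """Convert retired x86 instructions/cycle to retired uops/cycle,
    assuming each x86 instruction cracks into ~1.3-1.5 uops on a
    PIII/K7-class decoupled core."""
    return native_ipc * uops_per_x86

for native in (0.8, 1.0):
    print(f"native IPC {native:.1f} -> uop IPC {uop_ipc(native):.2f}")
# native IPC 0.8 -> uop IPC 1.12; native IPC 1.0 -> uop IPC 1.40
```

This is why quoting "IPC" for an x86 part without saying which count you mean invites exactly the 1-vs-3.5 confusion above.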
If Astro’s performance turns out to be 50% better than Crusoe’s on a constant-clock basis,
Astro will likely have a bright future in the marketplace. Outside of some tech geeks, few will
especially care how the improvement was achieved.
The point is that aside from increasing issue width, the other avenues to increasing
architectural performance are even more difficult, complex, power hungry, and tend to
have deleterious effects on clock rate (the brainiac curse) of the same order or greater
than the gain in IPC. That is why I strongly doubt the 50% figure is close to being realistic.
But if that was the design *goal* then it would be easy to see a tremendous increase in
complexity, die size, and clock normalized power consumption compared to Crusoe
even for a moderate increase in architectural performance. Pick your poison Dew.
I would be willing to bet that the improvement in throughput from Crusoe to Astro will be in the neighborhood of 50% on a constant-clock basis.
The performance gain from 2 to 4 way issue OOO
execution superscalar RISC was about 50%.
With ILP diminishing returns and the problem
of memory stalls in an in-order machine, going
from 4 to 8 way issue VLIW will not come close
to 50% speedup.
If the Astro is architecturally 50% faster than
Crusoe, which I doubt, then most of the speedup
would come from other, quite substantial, design
enhancements unrelated to issue width.
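The diminishing-returns argument can be put in toy-model form. The log-of-width scaling below is a Pollack's-rule-style guess, calibrated so that going from 2-wide to 4-wide gives the ~50% quoted above; it is illustrative only, not measured data:

```python
import math

def relative_perf(issue_width, base_width=2):
    """Toy model: sustained performance grows roughly with the log of
    issue width once the easy ILP is exhausted (an assumption in the
    spirit of Pollack's rule, not a measured curve)."""
    return 1 + 0.5 * math.log2(issue_width / base_width)

two_to_four = relative_perf(4) / relative_perf(2) - 1
four_to_eight = relative_perf(8) / relative_perf(4) - 1
print(f"2->4 wide: {two_to_four:.0%}, 4->8 wide: {four_to_eight:.0%}")
# 2->4 wide: 50%, 4->8 wide: 33%
```

Under any model of this shape, the 4-to-8 step buys less than the 2-to-4 step did, which is the crux of the skepticism about the 50% Astro figure.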
I just find it amusing how confident Dew is that flyspeck TMTA will be able to generate
great code for an 8-way VLIW using on-the-fly profiling and binary recompilation of an
x86 image. Compare that with the moderate success HP and Intel have had generating
great code for a 6-way VLIW starting from program source code and enjoying basically
indefinite compile/optimization time and extensive run time profile information collected
from a heavily instrumented CPU.
I see your penchant for playing bad hunches hasn't changed. TTFN
My highest-cost shares are at $23 and my lowest-cost shares are at $0.79; I bought a heck of a lot more shares at <$1 than at sky-high levels, and hence my average cost is around $2.
All that is immaterial, however, in analyzing TMTA as an investment going forward. I base my investment decisions on where I think the company is going, not on where it has been. Obsessing about
past losses is a gambler's mindset and mostly a waste of time.
Well that sure goes a long way to explaining what you've written on the Intel thread. Thanks
for painting the picture in such detail, or should I say, with extensive profiling. LOL, See ya!
I beg to differ. I use my TC1000 daily as my primary computer. I develop software
on it using JBuilder and Visual Studio and I never ever have any performance
problems (never ever)
LOL. Not just "never" but "never ever". And you just had to say it twice.
The ability of the human (especially male) mind to rationalize bad gadget buying
decisions is simply amazing. Combine that with a positive outcome based self-
selecting sampling process common on the net and I bet we won't hear from
any individuals who find it disagreeably slow regardless of how many now sit
on closet shelves.
By any chance are you also a TMTA investor? IIRC I have seen your name
on the TMTA thread.
Pardon me if I ignore anonymous personal testimonials about subjective speed
and instead consider the limited range of independent benchmarking performed
and published to date. I think that a lot more needs to be done in this area but
the results to date haven't been exactly kind to TMTA's "software based processor".
That's progress I guess. You previously claimed I wasn't aware that CMS did
run time profiling. Now you claim I was wrong because I don't think it does
*extensive* profiling. You're nothing if not amusing Dew.
Moreover, chipguy was not even aware that Crusoe’s CMS does run-time profiling.
No kidding? Then I couldn't have possibly written the sentence
"At the same time
because CMS is in operation on the fly interleaved with program execution it has to be
very parsimonious in the types of profile information it can capture and examine and
the types of computationally intensive code optimizations it can employ compared to
a state of the art compiler for IPF or superscalar RISC."
in one of my earlier posts, say # 4601.
So despite chipguy's penchant for buzzwords, he does not seem to understand how Crusoe actually works.
LOL, I think the problem lies in your RX rather than my TX. :-P
However, I'm puzzled by your snake oil reference and want to understand it.
To put this in perspective you have to consider all the pre-IPO buzz about Transmeta and
the rumours it spread about what it was doing. The early buzz was it would create a
software based x86 compatible chip that would blow away Intel and AMD.
When things progressed far enough that it became obvious that it was a pipe dream
Ditzel and company changed tack and decided it would tackle the low power niche
Intel and AMD were ignoring. A combination of excellent spin control, a credulous
trade press, and voila, Transmeta's failure became a success. No one seemed to notice
that its arrow fell far short of its target and it slyly drew a new bullseye around where it
landed. Transmeta's IPO was a success and Crusoe was launched to pretty good
press despite the incredible obfuscation about processor performance in all the
technical whitepapers that accompanied the launch. But where were the third party
reviews in the months that followed? Even MicroDesign Resources, the people that
publish Microprocessor Reports complained that after many attempts they were unable
to obtain an evaluation board for independent performance testing. The general rule
of thumb that has emerged is that a 5x00 series Crusoe is about as fast as a mobile
Coppermine PIII of half the clock rate. Crusoe design wins are few and far between
and the company is losing money hand over fist. Anyone who bought the hype and
bought into TMTA on the ground floor sure got burnt. Hence my snake oil comment.
I am happy you like your Crusoe based computer. Obviously for your needs compute
power isn't as crucial as it is for others. But for many others they need a performance
level approaching that of a desktop processor and for them a Crusoe based system
will not come close to being sufficient. The market place decides winners and losers
and it is hard for me to see how Transmeta will deal itself a winning hand sticking to
its code morphing on VLIW paradigm and simply making ever wider issue processors
like the 8000. There are reasons some things are best done in application specific
silicon - be it running 3D rendering pipelines or executing x86 binaries. At least that's
my take on it.
The performance will be less than a 1MB Athlon-64, of course, probably by about 2 to 3 speed
grades (of 0.2 GHz in this case). But then, the smaller chip area may allow a higher clock speed to
gain back some of those speed grades.
Why would you think that? Only a very badly designed processor would have a speed path
affected by L2 cache capacity. Don't bring up heat, a 256 KB A64 would probably burn
more power than a 1024 KB A64 because it will be accessing memory nearly twice as
frequently on average.
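The "nearly twice as frequently" figure follows from the empirical square-root rule of thumb, which says miss rate scales roughly with 1/sqrt(cache capacity). A quick Python sanity check (the rule itself is an approximation, and actual miss rates are workload dependent):

```python
import math

def relative_miss_rate(cache_kb, ref_kb=1024):
    """Square-root rule of thumb: miss rate ~ 1/sqrt(capacity),
    expressed relative to a reference cache size."""
    return math.sqrt(ref_kb / cache_kb)

ratio = relative_miss_rate(256)  # 256 KB vs 1 MB
print(f"256 KB vs 1 MB miss-rate ratio: {ratio:.1f}x")
# 256 KB vs 1 MB miss-rate ratio: 2.0x
```

Quartering the cache doubles the predicted miss rate, hence roughly twice the off-chip memory traffic and the extra I/O and DRAM power that goes with it.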