chipguy,
Step [Opteron] up every few months? Do you have any idea the kind of evaluation and qualification exercise a server vendor puts any new uP release, including speed grade bumps, through before releasing it in production hardware? Practices that work in the PC world often don't apply to the server world and can actually backfire on the vendor.
What if the vendor designs for the worst case thermal characteristics to begin with, and tests on a cherry picked sample? I don't see a big deal in plugging a slower, cooler CPU into such a system.
Joe
Tenchusatsu,
doing the true NUMA that you suggest is only worth it for large-scale systems where the ratio of remote-to-local access times is huge.
If I understand the concept of NUMA, at the basic level it is a software implementation (some of it automatic, some requiring API calls), and at the next level, possibly an interaction with the hardware, telling it that some "shortcuts" can be taken for local memory access (possibly skipping coherency checks).
Anyway, on the software level, if, say, MSFT writes the software, it is just a question of making it available in the lower end versions of Windows, rather than just the "datacenter version".
There would be a benefit even in a 2-way system.
In other words, you're talking big iron. 4-way Opteron is nowhere close to big iron, nor should it be.
Well, 4 way Opteron would derive the same benefit as 16 way Xeon system, since there are 4 local pools in both systems. So if there is a benefit of NUMA for 16 way Xeon system, the benefit should be on the same order of magnitude in 4 way Opteron system.
Joe
sgolds,
I seem to recall that there is some NUMA support for Intel CPUs in Windows 2003 Server. The reason for this is systems > 4 way, which are built from blocks of 4 CPUs, with CPUs in each block sharing memory. This memory would be local for the 4 CPUs and the rest remote memory.
Joe
sgolds,
this is consistent with the speculation that the September release of Athlon64 is timed for the Windows release.
I still don't buy it. IMO, the delay is a combination of
1. not enough resources to launch Barton, Opteron and Athlon 64 all at the same time
2. Athlon 64's multiple problems, such as the 256K version not performing well, and the 1 MB L2 version performing barely ok (with a single memory channel) while costing the same to produce as Opteron
3. clock speeds below target
4. not ready to start high volume production right away (a slower ramp of low volume Opteron first, later increasing and adding A64, making more sense)
5. Excellent results of Barton
Windows OS probably comes after these 5 reasons.
Joe
wbmw,
Sorry, that's not how it works. Coherency checks are there to check that other processors do not have a cacheline in modified state.
That's how it is done in a system that treats all memory the same, but NUMA implementations treat local and remote memory differently. Not that there is a NUMA OS for Hammer out there today, but one day there may be one.
Joe
wbmw,
I understand the esoteric nature involved, but now that the cat is out of the bag, the industry should be kept informed, don't you think?
There were a number of references on the MSFT web site referring to an AMD64 version of Windows. But this is still pre-beta stuff that goes out only to a small, select group of partners. They mentioned a beta by mid year.
Joe
yb,
John Dvorak used to be right on the money with most of the stuff he wrote. But on the subject of Mac on Itanium, I think he is completely wrong.
Joe
yb,
On something like a database, I am not sure it makes sense to move the bulk of the data to the memory connected to the CPU working on it, at least not the database server's global cache. But I think what would be most beneficial is for the local memory of the process running on a processor to be allocated from memory connected to that CPU and flagged as local by the OS. Then all accesses to this local memory would be able to bypass all kinds of coherency checks, and it could be accessed as fast as memory on a single CPU system. And it would not create any traffic on the HT bus.
Joe
wbmw,
As you should probably know by now, 64-bit compatibility mode has some overhead with 32-bit applications. The OS needs to be able to separate 32-bit and 64-bit code. Windows does this by thunking, as it did with 32-bit Windows running 16-bit apps, and that will incur an even greater penalty than Linux. Most users will probably run their Opteron systems in 32-bit mode, and wait for software to become more robust before they migrate to 64-bits.
There is one obvious advantage to 32 bit apps running under 64 bit OS, which is memory. The system is able to address > 4 GB of memory, and each app will be able to get its max. Currently it is 4 GB for the whole system, 2-3 GB for all apps.
With a 64 bit OS, each app would get the 2-3 GB (and actually, 64 bit version of Windows may increase to much closer to 4 GB, since most of the OS can reside in its own address space). Plus, the disk cache is in OS part of address space, and it doesn't need to take any memory from the apps.
This may not be the strongest selling point today, since a typical volume server today has < 4 GB of memory, but it will become a stronger selling point a year or 2 from now.
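To put rough numbers on this, here is a minimal Python sketch of the address-space arithmetic. The function names are made up for illustration, the 2 GB and 1 GB OS reservations are assumptions standing in for the "2-3 GB" range above, and the last case assumes the best case where a 64 bit OS keeps nothing of its own inside the 32 bit app's address space.

```python
GIB = 1024 ** 3  # bytes in one gigabyte (binary)

def address_space_bytes(pointer_bits):
    """Total bytes addressable with a flat pointer of the given width."""
    return 2 ** pointer_bits

def user_space_gib(pointer_bits, os_reserved_gib):
    """Gigabytes left to an application after the OS reserves its share."""
    return address_space_bytes(pointer_bits) / GIB - os_reserved_gib

# Hypothetical splits; the reserved amounts are assumptions, not measurements.
splits = {
    "32 bit OS, 2 GB reserved for the OS": user_space_gib(32, 2),
    "32 bit OS, 1 GB reserved for the OS": user_space_gib(32, 1),
    "64 bit OS, 32 bit app, nothing reserved (assumed best case)": user_space_gib(32, 0),
}

for label, gib in splits.items():
    print(f"{label}: {gib:.0f} GB per app")
```

The point of the sketch is only that a 32 bit pointer tops out at 4 GB, so whatever the OS keeps for itself comes straight out of what the app can use.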
Joe
sgolds,
I am under the impression that Alchemy chips are 32 bit MIPS compatible. Is AMD / Alchemy planning on moving it to 64 bit? Is there any need for the applications that Alchemy / MIPS is targeting to be 64 bit?
I seem to recall that China is investing money in a 64 bit MIPS processor. (I wonder if there is any relation between that and the Alchemy orders for the Chinese education market).
Joe
yb,
P4 was actually running 800 MHz FSB on Canterwood chipset. I have to say I expected a little more from Canterwood.
But I think it may be a good idea to wait for more Canterwood reviews to form a definite opinion about it.
Joe
yb,
I agree that 333 Mhz A64 must have the same IPC as 400 Mhz XP. Plus one more speed grade if the code will use extra registers (I hope Doom III will do that even on 32-bit windows).
Well, Hammer has SSE-2, so if programs such as Doom III use SSE-2, Hammer will have some advantage there over Athlon XP. But the extra registers of Hammer are not available for programs running in 32 bit mode, only 64 bit mode.
Joe
Klaus,
Thanks for the translation.
Joe
yb,
SpecInt is a single thread benchmark that doesn't benefit from additional processors.
Joe
Elmer,
I didn't run this through the translator. Does it say whether these are compiled with a 32 bit or 64 bit compiler?
Joe
Tenchusatsu,
Like I said before, Intel defines the rules of the marketing game. Market segmentation has been defined by Intel for years now, and all AMD can do is follow.
Is that a problem? AMD managed to stay in the game as a follower, and got to the point where it can have its shot at trying to lead.
Although we could argue DDR vs. Rambus, the real test of AMD's ability to lead will be x86-64 (plus other slightly less significant technologies, such as HyperTransport, glueless multiprocessing, and the on die memory controller).
Joe
UpNDown,
Thanks for looking it up.
Joe
UpNDown,
The complexity of keeping the cache exclusive seems to be counterproductive for Opterons though, especially when the caches are getting to the 512KB, 1MB and 2MB level.
I don't have any idea which is more complex, but if there is more complexity for exclusive, it would be a question of tradeoff between the complexity and total cached memory. On 256K model you would be leaving 50% of total cache on the table.
So don't take this as a confirmation, but I don't see any way that the crossbar logic will be able to handle exclusive cache when keeping the caches coherent over cHT.
That's probably a good point. It is just a bit over my head. It would be nice if AMD published some white papers on the subject.
Joe
UpNDown,
x1x = 256KB cache (note: inclusive, not exclusive like Athlon)
I have not seen the reference to this. Is that true for all cache sizes and models of Hammer?
Joe
wbmw,
I don't think AMD's model numbers have been conservative since the 2200+, and in the direction that they are going, AMD could end up with a PR nightmare.
Yawn....
Oops, sorry about that. It was completely involuntary.
Joe
Klaus,
There is also a 2250 MHz part, the Tbred 2800.
Joe
yb,
I think this news may have something to do with the market action:
U.S. Troops Push to Within 19 Miles of Baghdad
http://www.washingtonpost.com/wp-dyn/articles/A10275-2003Apr2.html
The market seems driven primarily by the situation in Iraq.
Joe
keith,
I have seen a quote (I believe from one of the presentations) where AMD hedged on the Q2 profitability.
I think there is an important benchmark before profitability, being cash flow positive, and at minimum, I hope AMD will meet at least that in Q2.
Joe
yb,
Your comparison is valid, but to be fair, PIII made it to 1.4 GHz, and Banias (PIII based) a little higher (1.6 GHz?).
Joe
wbmw,
It's because it is nothing more than a marketing gimmick.
I don't think so, because AMD processors are built with Quantispeed Architecture, which guarantees Performance for Today and Tomorrow. These processors are designed to power an Internet experience filled with rich audio, video, animations and 3-D graphics that makes information come alive. Whether on or off the Internet, processors built with Quantispeed Architecture offer high performance and Internet Streaming SIMD Extensions. It also provides AMD's most advanced computing experience for business users as e-commerce, data visualization, streaming audio, video and speech recognition applications become more pervasive.
Joe
Elmer,
Now, this brings up the question, do you think Intel is pricing to eliminate AMD or to maximize profits? I don't want to put words in your mouth but you are probably going to pick something in the middle. Do you think it's possible that Intel does not consider AMD at all when pricing, but simply what they can move and how much they can make?
I commented on this subject on another board. I expected Intel to turn the screws on pricing for some time, but to my surprise it never materialized, which convinced me that Intel's desire to make some $$$ and move the stock out of the teens is now a much higher priority than putting pressure on AMD.
I think there is a change in AMD's behavior under Hector. Initially, Hector's hands were tied, since he didn't have the product. But now, he has the product, and he could go for more market share (as Jerry would), but Hector seems to be going for the revenue (to minimize loss), rather than market share. Not that he has a choice. It will be interesting to see what happens when AMD has a choice between more profit or more market share.
Joe
Elmer,
Like I said before, you demonstrated nothing but AMD's product offerings. That's not binsplits. You showed nothing about how many AMD sold at each bin and more importantly you showed nothing about how many they could produce at each speed. You could just say that AMD's offerings increased faster than Intel's and leave it at that.
I agree, but my claim goes a little beyond that. The one place where I differ is that I know that there is a real availability behind what's offered. This is a useful site I refer to once in a while:
http://www.insightcomponents.com/ic/apps/tinas/index.php?page=13
As far as what's being sold, I agree, I have no idea. The fact that AMD is downbinning parts illustrates that the demand at the lower bins (lower prices) exceeds what's coming out of the fab. It probably illustrates that the prices have not been set correctly WRT the bin splits, but hopefully, they have been set correctly WRT maximizing revenue.
Joe
wbmw,
Model numbers (while not perfect) reflect real performance better than raw frequency.
Joe
Elmer,
Please... This demonstrates nothing when you don't know anything about volumes at a given speed. Even then it would only reflect the market demand, not the production capability, which was the original claim.
I conceded that I don't know the sales volumes at individual speed grades. I was talking availability. Availability is there for all speed grades. Since AMD is now downbinning parts, it means that from manufacturing point of view, AMD can make more, faster parts than it is able to sell.
This means that AMD manufacturing has gone from a struggle not to lose sight of rapidly advancing Intel, to the point where they are cruising now, almost hitting the rear bumper of Intel's car.
I can't comment on where Intel's sweetspot in manufacturing is or how it has changed over the last few months but I wouldn't assume AMD's has moved up more than Intel's if I were you.
I go by data that's out there, and I see that AMD went from being dead in the water to almost parity in what's available for sale. I don't know what Intel could make available for sale, only what Intel does make available.
Joe
Elmer,
Please read his original statement. He said "relative to Intel".
And I meant it. This is what I said, in somewhat tortured English:
"The production bin splits have improved significantly relative to Q3 and Q4, relative to Intel. The availability is good. The pricing for AMD processors is good at individual bins."
I was talking about recent quarter (Q1) relative to Q3 and Q4, and comparing this improvement to Intel improvement. To express this mathematically, I would say:
  AMD Q1          Intel Q1
----------  >  ------------
AMD Q3toQ4     Intel Q3toQ4
Elmer,
re: The production bin splits have improved significantly relative to Q3 and Q4, relative to Intel
Where is the data to back up this claim?
May 06: Intel 2.53, AMD 2.1 Intel/AMD +20.5% (paper)
Jun 10: Intel 2.53, AMD 2.2 Intel/AMD +15.0% (paper)
Aug 26: Intel 2.80, AMD 2.6 Intel/AMD + 7.7% (paper)
Oct 01: Intel 2.80, AMD 2.8 Intel/AMD + 0.0% (paper)
Nov 14: Intel 3.06, AMD 2.8 Intel/AMD + 9.3% (paper)
Feb 10: Intel 3.06, AMD 3.0 Intel/AMD + 2.0% (paper)
--------------------------------------------------------
May 06: Intel 2.53, AMD 2.1 Intel/AMD +20.5% (real)
Jun 10: Intel 2.53, AMD 2.1 Intel/AMD +20.5% (real)
Aug 26: Intel 2.80, AMD 2.2 Intel/AMD +27.3% (real)
Oct 01: Intel 2.80, AMD 2.4 Intel/AMD +16.7% (real)
Nov 14: Intel 3.06, AMD 2.4 Intel/AMD +27.5% (real)
Jan 01: Intel 3.06, AMD 2.6 Intel/AMD +17.7% (real)
Feb 10: Intel 3.06, AMD 2.8 Intel/AMD + 9.3% (real)
Mar 01: Intel 3.06, AMD 3.0 Intel/AMD + 2.0% (real)
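For anyone who wants to recheck the percentage column, here is a minimal Python sketch. It assumes the column is simply Intel's figure divided by AMD's figure, minus one, with the Intel numbers read as GHz and the AMD numbers as model numbers divided by 1000. The function and variable names are made up for illustration, and only the "real" rows are included.

```python
def lead_percent(intel, amd):
    """Intel's lead over AMD, as a percentage of the AMD figure."""
    return (intel / amd - 1.0) * 100.0

# (date, Intel top speed in GHz, AMD top model number / 1000), "real" rows only
real_availability = [
    ("May 06", 2.53, 2.1),
    ("Jun 10", 2.53, 2.1),
    ("Aug 26", 2.80, 2.2),
    ("Oct 01", 2.80, 2.4),
    ("Nov 14", 3.06, 2.4),
    ("Jan 01", 3.06, 2.6),
    ("Feb 10", 3.06, 2.8),
    ("Mar 01", 3.06, 3.0),
]

for when, intel, amd in real_availability:
    pct = lead_percent(intel, amd)
    print(f"{when}: Intel {intel:.2f}, AMD {amd:.1f} Intel/AMD {pct:+5.1f}%")
```

The same function can be run over the "paper" rows by swapping in those AMD figures.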
yb,
AMD follows Intel, puts CPU prices up...
I don't think AMD should miss badly this time.
The production bin splits have improved significantly relative to Q3 and Q4, relative to Intel. The availability is good. The pricing for AMD processors is good at individual bins.
What we don't know is the unit volume and sales bin splits (therefore we don't know anything <g>), but this quarter, the ball is in the court of the sales force; the manufacturing side has performed well.
Joe
Edit: You pretty much said the same thing in your follow up post.
wbmw,
The 374 square millimeter "die" size is a bit smaller than the 421 square millimeter size of Itanium 2
thanks for the info.
Joe
chipguy,
Do you know the die size of Madison (or any rumors)?
Joe
wbmw,
Businesses don't buy servers to run with beta software. Intel learned this lesson already with Itanium.
Agreed. But I was commenting on the possibility of a General Availability Windows AMD64 Server prior to Longhorn. I think the probability is better than 50% that there will be such a thing.
Joe
wbmw,
It will be much longer than that. Unless you are aware of additional Opteron support, the only thing I have seen are Newisys DP reference designs. Those are hardly systems that will infringe on Itanium 2, since they will be aimed at mainstream segments, not the high end enterprise.
I have seen some pictures of a 4-way Opteron reference system from Newisys, but systems like that are higher end, and I don't think Opteron will penetrate these until it has proven itself in 2-way systems.
Joe
wbmw,
That may not be far from the truth. Microsoft's first x86-64 release may not be until Longhorn, scheduled for release in late 2004, which will probably slip until 2005.
Longhorn will support x86-64 from the start, but there is a beta of x86-64 version of the .Net Server 2003. If and when it will be released is unclear.
Joe
yb,
I wonder why AMD still can't make a quad-pumped bus.
Because it doesn't make sense for AMD right now (it makes sense for Intel though).
K7s have the EV6 bus, and there is no point in changing it now. Hammer doesn't really have a FSB, only a memory bus and HT. The memory bus depends on the memory, and if there is quad pumped memory, Hammer can support it. As for HT, it has plentiful bandwidth, and a roadmap to higher bandwidth (per pin). Also, the link can be widened from 16 to 32 bits using the current spec to achieve higher bandwidth (if needed).
Joe
Dan,
I think the E8870 was targeting only Itanium 2, or specifically a June 2002 ship date. Then, during the announcement of Itanium 2 they said that September 2002 would be the ship date, and there was further slippage.
Apparently, systems are ready now, so now we will finally see how much real demand there is for Itanium. I think the clock really starts from beginning of 2003 for systems, and April 2003 for Windows OS.
BTW, E8870 delays erased at least 6 months of lead of Itanium 2 (compared to Opteron). So Itanium starts basically with about 4 to 5 month lead in availability of hardware (assuming all is well with Opteron launch).
Joe
wbmw,
If anything, it makes a good case for Itanium architecture, which is aiming at bringing scalability and robustness to enterprise level computing. That's why Microsoft is eagerly on board that ship.
Eagerly on board that ship? (That's a huge opening but I am going to pass).
The first General Availability version of Windows for Itanium will ship next month, which is about 23 months since Itanium was released.
When it comes to Opteron, I would be seriously disappointed with this level of eagerness, since it would mean release of Windows AMD64 in March 2005.
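The date arithmetic can be checked with a short Python sketch. The May 2001 Itanium release month and the April 2003 months for Windows on Itanium and for the Opteron launch are assumptions chosen to line up with the "23 months" and "next month" above, and the helper names are made up for illustration.

```python
from datetime import date

def months_between(start, end):
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def add_months(start, months):
    """First day of the month that falls `months` after start."""
    index = start.year * 12 + (start.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)

itanium_release = date(2001, 5, 1)  # assumption: Itanium released May 2001
windows_itanium = date(2003, 4, 1)  # assumption: "next month" is April 2003
opteron_launch = date(2003, 4, 1)   # assumption: Opteron launches April 2003

# Gap between the Itanium release and Windows for Itanium, in months
lag = months_between(itanium_release, windows_itanium)
print("Itanium to Windows lag:", lag, "months")

# The same gap applied to the assumed Opteron launch month
print("Same lag for Opteron:", add_months(opteron_launch, lag).strftime("%B %Y"))
```

Shifting any of the assumed months by one moves the result by one month as well, so the conclusion is only as good as those three dates.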
Joe