Haddock,
After all, a major component of .NET is the byte-code-to-native compiler (I think it's install-time, not JIT), which is totally different between IA64 and x86-64. MS have had more time to work on the IA64 version.
True, and I am as surprised as you are that it has not been done yet. If you start from the assumption that the .NET Framework is already fully 64 bit ready, and all you need is the native code compiler, then the AMD64 and IA64 projects can be seen as totally independent. But the assumption that the .NET Framework is 64 bit ready may not be correct.
I seem to recall seeing a posting that a new version of the .NET Framework will be released in order to make 64 bit happen, and I don't know if it will have only the native code compilers (runtimes) or some additional fixes.
Any rumours on the relative timings? As you so eloquently explain, it's more critical for IA64, but it would be nice for x86-64 too.
I don't know about timing, but I heard rumors that AMD64 is way easier and more straightforward than IA64 in this regard.
Joe
wbmw,
I am not Nostradamus, but I can see you touting a >3.4% boost in SPECint with the newer Opteron model
I was actually thinking a little down the road with Athlon64 with dual channel DDR-400 CL 2.0 unbuffered.
even though part of the boost will come from added bandwidth, while my 1-3% figure is based solely on the part of the boost for which I feel that latency will be responsible.
Which is why I offered you the chance to revise your prediction.
Joe
wbmw,
How are you going to prove whether bandwidth or latency is affecting a given program, if you transition from DDR333 memory to DDR400? Are you just talking about Cachemem, because my response was meant to break down actual real world performance?
I was thinking more in terms of real world performance, but since there are a million apps with results all over the place, SPEC_int could be a good proxy.
Joe
yb,
Latency was always a weak point of P4, so P4 is especially good in all data-streaming applications, like almost all audio-video processing, where the latency penalty is close to zero.
This was the case when Rambus memory was needed for P4 to get good performance. Ever since DDR, and especially dual channel DDR, Athlon and P4 are on the same level playing field, except P4 has a huge bandwidth advantage.
Joe
yb,
Tyan released an Opteron board. It has embedded graphics. Does it have the AGP tunnel? I'm not sure that graphic performance will be good. Probably not.
This is supposed to be a really cheap server board. The fact that it is a server board means you don't need AGP. The graphics chip is just a separate chip that most likely is on PCI.
Well, it would be nice if the price of this mobo dropped to something like $100 to $150 so that you could really make cheap servers out of it.
Joe
Petz,
Not sure if MS is extending AMD64 support to Visual Basic
The last version of old style Visual Basic was 6.0, and that will always stay in 32 bit.
The new style Visual Basic is part of the .NET Framework family, meaning that it doesn't compile to machine code but to an intermediate form, which then runs on the machine specific Common Language Runtime. So you basically don't do anything machine specific in your VB.NET code. Ints are 32 bit, longs are 64 bit, and you can use either one (of course a 64 bit long on a 32 bit processor will have some overhead).
So anyway, you can write your code without thinking about whether the CPU is 32 or 64 bit, and when MSFT delivers a 64 bit capable version of the .NET Framework, and an AMD64 CLR, your program is automatically 64 bit, without you doing any work whatsoever.
It would be nice if the 64 bit .NET Framework was released together with the AMD64 version of Windows. That way, we could get instantaneously a bunch of apps running in 64 bit long mode.
But either way is fine with AMD, IMO. If there is no 64 bit version of the .NET Framework, it really cripples Itanium, and leaves Hammer unaffected, because Itanium has to run the .NET Framework apps in 32 bit turtle speed mode, while they fly on AMD64 processors in 32 bit mode.
BTW, the lack of .NET for IA64 is one of the biggest barriers to entry for Itanium, since it prevents you from running an Itanium machine as a web server (Windows ASP.NET is the most common way to write web apps), and you can't deploy any state of the art custom Windows apps, since just about everybody in the Windows world is using VB.NET and C# these days for custom app development. There are a bunch of other languages being ported to .NET, including the Java language syntax. None of these apps run acceptably on Itanium currently, while they run fine on Opteron.
You want a multi-tier app? You would write the middle tier in .NET (in the Windows world) and have the freedom to decide where to deploy it. Choosing Itanium for the database server denies you that machine as a valid choice. You need to get another machine to get any kind of performance out of your middle tier app.
And as you may know, there is a current trend to consolidate servers in various ways, to reduce the number of servers to manage. It's a no go with Itanium, because of its inflexibility and being limited to native 64bit apps.
Joe
yb,
Good point. Intel was ahead of AMD in power consumption ever since Athlon was launched. With .13u Athlon XPs, AMD evened the score and is slightly ahead, but the difference is starting to grow with .13u Hammer, and is going to become significant if the rumors of >100W TDP Prescott are true.
I wonder about .09u Hammer vs. .13u Hammer in power consumption.
Joe
wbmw,
Actually, CAS latency is incurred on every single transaction. Period.
That's what I was trying to get across, so it looks like we are in an agreement.
Regarding your other comments, however, I think you are confused about the nature of memory accesses. For light traffic, latency may be more of an issue on the memory side, but it also means that there are fewer memory accesses, and the processor can work on other data while it's waiting for memory. Under heavier traffic conditions, latency is hidden by clever pipelining, and bandwidth becomes more important.
I disagree. The difference is in whether the application is of a streaming type, where you get a few big chunks of contiguous data, or one with many random accesses of small chunks of memory.
The first, streaming, type is what benefits primarily from good bandwidth; the second is where latency matters. And of course most apps are somewhere between those 2 extremes.
But it doesn't really matter much these days in the AMD vs. Intel competition, since both Intel and AMD are more or less on the same technology. It made more of a difference in the old Rambus vs. SDR/DDR days, where Rambus generally had better bandwidth but worse latency. The claim I am making is that Athlon64 performance will improve with improved CL, but the same avenue is open to Intel as well (if they have not already taken advantage of it).
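To illustrate the two extremes, here is a toy sketch of my own in Python (the interpreter's overhead masks much of the hardware effect, so treat it as an illustration of the two access patterns, not a memory benchmark):

```python
# Streaming vs. random access patterns: the first is bandwidth-friendly
# (contiguous, prefetchable), the second exposes latency on each access.
import random
import time

N = 1 << 20                      # 1M integers
data = list(range(N))
order = list(range(N))
random.shuffle(order)            # randomized visit order

t0 = time.perf_counter()
s = 0
for i in range(N):               # streaming: sequential addresses
    s += data[i]
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
s = 0
for i in order:                  # random: small scattered accesses
    s += data[i]
t_rand = time.perf_counter() - t0

print(f"sequential: {t_seq:.3f}s, random: {t_rand:.3f}s")
```

On real hardware with a compiled language, the random pass is the one that pays a cache/DRAM latency penalty on nearly every access, while the sequential pass streams at close to memory bandwidth.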
Moving from DDR333 to DDR400, however, will result in more of a bandwidth benefit than a latency benefit. The actual latency benefit may only be on the order of 2%, give or take 1%, on the total performance side.
All right, we have a prediction. My prediction was that a dual channel Athlon64 with DDR-400 at CL 2.0 will outperform a dual channel Opteron with registered DDR at CL 2.5 by 3.4% or more; your prediction is 1 to 3%.
We can revisit the issue when we have some data for comparison.
Just to make it fair, I want to point out that we don't know what the memory timings were on those Dell systems (which I referred to in my previous post). They could be the same, or even worse for DDR-400, but my assumption (for the performance increase) is an improvement from CAS 2.5 to 2.0 plus faster memory (DDR-333 to DDR-400), so if you want to revise your prediction, you can go ahead.
I think this improvement for Athlon64 will be higher in reviews on hardware BBSs, which already use CL 2.0 for P4, but could not use anything better than CL 2.5 registered on Opteron. In SPEC submissions, it is not clear if AMD will use CL 2.0 or more conservative timing. CL 2.0 is used for the Athlon XP 3200 entry (which I hadn't noticed until now):
http://www.spec.org/osg/cpu2000/results/res2003q2/cpu2000-20030505-02154.html
Joe
wbmw,
Think of pipelining in a CPU. Let's say you send 10 instructions, each with 4 clock cycles of latency. Let's also say that the CPU is fully pipelined so that it can accept a new instruction on every cycle. Since each instruction has 4 cycles latency, your argument would try to prove that it would take 40 cycles to execute them all.
That is why I said every "new" memory access, or something like a transaction. You can request, say, 64 bytes at some address, and you suffer the CAS latency penalty on the access of the first byte (or actually the first 16 bytes), but the subsequent 48 bytes don't suffer this penalty.
But you must pay the CAS latency on every new transaction, which is something you seem to be disputing, if my understanding of your posts is correct.
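A simplified numeric sketch of this disagreement (numbers mine): a fully pipelined CPU hides per-instruction latency, but if the CAS penalty is charged on every independent transaction, the totals look very different. Real controllers can overlap transactions, so the memory side here is a worst case:

```python
# Pipelined execution: one instruction issues per cycle, so only the
# first one waits out the full latency.
def pipelined_cycles(n_instructions, latency):
    return latency + (n_instructions - 1)

# Worst-case memory: every new transaction pays CAS before its data beats.
def memory_cycles(n_transactions, cas, beats):
    return n_transactions * (cas + beats)

print(pipelined_cycles(10, 4))   # 13 cycles, not 40
print(memory_cycles(10, 2, 4))   # 60 cycles: CAS charged per transaction
```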
The short version is that the memory controller does not know which accesses are important to the processor, but it does know whether the memory addresses will cause a page miss, whether they will cause an address conflict, whether a particular DRAM is being refreshed, whether there is a posted write that must complete before a new read enters the queues
That makes sense. It may be an area of potential improvement in the future, with more intelligent memory controllers.
Joe
gb,
My question is whether anyone has any evidence that this has been done for other logic blocks and functional units as was speculated.
It was just that: speculation, with no data to confirm it. L2 is the biggest block on the die, percentage-wise, so it has the most to gain. Other parts, such as the memory interface and the HT interface, are smaller, but it may be possible to use the same concept there.
Joe
Paul,
the 3.4% estimate would be quite low, since the change would be from DDR333/ECC to DDR400/non-ECC, not from DDR333/non-ECC to DDR400/ECC.
I think so too, but I wanted to pick a conservative number. The difference is that Athlon64's DDR-400 will have a small benefit from not being ECC, and the Opteron's DDR-333 has a penalty from being registered, so on both ends there is potential for improvement. Also, on SPEC_fp, which is more memory bandwidth dependent, the difference between those 2 memory types amounts to 9%.
Joe
wbmw,
Some things in your post don't make any sense to me such as:
Re: but 10 CPU cycle reduction on every cache miss is going to have measurable impact on performance.
Not really, because the latency benefit does not apply to every memory access.
I would think that every single new memory access benefits by this amount.
Memory transactions are buffered in the memory controller, and they can be serviced out of order.
This doesn't make sense to me either. How would the memory controller know that the request it is putting at the end of the queue is not the critical word that the processor needs for its highest priority process?
The mistake you are making is that you are using best case latency, rather than realistic latency measured by a benchmark application. Xbitlabs uses Cachemem to rate the Opteron 144 processor at 67ns memory latency. Best case latency may be lower, but that's irrelevant, since memory accesses do not all behave like the best case. Upgrading to DDR400 may improve the best case latency by 5ns, but measured latency may be less, and actual performance benefit will be much less.
I don't understand this at all. I never said what the latency would be, and I explicitly said that I don't know what the 45ns means. All I said was that DDR-400 CAS 2.0 will improve latency by 5 ns vs. DDR-333. I think this is an across-the-board improvement, regardless of page hit or miss: say from 50 to 45 ns in one case, or from 100 to 95 ns in another.
We've seen it multiple times before. Bandwidth affects performance far better than a few cycles of saved latency.
This depends on the application. Some are more sensitive to bandwidth, some to latency. Bandwidth itself is a latency reducer on requests for multiple bytes of data. And, on top of the 5 ns, the data transfer portion of the access shrinks as well, thanks to DDR-400's 20% higher data rate vs. DDR-333.
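As a back-of-envelope check of that transfer-portion claim (assumptions mine: a 64-byte cache line on one 64-bit-wide channel), the 20% higher data rate of DDR-400 shortens the transfer portion of a line fill from about 24 ns to 20 ns:

```python
LINE_BYTES = 64          # assumed cache line size
BUS_BYTES = 8            # a 64-bit channel moves 8 bytes per beat

def transfer_ns(data_rate_mts):
    beats = LINE_BYTES // BUS_BYTES        # 8 beats per line
    return beats * 1000.0 / data_rate_mts  # ns per beat = 1000 / MT/s

print(f"DDR-333: {transfer_ns(333):.1f} ns")  # ~24.0 ns
print(f"DDR-400: {transfer_ns(400):.1f} ns")  # 20.0 ns
```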
Athlon 64 won't differ too much from Opteron 1xx.
Depends on your definition of much. But let's see from SPEC site:
P4 2.8 DDR-400 ECC: 1091
P4 2.8 DDR-333 non-ECC: 1055
Difference: 3.4%
Well, not much in the way of memory timing is documented. So my assumption is that it will be this much or more.
Another thing about "much" is that much is not needed to gain rankings on SPEC_int; the differences between being #1 and #2 or #3 are very small.
Let's go through an exercise. Opteron 1.8 has a score of 1170. Opteron 2.0, with scaling of approximately 85%, will have a performance of approximately 1280. Now when I add 3.4%, which would be the ballpark improvement for DDR-400 vs. DDR-333 that Athlon64 will most likely get, I get 1324. Guess what the current #1 entry is? 1322.
So something that is not much can still mean a lot.
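The exercise above, spelled out with the same numbers as the post:

```python
base = 1170                        # Opteron 1.8 SPECint score
clock_gain = 2.0 / 1.8 - 1         # ~11.1% higher clock
scaled = base * (1 + 0.85 * clock_gain)   # 85% scaling -> ~1280
with_ddr400 = scaled * 1.034              # +3.4% for DDR-400 -> ~1324
print(f"{scaled:.1f} {with_ddr400:.1f}")  # 1280.5 1324.0
```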
It can't be fused off like cache associativity. I'm sure you can try to think of other examples
OK, OK, I said twice that this was just speculation on my part.
Joe
subzero,
Just like Opterons need a complete rewrite of every aspect of the software to make use of the 40/48/64 bit addressing.
What rewrite?
And how many DIMMs does it take to reach a memory capacity of 2^40?
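To put a number on that rhetorical question (the DIMM size is my assumption; 1 GiB was a large module in 2003):

```python
ADDR_SPACE = 2 ** 40       # 40-bit physical address space: 1 TiB
DIMM = 2 ** 30             # one 1 GiB module
print(ADDR_SPACE // DIMM)  # 1024 DIMMs to fill it
```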
Joe
Andy,
I don't think Elmer has any problems with Algebra, but his predictions are basically that AMD performance will be lower than AMD investors predict, and Intel performance will be higher than AMD investors predict.
Which is basically a prediction that you can make with almost 100% accuracy, since if one invests in a company (such as AMD), he is optimistic about its future and less optimistic about the future of the company's competition.
Joe
Elmer,
If your prediction is that the outcome will be less than what AMD investors predict, the probability of that prediction being correct is about equal to the probability that the sun will rise tomorrow.
I expected you to go a little bit out on a limb, especially in qualifying what you meant by:
Demand constrained, probably, but it's a good thing too because I don't think AMD can make them in any volume.
Statements such as "yields with a constraint of a bin target are lower than yields without the constraint" are in the category of the Sun rising in the east tomorrow.
As is the probability that the first yields on a new generation process are lower than the yields on a mature previous generation process.
Or that yields are likely lower with a higher number of metal layers than they would have been with a lower number of metal layers.
Joe
chipguy,
Once again you miss the point.
I think you are right, I did miss your point.
If Intel can turn out Willamettes in huge quantities for mass consumer markets then it can certainly manufacture Madison, which has less die area unprotected by redundancy than Willamette, with incredible ease.
While it remains to be seen, it is within the realm of possibility. I always thought cache on the CPU is one area where Intel can really squeeze AMD. Why do you think Intel is not doing more of it in the market where it matters most, volume-wise: the desktop market?
Joe
Elmer,
Do you remember the predictions I made for Hammer starting a couple of years ago on SI? Did I nail it perfectly or what?
You made a lot of predictions, some of which were very good, some less so. You were right about Hammer not taking the lead in SPECint. Basically, Hammer is underperforming AMD investor expectations, but overperforming your expectations. But I give you credit for sticking to the SPECint prediction, and it turned out to be true.
We can throw up tables and projections until the cows come home. I've given my opinion and you and others have given yours.
Can you summarize what your opinion is? (just to see how it differs from the prediction I came up with). You can fill in your data in my table.
And as always, don't forget that when AMD sells X number of CPUs, it doesn't mean that this is the maximum number of CPUs that AMD could make, just as when Intel sells Y CPUs it doesn't mean this is the maximum number Intel could make. (See Merced and McKinley.)
Joe
Elmer,
constrained, probably, but it's a good thing too because I don't think AMD can make them in any volume.
"Any volume" needs to be qualified. What do you mean? The entire market of server CPUs is 1M per quarter. It would be a dream come true if AMD could gain 10% of that in its second quarter of availability. IMO, an optimistic scenario is for AMD to gain 5% this quarter, which is 50,000 units, and that is a piece of cake if the demand is there. Even at a very low yield of 50 good die per wafer, you need 1,000 wafers in total, or 77 wafers per week, to meet this demand, which is about 1.5% of possible weekly wafer output.
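The arithmetic in that paragraph, spelled out (the 13-week quarter and the ~5,000 weekly wafer starts implied by the 1.5% figure are my assumptions):

```python
units_needed = 50_000        # 5% of a 1M-per-quarter server CPU market
good_die_per_wafer = 50      # deliberately pessimistic yield
wafers = units_needed // good_die_per_wafer   # 1000 wafers per quarter
per_week = wafers / 13                        # ~77 wafers per week
print(wafers, round(per_week))                # 1000 77
print(f"{per_week / 5000:.1%} of capacity")   # 1.5% of capacity
```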
The long delays and absence of A64 tells me AMD can't produce them but I know you guys would prefer to attribute these things to lack of boards, no Win64, Intel threatening OEMs etc.
I never claimed A64 delay to be because of any of this. It seems clear that it is delayed because the clock speeds AMD was getting were not competitive.
Itanium on the other hand was never intended to be the high volume product and I'd be the first to say that Intel couldn't meet the desktop demand with a 374mm2 die.
AMD couldn't meet the desktop demand either. AMD could meet the entire server demand (if Intel decided to stop selling server chips) and still have capacity left over for desktop.
The way I see the ramp of K8 is this:

            Wafer starts (1000s)          CPUs out (1000s)
        Barton  K8-130  K8-90        Barton  K8-130  K8-90
Q203      57       3      0           8,550      0      0
Q303      48      12      0           8,400    150      0
Q403      39      21      0           7,200    600      0
Q104      29      27      4           5,850  1,050      0
Q204      12      12     36           4,350  1,350    500
Q304       0       0     60           1,800    600  4,500
Q404       0       0     60               0      0  7,500

Good die per wafer assumed:
Barton   150
K8-130    50
K8-90    125
Petz,
AMD can potentially sell Opteron / Athlon64 chips with defects in places other than L2, such as the DRAM interface or the HT links, and also chips with substantial or multiple defects in L2. There could be CPUs with a single memory channel, with a single HT channel, or with 256K of L2, all of which can be salvaged Opterons.
Joe
Oops, sorry wbmw, I thought I was talking to Windsock. Which itself is an insult (to be confused with Winsock), for which I apologise as well.
Joe
Elmer,
This brings us to the question of AMD's low output of Opterons. These guys aren't flakes and we must assume they have their particulate defect density under control, so how do we explain the low output?
Where did you get the idea of low Opteron output? If you want an Opteron, you can get one, or 2,000, or 10,000. If I wanted to pursue this argument, I could ask about the reasons for the low output of Itanium, but obviously both of us know that both Opteron and Itanium are demand constrained, not supply constrained.
And, like Itanium, Opteron has ways to improve yield, either by redundancy or by disabling a portion of the cache and selling the part as a lower end product later.
I think both Itanium and Opteron sales will improve this fall, since there will be a lot of newsprint, and electrons on CRTs, used to discuss endlessly the benefits of 64 bit, since it is going to enter the mainstream market from both AMD and Apple.
Since there will be a lot of talk about using 64 bit processors for writing e-mails and playing solitaire, I think a lot of IT executives will start to think more seriously about 64 bit in their server rooms.
Joe
chipguy,
The Madison die is over half L3 by area and the L3 is protected from defect failure by sub-block redundancy, a technique superior and more robust than the row and column redundancy techniques used in uPs (and memory chips) prior to I2. The non-L3 portion of the Madison is about 160 mm2. That is less than the non-L2 portion of the Willamette P4.
LOL. You are comparing a .13u Itanium core with a .18u Willamette core. How about comparing it with .13u cores? They are all smaller, including L2. Opteron minus its memory/DRAM controller is also smaller, even with its extremely large L2, than just the core portion of Madison. And Madison, to get any kind of performance, needs 6 MB, some 200mm^2, of L3, more die area than an entire second current generation CPU.
And what do you get for all this? Some 5% gain in SPEC_int, higher power consumption, and a loss of compatibility with all the existing software. That's a lousy deal, if you ask me. And the potential customers have voted with their wallets by staying away from Itanium in droves.
I think Itanium just has to offer substantially more performance, and needs to lower the price to x86 level to compensate for loss of compatibility, in order to become an attractive choice.
Joe
chipguy,
And why are these RISC vendors dropping their proprietary line? Because they are adopting IPF. Compaq, HP, and SGI decided they couldn't afford to keep their Alpha, PA, and MIPS competitive in the face of Intel's IPF family.
I won't deny that IPF has had a good bark that scared a lot of people away, but so far no bite. But what has been a bigger problem for the RISC vendors than the bark of IPF is the fact that x86 has been eating their lunch.
RISC based systems represented nearly a $20B market last year in the teeth of a terrible IT spending slowdown. The RISC market will continue to disappear in the coming years but that will be due to IPF taking away market share.
There is an inconsistency in that sentence in the word "continue". The only thing that can "continue" is x86 taking share from RISC. Itanium can "start", but not "continue", since Itanium has not done anything in 2 years.
Joe
Windsock,
Re: It [high end market] is large, but shrinking fast
Fast is a pretty relative term. I think the market will remain huge for at least the rest of this decade, and some businesses will continue to demand high end solutions, even after that.
How did my comment on RISC get substituted with [high end market] in your quote? My comment was about RISC vs. x86. x86 is entering the high end, and while doing so, lowering prices somewhat by offering standard hardware and software.
This has more to do with Sun's underperforming architecture, rather than a global problem with RISC itself.
Underperforming compared to what? It is underperforming compared to x86. You may have missed it, but over the last year or so x86 chips have surpassed RISC chips in performance.
You should be glad, since Intel's x86 Xeons are the primary beneficiary of this trend. I find it strange that Intel is trying to derail this momentum by distracting the market with Itanium. What Intel risks is that people will decide that they want both, 64 bit and the current 32 bit standard, and will go with Opteron.
Joe
Elmer,
Come on John, you know what I meant. Athlon runs at 2.2Ghz and Opteron with an Athlon core can't get above 1.8GHz.
That's where the biggest disappointment has been so far. And it seems that Athlon XP is limited to that speed mainly by power issues. Opteron's package and power saving features are as good as, or much better than, Tbred's (which means it should not be heat limited), and the process technology should expand the envelope further, yet we see clock speeds that are almost 20% below Athlon XP's.
Based on some posts on Aces, this is mainly due to layout, and the next revision should help. We will see in 2 weeks, I guess, when 2 GHz is supposed to be released.
Joe
Petz,
Isn't Intel making Itaniums on 12" wafers? Seems like the waste on the edges would be, proportionately, a lot less.
It probably makes more sense to reserve 12" wafers for high volume chips, and use 8" for low volume. How many wafers do you think it took for the entire 2002 production of Itanium?
Well, I am assuming good bin splits, but as it turns out, 1 GHz was very difficult to achieve on .18u, and as we now know, some Itanium 2 chips sold as 1 GHz can only run reliably at 800 MHz. So maybe Intel did have to run a lot of wafers and then cherry-pick the few that could reach the magic 1 GHz clock speed.
Joe
wbmw,
The RISC market is quite large, revenue wise, and not likely to disappear overnight. I believe Intel is using this market as a foothold to establish a high end presence and reputation, before they bring Itanium 2 down to mainstream markets
It is large, but shrinking fast, no thanks to Itanium, but because of x86.
much like IBM is trying to do with Power4 (except Intel will get there faster).
IBM is already established in that market, so it has much more of a foothold to start from. Itanium has not yet established a foothold anywhere, and the market Intel chose to get started in is drying up fast.
The way to look at it: a lot of customers that are Sun shops may continue to buy Sun, at a slower pace. But I don't know how many greenfield installations there will be in a shrinking market.
BTW, the latest Sun results show their hardware revenue dropping 20% from a year ago.
Joe
Windsock,
Time after time, the Extended Memory addressing ability of Xeon processors is ignored. For some time Xeon systems with this capability have been available from Tier 1 server vendors.
Yup, very capable stuff, and a lot of money has to go into it, turning straightforward programming into a kludge like this. But it works.
Don't you think there would be value in preserving all this investment in software, rather than throwing it out the window, as choosing Itanium forces you to do?
BTW, you want Extended Memory addressing? Opteron has that too, but it has a flat 64 bit addressing as well.
Joe
chipguy,
You not only don't have the picture but you are looking in the completely wrong direction. Intel developed the IPF family to compete in the high end, high value system market against RISC processors, not to replace x86 (duh).
That very well may have been the plan when Itanium was conceived almost a decade ago. But as Barusa correctly pointed out, in this time frame, x86 has been eating the lunch of the RISC chips vendors, and these vendors are one by one dropping out of the market.
And, BTW, x86 has been outperforming the RISC chips lately as well.
The only thing lacking in x86 chips was 64 bit addressing, and now, with Opteron, there is a complete x86 chip to compete with RISC in all segments and provide compatibility for the code base developed over the last 2 decades. Intel could have done the same thing.
Itanium is basically pointless, aimed at the market that is disappearing.
Joe
Jerry R,
What if the most important consideration is 64 bit performance and not compatibility? What if they want the best performing 64 bit machine, and 32 bit x86 compatibility is lower down the list of consideration factors?
- 286 buyers bought 286 primarily because of compatibility with 8086
- 386 buyers primarily bought 386 because of compatibility with 286 and 8086
- 486, Pentium 1, 2, 3, 4, K5, K6, K7, Cyrix, Via, Centaur, Transmeta: buyers bought all of these CPUs primarily because of compatibility with existing code.
Do you think Itanium 2 performance is so compelling that this cycle will be broken? Hardly, as far as I can tell: less than a 10% lead on SPECint, and even that is mainly because of the >300 mm^2 die and 6MB of L3, not because of a miraculous Itanium core.
Was the 286 the highest performance processor out there? Not really, but it was picked because it was compatible. You get the picture.
Joe
Keith,
Good find and a good win for AMD.
Joe
sgolds,
I am not sure it is a fact that Windows is aware only of threads, not applications or processes, but I think it is. The way I think it works is that an application can consist of one or more processes, and a process consists of one or more threads. There is some thread management by the process or application of its own threads, if there are dependencies.
I don't think an application can in any way manage how its threads are scheduled with regard to HT; I don't think it is possible, other than having a flag for a potential future OS telling it that the thread is OK with, or doesn't want, HyperThreading mode. All the rest would be up to the OS.
Joe
wbmw,
20% of what, Joe? Given that a cycle of DDR400 takes 5ns, it looks like the CAS latency reduction saves 2.5ns off of best case latency. And like I said, best case is when you have a page hit, no conflicts, and a single read to memory. That's not how real programs behave.
First of all, I have to admit that I have holes in my knowledge of this subject (RAM), but one thing I do know is that going from DDR-333 CAS 2.5 to DDR-400 CAS 2.0 gives you a savings of 5 ns. That's not peanuts, given that 5 ns is 10 CPU cycles for a 2 GHz CPU.
I don't know how it correlates with the benchmark results, or with the 45ns that yb claimed, but a 10 CPU cycle reduction on every cache miss is going to have a measurable impact on performance.
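Where the 5 ns comes from: CAS is counted in memory-clock cycles, and for DDR the clock is half the data rate, so the cycle itself also shrinks from DDR-333 to DDR-400.

```python
def cas_ns(cas_cycles, data_rate_mts):
    clock_mhz = data_rate_mts / 2      # DDR clock = half the data rate
    return cas_cycles * 1000.0 / clock_mhz

old = cas_ns(2.5, 333)    # DDR-333 CAS 2.5: ~15 ns
new = cas_ns(2.0, 400)    # DDR-400 CAS 2.0: 10 ns
print(f"saved: {old - new:.1f} ns")               # ~5.0 ns
print(f"at 2 GHz: {(old - new) * 2:.0f} cycles")  # ~10 CPU cycles
```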
In this case, your speculation is wrong. Athlon 64 and Opteron use the same die. There is also no bus loading issues, since Opteron uses Hypertransport. That's what gave it a benefit in multiprocessing, but you can't expect an added benefit the other way around, as you would by going from a dual processor to single processor Intel platform.
Well, Celeron and Pentium use the same die, and Intel disables part of it. Elmer may know exactly how it is done; possibly the die has more internal output pins than the number routed to the package, and some can be either connected or not connected. Another possibility is the BIOS.
During a memory access, the CPU needs to know if it can just retrieve data from memory, or if it needs to check with other CPUs in the system whether the memory is in use. On a single CPU Athlon64 machine, this step can be bypassed. That's what I was talking about, and as I said, it is only speculation on my part.
Joe
chipguy,
FWIW, in its DEC 2002 report on Intel's Processor business, MDR (the guys who publish Microprocessor Report) predicted 2003 sales of IPF processors of ~150k units and 2004 sales at ~375k.
Reminds me of the predictions of RAMBUS sales, which always had the hockey stick shape. BTW, what was their prediction for last year? Was it met?
Joe
chipguy,
re: Well, CAS 2.5 to 2.0 is a 20% reduction.
ROFL. Why don't you start parking your car 20% closer to the end of your driveway and see how much that speeds up your commute to work.
Not if your goal is to hit the pedestrians crossing your driveway, which would be closer to the latency analogy. Also, at 400 MHz vs. 333 MHz, each cycle takes less time.
Joe
wbmw,
I suspect you are considering the move from DDR333 to DDR400 as offering a significant latency improvement, but what you fail to realize is that the latency reduction comes from a single timing, and it does not decrease the overall latency by that much.
Well, CAS 2.5 to 2.0 is a 20% reduction. And Athlon64 is a single-processor part, so there may be some internal optimization / short-circuiting of the features that allow multiprocessing, but this part is just speculation on my part.
Joe
sgolds,
I think the power consumption numbers of Prescott will help Opteron penetration into the server market, especially in high density environments. IBM picking Opteron for HPC makes a lot of sense in retrospect. In full featured servers, power consumption matters less, but it can still make a difference. Of course, it is very important in blades.
I think Intel may want to introduce Banias into markets other than notebooks, to compensate for the weak spot of Prescott.
Joe
sgolds,
I think the kind of thread synchronization you are talking about can happen in a single app that is multithreaded. But since it is a single app, it is all either HT favorable or not, so there would be no contention or stall.
With regular multiprocessing, you just have a number of threads of different processes that are running without much, if any, interdependence, so there is no problem in scheduling them.
Anyway, that would be the theory if this feature was supported.
Joe
Haddock,
It looks like a minor variation on gang scheduling to me (you gang schedule the task with a dummy null task). Well known technology AFAIKS.
Thanks for confirming my guess at how something like this could be implemented.
Joe
Petz,
In most modern multitasking OSs, at any given time there may be a number of active threads, but on a regular CPU only one has control of the CPU at any given time, and the OS will, every few milliseconds, kick the thread out and give another thread a chance. So theoretically, when it is time for one of the non-HT apps to get a chance, the OS may make it wait until it is time to evict another active thread from the second virtual CPU, and for the time the non-HT thread is running on its virtual CPU, the OS could keep the second virtual CPU idle.
So theoretically, something like this can probably be implemented, but it looks like a mess to me. I wonder if MSFT will touch this. It doesn't look at all trivial to implement. It will need some work on the kernel, so IMO support of something like this may not be there until a more major release of the OS, such as Longhorn (if ever).
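A toy sketch of what that gang-scheduling variation might look like (the flag name and the scheduler are entirely my invention, not any real Windows API): an HT-unfriendly thread is ganged with a dummy idle task, so the second logical CPU stays empty during its timeslice.

```python
# Toy gang scheduler for one HT core with two logical CPUs.
from collections import deque

def schedule(runqueue):
    """Yield (logical_cpu0, logical_cpu1) pairs, one per timeslice."""
    q = deque(runqueue)
    while q:
        t = q.popleft()
        if not t["ht_ok"]:
            yield (t["name"], "idle")               # gang with a null task
        elif q and q[0]["ht_ok"]:
            yield (t["name"], q.popleft()["name"])  # share the core
        else:
            yield (t["name"], "idle")               # no HT-friendly partner

threads = [{"name": "A", "ht_ok": True},
           {"name": "B", "ht_ok": True},
           {"name": "C", "ht_ok": False},
           {"name": "D", "ht_ok": True}]
print(list(schedule(threads)))
# [('A', 'B'), ('C', 'idle'), ('D', 'idle')]
```

The cost of the simple approach shows immediately: every HT-unfriendly timeslice wastes a logical CPU, which is why this looks like kernel-level work rather than something an application could bolt on.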
Joe