Chipguy:
Once again you got it completely backward. Intel has
used fully associative cache in its StrongARM and XScale
processors until 90 nm when it decided to switch to set
associative design for area and performance reasons.
Wrong again! See:
http://www.intel.com/design/network/papers/25286901.pdf
Page 5: the data and instruction caches are 32KB, 32-way set associative. The 32-entry TLBs for each are FA. That is for the 180nm node, and ditto for StrongARM. I guess you just show how little you read (or comprehend when you try to read). The rest of your comments are wrong too!
Pete
Wbmw:
Intel implemented the trace cache because the long 28-stage pipeline would have made branch mispredicts very costly. They had to try something, or else the pipeline would have stalled too much. Then they thought they could get away with less decode power, which didn't work out too well. Ditto with no barrel shifter, etc., which caused even more problems. They had to strip these out to make it manufacturable. Look at what the 39 stages of Prescott would have done without a trace cache. x86 code has a branch every 5 instructions or so. That is why many benchmarks for the P4 became mostly data streamed through little loops; those were quite predictable branches.
You could also look at the dropping of NetBurst. You think Intel would have done that if they were still doubling clock every year or two? Or AMD going to EV6 for Athlon (it wasn't allowed to use Intel's QDR FSB anymore), or P2P for K8 (multiple FSBs not working out as well as hoped)?
The linked list update can be done in three cycles because the data to be written to the other two lines (up and down) can be read on the same cycle in which the CAM match succeeds. The only end conditions are when the line is already at the head (up link is zero), in which case you skip the list update altogether, and when the line is at the tail (down link is zero), in which case you only need to copy zero to the up link record and update the tail link with the up link. The second cycle updates both the up- and down-linked lines with pointers to their opposites, effectively removing the matched line. The third cycle updates the down link of the hit line with the old head link (stored from the first read), sets its up link to zero, updates the head link with the stored hit link, and updates the up link of the old head line with the hit link.
Given that you can do multiple things at once to different areas of memory, only one operation per memory area per cycle can be done. Thus it takes three cycles to do all of the updates. Yes, you can simplify the accesses and the amount of logic needed by taking longer, but complexity is worthwhile where it saves cycles of latency (just look at carry lookahead during 64-bit adds and subtracts: it uses more logic to decrease the latency of those operations; another example is the barrel shifter, which is much more logic than a shift register but valuable enough to keep for the large reduction in latency). Just think about it and remember the above rule.
Even so, taking 4 or 5 cycles wouldn't be much of a problem as the current L2 access latency is 9-12 cycles for K8, IIRC.
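For anyone who wants the list mechanics spelled out, here is a rough Python sketch of the doubly linked list LRU described above. The array-of-links layout and the use of way 0 as a never-used head/tail sentinel are my reading of this thread, not anything from an AMD document:

```python
class LRUList:
    """True-LRU ordering over cache ways via a doubly linked list.
    A link value of 0 means 'no neighbor', so way 0 never holds data."""

    def __init__(self, nways):
        self.up = [0] * nways    # link toward the MRU (head) end
        self.down = [0] * nways  # link toward the LRU (tail) end
        self.head = 0            # MRU way (0 = empty list)
        self.tail = 0            # LRU way (0 = empty list)

    def insert(self, way):
        """Initial fill: splice a new way in at the head."""
        self.down[way] = self.head
        self.up[way] = 0
        if self.head:
            self.up[self.head] = way
        self.head = way
        if self.tail == 0:
            self.tail = way

    def touch(self, way):
        """On a hit, move the way to the head; the tail stays the LRU."""
        if self.head == way:
            return                    # up link zero: already MRU, skip
        u, d = self.up[way], self.down[way]
        self.down[u] = d              # cross-link the two neighbors
        if d:
            self.up[d] = u
        else:
            self.tail = u             # the hit way was the tail
        self.insert(way)              # re-splice at the head
```

Touching a resident line is two link rewrites plus the head splice, which is why the hardware version described above fits in three cycles.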
Pete
PS: some state that LRU (or its companion, LFU) is not the optimal replacement strategy for some loads. Which is the best over the programs and environments run today, I don't know. But I can see a case or two where a different strategy would do better. I just don't know if, in general, LRU is the best overall that can be reliably implemented.
Wbmw:
That is good when you have one FSB. However, when you have more than one, everything has to be broadcast to all FSBs. That severely limits performance on multi-FSB systems. Cores also add load to the FSB, so as the cores per socket go up, the number of sockets allowed per FSB drops. At quad core, the bus becomes a P2P link anyway. The SRQ on K8 eliminates any extra load for multiple cores on the same die. Yes, I know that Intel could add bus arbitration and buffering to reduce this, but they don't ship a dual core with such right now. AMD has for quite some time. And it is extensible to more than 4 cores. Yes, some design tweaks need to be made, but 1->2 is the hard part; beyond that is just refinement.
A star has many leaves, but each leaf connects to the hub directly (multiplexing going in is OK; it's called concentration in older comm parlance). Some leaves are CPUs, some are memory DIMMs and some are I/O (usually symbolized by an asterisk). Everything goes through the hub. That is centralization. It makes the system easy to control at the hub. The big problems are that if you lose the hub, everything stops, and comm traffic gets jammed quite frequently (too much talk for the bus at times).
A fabric has many ends and multiple internal connections (usually symbolized by a cloud). Any particular end can talk to any other piece (end or node) by multiple (usually) routes. This is decentralized. It can allow a piece to fail yet keep the system running. The big problem is that control is a cooperative effort. One node running amok can devastate the system.
Now you can see the difference between the philosophies. Intel is like a control freak. Every decision is made to keep control in the center. The hub is used everywhere. Memory hubs, I/O hubs, FSB hubs, etc. Even hubs connecting other hubs.
AMD wants to do things as a team. Most decisions are made to get things to work together to accomplish the goals. Each component does what it does best, and you can mix and match components into nodes. Yes, it does standardize on interconnects and such, but mostly to allow interchangeability. Any HTT enabled CPU could be used at the CPU socket, even those not from AMD. You can use anyone's HTT tunnel instead of another's. The more efficient designs may restrict which you use, but the full selection will still work (slowly maybe, but stuff gets done).
Yes, each got that way given their situation. But the latter is better for most things. Centrino had a lousy wireless standard (11b when most were 11g); not the case with AMD. You could use an Nvidia NB with a ULi SB, if you wanted. It helped ATI's NB customers to have the ULi SB as an option when theirs turned out to be buggy. Intel would not have used an outside SB if they could help it. They didn't have that choice with Xbox.
Pete
Chipguy:
In other words, you missed it and now use some unrelated barbs to distract from your mistakes. Typical.
Pete
Wbmw:
Intel's system organization is one where everything is copied locally and inter-core communication goes through the chipset (centralization). Intel assumes that the chipset is the place for stuff. AMD takes a distributed approach.
Yes, you could do it with FSBs and get the same result. But it is more difficult to do with a star topology than with a multidimensional fabric. It also is incompatible with Intel's one-controls-all (Me first) policy, but very compatible with AMD's sharing (Us together) policy. Going against the grain is always difficult and rarely thought of.
As to the distributed L3: if you can't find it locally, your hope that it's in the local cache is dashed anyway. Would you rather go through the long latency to main memory, or would you rather it be in a neighbor's cache where you could get it quicker? You have to check whether the memory address is controlled remotely anyway. Even Intel does. But the latency of that is nearly as much as that of memory. If it is controlled by another core, the memory access must be flushed. With Intel's high latency memory and inter-CPU comm, the result is quite bad. With AMD's low latency memory and very low latency inter-CPU comm, this works much better.
As to where FAC are used: http://www.stanford.edu/~huanliu/globecom02.pdf
There are many others. Just use your favorite search engine like www.google.com to see them.
Pete
Wbmw:
My memory was a little faulty about how many data bits are kept. ECC uses 1 bit per byte, but the data in the cache also has 3 predecode bits per byte (for L1I). Since L1I is swapped to L2, this is retained there, according to AMD. Thus each 64 byte line has 64 x (8 + 1 + 3) = 768 bits, not the 640 I remembered (I had thought it was one bit for ECC and one bit for predecode). This may be one of the reasons why AMD caches use more die area per byte than Intel's.
The linked list is how I code the data caches used in my programs. Even in the old days, memory was much faster than disk. Some implementations in data comm also use this in their hardware. They just took their software algorithms and did their hardware the same way (why change what works?). Singly linked lists are more used in hardware, as they have many more uses.
As to why FA caches aren't implemented more, I think it's plain inertia. You use old solutions that worked for you until something forces you away. Else we would have long ago gone away from x86 and followed each fad as it came along, with every new system just chock full of bugs. By the time any system had most of the bugs eradicated, the fad would change and everyone would switch. And there would be no big, long-sustained companies in this business. No IBMs, Intels or Microsofts. Fads don't give one time to get huge.
Caches started out direct mapped and moved to more ways as time went on. As caches are shared with more and more cores and get wider as well, taking the jump to FA will get more likely.
Pete
Chipguy:
Evidently you have no clue how to implement a true LRU using a doubly linked list. The returned address is used to remove the line from that list and to add it back at the head of the list. The tail of that list is the LRU once the cache is filled with the initial entries. But you were too dumb to realize the difference. You of course know that AMD would swap that line into the appropriate SA L1 pseudo-LRU entry. That portion also takes two cycles, which is concurrent with the LRU linked list update. No? Too dumb to know that too?
I should also assume that you are too dumb to see that FA consumes more resources per cache port than SA. The L1 needs to be pipelined and have more ports (7 in K8: 5 up and 2 down), which makes it easier to be SA. When the cache unifies and has one port to the L1s and one port to beyond (memory or some L3), then FA becomes better. Also, the fact of an L1 cuts the rate of L2 searches down to about one every ten cycles. That allows for some searches to be initiated from beyond (MOESI and data). Oh, you must have forgotten about those too. And you could determine whether those accesses update the LRU list or not. There are good reasons for doing it either way.
Pete
PS: To those who wonder why I have a 512KB L2 with 8,191 ways in a FA cache rather than 8,192: the zero way is used as the head of the LRU linked list and thus is never used for data. Its address is zero, which is required to signal the end of the array. You could add one bit to do that, but that adds more bits than simply removing one entry. Also, you needed a place to store the LRU list links anyway (the Most Recently Used and Least Recently Used cache lines).
Wbmw:
From your reference:
Experimental Error
The miss ratios were calculated from data collected by functional, user-mode simulations of optimized benchmarks. As a result, the cache miss ratios reported above may not be representative of a real platform. A few sources of error are discussed below.
First, only primary misses were counted by the simulator. Once a reference missed in the cache, the data was loaded and all subsequent accesses to the line hit. A modern processor may also experience secondary misses, or references to data that has yet to be loaded from a prior cache miss. There is a nonzero miss latency, and a real processor may execute other instructions while waiting for the data. The sequential model used in functional simulations is optimistic in this respect.
Second, a modern processor will have optimizations that affect cache performance. Hardware prefetching of instructions and data can have the positive effect of reducing the number of cache misses. However, prefetching can also cause cache pollution. Further, speculative execution can result in increased memory traffic for speculatively issued loads, and I-cache pollution from incorrect branch predictions. This also makes the results optimistic.
Third, the operating system was ignored. System calls cause additional cache misses to bring in OS code and data, and in doing so they replace cache lines from the user program. This increases the number of conflict and capacity misses for the user program in a real system. Since the additional misses from OS intervention were not modeled, our results are optimistic (though experiment showed these benchmarks typically spend less than 0.1% in the OS). One possibility is to flush the caches on system calls. However, this is the other extreme, and would have made it impossible to measure the compulsory miss rates.
Fourth, the benchmarks were optimized for an Alpha 21264 processor. The binaries may have been tuned to perform well with the 21264 cache hierarchy (split 64K 2-way set associative L1 caches). Ideally, the binary should not favor a particular cache configuration. Further, the binary contains no-ops for alignment and steering of dependent operations in the clustered microarchitecture of the 21264. These no-ops increase the overall instruction count for the functional simulation.
FA caches do best in high interrupt and multitasking loads. SPECint and SPECfp are deficient benchmarks because they have a lot of small loops, which are helped by a large simple cache. A large RDBMS has a much larger footprint of sparse code/data. Those easily overload simple caches unless they are very large.
As for the L3: if the data is not in the local cache, it is checked against all other L1s and L2s. If there is a hit there, the data is obtained from them. That makes them an L3. Yes, a distributed, remote-access L3, but an L3 nonetheless. Given MOESI, the virtual L3 isn't exclusive, but for all practical purposes it becomes nearly the same as one. Just because Intel's caches can't be used as an L3, due to their system organization, does not mean that AMD, with DCA and SRQ, can't use theirs that way.
Pete
Wbmw:
You evidently do not know what you are talking about. A FA cache can be seen as a parallel CAM memory. You do know what CAM stands for? According to you, you couldn't access an array of 1024x1024 bits in one cycle. That's bull...t! DRAMs are composed of hundreds of arrays of that size.
Now let's see if you can conceive of this. Each cache line of 640 bits (remember, it contains ECC bits as well) has 34 bits of the physical address (address bits A39-A6) stored. On this line are 103 gates, 2 for each bit, where the address bit is compared by a NOR and its complement with an AND. The result is fed into an OR gate, and then all 34 outputs plus one bit which flags the cache address as good are fed into an AND gate. This output feeds a success column and 10 bits using diodes to say which column succeeded.
Now within one cycle you know two things: A) whether any cache line has that address, and B) if it does, what that successful line's address is. This is fed into a 2 element array holding a doubly linked list of cache lines, where the head line is the most recently used and the tail line is the least recently used. The two addresses can be read when the line is found, during the same cycle. Each address is copied into a register, as well as the previous top link. The second cycle updates the previous up link with the down link, and the previous down link gets the previous up link. The next cycle takes the master address keeper and writes a zero into the up link and the previous top into the down link, plus the head link is pointed at the new cache line. If zero is found on the up link, no updating is done, as the line is already in the correct position (extremely rare with an exclusive L1 cache in front). If zero is found in the down link, the previous up link is written into the master tail link.
On cache misses, the tail is the LRU, and thus the procedure is the same except the cache line data is written from memory. For cache writes, another linked list could be maintained. It could be single-ended with proper management, however (new dirty lines get put on the tail and the top is stripped off as each is flushed to memory), and since it is slow, the parallelism can be reduced.
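As a software toy model of that lookup (the 34-bit tag from A39-A6 is from the post; the class and names are mine, and the Python loop stands in for comparators that all fire in one cycle in hardware):

```python
TAG_BITS = 34          # physical address bits A39..A6
TAG_SHIFT = 6          # A5..A0 select the byte within a 64-byte line

def tag_of(addr):
    """Extract the stored tag from a 40-bit physical address."""
    return (addr >> TAG_SHIFT) & ((1 << TAG_BITS) - 1)

class FACam:
    """Toy fully associative tag CAM: every way is compared at once."""

    def __init__(self, nways):
        self.tags = [0] * nways
        self.valid = [False] * nways

    def lookup(self, addr):
        """Return the matching way (the 'success column') or None."""
        t = tag_of(addr)
        for way in range(len(self.tags)):
            # per-way: a 34-bit compare ANDed with the valid flag
            if self.valid[way] and self.tags[way] == t:
                return way
        return None
```

In hardware the loop body is the per-line gate chain described above, so the whole search still resolves in a single cycle.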
With all of these gates, the area used for the CAM, LRU and dirty lists will still be less than 1/3 of that used for data. That is not cost prohibitive. Or power prohibitive. Look at how Intel throws tens of millions of transistors at things that add up to less IPC gain. I don't hear you complaining about that waste.
Pete
Chipguy:
Many of those studies are from older eras. Current interrupt-driven OSes and heavy multitasking show that 8-way SA caches with the current methods of address bit swapping get overwhelmed. The studies also assume that more ways mean higher latencies, which a good FA design does not have. So there are assumptions that current systems invalidate, which invalidate the studies I have seen against FA caches. If you have links to those studies that you claim show more than 8 ways are not needed, post them.
Pete
CJ:
The power is not as high as you imply. To decode an array of 8192 targets in one cycle (8K x 64 bytes) the straightforward way, you need about 123K transistors. To decode 34 bits (40 bit physical address minus 6 bits for the size of a cache line) of the same size FA cache, you would need about 295K transistors, about 2.4 times as many. Although this is probably 5 to 10 times that used for cache address decode, the latter takes 8 cycles for 16 ways versus 1 cycle for the former.
The linked list reorder for a perfect LRU scheme likely takes about the same as the CAM cycle. Two cycles for that makes 5 to 15x versus 8x, which isn't that much more power, especially given the latency and cache size reductions. The total power is only drawn after an L1 miss or a remote cache check. Furthermore, compared to the dozens of millions of transistors in the core, which together use 24W or so, you are probably talking on the order of 100mW.
Since an FA cache is much more efficient, a smaller size will still get lower cache miss percentages. The above is for an 8,191 way 512KB L2. If you think a 512KB 16 way is enough, then a 64KB 1,023 way will have the same cache miss ratio. Of course, the small FA cache will use less power than the bigger SA cache and still have lower latency.
FA caches also do very well in heavy multitasking server work. The large SA caches work with large data sets with poor blocking. K8 with DCA gets an effective L3, using the remote L1s and L2s, of up to 320 ways and 9MB. That's getting a long way towards a near-FA cache. A good multitasking cache, though.
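To put rough numbers on the size-versus-ways trade above, here is the rule of thumb (quadruple either the size or the ways to halve the misses) as a one-liner. It is a crude model, not measured data:

```python
def miss_scale(size_factor, ways_factor):
    """Relative miss rate under the rule of thumb that quadrupling
    size or ways halves misses, i.e. miss rate ~ x ** -0.5."""
    return size_factor ** -0.5 * ways_factor ** -0.5

# 512KB 16-way -> 64KB 1,023-way: capacity shrinks 8x, ways grow ~64x.
# The 64x in ways more than pays for the 8x loss in size.
print(round(miss_scale(1 / 8, 64), 2))   # -> 0.35
```

By this model the small FA cache at least matches the big SA one; real workloads will deviate from such a smooth curve.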
Pete
You are one of those IPF supporters who can't or won't see its failures. Once may be OK, but pounding it over and over with the refrain "wait until ..." makes you seem small. Take your shtick over to where you can collect "rah rah"s, if that is what you desire. You are not getting any here.
Pete
Wbmw:
It's evident you haven't. Or that you don't know how caches work beyond some simplistic view. Par for your course.
Pete
Then AMD has more than tweaks in store. Contrary to your opinion (or hope).
Pete
Combjelly:
It may be slow the way you'd do it. But it can compare all addresses simultaneously in a single cycle. Yes, it's a bunch of power, but on the same order as an address decode into a memory array. It's the LRU linked list reorder that takes a couple of cycles. Parallelization can shrink cycles in exchange for more power.
As to cache size reduction, the rule is that array size needs to quadruple to reduce cache misses by half. Typically the same is true for ways. Quadruple the ways you halve the misses. More ways allow for more simultaneous processes and more cache allows for larger working sets. But if code is sparse, like code with lots of error handling, then the ways in a set associative cache can get overwhelmed, even though the working set is small enough to fit into the cache. That never happens with fully associative caches.
Set associative cache sets are hashed to spread them evenly over memory address ranges, but the typical spreading algorithm just moves address bits around. This causes another problem: because of the way code is blocked and compiled, the lower addresses of a page or page group get overused, so the sets that map to them get used up first. That defeats the simple address bit reordering. Systems with large memory typically go to 2MB pages for quicker memory allocation and management. That page size completely defeats any simple set associative cache of size less than or equal to the page size. Even those equal to a few pages don't do much better. It doesn't defeat fully associative or nearly fully associative caches.
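A quick way to see the 2MB page problem: with 64 byte lines, a 512KB 16 way cache has 512 sets, so the set index comes from address bits 6-14, all of which lie inside a 2MB page's offset. Identically laid-out code at the same low offset of every large page then piles onto one set. A sketch (the cache geometry is the 512KB 16 way example from this thread; the rest is illustrative):

```python
LINE = 64
WAYS = 16
SETS = (512 * 1024) // (LINE * WAYS)   # 512 sets -> index bits A6..A14

def set_index(addr):
    """Which set a simple (unhashed) indexed cache picks for an address."""
    return (addr // LINE) % SETS

PAGE = 2 * 1024 * 1024                  # 2MB large page

# The same low offset in eight different 2MB pages maps to one set:
indexes = {set_index(p * PAGE + 0x40) for p in range(8)}
print(indexes)   # -> {1}: a single set, so only 16 ways must hold them all
```

A fully associative cache has no set index at all, so this aliasing simply cannot happen there.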
Pete
Smooth2O:
AMD can do more than tweak the core. It could add another FP ADD and FP MUL unit to the pipe to do a packed SSE2/3 instruction of each per cycle. That would likely nearly double the FP throughput of the CPU, taking it above its RISC and EPIC competitors. At that point it would be ahead of its high end competitors on both integer-based commercial loads and FP-based HPC loads.
Then adding a few more cHT links, plus making them faster, would get systems that almost completely take away any niches for the others. Of course, Intel would have to follow with Xeon and thus do what Itanium has failed to do: replace these high end CPU systems with a large volume GP CPU.
A tweak that would go far to boost speeds would be a true fully associative L2 cache, even if it added 50% more die area for the cache. An 8,191 way LRU 512KB L2 would be far more effective than a 16 way set associative 8MB L2 with 16 times the die area on typical loads for servers, desktops and mobiles. It also could be lower in latency, if done right. It likely would have more latency than a 2 or 4 way L1 data or instruction cache, though.
Pete
Chipguy, why don't you go over to the Intel thread and post your "IPF is great" crap there. That is where it belongs. Every new core or iteration has all of these "IPF is great" boosters who overhype the new core, only to find that when it actually hits the street, it's far less than advertised.
So far not one core or revision of Itanium has lived up to its initial hype. Coulda, woulda, shoulda doesn't cut it. I stated many years ago that execution is where this ISA is going to fail. So far Intel has met my expectations. The only one it hasn't yet met is when they will get rid of this boat anchor.
Pete
Teddy Roosevelt said, "Speak softly and carry a big stick!" Hector is going one better: "Say nothing and whap 'em when ready!" That used to be what Intel did with released products. Now AMD can do it, and Intel is just going to have to react to it. It doesn't have much experience in this, so it does quite poorly. It will either get good at it, or try to get out like it did with DRAM. Problem is, they have failed at that, too.
Pete
Dear Thread:
From DRBES on SI:
http://www.siliconinvestor.com/readmsg.aspx?msgid=21856084
...there is no shortage of independent observers who think the chip is about to be killed off. There are even more IT directors and chief technology officers who tell anyone who will listen that the Itanium is not an attractive proposition. In recent times there have even been a few Intel executives who seem more focused on x86 compatible chips such as the Xeon than on the Itanium....
http://www.vnunet.com/itweek/comment/2145558/wants-itanium-processors
Pete
PS fpg, Tecate is a gnat. Treat her as one, put her on ignore.
PPS: Intel having a roadmap is nothing. They only change it over and over. Having a roadmap of the week makes goals a year out next to worthless. They get changed too fast.
It is like the old Don Adams lines.
Intel, "We will produce a 10GHz processor in 5 years."
Customer, "I don't believe you."
Intel, "Would you believe at 7GHz one in 4 years?"
Customer, "I still don't believe you."
Intel, "Would you believe 4GHz, next year?"
Customer, "No!"
A year elapses.
Intel, "You were right, we didn't get to 4GHz!"
Customer, "Your predictions have no credibility anymore! I'll believe it, when I see it!"
Of what, basic chip engineering and economics? Go compare
the die size vs ASP of DRAM versus MPUs. DRAM can sell
for such a remarkably low price compared to die size in large
part due to redundancy and repair. Someone who says an
IPF MPU has a large die size *therefore* it must be really
costly to make is simply ignorant.
DRAM is inherently simpler than cache: for three dozen logic transistors, you have 512 data transistors and 512 capacitors. And adding a few more arrays is easy. Cache has 5 to 10 times the logic inside of DRAM. And although only one stuck transistor out of the half million elements in a DRAM array can stop redundancy from sparing that array, a few thousand can stop the redundancy of spare cache. Next, having extra rows and/or columns in a DRAM array is straightforward and easy due to the strong commutation of bits within an array, both vertically and horizontally. In cache, there is no commutation within a row, as each bit has a different purpose, and there is no commutation within the number of cache lines constituting a set. If the cache is 16 way and there are 4 ways per row, then a set is four rows. Lose a column of DRAM and you may fix the array. Lose a column of cache and the whole cache array is shot. Lastly, DRAM has thousands of arrays; cache only has a few.
So logic is nearly unfixable, but a defect can fall in an area, or in such a way, that won't affect logic. A rule of thumb is that 70-90% of logic is unfixable when a defect falls within its area. Cache area is something like 5-10% unfixable, and about 20-30% is fixable by shutting part of it off. DRAM is less than 1% unfixable and 1-2% fixable by shutting off. And as the area goes up, DRAM keeps a small chance of being unfixable (the area where it stops redundancy of arrays), but cache has an increasing likelihood that a defect lands in the unfixable area.
So when cache area is much larger than logic, defects in the cache area can scrub the entire chip or force the chip to have only a fraction of its cache. Say the cache is 10 times the logic area; then the die may be as unfixable as if it had 1.5 to 2 times the logic. And there exists a decent chance that all cache arrays have a shutdown defect in them, making the die unusable. Lastly, Intel seems to have either 1 or 2 cache arrays for desktops and 2 or 3 arrays for Itanium. AMD looks to have at least 4, and as many as 18, cache arrays per die (that may be one reason why AMD has low cache density).
That's quite funny. I look at IPF's primary MPU competitors,
POWER5x and US-IV+, and see external caches of 36 MB
and 32 MB respectively. OTOH IPF systems don't have cache
beyond what is on the MPU. That represents a huge saving
in package pin count/internal routing, signalling power, and
system level physical design complexity for IPF systems.
External caches may simply be SRAM, which has commutativity in both directions and thus is highly fixable (the cache logic is on the CPU die), making it high yield and thus relatively low cost.
Pete
Wbmw:
Unless AMD launches a competitive dual core Opteron EE, Yonah will be the only choice for dense racks needing ~30W and 15W CPUs. I think it's a market that, while small, is lucrative and growing.
The problem is that Yonah isn't DP or MP capable. Opteron DC EEs and HEs have the capability to be put 2 and 4 to a 1U rack. Yonah can't do this. The 4GB limit further restricts its use, as does not having AMD64. 1P Yonahs in 1U servers have a lower performance density than 4P SC Opterons or 2P DC Opterons in 1U servers, not to mention 4P DC Opterons in 1U servers. Power is also used by other components: power supplies, fans, disks, memory, and interconnect both on board and between servers. 42 1P Yonah 1U servers per rack gives only 84 cores per rack; 42 4P DC Opteron 1U servers per rack give 336 cores per rack. Furthermore, we do not yet know how much power at the 12V VRM side Yonah will require, and at what speeds.
With Opteron and the various A64s, Turions and Semprons, we know what they consume and at what speeds. Toledos (normal Opteron x75s and X2 4800+s (2.4GHz, 1MB L2 each)) used 86W by measurement during May 2005 running 2 Prime95 processes (www.lostcircuits.com), Manchesters (A64 X2 3800+ (2GHz, 512KB L2 each)) used 47.2W running the same, Venice (A64 3800+ (2.4GHz, 512KB L2)) used 30.8W running 1 Prime95 process, and Toledo (A64 FX-57 (2.8GHz, 1MB L2)) used 60.4W running the same. Given the typical VRM efficiency of 84% ( http://www.powermanagementdesignline.com/products/161600289 ), we get 72.2W, 39.6W, 25.9W and 50.7W respectively. Those figures are well below the listed AMD TDP max for each: 110W, 89W, 89W, 89W.
Scaling these dual cores down to HE (2GHz) and EE (1.6GHz) speeds can be done in two ways: straight-line from current speed to idle speed (C&Q off), or looking at what one Prime95 process uses versus two and scaling each core's active power with frequency. Toledo (1 Prime95 process) uses 63.6W at the VRM 12V side (53.4W at VRM Vcc) and at idle uses 24W (20.2W). Using the first method for the 2GHz HE nets us (72.2W - 20.2W) x (2GHz/2.4GHz) + 20.2W = 63.5W, not likely. The second method for the 2GHz HE nets us (72.2W - 53.4W) x (2GHz/2.4GHz) x 2 + 20.2W = 51.5W, possible. Both methods neglect the voltage differences and the leakage reductions they bring.
Trying the second method for the 1.6GHz EE case, we get 45.3W, unlikely. Modifying the second method for voltage drops (multiplying the result by 1.2V (HE) or 1.1V (EE) over 1.4V (norm)), we get 44.1W and 35.6W. If we instead modify method two by the voltage drop squared, we get 37.8W and 28.0W, much more likely. Those will do until real tests are performed. And that's with processors built on or before May 2005; 7 months of process refinements and respins could reduce this further.
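The arithmetic above, collected in one place so others can check it or plug in their own numbers (the measured wattages are the Toledo figures from this post; the function and its names are just my bookkeeping, and small last-digit differences come from rounding intermediates):

```python
idle, one_proc, two_proc = 20.2, 53.4, 72.2   # W at VRM Vcc, 2.4GHz Toledo

def method2(f_ghz, vcore=None, vexp=1, f_norm=2.4, v_norm=1.4):
    """Scale each core's active power (two processes minus one) linearly
    with frequency, then optionally derate by (Vcore/Vnorm)**vexp."""
    p = (two_proc - one_proc) * (f_ghz / f_norm) * 2 + idle
    if vcore is not None:
        p *= (vcore / v_norm) ** vexp
    return p

print(round(method2(2.0), 1))           # -> 51.5 (2GHz HE, no voltage derate)
print(round(method2(1.6), 1))           # -> 45.3 (1.6GHz EE, no derate)
print(round(method2(1.6, 1.1, 2), 1))   # -> 27.9 (EE with the V^2 derate)
```

Swapping in other chips' idle/one-process/two-process measurements gives the corresponding estimates for them.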
Going from deficient to even is part of the "turning around" process, and I think Intel will make this transition. In H2 2006, they will transition from even to ahead.
H1 2006 won't get them to even, and late H2/06 might, but I won't count on it, as that assumes AMD won't make any progress during the next year. In any case, when independent third party sites get the chance to test the new CPUs off the shelf, we will have enough data to truly have a good discussion and/or debate.
Pete
Gnat:
Yours is a small mind. A good one has discussions and debates. A small one does neither. The last posts just show you to be the latter. When you are willing to discuss or debate the issues raised by wbmw, feel free to join in. Else you remain a gnat with a small mind.
On to discussions, debates and informative posts ignoring gnats.
Pete
Tecate:
Kate, the great, just became Kate, the gnat. The truth blows their minds.
Pete
Wbmw:
And in all cases, Intel, in head to head comparisons with AMD products, will fall flat on its face. Even after all that, it still will be behind AMD's current products, much less any being released by then. Intel's big 65nm wad will thus be shot with nothing to show for it, but we will hear the "wait until ..." cry yet again.
It brings to mind the cartoon where the hunter shoots his big shotgun at a barn door. A lot of smoke, fire and sound is emitted. The barn door just 10-15 feet away is completely undamaged. He looks at his gun barrel (an unsafe practice). The shot simply rolls down the barrel and falls into his face. Much ado about nothing.
Pete
Dear Chipdesigner:
Q1/06? I stand corrected. But when will it be available in quantity? Last time it took 6 months for Intel to deliver real volumes. Whoops, that puts it back to Q3/06.
Intel seems to be up to its old tricks. It's changing its roadmaps every few weeks. Originally Cedarmill was to be a Q2/06 release with VT:
Intel Pentium 4 6xx (Cedarmill) is expected to be released in Q2. Cedarmill will be built on a 65nm process and largely based on the (single core) Prescott 2M core. The TDP rating for Cedar Mill chips will be 86 watts, down from 95 watts for the Prescott 2M. Targeted at the value sector, Cedarmill will feature Hyperthreading, EM64T, EIST, XD and Vanderpool Technology. The initial variants of Cedarmill are expected to be the 631 (3.0GHz, No VT), 633 (3.0GHz), 643 (3.2GHz), 653 (3.4GHz) and 663 (3.6GHz).
Now it's to be Q1/06 without VT (couldn't get it to work soon enough?):
http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2526
Intel lists the 6x1s as 90nm CPUs on its processor number recognition web page. And it still has Cedarmill as a H1/06 product on the roadmap.
Frankly, I don't think Intel knows when, or if, Cedarmill will see the light of day. And it doesn't know how it will number it, except as a 600 series CPU, when it finally shows up. Thus Q1/06 is a crapshoot. Q3/06 could be just a couple of respins away or never.
Pete
Dear Drjohn:
Well he forgot (being generous) to include a A64 FX 57 as a comparison. See this post: http://www.siliconinvestor.com/readmsg.aspx?msgid=21772261
So Cedarmill is better than P4E; that isn't saying much. An A64 X2 4200+ runs at 2.2GHz using a Manchester die, which is composed of 2 Venice cores (2.2GHz @ 512KB L2), each of which carries a model number of 3500+. On most tests, a 3500+ Venice beats the performance of a P3.4E Prescott, which Tom has tested to match Cedarmill. Thus 2 3500+ Venices use less power than 1 Cedarmill with more than twice the performance. Since Cedarmill isn't supposed to be released until Q3/06, it looks like Intel won't even match a Q4/04 AMD CPU (Winchester) for performance/watt. And it won't get to even half the performance per watt of a Q3/05 AMD CPU for a year. Heck, the Q4/06 performance won't even reach the current AMD flagship A64 FX-57. Intel shot its 65nm wad to still fall short of AMD's 90nm present.
Yeah, Intel will get back on the road next year. The same road section AMD passed two years ago. "Wait til ..." just got longer away. YAWN!
Pete
Dear Avatar:
Here's their problem:
If Sun Opterons are better than HP's Itaniums, and HP's Opterons are better than Sun's Opterons, then HP's Opterons are better than HP's Itaniums. So as a customer would say, "Why aren't you selling me Opterons instead of the overpriced, poor Itaniums?" This is usually followed by "I'm going to get them from Sun, at least they are telling me the truth!"
The "foul" is the odor from the HP Itanium sellers.
Pete
PS, that also does it for Xeons for the same reasons, whether they are from HP, IBM or Dell.
Wbmw:
You forgot that this is raw data! And all you saw was "(Final) data". It counts entries like "600MHz P3 ... we also have P3 800EB, 866B, 933B and 1000", which can show up 9 times for one company and may not even be in stock, or may be preorder only. Yet you don't see, or you discount, the disclaimers. People could check up on it every week and do the count themselves. And they remember what it showed and what the actual situation was. And 9 back then was like 1 or 2 now. Besides, a lot of sellers back then only had a dozen. There weren't many like Newegg who could sell hundreds, if not thousands, in a week.
You can't accept third party sites that discredit your views. You can't accept other posters, other insiders, analysts that normally are for Intel, or Intel itself, when they go counter to your dreams. They understand what went on; you just can't accept it. And I do understand why. You don't like it when you're wrong. Too bad! You lost! You're wrong! Period! End of story!
As to TDP, you can't claim problems with 0.5W because 1) the 2.26GHz PM is 27W, far more than 24.5W, and 2) when 24.5 is rounded, it becomes 24 by the rule ">0.5 go up, <0.5 go down and at 0.5 round to the nearest even". If you aren't able to remember this simple mathematics rule, YOU HAVE NO BASIS TO BE POSTING HERE. Besides, AMD's TDP is an absolute upper bound and Intel's TDP is a permeable number that bounds nothing. It is easier to get a MHz gain when you can change to an easier basis. But you knew that.
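For what it's worth, "round half to even" is also the default in IEEE floating point and in Python's built-in round(), so the rule is easy to check:

```python
# Round-half-to-even ("banker's rounding"): >0.5 goes up, <0.5 goes down,
# and exactly 0.5 rounds to the nearest even number.
print(round(24.5))  # 24  (24 is even, so 24.5 rounds down)
print(round(25.5))  # 26  (26 is even, so 25.5 rounds up)
print(round(24.4))  # 24  (below the halfway point)
print(round(24.6))  # 25  (above the halfway point)
```

24.5 and 25.5 are exactly representable in binary floating point, so these halfway cases behave as the rule states.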
As for stock options, you have no idea how they work. You are given an option to buy XXX shares of company stock for YYY dollars a share at date ZZZ, if you meet certain conditions. At date ZZZ, the company stock is at YYY + 5. The option holder then exercises his option by buying XXX shares at YYY and either keeps them, sells them for YYY+5, or some combination of the above. The company takes shares bought at YYY+5 and sells them to the employee at YYY, for a loss to them of XXX*$5. The stock option holder is charged XXX*$5 as pre-tax income. This last is why most or all of the stock purchased by option holders is immediately resold. Evidently you never had options, were never able to exercise them, and don't know what happens when they are exercised or how option exercises are treated by accountants.
And you evidently can't read about what happens either. And you neglect another simple rule understood by all who get them: options that net zero or are out of the money are not exercised by the holder, as they could buy the shares in the open market (we are talking about a publicly traded company here) for less. Even slightly in-the-money options aren't exercised, due to transaction costs.
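A worked example of the mechanics described above, with hypothetical numbers standing in for XXX, YYY and the $5 spread:

```python
# Hypothetical figures illustrating the option exercise described above;
# the share count, strike and spread are placeholders, not real numbers.
shares = 1000     # XXX shares under option
strike = 20.0     # YYY dollars per share (the grant price)
market = 25.0     # YYY + 5: the market price at exercise date ZZZ

# Holder buys at the strike and can immediately sell at market:
holder_gain = shares * (market - strike)    # charged as pre-tax income

# Company effectively sells $25 stock to the employee for $20:
company_loss = shares * (market - strike)

# An option at or below the market price nets nothing, so it isn't
# worth exercising (the open market is cheaper or equal):
worth_exercising = market > strike
```

This is why the holder's gain and the company's loss are the same XXX*$5 figure, and why out-of-the-money options simply expire unexercised.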
EVERYONE KNOWS YOU ARE AN IDIOT!
And you can't argue with idiots as they can't comprehend their stupidity. So I will refuse to answer back anymore. There's no point.
Pete
Wbmw:
You are full of it! Too full of your dreams to see reality. The only data you "see" is ones you think support your reality. You claim others who don't see the world with your blinders are idiots.
Intel's P3 1GHz was still hard to obtain during Q3/2000: "You add in the fact that the Coppermine 1Ghz still isn't the easiest to obtain": http://www.siliconinvestor.com/readmsg.aspx?msgid=14489206
Also, Intel blamed Europe for the miss, saying they didn't buy enough high end PCs (high speed P3s). This site (which you ignore when it says things you don't like) says "but blaming us because it can't supply enough of its components is a bit rich, coming from Chipzilla." That is six months of not being able to supply. Of course, you would say that if the supply became plentiful just around December 31st, 2000, then they were easily obtained in H2 2000. Yet here is an analyst, CSFB, saying that in H2 2000, high MHz P3s were in short supply: http://www.siliconinvestor.com/readmsg.aspx?msgid=14493581
"Finally, it appears that lower output of 1-GHz and other faster Pentium-3 chips was due to hastily implemented internal mask-sets - not manufacturing yields or design problems - that caused slower-than-expected speed outputs. A catch-up to plan should occur by January."
But Wbmw was right: it was not limited in 2000, just that Intel couldn't make P3s using quickly made masks. Most would say the quickly implemented masks (to make notched gates) are what botched yields. A lot of things looked great on paper, but either couldn't be made (Timna, Tejas, ...) or weren't good enough to make performance targets (high (>4) GHz Prescott, Transmeta, Merced). But if the company is Intel, Wbmw with his "damage hiding" rose colored glasses can't see or believe it did anything wrong.
Intel didn't fess up to the real reason (not many outside of Wbmw believed them anyway), but did acknowledge that they had problems making high speed P3s.
As to the 1.7GHz to 2.26GHz 90nm ramp, it is always easier to ramp a product that was made deliberately slower. Then, according to you, we should take the Opteron EE 140 at 1.4GHz and use that to see the 90nm ramp (30W TDP CPUs) to the Turion MT-40 at 2.2GHz for a 57% gain. The same for PM (24W TDP CPUs) is 2.1 vs 1.7 for a 23% gain. Overclockers know that the easiest way to boost speed is to boost TDP (until some other limit is reached, like a speed path, or the chip can't be cooled).
Well what can I say? You calculated it wrong, and you have little concept of how to total up a balance sheet. The $11B is the total worth of all the options issued during those years. Don't you realize that when you buy a stock for $20, give it as an option with grant price of $20, and then receive $20 when the receiver exercises the option, that you haven't lost anything? Meanwhile, the receiver reaps the reward without any risk, and if the option expires, you get to keep the shares. If you don't understand this, then you really don't belong on this forum in the first place.
I didn't calculate it wrong; you did. In your example, I would have it as $0 missing: stock bought at $20, minus revenue of $20 from the stock option exercise, equals zero. I subtracted the amount received in the exercise of the options from the amount paid for the stock. But you didn't comprehend what I did, or read the earnings report and look at the lines I used (I stated that I took the first line (Stock repurchase program), subtracted the next line (Proceeds from sales of shares to employees, tax benefit & other) and totaled up the result for all 18 quarters). So I took into account what employees paid for the stock. But you with your rose colored glasses just didn't see that, or simply "ignored it", because it was bad for your views. If anything, I probably understated the true costs, because "Tax benefit and other" was likely positive over the period totaled.
Pete
Wbmw:
Wbmw doesn't accept other's words. He can't believe that his dreams aren't connected to reality. Listings do not mean availability. Lots of listings were there for P3 1.13GHz, until it was recalled (and some even then), but less than 200 were shipped.
That is the facts. You are the moron who thinks listings equal supply.
Here is the view of an insider in September 23, 2000 about Q3/2000: http://www.siliconinvestor.com/readreplies.aspx?subjectid=36138&nonstock=False&msgid=1444237...
P3 on allocation in September, 2000! Wbmw, stunned. It was supposed to be running great in H2/2000. Surprise!
As for 130nm, there is the 130nm SOI process that K8 (which is similar to K7) moved to, and that went from 1.8GHz (at release) to 2.6GHz for a gain of 44% (from the top 250nm bin, that's a gain of 247% (2.31*1.5)). Of course, 90nm is not going to be good for either in higher top bin speeds. Intel got 3.8/3.46 = 110% (a 10% gain). K8 (although that ramp isn't done yet) so far is 2.8/2.6 = 108% (an 8% gain). Later, K8 may reach 3 or 3.2GHz, and then the gain will be 15% or 23%. Of course Intel, due to the design switch, will have a loss: 2.26/3.46 = 65%, a loss of 35%. It will help future gains, though.
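Those percentages are just top-bin frequency ratios; for the record:

```python
# Top-bin clock ratios quoted above, expressed as percent gain or loss.
gain_130nm_k8 = (2.6 / 1.8 - 1) * 100    # ~44% over the 130nm SOI ramp
gain_90nm_p4  = (3.8 / 3.46 - 1) * 100   # ~10% for Intel's 90nm ramp
gain_90nm_k8  = (2.8 / 2.6 - 1) * 100    # ~8% for K8 on 90nm, so far
loss_65nm     = (1 - 2.26 / 3.46) * 100  # ~35% drop at the design switch
```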
The jury hasn't been seated yet on 65nm. Will it have a small gain like 90nm, or one like 180nm and 130nm?
As for options, that was net stock purchases minus revenue paid by employees, tax gains, etc. The $20.6 billion number is already net of the amount paid to exercise the options. If you look at the earnings releases, it's the line under Stock purchases. I took the net between the two and totaled all 18 quarters up. $11 billion was what was left after the shares bought back (reducing shares outstanding) were taken out from all shares purchased.
As to Pricewatch, back then there were a lot more vendors. Now there are only 5-8 major online stores (23 on dealtime). Pricewatch tightened the rules, tried to remove duplicate entries and started charging more. Before, you would have retailers adding comments like "we also have 933EB, 800B, ..." and Pricewatch would pick up 10 entries from the same retailer for a 733B listing. This inflated the old numbers and had to be removed by hand later on. You saw the raw data, pre-culling.
Also, there were the entries that were pre-orders (like those for Venice (E3, E4 and E6 steppings)). There are two entries for Opteron 280s present on Pricewatch before it was released, even after culling for "we also have ...".
Pete
Wbmw:
More comprehension problems and an inability to take those rose colored glasses off. They (P3 1GHz) weren't available OTC in August 2000, per Sharky. They weren't in October 2000 either, per Sharky.
http://www.sharkyextreme.com/hardware/reviews/cpu/amd_thunderbird_1x2ghz/index.shtml
That's all of 2000. H2 2000 availability is a D, in your dreams. Thus your memory is faulty. Third party sites say this, not some Intel PR hack.
36% is not good when taking availability into account. Trading a little bit of speed for loads less volume only makes sense when you want reviewer-only stuff. But AMD went from 750MHz to 1.73GHz on the same 180nm process for a 130% gain, with gains in yield. If you praise the first, then you must heap accolades on, shout from the highest roofs about and grovel before the second. Since all I hear is your silence, you are just a hypocrite.
Intel gets a paltry 36% gain with yield losses and AMD gets 130% gain with yield increases and you praise the first and emit silence on the second. Talk about your missing critical thinking skills first before denigrating others.
Intel 2001 Q4 Stockholder's Equity: $35,830 Million
Intel 2002 Q4 Stockholder's Equity: $35,468 Million
Intel 2001 to 2002 Equity reduction: $362 Million
Intel 2001 to 2002 Earnings: $3,117 Million
Intel 2001 to 2002 Dividends: $533 Million
Intel 2001 to 2002 Unknown Expenses: $2,946 Million
Intel 2005 Q2 Cumulative Stock Purchases: 2,393.3 Million Shr
Intel 1999 Q4 Cumulative Stock Purchases: 1,319.8 Million Shr
Intel 2000-2005 Shares Purchased: 1,073.5 Million Shr
Intel 2005 Q2 Shares Outstanding: 6,144 Million
Intel 1999 Q4 Shares Outstanding: 6,620 Million
Intel 2000-2005 Outstanding Shares Down: 476 Million
Intel 2000-2005 Shares Missing: 597.5 Million
Intel 2000-2005 Net Dollars for Shares: $20,636 Million
Intel 2000-2005 Net Dollars Per Share Purchased: $19.22
Intel 2000-2005 Net Dollars Per Share Down: $43.35
Intel 2000-2005 Shares Missing Cost: $11,486 Million
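The figures above tie together with simple arithmetic (all values in millions, taken from the list as posted):

```python
# Checking the share-buyback arithmetic from the list above.
shares_bought    = 2393.3 - 1319.8   # 1,073.5M shares purchased 2000-2005
outstanding_drop = 6620 - 6144       # 476M fewer shares outstanding
missing          = shares_bought - outstanding_drop   # 597.5M "missing"

net_dollars  = 20636.0                      # $M net spent on shares
per_share    = net_dollars / shares_bought  # ~$19.22 per share purchased
missing_cost = missing * per_share          # ~$11,486M for missing shares
```

The "missing" shares are those bought back without reducing the share count, i.e. the ones that went out again through option exercises.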
Wbmw:
Your reading comprehension stinks, as usual!
P3 1GHz was in extremely short supply, even six months later. In fact, anything above 733MHz was in the same boat. Enough for reviews, not enough to sell over the counter. The volume only started going up when 130nm arrived and they could deep-six notched gates. AMD got to 1GHz on its 180nm process from 750MHz (that's a 33% improvement) without notched gates, and you could get any of them over the counter. That same process got Thunderbirds (K7) to 1.7GHz, 600MHz more than those "notched gates" that notched Intel's ability to supply over 800MHz (and 127% over the initial 750MHz).
As for losing money, Intel lost money last year. I subtract options payments as I consider these to be payroll expenses, not distributions. A good sign of losses is when Shareholder's Equity goes down. And it went down at Intel over many quarters.
As to the links, read the prefaces. They all state that Intel's 1GHz CPUs (heck 800-1000MHz CPUs) were in very short supply. Of course once you saw that, you went instantly into brain freeze. And that was relevant to our discussion. It didn't clear up even at the 1.2GHz Tbird review in August, some five months later. By December, the AXPs and P4s were coming out and just then the 130nm P3 for mobiles started coming out in volume. 9 months is a very long time in this business.
Typical Wbmw with rose colored glasses: whenever there is information contrary to your beliefs, either A, you don't see it; B, don't remember seeing it; C, make it out as not relevant; or D, dream something else.
BTW, all of this was pre .com bubble bursting in 2000.
No problem Pete. We'll assume Conroe is the next Prescott; no, the next Merced, until Intel proves otherwise.
Finally beginning to see the light after taking off those rose colored glasses.
Pete
Wbmw:
You showed no links, so why should I? More hope and a prayer on your part. So you acknowledge Intel took a hit with notched gates, after you said you didn't know. Typical! Yes, they got Intel to 733MHz P3s, but AMD was pushing past them with K7, beating Intel to 1GHz and beyond. Intel lost their technological advantage and, because engineering and process couldn't stem the tide, turned to their dirty tricks department.
As to growing 16% YOY last Q, AMD did better. And without all of that huge capacity.
You can ramp two new products on separate lines. AMD did it with one fab, but Intel couldn't with their multiple fabs? Get real!
And it helped tremendously. Before this announcement, Pentium III ran at 733MHz, and with the manufacturing enhancement, Intel launched 800MHz in late 1999, and eventually scaled to 1GHz by March 2000. Once 1GHz ramped up, it accounted for a total 36% improvement.
http://active-hardware.com/english/reviews/processor/athlon-1ghz.htm
http://www.sysopt.com/reviews/athlon1g/
Perhaps you were using this one for date comparisons: the on-die cache Athlon Thunderbird came out months after the 1GHz Athlon Classic with external cache:
http://www.sharkyextreme.com/hardware/reviews/cpu/thunderbird_1ghz/
http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=1189
Here you have it, Intel experiencing problems supplying their high end CPUs. 1GHz in March a few days before Intel.
http://www.sharkyextreme.com/hardware/reviews/cpu/amd_thunderbird_1x2ghz/index.shtml
More supply problems for Intel.
As for Conroe, the devil is in the details, none of which we have seen, just a set of goals. So we should really wait and see.
Pete
Wbmw:
Dothan was delayed because of power problems, but in dreamland, you want it to be capacity. Where was all of this huge Intel capacity you shoved down our throats? It goes past credulity that Intel, the supposed manufacturing powerhouse, couldn't have enough capacity to make Dothans. Only two real credible possibilities remain: Dothan wasn't working, or Dothan wasn't good enough. I gave Intel the benefit of the doubt by using the "it just needs a respin" reason. If it was that Dothan wasn't working, Intel was in even more trouble. Besides, the rumors out at that time were of the second reason.
When Montecito slipped, it became a different product with dual cores and multithreading. I guess in a contrived way, you can call this original redefinition a "slip", though in my book, if it takes them a year to add new features, that's a planned event, not a slip. As for your other dates, Intel had only advertised "2005" on all their roadmaps. Becoming Q4 2005 is not a slip, but a narrowing of the window. AMD does this, too, by the way, and you'll be the first to defend them when "H1 2006" becomes "Q2 2006". The only slip after that is from Q4 2005 to Q1 2006. If it's later in the quarter, you might consider it a full quarter slip, but if it's early in the quarter, probably only a few week's slip, nothing compared to the 1.5 years that Hammer slipped.
If Hammer was a slip to add features, Montecito is an even bigger slip: at Q4/05 it is 1.5 years, and Q1/06 makes it 21 months, at least. K8 went to 130nm SOI from bulk, and 130nm SOI wasn't ready. The first year's delay of Montecito was for the same reason: Intel's first 90nm process stunk. They needed to redo the 90nm process, so they added the Madison 2.0 -> 1.6/9M as a gap filler. Now that they had another year, feature creep came in and they added some new features. But like other efforts, the add-ins slipped it more. That additional slip is up to 9 months, so far. The touted 2.5GHz speeds have come down to 2.0GHz, then 1.9, 1.8, 1.7 and now will be 1.6GHz, if they make that.
Unless you can provide a link that notched gates affected yields, I'll just assume you've made this one up.
Where have you been? This was the P3 era, Coppermine and the push to 1GHz. Intel even touted the notched gates as a speed booster with papers, presentations, micrographs and such. The only trouble was that any CPU with them was rarely that fast. This was widely discussed on various boards. It was akin to AMD's K6/3 manufacturing speed booster a few years earlier that had the same results. "Notched gates" became the phrase "Botched gates". I guess process fiascos are quickly forgotten by Intel boosters.
Not by as much as you think. Wow, an entire paragraph summarizing Intel's biggest failures as a company, and each time they get up and become stronger than ever. See a pattern forming??
Only trouble is that their competitor is far stronger than it was too. And that is causing Intel to "Consume bicarbs by the crate". That "Roadkill" AMD is doing so much better even after Intel shot their wad with all of the illegal behaviour.
They don't have a new micro-architecture showing up next year, and frankly their plate is kind of full with their current plans.
Their microarchitecture is working. A well balanced design is very difficult to get. Once you get such a design, one tends to stick with it for a very long time. Look at Intel's well balanced design, PentiumPro. It lasted through P2, P3 and, by most accounts, is the basis for Pentium M. It survived the i860 onslaught, the IPF fanfare and Netburst. Yes, there were tweaks, but the basic form survives. And now they will tweak it again to Conroe (P5), et al. K7 is AMD's well balanced design that got tweaked to K8 and K9. And it will likely be tweaked into K10. Wholesale changes are a recipe for disaster. Intel had better stick to tweaks only, else Conroe will likely do an i860.
It's (Conroe) going to be a more competitive product than K8/K9, like it or not. The only question is how AMD will react. Will they panic, or simply accept defeat until they can launch their K10 core, which will probably put them back to a competitive position? Only time will tell.
Conroe has not seen the light of day. Only speculation and Intel's goals currently exist. K8 and K9 are currently shipping and thus real. You can get your hands on one and work it out. Lots of designs looked great on paper, but when realized (and many failed to get that far), were less than promised. The original Itanium comes to mind. It was going to be this huge RISC killer, but it turned out to be slow, buggy, hot and expensive.
I do agree with you on one point, only time will tell. Let's wait and see.
Pete
Wbmw:
More revisionist history from someone with rose colored blinders on.
Dothan was delayed because Intel had given preferential treatment of their 90nm ramp to Prescott. Not that this was a great idea, but Banias was already competitive, and Intel needed the cost savings for Pentium 4.
Dothan was delayed because it had the same power problems that Prescott ran into. They needed a few more respins to get its power under Banias's. And even after it was released, yields were terrible for quite a while.
Prescott had a very strong ramp, just not at the higher speed bins. Those were "as scarce as hen's teeth", as the Inq liked to put it. But the lower speed bins quickly arrived in large volumes.
Heat troubles were far more likely. If you fail to ship released speeds, you're late. Shipping slow is easy. To get them out, Intel made lower speed bins for them to ship. These were released later and were not on the roadmaps. So technically, from your rose colored view, they didn't slip. But most here agree: if you release 3.4GHz and ship 2.4GHz, you have slipped and paper launched 3.4GHz (and 3.2GHz, 3GHz, 2.8GHz and 2.6GHz).
Montecito is late only if you mourn a slip from Q4 2005 to Q1 2006, and it's slow only if you miss a little frequency potential in a chip already filled with new features such as dual core, multithreading, and Foxton.
More hypocrisy! AMD does it with Hammer, adding in SOI, and that to you is a big slip; yet Montecito, first due out summer last year, slipped with the Madison 1.6/9MB as a filler to summer this year. Summer became Q3/05, then Q4/05 and now Q1/06, and at slower speeds to boot, and it is not a major slip? Because it's Intel, and they don't do these things. Sorry, they have slipped, they are slipping, and they will slip and paper launch.
Intel ... and planning a winning strategy. They unveiled this strategy at IDF with the introduction of Conroe, Merom, and Woodcrest, a high performance, low power optimized micro-architecture that takes the best pieces of Netburst and Pentium M, and adds a number of new technologies.
Yes, let's look at their "winning strategies" of the past. RDRAM: a big bust, and they went with AMD's lead towards DDR. "We don't need copper": they went copper. Notched gates for more speed: big yield crash. Netburst: netbust. IPF: never caught on with customers. P4 to 10GHz by 2007: didn't even make it to 4GHz. IA-64 was to conquer all, even AMD64: then they followed AMD64's lead to AMD64 (that must have hurt). The only true success is Centrino, and that is under pressure now. And that wasn't on any "winning strategy" (roadmap) until just before release.
Intel has had "winning strategies". The problem is they keep failing. The only consistency is the "wait until ..." cry. Well we waited, and waited, and waited ... And so far, late, hot, slow and just plain missing.
In this I belong to the, "show me" crowd. I'll believe it when I see it.
Unfortunately, AMD has nothing to counter this in the short term, quite possibly not until K10 at any rate. No one can say for sure, but when AMD has been in these situations in the past, they tend to panic and release a number of paper launches. Surely you can't imagine something like this, but that's because you have your mind so focused on Intel's destruction to know the difference.
How do you know what they have waiting in the wings? How do you know Intel will make their goals? Intel wasn't in these situations for a very long time, and now that they are, they have failed miserably. We get the "winning strategy" of the year, quarter, month and even week. The problem is that Intel has nothing out in the field now. And nothing in the near term.
You can disbelieve in AMD's plans, what little comes out into the open. You can discount what they say. You have that right. But AMD has followed a consistent "winning strategy". It hasn't changed and it is winning. Part of that is not to release products until they are ready. Intel had the luxury of doing it. And I know you hate AMD having it now. But it does eliminate paper launches, recalls, etc. And, unfortunately for you, those maddening launches of products just before Intel launches a highly anticipated product, making Intel the provider of a similar but less capable product. It's hard when the shoe is on the other foot.
Pete
Wbmw:
You forget that Dothan was delayed and wasn't available for months. Ditto for Prescott. Montecito is late and slow. Chipsets seem to slip, and even after being announced, don't show up. The P3 1.13GHz was released, and within a month, after bugs were found, fewer than 100 were shipped. If that's not paper launching, you need a reality check real bad.
You claim that AMD will go back to paper launching. Sorry, you just have a hope and a prayer. Intel has paper launched, is currently paper launching, and will continue to paper launch.
Pete
Wbmw:
Intel has paper launched far more than AMD. Intel CPUs released now take five months to begin to show up. AMD CPUs show up before they are released.
Pete
Wbmw:
You have a crushed head to go with the crushed ego, now that Dothan, in an apples-to-apples comparison, comes in third to both Turion and Sempron.
DDR2 isn't apples to apples. Perhaps we should compare the power required for a Turion with Nvidia's CK51 integrated graphics chipset on games to get the same performance as a Dothan with Intel Extreme Graphics (if we can get the Turion to be that slow). At least with Gigabyte's test, the accelerator was set to the same performance as the Turion's, with similar screens.
Looking at power without matching up performance makes for extreme irrelevancy. But you are good at that.
Pete
Chipguy:
You forget that Sun has their own compiler writers, and they could simply put in a wrapper to detect programs like Intel does, set the options to get the best performance and make base the same as peak. They could also pull some stunts like they did with SPARC: rework algorithms within some of the programs and cut one program's run times enough to make it with only compiler changes. And the output will not be optimal on Intel CPUs, while being optimal for Opterons, A64s, Turions and Semprons.
When Intel was the only one who could play the rework the compiler game, you didn't have a problem with that. Now that it has a competitor that has the same capability and good reasons for doing so, you now have a problem with it. Smacks of hypocrisy. Tough!
Pete
Wbmw:
On non-Intel compilers, or Intel compilers on non-Intel CPUs, you must use peak scores, since for Intel CPUs on Intel compilers, base = peak almost always. So you must use the peak scores for any valid comparisons.
The icc 8.0 versus 9.0 comparison even takes into account memory speed and FSB speed changes. If you don't believe it, get the icc 7.1 vs 8.0 scores on the same CPU/MB. Whoops, the same hardware has a higher score on the newer compiler. If you use typical clock scaling factors, the difference is even greater. You can look at the Hammer whitepapers for a SPECint2000 chart comparing Clawhammer K8 versus P4s at different FSBs (100QSB, 133QSB, 166QSB). FSB scaling is quite low too. Latency and on-die MCTs scale best.
To use your logic, all of digit-life's estimates are way off base and completely irrelevant. As is your touting of them.
As to K10's launch date, K8/K9 is already more powerful than Conroe; else Intel would not be touting performance per watt, but performance. Besides, Intel has not been too prompt at making releases more than 12 months into the future. They seem to cancel some, slip some more and change the targets of the rest. Yonah may not even beat Taylor to volume availability. Remember Tejas and the 4GHz P4?
Pete