EPoX Socket939 board. Note the interesting addition of 2 more SATA connectors next to the AGP port - for a total of 6!
Nice board. I have always liked Epox boards, although I seem to bounce between Epox and Asus. Looks like it is getting close for me to upgrade to an A64. I was holding out for 939.
I dunno about the summer being a drag. AMD and OEMs have lots of product news due out in the summer timeframe:
I think all the things you list point to a big pop late in the year and strength in 05. I think we'll get a short run up soon before Q1 earnings and maybe after depending on the results, followed by a period of being flat through Q2 and part of Q3, and then a long run up going all the way to Q4 earnings. My strategy at this point is to just hold. I have accumulated enough.
So my positions are not in contradiction, they have different time horizons.
I'm in the same position as you. Now is a very good time to accumulate for long term either with shares or leaps, which I have already done. Short term I would be wary, although a Q1 earnings surprise could overcome market weakness. I expect the summer is going to be a drag.
That could be the most useful benchmark for us, even though the code will be no more than 50 lines
I have a custom hashtable library that I could run a similar experiment on. If I find some time, I'll try it out.
New qsort benchmark results.
Last night I posted some results that were using mixed versions of gcc for 32-bit and 64-bit. I have redone everything with gcc 3.2.3 so the comparisons will be better. I also screwed up the % calculations. The % below are % faster or slower than the baseline. I used (baseline/time - 1)*100. In other words, a 50 second run is 100% faster than a 100 second run.
I have also added a 64-bit integer sort into the mix as well as runs on an Athlon MP 2400+ at 2.0Ghz with 3G memory. The Opterons are 246s at 2.0Ghz with 4G memory.
machine    o/s   app   int    opt        file   sort   total   % faster
                       size   compile?   read   time   time
-----------------------------------------------------------------------
AthlonMP   32b   32b   32b    no         54s    55s    109s    -12%
Opteron    32b   32b   32b    no         47s    49s    96s     baseline
Opteron    64b   32b   32b    no         44s    49s    93s     +3%
Opteron    64b   64b   32b    no         36s    78s    113s    -15%
AthlonMP   32b   32b   32b    yes        53s    49s    102s    -17%
Opteron    32b   32b   32b    yes        46s    39s    85s     baseline
Opteron    64b   32b   32b    yes        44s    38s    82s     +4%
Opteron    64b   64b   32b    yes        36s    34s    70s     +21%
AthlonMP   32b   32b   64b    no         67s    83s    150s    -17%
Opteron    32b   32b   64b    no         58s    66s    124s    baseline
Opteron    64b   32b   64b    no         58s    66s    124s    0%
Opteron    64b   64b   64b    no         37s    59s    96s     +29%
AthlonMP   32b   32b   64b    yes        67s    64s    131s    -24%
Opteron    32b   32b   64b    yes        58s    41s    99s     baseline
Opteron    64b   32b   64b    yes        58s    41s    99s     0%
Opteron    64b   64b   64b    yes        36s    29s    65s     +52%
if not most programs the advantage of more registers is completely counterbalanced by the fact that there is code bloat
BTW - that past presentation by Kevin McGrath (sp?) from AMD showed the average instruction size only increased around 10-15% when going from 32-bit to 64-bit. I don't remember the exact number. Anyone?
therefore QSORT would get no benefit from the extra registers
Judging by the c-code I used, I think there may in fact be some benefit to using extra registers. How else do we account for the fact that the optimized compile was significantly faster? I think wbmw may have suggested the 32-bit compiler is not as good as the 64-bit compiler at optimizations. This could be a factor. It is too bad we don't have a compiler that is identical other than enabling 64-bit and extra registers.
Anyone working on a thesis? This might make a good paper. I just don't have the time to pursue this much.
All I know is I can get significant gains in performance by using an Opteron running 64-bit Linux, even if I don't recompile the 32-bit app. That says something loud and clear. It says my company is going to buy a lot more Opterons.
what was the size of the integers you sorted in both modes?
Both were 32 bit integers. 'int' is 32 bits in both 32-bit and 64-bit. Long and pointers track the arch. I'm considering doing a 64-bit integer sort as a comparison. I'll try it out later and report. It looks like I can use 'long long', as that is 64-bits in both 32-bit and 64-bit mode. The compiler is supposed to emulate 64-bit on 32-bit platforms and go native on 64-bit. We'll see if that holds true or not.
PS, by habit I always use long instead of int when I know that I need 32 bits.
Better change your habits! Long is now 64-bits in x86-64. From the gcc man page:
-m32
-m64
Generate code for a 32-bit or 64-bit environment. The 32-bit
environment sets int, long and pointer to 32 bits and generates
code that runs on any i386 system. The 64-bit environment sets
int to 32 bits and long and pointer to 64 bits and generates
code for AMD's x86-64 architecture.
It seems the Microsoft scheme will likely be inelegant compared to the Linux scheme.
I think the WOW performance degradation will be made up for by much faster system and driver code. The WOW layer for AMD64 is very small compared to other WOW implementations. We'll just have to disagree for now until we see what the production o/s with production drivers can do. It may be difficult to compare unless the graphics drivers are of identical base version.
qsort is also a very small program, relatively speaking, so the deleterious effect of AMD64 code size expansion won't be readily apparent with a 64 KB icache
Good point. Yes this application fits entirely within the L1 cache. My apps are generally in the 1-4MB range, and they all run better. Maybe I'm just lucky. Your theory sounds reasonable, but I have not found it in practice.
What makes your app more "real world" than, say, Sandra or PCMark?
Like I said, I posted this because it confirms what I have found on all my real world apps. I have a wide variety of applications, from databases to user interfaces. Some of these are quite complex. Some are not. I have yet to find an application that runs slower going from the 32-bit version of the o/s to the 64-bit version on the same hardware without a recompile or an app that runs slower after an optimized recompile.
I'm hardly disappointed.
I would be glad to take any publicly available c-program and do the same experiment. Let me know if you have a real world app you want me to try out.
The extra registers don't make that much of a difference, since most of the bottle neck has already been removed from them with the advent of register renaming. If you are expecting >0% improvement in *all* 32-bit apps, and >>0% in 64-bit apps, you are going to be in for a huge disappointment.
Sorry, but the extra registers make a huge difference, even with seemingly mundane code.
Maybe my last post will change your mind (the one with benchmarks I ran myself), but I doubt it as you'll cite only a limited test. I posted this because it confirms all the other findings I have come up with recompiling other applications for x86_64. I can't post these here as they are proprietary, but I know I'm not going to be in for a huge disappointment. I'm using AMD64 systems today, with great success, and I haven't even hit on using the 64-bit capability. My current set of apps only benefit from the extra registers.
I'll say it again. I think the Windows x86_64 port is going to show a marked improvement in performance over today's 32-bit version, even without recompiling your application. The O/S benefits enough from x86_64 enhancements that it will even speed up 32-bit apps running under it. My reasoning that Windows will benefit more is that the Windows system code makes up a greater proportion of an application's execution time than it does under Linux.
Well I just ran some simple tests on Linux for grins.
I compiled up an off-the-shelf standalone qsort c program using gcc and various targets. I sorted 100M randomly generated integers, and I used the same set of integers for each run.
The systems are identical other than one is using x86-64 Linux. The base versions are the same. Running Opteron 246's with 4G of memory. This benchmark is not memory constrained in any way.
I took the average of 3 runs for each. The deviation was around 1/4 second for each run. Note that a significant portion of the time below is just reading in the file that contains the 100M random integers. I intend to benchmark again with this factored out, as it will probably multiply the percentages below by a factor of 2.
o/s      app      optimized   runtime       performance
                  compile?                  increase
-------------------------------------------------------
32-bit   32-bit   no          97 seconds    baseline
64-bit   32-bit   no          94 seconds    +3%
64-bit   64-bit   no          110 seconds   -13%
32-bit   32-bit   yes         84 seconds    baseline
64-bit   32-bit   yes         81 seconds    +4%
64-bit   64-bit   yes         70 seconds    +17%
Do you want your wife and kids in a car without air bags? Do you want your company's business depending on computers that don't have NX protection?
Excellent analogy. AMD needs to market the heck out of NX. It doesn't have to be marketed as a catch all protection, just an added protection that the competitor doesn't have.
I suspected so from the beginning, but the benchmarks have more or less backed me up.
I think when we get to a production o/s and production graphic drivers, EVERY app, including existing 32-bit apps, are going to run better on the 64-bit o/s. The graphic driver itself will benefit greatly from the extra registers. The o/s is used a lot for performing system functions. It will also have a benefit, and I think this benefit will outweigh any WOW conversion.
So even without a recompile, I think existing 32-bit apps will run better on Win64. A recompile should help almost every app out there due to the extra registers, not the 64-bitness.
I'm basing this assumption on my direct observations in a Linux environment. I have access to identical Opteron systems other than running various versions of Linux, and I find the 64-bit Red Hat version to be the fastest for my existing 32-bit applications. Every 64-bit recompile I have done has resulted in significant gains (20-30%), even for mundane applications. I think the gains to be had in Windows are even greater as the o/s is not as lightweight. The Windows o/s itself should be significantly faster.
In any case, I'm not even sure it matters as it looks like AMD is going to have the top 32-bit performance slot for all of 2004, and maybe beyond. 64-bit is just extra at this point, and it should work well for selling Linux servers/workstations. It already has.
Revenue 598 662 740 967
Joe - I hope you are on to something here. What I really like is Q2 revenue would actually be up in your scenario due to higher k8 ASPs even though total units are down which is typical for Q2. If it worked out this way, I think the stock would get a nice boost.
Even more stunning would be if your Q4 prediction held true. Q403 was $581M with about 7M units. 3M more units would add about $100M to the costs, but your revenue increase is $386M YOY. That is $286M more profit for the chip group, for a total of $349M. That is approaching $.80/share just from that group. That ought to put the stock well over $50!!! If a flash recovery takes place as predicted, this could be stock of the year.
I'm currently not as optimistic as you, but I wouldn't call your prediction unreasonable. AMD has a great lineup for 2004, and as long as the markets cooperate, AMD has nowhere to go but up. I'm holding all my shares until at least Q4 earnings. I think we'll all be rewarded in due time.
AMD's CPU revenues should grow at least 5% as Intel's fall 8%
Seeing as AMD is larger than Intel by the ratio 8/5 you mean?
He wasn't talking about units. Q1 is seasonally down in revenue for both. His statement makes no assumption about how much unit market share is shifted. I actually think the unit share will be roughly the same, but AMD's ASPs are trending up, and Intel's will be trending down, resulting in a revenue share increase for AMD.
I don't see why people are still so caught up about this. The "dumb consumer" didn't seem to have too much trouble choosing the low-gigahertz Centrino over AMD's artificial modelhurts Athlons.
Yes, but Centrino had a billion dollar marketing push built around its wireless capabilities, not performance. AMD doesn't have that luxury. Most consumers look at numbers and compare them without knowing what they mean. The best AMD can do is modelhertz, and hope for a future where they can have a larger marketing push. I don't like modelhertz as a technologically smart consumer, but I like it as an AMD investor.
It is too bad AMD doesn't have a billion to burn right now. They could launch a huge marketing campaign on how their AMD64 processors block viruses and worms, and their competitors chips don't (yet - they better hurry). Even the everyday Joe knows about computer viruses. This could be huge for them, but it probably won't because they don't have the money, and they don't seem to have the marketing drive. Heck, just calling the processors something like 'Athlon 64 with VirusBlock' ought to gain a few customers.
30% to 35% improvement is a good ballpark.
I doubt they'll get this right out of the chute. I would look for only one speed grade higher and some overlap with the old models. I bet we'll even see a 2.6Ghz 130nm part. The first 90nm parts will probably be in the 2.6Ghz range.
I would, however, look for significantly lower power at the same frequency, which will make Prescott look even more embarrassing.
Anand retracts FX-55 in May...
Bummer. It would have meant an early arrival of 90nm or good bins on 130nm. From a performance standpoint the FX-53 should still hold the crown.
FX line is also multiplier unlocked... and even now, the FX-51 isn't that far ahead of the 3400+ on most apps.
True, but if you look at this roadmap from 2 months ago from Anand, the FX-53 (2.4Ghz) and A64 3400+ (2.2Ghz) were supposed to come out at the same time in Q2 04:
http://www.anandtech.com/cpu/showdoc.html?i=1947
A64 3400+ came out early, but the FX-53 did not, so they became misaligned. I don't think FX-51 was supposed to be that close to the top A64 in performance. FX-51 is getting kind of old now.
I think AMD's intention is to keep FX ahead one speed grade in addition to the extra cache, and they are getting it back in line with the FX-55.
I too am surprised by how early it is (originally targeted for Q4), but I think having a part out there that leaves no question as to being the top performer is a good marketing tool, even if they don't sell that many. A64 will benefit from this as well as the fallback parts.
edit: I was a little confused by Anand's older roadmap until I realized there were separate 3400+ listings in the 754 category. So maybe AMD didn't intend for FX-53 to be out at the same time as the original 3400+. I still think they need the FX line to be a clear leader, and to me that means an additional speed grade.
I think I see why the FX-55 is now in May.
FX-55 (2.6Ghz) 1M May 04
A64 3800+ (2.4Ghz) 512K May 04
A64 3500+ (2.2Ghz) 512K May 04
A64 3700+ (2.4Ghz) 1M Apr 04 (socket 754)
FX-53 (2.4 GHz) on 754 in March
FX-53 (2.4 GHz) on 939 in May
I didn't think FX was going to hit the 754 socket.
2.6 Ghz FX-55 in May 04.
I didn't think we'd see 2.6 Ghz until 90nm. So this is one of two good things if true. One being that 130nm can bin 2.6 Ghz parts, and the other being 90nm parts are coming earlier than Q3 and this will be one of them. I'm not even sure which I would prefer to be true. Probably 90nm, as it would indicate that 90nm is ahead of a delay schedule, and it is binning at least as good as 130nm (unlike Intel's initial 90nm parts).
4P in Q2.
Thanks. I missed that. Very nice to see 2 of the top 4 server oems will have 4P Opterons in Q2.
My question is - if we bring together promises, who is the first tier-1 oem to come with 4P system?
Scorecard:
HP - They already have a 4P model (DL585) up on their website with information, and say it will be available Q2 04.
Sun - They said this in a November press release: Future AMD Opteron Processor-based Designs: Sun and AMD will collaborate on a portfolio of future AMD Opteron processor-based systems and scalability beyond 4-way AMD Opteron processor systems. The parties will also collaborate on coherent HyperTransport™ technology implementations.
IBM - They haven't said anything in regards to a 4P server.
To me it looks like HP will have the first 4P. They came on the latest, but strongest. Sun will probably end up with the first 8P or higher system.
Q404 results which should be a blinder as 90nm K8 and leaner Flash operations coalesce, 300-400m profit I would guess.
Now that would be something. My expectations are not as high for Q4 (200M profit), but I plan on holding all my long term shares until at least next January.
With a possible $.50/share profit in one quarter, AMD is primed for a major revaluation upwards. I'm hoping to see all time highs before I sell. At $50, I could declare myself an AMDillionaire! Wishful thinking on a bleak day.
sgolds- I'm glad we cleared all that up. It was just a different use of the semantics. A couple more notes:
'Segmentation' refers to two very different things in x86, so that further confused things. In the older form, 'segmentation' meant shifting the segment register and adding the offset to generate the 8086 physical address.
This is different from Protected Mode segmentation. In that use of the word, the segment register holds a selector which is an offset (with some lower bits masked) into a descriptor table.
Protected Mode segmentation is the more complicated version in hardware. It uses hidden registers to cache away part of descriptor tables. Loading the segment registers actually triggers loading these hidden registers from memory. The hidden registers make address translation more efficient. I still wouldn't use the word "emulation", because the segmentation logic for protected mode segmentation is exactly the same as it is on older processors. Actually it is slightly more complex, as there is an additional configuration bit in the descriptors to distinguish what mode the code is in (64-bit or compatibility).
This supports direct sharing of memory between two (or more) processes. Since each process has its own 64-bit flat memory area (dependent on the page table), if one process wants to share a pointer to a memory area with another process then this enables that capability.
Thanks for that tidbit. I hadn't considered that.
Production wafer starts in April, output in July.
Excellent!
That is what I have been waiting to hear: specific dates for 90nm. Although significantly delayed from the original schedule a long while back, it sounds like further slips are not likely.
This was one worry I had about my AMD investment, and it is less of a worry now. I won't be 100% relieved until parts are shipping with good results (higher frequency, lower power).
sgolds- I see we have made progress! A couple of points:
in Compatibility Mode, the processor puts an emulation layer on top of the 64-bit flat mode addressing
Emulation conjures up a picture that requires no hardware support, but that is not the case. It is in fact a complete hardware implementation of segmentation in compatibility mode.
2. The emulation works as follows: 32-bit (and 16-bit, if supported by OS) apps go through an emulation layer which uses a descriptor table to translate the segmented address to a virtual address.
Again I wouldn't use the word emulation. Segmentation is handled by the processor in dedicated hardware.
3. Compatibility Mode differs from Legacy Mode in this way: In Legacy Mode, the virtual address is the physical address.
You are referring to the legacy mode "real addressing" used in Real Mode. That can in fact be dropped when Legacy Mode goes away. That is a plus. See page 12 of Volume 2 for a diagram on how real mode addressing uses the segment registers in a different way than normal segmentation.
In Compatibility Mode, the virtual address is then put through the 64-bit logic of paged memory to convert to a physical address. This is what I mean by 'emulation layer' - the underlying processor mechanism is the paged memory system, the segmented memory is converted first into a virtual address fed through the 64-bit logic.
I see. You were talking about virtual address translation to physical memory (paging), and I was talking about the segmentation logic. I didn't touch on the paging mechanism because that is not changing other than the o/s can map a 32-bit application's 32-bit address space to anywhere in 52-bit physical space. This does not affect segmentation in any way. Take a look at Volume 1 pages 14-15. It has a nice set of diagrams showing the memory management in each mode. The only difference between compatibility mode and protected mode is in the virtual address to physical address translation.
First, Legacy Mode can be removed. Second, Compatibility Mode can be removed (although this will take a number of years). Then segmentation is gone.
I thought your original argument was that segmentation would be gone when Legacy Mode was dropped. Compatibility mode is going to be around for a long, long time.
Then the hardware will have no segmentation at all.
One more thing I picked up in reading. 64-bit mode still supports a form of user segmentation through the FS and GS registers, although it doesn't do any limit checking. I'm not sure why anyone would want to use this, but it is in there. See pages V2 86-87.
sgolds-
Re: segmentation - maybe we should take this offline - I'm sure everyone else is bored with this...
From the operating-system viewpoint, however, address translation, interrupt and exception handling, and system data structures use the 64-bit long-mode mechanisms.
This statement refers to the o/s viewpoint, not the application or processor viewpoint. I think by "address translation" they mean virtual address and paging mechanisms, not segmentation. Remember the o/s isn't having to deal with this segmentation. This is all before the virtual address is calculated. A 32-bit app running in compatibility mode can set up its own segments if it wants to, as from the application standpoint, it looks as if it is running in 32-bit protected mode. It will be using processor support for segmentation (registers, descriptors, limit checking). Most 32-bit apps no longer use segmentation, but the support remains.
Please tell me if there is a specific section within that chapter which I should be looking at.
Sure. Check out these from Volume 2:
pg.6:
"The elimination of segmentation allows new 64-bit system software to be coded more simply, and it supports more efficient management of multi-processing than is possible in the legacy x86 architecture. Segmentation is, however, used in compatibility mode and legacy mode."
So the 64-bit o/s complexity goes down as it doesn't have to deal with segments, but an old 32-bit app might still use segmentation when running in compatibility mode under a 64-bit o/s.
"In compatibility and legacy modes, up to 16,383 unique segments can be defined."
Also refer to Figure 1-1 pg. 7. It shows what the hardware does to calculate the virtual address. The effective address from the application is added to the base address from the descriptor table, and then a limit check is performed to be sure the virtual address is within the segment. Fortunately the descriptor tables that are in memory are loaded into hidden registers on the chip to allow quick access (see pg 84 for details) for performing a virtual address calculation.
pg. 16:
"Compatibility mode, like 64-bit mode, is enabled by system software on an individual code-segment basis. Unlike 64-bit mode, however, segmentation functions the same as in the legacy-x86 architecture, ..."
pg. 77:
"In compatibility mode, segmentation functions just as it does in legacy mode, using legacy 16-bit or 32-bit protected mode semantics"
pg. 107:
"System software running in long mode can execute existing 16-bit and 32-bit applications by clearing the L bit of the code-segment descriptor to 0. ... Segmentation is enabled when L=0."
pg. 138
"Except in 64-bit mode, limit checks are performed by all instructions that reference memory."
I don't know how much more clear I can make it. Compatibility mode uses all the legacy x86 hardware features that support segmentation, which include the segment registers, loading of hidden registers that save parts of the segment descriptor tables for quick access by the processor, and segment limit checking.
Help me understand. Why did he have to sell?
He didn't have to sell, but if he didn't he would have had to cough up a cool $2.7 million to pay for the shares from the exercise.
The only reason for exercising options for a $2/share gain would be that they are expiring. He could have exercised and sold at $18 not too long ago. He probably thought it would be higher by expiration, but it didn't work out that way.
It could be nothing.
Probably options expiration as others have pointed out. If you look at his exercise price of $13.44 and see that he sold at $15.62-$15.68, that is the only thing that makes sense. Who would exercise and sell options for only a $2/share gain unless they had to?
I bet he is annoyed he had to sell at $15. AMD should be trading above $20.
sgolds-
Not true. Real Mode and Virtual 8086 mode are not supported.
Yes, but Protected Mode is supported, and Protected Mode still supports segmentation, and it uses hardware support (registers and tables) to implement it. Many apps don't use this, and use a flat memory model, but the fact that it supports it means it can't be designed out.
The segmentation which is supported in Compatibility Mode is an emulation layer which translates to the same 64-bit mechanisms used in 64-bit mode. The fundamental design of Long Mode is not at all dependent on any Protected Mode segmentation.
That is not how I interpreted it. Read Chapter 4 in Volume 2 (System Programming Manual), and let me know if you still think this is the case afterwards.
The dropping of Legacy Mode would still be a good thing. I'm sure that simplifies quite a few things, but segmentation will remain. It probably isn't that big of a deal though other than the chip designers have that much more to verify as functional. Based on what I read, the segmentation logic is completely bypassed when the bases are set to 0, which is going to be 99% of the time in today's software.
Here Michael Dell says it makes sense to offer both choices, but never says he WILL. It's Dell at his best
If he indeed made this statement, it is the best statement he has ever made towards AMD even if it is noncommittal, and it may signify a change in the Dell/AMD relationship. He usually says something along the lines of AMD chips not being reliable or desirable as an excuse for not using them.
From the operating-system viewpoint, however, address translation, interrupt and exception handling, and system data structures use the 64-bit long-mode mechanisms.
I understand everything you said, and I agree it is simpler from both the app and o/s standpoint, but from a chip design standpoint, all the segmentation logic is still active and will not be able to be designed out of future chips. That is how this whole discussion started. 64-bit compatibility mode supports all legacy forms of segmentation, even if modern apps and os-es tend to use a flat memory model. Segment base values of 0 disable segment translation resulting in this flat model, but the hardware still supports non-0 values in compatibility mode, even if it goes unused by programmers.
Also, the memory protection you reference is now provided by a page protection mechanism, not segmentation. In long mode, there are no longer any segmentation protection checks.
> x86-64 long mode does not support many of the things
> you listed, save for variable length instructions.
Actually I think it supports all of them!
Well you got me there. I read some more of the manuals, and you are mostly right. Some of the things I assumed were defunct or changed in x86-64 long mode (like the top-half-of-register stuff), are still alive and kicking. I guess changing some of these behaviors would have caused drastic compiler changes, so it was decided not to do it. They took the easy route, which is sort of a shame, but I can't blame them too much. Stirring up the pot too much might have resulted in more resistance in adoption.
By the way, some of old instructions, including the two you listed, are no longer valid in long mode. That is of little consequence compared to some of the other stuff that is still in there.
So x86 sucks. I hate to even play a side to it, but I do because I'm investing in AMD. The thing is, it doesn't really matter that it sucks, because it has such a foothold, and even with all the legacy crud, processors based on it still manage to churn out excellent performance. This may not hold true forever, and I recognize that.
I presume you mean Compatibility Mode within Long Mode (there is no '64-bit compatibility mode').
Yes that is what I meant.
In AMD64, Long Mode totally gets rid of segments.
A quote from page 77 of the system programming manual (beginning of chapter 4):
In long mode, the effects of segmentation depend on whether the processor is running in compatibility mode or 64-bit mode:
- In compatibility mode, segmentation functions just as it does in legacy mode, using legacy 16-bit or 32-bit protected mode semantics.
- In 64-bit mode, segmentation is disabled, creating a flat 64-bit virtual-address space. As will be seen, certain functions of some segment registers, particularly the system-segment registers, continue to be used in 64-bit mode.
My original message was based on table 2-6 from the section you had pointed out. I just went back and found the above, which is more clear.
x86 legacy lives on. long live x86!
Oh well. I'm not sure it matters much for now. Long term some of this baggage needs to be shed, as it is a disadvantage.
Floating point addendum: Volume II, chapter 2, lists differences between x86 and AMD64 architecture
From this reference manual, it looks like segmentation stuff still has to be included even in 64-bit compatibility mode. Yuck.
It looks like a lot of other junk also falls into this same bucket. Some stuff can be jettisoned with legacy mode, but not all.
I guess in 10 years there might be a cleaner x86 implementation.
I think I need to spend some more time reading this manual. It points out many of the fine details that shoot down some of the broad statements myself and others have made (on both sides of the argument).
x86-64 would have been an ideal opportunity. It's been missed.
x86-64 long mode does not support many of the things you listed, save for variable length instructions. These things can't be dropped, however, until legacy 32-bit apps are no longer important. So a phase out can happen in 2 phases.
Phase 1, drop legacy modes. This can happen when everyone has moved to a 64-bit o/s, or at least at the point when AMD and Intel don't feel the need to support older 32-bit o/s-es with their newest chips. This is probably at least 5 years off.
Phase 2, drop 64-bit compatibility mode, which allows running 32-bit apps under a 64-bit o/s. That is at least 10 years off by my guess. Look how long windows had to carry around support for 16-bit apps when the move to 32-bit mode took place.
After phase 2, pure x86-64 long mode doesn't have much baggage left.
Is the segmentation such a big deal? Isn't it just adding some registers together to get an effective address?
Address calculation in a segmented world adds to the time it takes to get a lookup address, so it is likely multiple hardware paths have to be implemented to allow both flat and segmentation modes to work optimally. This adds to the complexity of the implementation.
You point out some good cruft that is limiting.
x86 will never lose some of those things like variable length instructions, to which a couple of pipe stages are dedicated to scanning and aligning instructions for decode. I guess the benefit of more efficient use of cache/memory for instructions works in its favor. Intel gets around this a bit by having the trace cache, but that just points out even clearer the x86 baggage.
Some of the other stuff can be phased out. Many old instructions are done in microcode, so there is little effect on the hardware, although these would still have to be verified as functional, so it adds to the design cycle time. I'm willing to bet many of the old instructions have been declared as don't-use by new compilers or even current compilers. Of course there may be tons of existing 32-bit apps that use these. The true phase out comes when there are no longer any 32-bit x86 apps, but that is 10+ years away, if we are lucky.