90nm AMD64 mobile shipping now, desktop in 1 month, then server.
http://www.siliconinvestor.com/stocktalk/msg.gsp?msgid=20405980
90nm Goldman quotes:
"1) AMD began shipment of AMD64 on 90nm this week, and expects to reach 10% share in x86 servers by year end"
...
"AMD is one of the few companies on 90nm that does not seem to have had significant delays or defect issue.[sic] Revenue shipments of AMD64 notebooks on 90nm started this week, well within the planned schedule for shipments prior to the end of Q3. Desktop AMD64 shipments on 90nm will commence a month later, followed by servers. An on-time transition to 90nm was one of the keys to our thinking in upgrading AMD, as it will allow it to cut prices in line with Intel while aggressively shifting mix to higher margin parts. AMD remains on track for reaching 50% of total MPU revenues on AMD64 by year end, and crossing-over in terms of units in Q2 05."
Very good stuff. Lots more on server share growth (up to 20% in 2005, 10% by year end given design wins already in place, etc.).
Goldman: AMD began 90nm AMD64 shipments this week!!!
http://www.siliconinvestor.com/stocktalk/msg.gsp?msgid=20405774
Doug
Hey there, Science, they didn't measure CPU + northbridge. They measured system consumption. Petz already explained there are other factors-- inefficiencies in power supply, TURBOJET fan requirements on Prescott, etc.
But I suppose they could have measured 5kW, and you'd still try to justify the results.
But they didn't, and you can't deal with the results they did measure, which agree with previous experiments of this nature.
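And remember these are wall-socket numbers. Back-of-envelope (my arithmetic, not zdnet's): with a PSU around 70% efficient, every extra 40W the CPU pulls shows up as roughly 40 / 0.7 ≈ 57W at the wall, before you even count the extra fan power needed to move the heat.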
Maybe you should play a few more rounds of this:
http://www.commandercooler.com/
<GGG>
Anand review done right: Nocona 3.6 vs. Opteron 150
http://www.anandtech.com/linux/showdoc.aspx?i=2163
He managed to get all the broken Intel-only assembly out of certain benchmarks, optimized properly for both systems, chose a more appropriate and larger set of benchmarks, and picked the right processor to compare against the Nocona 3.6.
Nocona inches ahead in one or two purely synthetic (runs in L1) tests, and gets buried in everything else. And this is just with 1P systems.
upc
Um, the northbridge?
But you can play the numbers game all you want. I don't need your exaggerations.
LOL! You don't like what zdnet *measured* as power consumption, eh? Sorry! <GGGG>
His point is that AMD's TDP figures are provided for an entire family of parts. The 130nm number of 89W is intended to cover 130nm parts up to 2.6GHz.
I remember certain individuals insisting this was not the case back at the Opteron launch.
upc
Off the top of my head (since I don't feel like looking up the numbers myself), Prescott ought to be around 90W at peak. You're telling me that Athlon 64 at its peak is less than 22W?
LOL!!! 90W at peak? I can see why you don't want to "look it up". You'd be staring at 115W "TDP" (Intel-style), and something close to 130-140W "peak".
Now, add to this that the 68W difference was between whole systems.
upc
Yes, they will be D stepping, so while that is not conclusive (they could update the 130nm parts to D stepping too), it would be good evidence.
upc
most obvious is that Prescott doesn't dissipate 140W, as the graph suggests for a 0.1 micron processor,
It sure does, at max power.
upc
Leakage affects all 90nm processes, including SOI.
But some more than others, and Intel hit the leakage jackpot with strained non-SOI 90nm, didn't they?
upc
That's the dumbest thing I have heard all week.
I find it difficult to imagine you've been silent for an entire week.
upc
Sleep mode? We're talking mostly about leakage under max load. The Dothan trick of shutting down parts of the chip is fine for editing documents, but look what happens to power and battery usage under load.
upc
You're right, it probably affects most of Intel's 90nm non-SOI process parts.
upc
Didn't you see slide 4? It shows exactly what is wrong with Prescott: tremendous leakage, leading to vastly increased total power requirements.
upc
and most of the time, it will be in the range of 30-60C degrees,
This is Prescott we're dealing with.
Try 70C+ under load.
And see slide 4.
Intel:
at 90nm, ~50% of total power is "active leakage power".
at 130nm, only ~28% of total power is "active leakage power".
Active power increases moderately from 130nm to 90nm, but the leakage increase just ruins Prescott.
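To put illustrative numbers on that (mine, not Intel's): say a 130nm part dissipates ~70W total and its 90nm successor ~115W. Then leakage goes from about 0.28 x 70 ≈ 20W to about 0.50 x 115 ≈ 57W, nearly tripling, while active power only creeps from ~50W to ~58W. Almost the entire increase is pure leak.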
upc
Well, his direct answer to your question was incorrect, which, I'll grant you, is 'interesting' ;)
upc
Grantsdale broken, doesn't work right with ATI and nVidia PCI-Express graphics cards.
http://www.theinquirer.net/?article=17809
upc
Leakage comes from current going through the transistors when they are in the off position. So no, it does not increase the harder a processor is worked.
Have you considered that it might be dependent on the *temperature* of the material? (It is. Heavily dependent, so your statement is incorrect.)
Power leakage comes from other sources as well.
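For the textbook version (standard device physics, not from the slides): subthreshold leakage goes roughly as
Isub ∝ exp( (Vgs - Vth) / (n * kT/q) )
and Vth itself drops as the die heats up, so leakage climbs steeply, roughly exponentially, with temperature. Hotter chip means more leakage, which means a hotter chip. That feedback is exactly why leakage under max load is the interesting number.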
upc
You may find this interesting:
http://eda.ee.ucla.edu/EE201A-04Spring/leakage_pres.ppt
And if they have a dual-core Prescott ready to go, care to guess what the power budget of such a beast would be?
Let's see. Maybe... 220W idle, 380W under load on that Prescott dual core system? <GGG>
That is just so beyond credible that it's silly.
Tell that to the power supply.
We've seen previous tests like this, with nearly identical results. Prescott leaks power like crazy.
upc
Hey, sorry it works out that way, but that is in fact the case.
As for super_pi, they had no access to any source code or build configuration, just a random binary, so it is not a reasonable test. It probably isn't even 64-bit. And who knows what processor it was optimized for?
upc
But it's pretty clear that Intel has fairly good 64-bit performance from these benchmarks.
It is?
Perhaps you haven't kept up with the "evolution" of the results:
(which of course, compare ONE Nocona 3.6 to ONE Newcastle 3500+)
Lame: tie
Gzip (32bit??): tie
Pov-Ray: Nocona almost 40% slower
MySql: Nocona 4-11% slower
primegen: BOGUS, spends 75% of the time in putchar()
super_pi: BOGUS, no source code, unknown bitness, unknown optimizations.
TSCP: Nocona 25% slower
ubench: BOGUS? Buggy code noted by another review. Ancient codebase.
Encryption("John the Ripper"): completely BOGUS -- hand-tuned assembly for "Intel"-named cpus only.
So, where do you find *any* support for your claim?
upc
No, the big picture was that the guy doesn't know jack about compiling software on Linux.
He wrote a bogus Makefile, penalizing the K8 by 50%.
He ran "John the Ripper" without looking at the source code and noticing the hand-tuned assembly routines, which are only called if the CPU has the word "Intel" in its name. So that one's completely invalid.
On top of that, he miscopied an earlier database result, again, against the K8.
Then he formulated a bogus conclusion.
The average user is not running miscompiled, largely irrelevant software.
upc
It will. It turns out ANOTHER benchmark was bogus:
The encryption benchmarking ("John the Ripper") is invalid.
See here and here:
http://www.aceshardware.com/forum?read=115094007
http://www.siliconinvestor.com/stocktalk/msg.gsp?msgid=20396039
Hand-coded assembly routines looking for "Intel" in the name of the CPU. D'oh!
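The gist, paraphrased in C (my sketch, NOT the actual John the Ripper source):

#include <stdio.h>
#include <string.h>

/* Paraphrase of the reported behavior, not JtR's real code: the
 * hand-tuned assembly path is chosen only when the CPU name string
 * contains "Intel", so an Athlon 64 always falls through to the
 * generic C path, no matter how fast it is. */
static const char *pick_path(const char *cpu_name)
{
    if (strstr(cpu_name, "Intel"))
        return "hand-tuned assembly";
    return "generic C";
}

int main(void)
{
    printf("Xeon:      %s\n", pick_path("Intel(R) Xeon(TM) CPU 3.60GHz"));
    printf("Athlon 64: %s\n", pick_path("AMD Athlon(tm) 64 Processor 3500+"));
    return 0;
}

Same source tree, same compiler, completely different code paths. As a cross-vendor benchmark, it's meaningless.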
upc
Anand's Nocona review has been partially updated, he's working on more follow-up, and he's planning to run tests on an Opteron 1xx system ASAP.
TSCP Athlon64 score is now ~320K instead of ~155K.
upc
The fact that the design does not support an integrated memory controller is bad news for Intel. It probably suggests that BTX does not have a long future ahead of it, unless Intel intends to continue to leave performance on the table.
AMD systems are already much quieter than their Intel counterparts, because they don't require a turbojet to remove the excess heat generated by Prescott.
upc
What Anand's "primegen" was actually measuring:
*locking* and *unlocking* of the thread-safe version of putchar() was the bottleneck.
switching to unlocked putchar made the benchmark run twice as fast.
commenting out the putchar stuff entirely resulted in another factor of 2 faster.
So:
50% of the time was locking.
25% of the time was input/output.
25% of the time was actually doing arithmetic, calculating primes.
Gosh, I wonder if The Prescott New Instructions MONITOR and MWAIT have anything to do with the selection of this benchmark, and the performance of Nocona?
http://www.aceshardware.com/forum?read=115093892
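You can reproduce the putchar effect in ten lines (my sketch, assuming glibc; not the actual primegen patch):

#include <stdio.h>

/* Minimal illustration, not primegen itself: putchar() takes the
 * stdio lock on every call; putchar_unlocked() skips it, which is
 * safe in a single-threaded program.  When a "benchmark" mostly
 * streams characters, that per-call lock IS the benchmark. */
int main(void)
{
    for (long i = 0; i < 50000000L; i++)
        putchar('.');    /* swap in putchar_unlocked() and re-time */
    putchar('\n');
    return 0;
}

Time both variants and you're measuring glibc locking, not prime arithmetic.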
upc
Turns out their makefile was broken, and was not feeding -O2 to gcc!
Going from -O1 to -O2, k8 scores double on that chess benchmark.
upc
Anand's TSCP makefile was broken:
D'oh! That reviewer should stick to running Doom3 timedemos.
http://www.aceshardware.com/forum?read=115093869
Ok, his Makefile is completely f*cked up...
By Foo_ on Monday, August 9, 2004 2:18 PM EDT
I have found why he has such terrible results. The Makefile he gives for TSCP does not optimize (-O2 flag) during the compile stage, only in the linking stage, where it's useless.
See :
$ cat Makefile
SRC = main.c board.c book.c data.c eval.c search.c
BIN = tscp
OBJ = $(SRC:.c=.o)
CC = gcc
CFLAGS = -ansi -pedantic
OPTIMIZE = -O2
$(BIN):$(OBJ)
	$(CC) $(CFLAGS) $(OPTIMIZE) $(OBJ) -o $(BIN)
%.o: %.c
	$(CC) -c $< -o $@
.PHONY: clean lint
clean:
	rm -f $(OBJ) $(BIN) *.core 2> /dev/null
lint:
	lint $(SRC)
$ make
gcc -c main.c -o main.o
gcc -c board.c -o board.o
gcc -c book.c -o book.o
gcc -c data.c -o data.o
gcc -c eval.c -o eval.o
gcc -c search.c -o search.o
gcc -ansi -pedantic -O2 main.o board.o book.o data.o eval.o search.o -o tscp
$
With the broken Makefile I only get 140k nps, instead of 281k with the intended optimizations applied during the compile stage.
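The fix is a two-line change (assuming GNU make; my patch, not Foo_'s): let the compile rule see the flags too:

%.o: %.c
	$(CC) $(CFLAGS) $(OPTIMIZE) -c $< -o $@

With that, every "gcc -c" line picks up -O2, and the K8 score roughly doubles, exactly as Foo_ measured.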
* Prescott is hot
So what? They've managed to cool it, albeit at much lower frequencies than originally planned.
You're kidding, right?
This is a major problem for OEMs.
Didn't you see the data posted here recently?
A Prescott system AT IDLE uses nearly as much power as an A64 system at MAX LOAD.
And at MAX LOAD, the Prescott system is using 60% (!!) more power than the A64 system (258W vs. 162W). 60%!
http://www.investorshub.com/boards/read_msg.asp?message_id=3759241
upc
I suspect that compile is broken; the numbers are too round. Also, 64-bit gcc doesn't add much over the 32-bit gcc scores, which isn't surprising for this benchmark.
upc
I looked, but a lot of those comments actually miss the point. While there is a problem with choosing the 3500+ instead of the Opteron 150, the *real* problem is that the results are completely bogus!
Look at this: duplicating his compiler options, TSCP version, OS, everything, here is nearly twice the score on a slower A64!
http://www.siliconinvestor.com/stocktalk/msg.gsp?msgid=20393763
That score of 285K blows away the Nocona! (258K)
upc
Is that still on Windows? Which compiler? Why did the scores vary so much run-to-run in that one?
Can you repeat?
upc
Same score on 3200+ here:
http://www.siliconinvestor.com/stocktalk/msg.gsp?msgid=20393707
What was Anand thinking?
upc
Hilarious! Looks like subzero counted his chickens before they hatched. Again.
How exactly did Anand manage to get a score that low???
upc
Yep! The Test-Select score should be 215, not 289!!!
He couldn't even copy the 64bit result from his July 19 benchmarks correctly (he pulled the 32bit one by accident):
http://www.anandtech.com/linux/showdoc.aspx?i=2127&p=5
This whole article is shoddy. Already, folks at Ace's are noting many of the results were not compiled for AMD64, and are not reproducible.
More from Vincent:
http://www.aceshardware.com/forum?read=115093828
Anand's test is a joke
By Vincent Diepeveen on Monday, August 9, 2004 11:33 AM EDT
Nah, if you test objectively, then any A64 over 2GHz will completely eat that P4 3.6GHz alive, of course.
But you definitely need good 64-bit compiles of the software, not flawed tests that measure video card speed, or tests with Intel C++ executables in 32 bits on both machines (or even 64 bits on the P4 and 32 bits on the A64).
You must of course compare objectively, and they didn't do this.
A good example: I can compile TSCP on a 2.2GHz Opteron while someone else is gaming on the other CPU, so effectively I've got less than one CPU, and I still get a score of 311k nps for a program that runs within L1 cache, while they managed to get 155k nps on an A64 with it.
That's really pathetic. I just did a simple compile: gcc -O3 -mcpu=k8
Now please look at the other stupid software they picked. gzip is in SPECint and isn't 64-bit either.
It's a hard fact that at 64 bits the P4 will be more disappointing compared to the Opteron than when you compare at 32 bits.
The small L1 caches of the P4 really will not survive 64-bit code + data. This P4 Xeon 3.6GHz will therefore be outdated before it is released.
They screwed up the TSCP benchmark:
http://www.aceshardware.com/forum?read=115093819
Yes the Anand test is real evil and childish
By Vincent Diepeveen on Monday, August 9, 2004 11:04 AM EDT
a) they just test tiny programs
b) for example, for the prime bench, the author says on the homepage:
"primegen is a small, fast library to generate prime numbers in order. It generates the 50847534 primes up to 1000000000 in just 8 seconds on a Pentium II-350; it prints them in decimal in just 35 seconds. "
c) I tried to reproduce something and couldn't. Note that TSCP ships with *no* makefile for Linux at all. One of the programs didn't compile under Linux at all, as it is Microsoft-only. They say they tested under Linux, but they only test 32-bit Intel C++ executables. Why?
They get 155k nps with TSCP.
While someone else is gaming on the dual Opteron, I take part of one CPU and get:
311k nps
Here is the output I get with TSCP, and that's without PGO even:
diep@data tscp181 $ gcc -O3 -mcpu=k8 -o tscp board.c book.c data.c eval.c main.c search.c
diep@data tscp181 $ ls -l tscp
-rwxr-xr-x 1 diep users 36536 Aug 9 10:51 tscp
diep@data tscp181 $ ./tscp
Tom Kerrigan's Simple Chess Program (TSCP)
version 1.81, 2/5/03
Copyright 1997 Tom Kerrigan
"help" displays a list of commands.
tscp> bench
8 . r b . . r k .
7 p . . . . p p p
6 . p . q p . n .
5 . . . n . . N .
4 . . p P . . . .
3 . . P . . . P .
2 P P Q . . P B P
1 R . B . R . K .
a b c d e f g h
ply nodes score pv
1 130 20 c1e3
2 3441 5 g5e4 d6c7
3 8911 30 g5e4 d6c7 c1e3
4 141367 10 g5e4 d6c7 c1e3 c8d7
5 550778 26 c2a4 d6c7 g2d5 e6d5 c1e3
Time: 1782 ms
ply nodes score pv
1 130 20 c1e3
2 3441 5 g5e4 d6c7
3 8911 30 g5e4 d6c7 c1e3
4 141367 10 g5e4 d6c7 c1e3 c8d7
5 550778 26 c2a4 d6c7 g2d5 e6d5 c1e3
Time: 1777 ms
ply nodes score pv
1 130 20 c1e3
2 3441 5 g5e4 d6c7
3 8911 30 g5e4 d6c7 c1e3
4 141367 10 g5e4 d6c7 c1e3 c8d7
5 550778 26 c2a4 d6c7 g2d5 e6d5 c1e3
Time: 1769 ms
Nodes: 550778
Best time: 1769 ms
Nodes per second: 311349 (Score: 1.280)
tscp>
quit
Illegal move.
tscp> exit
Illegal move.
tscp>
diep@data tscp181 $
> Did you see this test of an Athlon64 3500+ vs. Xeon 3.6GHz
> w/EM64T? What a joke! I have NEVER seen an Intel P4 or
> Xeon beat an Athlon64 with no packed SSE/SSE2 instructions
> (there is no way gcc 3.3 is using these) and debugging information
> in the binaries for floating-point intensive benchmarks.
>
> The author makes the statement that:
>
> "Without a doubt, the 3.6GHz Xeon trounces over the Athlon
> 64 in math-intensive benchmarks."
>
> Maybe in some integer math, but definitely not floating-point
> (look at your POV-RAY results idiot!)
Our Nocona server was set up in a remote location with little access, so we had limited time to run as many real-world benchmarks as we are typically accustomed to. Fortunately, there are multitudes of synthetic benchmarks that we can use to deduce information quickly and constructively.
And just where was that remote location? Intel?
upc
Here's what TSCP is trying to measure:
gcc is considered a branch intensive program. As you can see from this graph, TSCP has even more branches and they're harder to predict, so it's a good test of a processor's BPU and ability to recover from mispredicted branches. TSCP also has relatively high ILP, so it tests the processor's instruction scheduler. It clearly fits in L1 cache, so it doesn't test a computer's L2 cache or main memory performance. Basically, TSCP measures a processor core's worst case integer performance. It may be a good predictor for compilers, other AI programs, and other branch intensive code.
I would think Prescott, with a deep pipeline, would not perform well when encountering lots of branch mispredictions.
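If that description feels abstract, here's a toy version of the pattern (my illustration, not TSCP code): L1-resident integer work where the branch depends on pseudo-random data, so it's essentially unpredictable, and a long pipeline eats the full flush penalty over and over.

#include <stdio.h>

/* Toy sketch (not TSCP): everything lives in registers/L1, and the
 * branch below is taken ~50% of the time based on PRNG output, so
 * the branch predictor can't learn it.  A ~31-stage Prescott pipe
 * pays far more per mispredict than a ~12-stage K8 pipe. */
int main(void)
{
    unsigned x = 12345u, score = 0u;
    for (int i = 0; i < 100000000; i++) {
        x = x * 1103515245u + 12345u;  /* cheap LCG, no memory traffic */
        if (x & 0x40000000u)           /* data-dependent, ~50/50 branch */
            score += x >> 16;
        else
            score -= x & 0xffffu;
    }
    printf("%u\n", score);             /* keep the loop from being elided */
    return 0;
}

No L2, no RAM, no disk: just the core's scheduler and branch machinery. That's exactly the part of Prescott you'd expect to hurt.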
ubench:
Please make sure you compile ubench using only -O2 or -O optimization flags. More aggressive optimizations tend to alter the semantics of the code and skew the results.
This also appears to be a very old program (the benchmark table does not contain recent hardware), and there was this description and caveat about a previous version:
Other factors affecting ubench results include quality of the C-compiler, C-library, kernel version, OS scalability, amount of RAM, presence of other applications running at the same time, etc.
Ubench is executing rather senseless mathematical integer and floating-point calculations for 3 mins concurrently using several processes, and the result is Ubench CPU benchmark. The ratio of floating-point calculations to integer is about 1:3.
Ubench will spawn about 2 concurrent processes for each CPU available on the system. This ensures all available raw CPU horsepower is used.
Ubench is executing rather senseless memory allocation and memory to memory copying operations for another 3 mins concurrently using several processes, and the result is Ubench MEM benchmark.
The following are the samples of ubench output for some systems. Attention: The MEM benchmarks for all Linux systems had to be adjusted by a factor of 8 due to the bug in ubench version < 0.32. The MEM benchmark for all AIX system had to be adjusted by a factor of 4. The bug has been corrected in version 0.32. The benchmarks submitted before 07/31/2000 have been recalculated.
As any benchmark the ubench numbers by itself have no meaning and can be used only when comparing to ubench marks from other systems.
And this is what Anand did: "We compiled the program using ./configure and make with no optimizations."
Now, looking at the configuration file, I wonder if they built this correctly for either processor?
I wish they provided makefile output for both like they did with some other benches they built.
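About that "-O2 or -O only" caveat quoted above: it isn't paranoia. Aggressive flags really can change answers, not just speed. A tiny example (mine, not ubench code):

#include <stdio.h>

/* Illustration of optimizations altering semantics (my example, not
 * from ubench): in IEEE order, (big + small) - big is 0.0f, because
 * 1.0f vanishes in rounding against 1e20f.  gcc's -ffast-math is
 * allowed to reassociate this to just "small", i.e. 1.0f. */
int main(void)
{
    volatile float big = 1e20f, small = 1.0f;
    float r = (big + small) - big;
    printf("%g\n", r);   /* 0 under plain -O2; may print 1 with -ffast-math */
    return 0;
}

Whether a particular gcc version actually performs that rewrite is beside the point; the flag grants permission, and then the two boxes aren't computing the same function anymore.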
John the Ripper:
This one is crazy. They couldn't build it the first time, and the generic build appears to try out different code versions and it attempts to self-optimize! There would seem to be a lot that could go wrong here.
It's going to take a while to figure out what's going on here. Look at the output makefiles. In particular, the "bitslice", "intermediate values" and "blowfish" tests are the subtests that radically favor Nocona.
Hard to know, without a response from AMD, if these are accurate results, or a compiler optimization issue.
upc
Well, for starters, they should've used the Opteron 150 vs. the Nocona 3.6 GHz, shouldn't they?
Why penalize the AMD part with one speed grade (2.2 GHz instead of 2.4 GHz), and limit it to 512K L2 vs. 1MB L2???
But nonetheless, the 3500+ wins on content creation, audio encoding, pov-ray, database insert.
Nocona is ~10% faster on superPi, but they state that they don't know what optimizations were compiled into the binaries!!
They also note that they did not compile Linux for each processor, but used what came out of the box. Probably not a big deal.
Then they come to what they term more "synthetic benchmarks".
The most surprising is probably the chess benchmark, "TSCP". I suspect an error, or bad optimization options, because on another chess bench, Diep, Prescott is known to suck.
Someone will need to review the makefiles provided for 'ubench' and 'john the ripper' to see what's going on. I find it hard to believe the results, unless these are essentially L2 cache-size measurement programs.
Finally, it may be nothing, but the second half of the title is "Intel's 64-bit suggestion".
upc