Re: This Causes an Error post# 131246

Sunday, 03/09/2014 6:01:07 PM

Post# of 151692
INTEL TAKES ALL.

Dear Aeassa and WBMW,

CONCLUSIONS:
1. Intel is the current and future performance winner in servers and microservers.
2. Intel is the performance winner in the next round of smartphone and tablet SoCs based on 22nm, and will pull away at 14nm at the end of this year.
3. Intel is the current and future winner in the next round of HPC super chips for supercomputers.

The table below compares SPECint_rate_base2006 (SiRb06) per watt of TDP (W) per socket (P/S/W) for server platforms currently available from Intel (Xeon, Avoton), IBM (Power7+), and AMD (Opteron), along with the upcoming ARM-based platforms from AMD (A1100) and Applied Micro (X-Gene) and the upcoming IBM Power8.

There are also columns for number of sockets (S) and cores (C), die size in mm2 (D), and core size in mm2 (CS), including L2 cache.

The core sizes in mm2 are used below together with performance, core count, clock speed in GHz, and TDP in watts to report performance per core area, corrected for clock speed and power consumption and multiplied by 100 (P/C/GHZ/CS/W).

PLATFORM                S/C    SiRb06  W      P/S/W  D     CS     P/C/GHZ/CS/W
QUAD P8 4GHz            4/48   2952    250    2.95   650   20.7   1.16
QUAD P7+ 460P 4.1GHz    4/32   1230    250    1.23   567   -      -
QUAD E7-4890V2 2.8GHz   4/60   2390    130    4.6    540   11.3   3.89
QUAD 6386SE 2.8GHz      4/64   1070    140    1.91   315   -      -
CENTERTON S1260 2GHz    1/2    13      8.5    1.53   -     -      -
AVOTON C2750 2.4GHz     1/8    97      20     4.85   105   3.47   7.28
AVOTON C2730 1.7GHz     1/8    69      12     5.75   105   3.47   12.2
OPTERON 6386SE          1/16   269     130    2.07   315   -      -
E3-1230Lv3 1.8GHz       1/4    135     38     3.55   177   12.3   4.01
E3-1265Lv3 2.5GHz       1/4    188     45     4.18   177   12.3   3.4
X-Gene 40nm 2.5GHz      1/8    115     59.3   1.94   -     -      -
X-Gene 28nm 3GHz        1/16   152     59.3   2.53   -     -      -
OPTERON X1150 28nm      1/4    28.1    17     1.65   122   9      2.29
OPTERON A1100 28nm      1/8    80      25     3.2    -     -      -
APPLE A7                1/2    21.2    8.5    2.65   102   9.7    9.9
(- = not given)
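
As a cross-check on the two derived columns, here is a minimal Python sketch that recomputes them for the single-socket rows that list both a clock speed and a core size. The function names are made up for illustration, and the formulas are my reading of the column definitions above (P/S/W = SiRb06 / sockets / watts; P/C/GHZ/CS/W = SiRb06 / cores / GHz / core mm2 / watts x 100). The multi-socket rows are not checked here.

```python
# Recompute the two derived columns for single-socket rows of the table.
# Formulas are an interpretation of the column names, not the author's code.

def perf_per_socket_watt(sirb06, sockets, tdp_w):
    """P/S/W: SPECint_rate_base2006 per socket per watt of TDP."""
    return sirb06 / sockets / tdp_w


def perf_per_core_ghz_area_watt(sirb06, cores, ghz, core_mm2, tdp_w):
    """P/C/GHZ/CS/W: per core, per GHz, per mm2 of core, per watt, x100."""
    return sirb06 / cores / ghz / core_mm2 / tdp_w * 100


# (name, sockets, cores, SiRb06, TDP W, GHz, core mm2), copied from the table
ROWS = [
    ("AVOTON C2750", 1, 8, 97, 20, 2.4, 3.47),
    ("AVOTON C2730", 1, 8, 69, 12, 1.7, 3.47),
    ("E3-1230Lv3", 1, 4, 135, 38, 1.8, 12.3),
    ("E3-1265Lv3", 1, 4, 188, 45, 2.5, 12.3),
]

for name, s, c, p, w, ghz, cs in ROWS:
    psw = perf_per_socket_watt(p, s, w)
    pcw = perf_per_core_ghz_area_watt(p, c, ghz, cs, w)
    print(f"{name:14s} P/S/W={psw:5.2f}  P/C/GHZ/CS/W={pcw:5.2f}")
```

Under this reading the four rows come out within rounding of the tabulated values.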



KEY TO TABLE ENTRIES:
1. SiRb06 values are obtained from the SPEC website, other sources, or estimated as discussed below.
2. The estimated improvement of TSMC 28nm over 40nm is a 39% smaller die and 32% less power. Therefore, the estimated SiRb06 for X-Gene at 28nm is guesstimated as 32% higher than at 40nm, at the same wattage.
3. The SiRb06 for the AMD opteron A1100 is that claimed by AMD.
4. The SiRb06 for IBM Power8 is guesstimated from the 1.6x performance per core claimed by IBM and 50% more cores.
5. The SiRb06 for the Apple A7 is guesstimated by treating the 2-core Silvermont-based Merrifield SoC as comparable in performance to the A7 (small advantage for Intel in WebXPRT), dividing the performance of the 8-core Avoton by 4, and making a small adjustment for the clock speed of Merrifield (2.1GHz) versus Avoton (2.4 or 1.7GHz).
6. Core sizes are obtained by measuring the core plus L2 cache and the total die on the computer screen, and scaling based on the known die size. The Silvermont core size was obtained from die shots of Avoton and Bay Trail: 4 dual-core modules on the Avoton die, an Avoton die size of 105 mm2 (10.5 x 10 mm), dual Silvermont cores 3 x 2.3 mm, single Silvermont core 3 x 1.16 mm. The Haswell die is 177 mm2, with 4 cores measuring 2.8 x 4.4 mm each.
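
The guesstimates in items 2, 4, 5 and 6 can be re-derived with a few lines of arithmetic. The Python sketch below is only my reconstruction: the function names are invented, and it assumes the Power8 baseline is the quad P7+ row (1230) and that the A7 figure starts from the Avoton C2750 at 2.4GHz. Neither assumption is stated explicitly in the key.

```python
# Re-derive the estimated table entries from the stated rules of thumb.
# Baselines are assumptions chosen from the table, as noted in each function.

def xgene_28nm(sirb06_40nm, gain=0.32):
    """Item 2: 32% higher SiRb06 at 28nm than at 40nm, same wattage."""
    return sirb06_40nm * (1 + gain)


def power8(sirb06_p7plus, per_core_gain=1.6, core_ratio=1.5):
    """Item 4: 1.6x per core and 50% more cores (assumed quad P7+ baseline)."""
    return sirb06_p7plus * per_core_gain * core_ratio


def apple_a7(avoton_sirb06, merrifield_ghz=2.1, avoton_ghz=2.4):
    """Item 5: 8-core Avoton divided by 4, scaled by the clock ratio."""
    return avoton_sirb06 / 4 * merrifield_ghz / avoton_ghz


def area_mm2(width_mm, height_mm):
    """Item 6: rectangle area from on-screen measurements scaled to mm."""
    return width_mm * height_mm


print(round(xgene_28nm(115)))        # X-Gene 28nm estimate
print(round(power8(1230)))           # quad Power8 estimate
print(round(apple_a7(97), 1))        # Apple A7 estimate
print(round(area_mm2(3, 1.16), 2))   # single Silvermont core, mm2
print(round(area_mm2(2.8, 4.4), 2))  # single Haswell core, mm2
```

With these inputs the results land on or very near the table entries (152, 2952, 21.2, about 3.5 and 12.3), which suggests the reconstruction is at least consistent with how the numbers were produced.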

Intel has a 2 to 3 times lead in performance per watt per socket over all of the currently available alternative server offerings, and a 1.5 to 2 times advantage over the announced future products: the IBM P8 and the ARM-based microservers from AMD and Applied Micro.

Thus, the most competitive processors in performance per watt per socket are the AMD A1100 at 3.2, the upcoming IBM P8 at 2.95, and the A7 at 2.65, versus 5.75 for the Intel Avoton C2730 and 4.6 for the Intel E7-4890 v2.

Applied Micro is not competitive at 40nm and is unlikely to be competitive at 28nm with AMD's upcoming 28nm ARM server chip, and AMD's ARM microserver is in turn not competitive with the currently available Intel servers and microservers.

TINY SILVERMONT CORE SIZE:
The small 3.5 mm2 size of the Atom Silvermont core is notable, with 7 mm2 for a dual-core Silvermont CPU module. I have seen previous iHub postings indicating a Silvermont core size of 8 mm2, presumably for a Silvermont dual-core CPU module? The use of 2 CPU cores per CPU module probably contributes to the tiny core size, by sharing resources between the two adjoining cores. Both AMD and Apple A-series CPUs seem to take this approach. Haswell core size is 12.3 mm2 and Ivy Bridge 11.3 mm2, versus 9 mm2 for the AMD Opteron X1150 core, 9.7 mm2 for the Apple A7 core, and 20.7 mm2 for the IBM P8.

IMPLICATIONS FOR SMARTPHONES AND TABLETS:
The performance per core, per GHz, per core size, per watt yields a large lead for Intel, with the exception of the Apple A7. However, the Silvermont core is less than half the size of the A7 core. Part of this Intel advantage is attributable to its superior 22nm process versus 28nm for Apple and AMD. Also, the Intel process has been density-optimized for SoCs. The outstanding design of the Silvermont CPU has to be acknowledged. This Intel advantage is obscured by the rest of the SoC, which takes up most of the area of the die. Intel presumably will achieve substantially more area optimization of the uncore portion of the SoC going forward. We can see that with the upcoming Merrifield and Moorefield SoCs, which Intel claims have double the performance of Qualcomm, with a small performance lead over the A7 on WebXPRT 2013 (3). With the smaller core size, Intel will gain a further advantage with the quad-core Moorefield SoC, with little if any significant increase in total die size. The 14nm SoCs are soon to follow this year, which will greatly improve density and substantially reduce power consumption.

IMPLICATIONS FOR HPC:
The incredibly small 3.5 mm2 size of the Silvermont core also explains Intel's choice of an upgraded Silvermont core for the next-generation Phi MIC superchip, Knights Landing (KL). KL will incorporate 72 upgraded Silvermont cores, with four threads per core versus 1 in Silvermont and two 512-bit vector units per core versus one 128-bit vector unit in Silvermont, and will be fabricated in 14nm FinFET. KL will incorporate 8 or 16GB of on-package eDRAM cache memory, markedly reducing latency and increasing bandwidth 50%; a 6-channel DDR4-2400 memory controller supporting up to 384GB of RAM, versus 16GB on the current Phi; an integrated high-speed 100Gb/s Cray HPC fabric; and 36 lanes of PCI Express Gen 3, all in a 215W TDP package. This will result in 3X the current performance in FP and ST, to 3 TFLOPS double-precision FP per socket. KL is a standalone CPU able to address large amounts of memory, eliminating the performance bottleneck caused by the current requirement to transfer data from the host CPU to the FP coprocessor.
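
For a rough sanity check on the 3 TFLOPS figure, the Python sketch below works out what clock speed the quoted core and vector-unit counts would imply. The helper names are invented, and it assumes each 512-bit unit retires one fused multiply-add per double-precision lane per cycle (2 flops per lane). Neither that assumption nor any clock speed comes from the sources above.

```python
# Back-of-envelope peak FLOPS for the quoted Knights Landing configuration.
# ASSUMPTION: one FMA (2 flops) per 64-bit lane per vector unit per cycle.

def dp_flops_per_cycle(cores, vector_units, vector_bits, fma=True):
    """Peak double-precision flops per clock across the whole chip."""
    lanes = vector_bits // 64             # 64-bit doubles per vector
    flops_per_lane = 2 if fma else 1      # FMA counts as multiply + add
    return cores * vector_units * lanes * flops_per_lane


def implied_clock_ghz(target_tflops, flops_per_cycle):
    """Clock needed to reach the target peak at the given flops per cycle."""
    return target_tflops * 1e12 / flops_per_cycle / 1e9


per_cycle = dp_flops_per_cycle(cores=72, vector_units=2, vector_bits=512)
print(per_cycle)
print(round(implied_clock_ghz(3, per_cycle), 2))
```

Under that FMA assumption, 72 cores with two 512-bit units each would need a clock somewhere around 1.3GHz to hit 3 TFLOPS, so the claim at least hangs together numerically.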

SAP SALES & DISTRIBUTION 2 TIER BENCHMARK:
The analysis below compares the Intel Xeon E7 v2 with IBM Power7+/Power8 on the SAP Sales & Distribution 2-Tier benchmark.

Model                                       SAP S&D  Total  SAP/GHz/  PERF/  PERF/CM2/
                                            Score    Cores  Core      CM2    WATTX100
Quad Xeon E7-4890 v2 2.8GHz 15C, Dell R920  132.2    60     0.79      0.15   0.61
Quad Power 7+ 3.41GHz 6C, IBM p270          68.4     24     0.84      0.15   0.34
Dual Power 7+ 4.1GHz 8C, IBM p260           54.7     16     0.83      0.15   0.33
Dual E5-2697 v2 2.7GHz 12C, HP              54.8     24     0.85      0.16   0.65
Octa E7-8890 v2 2.8GHz 15C, Fujitsu         259.7    120    0.77      0.14   0.59
12x Power 7+ 3.7GHz 8C, IBM 780             311.7    96     0.88      0.16   0.35
ESTIMATED Quad P8 4GHz 12C                  260      48     1.36      0.21   0.54
  (assuming 1.6x perf per core)
  with 8 Centaurs added                     -        -      -         0.18   0.45



In summary, the just-released Intel 22nm Xeon E7 v2 has a performance per watt per die size rating of 0.59 to 0.65, compared with 0.33 to 0.35 for the P7+ and an estimated 0.54 for the P8 without the Centaur chips, or 0.45 including the 8 Centaur memory controller chips.
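
The derived SAP columns can be spot-checked in the same way. The Python sketch below is a reconstruction, not the original spreadsheet: the function names are invented, and the die areas (540 and 567 mm2) and TDPs (130 and 250 W) are borrowed from the first table on the assumption that the same values were used here. When I run the arithmetic, the last column matches the per-GHz-per-core score divided by watts and multiplied by 100, without an area term. That is an observation from the numbers, not something stated in the post.

```python
# Spot-check the derived SAP S&D columns for two rows.
# ASSUMPTION: die area and TDP are the values from the SPECint table.

def sap_per_ghz_core(score, ghz, cores):
    """SAP score normalized by clock speed and total core count."""
    return score / ghz / cores


def per_cm2(value, die_mm2):
    """Normalize by die area, converting mm2 to cm2."""
    return value / (die_mm2 / 100)


def per_watt_x100(value, tdp_w):
    """Normalize by TDP and scale by 100."""
    return value / tdp_w * 100


# (name, score, GHz, total cores, die mm2, TDP W)
ROWS = [
    ("Quad E7-4890 v2", 132.2, 2.8, 60, 540, 130),
    ("Dual Power 7+ 4.1GHz", 54.7, 4.1, 16, 567, 250),
]

for name, score, ghz, cores, die, tdp in ROWS:
    base = sap_per_ghz_core(score, ghz, cores)
    print(f"{name:22s} {base:.2f} {per_cm2(base, die):.2f} "
          f"{per_watt_x100(base, tdp):.2f}")
```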

The P8 design takes the memory controller back off die, needing 8 off-die Centaur memory controller chips with added L4 cache to connect to DDR3 memory. To be fair, these 8 Centaur chips, made using the same 22nm SOI process as the main P8 die, should be added back to the CM2 area and power requirements of the P8. I was unable to get data on that. Examining the dies for the P8 and the Centaur chip for comparable memory structures, and adjusting the die images on the computer monitor until those structures were the same size, gave me an estimated Centaur die size of 35.5 mm2, i.e. 0.36 cm2. Scaling the 250W TDP of the P8 by that die size gives a power consumption of 13.7 watts per Centaur. Obviously a guess. Cutting that in half gives 7 watts per Centaur, similar to the buffer chips used on RDIMMs, as I recall. This yields an added power consumption of 55 watts and an added area of 2.84 cm2, i.e. P8 + 8 Centaurs = TDP 305, CM2 7.6.
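
The Centaur power and area guess can be written out step by step. This Python sketch follows the steps in the paragraph above; the constant names are mine, the 650 mm2 P8 die size comes from the first table, and the halving is the same arbitrary guess made in the text. It covers only the per-chip power, the added power, the added area, and the combined TDP.

```python
# Step-by-step version of the Centaur power/area guess.
# Inputs are the post's own estimates; nothing here is measured data.

P8_DIE_MM2 = 650        # P8 die size from the SPECint table
P8_TDP_W = 250          # P8 TDP from the SPECint table
CENTAUR_DIE_MM2 = 35.5  # estimated from on-screen die comparison
N_CENTAURS = 8

# Scale P8 power by relative die area, then halve it (a guess).
scaled_w = P8_TDP_W * CENTAUR_DIE_MM2 / P8_DIE_MM2
halved_w = scaled_w / 2

added_power_w = N_CENTAURS * halved_w
added_area_cm2 = N_CENTAURS * CENTAUR_DIE_MM2 / 100
total_tdp_w = P8_TDP_W + round(added_power_w)

print(round(scaled_w, 1))         # watts per Centaur before halving
print(round(halved_w))            # watts per Centaur after halving
print(round(added_power_w))       # added watts for 8 Centaurs
print(round(added_area_cm2, 2))   # added area in cm2
print(total_tdp_w)                # P8 plus 8 Centaurs, watts
```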

Intel's Xeon E7 v2 chip is being manufactured on a mature, high-yielding, end-of-life 22nm FinFET process, with 14nm already in production, whereas IBM is using an immature, low-yielding 22nm SOI process to make a larger main processor die for a target market requiring over 99.99% reliability. Therefore IBM will have to accept much lower yields, as they always have. How low? Any thoughts?

In addition, the E7 v2 can address 3TB of memory (DDR3 or DDR4) versus 1TB for the P8 (DDR3 for now). That is an important difference for target markets such as in-memory data analytics and HPC.

Feel free to correct any errors in the above.

REFERENCES:
1. http://www.anandtech.com/show/7460/apple-ipad-air-review/3, power during the mini power virus test goes from 3.5 to 11.5 watts on the iPad Air using the A7 processor, indicating a processor TDP of about 8 watts.
2. http://www.anandtech.com/show/7460/apple-ipad-air-review/2, A7 floor plan from chipworks used to obtain A7 CPU size.
3. http://www.anandtech.com/show/7789/intel-talks-merrifield-moorefield-and-lte-at-mwc-2014, Merrifield dual-core SoC on WebXPRT 2013 versus the A7 in the iPhone 5s and the Snapdragon 800 in the Samsung Galaxy S4.
4. http://www.servethehome.com/Server-detail/intel-atom-c2750-8-core-avoton-rangeley-benchmarks-fast-power/, Atom C2750 hardinfo benchmark FPU FFT used to estimate floating point performance relative to the Xeon E3-1220v3, 0.73 vs 3.74.
5. http://www.theregister.co.uk/2013/08/27/ibm_power8_server_chip/?page=2, block diagram used to estimate the IBM power 8 core size.
6. http://www.spec.org/cpu2006/results/rint2006.html, all SPEC CINT2006 rates.
7. http://vr-zone.com/articles/xeon-phi-knights-series-continues-landing-2015/64112.html, next intel phi super chip.
8. http://www.cray.com/Products/Computing/XC/, Cray XC30 100 PFLOPS supercomputer series.
9. http://www.theregister.co.uk/2013/08/27/ibm_power8_server_chip/?page=1, http://www.hotchips.org/wp-content/uploads/hc_archives/hc25/HC25.20-Processors1-epub/HC25.26.210-POWER-Studecheli-IBM.pdf, Upcoming 4GHz 12-core Power8.
10. http://www.anandtech.com/print/7757/quad-ivy-brigde-ex-60-cores-120-threads
11. http://www-05.ibm.com/cz/events/febannouncement2012/pdf/power_architecture.pdf, https://www.ibm.com/.../wikis/.../POWER8_VUG.pdf (this is cached by google)
12. http://www.itjungle.com/tfh/tfh022414-story01.html, - gives estimate of P8 2.2 x P7 performance.