22 nm Larrabee

trinibwoy · Nov 2, 2012

nAo said:
How is this any different from any other Intel processor launch? It's not like IVB machines were parachuted into selected ISV hands the day of the official launch.

That's not the issue. The issue is that they're willing to boast about in-house numbers but aren't letting anyone else report their findings. That's not usual behavior by Intel.

nAo · Nov 2, 2012

trinibwoy said:
That's not the issue. The issue is that they're willing to boast about in-house numbers but aren't letting anyone else report their findings. That's not usual behavior by Intel.

Again, how is this any different from customers getting processors from Intel months earlier than the official launch? I suggest you to check some of the Haswell numbers that were "boasted" at IDF in Setpember. Do you expect Mr. EarlyAdopter NDASigned to be able to report his findings before the product ships?

A1xLLcqAgt0qc2RyMz0y · Nov 12, 2012

Xeon Phi 5110P looks underwhelming

The Xeon Phi 5110P only comes in at 1.01 Peak DP TFlops and burns 225 watts on the 22nm process.

http://techreport.com/news/23884/intel-joins-the-data-parallel-computing-fraternity-with-xeon-phi

The Tesla K20X has 1.31 Peak DP TFlops and burns 235 watts on the 28nm process.

http://techreport.com/news/23882/nvidia-intros-tesla-k20-series-as-titan-snags-top500-lead

The Xeon Phi 5110P is 23% slower than the Tesla K20X yet burns about the same power and is on a newer process (22nm). It look pretty underwhelming to me.

nAo · Nov 12, 2012

TFlops alone are hardly an accurate way of measuring performance.

iMacmatician · Nov 12, 2012

Also, from The Register and SemiAccurate, we see that the Xeon Phi die has 62 cores (which was already known), and there are also die photos (I haven't seen them before).

A1xLLcqAgt0qc2RyMz0y · Nov 12, 2012

nAo said:
TFlops alone are hardly an accurate way of measuring performance.

And yet that is how they rank the Top500 so it must mean something.

http://www.hpcwire.com/hpcwire/2012...nycore_x86_to_market_with_knights_corner.html

NVIDIA, though, is more aggressive about pointing to big performance increases over CPU-only platforms, more on the order of 5X to 30X and beyond. For its new K20X Tesla part announced earlier today, the GPU-maker is claiming a 7X performance advantage over to a Sandy Bridge Xeon. Although that makes it seem like the GPU competition is three times faster than Knights Corner, the NVIDIA comparison is GPU-to-CPU, while Intel prefers to match its coprocessor against two Xeons.

Nevertheless, NVIDIA's K20 does top Knights Corner in both raw performance and performance per watt. The 235 watt K20X offers 1.31 double precision teraflops, while the 225 watt 5110P, at 1.011 teraflops, delivers 300 gigaflops less. Advantage NVIDIA.

It appears to be even more skewed for single precision FLOPS, where the K20X offers three times its double precision performance; for the Knights Corner, single precision appears to be just twice that of its double precision results.

nAo · Nov 13, 2012

IIRC Top500 is based on LINPACK, not raw flop numbers.

AlexV · Nov 13, 2012

Outside of certain extremity waggling the top500 thing is pretty useless. There is lots of life in the real-world outside of super-tuned Linpack and FLOP counting. That doesn't make big Kettle uninteresting, or Xeon Shi less of a potential breath of fresh air.

pcchen · Nov 13, 2012

I always view LINPACK as the "best case scenario." That is, real world applications rarely outperform LINPACK performance, so it can be seen as a "real world peak."

A1xLLcqAgt0qc2RyMz0y · Nov 13, 2012

Charlie is having an orgasism

at S|A over the Xeon Phi

A look at the Xeon Phi cards and hardware

http://semiaccurate.com/2012/11/12/a-look-at-the-xeon-phi-cards-and-hardware/

What does it take to code for a Xeon Phi?

http://semiaccurate.com/2012/11/12/what-does-it-take-to-code-for-a-xeon-phi/

What will Intel Xeon Phi do to the GPGPU market?

http://semiaccurate.com/2012/11/13/what-will-intel-xeon-phi-do-to-the-gpgpu-market/

----

He actually believes that the Xeon Phi will be the end of Nvidia in HPC and GPGPU markets. Of course he also believed that Larrabee would be the end of Nvidia in GPUs completely, which he was completely wrong on. But hey it's charlie and he can never be wrong

It's also funny he hasn't even had an article on the K20X/K20 (except for his usual yield garbage).

3dilettante · Nov 13, 2012

Larrabee is more compelling in the compute space, though.

A product that is baseline acceptable for CPU-based HPC is bleeding edge for GPGPU in terms of tools and features for Nvidia.
Nvidia has added some features like the ability to self-generate tasks that are a "well, duh" thing for any CPU solution, and Nvidia is the only one GPU designer worth mentioning in that market.

Larrabee is a legitimately serious threat here.

A1xLLcqAgt0qc2RyMz0y · Nov 13, 2012

Xeon Phi DGEMM Scores 0.829 TFlops

Wasn't Intel hyping the Xeon Phi DGEMM scores months before the release?

Now it is buried deep in the footnotes here:

http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-detail.html

4. 2 socket Intel® Xeon® processor E5-2670 server vs. a single Intel® Xeon Phi™ coprocessor SE10P (Intel Measured DGEMM perf/watt score 309 GF/s @ 335W vs. 829 GF/s @ 195W)

As a comparison the Nvidia Tesla K20X scores 1.22 TFlops on DGEMM or 47% faster. Also note the 5110P is slower than the SE10P

3dilettante · Nov 13, 2012

The rule of thumb is that an exotic solution needs an order of magnitude difference in performance to really justify itself against what is already established.

A GPGPU is still a pretty exotic slave card, and Phi is rather familiar and it can operate more autonomously.
DGEMM is also a very good case for a GPU and Larrabee probably has a much more consistent performance profile.

Maybe if Maxwell brings along a decent amount of CPU capability, it can allow for the posited task-sharing between CPU and Phi workflow, or at least improve the situation where the GPU has many of its advantages muted by needing to rely on a processor across an expansion bus so much.

liolio · Nov 14, 2012

I feel a bit underwhelmed by that chip from Intel. OK they ship something, at last, but overall it looks a bit like a sub-part effort. Intel did better than that lately.
I mean how long it has been since Larrabee was cancelled? I would have expected more of a rework ( I haven't read much about where they were heading since the project was cancelled as a GPU).
Something akin to IBM effort with the power a2 cores.
As it is I don't see that product leveraging all Intel strength, it's still trying to be GPU but it seems to fails to match them on raw throughput, it is also more power hungry.
I do get Charlie argument about coding being easier and how it translates into saving quiet some bucks, but overall I feel like the issue of chip is not trying hard enough to be a CPU if that make sense.
I would have hope for Intel to unify its ISA as far as the SIMD are concerned and be compatible with their up coming Haswel CPU. I would have hoped transactional memory to make into the system. I think that they could have build a more proper interconnect to link chips together like IBM did with its power a2, where are the QPI links?
Overall I feel like Intel should have design its own Power A2, better if they could as they have more transistors to play with, power a2 still use on 45nm process.
I might be completely out of phase with the requirement of HPC or GPGPU computing but I feel they would have been better off shipping a "CPU" instead of an add-on card ( I mean mobo with 2 or 4 sockets).
They may have trade even more raw power but it seems that power consumption is off great importance they may have come with a way better chip in that regard (like less cores, 256bit SIMD, an big L3, faster clock speed, better single thread performance). As convenience (coding) and power consumption seems to be the main bottleneck in HPC, why did they still tried to attack GPU on their strong points? (I mean perf per chips, when they have other mean to reach that kind of compute density (sustained throughput and power consumption) in a server blade even though if most likely implies having more chips per blades?

Am I completely wrong on the matter?

AlexV · Nov 14, 2012

I think that Haswell is far more of a concern than Shi is (strictly on technical grounds), but Shi is doing its intended job of getting mind-set slices quite nicely. I don't think Intel is hugely keen on selling you a higher-cost Shi for the price it can sell you a nice much lower cost Xeon, but they needed some way to hedge against the possibility (which didn't necessarily materialise, mind you) of GPU encroachment.

3dilettante · Nov 14, 2012

Neat thing for comparison.

http://www.realworldtech.com/haswell-cpu/

Haswell's gather implementation is as was initially speculated as a microcoded loop.
It generates a load uop per vector element, regardless of cache line adjacency.

My earlier curiousity about which gather implementation would be more aggressive between Larrabee and Haswell appears to be satisfied.

Triskaine · Nov 14, 2012

AlexV said:
I think that Haswell is far more of a concern than Shi is (strictly on technical grounds), but Shi is doing its intended job of getting mind-set slices quite nicely. I don't think Intel is hugely keen on selling you a higher-cost Shi for the price it can sell you a nice much lower cost Xeon, but they needed some way to hedge against the possibility (which didn't necessarily materialise, mind you) of GPU encroachment.

Why do you keep calling it Shi?

ams · Nov 14, 2012

A1xLLcqAgt0qc2RyMz0y said:
He actually believes that the Xeon Phi will be the end of Nvidia in HPC and GPGPU markets. Of course he also believed that Larrabee would be the end of Nvidia in GPUs completely, which he was completely wrong on. But hey it's charlie and he can never be wrong

It's also funny he hasn't even had an article on the K20X/K20 (except for his usual yield garbage).

Yes, his analysis of supercomputing trends is spot on

Take a look:

http://5601-blogs-nvidia-com.voxcdn...ds/2012/06/Top500_list_2012_GPU_momentum2.png

This is exponential growth, with no signs of slowing anytime soon. With Maxwell, this trend should continue as NVIDIA will be able to supply very high performance and energy efficient GPU's and CPU's for high performance computing.

Regarding ease of programming with Xeon Phi vs. all other "accelerators", that is FUD to some extent. The reality is that reworking existing code to take advantage of increased parallelism can dramatically improve the performance of any supercomputing system, regardless of whether or not multi-core CPU's are used or CPU+GPU cores are used.

http://blogs.nvidia.com/2012/04/no-free-lunch-for-intel-mic-or-gpus/

Buddy Bland of Oak Ridge National Laboratory talked about this at SC12:

http://www.ustream.tv/recorded/26957654

dnavas · Nov 15, 2012

ams said:
Buddy Bland of Oak Ridge National Laboratory talked about this at SC12:

http://www.ustream.tv/recorded/26957654

Nice. That led me to Bill's talk: http://www.ustream.tv/recorded/27000612
His slide at the end with 256 SMs and 8 64bit ARM cores is ... interesting. Twice the SM count from 2010.

-Dave

rpg.314 · Nov 15, 2012

dnavas said:
Nice. That led me to Bill's talk: http://www.ustream.tv/recorded/27000612
His slide at the end with 256 SMs and 8 64bit ARM cores is ... interesting. Twice the SM count from 2010.

-Dave

Look at the SRAM size he is promising. 256 MB / chip

22 nm Larrabee

trinibwoy

Meh

nAo

Nutella Nutellae

A1xLLcqAgt0qc2RyMz0y

nAo

Nutella Nutellae

iMacmatician

A1xLLcqAgt0qc2RyMz0y

nAo

Nutella Nutellae

AlexV

Heteroscedasticitate

pcchen

Moderator

A1xLLcqAgt0qc2RyMz0y

3dilettante

A1xLLcqAgt0qc2RyMz0y

3dilettante

liolio

Aquoiboniste

AlexV

Heteroscedasticitate

3dilettante

Triskaine

ams

dnavas

rpg.314

Similar threads