Larrabee for HPC: Not So Fast
For those of you who thought Intel was angling for an HPC play with its upcoming Larrabee processor family, think again. In case you're not a regular reader of this publication, Larrabee is Intel's manycore x86 GPU-like processor scheduled to debut in late 2009 or early 2010. With Larrabee, Intel is gearing up to challenge NVIDIA and AMD for GPU leadership, but doesn't appear interested in exploiting the chip for GPGPU.
<MODSNIP - NO FULL QUOTES PLEASE>
The idea is that the same programs that are initially built for quad-core Nehalems can transparently scale up to 8-core Nehalems, manycore Larrabees, and all their descendents. Intel's research language for manycore throughput computing, Ct, will probably get productized into the company's software offerings at some point as these manycore products start to hit the streets. If all goes according to plan, by 2015 manycore x86 will be the dominant processor species and parallel programming will be the norm.
http://www.hpcwire.com/blogs/Larrabee-for-HPC-Not-So-Fast-36336839.html
I don't really know what to say. No ECC memory? Different die with DDR3 IMCs, then. Not fully IEEE 754 compliant?! What kind of x86 extension IS this?
If they're not going to push their own brand of GPGPU via this, what on EARTH is the point of x86 and a software renderer? Is it just some holdover product they can use to prod the market before they fork a software renderer with future manycore CPUs?
nutball
18-Dec-2008, 10:37
Sounds to me that they don't want Larrabee to eat their own very expensive Xeon/Itanium-flavoured lunch. They've already found themselves having to sell Xeons in places I'm sure they'd rather have sold Itaniums; maybe they're trying hard to ensure they don't replace Xeon with something even cheaper.
Amazing, Larrabee is just a Black Swan in the Intel roadmap with nearly no strategic cooperation from the other groups. I have some difficulty imagining many large universities, pharmaceutical companies, or investment banks deploying Larrabee clusters for GPGPU given this; they seem to be giving up the acceleration market entirely. Most interestingly, this might be a consequence of a lack of interest by HPC firms, rather than the cause.
I will say only one thing here. Texas Instruments has become a highly ideological company; they believe DSPs are the solution to nearly everything. Intel is also a highly ideological company; they believe CPUs are the solution to nearly everything. Just like TI, they will soon be in big trouble. No entity has ever been able to escape the consequences of a complete lack of intellectual honesty.
The 'Large IA + Small IA' paradigm is one I really like in theory, but given that Atom doesn't impress me at all (not just the platform which is utter shit, but even just the core CPU) I'm not very optimistic here.
There are definitely many avenues they can take, and no doubt they are evaluating several of them. While Larrabee is a throughput-oriented x86 chip, it may not solve the longer-term scalability problems (heat, power, communication and synchronization).
I came across an interesting paper that I think gives an idea of one direction their architecture could be heading:
POD : A 3D-Integrated Broad-Purpose Acceleration Layer (http://arch.ece.gatech.edu/pub/ieee-micro08.pdf)
Slides with some more information (http://arch.ece.gatech.edu/present/hpec07.ppt)
It's from Georgia Tech, and co-authored by some Intel researchers.
3dilettante
18-Dec-2008, 14:42
Larrabee was specced to have a full TFLOP of DP capability, with the same ratio of DP to SP as any other Intel processor used in HPC.
In addition, the chip was at least initially posited to have variants capable of acting as a host processor.
Given the modularity of uncores in x86, the memory argument doesn't hold water either.
Either something significant has changed, or the individual interviewed was very narrowly defining Larrabee as the GPU variant. Otherwise, most of those reasons fall by the wayside.
A compute node version of Larrabee would be incredibly tempting in the cheap flops market--which x86 clusters pretty much own.
This "interview", which is almost devoid of content:
http://www.xbitlabs.com/articles/video/display/intel-larrabee-interview.html
nevertheless gives me the feeling that Larrabee is more GPU than anything. Quite disappointing, if true.
Jawed
rpg.314
19-Dec-2008, 09:11
nevertheless gives me the feeling that Larrabee is more GPU than anything. Quite disappointing, if true.
IMHO, that's an incorrect perception. As I see it, Larrabee is a general-purpose compute monster. Its fixed-function hardware is texture-specific, which can be ignored for HPC apps. They added TMUs because gaming is a market where they can achieve the volumes. Rendering is just one of the compute-intensive jobs it can handle very well. Of course, the embarrassingly parallel nature of shading helps.
I see GPUs as chips which have a lot of 3D-specific units and are leveraging their shaders for HPC apps. At any rate, the difference is getting muddier by the generation.
IMHO, that's an incorrect perception.
I'm hoping it's an incorrect perception, too. I'm simply describing the impression the interview left me with, which tallies disappointingly with the article linked by the original poster.
As I see it, Larrabee is a general-purpose compute monster. Its fixed-function hardware is texture-specific, which can be ignored for HPC apps. They added TMUs because gaming is a market where they can achieve the volumes. Rendering is just one of the compute-intensive jobs it can handle very well. Of course, the embarrassingly parallel nature of shading helps.
I see GPUs as chips which have a lot of 3D-specific units and are leveraging their shaders for HPC apps. At any rate, the difference is getting muddier by the generation.
I agree with all you're saying, but it seems to me that Intel is constraining Larrabee for GPU/consumer applications.
---
The interview perhaps provides the final clue about the extent of the fixed function units in Larrabee, stating:
we still have hardware HD video decoders and texture samplers
Jawed
rpg.314
19-Dec-2008, 11:31
I'm hoping it's an incorrect perception, too. I'm simply describing the impression the interview left me with, which tallies disappointingly with the article linked by the original poster.
I agree with all you're saying, but it seems to me that Intel is constraining Larrabee for GPU/consumer applications.
I think Intel is doing so simply because gaming is the market out there with the volumes to sustain a dedicated product. They must sell it to this market to finance R&D. Since HPC is not a big enough market, they must live with whatever the consumer market can come up with (small HPC specifics such as ECC RAM excepted). I see Intel's strategy as economic rather than technical.
Mintmaster
19-Dec-2008, 22:12
rpg, I don't think you're understanding the points brought up in this thread. Of course gaming is the biggest market for Larrabee. What everyone is disappointed in is that it appears Intel is trying to limit Larrabee to just graphics.
If this is all true, I think nutball's explanation is quite convincing. Intel doesn't want to cut into its Xeon revenue (one Larrabee could do the work of 10+ Xeons), so it's going to keep Larrabee from making a big dent in HPC. Maybe they'll keep it ready for more general use if AMD or NVIDIA are able to create a truly compelling HPC product that threatens their Xeon sales, but until then they're going to try impeding the progress of low-cost HPC by keeping Larrabee graphics-only.
I am pretty sure Intel will come out with Larrabee Xeon and cost it several times as much. The early presentations for Larrabee were aimed at HPC.
Why would Intel WANT to replace their server/workstation CPUs when they could simply add Larrabee alongside them for high-compute workstations/servers with backward compatibility? Tapping into the high margin "Quadro" market first would be the wisest thing to do before generalizing LRB in the consumer market and finally enabling ray-tracing on all platforms.
Especially as "just a GPU", it looks more Intel than anything else to me.
CNCAddict
20-Dec-2008, 01:01
This "interview", which is almost devoid of content:
http://www.xbitlabs.com/articles/video/display/intel-larrabee-interview.html
That's the worst interview I have ever read. Waste of time that was... they should take pointers from the AnandTech ATI interview.
rpg.314
20-Dec-2008, 04:41
rpg, I don't think you're understanding the points brought up in this thread. Of course gaming is the biggest market for Larrabee. What everyone is disappointed in is that it appears Intel is trying to limit Larrabee to just graphics.
If this is all true, I think nutball's explanation is quite convincing. Intel doesn't want to cut into its Xeon revenue (one Larrabee could do the work of 10+ Xeons), so it's going to keep Larrabee from making a big dent in HPC. Maybe they'll keep it ready for more general use if AMD or NVIDIA are able to create a truly compelling HPC product that threatens their Xeon sales, but until then they're going to try impeding the progress of low-cost HPC by keeping Larrabee graphics-only.
OK, I think the point being made was that Intel will try to market it to the graphics market only and will avoid positioning it against Xeon in the HPC market (i.e., the reasons being financial rather than technical).
Did I get your POV correctly now?
I think that an HPC Larrabee will come out and it will have insane margins because it can do the work of ~10 Xeons. If it is a financial decision rather than a technical one, then they will test the waters first and, depending on margins, make one for HPC.
nutball
20-Dec-2008, 07:06
I am pretty sure Intel will come out with Larrabee Xeon and cost it several times as much.
Well maybe they will, but they don't want Larrabee-Celeron to even be capable of competing in server space with Xeon, Itanium, or later Larrabee-Xeon. So they have to ensure that Larrabee-Celeron lacks important features required in that market.
GPGPU is getting a lot of press recently but it's not yet entirely obvious that it isn't just a (another) passing fad. The HPC crowd have played with conceptually similar things in the past but they've not really come to dominate that market. GPGPU is superficially attractive because of the ludicrously low price of the hardware. But if you factor in the cost of somebody who can understand how to program the bloody things, and the cost of re-writing your 100k line FORTRAN code every three years because the hardware has changed so much the old optimisations just aren't optimal any more, suddenly those 10 Xeons start to look a lot more attractive. The HPC crowd may yet just give up on GPGPU.
So Intel will hedge its bets, I'd say: keep Larrabee unsuitable for server use and watch the competition. If possible, I think it'd prefer its HPC customers didn't start thinking $100/TFLOP, but I agree it will certainly address that market with a Larrabee-like something if that becomes necessary.
I think that an HPC Larrabee will come out and it will have insane margins because it can do the work of ~10 Xeons. If it is a financial decision rather than a technical one, then they will test the waters first and, depending on margins, make one for HPC.
Xeons will be getting AVX:
http://en.wikipedia.org/w/index.php?title=Intel_Sandy_Bridge_(microarchitecture)&oldid=258926603
not long after Larrabee appears.
Intel's strategy is prolly to wait out (and trash-talk) the honeymoon GPGPU-HPC era but Larrabee will be waiting/developing in the wings regardless.
AVX could be seen as a mere step in the direction of a Larrabee vector processor per x86 core. Then we get into questions of whether Larrabee's L2 cache architecture (including ring bus) is of use in the x86 space.
Jawed
CouldntResist
20-Dec-2008, 14:10
Maybe there is some truth in those journalists' speculations about the possibility of Larrabee being used in Xbox <undisclosed-aspirational-moniker>?
Well, I've seen Intel mention OpenCL support on Larrabee various times.
So they are going for GPGPU.
Perhaps not for clusters of GPGPUs, at least not from the start... but GPGPU, I'm quite sure.
3dilettante
22-Dec-2008, 15:48
Xeons will be getting AVX:
http://en.wikipedia.org/w/index.php?title=Intel_Sandy_Bridge_(microarchitecture)&oldid=258926603
not long after Larrabee appears.
The FMA support won't be out for at least another design cycle afterwards, though.
AVX could be seen as a mere step in the direction of a Larrabee vector processor per x86 core. Then we get into questions of whether Larrabee's L2 cache architecture (including ring bus) is of use in the x86 space.
Since Sandy Bridge/Gesher was specified at one point to have a ring-bus of some sort, maybe some variation of the theme would be used.
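For anyone wondering why the FMA point above matters for HPC: a fused multiply-add computes a*b+c with a single rounding at the end, whereas a separate multiply and add round twice. The sketch below is only an illustration in plain Python, not anything from Intel's documentation. The function names are made up for this example, and the "fused" version emulates the single rounding with exact rational arithmetic rather than calling a hardware instruction.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded, then the sum is rounded."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated FMA (hypothetical helper): compute a*b+c exactly, then round once.

    Fractions represent finite floats exactly, so the only rounding happens in
    the final conversion back to float. Infinities and NaNs are not handled.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product 1 - 2**-60 rounds to 1.0 in double precision.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

unfused = unfused_multiply_add(a, b, c)  # the small term is lost when a*b is rounded
fused = fused_multiply_add(a, b, c)      # the small term survives the single rounding

print("unfused:", unfused)
print("fused:  ", fused)
```

With these inputs the two-step version loses the low-order term entirely, while the single-rounding version keeps it, which is the usual accuracy argument for FMA in numerical code.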
The FMA support won't be out for at least another design cycle afterwards, though.
I don't have much insight on AVX: is AVX expected to eventually encompass Larrabee's vector processing or is AVX merely an intermediate step? Though I wouldn't be surprised if no-one has an answer to that, not even within Intel.
Since Sandy Bridge/Gesher was specified at one point to have a ring-bus of some sort, maybe some variation of the theme would be used.
I wasn't aware of that, so that's kinda interesting...
Jawed
3dilettante
22-Dec-2008, 16:42
The rumors have it that AVX and LRB have significant similarities, but they are not fully compatible.
Maybe Larrabee I can't find widespread use because its vector instruction set would not find compatibility in future AVX versions. Perhaps Larrabee II will?
The rampant schisms between vector ISAs throughout x86 (SSE4/5/AVX/LRB) might be a factor.
Gesher, now known as Sandy Bridge, was listed as having a ring bus in one of the leaked Larrabee presentations, though that was a while back.
rpg.314
23-Dec-2008, 04:25
I think that the ISA has diverged between CPU x86 and LRB's x86. Intel's Larrabee paper made no mention of SSE units in LRB, which makes sense since it has a 16-wide float VPU. AVX, OTOH, is an extension of SSE. It's not a gap which can't be closed, but we will have to wait and see if Intel is interested in closing it. They could always add an SSE/AVX decoder that uses LRB's VPU to handle binary compatibility.
3dilettante
23-Dec-2008, 14:38
AVX is a pretty significant revamp of x86 instruction encoding with the additional source operands, option to not have destructive operands, a more refined encoding scheme, more forward-thinking ways of handling context saving, (maybe someday being extended to cover standard general purpose ops), etc.
AMD's SSE5 looks more like an actual extension of SSE, same x86 bugbears, yet another prefix byte, and as such looks to be in many ways inferior to its competitor.
My curiosity is where in the spectrum LRB is between SSE5 and AVX.
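To make the "destructive operands" point above concrete, here is a small Python model of the difference between the two-operand SSE form (e.g. `addps xmm0, xmm1`, where the destination is also a source) and the three-operand AVX form (e.g. `vaddps ymm2, ymm0, ymm1`). It is only a sketch: the register file is a plain dict, and the register names `r0`..`r2` and the helper names are hypothetical, chosen for this illustration.

```python
from typing import Dict, List

Registers = Dict[str, List[float]]


def sse_style_add(regs: Registers, dst_src: str, src: str) -> None:
    """Two-operand form: dst_src is read AND overwritten (destructive)."""
    regs[dst_src] = [a + b for a, b in zip(regs[dst_src], regs[src])]


def avx_style_add(regs: Registers, dst: str, src1: str, src2: str) -> None:
    """Three-operand form: both sources are left untouched (non-destructive)."""
    regs[dst] = [a + b for a, b in zip(regs[src1], regs[src2])]


def sum_keeping_inputs_sse(regs: Registers) -> int:
    """Compute r2 = r0 + r1 while preserving r0; returns the instruction count."""
    regs["r2"] = list(regs["r0"])     # extra register-to-register move to save r0
    sse_style_add(regs, "r2", "r1")   # r2 = r2 + r1
    return 2


def sum_keeping_inputs_avx(regs: Registers) -> int:
    """Same computation with a non-destructive add; no copy is needed."""
    avx_style_add(regs, "r2", "r0", "r1")
    return 1


initial = {"r0": [1.0, 2.0], "r1": [10.0, 20.0], "r2": [0.0, 0.0]}

sse_regs = dict(initial)
sse_count = sum_keeping_inputs_sse(sse_regs)

avx_regs = dict(initial)
avx_count = sum_keeping_inputs_avx(avx_regs)

print(sse_regs["r2"], sse_count)
print(avx_regs["r2"], avx_count)
```

Both versions produce the same sum, but the destructive form needs an extra move whenever an input has to stay live, which is the code-size and register-pressure cost the three-operand encoding avoids.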
crystall
23-Dec-2008, 14:52
Tapping into the high margin "Quadro" market first would be the wisest thing to do before generalizing LRB in the consumer market and finally enabling ray-tracing on all platforms.
To tap into the "Quadro" market you need high quality drivers, not only a fast GPU; Intel driver quality is abysmal so I guess they will be looking at other markets first.
To tap into the "Quadro" market you need high quality drivers, not only a fast GPU; Intel driver quality is abysmal so I guess they will be looking at other markets first.
The high end workstation market is limited in both hardware and software environment. Intel has been hiring for the driver department since 2007 I think and I'm pretty sure you can't compare it to the drivers of your i950IGP.
Intel has been hiring for the driver department since 2007 I think and I'm pretty sure you can't compare it to the drivers of your i950IGP.
Yea, I have to give credit where credit's due.
I bought a laptop with Intel X3100 graphics about 18 months ago. Initially the drivers were quite poor, didn't even support DX10 and all that.
Since then Intel has improved the frequency of their updates, added quite a few features, and fixed most of the compatibility issues. It's still not quite perfect, but I think they've come a long way.
Outside the graphics drivers, I don't think Intel ever had any driver issues anyway. I've been using all kinds of Intel hardware for years (CPUs, chipsets, network adapters, RAID controllers and such), and generally their drivers were rock-solid... Intel was also quick to support XP x64 and Vista x64, and with good quality drivers too (NVIDIA especially made quite a mess of Vista at first).
And don't forget... Part of the 'driver' of Larrabee will be a SIMD-optimizing x86 compiler. Intel has quite a track record in the area of compilers as well.
If Intel is as serious about Larrabee as I think they are, I'm not worried about the drivers. I think they know how important drivers are, and I think they have the resources to pull it off, if they get their priorities right.
My curiosity is where in the spectrum LRB is between SSE5 and AVX.
Intel has not disclosed Larrabee's new instructions (called LNI), but from current information they look to be an even more radical departure from SSE than AVX.
3dilettante
23-Dec-2008, 20:17
That should be interesting. Some of the posts I've read elsewhere didn't seem to indicate such a massive disparity, leaving out special purpose operations related to graphics.
From what I understand, LNI does not contain many graphics-related instructions. It's designed more for running shaders and GPGPU-related stuff. Basically, it's a standard vector instruction set, including gather/scatter, execution masks, etc., which are not in SSE or even AVX.
It's the difference between SPMD and SIMD.
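The gather/scatter and execution-mask features mentioned above are what let a compiler map a per-element ("SPMD") program onto vector lanes even when the elements branch or touch scattered addresses. The Python sketch below models the idea with plain lists. It is not LNI itself (the instruction set was undisclosed at the time): the helper names, the 4-lane width, and the example kernel are all assumptions made for illustration.

```python
from typing import Callable, List


def gather(memory: List[int], indices: List[int], mask: List[bool],
           passthrough: List[int]) -> List[int]:
    """Per-lane indexed load: active lanes read memory[index], inactive lanes keep passthrough."""
    return [memory[i] if m else old
            for i, m, old in zip(indices, mask, passthrough)]


def scatter(memory: List[int], indices: List[int], values: List[int],
            mask: List[bool]) -> None:
    """Per-lane indexed store: only active lanes write to memory."""
    for i, v, m in zip(indices, values, mask):
        if m:
            memory[i] = v


def masked_op(op: Callable[[int, int], int], a: List[int], b: List[int],
              mask: List[bool], passthrough: List[int]) -> List[int]:
    """Apply op lane by lane where the mask is set; other lanes keep passthrough."""
    return [op(x, y) if m else old
            for x, y, m, old in zip(a, b, mask, passthrough)]


def double_where_positive(table: List[int], idx: List[int],
                          x: List[float]) -> List[int]:
    """Vectorised form of the scalar program: if x[i] > 0: table[idx[i]] *= 2."""
    lanes = len(idx)
    mask = [v > 0 for v in x]                       # the branch becomes an execution mask
    loaded = gather(table, idx, mask, [0] * lanes)  # scattered reads in one step
    doubled = masked_op(lambda p, q: p * q, loaded, [2] * lanes, mask, loaded)
    scatter(table, idx, doubled, mask)              # scattered writes, masked
    return table


table = [10, 20, 30, 40, 50, 60, 70, 80]
result = double_where_positive(table, idx=[7, 0, 3, 5], x=[1.0, -2.0, 3.0, 0.0])
print(result)
```

Each lane behaves like an independent scalar thread, and the mask stands in for the branch. Without masks and gather/scatter, the same loop on SSE-style SIMD would need the indexed loads, the branch, and the stores handled element by element in scalar code.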
This is an interesting paper from a research group at Intel.
"Atomic Vector Operations on Chip Multiprocessors"
http://doi.acm.org/10.1145/1394608.1382154