22 nm Larrabee

8 texture sampling units took 10% of the die, and they would have wanted 16 to be competitive, i.e. roughly 20% of the die. I didn't realize modern texture sampling units are so large. I'm wondering whether the tiny 4 KB L1 texture caches are tied to the sampling units (they borrowed the sampling units from their GPU); IIRC those hold uncompressed texels ready for filtering. Intel GPUs also have L2 texture caches (16 KB / 24 KB), but a general-purpose cache would likely replace the bigger L2 texture cache.
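
To make that arithmetic explicit, here's a trivial back-of-the-envelope sketch (my own toy calculation, no Intel data beyond the 10% figure above): doubling the samplers is 20% of the die if the die size is held fixed, or about 18% if you let the die grow to fit them.

```c
/* Back-of-the-envelope sketch (my own assumptions, not Intel's numbers):
 * if 8 samplers take 10% of the current die, what fraction would 16 take,
 * assuming per-sampler area is constant? */
#include <stdio.h>

int main(void) {
    double die = 1.0;                  /* normalized current die area          */
    double samplers8 = 0.10 * die;     /* 8 units = 10% of die (from the post) */
    double per_unit = samplers8 / 8.0;

    double samplers16 = 16.0 * per_unit;
    double grown_die = (die - samplers8) + samplers16;  /* die grows to fit    */

    printf("16 samplers, same die budget: %.1f%%\n", 100.0 * samplers16 / die);
    printf("16 samplers, grown die:       %.1f%%\n", 100.0 * samplers16 / grown_die);
    return 0;
}
```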

The rough area for the filtering and memory blocks in RV770 was in that range, perhaps ~12%. Unlike Larrabee, there was no comparable space investment in a per-core L2/LS portion of the cores. I didn't find a clear description of the data flow for bringing filtered data from the shared texture blocks into the cores' memory subsystem.

AMD's Polaris has ~15% of its die area for the texture/LS portion.

I'm curious which architectures were the comparison targets for Larrabee, to get a picture of the competitive situation in terms of performance, power, and area.
 

That's an interesting retrospective, thanks for posting.

But I think he's kind of missing the point here. When people ask him why Larrabee failed they're not discrediting its utility in HPC; a lot of them know full well that it went on to become Xeon Phi. What they're asking is why it never materialized as a viable graphics part. As far as pretty much anyone is concerned that's what Larrabee was supposed to be, not whatever Intel's internal project guidelines or roadmaps dictated, and Intel made a lot of claims about why Larrabee was the right design for graphics. So why wasn't it?

This is a legitimate and, in my opinion, good question. He does address this somewhat, but I find it a little lacking. Suggesting that they were never really serious about graphics doesn't fit with the extensive software effort that had to have been underway, or with their public outreach, which consisted not just of empty marketing claims but of a lot of detailed academic description. The fact that they got by with little graphics-specific fixed-function hardware is moot if they were never competitive. So is the defense that the card ran DX11 titles all on its own, or that he played some particular game satisfactorily, or that it had some impressive compute or niche applications. If it wasn't within at least striking distance of nVidia and AMD in perf, perf/W, and perf/$ across a major selection of games, then none of this mattered. And I suspect that, after everything Intel invested in this, they wouldn't have pulled the plug if it had shown a clear path to success in this capacity.

The closest explanation we get is that maybe it could have used more texture samplers, or that it was held back by hardware bugs (for which Intel would have had a good grasp of the timeline for fixing), or that it could have seen huge advances with more software development (based on what, and with how many more years?). There doesn't seem to be much consideration that it's even possible the whole concept was unsuited for competitive consumer graphics. It may well be that a sea of x86 cores with rather conventional 512-bit SIMD and texture samplers bolted on would have been enough to deliver great gaming performance, but I'm not yet convinced. Since Larrabee, some pure-compute GPU renderers have been developed, but they've shown some pretty big gaps in performance in areas that aren't attributable to texturing. And that's without being saddled with an architecture that's more CPU- than graphics-friendly.
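
To be concrete about what that 512-bit SIMD means for graphics: a 512-bit vector holds 16 floats, so a software renderer on such a chip naturally shades a 4x4 block of pixels per vector instruction, with a lane mask for partial triangle coverage. Here's a deliberately simplified toy sketch of that idea (my own plain-C illustration, not Larrabee's actual renderer or LRBni code):

```c
/* Illustrative sketch only: a 512-bit SIMD unit holds 16 floats, so a
 * software rasterizer naturally shades a 4x4 pixel block at a time.
 * Written as a plain loop; on real hardware each 16-iteration loop body
 * would be a single masked vector operation. */
#include <stdint.h>

#define LANES 16   /* 512 bits / 32-bit float */

typedef struct { float r[LANES], g[LANES], b[LANES]; } block16;

/* Shade one 4x4 block with a trivial "texture * vertex color" shader.
 * 'mask' has one bit per pixel that the triangle actually covers. */
void shade_block(block16 *out, const block16 *tex, const block16 *vtx,
                 uint16_t mask)
{
    for (int lane = 0; lane < LANES; ++lane) {   /* one vector mul in HW     */
        if (!((mask >> lane) & 1)) continue;     /* predication, not a branch */
        out->r[lane] = tex->r[lane] * vtx->r[lane];
        out->g[lane] = tex->g[lane] * vtx->g[lane];
        out->b[lane] = tex->b[lane] * vtx->b[lane];
    }
}
```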

Oh, and the argument that they must not have cared that much about graphics because they could have done a big discrete Gen-based GPU seems out of touch. Everyone else knows that Intel's GPU designs were garbage in 2005, and even with a scaled-up area/power budget they would have been terrible, or at the very least far from great. I'm not even totally sure they could pull that off now. But we're talking about some huge changes that came with a lot of time and investment (and, I'm sure, many new hires). Ironically, some of their big improvements came from moving to more fixed-function hardware.
 
It would be great if you could leave all that as a comment on his blog post (perhaps rephrased in a less confrontational manner here and there). Even if he ignores some points, it would still raise some more interesting discussion.
 

Does his blog take comments? I didn't see a comments section.
 
If we'd had more time to tune the software, it would have got a lot closer. And the next rev of the chip would have closed the gap further. It would have been a very strong chip in the high-end visualization world, where tiny triangles, super-short lines and massive data sets are the main workloads - all things Larrabee was great at. But we never got the time or the political will to get there, and so the graphics side was very publicly cancelled.
Never thought I would see a literal manifestation....
 
I did not know where to post this news, but I thought this topic would be appropriate:

Benchmarks are an important tool for measuring performance, but in a rapidly evolving field it can be difficult to keep up with the state of the art. Recently Intel published some incorrect “facts” about their long promised Xeon Phi processors.
[Image: DL-comparison-chart-1.png]
https://blogs.nvidia.com/blog/2016/08/16/correcting-some-mistakes/


The war is on. Fight!
 
If the Xeon Phi is so great at deep learning, why did Intel pay $350M+ for Nervana? ;)
Could be for the framework/software integration and the algorithms, where Intel is behind Nvidia, plus the future potential of the hardware as a move toward the exascale concept. *shrug*
Cheers
 

They are likely going to kill this custom Nervana ASIC hardware in favor of their own, similar to what Nvidia did with the PhysX PPU. One less competitor, which is also good for Nvidia, but not so good for customer choice.
 
Yeah, I agree; the primary purpose was the frameworks, algorithms, software, and expertise. Although if the hardware does show promise, they may keep it going, in a similar way to Nvidia's Echelon concept, as a way to crack the exascale problem or to improve deep/machine learning with the technology folded into their own Phi hardware. Maybe that would be in response to the move to 8-bit integer, first seen from Google and then Nvidia, and maybe also a way to improve the fine-grained scatter/gather that Nvidia currently lacks in this context.
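
To make the 8-bit integer point concrete, here's a rough toy sketch (my own illustration, not any vendor's actual code) of the basic primitive involved: four 8-bit products accumulated into a 32-bit integer, roughly what NVIDIA's Pascal dp4a instruction and Google's 8-bit inference path do, which is how int8 gets ~4x the multiply throughput of fp32 for the same datapath width.

```c
/* Illustrative only: four int8 x int8 products accumulated into an int32,
 * the building block of 8-bit integer deep-learning inference. */
#include <stdint.h>
#include <stdio.h>

int32_t dot4_i8(const int8_t a[4], const int8_t b[4], int32_t acc)
{
    for (int i = 0; i < 4; ++i)
        acc += (int32_t)a[i] * (int32_t)b[i];   /* widen before accumulating */
    return acc;
}

int main(void)
{
    const int8_t a[4] = { 127, -128, 3, 4 };
    const int8_t b[4] = { 2, 2, 2, 2 };
    printf("%d\n", dot4_i8(a, b, 0));   /* 254 - 256 + 6 + 8 = 12 */
    return 0;
}
```
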
Cheers
 
They are likely going to kill this custom Nervana ASIC hardware in favor of their own, similar to what Nvidia did with the PhysX PPU. One less competitor, which is also good for Nvidia, but not so good for customer choice.
Their software is available on GitHub under the Apache license. (Interestingly, their API guru of Maxas fame left Nervana last month.)

$350M+ is a lot of money for a free API. But it's possible...
 
They are likely going to kill this custom Nervana ASIC hardware in favor of their own, similar to what Nvidia did with the PhysX PPU. One less competitor, which is also good for Nvidia, but not so good for customer choice.
Nvidia had to buy Ageia because Intel bought Havok and killed Havok FX. Had Havok FX launched, it would have been the death knell for Ageia anyway, without Nvidia/ATI having to spend a dime killing it.
 