Larrabee: Samples in Late 08, Products in 2H09/1H10

Unlikely. From what I understand, Intel engineers are *forbidden* from even looking at patents. Why? If you're knowingly infringing, then it is 3x damages (part of the patent law statutes). To avoid these 3x damages, they just disallow any of their engineers from looking at any patents. Clever, eh?
I'm aware of the treble damage penalty and its effect on due diligence.
And the solution is not clever, it's a sign of an insipid patent system where the choice is between a $100 million settlement or a $300 million fine, instead of $0 for avoiding the problem, or a licensing fee if the company were allowed to show it made a good-faith effort to find a filed patent in the sea of ridiculously opaque and poorly administered patents.

I've never heard Intel make a technical design decision based around a patent issue.
I'm not sure those who would make the deliberations on this would be obligated to tell you.

Plus, Intel has such a large patent portfolio, I don't think NVIDIA would want to get in a pissing match with Intel on patents. The counter-suit would likely drag both of them down, and neither of them would be clear winners.
Leaving Larrabee to freely infringe on Nvidia's patents leads to Nvidia losing anyway.
Why not sue instead for damages, a way to force a licensing fee, or a way to slip past the aegis of the x86 patent portfolio that Intel is likely counting on for part of its advantage over Nvidia?

It seems Intel is deliberately pursuing a design philosophy that stays within its own patent spread and avoids the areas of specialized hardware that it knows are rife with competitors' patents.
It's a nice side effect, if nothing else.

Most of the time, hardware patents are used either (1) by big companies to squash little companies or (2) by little companies without products (only patents) suing a bigger company. A variant of #2 is when a university is suing a big company. You just don't see much hardware patent litigation between big players (few IBM vs Intel vs AMD size battles). In such cases, only the lawyers win (well, and the academics that serve as expert witnesses, but I digress).
Where do Transmeta and Intergraph fit in that scheme?
 
The interplay between cost, price, and profit margin make this sort of thing hard to analyze (as you're well aware). Yet, from what you've said, a mid-range GPU chip is cheaper than a mid-range CPU.

So, I wonder what accounts for the difference?

The GPU is a 90nm part, whereas the CPU was a 65nm part. Generally, 65nm parts are more expensive. The quad-core CPUs I quoted were actually two chips glued together in the same package. One would assume that you could still get a dual-core version of the same chip for cheaper than $280 (which was the price of the quad-core). Perhaps customers are just used to paying more for CPUs than "add-on" GPUs?

A few years ago (before the whole GPGPU thing), Intel saw GPUs as a non-threat and a lower-profit-margin business. Their opinion was that burning high-end fab capacity on GPUs would make them less money than building more CPUs. With the advent of Larrabee, that thinking is clearly changing inside of Intel...

Of course GPU board prices represent an entire subsystem: the component costs and the trip through manufacturing. But on a chip-to-chip comparison the math is out there. NVIDIA ASPs are in the low $30s.

So how does NVIDIA perform this economic magic? For starters, out of necessity. In the beginning nobody wanted to pay much for an accelerator chip, especially factoring in the board costs. Luckily for them, TSMC is an extremely efficient company with many customers that will keep using older processes, so TSMC's depreciation costs are probably relatively low. Plus, Taiwan is cheap. Intel may have foreign operations, but a lot of its corporate expenses are incurred in the USA. So TSMC might not have high-performance logic leadership over Intel, but it bests Intel in economics (probably by a wide margin) and in versatility.

Intel, meanwhile, had the luxury of essentially dictating prices to the market, and built a massive organization that reflected this. So one might argue that even though they have trimmed the payrolls, NVIDIA looks remarkably lean with about 20x fewer employees. It's worth noting that NVIDIA probably sells somewhere between 1/2 and 2/3 fewer chips (just a rough guess, including core logic), but still over 100 million chips to the same customers.

According to Paul Otellini, Intel is trying to figure out how to be profitable at $25 chip prices. NVIDIA was there years ago. And now that NVIDIA has a motherboard GPU that shares memory, the board-cost penalty is removed. For now that's a non-event, it's just integrated graphics. But the future is not now.
 
Where do Transmeta and Intergraph fit in that scheme?

These are both examples of case #2 above.

Intergraph vs Intel is the case of a small company without much in the way of products suing a bigger company. The patents in question were from the "Clipper" chip, the last of which was released in the early 1990s.

As for Transmeta, that would also be case #2. From Wikipedia:

"As of January 2005 [Transmeta] announced a strategic restructuring away from being a chip product company to an intellectual property company...
On October 11, 2006, Transmeta announced that it had filed a lawsuit against Intel Corporation for the infringement on ten of Transmeta's US patents."

Transmeta became a company without any products, but with a patent portfolio. That is exactly the case #2 I outlined above.
 
Nope, that is NOT correct in the US. A quick Google gives me this document with this quote:
The rest of the document gives some ways to try to minimize the impact of that, but AFAIK the kind of R&D Aaron spoke of is 100% classified as 'Operating Expenses' at both NVIDIA and Intel.


The same site provides a means to amortize R&D, including amortization schedules. It also shows different ways to get around the rule. In the end, there are many ways to expense and capitalize R&D that circumvent the rule, many of which are employed regularly.
 
I'm not sure I follow. Can you say more?
I was merely implying that the price-per-mm2 on the 90nm CPUs was roughly comparable to the current one for 65nm CPUs, afaict, so that this wasn't a very significant factor.
ArchitectureProfessor said:
This is what I love about this thread. Fundamentally we're debating the role of special-purpose vs general-purpose hardware and which makes sense where. Of course, that is a moving target, but it is really fun to debate.
I've thought a lot about this, and my current personal conclusion is that going programmable is a perfectly viable proposition in *any* business if, and only if, the programmable core's ALUs are similar to what you'd need in your fixed-function unit. This is especially attractive when you can have the advantage of custom or semi-custom logic in the programmable case but not the fixed-function case.

Example: Triangle setup can be done efficiently in the shader core's floating point unit and the control logic is simple to non-existent. As such, it makes a lot of sense to do that in the shader core. On the other hand, INT8 filtering and blending are obviously wasteful on >=fp32 units.
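
To make that concrete, here is a purely illustrative C++-style sketch of edge-equation triangle setup, under the assumption of screen-space vertices and a counter-clockwise winding convention (names and structure are mine for the example, not any vendor's implementation). The point is that it is nothing but FP32 multiplies and adds, which is why it maps so naturally onto the shader core's existing ALUs:

```cpp
// Purely illustrative sketch: triangle setup written as straight FP32
// arithmetic, the kind of work that fits a shader core's existing float ALUs
// with almost no extra control logic.
struct EdgeEq { float a, b, c; };   // edge function E(x, y) = a*x + b*y + c

struct TriangleSetup {
    EdgeEq e[3];    // one edge equation per triangle edge
    float  area2;   // twice the signed area; <= 0 means back-facing or degenerate
};

TriangleSetup setupTriangle(float x0, float y0,
                            float x1, float y1,
                            float x2, float y2)
{
    TriangleSetup s;
    s.e[0] = { y0 - y1, x1 - x0, x0 * y1 - y0 * x1 };   // edge 0 -> 1
    s.e[1] = { y1 - y2, x2 - x1, x1 * y2 - y1 * x2 };   // edge 1 -> 2
    s.e[2] = { y2 - y0, x0 - x2, x2 * y0 - y2 * x0 };   // edge 2 -> 0
    // Twice the signed area is edge 0's function evaluated at vertex 2;
    // it doubles as the back-face/degenerate cull test.
    s.area2 = s.e[0].a * x2 + s.e[0].b * y2 + s.e[0].c;
    return s;
}
```

Note that the area term falls out of the same math for free, which is why culling can ride along with setup at essentially no extra cost.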

Now, there are other points to look at, including how fixed-function units can bottleneck the pipeline and how there are engineering advantages to doing some cheap things in software rather than in hardware; consider the following:
- Doing it in hardware may be slightly more expensive in terms of R&D, especially on cutting-edge nodes.
- In addition to sometimes being a bottleneck, it might also still take power even when it is bottlenecked by other elements (which is likely much of the time).
- The cost per mm2 for certain fixed-function units is higher because redundancy mechanisms are more limited: either you have two of them or you just hope it doesn't break. In a many-core architecture, you can obviously just disable one core.

Overall, my expectation is that DX11 NV/AMD GPUs (or even earlier) will likely get rid of the following stages:
- Input Assembly
- Triangle Setup (+Culling?)
- Texture filtering for fp32+ textures.
- Blending [not for performance/simplicity but because devs would like it programmable]
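
On the blending point, here is a purely illustrative sketch (the RGBA type and the source-over operator are assumptions for the example, not any shipping hardware or API): once the shader core can read the destination pixel itself, a blend mode is just ordinary float math, and a different operator is just a different function body.

```cpp
// Purely illustrative: "programmable blending" as plain shader-style math,
// assuming the shader can fetch the destination pixel directly.
struct RGBA { float r, g, b, a; };

// Classic (non-premultiplied) source-over alpha blend. Any other operator
// (min/max, custom tone mapping, order-independent tricks) would simply be a
// different function body, which is the whole appeal for developers.
RGBA blendSourceOver(RGBA src, RGBA dst)
{
    float inv = 1.0f - src.a;
    return { src.r * src.a + dst.r * inv,
             src.g * src.a + dst.g * inv,
             src.b * src.a + dst.b * inv,
             src.a          + dst.a * inv };
}
```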

What does this leave us with?
- Hierarchical Z
- Rasterization
- Depth/Stencil Testing
- Texture Sampling
- Texture filtering for <fp32 textures
- Compression algorithms (textures & framebuffer)

When you think about it, that's really not much, and interestingly none of those are likely bottlenecks because they are fundamentally limited by bandwidth (texturing for compressed textures is the least limited in there). The only advantage Larrabee might have then is not avoiding bottlenecks, but rather higher programmability, allowing algorithms like logarithmic shadow maps.

However, if texture filtering is already done in the shader core for fp32, then that stage could be programmable at least (just slower for lower bit-depth formats due to datapath limitations, maybe?). Getting depth/stencil/rasterization to *also* be doable in the shader core efficiently might be much harder though, unless you just bypass the entire graphics pipeline and go through CUDA or something (and then you're just Larrabee with extra overhead units you won't use, so it'd only make sense for a small part of the scene!)
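
For what it's worth, here is a rough sketch of the per-pixel inner loop that "rasterization in the shader core" implies, reusing the illustrative EdgeEq/TriangleSetup types from the setup sketch above (again, purely an example, not a real implementation). A fixed-function rasterizer hides exactly this kind of per-pixel overhead:

```cpp
// Purely illustrative: software rasterization of a small screen tile using the
// edge equations from the setup sketch. Assumes counter-clockwise winding so
// "inside" means all three edge functions are non-negative.
inline float edgeAt(const EdgeEq& e, float x, float y)
{
    return e.a * x + e.b * y + e.c;
}

// Tests every pixel centre in a tileW x tileH tile against the three edges.
// 'coverage' receives one flag per pixel (row-major, tileW * tileH entries).
void rasterizeTile(const TriangleSetup& t,
                   int tileX, int tileY, int tileW, int tileH,
                   bool* coverage)
{
    for (int y = 0; y < tileH; ++y) {
        for (int x = 0; x < tileW; ++x) {
            float px = tileX + x + 0.5f;   // sample at the pixel centre
            float py = tileY + y + 0.5f;
            bool inside = edgeAt(t.e[0], px, py) >= 0.0f &&
                          edgeAt(t.e[1], px, py) >= 0.0f &&
                          edgeAt(t.e[2], px, py) >= 0.0f;
            coverage[y * tileW + x] = inside;
        }
    }
}
```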
 
And the solution is not clever, it's a sign of an insipid patent system...

I 100% agree. I think this is really broken. The whole idea of a patent when first created was to give limited protection in exchange for publicly describing the invention. By requiring public disclosure, it ensured the body of knowledge increased. Yet, now we're in a situation in which patents are basically unreadable and nobody can look at them anyway because of treble damages. Pretty broken.

I'm not sure those who would make the deliberations on this would be obligated to tell you.

I've both worked in industry (briefly) and talked with lots of technical leads (chief architects and such) about CPU chip design. Certainly some chips could have been designed to get around patents and such, but from the technical leads I've talked with, they really don't consider patents. They can't even really be aware of what patents are out there because of the 3x damages issue.

Leaving Larrabee to freely infringe on Nvidia's patents leads to Nvidia losing anyway.

Or they could just compete with better engineering... novel, I know. Seems to have worked pretty well for them thus far. ;)

Why not sue instead for damages, a way to force a licensing fee, or a way to slip past the aegis of the x86 patent portfolio that Intel is likely counting on for part of its advantage over Nvidia?

Because the counter-claim could damage Nvidia more than it helps them. This is why failing companies usually turn to patent litigation only as a last resort (and not usually before).

It seems Intel is deliberately pursuing a design philosophy that stays within its own patent spread and avoids the areas of specialized hardware that it knows are rife with competitors' patents.
It's a nice side effect, if nothing else.

True.
 
I've both worked in industry (briefly) and talked with lots of technical leads (chief architects and such) about CPU chip design. Certainly some chips could have been designed to get around patents and such, but from the technical leads I've talked with, they really don't consider patents. They can't even really be aware of what patents are out there because of the 3x damages issue.
Okay then, I guess that makes sense.

Because the counter-claim could damage Nvidia more than it helps them. This is why failing companies usually turn to patent litigation only as a last resort (and not usually before).

Perhaps the threat would be enough to get more leverage.
Patent litigation seems pretty hard-core these days. In other fields, more equal competitors have duked it out over patents, and the use of injunctions is more common.
If we rule out Nvidia, there are a number of minor GPU players besides AMD and Nvidia that could possibly try something. Given their increasingly marginal roles, they may have less to lose.

Perhaps AMD or Nvidia can invest in them as they launch their patent cases, as AMD did with Transmeta.
A settlement on the order of the Intergraph or Transmeta settlements would mean Larrabee would be that much farther from break-even. If they can somehow manage an injunction, they could retard Larrabee's uptake and buy time for the GPUs to narrow the process gap, perhaps by a transition to the 40nm half-node or even an early trickle of 32nm product if the impasse lasts.
 
On the other hand, INT8 filtering and blending are obviously wasteful on >=fp32 units.

Overall, my expectation is that DX11 NV/AMD GPUs (or even earlier) will likely get rid of the following stages:
...
- Texture filtering for fp32+ textures.

Arun,

Wouldn't it be more likely that vendors simply keep texture filtering for FP32+ textures as is (lower precision on the blend weight computations)? For example, NVidia, according to the CUDA docs, seems to use just 9-bit fixed point with 8 bits of fractional value when computing bilinear filtering weights, regardless of texture type. So if you want accurate FP32 bilinear filtering (say, for GPGPU stuff) you already have to roll your own FP32 texture filtering in the shader.
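
To illustrate what "roll your own" amounts to, here is a minimal sketch in plain C++-style code, assuming a single-channel FP32 texture and clamp-to-edge addressing (both are assumptions for the example, not NVidia's actual implementation). The point is simply that the weights stay full precision instead of ~9-bit fixed point:

```cpp
// Sketch only: manual FP32 bilinear filtering, the kind of thing you'd write
// in the shader today if the hardware filter's low-precision weights aren't
// accurate enough for your GPGPU workload.
#include <algorithm>
#include <cmath>

float fetchTexel(const float* tex, int w, int h, int x, int y)
{
    x = std::min(std::max(x, 0), w - 1);   // clamp-to-edge addressing (assumed)
    y = std::min(std::max(y, 0), h - 1);
    return tex[y * w + x];
}

// u, v are normalized coordinates in [0, 1]; tex is a single-channel FP32 image.
float bilinearFP32(const float* tex, int w, int h, float u, float v)
{
    // Move to texel space, offset by half a texel so samples sit between centres.
    float x = u * w - 0.5f;
    float y = v * h - 0.5f;
    int   x0 = (int)std::floor(x);
    int   y0 = (int)std::floor(y);
    float fx = x - x0;                     // full-precision FP32 weights
    float fy = y - y0;

    float t00 = fetchTexel(tex, w, h, x0,     y0);
    float t10 = fetchTexel(tex, w, h, x0 + 1, y0);
    float t01 = fetchTexel(tex, w, h, x0,     y0 + 1);
    float t11 = fetchTexel(tex, w, h, x0 + 1, y0 + 1);

    float top    = t00 + (t10 - t00) * fx;
    float bottom = t01 + (t11 - t01) * fx;
    return top + (bottom - top) * fy;
}
```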
 
- In addition to sometimes being a bottleneck, it might also still take power even when it is bottlenecked by other elements (which is likely much of the time).

The relative costs can be different, though.
Let's say AMD kept some kind of tessellation or geometry amplification unit in the future.

The current unit in R6xx can in select instances amplify geometry to the point that it is likely that the rest of the chip can't keep up.
(The unit or future tessellation hardware may never catch on, but just for argument's sake...)
On the other hand, is the unit really all that large?

If we instead force a chunk of the more generalized hardware to emulate this, there might not be a clear bottleneck as much as there is lower peak execution.

So what if one sliver of the GPU idles when it saturates the rest of the core? Isn't it better than the rest of the core spending dozens of cycles emulating it, instead of accomplishing other work?

There are ways to power down idle units, but there's no way to idle units that are spinning their wheels synthesizing similar functionality through multiple cycles.
 
If we rule out Nvidia, there are a number of minor GPU players besides AMD and Nvidia that could possibly try something. Given their increasingly marginal roles, they may have less to lose.

This is certainly a constant threat.

There is another thing about our current patent system that annoys me (as an academic, especially). Companies are discouraged from discussing the details of their products, because if the company says too much, that can be used against them in court if they are sued for patent infringement. I heard a story about a multithreaded chip that IBM designed in the mid-1990s. They received a single letter from some small firm that said something like: "you may or may not infringe on our patent". That was enough for IBM management to put a moratorium on any public disclosure of how part of the chip worked. This, of course, really annoyed the engineers, because they wanted to be able to talk about what they did. They were eventually able to talk more about it, but it delayed when they could disclose things by a year or two. Sort of sad, in my opinion.


If they can somehow manage an injunction, they could retard Larrabee's uptake and buy time for the GPUs to narrow the process gap...

Interestingly, as was pointed out earlier, the less fixed-function logic that Larrabee has, the less likely it is to infringe on NVIDIA patents. It would be hard for NVIDIA to claim that a many-core x86 chip with cache coherence somehow infringes on their patents. :) I guess some of the vector instructions might have arithmetic operations that could be patented, but those might be easier to work around.
 
The current unit in R6xx can in select instances amplify geometry to the point that it is likely that the rest of the chip can't keep up.
(The unit or future tessellation hardware may never catch on, but just for argument's sake...) On the other hand, is the unit really all that large? ... So what if one sliver of the GPU idles when it saturates the rest of the core?

This all depends on the relative gain (in terms of area or power efficiency) of brute-force custom fixed-function hardware vs a more flexible software implementation. If the software does the exact same calculation on a general CPU, I could easily see the gap being 10x or more in some cases. Once you add special instructions to the CPU, the gap should narrow. Once you tune the software algorithm to exploit the more irregular algorithms that more general hardware can support, the gap might disappear altogether.

Of course, this all depends on the specific function and such, but I think the trade-space is pretty complicated.

Man, it would have been really fun to work on designing Larrabee (especially if it is a technical success).
 
They received a single letter from some small firm that said something like: "you may or may not infringe on our patent". That was enough for IBM management to put a moratorium on any public disclosure of how part of the chip worked. This, of course, really annoyed the engineers, because they wanted to be able to talk about what they did. They were eventually able to talk more about it, but it delayed when they could disclose things by a year or two. Sort of sad, in my opinion.
Corporate legal fears certainly make being a spectator far more boring these days.
Look at how scant the POWER6 data was/is.

Interestingly, as was pointed out earlier, the less fixed-function logic that Larrabee has, the less likely it is to infringe on NVIDIA patents. It would be hard for NVIDIA to claim that a many-core x86 chip with cache coherence somehow infringes on their patents. :) I guess some of the vector instructions might have arithmetic operations that could be patented, but those might be easier to work around.

Sure, that was my contention. If you can reasonably expect there is a minefield in a certain direction, it may be prudent to go the long way around.

Whether that is the optimal path from an engineering perspective is something separate.
 
Interestingly, as was pointed out earlier, the less fixed-function logic that Larrabee has, the less likely it is to infringe on NVIDIA patents. It would be hard for NVIDIA to claim that a many-core x86 chip with cache coherence somehow infringes on their patents.
Depends on whether a hardware patent is also interpreted as a software patent, doesn't it?

Jawed
 
Depends on whether a hardware patent is also interpreted as a software patent, doesn't it?

A good point. I wonder how much software patents impact hardware patents. I suspect that something like the patent on the RSA cryptographic algorithm (now expired) would affect either hardware or software. I really don't know how such a thing would shake out.
 
They can't afford to have Larrabee 32nm going online much after the 32nm shrink of Nehalem though, because my current expectation is we'll see 32nm GPUs in 4Q10 or 1Q11. And TSMC's 45nm process has an obvious density advantage over Intel's (which probably doesn't do more than compensate for Intel's speed advantage, though). Of course, that's all based on preliminary TSMC roadmaps and things could change.

Past experience with TSMC's quoted performance vs delivered performance isn't good. I haven't seen anything that would give TSMC a density advantage except for some random numbers that don't have anything to do with density in real life. And I don't ever believe TSMC's delivery dates for anything but FPGAs.

The "things could change" line for TSMC has historically been "things will change for the worst". They do a good job but there haven't been able to demonstrate that they can actually produce they're claims yet.

Aaron Spink
speaking for myself inc.
 
I have no idea what you're talking about. Are you implying the following chips are all-new architectures and have major differences in their RTL: NV15, G71, G92, etc.? RV370, RV610, etc.?
Yes.
Larrabee's advantage here is exactly zero.
I beg to differ. According to rumors, Larrabee's in-order cores are based on the age-old P5 architecture. Of course it takes some effort to extend it with x86-64, the SIMD units and the four-way SMT, but once that's done (and it has been done before) it's mainly a matter of scaling the number of cores and tweaking some parameters. The really big differences will be in the software, as Larrabee will be capable of doing rasterization, raytracing, physics, etc. all with relatively high efficiency.

If you were to argue that NetBurst is a big departure from more 'conventional' x86 architectures then I would have to agree. But Pentium M, Core and Core 2 all build on the P6 architecture and just 'tweaked' some parameters between these generations (the main limitation being cost).

So I believe that future Larrabee versions will be mainly 'bigger and better', but not total redesigns. Relative to that, GPUs have undergone major changes in architecture over the years. Taking the next leap in capabilities and performance takes years because the whole architecture changes. With CPUs, things seem a lot more incremental and it's easier to transition to smaller process nodes.

But feel free to disagree. I'm just exploring another reason why Larrabee (II) might survive against G100/R700...
 
The really big differences will be in the software, as Larrabee will be capable of doing rasterization, raytracing, physics, etc. all with relatively high efficiency.
What makes a P5 core + SIMD efficient at rasterization?
But feel free to disagree. I'm just exploring another reason why Larrabee (II) might survive against G100/R700...
Larrabee II????
 
The work of implementing a design has increased to the point that Intel's CPUs are going through major architectural changes every 2 years, which doesn't leave much room for GPUs to seem all that excessive.
Going from Pentium M to Core 2, the only 'major' changes are doubling the L1 cache bus width to 128 bit, doubling the width of the SSE execution units, and issuing four operations per clock. From a high-level point of view, doubling the number of cores when the transistor budget doubles isn't exactly revolutionary. It still has caches, decoders, register renaming, reorder buffers, retirement buffers, TLBs, branch predictors, you name it. But if we look at G70 versus G80 we hardly find the same building blocks. Texture samplers are separate from shader pipelines, vertex and pixel shader units are unified, SIMD units are scalar, interpolators and transcendental function evaluation share the same logic, granularity is way lower, etc.

Likewise, Nehalem will hold few surprises (re-introducing Hyper-Threading and integrating a memory controller like AMD), while I expect G100 to be revolutionary instead of evolutionary.
Not keeping the same ISA hasn't stopped GPUs from rapidly evolving.
True, but my point is that Larrabee might evolve even more rapidly.
 