Larrabee: Samples in Late 08, Products in 2H09/1H10

Discussion in 'Rendering Technology and APIs' started by B3D News, Jan 16, 2008.

  1. ArchitectureProfessor

    Newcomer

    Joined:
    Jan 17, 2008
    Messages:
    211
    Likes Received:
    0
    Yea, sorry. I didn't mean to imply that only Intel does this. I totally agree that it is standard industry practice.
     
  2. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    I'm aware of the treble damage penalty and its effect on due diligence.
    And the solution is not clever; it's a sign of an insipid patent system where the choice is between a $100 million settlement or a $300 million fine, instead of $0 for avoiding the problem, or a licensing fee if the company were allowed to show it made a good-faith effort to find a filed patent in the sea of ridiculously opaque and poorly administered patents.

    I'm not sure those who would make the deliberations on this would be obligated to tell you.

    Leaving Larrabee to freely infringe on Nvidia's patents leads to Nvidia losing anyway.
    Why not sue instead for damages, a way to force a licensing fee, or a way to slip past the aegis of the x86 patent portfolio that Intel is likely counting on for part of its advantage over Nvidia?

    It seems Intel is deliberately pursuing a design philosophy that stays within its own patent spread and avoids the areas of specialized hardware that it knows are rife with competitors' patents.
    It's a nice side effect, if nothing else.

    Where do Transmeta and Intergraph fit in that scheme?
     
  3. Voltron

    Newcomer

    Joined:
    May 25, 2004
    Messages:
    192
    Likes Received:
    3
    Of course GPU board prices represent an entire subsystem: the component costs and the trip through manufacturing. But on a chip-to-chip comparison the math is out there. NVIDIA ASPs are in the low $30s.

    So how does NVIDIA perform this economic magic? For starters, out of necessity. In the beginning nobody wanted to pay much for an accelerator chip, especially factoring in the board costs. Luckily for them, TSMC is an extremely efficient company with many customers that will keep using older processes, so TSMC's depreciation costs are probably relatively low. Plus Taiwan is cheap. Intel may have foreign operations, but a lot of its corporate expenses are incurred in the USA. So TSMC might not have high-performance logic leadership over Intel, but it beats Intel on economics (probably by a wide margin) and versatility.

    Intel, meanwhile, had the luxury of essentially dictating prices to the market, and built a massive organization that reflected this. So one might argue that even though Intel has trimmed its payroll, NVIDIA looks remarkably lean with about 20x fewer employees. It's worth noting that NVIDIA probably sells somewhere between 1/2 and 2/3 fewer chips (just a rough guess, including core logic), but still over 100 million chips to the same customers.

    According to Paul Otellini, Intel is trying to figure out how to be profitable at $25 chip prices. NVIDIA was there years ago. And now that NVIDIA has a motherboard GPU that shares memory, the board cost penalty is removed. For now that's a non-event; it's just integrated graphics. But the future is not now.
     
  4. ArchitectureProfessor

    Newcomer

    Joined:
    Jan 17, 2008
    Messages:
    211
    Likes Received:
    0
    These are both examples of case #2 above.

    Intergraph vs Intel is the case of a small company without much in the way of products suing a bigger company. The patents in question were from the "Clipper" chip, the last of which was released in the early 1990s.

    As for Transmeta, that would also be case #2. From Wikipedia:

    "As of January 2005 [Transmeta] announced a strategic restructuring away from being a chip product company to an intellectual property company...
    On October 11, 2006, Transmeta announced that it had filed a lawsuit against Intel Corporation for the infringement on ten of Transmeta's US patents."


    Transmeta became a company without any products, but with a patent portfolio. That is exactly the case #2 I outlined above.
     
  5. nelg

    Veteran

    Joined:
    Jan 26, 2003
    Messages:
    1,557
    Likes Received:
    42
    Location:
    Toronto

    The same site also provides a means by which to amortize R&D, including amortization schedules. It also shows different ways to get around the rule. In the end, there are many ways to expense and capitalize R&D that circumvent the rule, many of which are employed regularly.
     
  6. Arun

    Arun Unknown.
    Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    I was merely implying that the price per mm2 for 90nm CPUs was roughly comparable to the current one for 65nm CPUs, AFAICT, so this wasn't a very significant factor.
    I've thought a lot about this, and my current personal conclusion is that going programmable is a perfectly viable proposition in *any* business if, and only if, the programmable core's ALUs are similar to what you'd need in your fixed-function unit. This is especially attractive when you can have the advantage of custom or semi-custom logic in the programmable case but not the fixed-function case.

    Example: Triangle setup can be done efficiently in the shader core's floating point unit and the control logic is simple to non-existent. As such, it makes a lot of sense to do that in the shader core. On the other hand, INT8 filtering and blending are obviously wasteful on >=fp32 units.
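
    To make the point concrete, here is a minimal sketch (hypothetical C++ with made-up names, not anything from an actual driver or shader compiler) of triangle setup as plain FP32 edge-equation math; it is nothing but a handful of multiplies and subtracts, which is exactly the kind of work a shader core's FP ALUs already do, with almost no control logic.

    Code:
    // Edge function E(x, y) = A*x + B*y + C; E >= 0 on the interior side.
    struct Edge { float a, b, c; };

    struct TriangleSetup { Edge e[3]; };

    static Edge makeEdge(float x0, float y0, float x1, float y1) {
        Edge e;
        e.a = y0 - y1;
        e.b = x1 - x0;
        e.c = x0 * y1 - x1 * y0;
        return e;
    }

    // Per-triangle setup: three edges, nothing but FP multiplies and subtracts.
    static TriangleSetup setupTriangle(const float x[3], const float y[3]) {
        TriangleSetup t;
        t.e[0] = makeEdge(x[0], y[0], x[1], y[1]);
        t.e[1] = makeEdge(x[1], y[1], x[2], y[2]);
        t.e[2] = makeEdge(x[2], y[2], x[0], y[0]);
        return t;
    }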

    Now, there are other points to look at, including how fixed-function units can bottleneck the pipeline and how there are engineering advantages to doing some cheap things in software rather than in hardware; consider the following:
    - Doing it in hardware may be slightly more expensive in terms of R&D, especially on cutting-edge nodes.
    - In addition to sometimes being a bottleneck, a fixed-function unit might also still draw power even when it is bottlenecked by other parts of the chip (which is likely much of the time).
    - The cost per mm2 for certain fixed-function units is higher because redundancy mechanisms are more limited: either you have two of them or you just hope it doesn't break. In a many-core architecture, you can obviously just disable one core.

    Overall, my expectation is that DX11 NV/AMD GPUs (or even earlier) will likely get rid of the following stages:
    - Input Assembly
    - Triangle Setup (+Culling?)
    - Texture filtering for fp32+ textures.
    - Blending [not for performance/simplicity but because devs would like it programmable]

    What does this leave us with?
    - Hierarchical Z
    - Rasterization
    - Depth/Stencil Testing
    - Texture Sampling
    - Texture filtering for <fp32 textures
    - Compression algorithms (textures & framebuffer)

    When you think about it, that's really not much, and interestingly none of those are likely bottlenecks because they are fundamentally limited by bandwidth (texturing for compressed textures is the least limited in there). The only advantage Larrabee might have then is not avoiding bottlenecks, but rather higher programmability, allowing algorithms like logarithmic shadow maps.

    However, if texture filtering is already done in the shader core for fp32, then that stage could be programmable at least (just slower for lower bitrates due to datapath limitations maybe?). Getting depth/stencil/rasterization to *also* be doable in the shader core efficiently might be much harder though, unless you just bypass the entire graphics pipeline and go through CUDA or something (and then you're just Larrabee with extra overhead units you won't use, so it'd only make sense for a small part of the scene!)
     
  7. ArchitectureProfessor

    Newcomer

    Joined:
    Jan 17, 2008
    Messages:
    211
    Likes Received:
    0
    I 100% agree. I think this is really broken. The whole idea of a patent when first created was to give limited protection in exchange for publicly describing the invention. By requiring public disclosure, it ensured the body of knowledge increased. Yet, now we're in a situation in which patents are basically unreadable and nobody can look at them anyway because of treble damages. Pretty broken.

    I've both worked in industry (briefly) and talked with lots of technical leads (chief architects and such) about CPU chip design. Certainly some chips could have been designed to get around patents and such, but from the technical leads I've talked with, they really don't consider patents. They can't even really be aware of what patents are out there because of the 3x damages issue.

    Or they could just compete with better engineering... novel, I know. Seems to have worked pretty well for them thus far. ;)

    Because the counter-claim could damage Nvidia more than it helps them. This is why failing companies usually resort to patent litigation as a last resort (and not usually before).

    True.
     
  8. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Okay then, I guess that makes sense.

    Perhaps the threat would be enough to get more leverage.
    Patent litigation seems pretty hard-core these days. In other fields, more equal competitors have duked it out over patents, and the use of injunctions is more common.
    If we rule out Nvidia, there are a number of minor GPU players besides AMD and Nvidia that could possibly try something. Given their increasingly marginal roles, they may have less to lose.

    Perhaps AMD or Nvidia can invest in them as they launch their patent cases, as AMD did with Transmeta.
    A settlement on the order of the Intergraph or Transmeta settlements would mean Larrabee would be that much farther from break-even. If they can somehow manage an injunction, they could retard Larrabee's uptake and buy time for the GPUs to narrow the process gap, perhaps by a transition to the 40nm half-node or even an early trickle of 32nm product if the impasse lasts.
     
  9. TimothyFarrar

    Regular

    Joined:
    Nov 7, 2007
    Messages:
    427
    Likes Received:
    0
    Location:
    Santa Clara, CA
    Arun,

    Wouldn't it be more likely that vendors simply keep texture filtering for FP32+ textures as is (lower precision on the blend weight computations)? For example, NVidia, according to the CUDA docs, seems to use just 9-bit fixed point with 8 bits of fractional value when computing bilinear texture filtering weights, regardless of texture type. So if you want accurate FP32 bilinear filtering (say for GPGPU stuff) you already have to roll your own FP32 texture filtering in the shader.
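
    As a rough illustration of the precision difference being described, here is a hypothetical C++ sketch (made-up names, not actual CUDA or driver code): the "hardware-style" path quantizes the fractional weights to 8 fractional bits, while the roll-your-own path keeps them in full FP32.

    Code:
    #include <cmath>

    // Hardware-style weight: 1.8 fixed point (8 fractional bits), per the
    // CUDA documentation note referenced above.
    static float quantizeWeight(float w) {
        return std::floor(w * 256.0f + 0.5f) / 256.0f;
    }

    static float lerp1(float a, float b, float w) { return a + (b - a) * w; }

    // Bilinear filter of a 2x2 texel neighbourhood with fractional
    // coordinates (fx, fy); 'quantized' selects reduced-precision weights
    // (hardware-like) vs. full FP32 weights (manual shader filtering).
    static float bilinear(float t00, float t10, float t01, float t11,
                          float fx, float fy, bool quantized) {
        float wx = quantized ? quantizeWeight(fx) : fx;
        float wy = quantized ? quantizeWeight(fy) : fy;
        return lerp1(lerp1(t00, t10, wx), lerp1(t01, t11, wx), wy);
    }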
     
  10. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    The relative costs can be different, though.
    Let's say AMD kept some kind of tessellation or geometry amplification unit in the future.

    The current unit in R6xx can in select instances amplify geometry to the point that it is likely that the rest of the chip can't keep up.
    (The unit or future tessellation hardware may never catch on, but just for argument's sake...)
    On the other hand, is the unit really all that large?

    If we instead force a chunk of the more generalized hardware to emulate this, there might not be a clear bottleneck as much as there is lower peak execution.

    So what if one sliver of the GPU idles once it saturates the rest of the core? Isn't that better than the rest of the core spending dozens of cycles emulating it instead of accomplishing other work?

    There are ways to power down idle units, but there's no way to idle units that are spinning their wheels synthesizing similar functionality through multiple cycles.
     
  11. ArchitectureProfessor

    Newcomer

    Joined:
    Jan 17, 2008
    Messages:
    211
    Likes Received:
    0
    This is certainly a constant threat.

    There is another thing about our current patent system that annoys me (as an academic, especially). Companies are discouraged from discussing the details of their products, because if the company says too much, then that can be used against them in court if they are sued for patent infringement. I heard a story about a multithreaded chip that IBM designed in the mid-1990s. They received a single letter from some small firm that said something like: "you may or may not infringe on our patent". That was enough for IBM management to put a moratorium on any public disclosure of how that part of the chip worked. This, of course, really annoyed the engineers, because they wanted to be able to talk about what they did. They were eventually able to talk more about it, but it delayed when they could disclose things by a year or two. Sort of sad, in my opinion.


    Interestingly, as was pointed out earlier, the less fixed-function logic that Larrabee has, the less likely it is to infringe on NVIDIA patents. It would be hard for NVIDIA to claim that a many-core x86 chip with cache coherence somehow infringes on their patents. :) I guess some of the vector instructions might have arithmetic operations that could be patented, but those might be easier to work around.
     
  12. ArchitectureProfessor

    Newcomer

    Joined:
    Jan 17, 2008
    Messages:
    211
    Likes Received:
    0
    This all depends on the relative gain (in terms of area or power efficiency) of brute-force custom fixed-function hardware vs a more flexible software implementation. If the software does the exact same calculation on a general CPU, I could easily see it being 10x or more in some cases. Once you add special instructions to the CPU, the gap should narrow. Once you tune the software to exploit more irregular algorithms that only more general hardware can support, the gap might disappear altogether.
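
    As a rough, hypothetical illustration of how special instructions can narrow the gap (a C++/SSE2 sketch with made-up names, not anything disclosed about Larrabee): an INT8 blend done channel by channel in scalar code versus the same math on eight channels at a time with packed 16-bit multiplies.

    Code:
    #include <cstdint>
    #include <emmintrin.h>  // SSE2

    // Scalar reference: dst = (src*a + dst*(255-a)) / 255, one channel at a time.
    static void blendScalar(uint8_t* dst, const uint8_t* src, uint8_t a, int n) {
        for (int i = 0; i < n; ++i)
            dst[i] = (uint8_t)((src[i] * a + dst[i] * (255 - a) + 127) / 255);
    }

    // "Special instruction" version: eight channels per iteration with packed
    // 16-bit multiplies. Still more work per pixel than a dedicated blender,
    // but the gap narrows considerably.
    static void blendSSE2(uint8_t* dst, const uint8_t* src, uint8_t a, int n) {
        const __m128i va   = _mm_set1_epi16(a);
        const __m128i vna  = _mm_set1_epi16(255 - a);
        const __m128i zero = _mm_setzero_si128();
        int i = 0;
        for (; i + 8 <= n; i += 8) {
            __m128i s = _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i*)(src + i)), zero);
            __m128i d = _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i*)(dst + i)), zero);
            __m128i sum = _mm_add_epi16(_mm_mullo_epi16(s, va), _mm_mullo_epi16(d, vna));
            // Divide by 255 with rounding: ((x + 128) + ((x + 128) >> 8)) >> 8
            __m128i t = _mm_add_epi16(sum, _mm_set1_epi16(128));
            t = _mm_srli_epi16(_mm_add_epi16(t, _mm_srli_epi16(t, 8)), 8);
            _mm_storel_epi64((__m128i*)(dst + i), _mm_packus_epi16(t, t));
        }
        for (; i < n; ++i)  // scalar tail
            dst[i] = (uint8_t)((src[i] * a + dst[i] * (255 - a) + 127) / 255);
    }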

    Of course, this all depends on the specific function and such, but I think the trade-space is pretty complicated.

    Man, it would have been really fun to work on designing Larrabee (especially if it is a technical success).
     
  13. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Corporate legal fears certainly make being a spectator far more boring these days.
    Look at how scant the POWER6 data was/is.

    Sure, that was my contention. If you can reasonably expect there is a minefield in a certain direction, it may be prudent to go the long way around.

    Whether that is the optimal path from an engineering perspective is a separate question.
     
  14. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,538
    Likes Received:
    1,911
    Location:
    London
    Depends on whether a hardware patent is also interpreted as a software patent, doesn't it?

    Jawed
     
  15. ArchitectureProfessor

    Newcomer

    Joined:
    Jan 17, 2008
    Messages:
    211
    Likes Received:
    0
    Yep.

    FYI, I recently saw an article on Power6 that said quite a bit, but Power6 had been shipping for quite a while before that came out. Interestingly, Power6 is in-order, whereas Power4 and Power5 were both out-of-order.
     
  16. ArchitectureProfessor

    Newcomer

    Joined:
    Jan 17, 2008
    Messages:
    211
    Likes Received:
    0
    A good point. I wonder how much software patents impact hardware patents. I suspect that something like the patent on the RSA cryptographic algorithm (now expired) would affect either hardware or software. I really don't know how such a thing would shake out.
     
  17. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    Past experience with TSMC's quoted performance vs. delivered performance isn't good. I haven't seen anything that would give TSMC a density advantage except for some random numbers that don't have anything to do with density in real life. And I never believe TSMC's delivery dates for anything but FPGAs.

    The "things could change" line for TSMC has historically been "things will change for the worst". They do a good job but there haven't been able to demonstrate that they can actually produce they're claims yet.

    Aaron Spink
    speaking for myself inc.
     
  18. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    Yes.
    I beg to differ. According to rumors, Larrabee's in-order cores are based on the age-old P5 architecture. Of course it takes some effort to extend it with x86-64, the SIMD units and the four-way SMT, but once that's done (and it has been done before) it's just mainly a matter of scaling the number of cores and tweaking some parameters. The really big differences will be in the software, as Larrabee will be capable of doing rasterization, raytracing, physics, etc. all with relatively high efficiency.

    If you were to argue that NetBurst is a big departure from more 'conventional' x86 architectures then I would have to agree. But Pentium M, Core and Core 2 all built on the P6 architecture and just 'tweaked' some parameters between generations (the main limitation being cost).

    So I believe that future Larrabee versions will be mainly 'bigger and better', but not total redesigns. Relative to that, GPUs have undergone major changes in architecture over the years. Taking the next leap in capabilities and performance takes years because the whole architecture changes. With CPUs things seem a lot more incremental, and it's easier to transition to smaller process nodes.

    But feel free to disagree. I'm just exploring another reason why Larrabee (II) might survive against G100/R700...
     
  19. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,397
    Likes Received:
    430
    Location:
    San Francisco
    What does make a P5 core + SIMD efficient at rasterization?
    Larrabee II????
     
  20. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    Going from Pentium M to Core 2, the only 'major' changes are doubling the L1 cache bus width to 128 bits, doubling the width of the SSE execution units, and issuing four operations per clock. From a high-level point of view, doubling the number of cores when the transistor budget doubles isn't exactly revolutionary. It still has caches, decoders, register renaming, reorder buffers, retirement buffers, TLBs, branch predictors, you name it. But if we look at G70 versus G80 we hardly find the same building blocks. Texture samplers are separate from shader pipelines, vertex and pixel shader units are unified, SIMD units are scalar, interpolators and transcendental function evaluation share the same logic, granularity is way lower, etc.

    Likewise, Nehalem will hold few surprises (re-introducing Hyper-Threading and including a memory controller like AMD), while I expect G100 to be revolutionary rather than evolutionary.
    True, but my point is that Larrabee might evolve even more rapidly.
     