22 nm Larrabee

Discussion in 'Architecture and Products' started by Nick, May 6, 2011.

  1. denev2004

    Newcomer

    Joined:
    Apr 28, 2010
    Messages:
    143
    Likes Received:
    0
    Location:
    China
    Knights Ferry on 32 nm is just 32 cores @ 1.2 GHz with only 500 GFLOPS DP, less than today's Tesla C2070 or a POWER7 4-chip MCM with 1 TFLOPS DP.
    So how much do you believe 22 nm can give, beyond LRBni being easier to program?
     
  2. denev2004

    Newcomer

    Joined:
    Apr 28, 2010
    Messages:
    143
    Likes Received:
    0
    Location:
    China
  3. DavidC

    Regular

    Joined:
    Sep 26, 2006
    Messages:
    347
    Likes Received:
    24
    That's not correct. Knights Ferry is 45nm.
     
  4. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    http://forum.beyond3d.com/showpost.php?p=1531013&postcount=42
     
    #104 rpg.314, May 22, 2011
    Last edited by a moderator: May 22, 2011
  5. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Hardly.

    http://portal.acm.org/citation.cfm?id=1516530
     
  6. RacingPHT

    Newcomer

    Joined:
    May 27, 2006
    Messages:
    90
    Likes Received:
    8
    Location:
    Shanghai
    I believe software is always smarter than hardware. If the hardware can have a correct implementation, so can software.

    The problem is that software is rarely optimized for bandwidth alone. In the CPU world the bandwidth/compute ratio is almost always higher.

    Let's say tessellating/displacing a patch far more than twice is a dumb idea. Then an implementation must always spill the vertices out to memory for neighboring tiles, just like Reyes. That sounds dumb too.

    Or let's say spilling into memory is a dumb idea. Then even if no single patch overlaps a tile boundary, because of sorting every patch has to be tessellated/displaced exactly twice; otherwise it's unclear which tiles it touches.

    TBDR's whole idea relies on one assumption: pixel read/write bandwidth is larger than primitive parameter write/read bandwidth. As we all know, a pixel is much smaller than a vertex, so once primitives approach pixel size I can see no efficiency left in TBDR. Maybe it's me that's dumb.
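    Rough numbers, purely as an assumed illustration of that ratio:

        binned pixel:      4 B colour + 4 B depth                  ~ 8 B
        post-VS vertex:    16 B position + 12 B normal + 16 B UVs  ~ 44 B

    Once tessellation pushes primitives toward pixel size, every pixel drags roughly a vertex's worth of parameter traffic along with it, and the classic TBDR bandwidth win evaporates.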
     
    #106 RacingPHT, May 22, 2011
    Last edited by a moderator: May 22, 2011
  7. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Well, for a TBDR to win with highly tessellated geometry, it has to dump the compressed version of the geometry (i.e. the raw patches) to memory after spatial binning and tessellate them on chip. Offhand, I can't see any other way a TBDR will win.

    Assuming it can be done efficiently by a clever hw/sw combination, a TBDR can certainly win, and win big, with ginormous tessellation.
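    A rough back-of-envelope with assumed sizes shows why binning the raw patch is attractive:

        raw bicubic patch in the bin:   16 control points x 16 B        ~ 256 B
        same patch tessellated at 16x:  17 x 17 = 289 vertices x 32 B   ~ 9 KB

    So binning the patch and re-tessellating on chip cuts bin traffic by well over an order of magnitude, at the cost of tessellating the patch again in every tile it overlaps.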
     
  8. RacingPHT

    Newcomer

    Joined:
    May 27, 2006
    Messages:
    90
    Likes Received:
    8
    Location:
    Shanghai
    That's already common practice for software parallel renderers, IMO.

    Did you count the cost of re-tessellating and vertex shading just because one patch overlaps multiple tiles?

    Thanks for the great paper!

    But I'm highly suspicious about whether the paper describes a proven technique. Especially: can it handle arbitrary real-world vertex shaders, with arbitrary displacement map functions? The paper seems uncertain about that, too. Plus, analyzing/converting a displacement map into an interval texture in real time, within at most a fraction of a millisecond inside the driver, seems practically impossible to me. And don't ignore the case where the displacement map is procedurally generated every frame, as in water rendering and simulations.

    Anyway, the method described is quite novel, and IMO it's better suited to an offline renderer than to a program that has to be highly responsive: a graphics driver. DX11 provides no way for the user to hint the graphics driver about this, which is why I consider it an immediate-mode API. Users (us graphics programmers) know best about our data and shaders, not graphics drivers. Maybe OpenGL could be extended to do better, but in Larrabee's case it has to work well with DX11.
     
    #108 RacingPHT, May 22, 2011
    Last edited by a moderator: May 22, 2011
  9. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    They say pretty clearly in the abstract itself that they can handle arbitrary shaders and arbitrary displacement maps.
    Water might be an exception, but caching shaders should work fine in a large number of cases.
     
  10. denev2004

    Newcomer

    Joined:
    Apr 28, 2010
    Messages:
    143
    Likes Received:
    0
    Location:
    China
    Did it change partway through? I haven't followed it for a while; it was 32 nm according to Intel's statements at ISC 2010, and now I can find both figures through Google.
     
  11. RacingPHT

    Newcomer

    Joined:
    May 27, 2006
    Messages:
    90
    Likes Received:
    8
    Location:
    Shanghai
    I'd be very curious how they reach that conclusion.

    For one thing, they assume "differentiable functions". What if a function is not differentiable? Something as simple as step(x, y) in HLSL is not differentiable. And at the extreme, what about an integer vertex shader with bitwise logical instructions?

    For another, they don't show the cost of the analysis in terms of time, memory, etc. In a real-time context that's even more important than the math itself. Just saying "on the fly" is not enough. The fly might be from China to the US.

    Lastly, they seem to make some things overly complex while other parts are too brief. It sounds like hell for me, as a programmer, to implement that paper.

    In my opinion, culling can be much simpler if done by USERS than in the API. Many people are already doing culling in their hull shaders; it's far simpler because they know their own data and code.
     
    #111 RacingPHT, May 22, 2011
    Last edited by a moderator: May 22, 2011
  12. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    And just how realistic is that scenario?
     
  13. RacingPHT

    Newcomer

    Joined:
    May 27, 2006
    Messages:
    90
    Likes Received:
    8
    Location:
    Shanghai
    Sure, it's not realistic, but it's possible. As a graphics driver you have to support every possibility within the specification. You can't fail to compile a program, for example, just because it's unrealistic or doesn't make sense.

    No offense, but what about the Taylor series of this function:
    float InvSqrt(float x) {
        float xhalf = 0.5f * x;
        int i = *(int*)&x;              // reinterpret the float's bits as an int
        i = 0x5f3759df - (i >> 1);      // magic-constant bit trick
        x = *(float*)&i;
        x = x * (1.5f - xhalf * x * x); // one Newton-Raphson refinement step
        return x;
    }
    That's creativity for you.

    On the other hand, culling done by developers themselves is much easier: everything is under their control. If someone writes something that doesn't make sense, you can just fire the guy.
     
  14. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    That doesn't take away from my point. Tessellation hurts a TBDR more than an IMR. Taking into account all the advantages of a TBDR and the inefficiencies of Larrabee's generalized architecture, if it was barely competitive in DX9 it would be much less so in DX11.

    Geometry will always cost more on a TBDR than on an IMR. There's no way around that. So no, it can't win big with heavy tessellation unless it's the pixel workload that is giving it the advantage (or, naturally, unless you only give the TBDR an optimization that is equally valid for the IMR).

    True, but interval textures aren't free. And what about random-value functions using bit masks? Procedural noise? It's an interesting paper, though.
     
    #114 Mintmaster, May 22, 2011
    Last edited by a moderator: May 22, 2011
  15. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Culling by devs helps a TBDR and an IMR equally.
     
  16. RacingPHT

    Newcomer

    Joined:
    May 27, 2006
    Messages:
    90
    Likes Received:
    8
    Location:
    Shanghai
    True. But my point is: DX11 doesn't give you tile information when you're doing culling, so there's no way for a developer to cull a patch against a tile rect.

    Think about a crazy idea: by offering a new SV_TileRect to the hull shader, a user could cull the patch if the test fails. For a TBDR this value would be the rectangle of the current tile; for an IMR it would be the viewport.

    But no, there's no such option for Larrabee or PowerVR. Larrabee has to rely on itself to cull the patches, and do some hero programming to achieve that!
     
  17. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    You can pretty much cull patches in the hull shader quite easily. Just give a rejected patch low (or zero) tessellation factors, so it's cheap to discard.
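    A minimal sketch of that, assuming a quad-domain patch; VS_Out and PatchOutsideFrustum() are illustrative app-side names, not part of the API:

    struct HS_ConstOut {
        float edges[4]  : SV_TessFactor;
        float inside[2] : SV_InsideTessFactor;
    };

    HS_ConstOut ConstantHS(InputPatch<VS_Out, 16> patch) {
        HS_ConstOut o;
        // A tessellation factor of 0 tells the tessellator to discard the
        // whole patch, so no domain shader work is done for it.
        float f = PatchOutsideFrustum(patch) ? 0.0f : 16.0f;
        o.edges[0] = o.edges[1] = o.edges[2] = o.edges[3] = f;
        o.inside[0] = o.inside[1] = f;
        return o;
    }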
     
  18. RacingPHT

    Newcomer

    Joined:
    May 27, 2006
    Messages:
    90
    Likes Received:
    8
    Location:
    Shanghai
    True, just use 0 and the GPU will discard the entire patch.

    I said DX11 is an IMR API because, when a patch is replayed by a TBDR GPU, there's no way to know which tile we're currently in. Because of that, the developer can only cull the patch against the entire viewport, not the current tile.

    That's why I suggest SV_TileRect as a system value: it gives developers a consistent way to help both an IMR and a TBDR cull a patch.
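    A rough sketch of how that could look, reusing the HS_ConstOut struct from the sketch a few posts up. SV_TileRect is not real HLSL, and PatchScreenBounds() is an assumed helper returning the patch's screen-space bounds (xy = min, zw = max), conservatively grown by the maximum displacement:

    HS_ConstOut ConstantHS(InputPatch<VS_Out, 16> patch,
                           float4 tileRect : SV_TileRect)   // hypothetical system value
    {
        HS_ConstOut o;
        float4 b = PatchScreenBounds(patch);   // assumed helper: xy = min, zw = max
        bool outside = b.z < tileRect.x || b.x > tileRect.z ||
                       b.w < tileRect.y || b.y > tileRect.w;
        // A TBDR would pass the current tile's rectangle, an IMR the viewport,
        // so the same shader culls correctly on both.
        float f = outside ? 0.0f : 16.0f;      // 0 = discard this patch for this bin
        o.edges[0] = o.edges[1] = o.edges[2] = o.edges[3] = f;
        o.inside[0] = o.inside[1] = f;
        return o;
    }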
     
    #118 RacingPHT, May 23, 2011
    Last edited by a moderator: May 23, 2011
  19. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
  20. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    As an intermediate step I believe it can indeed make sense to keep the IGP around for a while. Software is only slowly becoming more generic, and extending the vector processing capabilities of the CPU while retaining a minimal-cost but adequate IGP would be a really low-risk way to prepare for the future without compromising legacy graphics.

    Note that AVX appears to be part of a bigger plan. Extending the CPU's SIMD capabilities to 256-bit must have been expensive both in terms of transistors and in terms of design time. But they've already specified FMA and the ability to extend it up to 1024-bit. So they don't appear to think the IGP is suitable for anything beyond legacy graphics. They must realize that running GPGPU applications on the IGP is an absolute joke. And making the IGP more flexible and powerful only converges it closer to the CPU architecture, while merely benefiting a handful of applications. Gradually making the CPU more powerful instead and reducing power consumption makes more sense in my opinion.

    The beauty of the upgrade path for AVX is that there's not much need for Intel to rush things. They can observe how the software evolves, and carefully implement the features that make the most sense going forward.
     