Larrabee at GDC 09

Discussion in 'Architecture and Products' started by bowman, Feb 16, 2009.

  1. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,435
    Likes Received:
    181
    Location:
    Chania
    That sounds way more reasonable ;) Keep in mind that IHVs also have to respect the thresholds that existing and future specifications will set; an external power supply, for instance, would be a ridiculous idea.

    It sounds hard to find something that would be as lossless as today's algorithms, very efficient in terms of memory and bandwidth consumption, and with all of that done mostly in software.

    By the way you forgot one further major headache for IMRs with MRTs + MSAA: translucency.
     
    #101 Ailuros, Apr 2, 2009
    Last edited by a moderator: Apr 2, 2009
  2. bowman

    Newcomer

    Joined:
    Apr 24, 2008
    Messages:
    141
    Likes Received:
    0
    Well, I was thinking more of a DIY overclocking take on things. Current graphics cards don't have much headroom, not without a bit of soldering anyway.
     
  3. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,561
    Likes Received:
    601
    Location:
    New York
    Oh, sort of like the free rein we have with over-volting/overclocking CPUs? Yeah, that would be nice to have in GPU-land too.
     
  4. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    I don't see how you come to that conclusion. Just because a square millimeter of a programmable chip is slower at rasterization than a square millimeter of dedicated hardware doesn't mean the entire chip is slower than the few square millimeters that a classic GPU spends on rasterization logic.
     
  5. crystall

    Newcomer

    Joined:
    Jul 15, 2004
    Messages:
    149
    Likes Received:
    1
    Location:
    Amsterdam
    Good point... After thinking a bit more about the problem, I'm also not really sure how Intel intends to deal with the tiles in L2. Their paper showed 3 back-end threads for each core, but I doubt that those 3 threads share the same color/depth/MRT tiles; that would be impossible without some kind of synchronization.
     
  6. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,009
    Likes Received:
    536
    You can tile the tile.

    PS. Probably being a bit too pithy there ... what I mean is you can do sort-middle parallelization inside the tile: each thread would get its own subset of quads inside the tile, so synchronization between the threads would not be an issue.
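    The binning step of that sort-middle idea can be sketched roughly like this (a toy illustration only; the tile/sub-tile sizes and the `bin_quads` helper are assumptions, not anything from the Larrabee paper):

    ```python
    # Hypothetical sketch: subdivide one screen tile into per-thread sub-tiles
    # and bin each quad into every sub-tile its bounding box overlaps. Each
    # thread then shades only its own bins, so no cross-thread synchronization
    # is needed on the color/depth tile data.

    TILE = 64   # assumed tile size in pixels
    SUB = 32    # each of 4 threads owns one 32x32 sub-tile

    def bin_quads(quads):
        """quads: list of (x0, y0, x1, y1) bounding boxes in tile-local coords."""
        bins = {(sx, sy): [] for sx in range(TILE // SUB)
                             for sy in range(TILE // SUB)}
        for q in quads:
            x0, y0, x1, y1 = q
            for sx in range(x0 // SUB, x1 // SUB + 1):
                for sy in range(y0 // SUB, y1 // SUB + 1):
                    if (sx, sy) in bins:
                        bins[(sx, sy)].append(q)  # quad touches this sub-tile
        return bins

    # A quad spanning a sub-tile boundary lands in several bins and is simply
    # shaded twice, clipped to each owning thread's region.
    bins = bin_quads([(0, 0, 10, 10), (30, 30, 40, 40)])
    ```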
     
    #106 MfA, Apr 3, 2009
    Last edited by a moderator: Apr 3, 2009
  7. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,332
    Likes Received:
    119
    Location:
    San Francisco
    Not a good idea imho, you could easily end up having only one or two threads doing any real work.
     
  8. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,009
    Likes Received:
    536
    Easily maybe, but as long as it's not often, it doesn't matter. You can get utilization almost arbitrarily high by increasing the queue size.
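    A toy model makes the queue-depth argument concrete (my own sketch, not anything Larrabee-specific; the exponential cost distribution and greedy assignment are arbitrary assumptions):

    ```python
    # Toy load-balancing model: variable-cost tasks are handed greedily to the
    # least-loaded of n threads. With only a few tasks queued, one expensive
    # task can leave the other threads idle; with many, the imbalance
    # amortizes and utilization approaches 1.
    import random

    def utilization(n_threads, n_tasks, seed=0):
        """Utilization = total work / (n_threads * makespan)."""
        rng = random.Random(seed)
        costs = [rng.expovariate(1.0) for _ in range(n_tasks)]
        finish = [0.0] * n_threads
        for c in costs:
            i = finish.index(min(finish))  # next task to least-loaded thread
            finish[i] += c
        return sum(costs) / (n_threads * max(finish))

    shallow = utilization(4, 8)    # few tasks per thread: imbalance shows
    deep = utilization(4, 800)     # deep queue: near-perfect utilization
    ```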
     
  9. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Jawed
     
  10. crystall

    Newcomer

    Joined:
    Jul 15, 2004
    Messages:
    149
    Likes Received:
    1
    Location:
    Amsterdam
    I had forgotten the bit about scoreboarding, thanks for the pointer.
     
  11. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,435
    Likes Received:
    181
    Location:
    Chania
    If I'm not misinterpreting Abrash's comments, it sounds to me like LRB's number of cores, as well as their frequencies, is being kept within specific boundaries to avoid ending up with insane power consumption values. Overclocking such a variant doesn't sound like a problem to me. As for the percentage of overclockability, I guess we shouldn't forget that LRB is still a GPU and not a CPU. Meaning it'll most likely come down to how tolerant of much higher frequencies the texture co-processor (as a fixed-function hardware example) might be in the end. Of course, someone could say that they might allow the driver to increase only the ALU clock domain, but on a typical GPU you hardly get nearly linear performance scaling unless you increase the frequency of most if not all parts of it.

    What I had in mind all this time would have been a much higher number of cores at much higher than currently projected frequencies, in order to reach/exceed the gaming performance of competing future GPUs.
     
  12. spacemonkey

    Newcomer

    Joined:
    Jul 16, 2008
    Messages:
    163
    Likes Received:
    0
    fp

    What sort of SP and DP floating-point performance can we expect from Larrabee? Will it be IEEE 754 compliant?
     
  13. DeanoC

    DeanoC Trust me, I'm a renderer person!
    Veteran Subscriber

    Joined:
    Feb 6, 2003
    Messages:
    1,469
    Likes Received:
    185
    Location:
    Viking lands
    In theory it can do 16 single-precision IEEE float ops per cycle per core, plus the FPU. As the float op can be an FMAC, that's 32 per core per cycle (we'll ignore the FPU as noise).

    For a 32-core Larrabee, that's 1024 per cycle, AKA 1 teraflop per GHz.

    DP is a bit slower.
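    The arithmetic behind that figure, written out (just the back-of-the-envelope math from the post above, counting an FMAC as 2 ops):

    ```python
    # Peak SP throughput from the figures above: a 16-wide vector unit issuing
    # one fused multiply-accumulate per lane counts as 2 float ops per lane
    # per cycle.
    lanes = 16
    ops_per_fmac = 2
    flops_per_core_per_cycle = lanes * ops_per_fmac    # 32

    cores = 32
    flops_per_cycle = cores * flops_per_core_per_cycle  # 1024

    clock_hz = 1e9                                      # 1 GHz
    peak_flops = flops_per_cycle * clock_hz             # ~1 TFLOP per GHz
    ```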
     
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,348
    Likes Received:
    3,879
    Location:
    Well within 3d
    The numbers last shown would indicate DP is roughly 1/2 as fast. A bit more than "a bit slower", but still a sight better than the 1/5 to 1/12 rates GPU DP is stuck at.
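    To put those ratios side by side (the rates are the speculative figures from this discussion, not official specs, applied to the ~1 TFLOP-per-GHz SP estimate above):

    ```python
    # Comparing double-precision penalties: Larrabee's reported ~1/2-rate DP
    # versus the 1/5 to 1/12 rates quoted for contemporary GPU DP.
    sp_peak = 1.024e12           # 32 cores x 32 flops/cycle at 1 GHz
    larrabee_dp = sp_peak / 2    # ~512 GFLOPS
    gpu_dp_best = sp_peak / 5    # a same-SP-peak GPU at a 1/5 DP rate
    gpu_dp_worst = sp_peak / 12  # ... and at a 1/12 DP rate
    ```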
     
  15. Megadrive1988

    Veteran

    Joined:
    May 30, 2002
    Messages:
    4,664
    Likes Received:
    184
    A 48-core Larrabee @ 2 GHz should be pretty darn impressive in SP.
     
  16. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,561
    Likes Received:
    601
    Location:
    New York
    It should, but is Intel really gonna get that kind of density out of the gate? Given Abrash's (perhaps meaningless) comment of above 1 teraflop, I'm guessing it's going to be closer to 16 cores than to 48.
     
  17. CouldntResist

    Regular

    Joined:
    Aug 16, 2004
    Messages:
    264
    Likes Received:
    7
    http://www.pcper.com/article.php?aid=683

    This was supposed to be the new feature of DX10 (even the most important one, according to Tim Sweeney, for example). More than two years have passed, and paged texture memory is still a promise for the future?
     
  18. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,009
    Likes Received:
    536
    Maybe it's political? The GPU manufacturers would rather sell cards with ever more memory ...

    WDDM v2 seems to have quietly disappeared.
     
  19. ShootMyMonkey

    Veteran

    Joined:
    Mar 21, 2005
    Messages:
    1,177
    Likes Received:
    72
    Either that, or it's closer to 1 GHz than 2 GHz... or a little mix of less-than-expected in both. I would have figured that, even for Intel, figures like 32 cores at 2 GHz seem a bit out of reach. They're not miracle-workers, and they still need to try to produce something competitive at a competitive price point. Commenting that you'll apparently see 1 TFLOP only says that they don't want to jump the gun by saying they'll get 32 cores @ 2 GHz for sure.

    Besides which, I'd have to raise an eyebrow at how they plan to keep something like that fed all the time. Maybe it's just my indefatigable pessimism at work here. Both Abrash's and Forsyth's talks tended to focus on eking out performance at the level of individual code blocks and/or individual cores, without going very deeply into things like thread scheduling. That wasn't really the topic anyway, so it wasn't really a problem for those talks. I just wish that at some point or other we had some more hard "big picture" info. Though I think Intel doesn't really have it either.
     
  20. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    I don't think there was any performance penalty, but order independent translucency was removed from PC PowerVR devices because virtually no developer was using it. <shrug>
     