Grid 2 has exclusive Haswell GPU features

Discussion in 'Architecture and Products' started by Davros, May 29, 2013.

  1. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    OIT is done by just adding some code to your shader, so it interacts with MSAA the same way any other shader does; there is nothing special about it.
    It makes more sense to say that MSAA makes shading more expensive since it increases the number of fragments that contribute to a given pixel.
     
  2. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    From IVB to HSW the number of threads per EU went from 8 to 7; see the developer guide:
    http://download-software.intel.com/...tion_Core_Graphics_Developers_Guide_Final.pdf

    Since IVB each EU has two 4-wide SIMD pipelines and can execute up to 16 floating point operations per clock (one 4-wide multiply-add per pipeline).
     
  3. Frontino

    Newcomer

    Joined:
    Feb 21, 2008
    Messages:
    84
    Likes Received:
    0
    Thanks for the link.
    I'm not getting the 320 and 640 flops/cycle, though. With 7 threads they should be 280 and 560, no?
     
  4. Pete

    Pete Moderate Nuisance
    Moderator Veteran

    Joined:
    Feb 7, 2002
    Messages:
    4,925
    Likes Received:
    315
    20 EUs * (16 FLOPs per clock / EU) = 320 FLOPs per clock
    40 EUs * 16 = 640 FLOPs per clock

    FLOPS are based on available hardware (ALUs) per clock. Threads are executed one at a time, not simultaneously. Each EU keeps multiple threads in flight to have one to swap to when the current one stalls (think hyper-threading on Intel CPUs).
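
    A minimal sketch of that arithmetic, assuming the 16 FLOPs per clock per EU figure quoted earlier in the thread. The function and parameter names are made up for illustration; the point is only that the thread count does not appear in the formula.

```python
def peak_flops_per_clock(num_eus: int, flops_per_eu_per_clock: int = 16) -> int:
    """Peak FLOPs per clock = number of EUs x ALU throughput of one EU.

    Threads per EU (7 or 8) are deliberately absent: they hide latency,
    they do not add arithmetic units.
    """
    return num_eus * flops_per_eu_per_clock


flops_20_eu = peak_flops_per_clock(20)  # 320
flops_40_eu = peak_flops_per_clock(40)  # 640
print(flops_20_eu, flops_40_eu)
```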
     
  5. Frontino

    Newcomer

    Joined:
    Feb 21, 2008
    Messages:
    84
    Likes Received:
    0
    Were the Sandy Bridge 6 EU models 16 flops/cycle too?
     
  6. Pete

    Pete Moderate Nuisance
    Moderator Veteran

    Joined:
    Feb 7, 2002
    Messages:
    4,925
    Likes Received:
    315
  7. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    If I remember correctly, in Sandy Bridge the second pipeline could do mul or add, but not mad... hence the 8+4 = 12/clock. In Ivy Bridge and beyond both can do full 4-wide mad.
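
    The per-EU numbers can be worked out as below. This is a sketch under the assumptions stated in the thread (two 4-wide pipelines, a mad counted as two floating point operations); the names are hypothetical.

```python
LANES_PER_PIPELINE = 4   # each pipeline is 4-wide
FLOPS_PER_MAD = 2        # multiply-add = one multiply + one add
FLOPS_PER_MUL_OR_ADD = 1


def eu_flops_per_clock(second_pipe_has_mad: bool) -> int:
    """FLOPs per clock for one EU with two 4-wide pipelines."""
    first_pipe = LANES_PER_PIPELINE * FLOPS_PER_MAD
    per_lane_second = FLOPS_PER_MAD if second_pipe_has_mad else FLOPS_PER_MUL_OR_ADD
    second_pipe = LANES_PER_PIPELINE * per_lane_second
    return first_pipe + second_pipe


sandy_bridge = eu_flops_per_clock(second_pipe_has_mad=False)  # 8 + 4 = 12
ivy_bridge = eu_flops_per_clock(second_pipe_has_mad=True)     # 8 + 8 = 16
print(sandy_bridge, ivy_bridge)
```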
     
  8. Rakehell

    Newcomer

    Joined:
    Jul 25, 2013
    Messages:
    10
    Likes Received:
    0
    So that's how they were able to get "comparable framerates" to the GT 650M.
     
  9. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    Read any other post in the thread maybe... although I do enjoy the odd demonstration of fanboy-level confirmation bias ;)
     
  10. Rakehell

    Newcomer

    Joined:
    Jul 25, 2013
    Messages:
    10
    Likes Received:
    0
    Saying Intel GPUs are slow makes one a fanboy? You're on the defense tonight.
     
  11. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    Except that's not what you said at all... you replied to the original post in the thread that was completely wrong (AVX? lol) and implied there was some sort of special performance optimization going on only on Intel which is blatantly untrue.

    In reality, I'm pretty sure you already have the position that "Intel GPUs are slow" and you just look around for anything that confirms that position and ignore anything else. In this case it was just particularly funny since the information you found was incorrect/unrelated (and didn't even make any sense) but you didn't even read the very next post before assuming it was true since it fell in line with what you want to think.

    If you want to learn, please feel free to read the thread and ask any questions. If you're just going to mess up a good technical discussion with ignorant comments, there's always neogaf ;)

    But maybe I misunderstood your first post here. Happy to be convinced by your next few that I'm wrong about you and you have something interesting to contribute :)
     
    #131 Andrew Lauritzen, Jul 30, 2013
    Last edited by a moderator: Jul 30, 2013
  12. Paran

    Regular Newcomer

    Joined:
    Sep 15, 2011
    Messages:
    251
    Likes Received:
    14
    From Research to Production, How AVSM and AOIT made their way into games: http://software.intel.com/sites/default/files/From-Research-to-Production-final.pdf

    AOIT Sample: http://software.intel.com/en-us/blo...ency-approximation-with-pixel-synchronization
    AVSM Sample: http://software.intel.com/en-us/blogs/2013/03/27/adaptive-volumetric-shadow-maps

    AOIT seems to be applied mainly to foliage in Grid 2, and only to foliage in the sample tool. It is also used on chain-link fences in Grid 2. Does AOIT theoretically work on all kinds of transparent textures, or is there a limitation?
     
  13. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    Yes, it works on anything you want to blend. It's just that in Grid 2, foliage and chain-link fences were the things they typically handled with alpha test/alpha-to-coverage, and AOIT provides a much nicer image than those.

    Games tend to be designed to minimize blending *because* of the OIT problem. Going forward I imagine techniques like AOIT will allow artists to use more blended things than they have been able to in the past. The response from everyone so far has been that they really enthusiastically want/need these features and would like to see them on other platforms too (specifically the new consoles), so I think we'll gradually see other hardware support them too.
     
    #133 Andrew Lauritzen, Jul 30, 2013
    Last edited by a moderator: Jul 30, 2013
  14. Paran

    Regular Newcomer

    Joined:
    Sep 15, 2011
    Messages:
    251
    Likes Received:
    14
    This would be ideal for deferred lighting games without proper MSAA support. OIT would help with all the flickering foliage and other transparency stuff, and for polygon smoothing some PP-AA would do it, preferably PP-AA with high detail preservation like SMAA. Or 2xSSAA combined with OIT, although this wouldn't be useful for integrated graphics. A shame that Nvidia and AMD don't support OIT. Microsoft should make this mandatory in a future DirectX revision.
     
  15. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,432
    Likes Received:
    261
    Any DX11 part supports OIT. They just don't support it in exactly the same way.
     
  16. Paran

    Regular Newcomer

    Joined:
    Sep 15, 2011
    Messages:
    251
    Likes Received:
    14

    Sure but nobody does it over DX11.
     
  17. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    That's not quite correct.
    Any DX11 GPU allows you to "record" all fragments that contribute to a pixel into a variable-size data structure (e.g. a list). Once you have such data you can do pretty much whatever you want with it, including sorting it and compositing it for OIT.

    The main drawback of these methods is that the more transparent stuff you render, the more memory you need, so it's hard to determine how much memory one should allocate in advance for it.
    Too much and you waste it; too little and parts of your transparent geometry won't appear on the screen. Also, sorting a lot of fragments per pixel can be inefficient and lead to unpredictable, unstable performance.
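
    A rough CPU-side sketch of the record/sort/composite approach described above, assuming grayscale colors and "larger depth = farther away". It is not shader code and not any particular game's implementation; `record` and `composite_exact` are hypothetical names.

```python
from collections import defaultdict

# One variable-size list per pixel; it grows with every transparent fragment.
pixel_lists = defaultdict(list)


def record(pixel, depth, color, alpha):
    """Store a fragment in submission order (no sorting needed while rendering)."""
    pixel_lists[pixel].append((depth, color, alpha))


def composite_exact(fragments, background=0.0):
    """Sort back to front, then blend each fragment 'over' the running result."""
    result = background
    for _, color, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        result = alpha * color + (1.0 - alpha) * result
    return result


# Fragments arrive in arbitrary order.
record((0, 0), 0.8, 1.0, 0.5)
record((0, 0), 0.2, 0.0, 0.25)
record((0, 0), 0.5, 0.5, 0.5)

shade = composite_exact(pixel_lists[(0, 0)])
stored = sum(len(frags) for frags in pixel_lists.values())
print(shade, stored)  # memory use is proportional to `stored`
```

    The result does not depend on submission order, but the storage does depend on how many fragments land on each pixel, which is the allocation problem mentioned above.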

    Alternative methods based on pixel synchronization (we developed one, but I am sure ISVs will come up with many others) allow you to compute an approximate OIT solution as you render the transparent geometry into a fixed-size memory buffer. This makes the algorithm use a known amount of memory (e.g. 16 bytes per pixel) and also provides predictable, stable performance. Changing the amount of memory one allocates for each pixel makes it possible to trade off image quality for performance (i.e. higher quality -> lower performance).
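
    To illustrate the fixed-budget idea, here is a small CPU-side sketch. It is not the AOIT algorithm itself; it is a simpler stand-in that merges the two farthest fragments whenever a pixel exceeds its budget. `MAX_NODES`, `insert_fragment` and `resolve` are hypothetical names, colors are grayscale, and on a GPU the read-modify-write in `insert_fragment` is the step that would need per-pixel ordering guarantees.

```python
import bisect

MAX_NODES = 2  # assumed per-pixel budget; a larger budget means higher quality


def insert_fragment(nodes, depth, color, alpha):
    """Insert into a front-to-back list of (depth, premultiplied_color, alpha).

    If the budget is exceeded, the two farthest nodes are blended into one,
    so the list never holds more than MAX_NODES entries.
    """
    bisect.insort(nodes, (depth, color * alpha, alpha))
    if len(nodes) > MAX_NODES:
        far = nodes.pop()
        near = nodes.pop()
        merged_color = near[1] + (1.0 - near[2]) * far[1]
        merged_alpha = near[2] + (1.0 - near[2]) * far[2]
        nodes.append((near[0], merged_color, merged_alpha))


def resolve(nodes, background=0.0):
    """Composite the stored nodes front to back over the background."""
    color, transmittance = 0.0, 1.0
    for _, premultiplied, alpha in nodes:
        color += transmittance * premultiplied
        transmittance *= 1.0 - alpha
    return color + transmittance * background


nodes = []
for depth, color, alpha in [(0.8, 1.0, 0.5), (0.2, 0.0, 0.25), (0.5, 0.5, 0.5)]:
    insert_fragment(nodes, depth, color, alpha)

print(len(nodes), resolve(nodes))
```

    In this small case the merge happens to be lossless because the merged fragments are adjacent in depth. The approximation shows up when a later fragment falls between fragments that have already been merged.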
     
  18. Paran

    Regular Newcomer

    Joined:
    Sep 15, 2011
    Messages:
    251
    Likes Received:
    14

    I didn't say it wouldn't be possible on DX11. I was under the impression that it wouldn't make sense for efficiency/performance reasons, which is why no dev went this route.
     
  19. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    Yeah but that's like saying any GPU supports ray tracing by rendering 1x1 viewports ;)
     
  20. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,432
    Likes Received:
    261
    Obviously that's hyperbole as there are interactive demos with OIT and ray tracing. At least one workstation app implemented OIT as well. Just noting it for others that don't know.

    It would have been interesting if GRID 2 supported a vanilla DX11 method so we could see the performance difference. Even if it's significantly slower (and I'm not convinced as to the level of significance yet) high end cards can brute force it.
     