Grid 2 has exclusive Haswell GPU features

Discussion in 'Architecture and Products' started by Davros, May 29, 2013.

  1. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    Heh, sure. It was just my silly way of noting that nothing is impossible on any GPU, since you can do Turing-complete things with multi-pass; performance is therefore always the relevant metric.

    I believe they had a DX11 linked list pass as well but did not ship it because it's just too slow.

    It really is a lot slower. Try it yourself, the demo has both paths: http://software.intel.com/en-us/blo...ency-approximation-with-pixel-synchronization

    Furthermore, even if you store an entire linked list, DX11 OIT style, it's still faster to run the resulting list through the AOIT algorithm than to sort it. Per-pixel sorting is irrationally expensive and linked lists have terrible memory access patterns.
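    For reference, the "store everything, then sort" path boils down to the following per-pixel work. This is only a CPU-side Python sketch of the idea, not the demo's shader code; the `Fragment` layout (scalar colour, depth, alpha) and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical fragment layout for this sketch: (depth, color, alpha).
# Color is a single scalar channel to keep the arithmetic easy to follow.
Fragment = Tuple[float, float, float]


def composite_in_order(fragments: List[Fragment], background: float = 0.0) -> float:
    """Front-to-back 'over' compositing in the order the fragments are given."""
    color = 0.0
    transmittance = 1.0  # fraction of light still passing through
    for _depth, frag_color, alpha in fragments:
        color += transmittance * alpha * frag_color
        transmittance *= 1.0 - alpha
    return color + transmittance * background


def composite_sorted(fragments: List[Fragment], background: float = 0.0) -> float:
    """Sort the whole per-pixel list by depth (nearest first), then composite."""
    return composite_in_order(sorted(fragments, key=lambda f: f[0]), background)


# Two fragments submitted back-to-front: the far one first, the near one second.
submitted = [(2.0, 1.0, 0.5), (1.0, 0.0, 0.5)]
unsorted_result = composite_in_order(submitted)
sorted_result = composite_sorted(submitted)
```

    The two results differ, which is why the list has to be ordered (or approximated some other way) before blending. On a GPU the sort runs per pixel over a scattered linked list, which is where the cost comes from.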
     
  2. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    That download doesn't work for me, always stops at 50.4 out of 58.6 MB. Any chance to fix this? Or is it just me?
     
  3. Kaarlisk

    Regular Newcomer Subscriber

    Joined:
    Mar 22, 2010
    Messages:
    293
    Likes Received:
    49
    Same happens to me.
     
  4. Andrew Lauritzen

    Moderator Veteran

    Weird, works fine for me (in Chrome). Try again and if it still happens let me know and I'll bother someone to look into it.
     
  5. Paran

    Regular Newcomer

    Joined:
    Sep 15, 2011
    Messages:
    251
    Likes Received:
    14
    I downloaded successfully with my download manager. It didn't work with Firefox.
     
  6. CarstenS

    Veteran Subscriber

    Didn't work from home either. Tried Chrome and Chrome-based Iron.
    Internet Explorer did the trick, but only after resuming the download at 50.5 MB, where the other browsers thought they were already finished. Strange.
     
  7. NThibieroz

    Newcomer

    Joined:
    Jun 8, 2013
    Messages:
    31
    Likes Received:
    8
    The comparison is not valid. AOIT is a lossy OIT algorithm, whereas a fragment sort gives you correct ordering. A better comparison would be AOIT against a K-nearest fragment sort, whereby only the first K fragments are sorted and the remaining ones are composited (or blended out of order). Both approaches have relative merits and drawbacks depending on the situation.
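    A K-nearest fragment sort as described above could look roughly like this. It is a Python sketch under assumptions: the fragment layout (depth, scalar colour, alpha), the function name, and the choice to blend the unsorted tail in submission order behind the K nearest are all illustrative, not taken from any shipped implementation.

```python
import heapq
from typing import List, Tuple

# Hypothetical fragment layout for this sketch: (depth, color, alpha).
Fragment = Tuple[float, float, float]


def composite_k_nearest(fragments: List[Fragment], k: int, background: float = 0.0) -> float:
    """Sort only the K nearest fragments; blend the rest in submission order."""
    indexed = list(enumerate(fragments))
    # nsmallest returns the K nearest entries already ordered nearest-first.
    nearest = heapq.nsmallest(k, indexed, key=lambda item: item[1][0])
    nearest_ids = {index for index, _ in nearest}
    # Everything else stays in whatever order it was submitted.
    tail = [frag for index, frag in indexed if index not in nearest_ids]

    color = 0.0
    transmittance = 1.0
    for _depth, frag_color, alpha in [frag for _, frag in nearest] + tail:
        color += transmittance * alpha * frag_color
        transmittance *= 1.0 - alpha
    return color + transmittance * background


pixel = [(3.0, 1.0, 0.5), (1.0, 0.0, 0.5), (2.0, 0.5, 0.5)]
exact = composite_k_nearest(pixel, k=3)   # K covers the list: same as a full sort
approx = composite_k_nearest(pixel, k=1)  # only the nearest fragment is ordered
```

    With K equal to the list length this matches a full sort; with smaller K the tail contributes an ordering error, so both image error and cost depend on K.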
     
  8. Andrew Lauritzen

    Moderator Veteran

    True, but the visual difference is negligible with as few as 4 nodes (see the paper for more results). Even 2 nodes is usually fine, especially if you have even a rough sort (which most games do).

    Being an approximation doesn't make a comparison invalid; it just means you have to compare both the image error and the performance. Hell, blending pretty much anything (particles, hair, etc.) is already a huge approximation compared to reality, so it's hard to argue on a theoretical purity level. And as we all know, game developers don't really care about ground truth anyway, as long as it looks good, behaves well and is fast :)

    Anyways my only point there was to emphasize how expensive sorting fragments is on GPUs. It's not a particularly SIMD-friendly algorithm, particularly with linked lists. I think the fragment sorting thing could be made to work better in a sort-middle architecture than an IMR to be honest, as then you could use local memory and organize it a lot better than linked lists. It's quite unfortunate that the DX/UAV/IMR model has forced us into the global atomics/scatter solution.

    Now, to be fair, I'm a linked-list hater even on the CPU (where they are less bad), but I'm in good company judging from the game dev Twitter conversation the other day :)

    For a given K/storage size, AOIT already gets you a better result than a K-buffer (arguably a K-buffer is just a different replacement strategy). The key insight is that "nearest" isn't the greatest heuristic in a lot of cases if the transmittance of those fragments is very high. It's better to optimize for the error in transmittance over the curve (i.e. contribution to the final pixel) directly. Again, this is all covered in the paper from HPG 2011.
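    To make the "different replacement strategy" point concrete, here is a simplified Python sketch of a fixed-size transmittance curve that drops the node costing the least area under the curve, rather than the farthest one. The node layout, the function names and the exact merge rule are assumptions chosen for this sketch; the HPG 2011 paper is the reference for the real compression scheme.

```python
from typing import List, Tuple

# Hypothetical node layout: (depth, transmittance behind that depth).
Node = Tuple[float, float]


def compress(nodes: List[Node]) -> List[Node]:
    """Drop the node whose removal changes the area under the curve the least.

    Assumed rule for this sketch: node i is merged into its predecessor,
    which keeps its own depth but takes node i's transmittance, so the
    total transmittance at the end of the curve is preserved.
    """
    def area_error(i: int) -> float:
        return (nodes[i][0] - nodes[i - 1][0]) * (nodes[i - 1][1] - nodes[i][1])

    best = min(range(1, len(nodes)), key=area_error)
    merged = (nodes[best - 1][0], nodes[best][1])
    return nodes[:best - 1] + [merged] + nodes[best + 1:]


def insert_fragment(nodes: List[Node], depth: float, alpha: float, max_nodes: int) -> List[Node]:
    """Insert one fragment into the depth-sorted curve, compressing on overflow."""
    idx = 0
    while idx < len(nodes) and nodes[idx][0] <= depth:
        idx += 1
    in_front = nodes[idx - 1][1] if idx > 0 else 1.0
    new_node = (depth, in_front * (1.0 - alpha))
    # Every node behind the new fragment is attenuated by it.
    behind = [(d, t * (1.0 - alpha)) for d, t in nodes[idx:]]
    updated = nodes[:idx] + [new_node] + behind
    return compress(updated) if len(updated) > max_nodes else updated


curve: List[Node] = []
for frag_depth, frag_alpha in [(1.0, 0.5), (5.0, 0.5), (1.1, 0.1)]:
    curve = insert_fragment(curve, frag_depth, frag_alpha, max_nodes=2)
```

    In this example the nearly transparent fragment at depth 1.1 is the one that gets merged away even though it is one of the two nearest, whereas a nearest-K rule would have dropped the fragment at depth 5.0.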

    That said, even simpler heuristics work pretty well in practice. I think Marco's upcoming paper will discuss some of that in more detail as well.
     
    #148 Andrew Lauritzen, Aug 8, 2013
    Last edited by a moderator: Aug 8, 2013
  9. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    14,859
    Likes Received:
    2,277
    Just an update: F1 2015 has Intel-specific features (Advanced Smoke and Blended Skidmarks).
    Not sure if these are done on the CPU or the IGP.
     