Grid 2 has exclusive Haswell GPU features

Andrew Lauritzen · Aug 4, 2013

3dcgi said:
Obviously that's hyperbole as there are interactive demos with OIT and ray tracing.

Heh sure, it was just my silly way to note that nothing is impossible on any GPU, since you can do Turing-complete things with multi-pass... thus performance is always the relevant metric.

3dcgi said:
It would have been interesting if GRID 2 supported a vanilla DX11 method so we could see the performance difference.

I believe they had a DX11 linked list pass as well but did not ship it because it's just too slow.

3dcgi said:
Even if it's significantly slower (and I'm not convinced as to the level of significance yet) high end cards can brute force it.

It really is a lot slower. Try it yourself, the demo has both paths: http://software.intel.com/en-us/blo...ency-approximation-with-pixel-synchronization

Furthermore even if you store an entire linked list, DX11 OIT style, it's still faster to run the resulting list through the AOIT algorithm rather than sort it. Per-pixel sorting is irrationally expensive and linked lists have terrible memory access patterns.
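As a rough illustration of why the per-pixel list has to be put in depth order before blending, here is a minimal CPU-side sketch in Python. It is not code from the demo or from GRID 2; the function names, the `(depth, color, alpha)` tuple layout and the scalar color are all assumptions made for brevity.

```python
def composite_sorted(fragments, background=0.0):
    """Exact result: sort one pixel's fragments by depth, then blend front to back.

    fragments: list of (depth, color, alpha) tuples in arbitrary order.
    """
    color = 0.0
    transmittance = 1.0  # fraction of light still passing through so far
    for _, frag_color, alpha in sorted(fragments, key=lambda f: f[0]):
        color += transmittance * alpha * frag_color
        transmittance *= 1.0 - alpha
    return color + transmittance * background


def blend_in_arrival_order(fragments, background=0.0):
    """Plain "over" blending in submission order.

    Only correct if the fragments happen to arrive back to front.
    """
    color = background
    for _, frag_color, alpha in fragments:
        color = alpha * frag_color + (1.0 - alpha) * color
    return color


if __name__ == "__main__":
    pixel = [(0.5, 1.0, 0.5), (0.2, 0.0, 0.25), (0.8, 0.5, 0.5)]
    print(composite_sorted(pixel, background=1.0))
    print(blend_in_arrival_order(pixel, background=1.0))
```

The sort is the step that has to run for every pixel, over a list that a linked-list implementation scatters through memory.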

CarstenS · Aug 6, 2013

That download doesn't work for me, always stops at 50.4 out of 58.6 MB. Any chance to fix this? Or is it just me?

Kaarlisk · Aug 6, 2013

Same happens to me.

Andrew Lauritzen · Aug 6, 2013

Weird, works fine for me (in Chrome). Try again and if it still happens let me know and I'll bother someone to look into it.

Paran · Aug 6, 2013

I downloaded successfully with my download manager. It didn't work with Firefox.

CarstenS · Aug 6, 2013

Didn't work from home either. Tried Chrome and Chrome-based Iron.
Internet Explorer did the trick though, but only after resuming the download at 50.5 MB, where the other browsers thought they were finished already. Strange.

NThibieroz · Aug 8, 2013

Andrew Lauritzen said:
Furthermore even if you store an entire linked list, DX11 OIT style, it's still faster to run the resulting list through the AOIT algorithm rather than sort it.

The comparison is not valid. AOIT is a lossy OIT algorithm whereas a fragment sort gives you correct ordering. A better comparison would be to compare AOIT with a K-nearest fragment sort, whereby only the first K-fragments are sorted and remaining ones composited (or blended out of order). Both approaches have relative merits and drawbacks depending on situations.
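One way to read the K-nearest idea, sketched in Python under stated assumptions: keep the K nearest fragments in a small sorted array, and blend anything evicted from it straight into a tail color in eviction order. The function name, the `(depth, color, alpha)` tuple layout, the scalar color and the choice of "over" blending for the tail are all illustrative, not taken from any shipped implementation.

```python
from bisect import insort


def composite_k_nearest(fragments, k, background=0.0):
    """Sort only the k nearest fragments of a pixel; blend the rest unsorted.

    fragments: list of (depth, color, alpha) tuples in arbitrary order.
    """
    nearest = []             # at most k fragments, kept sorted by depth
    tail_color = background  # everything evicted is blended in here
    for frag in fragments:
        insort(nearest, frag)  # tuples compare by depth first
        if len(nearest) > k:
            # Evict the farthest fragment and blend it immediately.
            # This happens in eviction order, not depth order, so it is lossy.
            _, far_color, far_alpha = nearest.pop()
            tail_color = far_alpha * far_color + (1.0 - far_alpha) * tail_color

    # The k nearest fragments are composited exactly, front to back.
    color = 0.0
    transmittance = 1.0
    for _, frag_color, alpha in nearest:
        color += transmittance * alpha * frag_color
        transmittance *= 1.0 - alpha
    return color + transmittance * tail_color
```

When K is at least the number of fragments in the pixel, nothing is evicted and the result equals a full sort; for smaller K the error depends on how far out of order the evicted fragments were blended.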

Andrew Lauritzen · Aug 8, 2013

NThibieroz said:
The comparison is not valid. AOIT is a lossy OIT algorithm whereas a fragment sort gives you correct ordering.

True, but the visual difference is negligible with as few as 4 nodes (see the paper for more results). Even 2 nodes is usually fine, especially if you have even a rough sort (which most games do).

Being an approximation doesn't make a comparison invalid, it just means you have to compare both the image error and the performance. Hell, blending pretty much anything (particles, hair, etc.) is a huge approximation already compared to reality, so it's hard to argue on a theoretical purity level. And as we all know, game developers don't really care about ground truth anyways as long as it looks good, behaves well and is fast.

Anyways my only point there was to emphasize how expensive sorting fragments is on GPUs. It's not a particularly SIMD-friendly algorithm, particularly with linked lists. I think the fragment sorting thing could be made to work better in a sort-middle architecture than an IMR to be honest, as then you could use local memory and organize it a lot better than linked lists. It's quite unfortunate that the DX/UAV/IMR model has forced us into the global atomics/scatter solution.

Now to be fair I'm a linked-list hater even on the CPU (where they are less bad), but I'm in good company judging from the game dev Twitter conversation the other day.

NThibieroz said:
A better comparison would be to compare AOIT with a K-nearest fragment sort, whereby only the first K-fragments are sorted and remaining ones composited (or blended out of order).

For a given K/storage size, AOIT already gets you a better result than a K-buffer (arguably a K-buffer is just a different replacement strategy). The key insight is that "nearest" isn't the greatest heuristic in a lot of cases if the transmittance of those fragments is very high. It's better to optimize for the error in transmittance over the curve (i.e. contribution to the final pixel) directly. Again, this is all covered in the paper from HPG 2011.
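To make "optimize for the error in transmittance" a bit more concrete, here is a simplified Python sketch of a fixed-size visibility function in the spirit of the paper. It is an assumption-laden illustration, not the paper's or the demo's code: the names, the scalar color, the two-pass structure and the exact area metric used to pick which node to drop are all choices made for this example.

```python
from bisect import bisect_left


def _drop_cheapest_node(nodes):
    """Remove the node whose removal changes the area under the curve least."""
    def area_change(j):
        # Rectangle lost if node j's step is merged into node j - 1.
        return (nodes[j][0] - nodes[j - 1][0]) * (nodes[j - 1][1] - nodes[j][1])

    j = min(range(1, len(nodes)), key=area_change)
    nodes[j - 1][1] = nodes[j][1]  # underestimate visibility over that span
    del nodes[j]


def build_visibility(depth_alpha_pairs, max_nodes):
    """Return up to max_nodes [depth, transmittance_behind] nodes, sorted by depth."""
    nodes = []
    for depth, alpha in depth_alpha_pairs:
        i = bisect_left([n[0] for n in nodes], depth)
        in_front = nodes[i - 1][1] if i > 0 else 1.0
        for node in nodes[i:]:
            node[1] *= 1.0 - alpha  # everything behind gets attenuated
        nodes.insert(i, [depth, in_front * (1.0 - alpha)])
        if len(nodes) > max_nodes:
            _drop_cheapest_node(nodes)
    return nodes


def transmittance_in_front(nodes, depth):
    """Approximate fraction of light reaching a fragment at this depth."""
    result = 1.0
    for node_depth, trans in nodes:
        if node_depth >= depth:
            break
        result = trans
    return result


def resolve(fragments, max_nodes, background=0.0):
    """fragments: list of (depth, color, alpha); max_nodes must be >= 1."""
    nodes = build_visibility([(d, a) for d, _, a in fragments], max_nodes)
    color = sum(c * a * transmittance_in_front(nodes, d) for d, c, a in fragments)
    total = nodes[-1][1] if nodes else 1.0
    return color + total * background
```

With `max_nodes` at least the fragment count nothing is dropped and this matches a full sort; with fewer nodes, the node that goes is the one contributing least to the transmittance curve rather than simply the farthest one, which is the difference from a K-buffer being described above.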

That said, even simpler heuristics work pretty well in practice. I think Marco's upcoming paper will discuss some of that in more detail as well.

Davros · Aug 5, 2015

Just an update: F1 2015 has Intel-specific features (Advanced Smoke and Blended Skidmarks).
Not sure if these are done on the CPU or IGP.
