Intel ARC GPUs, Xe Architecture for dGPUs [2018-2022]

Did you look at the tweets at all, or just go from the text? Look at the image; it's clearly one of the two recent games. It looks like Shadow.

Seen the image, yes, but I can't recognise the game from that screengrab alone. You might have a point, though, given that I haven't finished the modern Tomb Raider games yet. The one I got furthest in was Rise of the Tomb Raider when it was on PC Game Pass, but I didn't complete it.
 

Wow, Intel has some really strong opinions on RT architecture, and they mostly seem to be saying that AMD is doing it all wrong.
  1. Don’t do BVH traversal on vector units
  2. DXR 1.0 style PSOs are better than DXR 1.1 ubershaders doing inline RT
While Intel’s approach is very similar to Nvidia’s, they’re doing some things differently that could give them a big advantage. SIMD execution is only 8-wide when running RT, meaning fewer and shorter stalls compared to Nvidia’s 32-wide warps. Intel is also sorting rays after each bounce, which would further reduce losses due to divergence.

Alchemist could make things really interesting!
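To put rough numbers on the width argument, here's a toy Monte Carlo model (my own sketch under simplified assumptions, not anything from Intel's disclosure): each ray independently takes one of two shading paths, and a wave pays for every path that at least one of its lanes takes.

```cpp
// Toy model of SIMD shading divergence (illustration only, not vendor code).
// Assumption: a wave that contains rays on both paths must issue both paths.
#include <cstdio>
#include <random>

double wastedFraction(int waveWidth, double rarePathProb, int numRays) {
    std::mt19937 rng(42);
    std::bernoulli_distribution takesRarePath(rarePathProb);
    long long usefulLanes = 0, paidLanes = 0;
    for (int base = 0; base + waveWidth <= numRays; base += waveWidth) {
        int a = 0;
        for (int lane = 0; lane < waveWidth; ++lane)
            a += takesRarePath(rng) ? 1 : 0;
        int b = waveWidth - a;
        int pathsTaken = (a > 0) + (b > 0);   // mixed wave executes both paths
        usefulLanes += waveWidth;             // each lane does one path of real work
        paidLanes   += waveWidth * pathsTaken; // but lanes are issued for every path
    }
    return 1.0 - double(usefulLanes) / double(paidLanes);
}

int main() {
    // With 5% of rays on a rare path, narrow waves stay coherent far more often:
    // the 8-wide config wastes roughly half the issued lanes of the 32-wide one.
    printf("8-wide  wasted lanes: %.1f%%\n", 100 * wastedFraction(8,  0.05, 1 << 20));
    printf("32-wide wasted lanes: %.1f%%\n", 100 * wastedFraction(32, 0.05, 1 << 20));
}
```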
 
  • Don’t do BVH traversal on vector units
  • DXR 1.0 style PSOs are better than DXR 1.1 ubershaders doing inline RT
Everyone and their mother knows AMD's approach is minimalistic at best. And their DXR 1.1 approach is used only to do very modest RT effects that rely on a lot of screen-space methods.
 
Having hardware coherency sorting means they're at "level 4" RT hardware straight out of the gate? Per Imgtech's classification (roughly: level 2 is ray/box and ray/triangle testing in hardware, level 3 adds BVH processing in hardware, level 4 adds coherency sorting), that puts them ahead of level 2 for AMD and level 3 for Nvidia. It'll be very interesting to see how well it benchmarks, even if games are still using 32-wide waves instead of their recommended 8. Going for the widest matrix math units and more advanced RT hardware (on paper at least) versus the competitors is one way to introduce yourself to the market. Could be exciting times in the next 3-5 years for GPUs.
 
Everyone and their mother knows AMD's approach is minimalistic at best. And their DXR 1.1 approach is used only to do very modest RT effects that rely on a lot of screen-space methods.

DXR 1.1 is used in Minecraft and Metro Exodus EE, and Nvidia doesn't have any problem with DXR 1.1. Interesting that Intel finds DXR 1.0 more useful.
 
Having hardware coherency sorting means they're at "level 4" RT hardware straight out of the gate?
No. Intel isn't saying anything about ray sorting. It's shader execution grouping.

Per Imgtech's definition, vs level 2 for AMD and 3 for Nvidia
Nvidia does coherence gathering in their TTU at each traversal step, so it's actually level 4.
https://www.freepatentsonline.com/11157414.html
Imagination didn't know that, because the patent hadn't been published yet at the time.
 
No. Intel isn't saying anything about ray sorting. It's shader execution grouping.

That's true. Ray sorting improves shading efficiency and has the added benefit of speeding up ray traversal too. Intel seems to be more concerned about shading divergence than traversal though.
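To make the sorting benefit concrete, here's a toy C++ sketch (my own illustration with made-up numbers, not Intel's implementation): binning rays by the shader they'll invoke before packing them into waves makes each wave coherent again after a bounce has scrambled them.

```cpp
// Toy sketch of per-bounce ray sorting for shading coherence (illustration only).
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

struct Ray { int shaderId; /* origin, direction, payload ... */ };

// Average number of distinct shaders per wave; 1.0 means fully coherent waves.
double avgShadersPerWave(const std::vector<Ray>& rays, size_t waveWidth) {
    long long total = 0; int waves = 0;
    for (size_t base = 0; base + waveWidth <= rays.size(); base += waveWidth, ++waves) {
        std::vector<int> ids;
        for (size_t i = 0; i < waveWidth; ++i) ids.push_back(rays[base + i].shaderId);
        std::sort(ids.begin(), ids.end());
        total += std::unique(ids.begin(), ids.end()) - ids.begin();
    }
    return double(total) / waves;
}

int main() {
    std::mt19937 rng(1);
    std::uniform_int_distribution<int> shader(0, 15); // assume 16 hit shaders in scene
    std::vector<Ray> rays(1 << 16);
    for (auto& r : rays) r.shaderId = shader(rng);    // incoherent after a bounce

    printf("unsorted: %.2f shaders/wave\n", avgShadersPerWave(rays, 8));
    std::sort(rays.begin(), rays.end(),
              [](const Ray& a, const Ray& b) { return a.shaderId < b.shaderId; });
    printf("sorted:   %.2f shaders/wave\n", avgShadersPerWave(rays, 8));
}
```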
 
That's true. Ray sorting improves shading efficiency and has the added benefit of speeding up ray traversal too. Intel seems to be more concerned about shading divergence than traversal though.
They claim their traversal HW is essentially MIMD, so execution divergence is not an issue. BVH and triangle data divergence could still be a problem though.
 
Wow, Intel has some really strong opinions on RT architecture, and they mostly seem to be saying that AMD is doing it all wrong.
  1. Don’t do BVH traversal on vector units
  2. DXR 1.0 style PSOs are better than DXR 1.1 ubershaders doing inline RT
While Intel’s approach is very similar to Nvidia’s, they’re doing some things differently that could give them a big advantage. SIMD execution is only 8-wide when running RT, meaning fewer and shorter stalls compared to Nvidia’s 32-wide warps. Intel is also sorting rays after each bounce, which would further reduce losses due to divergence.

Alchemist could make things really interesting!

Intel HW has fixed-function dynamic dispatch via BTD (bindless thread dispatch), and their hardware doesn't differentiate between ray generation/any hit/closest hit/miss/intersection shaders, so virtually all ray tracing shaders are callable shaders, which might be a fairly unique setup specific to their HW ...

On AMD HW, all ray tracing shaders are just compute shaders, so it's not a coincidence that they recommend doing inline RT with compute shaders to get the highest performance. The danger behind ubershaders in general is the increased register pressure, so combining them with inline RT may negatively impact performance, but you trade divergent dispatch between distinct shaders for divergent execution within a shader ...
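As a loose C++ analogy for the two programming models (a hypothetical illustration, not driver or HLSL code): DXR 1.0-style dispatch jumps to a small specialized shader per hit, while a DXR 1.1-style ubershader carries every codepath in one body, so its worst-case register footprint applies even on cheap paths.

```cpp
// Two toy dispatch models (C++ analogy only; names are invented for illustration).
#include <cstdio>

struct HitInfo { int shaderId; float t; };

// --- DXR 1.0 analogy: separate specialized shaders, indirect dispatch ---
float shadeDiffuse(const HitInfo& h) { return h.t * 0.5f; }
float shadeGlass(const HitInfo& h)   { return h.t * 0.9f; }
using HitShader = float (*)(const HitInfo&);
static const HitShader shaderTable[] = { shadeDiffuse, shadeGlass };

// --- DXR 1.1 analogy: one ubershader containing every path ---
float shadeUber(const HitInfo& h) {
    // Carrying live state for ALL branches raises peak register pressure,
    // even though each ray only ever executes one of them.
    switch (h.shaderId) {
        case 0: return shadeDiffuse(h);
        case 1: return shadeGlass(h);
        default: return 0.0f;
    }
}

int main() {
    HitInfo h{1, 2.0f};
    printf("table dispatch: %f\n", shaderTable[h.shaderId](h)); // divergent *dispatch*
    printf("ubershader:     %f\n", shadeUber(h));               // divergent *execution*
}
```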
 
Intel HW has fixed-function dynamic dispatch via BTD (bindless thread dispatch), and their hardware doesn't differentiate between ray generation/any hit/closest hit/miss/intersection shaders, so virtually all ray tracing shaders are callable shaders, which might be a fairly unique setup specific to their HW ...

On AMD HW, all ray tracing shaders are just compute shaders, so it's not a coincidence that they recommend doing inline RT with compute shaders to get the highest performance. The danger behind ubershaders in general is the increased register pressure, so combining them with inline RT may negatively impact performance, but you trade divergent dispatch between distinct shaders for divergent execution within a shader ...

AMD is dealing with the double whammy of divergent traversal and divergent shading on the SIMDs. Their current approach is really only appropriate for very trivial RT scenarios. I can’t imagine they will stick to it for RDNA3.

Intel is doing MIMD traversal and is giving the driver/hardware an opportunity to mitigate divergent dispatch. There really isn’t anything AMD’s driver can do to help with divergence within the shader, and they're leaving it up to developers to figure out. It’s not quite a trade-off, as Intel’s approach seems to be objectively more robust for real RT use cases.
 
AMD is dealing with the double whammy of divergent traversal and divergent shading on the SIMDs. Their current approach is really only appropriate for very trivial RT scenarios. I can’t imagine they will stick to it for RDNA3.

Intel is doing MIMD traversal and is giving the driver/hardware an opportunity to mitigate divergent dispatch. There really isn’t anything AMD’s driver can do to help with divergence within the shader, and they're leaving it up to developers to figure out. It’s not quite a trade-off, as Intel’s approach seems to be objectively more robust for real RT use cases.

As for developers, I would not speculate too much on exactly what they're going to do. In the worst-case scenario for Intel HW, developers could very well choose to ignore their advice and hardcode a wave size of 32 with inline RT as the common case ...

A possible argument from AMD and others is that dynamic dispatch is the bigger evil compared to divergent SIMD lane execution. Merging similar shaders might very well be more practical than incurring the overhead of function calls. Divergent shading only starts becoming a bigger issue if you merge dissimilar shaders that share little to no code with each other ...
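A back-of-envelope cost model of that argument (all numbers are made up purely for illustration, not measurements):

```cpp
// Toy per-ray cost comparison: dispatch overhead vs divergence in a merged shader.
#include <cstdio>

int main() {
    const double callOverhead   = 40.0;   // hypothetical cycles per dynamic dispatch
    const double bodyCost       = 200.0;  // cycles for one shader's real work
    const double sharedFraction = 0.9;    // fraction of code two "similar" shaders share

    // Separate shaders: every ray pays the dispatch overhead once.
    double separate = callOverhead + bodyCost;

    // Merged similar shaders: no dispatch; only the unshared 10% can diverge,
    // and a mixed wave executes both variants of that 10%.
    double mergedSimilar = bodyCost * (sharedFraction + 2 * (1 - sharedFraction));

    // Merged dissimilar shaders: nothing is shared; a mixed wave runs both bodies.
    double mergedDissimilar = 2 * bodyCost;

    printf("separate:          %.0f cycles/ray\n", separate);         // 240
    printf("merged similar:    %.0f cycles/ray\n", mergedSimilar);    // 220
    printf("merged dissimilar: %.0f cycles/ray\n", mergedDissimilar); // 400
}
```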
 