"next generation RDNA architecture" with hardware accelerated ray tracing

Status
Not open for further replies.

Alexko

Veteran
Subscriber
Are you saying that MS "gave" or "helped" AMD to implement Ray Tracing in their hardware by donating MS patents to AMD?

For semi-custom designs, customers have the opportunity to provide their own IP blocks to be integrated into the final SoC, but they remain the owners of the IP, and AMD can't use it for other products (at least not without a separate agreement). That's not to say that this is what's happening for RT on the next Xbox, but it's a possibility.
 

DmitryKo

Regular
looks interesting as it doesn't rely on dedicated hardware like nVidia, but rather makes it up to developers how many texture units should be dedicated to the purpose.
There is dedicated hardware for loading BVH tree nodes and testing ray intersections.
It reuses control flow logic from shader units to the TMUs, as well as their caches and memory buses.
It also makes it possible to bypass fixed function intersection testing with shader code in some cases.

OTOH, it seems like BVH tree is built on the host CPU, just like NVidia RTX.
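As the patent reads (roughly), the traversal loop stays on the programmable shaders while node fetch and intersection testing sit in fixed-function blocks next to the TMUs, and the fixed test can be swapped for shader code. A toy Python sketch of that split, just to make the division of labour concrete (all names and the data layout are mine, not the patent's):

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Node:
    lo: tuple                                      # AABB min corner
    hi: tuple                                      # AABB max corner
    children: list = field(default_factory=list)   # empty => leaf
    prim: Optional[int] = None                     # primitive id for leaves

def slab_test(ray_o, ray_d, node):
    """Stand-in for the fixed-function ray/box test sitting next to the TMUs."""
    tmin, tmax = 0.0, float("inf")
    for o, d, lo, hi in zip(ray_o, ray_d, node.lo, node.hi):
        if abs(d) < 1e-12:
            if o < lo or o > hi:
                return False
        else:
            t0, t1 = (lo - o) / d, (hi - o) / d
            if t0 > t1:
                t0, t1 = t1, t0
            tmin, tmax = max(tmin, t0), min(tmax, t1)
            if tmin > tmax:
                return False
    return True

def traverse(root, ray_o, ray_d, test: Callable = slab_test):
    """Shader-style traversal loop; `test` can be replaced by custom shader code,
    mirroring the 'bypass fixed function intersection testing' option."""
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if not test(ray_o, ray_d, node):
            continue
        if node.children:
            stack.extend(node.children)  # next nodes fetched over the TMU data path
        else:
            hits.append(node.prim)
    return hits
```

The point of the sketch is only the control split: the loop is "shader" code, the `test` call is the part the patent hands to dedicated hardware.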
 
There is dedicated hardware for loading BVH tree nodes and testing ray intersections.
It reuses control flow logic from shader units to the TMUs, as well as their caches and memory buses.
It also makes it possible to bypass fixed function intersection testing with shader code in some cases.

OTOH, it seems like BVH tree is built on the host CPU, just like NVidia RTX.

The reuse seems neat, though there are too many mentions of "fixed function" for my taste. We already went through all that with rasterization, and the more programmable it got, the more devs were able to do. What if you want b-splines for hair? Or spheres for bounding volumes? They take less time to rebuild the acceleration tree, and rebuilding is already a major bottleneck and an active area of research.
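To illustrate the point about spheres (my own sketch, nothing from the patent): a ray/sphere test is a single quadratic, and refitting a bounding sphere is one cheap pass over the points, which is why spheres are attractive when the acceleration structure has to be rebuilt every frame.

```python
import math

def ray_hits_sphere(ray_o, ray_d, center, radius):
    """Ray/sphere test: one quadratic, no per-axis slab work."""
    oc = [o - c for o, c in zip(ray_o, center)]
    a = sum(d * d for d in ray_d)
    b = 2.0 * sum(o * d for o, d in zip(oc, ray_d))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    # Require a real root with the far hit in front of the ray origin.
    return disc >= 0.0 and (-b + math.sqrt(disc)) >= 0.0

def bounding_sphere(points):
    """Crude refit: centroid plus max distance. This single pass is why
    sphere hierarchies are quick to rebuild for deforming geometry."""
    n = len(points)
    center = tuple(sum(p[i] for p in points) / n for i in range(3))
    radius = max(math.dist(p, center) for p in points)
    return center, radius
```

A real builder would use a tighter fit (e.g. Ritter's or Welzl's algorithm), but the refit cost stays trivially low either way.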

Still, overall it's just speculation. Is this how AMD does it, is it something completely different, is it two different ways for Sony and MS, did MS do their own functional block design and are they having it included in XSX (just typing that makes me think of middle schoolers picking an online name)? We'll find out, but what about the other things RDNA 2.0 will probably have? MS just updated their DirectX specs to include Mesh and Amplification shaders, I'd bet for reasons involving their console. Variable rate shading in RDNA 2 and both consoles is a given, but it'd be interesting to see if Mesh Shaders were absent from the PS5 (bad news for Sony AND consumers).
 

JoeJ

Veteran
but it'd be interesting to see if Mesh Shaders were absent from the PS5
Ahhm... what's the advantage of mesh shaders? I fail to understand.

My thinking is like: I want displacement mapping, progressive meshes, or any other stuff related to LOD, requiring dynamic meshes.
But for that, isn't it more attractive to generate indices / vertices in compute, reuse them over multiple frames and draw them indirectly?
With mesh shaders i'd need to do the same work every frame. Does not feel attractive, aside from memory savings. What do i miss here?
 

Rootax

Veteran
Ahhm... what's the advantage of mesh shaders? I fail to understand.

My thinking is like: I want displacement mapping, progressive meshes, or any other stuff related to LOD, requiring dynamic meshes.
But for that, isn't it more attractive to generate indices / vertices in compute, reuse them over multiple frames and draw them indirectly?
With mesh shaders i'd need to do the same work every frame. Does not feel attractive, aside from memory savings. What do i miss here?

It's way too technical for me, but I saw a few devs on twitter, like Sebastian Aaltonen, being very happy with mesh shaders (more explanations in the replies) :
 

JoeJ

Veteran
Yeah... getting rid of VS and tessellation stuff is great. This really feels restricted and i never liked any of this. But that's no game changer and does not address my argument about repetitive and redundant per-frame work.

iq's comment is interesting: "It's a nice move towards removing more rasterization-specific hardware (input assembler) and move us all to pure compute and tracing. I like it, but it also means at some point drawing a triangle is going to require understanding groups, threads, barriers, and complex systems..."

To me it feels more the opposite. Understanding groups / barriers etc. is the foundation of parallel programming and has to be understood, while isolated thread abstractions like VS, PS, GS and now RT always feel restricting and wrong to me.
And ofc Mesh Shaders are incompatible with RT. Actually his tracing argument would be more in line with my proposal to cache mesh data in main memory because RT requires this anyways.

So i still feel like missing something, and i assume it would be no big issue if some future hardware lacks the feature.
 

Leovinus

Newcomer
There is dedicated hardware for loading BVH tree nodes and testing ray intersections.
It reuses control flow logic from shader units to the TMUs, as well as their caches and memory buses.
It also makes it possible to bypass fixed function intersection testing with shader code in some cases.

OTOH, it seems like BVH tree is built on the host CPU, just like NVidia RTX.

Excuse my wording. There are obviously fixed functions. What I meant to convey is that it is not divorced from the rest of the hardware, as is the case with nVidia. As I understand it, and granted I'm a layman, the fixed hardware is built into the TMUs themselves as a hybridised solution, offering a somewhat more unified approach than nVidia's fixed hardware implementation. By which logic any increase or decrease in TMU units would suppose a linear increase or decrease in RT capabilities. Additionally, as the RT functions can be engaged or bypassed at will, it's up to the developer how much of the card's resources should be dedicated to RT, making for increased flexibility.

I.e. a card based on this architecture would have RT capabilities regardless of SKU and practicality (say, a low-power mobile chip). It would only ever have more, or less, as I understand the patent.
 

DmitryKo

Regular
any increase or decrease in TMU units would suppose a linear increase or decrease in RT capabilities.
Intersection testing does use the massive memory bandwidth and large caches of the fixed function texture filtering units, but the patent does not indicate any particular ratio between TMUs and RT ALUs.
My reading is that the RT units are implemented as separate blocks on the same memory and control bus, so different SKUs may have arbitrary ratios of TMUs to RT units.
 

JoeJ

Veteran
Additionally, as the RT functions can be engaged or bypassed at will, it's up to the developer how much of the card's resources should be dedicated to RT, making for increased flexibility.
I don't think so. RT 'blocks', or 'cores', take chip area and only serve one purpose. The developer can only adapt to what's given here.
But increased flexibility comes from the option to have programmable traversal, which first-gen RTX probably lacks.
This probably allows e.g. stochastic continuous LOD transitions from simple discrete models. So we can scale to GPU power, performance targets, varying scene complexity, etc. Super important to make RT practical.
On the other hand, this blocks CUs from being used for other tasks during traversal. That could be solved with additional FF blocks that handle 'default' traversal. The patent does not rule such options out.
 

DmitryKo

Regular
what's the advantage of mesh shaders? I fail to understand.
Basically, massively parallel processing with compute shader-like scheduling that is not limited by issue rates of fixed-function geometry hardware.
https://forum.beyond3d.com/threads/direct3d-mesh-shaders.61286/

ofc Mesh Shaders are incompatible with RT.
Mesh Shaders do not run in the raytracing pipeline, they are in the traditional rasterization pipeline just like vertex shaders.

As for meshlets, they're just lists of vertices/primitives which still resolve to individual triangles, and it's rather easy to convert them into traditional index buffers. So I guess it's not a problem to directly consume meshlets for BVH tree creation and ray-triangle look-ups.
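A toy sketch of that meshlet-to-index-buffer conversion (the data layout here is my assumption for illustration, not the actual D3D12 meshlet format):

```python
def meshlets_to_index_buffer(meshlets):
    """Flatten meshlets back into one traditional global index buffer.

    Each meshlet is assumed to carry:
      - "vertices":   a list of global vertex ids referenced by this meshlet
      - "primitives": triangles as triples of *local* indices into "vertices"
    """
    index_buffer = []
    for m in meshlets:
        for a, b, c in m["primitives"]:
            # Resolve local indices to global vertex ids, one triangle at a time.
            index_buffer += [m["vertices"][a], m["vertices"][b], m["vertices"][c]]
    return index_buffer
```

Since the result is an ordinary triangle list, it could feed a BVH build directly, which is the point being made above.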
 
Last edited:

JoeJ

Veteran
Ah, thanks. I had missed the video of the presentation, and after watching it i can see it's a win in any case, even with RT or persistent LOD processing in mind.
Pretty great :)

But what i'm really excited about is how close this is to the GPU driving its own work. They would only need to add support for wider workgroups, and then we could dispatch compute from compute and pass data efficiently through on-chip memory.
I could save at least half of BW this way, and get rid of my small workload problems, together with indirect dispatches that end up doing nothing.

Guess this great future is not that far away... :D
 
Ahhm... what's the advantage of mesh shaders? I fail to understand.

My thinking is like: I want displacement mapping, progressive meshes, or any other stuff related to LOD, requiring dynamic meshes.
But for that, isn't it more attractive to generate indices / vertices in compute, reuse them over multiple frames and draw them indirectly?
With mesh shaders i'd need to do the same work every frame. Does not feel attractive, aside from memory savings. What do i miss here?

I began writing, then realized I'd seen an MS blog post explaining this exact thing in far more detail than a forum post ever could: https://devblogs.microsoft.com/dire...on-shaders-reinventing-the-geometry-pipeline/

No digging through videos and tweets needed if anyone wants it; less sync and more polys are always cool.
 
Last edited:

jlippo

Veteran
Ah, thanks. I had missed the video of the presentation, and after watching it i can see it's a win in any case, even with RT or persistent LOD processing in mind.
Pretty great :)

But what i'm really excited about is how close this is to the GPU driving its own work. They would only need to add support for wider workgroups, and then we could dispatch compute from compute and pass data efficiently through on-chip memory.
I could save at least half of BW this way, and get rid of my small workload problems, together with indirect dispatches that end up doing nothing.

Guess this great future is not that far away... :D
There was also nice videoblog on writing simple Vulkan renderer which had mesh shader pipeline. (From Arseny Kapoulkine with sources in Github etc.)
(Episode 5 starts with Mesh Shaders.)
 