Which does no apparent texture work, so why exactly is it part of a texture processor?
I'm going to give answering this a shot in the dark; break me down if I'm wrong. First I'll lay out the patent's description of how it works as listed earlier, and then I'll follow up with a snippet that was missed earlier but is critical:
(1) A shader sends a texture instruction containing ray data and a pointer to the BVH volume node to the texture address unit.
(2) The texture cache processor uses an address provided by (1) to fetch BVH node data from the cache.
(3) The ray intersection engine performs ray-BVH node type intersection testing using the ray data [from (1)] and the BVH node data [from (2)].
(4) The intersection testing results and indications for BVH traversal are returned to the shader [the original caller from (1)] via a texture data return path.
(5) The shader reviews the intersection results and the indications to decide how to traverse to the next BVH node.
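To make the division of labour concrete, here is a minimal C++ sketch of the loop those five steps describe. Everything here is my own illustration, not the patent's code: IntersectNode stands in for steps (2)-(4) (the fixed-function engine reached through the texture path), while the loop around it is the shader-owned traversal from step (5).

```cpp
#include <cstdint>
#include <stack>

// Hypothetical types; the patent does not specify any of these layouts.
struct Ray { float origin[3]; float dir[3]; float tMax; };
struct NodeResult {
    bool     hit;          // did the ray touch this node at all?
    bool     isLeaf;       // leaves hold triangles, inner nodes hold children
    float    t;            // hit distance for leaf hits
    uint32_t children[4];  // candidate child node pointers (inner nodes)
    int      childCount;
};

// Stand-in for steps (2)-(4): the shader issues a "texture instruction" with the
// ray and a node pointer, the texture cache fetches the node, and the fixed-function
// engine returns results over the texture data return path. Stubbed here.
NodeResult IntersectNode(const Ray& ray, uint32_t nodePtr)
{
    (void)ray; (void)nodePtr;
    return NodeResult{}; // placeholder: always reports a miss
}

// Step (5): the shader itself owns traversal and decides where to go next.
float TraceRay(const Ray& ray, uint32_t rootPtr)
{
    float closestHit = ray.tMax;
    std::stack<uint32_t> pending;  // traversal stack lives in shader registers (VGPRs)
    pending.push(rootPtr);

    while (!pending.empty()) {
        uint32_t node = pending.top();
        pending.pop();

        NodeResult r = IntersectNode(ray, node); // offloaded test, steps (2)-(4)
        if (!r.hit) continue;                    // prune this subtree

        if (r.isLeaf) {
            if (r.t < closestHit) closestHit = r.t;  // keep the nearest hit
        } else {
            for (int i = 0; i < r.childCount; ++i)   // shader picks what to visit next
                pending.push(r.children[i]);
        }
    }
    return closestHit; // equals ray.tMax if nothing was hit
}
```

The point of the split is that the expensive box/triangle math sits behind IntersectNode, while every branch decision stays in programmable code.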
Breakdown
(1) The hybrid approach, using a shader unit to schedule the processing, addresses the issues with solely hardware-based and/or solely software-based solutions.
- So what are the known problems of purely hardware-based solutions? We already know the issue with purely software-based ones (they're too slow).
- IIRC, as noted by @JoeJ, @sebbbi, and other developers, their biggest issue is control over how the rays are cast, for performance reasons.
- With Nvidia's current solution the rays are cast for them; it is an entire black box.
- We have an entire thread about this issue here.
(2) Flexibility is preserved, since the shader unit can still control the overall calculation and can bypass the fixed-function hardware where needed, while still getting the performance advantage of the fixed-function path.
- Engines and renderers today have been using ray casting for some time; the ability to supply a custom intersection shader would allow them to port directly without needing to rework things excessively (see the sketch after the quote below).
https://forum.beyond3d.com/posts/2042744/
- @JoeJ makes a strong case for the need to remove restrictions around triangle intersection, and many of our senior members and mods (@Shifty Geezer) have been on the side of the debate arguing for a flexible ray tracing solution.
https://forum.beyond3d.com/posts/2088217/
The problem: First RTX GPUs have no support for traversal shader because traversal is fixed function. How to deal with it? Leave first gen GPUs behind? Develop multiple codepaths? (The latter won't work. If we do this, we can not have a game that fully utilizes the new GPUs! The compromise has to be towards the old gen. Period.)
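And to illustrate the bypass from (2): because the shader owns that loop, it can substitute its own intersection test at any node, for example a procedural ray-sphere test for non-triangle geometry. This reuses the hypothetical Ray from the sketch above and is again my own construction, not something from the patent.

```cpp
#include <cmath>

struct Ray { float origin[3]; float dir[3]; float tMax; }; // same hypothetical Ray as above
struct Sphere { float center[3]; float radius; };

// A software intersection test the shader can run instead of the fixed-function
// node test: the classic ray-sphere quadratic. Assumes ray.dir is normalized.
// Returns the nearest hit distance, or a negative value on a miss.
float IntersectSphere(const Ray& ray, const Sphere& s)
{
    float oc[3] = { ray.origin[0] - s.center[0],
                    ray.origin[1] - s.center[1],
                    ray.origin[2] - s.center[2] };
    float b = oc[0] * ray.dir[0] + oc[1] * ray.dir[1] + oc[2] * ray.dir[2];
    float c = oc[0] * oc[0] + oc[1] * oc[1] + oc[2] * oc[2] - s.radius * s.radius;
    float disc = b * b - c;
    if (disc < 0.0f) return -1.0f;   // ray misses the sphere entirely
    return -b - std::sqrt(disc);     // nearest of the two intersection points
}
```

A leaf flagged as procedural geometry would simply route to IntersectSphere instead of IntersectNode; a traversal pipeline that is entirely fixed-function has no such escape hatch.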
(3) In addition, by utilizing the texture processor infrastructure, the large buffers for ray storage and BVH caching that are typically required in a hardware ray tracing solution are eliminated, as the existing VGPRs and texture cache can be used in their place, which substantially saves area and complexity in the hardware solution.
- And if I've understood correctly, we may see some silicon savings as a result of going with this method.
Let's take these concepts and look at two important aspects [f____ck me I need to do actual work instead of this]:
a) VRS Tier 2 (or MS version of it)
b) DXR Tier 1.1
VRS Tier 2 - Works well with having better control over ray casting: (https://forum.beyond3d.com/posts/2093848/)
VRS Tier 2
- Shading rate can be specified on a per-draw-basis, as in Tier 1. It can also be specified by a combination of per-draw-basis, and of:
- Semantic from the per-provoking-vertex, and
- a screenspace image
- Shading rates from the three sources are combined using a set of combiners
- Screen space image tile size is 16x16 or smaller
- Shading rate requested by the app is guaranteed to be delivered exactly (for precision of temporal and other reconstruction filters)
- SV_ShadingRate PS input is supported
- The per-provoking vertex rate, also referred to here as a per-primitive rate, is valid when one viewport is used and SV_ViewportIndex is not written to.
- The per-provoking vertex rate, also referred to as a per-primitive rate, can be used with more than one viewport if the SupportsPerVertexShadingRateWithMultipleViewports cap is marked true. Additionally, in that case, it can be used when SV_ViewportIndex is written to.
Screen Space Image (image-based):
On Tier 2 and higher, pixel shading rate can be specified by a screen-space image.
The screen-space image allows the app to create an “LOD mask” image indicating regions of varying quality,
such as areas which will be covered by motion blur, depth-of-field blur, transparent objects, or HUD UI elements. The resolution of the image is in macroblocks, not the resolution of the render target. In other words, the subsampling data is specified at a granularity of 8x8 or 16x16 pixel tiles as indicated by the VRS tile size.
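For reference, wiring this up on the host side looks roughly like the following. The ID3D12GraphicsCommandList5 calls and enums are the real Tier 2 API; the surrounding function, resource names, and combiner choice are my assumptions, and creating/filling the shading-rate image itself is elided.

```cpp
#include <d3d12.h>

// Minimal sketch: drive per-tile shading rate from a screen-space image.
// Assumes `rateImage` was created with one byte per VRS tile (8x8 or 16x16,
// per the reported tile size) and is in the SHADING_RATE_SOURCE resource state.
void ApplyShadingRateImage(ID3D12Device* device,
                           ID3D12GraphicsCommandList5* cmdList,
                           ID3D12Resource* rateImage)
{
    // Confirm Tier 2 support before using the image-based path.
    D3D12_FEATURE_DATA_D3D12_OPTIONS6 opts = {};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS6, &opts, sizeof(opts));
    if (opts.VariableShadingRateTier < D3D12_VARIABLE_SHADING_RATE_TIER_2)
        return; // fall back to per-draw rates only (Tier 1)

    // Combiners merge the three rate sources the quote lists: per-draw,
    // per-provoking-vertex, and the screen-space image. Here the image wins.
    D3D12_SHADING_RATE_COMBINER combiners[2] = {
        D3D12_SHADING_RATE_COMBINER_PASSTHROUGH, // draw rate vs. per-primitive rate
        D3D12_SHADING_RATE_COMBINER_OVERRIDE     // image rate overrides the result
    };
    cmdList->RSSetShadingRate(D3D12_SHADING_RATE_1X1, combiners);
    cmdList->RSSetShadingRateImage(rateImage);
}
```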
This is important because developers may be very specifically fine-tuning their ray casts for a variety of things to maximize performance in concert with VRS. I.e., it makes it a hell of a lot easier to control performance as per above: they can fine-tune where on the image they want more rays, fewer rays, or no rays at all. For example: why bother with ray casting/rendering where the UI is going to block it? When you consider VRS, look back at (1): the shader is the one holding the ray data you want to submit for intersection testing.
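As a toy illustration of that idea (my own construction, not anything from the patent or the VRS spec), the same kind of LOD mask could drive a per-tile ray budget:

```cpp
#include <cstdint>

// Toy example: classify tiles the way a VRS screen-space image does,
// then spend rays only where they are actually visible.
enum class TileClass : uint8_t { FullDetail, MotionBlur, DepthOfField, UnderUI };

int RaysPerPixel(TileClass tile)
{
    switch (tile) {
        case TileClass::FullDetail:   return 4; // spend rays where detail shows
        case TileClass::MotionBlur:   return 1; // blur hides the noise anyway
        case TileClass::DepthOfField: return 1;
        case TileClass::UnderUI:      return 0; // occluded by the HUD: cast nothing
    }
    return 1;
}
```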
Combine all of this with the new flexibility found in DXR 1.1:
Tier 1.1 implementations also support a variant of raytracing that can be invoked from any shader stage (including compute and graphics shaders), but does not involve any other shaders - instead processing happens logically inline with the calling shader. See Inline raytracing.
Tier 1.1 implementations also support GPU initiated DispatchRays() via ExecuteIndirect().
So we see a scenario where, within your standard compute shader rendering pathways, you can freely inline ray tracing or invoke it through ExecuteIndirect(), get the results you need, and continue forward without having to go back to the CPU.
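On the host side, GPU-initiated DispatchRays() through ExecuteIndirect() looks roughly like this. The D3D12_INDIRECT_ARGUMENT_TYPE_DISPATCH_RAYS command signature is the real DXR 1.1 mechanism; the function and buffer names are illustrative, and the assumption is that earlier GPU work (say, a compute pass classifying VRS tiles) already wrote the launch arguments.

```cpp
#include <d3d12.h>

// Sketch: let the GPU decide the ray launch, with no CPU round trip.
void DispatchRaysIndirect(ID3D12Device* device,
                          ID3D12GraphicsCommandList* cmdList,
                          ID3D12Resource* argumentBuffer, // GPU-written D3D12_DISPATCH_RAYS_DESC
                          ID3D12Resource* countBuffer)    // GPU-written dispatch count
{
    // A command signature whose single argument is a full DispatchRays call.
    // With only this argument type, no root signature is passed at creation.
    D3D12_INDIRECT_ARGUMENT_DESC arg = {};
    arg.Type = D3D12_INDIRECT_ARGUMENT_TYPE_DISPATCH_RAYS; // DXR 1.1 addition

    D3D12_COMMAND_SIGNATURE_DESC sigDesc = {};
    sigDesc.ByteStride       = sizeof(D3D12_DISPATCH_RAYS_DESC);
    sigDesc.NumArgumentDescs = 1;
    sigDesc.pArgumentDescs   = &arg;

    ID3D12CommandSignature* signature = nullptr;
    device->CreateCommandSignature(&sigDesc, nullptr, IID_PPV_ARGS(&signature));

    // The GPU has already filled argumentBuffer with launch dimensions and
    // shader table locations, and countBuffer with how many launches to run.
    cmdList->ExecuteIndirect(signature, /*MaxCommandCount=*/1,
                             argumentBuffer, 0, countBuffer, 0);
    signature->Release();
}
```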
A bit of an overview of the ray tracing algorithm in question, for those that need a refresher:
A ray gets sent out for each pixel in question. The algorithm works out which object gets hit first by the ray and the exact point at which the ray hits the object. This point is called the first point of intersection and the algorithm does two things here: 1) it estimates the incoming light at the point of intersection and 2) combines this information about the incoming light with information about the object that was hit.
1) To estimate what the incoming light looked like at the first point of intersection, the algorithm needs to consider where this light was reflected or refracted from.
2) Specific information about each object is important because objects don’t all have the same properties: they absorb, reflect and refract light in different ways:
...
Savvy readers with some programming knowledge might notice some edge cases here.
[*]Sometimes light rays that get sent out never hit anything. Don't worry, this is an edge case we can cover easily by measuring how far a ray has travelled, so that we can stop working on rays that have travelled too far.
[*]The second edge case covers the opposite situation: light might bounce around so much that it slows down the algorithm, or even bounce an infinite number of times, causing an infinite loop. The algorithm keeps track of how many times a ray gets traced after every step, and the ray gets terminated after a certain number of reflections. We can justify doing this because every object in the real world absorbs some light, even mirrors. This means that a light ray loses energy (becomes fainter) every time it's reflected, until it becomes too faint to notice. So even if we could, tracing a ray an arbitrary number of times doesn't make sense.
I asterisked these points because, when you consider (5) in the patent, it's up to the shader after each cast to determine how to traverse to the next node. That should mean they can control when they want to stop bouncing, how many rays they want bounced, and possibly what distance a ray should travel before stopping. It appears to me that this may be easier to handle when determining your zones with VRS; a rough sketch follows.
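Putting (5) and those two edge cases together, here is what that shader-side control could look like, reusing the hypothetical Ray and TraceRay() from the traversal sketch above (EstimateLightAt and ReflectAt are invented helper names; none of this is from the patent):

```cpp
// Per-zone budget, e.g., derived from the VRS-style tile classes above.
struct RayBudget {
    int   maxBounces;   // edge case 2: cap reflections, no infinite loops
    float maxDistance;  // edge case 1: give up on rays that travel too far
};

// Hypothetical helpers standing in for shading and bounce generation.
float EstimateLightAt(const Ray& ray, float t);
Ray   ReflectAt(const Ray& ray, float t);

float ShadePixel(Ray ray, uint32_t bvhRoot, const RayBudget& budget)
{
    float radiance  = 0.0f;
    float travelled = 0.0f;

    for (int bounce = 0; bounce < budget.maxBounces; ++bounce) {
        float t = TraceRay(ray, bvhRoot);          // shader-scheduled traversal, step (5)
        if (t >= ray.tMax) break;                  // edge case 1: the ray hit nothing
        travelled += t;
        if (travelled > budget.maxDistance) break; // shader decides the ray went too far

        radiance += EstimateLightAt(ray, t);       // light estimate at the intersection
        ray = ReflectAt(ray, t);                   // spawn the next bounce ray
    }
    return radiance;
}
```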