Next gen lighting technologies - voxelised, traced, and everything else *spawn*

Work meaning: performance penalties on doing such operations. I'm pretty sure Nvidia isn't trying to redefine what RT is, but offering a more optimized way of doing it. Sure, their bullet-point/PR wording is somewhat hyperbolic, but what new product's selling points aren't? We can't fault Nvidia for presenting data points or metrics when the competition is lagging behind on presenting something to compare them to. I don't believe Nvidia's method will be the be-all and end-all of doing things, but it's a rightful kick in the ass to jump-start RT development and more innovative ways of doing it.
I'm totally in agreement with you here.

Some thoughts on the subject by the Graphics Programming Lead at NetherRealm Studios (Mortal Kombat and Injustice games):

https://deadvoxels.blogspot.com/2018/08/some-thoughts-re-siggraph-2018.html


All that said, it's still not really clear how a lot of this stuff will work in practice. The announced cards are hardly cheap, and it's still unclear where other IHVs like AMD and Intel will fall in the mix. So it's not like we can count on customers having the hardware in large numbers for quite a while... which means hybridized pluggable pipelines (i.e., working your shadowmap shadows into a resolved mask that can be swapped for a high-quality raytraced result).

Even then, it's not clear what best practices are for gamedevs to consider for all sorts of common scenarios we encounter daily at the moment. A straightforward example of this to consider would be a non-bald human hero character standing in a forest.
- Raytracing relies on rebuilds/refits of BVH structures to describe the dynamic elements in the scene, but it's certainly not clear how best to manage that, and it seems that currently no one's really sure.
- Do you separate your dynamic elements into a separate BVH from your statics, to reduce the refit burden? But that means needing to double your ray-testing... probably everywhere.
- Presumably the BVH needs to reside in video memory for the raytracing hardware to be effective, but what's the practical memory consumption expected? How much memory do I have for everything else? Is it fair to assume walking a system-memory-based BVH is something of a disaster? Given the memory reclamation that can happen to an app, I presume one must ensure a BVH can never exceed 50% of total video memory.
- There's some minor allowance for LOD-ish things via ray-test flags, but what are the implications of even using this feature? How much more incoherent do I end up if my individual rays have to decide LOD? Better yet, my ray needs to scan different LODs based on distance from ray origin (or perhaps distance from camera), but those are LODs *in the BVH*, so how do I limit what the ray tests as the ray gets further away? Do I spawn multiple "sub-rays" (line segments along the ray) and give them different non-overlapping range cutoffs, each targeting different LOD masks (see the sketches after this list)? Is that reasonable to do, or devastatingly stupid? How does this affect my ray-intersection budget? How does this affect scheduling? Do I fire all LODs' rays for testing at the same time, or do I only fire them as each descending LOD's ray fails to intersect the scene?
- How do we best deal with texture masking? Currently hair and leaves are almost certainly masked, and really fine-grained primitives almost certainly have to be. I suspect that, while it's supported, manual intersection shaders that need to evaluate the mask are best avoided if at all possible for optimal performance (see the sketches after this list). Should we tessellate out the mask wherever possible? That might sound nice, but it could easily turn into a memory-consuming disaster (and keep in mind, the BVH isn't memory free, and updating it isn't performance free either). It might be tempting to move hair to a spline definition like the film guys do, but that's likely just not practical, as things still have to interop well with rasterization, and updating a few hundred thousand splines, or building an implicit surface intersection shader to infer the intersections, doesn't sound like fun (well, actually it does, but that's beside the point).
- Even something like a field of grass becomes hugely problematic, as every blade is presumably moving and there are potentially millions of the little bastards in a fairly small space. It's basically just green short hair for the ground. Maybe it ends up procedurally defined as suggested before and resolved in an intersection shader, but again... confusing stuff to deal with.
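
To make the "sub-rays with non-overlapping range cutoffs" idea a bit more concrete, here's a minimal DXR HLSL sketch of what it might look like, assuming (purely for illustration) that LOD0/LOD1/LOD2 instances have been tagged with distinct instance masks at TLAS build time. The mask values, distance cutoffs, and the near-to-far sequential firing are all assumptions, not anything from the post:

```hlsl
// Illustrative only: one logical shadow ray split into non-overlapping segments,
// each segment testing only instances whose InstanceMask matches the LOD assumed
// for that distance band. Mask values and cutoffs are made up.
#define LOD0_MASK 0x01  // assumed tag for full-detail instances
#define LOD1_MASK 0x02  // assumed tag for medium-detail instances
#define LOD2_MASK 0x04  // assumed tag for coarse proxies

RaytracingAccelerationStructure SceneBVH : register(t0);

struct ShadowPayload { bool hit; };

[shader("miss")]
void ShadowMiss(inout ShadowPayload p) { p.hit = false; }   // assumed bound at miss index 0

bool TraceSegment(float3 origin, float3 dir, float tMin, float tMax, uint lodMask)
{
    RayDesc ray;
    ray.Origin = origin;  ray.Direction = dir;
    ray.TMin = tMin;      ray.TMax = tMax;

    ShadowPayload p;
    p.hit = true;   // only the miss shader can clear this (closest-hit is skipped)
    TraceRay(SceneBVH,
             RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH | RAY_FLAG_SKIP_CLOSEST_HIT_SHADER,
             lodMask, 0, 1, 0, ray, p);
    return p.hit;
}

// One variant of the scheduling question above: fire segments near-to-far and
// stop as soon as any segment finds an occluder.
bool TraceShadowWithLodRanges(float3 origin, float3 dir)
{
    if (TraceSegment(origin, dir, 0.01,  20.0,  LOD0_MASK)) return true;
    if (TraceSegment(origin, dir, 20.0,  100.0, LOD1_MASK)) return true;
    return TraceSegment(origin, dir, 100.0, 10000.0, LOD2_MASK);
}
```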
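
And for the texture-masking point: the usual DXR route for masked hair/leaves is a non-opaque hit group whose any-hit shader samples the alpha mask for every candidate hit, which is exactly the per-hit cost being worried about. A rough sketch, with the texture, sampler, and vertex/index fetch bindings entirely invented for illustration:

```hlsl
// Illustrative only: an any-hit shader that evaluates an alpha mask per candidate hit.
Texture2D<float4>        gAlbedoAlpha : register(t1);
SamplerState             gMaskSampler : register(s0);
StructuredBuffer<float2> gVertexUVs   : register(t2);   // assumed per-vertex UV buffer
StructuredBuffer<uint>   gIndices     : register(t3);   // assumed index buffer for this geometry

struct MaskedPayload { float3 color; };

[shader("anyhit")]
void MaskedAnyHit(inout MaskedPayload p, in BuiltInTriangleIntersectionAttributes attr)
{
    // Interpolate the triangle's UVs with the hit barycentrics.
    uint base = PrimitiveIndex() * 3;
    float2 uv0 = gVertexUVs[gIndices[base + 0]];
    float2 uv1 = gVertexUVs[gIndices[base + 1]];
    float2 uv2 = gVertexUVs[gIndices[base + 2]];
    float3 bary = float3(1.0 - attr.barycentrics.x - attr.barycentrics.y,
                         attr.barycentrics.x,
                         attr.barycentrics.y);
    float2 uv = bary.x * uv0 + bary.y * uv1 + bary.z * uv2;

    // Any-hit shaders have no derivatives, so sample an explicit mip level.
    float alpha = gAlbedoAlpha.SampleLevel(gMaskSampler, uv, 0).a;
    if (alpha < 0.5)                  // cutoff chosen arbitrarily for the sketch
        IgnoreHit();                  // treat masked-out texels as empty space
    // Otherwise the hit stands and traversal continues as normal.
}
```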

Or maybe these cases get punted on. That would be disappointing, but certainly simplifies things. We rasterize a gbuffer initially and when we need to spawn rays, we just assume our forest is barren, grass is missing, and our character is bald. We correct for these mistakes via current methods, which are hardly perfect, but better than nothing. This makes things a lot more complicated, though:
- You can drop leaves from the trees for shadow casting, but then you're still going to need leaf shadows from some other process - presumably shadowmapping. How do you make the two match up (since presumably raytracing devastates the SM quality comparison)?
- Maybe for AO you trace a near field and a far field, and for the near field you ignore the leaves while for the far field you use an opaque coarse leaf proxy (sketched below)? Maybe this can work for shadows as well in certain cases, if you apply the non-overlapping range-ray idea mentioned earlier, assuming they're going to get softened anyway?
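
A hedged sketch of what that near-field/far-field split could look like in DXR HLSL, assuming the detailed geometry (minus leaves) and the coarse opaque proxies have been tagged with hypothetical instance masks, and reusing the non-overlapping range idea from earlier:

```hlsl
// Illustrative only: near-field AO against detailed geometry (leaves excluded),
// far-field AO against opaque coarse proxies, using non-overlapping ray ranges.
// Instance mask values and distances are placeholders.
#define DETAIL_MASK 0x01   // assumed tag: detailed geometry minus leaves
#define PROXY_MASK  0x02   // assumed tag: coarse opaque tree/foliage proxies

RaytracingAccelerationStructure SceneBVH : register(t0);

struct OcclusionPayload { bool hit; };

[shader("miss")]
void OcclusionMiss(inout OcclusionPayload p) { p.hit = false; }

bool TraceOcclusion(float3 origin, float3 dir, float tMin, float tMax, uint mask, uint flags)
{
    RayDesc ray;
    ray.Origin = origin;  ray.Direction = dir;
    ray.TMin = tMin;      ray.TMax = tMax;

    OcclusionPayload p;
    p.hit = true;
    TraceRay(SceneBVH,
             flags | RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH | RAY_FLAG_SKIP_CLOSEST_HIT_SHADER,
             mask, 0, 1, 0, ray, p);
    return p.hit;
}

// Returns 0 if the direction is occluded, 1 if it reaches open sky within 50 units.
float AmbientVisibility(float3 pos, float3 dir)
{
    // Near field: fine geometry, leaves simply ignored.
    if (TraceOcclusion(pos, dir, 0.01, 2.0, DETAIL_MASK, RAY_FLAG_NONE))
        return 0.0;
    // Far field: hand the remaining range to opaque coarse proxies only.
    if (TraceOcclusion(pos, dir, 2.0, 50.0, PROXY_MASK, RAY_FLAG_FORCE_OPAQUE))
        return 0.0;
    return 1.0;
}
```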

There are all sorts of other problems too, related to BVH generation...
- Say I've got a humanoid, and build the BVH against a T-pose initially. How does the refit handle triangles massively changing orientation? How well does it handle self-intersection (which, sadly, happens more than we might like)? What happens when my character is attempting to dodge and rolls into a ball to jump out of the way? Do these degenerate cases cause spikes, as the BVH degrades and more triangles end up getting tested? Does my performance wildly fluctuate as my character animates due to these issues?
- If I have an open-world game, how do I stream in the world geometry? At some point in the future, when the BVH format is locked down and thus directly authorable, maybe this becomes straightforward, but for now... yikes. Does one have to rethink their entire streaming logic? Maybe a BVH per sector (assuming that's even how you divide the world), although that causes all sorts of redundant ray-fires (sketched below). Maybe you manually nest BVHs by cheating - use the custom intersection from a top-level BVH to choose which of the lower BVHs to intersect, so that you can have disparate BVHs but don't have to ray-fire from the top-most level? Who knows?
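
For the per-sector case, here's a rough illustration (not from the post) of why separate per-sector acceleration structures mean redundant ray-fires: with no single top level to traverse, each logical ray has to be traced into every resident sector's structure and the nearest hit merged by hand. Resource names and the fixed sector count are assumptions:

```hlsl
// Illustrative only: one logical ray becomes one TraceRay per resident sector.
RaytracingAccelerationStructure Sector0 : register(t0);
RaytracingAccelerationStructure Sector1 : register(t1);
RaytracingAccelerationStructure Sector2 : register(t2);

struct HitDistPayload { float t; };   // closest-hit writes the hit distance, miss leaves the sentinel

[shader("closesthit")]
void SectorClosestHit(inout HitDistPayload p, in BuiltInTriangleIntersectionAttributes attr)
{
    p.t = RayTCurrent();
}

[shader("miss")]
void SectorMiss(inout HitDistPayload p)
{
    // Leave p.t at its sentinel value: this sector had nothing along the ray.
}

float TraceNearestAcrossSectors(RayDesc ray)
{
    const float kNoHit = 1.0e27;
    float nearest = kNoHit;
    HitDistPayload p;

    // Three full traversals for a single logical ray. (Shrinking ray.TMax to the
    // current 'nearest' between calls would at least cull some of the work.)
    p.t = kNoHit;  TraceRay(Sector0, RAY_FLAG_NONE, 0xFF, 0, 1, 0, ray, p);  nearest = min(nearest, p.t);
    p.t = kNoHit;  TraceRay(Sector1, RAY_FLAG_NONE, 0xFF, 0, 1, 0, ray, p);  nearest = min(nearest, p.t);
    p.t = kNoHit;  TraceRay(Sector2, RAY_FLAG_NONE, 0xFF, 0, 1, 0, ray, p);  nearest = min(nearest, p.t);

    return nearest;
}
```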

Partial support is certainly better than zero support, but is raytracing as sexy of a solution when it fails to consider your hero character and/or other dynamics? There's an obvious desire for everything's appearance to be unified, but it wasn't soooo long ago that having entirely different appearance solutions for the world and for characters was the norm, and that the focus was primarily on a more believable looking world of mostly static elements (say, all the original radiosity-lightmapped games). Even now there tends to be a push to compromise on the indirect lighting quality of dynamics on the assumption they're a minority of the scene. Perhaps a temporary step backwards is acceptable for an interlude, or maybe that horse has already left the barn?

This post might all sound really negative, but it's really not meant to be that way - raytracing for practical realtime scenarios is still in its infancy, and it's just not realistic for every problem to be solved (or at least, solved well) on day one. And while working with gamedevs is clearly a good call in many cases, it's also a big responsibility of Nvidia and other IHVs not to prematurely lock down the raytracing pipeline to one way of working simply because they talked to one specific dev house with a particular way of thinking and/or dealing with things.

More at the link.
 
Which is rather obvious, given that the RT implementation in BF5 will be tailored to RTX's HW implementation. But this was not the subject of the discussion... which was Nvidia's misleading metrics to compare GPU architectures, claims that "it just works", and that before Turing real-time RT was impossible...
I get where your argument is going; some considerations:

[image: 6n7A81l.png]


According to Jensen there are 3 separate processors on this chip:
- The Turing SM, containing 3 more TF of power than a Titan X.
- The RT Core, which can produce 10 Giga Rays/sec.
- The Tensor Core, at 110 TF at FP16.
To put things into perspective: Titan X is 12B transistors; Turing is 18.9B.

His only fallacy here is to compare the whole 1080 Ti vs. _just_ the RT Core in ray-tracing ability, as we have no way to measure how that would work...
except he provided some context to his words:
[image: l3wJeRw.png]


With the light green bar being RT, the dark green bar being rasterization, and S being shading.

You said his claim was that, before Turing, RT was impossible. In the story that Jensen tells, he is essentially saying the DGX (4 Voltas) rendered the Star Wars video at 55 ms - approximately $66K worth of hardware.

A single 2080 Ti did it faster, at a sub-$1,000 price point, with the 1080 Ti coming in at 308 ms.

I don't think his claim is completely out to lunch here. If these are the benchmarks, Turing has accomplished something that could not previously be accomplished when you consider the price points.
 
More points to add to the discussion:

- What Jensen publicly says is often false, misleading, or GREATLY exaggerated. Case in point: during the same keynote he claimed 3 things that are factually false: Nvidia invented TAA (false), Nvidia invented the GPU (false), Nvidia invented the first RT GPU (false). So whatever he says about anything should be taken with a heavy dose of salt. But then again, some well-known bozo recently said that "truth is not the truth", apparently... so maybe that's that. Anyway, that has always been Jensen's thing, since forever. Met him twice in the early 2000s... charming dude, but full of hot hyperbolic s...
- The Star Wars Reflections demo didn't absolutely "require" 4 Voltas at the time of GDC, and it has obviously been optimised since then. So yeah, it now runs on one Turing GPU (he never specified the SKU)... but it most probably doesn't need 4 Voltas either.

Turing is a friggin' great GPU. But in terms of RT and the usability of this feature for games today, it's the equivalent of the T&L implementation on the GeForce 256 (supposedly the first GPU ever, according to Jensen...).

Just don't take any company's marketing word for granted (especially Nvidia, which has built its public image on grand claims of supposedly earth-shattering, groundbreaking, Guinness Book record-setting inventions...).
 
Turing is a friggin' great GPU. But in terms of RT and the usability of this feature for games today, it's the equivalent of the T&L implementation on the GeForce 256 (supposedly the first GPU ever, according to Jensen...).

This is where I am. The conventional GPU is stellar, a great improvement over the GTX 10xx series, but the rest is just selling the promise of what the technology will deliver in x number of generations, when it's a) more widespread and b) more refined.
 
You’ve done a great job establishing precedent here.

But to the original commentary: in a few months' time we will have real benchmarks on DXR titles, and there will be comparisons of AAA titles on Pascal vs Turing. If the performance gap is massive, then how valid is your original point that DXR can still be supported on non-RT hardware?

You've made a strong argument that marketing could be entirely BS and that we shouldn't trust it. But at the end of the day, we as consumers only play the end product. We can go to the moon looking for reasons why Pascal perhaps won't perform as well as Turing, but at the end of the day what matters is how those games perform with RT on.

And that's the crux of my argument: we can debate and question marketing information, but real game benchmarks will be the ultimate endgame for this discussion. Unless Pascal is able to keep up with Turing in RT applications, your points will invalidate themselves.
 
OK, so we have some actual numbers rather than "look, this bar is over twice as big on the bestest scale" marketing guff. They're saying in this slide that for evaluating N samples per pixel (?) Turing is up to five times as fast. I'm pretty unfamiliar with RT beyond the basics, but is SPP a standard perf measure for RT? I understood RT to be about rays drawn from light emitters and bounces (GigaRays), so why are they talking about SPP? Is this related to a particular RT implementation, or does each pixel, after the various rays are calculated, get a sample table that says, say, "4 rays from source A + 2 rays from source B = blue light"?
 
Thanks for the link. It does reveal much about the issues/tradeoffs involved in ray-tracing and helps bring together what Nvidia hopes to achieve with parts of their new architecture.
 
While I do agree that demoscene productions are awesome (and I even have two releases myself, one of them being real-time ray traced), I do not agree that they can be directly compared to what DXR does. DXR traces polygon soups, which is much harder and more intensive than sphere-tracing primitive geometries like spheres and cubes. There is a very important difference here. =)
 
DXR can be used for sphere tracing with custom intersector shaders.
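
For reference, a minimal sketch of what such a custom intersector could look like: an analytic ray/sphere test written as a DXR intersection shader for procedural AABB geometry, with the sphere buffer layout assumed purely for illustration:

```hlsl
// Illustrative only: one sphere per procedural AABB primitive is an assumption.
struct Sphere        { float3 center; float radius; };
struct SphereAttribs { float3 objectNormal; };

StructuredBuffer<Sphere> Spheres : register(t5);

[shader("intersection")]
void SphereIntersection()
{
    Sphere s = Spheres[PrimitiveIndex()];

    // Solve |o + t*d - c|^2 = r^2 in object space.
    float3 o = ObjectRayOrigin() - s.center;
    float3 d = ObjectRayDirection();
    float a = dot(d, d);
    float b = 2.0 * dot(o, d);
    float c = dot(o, o) - s.radius * s.radius;
    float disc = b * b - 4.0 * a * c;
    if (disc < 0.0)
        return;                                   // ray misses the sphere

    float t = (-b - sqrt(disc)) / (2.0 * a);      // nearest root (far root ignored for brevity)
    if (t < RayTMin() || t > RayTCurrent())
        return;                                   // outside the valid ray interval

    SphereAttribs attr;
    attr.objectNormal = normalize(o + t * d);     // object-space normal at the hit point
    ReportHit(t, /*HitKind*/ 0, attr);
}
```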
 
Yes, certainly. I was implying that we can't compare the performance of sphere-tracing primitives against freely modeled, big polygonal scenes. Anyway, as is already agreed, the (sort of) standardization brought by DXR will allow more devs to bring new ideas, and we will see faster progress now. I think that in a few years ray tracing will become the standard and we will never look back to pure rasterization (or so I hope, at least).

Eventually we'll switch to ray traced voxels and eliminate polygons, I guess.

This all sounds so awesome, much like when the industry switched from CPU only to the first GPUs!
 
While I do agree that demoscene productions are awesome (and I even have two releases myself, one of them being real-time ray traced), I do not agree that they can be directly compared to what DXR does. DXR traces polygon soups, which is much harder and more intensive than sphere-tracing primitive geometries like spheres and cubes. There is a very important difference here. =)

Except that the above linked '5 faces' IS raytracing polygon soups...
 
Developer Q & A -- Metro Exodus dev talks Nvidia RTX: How ray tracing will speed up development and make life harder for monsters
August 24, 2018
There’s a lot more I can’t tell you about just yet, but one man who can is 4A Games’ rendering programmer Ben Archard, who I sat down with earlier this week to chat about all things RTX in upcoming post-apocalyptic, train adventure Metro Exodus. We talked about everything from 4A’s RTX performance targets to how it could potentially change the course of game development as we know it.
https://www.rockpapershotgun.com/20...-rtx-2080-ray-tracing-metro-exodus-interview/
 
In most ray tracing methods, rays are traced out from the camera. The basic reason is that the distribution of pixels seen by the camera is highly skewed, favoring nearby objects. In fact, in a basic landscape scene, the distribution of the area projected to given pixels is asymptotic! This is worse than exponential! This is the "teapot in a stadium" problem.

There are a great many ways of dealing with this, but the most simple and general solution that "just works" for a wide variety of scenes is to just shoot rays out from the camera, and bounce them toward lights to see which ones are occluded, as well as to follow reflections from shiny surfaces, calculate ambient occlusion (here we assume that there is light in all directions, and cast rays out to see how much of that is blocked by nearby objects), and so forth. Now the important metric is how many rays we can shoot out for each pixel: the more rays, the better the scene is sampled. Thus, samples per pixel.
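
A rough (and heavily simplified) DXR-style sketch of why the cost is counted in samples per pixel: each sample is one more camera ray per pixel, and every hit can in turn spawn its own shadow/AO/reflection rays in its hit shaders. The camera constants, matrix convention, payload, and sample count below are placeholders:

```hlsl
// Illustrative only: a skeletal ray-generation shader.
RaytracingAccelerationStructure SceneBVH : register(t0);
RWTexture2D<float4>             Output   : register(u0);

cbuffer CameraCB : register(b0)
{
    float3   CamPos;
    float4x4 InvViewProj;   // assumed: NDC -> world
};

struct ColorPayload { float3 radiance; };

[shader("miss")]
void PrimaryMiss(inout ColorPayload p) { p.radiance = float3(0.4, 0.6, 1.0); }  // placeholder sky

static const uint SamplesPerPixel = 4;

[shader("raygeneration")]
void RayGen()
{
    uint2 pixel = DispatchRaysIndex().xy;
    uint2 dims  = DispatchRaysDimensions().xy;

    float3 sum = 0;
    for (uint s = 0; s < SamplesPerPixel; ++s)
    {
        // Jitter the sample position inside the pixel (a fixed 2x2 pattern for brevity;
        // a real renderer would use a proper low-discrepancy sequence).
        float2 jitter = float2(0.25 + 0.5 * (s & 1), 0.25 + 0.5 * ((s >> 1) & 1));
        float2 ndc    = ((float2(pixel) + jitter) / float2(dims)) * 2.0 - 1.0;

        float4 target = mul(float4(ndc.x, -ndc.y, 1.0, 1.0), InvViewProj);

        RayDesc ray;
        ray.Origin    = CamPos;
        ray.Direction = normalize(target.xyz / target.w - CamPos);
        ray.TMin      = 0.0;
        ray.TMax      = 1.0e27;

        ColorPayload p;
        p.radiance = 0;
        // The closest-hit shader would fire its own shadow/AO/reflection rays, so the
        // frame's ray count is roughly SPP * secondary-rays-per-hit * pixel count.
        TraceRay(SceneBVH, RAY_FLAG_NONE, 0xFF, 0, 1, 0, ray, p);
        sum += p.radiance;
    }

    Output[pixel] = float4(sum / SamplesPerPixel, 1.0);
}
```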
 
With very low sample counts per pixel, how do you decide which lights to test?
The current approach is: use as many of the importance-sampling techniques developed by CGI over the past years as possible, use heavy dithering, and rely on spatial filtering and temporal reprojection on top of it to clean it all up.
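
As a hedged illustration of the importance-sampling part (not any particular engine's method): with roughly one sample per pixel you can weight each light by a cheap contribution estimate, stochastically pick one light per sample using a dither/random value, trace a single shadow ray to it, and divide by the selection probability so the estimate stays unbiased on average; the resulting noise is what the spatial/temporal filtering then has to clean up. The light layout and the inverse-square weighting below are assumptions:

```hlsl
// Illustrative only: weighted stochastic selection of ONE light per sample.
struct PointLight { float3 position; float3 color; float intensity; };

StructuredBuffer<PointLight> Lights : register(t4);
cbuffer LightInfoCB : register(b1) { uint LightCount; };   // assumed >= 1

float LightWeight(PointLight l, float3 shadePos)
{
    float3 toLight = l.position - shadePos;
    return l.intensity / max(dot(toLight, toLight), 1e-4);   // crude 1/d^2 estimate
}

// rnd is a per-pixel dither/random value in [0,1). Returns the chosen light index
// and the probability it was chosen with; the caller traces one shadow ray to
// Lights[index] and multiplies the lit contribution by 1/probability.
uint PickLight(float3 shadePos, float rnd, out float probability)
{
    float total = 0;
    for (uint i = 0; i < LightCount; ++i)
        total += LightWeight(Lights[i], shadePos);

    float pick  = rnd * total;
    float accum = 0;
    uint  chosen  = 0;
    float chosenW = 0;
    for (uint j = 0; j < LightCount; ++j)
    {
        float w = LightWeight(Lights[j], shadePos);
        accum  += w;
        chosen  = j;       // remember the last light walked past, in case of FP slop
        chosenW = w;
        if (pick <= accum)
            break;
    }

    probability = chosenW / max(total, 1e-6);
    return chosen;
}
```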
 