Next gen lighting technologies - voxelised, traced, and everything else *spawn*

It's a tightrope of ambitions and compromises. So fixed function HW is always a bit short sighted, because it focuses on today's algorithmic solutions, which are rarely (actually never) the right fit for tomorrow's problems. Now, I'm not against all forms of HW acceleration for current trends, I just think it's necessary to exercise caution on how specific and limiting such architectural choices are.

In Port Royal and Quake 2 the RTX2060 is as fast as or faster than a TitanV. How exactly are RT Cores "short sighted" when you need twice the die size for the same performance?
Fixed-function units are bringing raytracing to the mainstream market. Isn't that the opposite of "short sighted"?!
 
In Port Royal and Quake 2 the RTX2060 is as fast as or faster than a TitanV. How exactly are RT Cores "short sighted" when you need twice the die size for the same performance?
Fixed-function units are bringing raytracing to the mainstream market. Isn't that the opposite of "short sighted"?!
As said, it brings a short-term push in performance, but in the long run the FF limitations hinder further progress. So two years later you would have reached FF performance anyway, but with FF in place, further progress is only possible through the hardware vendor.

Practical example:

The RT core is 2x faster than software on the previous gen. (In ideal cases it's more than just 2, but that's what we see in practice because RT is just a part of the work necessary to build a frame.)
But the RT core is limited to static topology. It is not possible to reduce detail with distance, so even far away the RT core has to traverse the fully detailed mesh, although what we really want IS reduced detail at distance due to microfacets.

With software tracing we can use a data structure that allows us to adapt detail at will. Better image quality overall, and much better performance.
Tune it to the performance budget you have and a factor-of-10 increase is possible. This is what software can do, but what FF makes hard or impossible to achieve.
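To make 'adapt detail at will' a bit more concrete, here is a rough sketch only - the node layout, the cone-spread parameter and the proxy test are all made up for illustration, not taken from any shipping tracer. The point is just that the traversal, not the hardware, decides the level of detail:

```cpp
// Sketch: a software traversal that stops descending once a subtree is
// smaller than the ray's footprint, and hits a cheap proxy instead of
// the full-detail mesh. Everything here is hypothetical.
#include <cmath>

struct Vec3 { float x, y, z; };

struct Node {
    Vec3   center;
    float  radius;          // bounding-sphere radius of this subtree
    Node*  child[2];        // nullptr at a leaf
};

struct Ray { Vec3 org, dir; float coneSpread; };   // footprint growth per unit distance
struct Hit { float t = 1e30f; bool proxy = false; };

static float distanceTo(const Node& n, const Ray& r)
{
    // Crude stand-in: distance from ray origin to node center.
    Vec3 d{ n.center.x - r.org.x, n.center.y - r.org.y, n.center.z - r.org.z };
    return std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
}

// Stand-ins for real intersection routines.
static bool intersectProxy(const Node&, const Ray&, Hit& h)     { h.proxy = true;  return true; }
static bool intersectTriangles(const Node&, const Ray&, Hit& h) { h.proxy = false; return true; }

static bool trace(const Node* n, const Ray& ray, Hit& hit)
{
    if (!n) return false;

    // Footprint of the ray (or cone) at this node's distance.
    float footprint = distanceTo(*n, ray) * ray.coneSpread;

    // If the whole subtree is smaller than what the ray can resolve,
    // stop here and hit a cheap proxy instead of the detailed geometry.
    if (n->radius < footprint)
        return intersectProxy(*n, ray, hit);

    if (!n->child[0])                         // detailed leaf
        return intersectTriangles(*n, ray, hit);

    bool a = trace(n->child[0], ray, hit);    // no ordering or early-out shown
    bool b = trace(n->child[1], ray, hit);
    return a || b;
}
```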

A factor-of-2 speedup is something you expect from everyday software optimization. For FF hardware it is more disappointing than impressive.
The problem is that GPUs have a wide field of application, even within a game. So FF that can do only one specific task needs to be very fast to justify itself, because the alternative would be a smaller, cheaper chip or more general-purpose performance.

Of course it's just the beginning of RT in games and we can't expect everything to be perfect, but just accepting all the issues and being happy is not the way of progress, so it's worth mentioning and discussing.
 
Practical example:

The RT core is 2x faster than software on the previous gen. (In ideal cases it's more than just 2, but that's what we see in practice because RT is just a part of the work necessary to build a frame.)
But the RT core is limited to static topology. It is not possible to reduce detail with distance, so even far away the RT core has to traverse the fully detailed mesh, although what we really want IS reduced detail at distance due to microfacets.
Do you have proof of this claim that is published and that we can see? So you’re saying that BFV would run at 30 fps on a compute-based RT solution on a 2080TI today?

And that with a software-based data structure you’re going to obtain a 10x improvement over that, resulting in a 5x improvement over FF hardware today?

I’m just confused as to why it hasn’t been done already. If your friends at Nvidia know about it, why bother with FF? They could have built this directly into GameWorks and had it run on all their hardware.
 
I think the counterargument to that fair suggestion is that ray-tracing is actually bogged down in legacy thinking. It's a visualisation concept first described in 1968, and implemented in 1979. Computation options were limited. Data storage was incredibly limited. The very notion of parallelisable workloads didn't exist - processors were single-threaded rather than having thousands of integrated cores.

Visualisation can take a step back from all the old ways of doing things, like representing everything as triangles, and see what other options are available, exploring the new paradigms presented by multicore hardware and vast quantities of fast storage. These developments are only years old, not decades old, and the argument would be to keep working on new ideas rather than trying to perfect old ideas. That doesn't mean the new ideas won't gravitate back towards the old concepts, but it means they aren't tied to them. The moment the hardware prescribes a way of doing something, R&D for the following years/decades ends up being tied to that. What if SDF for graphics had appeared in 1982...would we be looking at whole games and hardware and tool solutions directed towards that, with decades of research solving the limitations of SDF, while someone starts exploring 'representing models as triangle meshes' and writing their own triangle modeller because none exists in a world of Maya SDF etc?

Offering the most flexible solutions, even if not the most performant, will provide the best opportunities for new, ideal paradigms to develop, which the hardware can then be tailored towards. Offering a fixed approach to a particular solution will instead get better performance for that solution now, but constrict the options being explored for years to come - at least if history and common sense are to be followed. It'll take devs eschewing the HW options, such as MM ignoring the rasterising portions of the GPU while exploring their splat-based renderer, to explore other options, which is counter-intuitive. You need to get a game out there looking good; the hardware provides XYZ features to do that; let's use those features and create an engine to run well using the hardware provided.
Fixed hardware for the majority of developers and consumers who want speed, and compute for those who want to experiment. So, RTX.

The question is: which game has more realistic lighting? Minecraft has shown the most accurate lighting in this thread. It's the only one shown which has infinite bounces. (Q2 has just one.)
In this regard Minecraft is ground truth. Ignoring the spatial limitation of the surface cache (which is not visible to me), an offline pathtracer rendering for hours could NOT do better! It is the most impressive lighting shown here.
What remains incorrect is likely the perfectly diffuse material, which does not exist in reality, but if there were perfect cubes in reality, they would look exactly like this.

But if you turn the lack of geometric detail against it, then i could do just the same with Quake 2 RTX. It will drop performance a lot if you try this within BFV, i'm sure of that. Or do you believe lies like 'geometric complexity does not matter'? And you need at least 5 bounces to be realistic, not one.


I can only do the Q2 stuff, not BFV! Only triangle raytracing can show exact reflections of triangles, but i use discs to approximate stuff.
The reason you did not see this on PS4 would be: I'm coding too slowly. (In this sense i better close the page for today now... ;D )
Beyond the already discussed limitations of the Minecraft approach, there's another reason why the cheap infinite-bounces technique (which is doable with DXR as well) isn't applicable to other games, especially shooters: it's laggy, very laggy:


Exactly, and we've been there, and graphics programming has matured substantially; it's taken 20 years to do so. And besides the sheer brute-force improvements granted by microchip tech evolution, one of the biggest reasons game graphics improved so much was clever dev approximations enabled by HW that's more programmable every gen. The march of progress has gone hand in hand with flexibility. Going to overly specific FF solutions goes against the current that has given us the most progress recently.
"Overly specific". It isn't T&L. It's just an extra operation you can do. What you use those rays for is entirely up to you. Lighting, antialiasing, audio, physics... It gives you the performance to try things you couldn't do otherwise. It opens doors, it doesn't close them.

As said, it brings a short-term push in performance, but in the long run the FF limitations hinder further progress. So two years later you would have reached FF performance anyway, but with FF in place, further progress is only possible through the hardware vendor.

Practical example:

The RT core is 2x faster than software on the previous gen. (In ideal cases it's more than just 2, but that's what we see in practice because RT is just a part of the work necessary to build a frame.)
But the RT core is limited to static topology. It is not possible to reduce detail with distance, so even far away the RT core has to traverse the fully detailed mesh, although what we really want IS reduced detail at distance due to microfacets.

With software tracing we can use a data structure that allows us to adapt detail at will. Better image quality overall, and much better performance.
Tune it to the performance budget you have and a factor-of-10 increase is possible. This is what software can do, but what FF makes hard or impossible to achieve.

A factor-of-2 speedup is something you expect from everyday software optimization. For FF hardware it is more disappointing than impressive.
The problem is that GPUs have a wide field of application, even within a game. So FF that can do only one specific task needs to be very fast to justify itself, because the alternative would be a smaller, cheaper chip or more general-purpose performance.

Of course it's just the beginning of RT in games and we can't expect everything to be perfect, but just accepting all the issues and being happy is not the way of progress, so it's worth mentioning and discussing.
You can use alternative data structures with DXR. Use triangles (and therefore the RT cores) where they count and something else for the rest of the scene. That's flexibility.
 
Do you have proof of this claim that is published and that we can see? So you’re saying that BFV would run at 30 fps on a compute-based RT solution on a 2080TI today?
No, and i did not say so. I only made an example based on an arbitrarily chosen factor of 2, based on the given argument that the TitanV has twice the die size vs. the 2060 but similar or worse RT performance.
I don't think BFV is possible with compute because it shows sharp and exact reflections of triangles, which would not work with alternative geometry. I don't believe compute tracing of triangles is the best idea because they take too many registers.

But what i'm trying to say is: in real-life materials, sharp reflections are very rare. If we could, we would trace cones, not rays, to approximate material reflectance better with less noise and fewer rays. So it makes sense to reduce LOD not only for performance but also for image quality.
If we had a software algorithm that utilizes this, it could beat RTX in both performance and IQ, and it would fit most materials better, but it cannot handle perfect mirrors. Depending on the scene, we would still prefer this in most cases.
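Just as an illustration of the cone idea (the roughness-to-angle mapping and the LOD step below are arbitrary choices of mine, not taken from any real engine): widen the footprint with material roughness and distance, then pick a geometry or prefilter LOD from it.

```cpp
// Sketch of "trace cones, not rays": derive a cone angle from roughness,
// grow the footprint with distance, and select an LOD from it.
#include <cmath>
#include <algorithm>

struct ReflectionCone {
    float halfAngle;   // radians
    float radiusAt(float distance) const { return distance * std::tan(halfAngle); }
};

ReflectionCone coneFromRoughness(float roughness)
{
    // Perfect mirror -> near-zero angle; rough material -> wide cone.
    return { roughness * roughness * 0.5f };       // illustrative mapping only
}

int lodFromFootprint(float footprintRadius, float finestFeatureSize, int maxLod)
{
    // Assume each LOD level doubles the feature size it can represent.
    float level = std::log2(std::max(footprintRadius / finestFeatureSize, 1.0f));
    return std::min(static_cast<int>(level), maxLod);
}

// Usage example: a cone for roughness 0.4 hitting geometry 10 m away.
// ReflectionCone c = coneFromRoughness(0.4f);
// int lod = lodFromFootprint(c.radiusAt(10.0f), 0.01f, 8);
```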

... if Navi has no RT cores, i'll work on this. I already know how and it is likely very fast. If it has RT cores i won't, because RT cores need work. It might make sense to combine both to get the best of both worlds eventually.

And that with a software-based data structure you’re going to obtain a 10x improvement over that, resulting in a 5x improvement over FF hardware today?

As said earlier, if i compare my compute ray numbers for GI on FuryX with the given 2080Ti numbers from Remedy's video, they pretty much match up. (Hard to compare!)
But this already includes LOD and alternative geometry utilization, so i can't beat RTX, only match it. With an old GPU, though.
Also, for GI you do not need much accuracy. It is not fair to compare my optimized geometry vs. what RTX achieves with BFV. I try to mention this all the time.

My 10x number comes from an example of typical speedups achieved by software optimization vs. the speedup we see with RTX and FF hardware.
I can still make my stuff faster. It can't saturate Fury yet - i need to add async compute, i need to add shader model 6.0 stuff, i could do everything with fp16 on newer hardware, etc. Low-level optimizations are still missing.
But the important thing is: I do not need so many rays. Also i don't need denoising. And that's why i'm faster than Q2 and why i can do this with first-gen GCN.
(But i do not compete with RTX - it's still useful to add details and can be fully utilized for that. Also the work on denoising presented in Q2 is just as useful when combined with my tech. It's all great! Just nit-picking here :) )

All i say here is under the assumption that i make various trade-offs, like the said restriction on sharp reflections, but also things like ignoring transparency. I sacrifice this 'minor' stuff to achieve an overall better approximation of reality. So i tailor to my specific requirements, which is what games will keep doing in any case.
And all is said only to give examples of what would have happened if there were no RT HW, to illustrate that less performance but more flexibility would have been better IMHO.

If you say: "No! I want perfect sharp reflections most of all! And i don't give a s**t about infinite bounces, or area lights and shadows!" then this still does not necessarily mean i'm wrong, you're just talking to the wrong guy. Others would come up with completely different solutions if they put their priorities on this.
And RTX can't do everything either. You still need to choose between GI, or soft shadows, or crisp reflections, or refractions... Atomic Heart shows the most 'effects', but no complete lighting solution, no dynamic GI. NV showed an individual denoiser pass per effect or even per light! Imagine how that adds up!

But i've said this much too often already. Just because of one new guy on the thread it all came up again - my fault, admitted. It's as exhausting for me as it is for you. So i'll no longer defend myself - it's just rays ;D
 
Beyond the already discussed limitations of the Minecraft approach, there's another reason why the cheap infinite-bounces technique (which is doable with DXR as well) isn't applicable to other games, especially shooters: it's laggy, very laggy:
There's something important to separate here:

Dostal's work is a diffusion approach. It works by having a large grid with geometry inserted as obstacles.
(... thus inheriting all the limitations we know from VCT, like light leaking. I tried this too, and it's one of my many failures, because it can't handle thin walls etc. Also i never liked static grids. I don't like SDF either for this reason... just to clarify, i likely already know all the limitations you'll point out to me. I've learned them the hard way.)
The light is then advanced one cell per iteration. So if you have a cell size of 0.5 m and do only one iteration per frame, light moves 0.5 m per frame, i.e. 30 m/s at 60 fps. The speed of light is thus a bit slower than in reality :)
But the beauty here is: it is one of the very few techniques that avoids the need to calculate visibility, which is usually the most expensive part of the rendering equation and is mostly done by raytracing.
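Very roughly, and with the grid layout and weights made up by me rather than taken from Dostal's actual implementation, the core step looks something like this:

```cpp
// Sketch of the diffusion idea: light lives in a regular grid and each
// iteration advances it by one cell, skipping cells flagged as geometry.
#include <vector>

struct LightGrid {
    int nx, ny, nz;
    std::vector<float> light;     // radiance per cell
    std::vector<bool>  blocked;   // geometry inserted as obstacles

    int idx(int x, int y, int z) const { return (z * ny + y) * nx + x; }

    void diffuseStep()
    {
        std::vector<float> next(light.size(), 0.0f);
        for (int z = 1; z < nz - 1; ++z)
        for (int y = 1; y < ny - 1; ++y)
        for (int x = 1; x < nx - 1; ++x) {
            int i = idx(x, y, z);
            if (blocked[i]) continue;                 // light does not enter geometry
            float sum = 0.0f; int n = 0;
            const int nb[6] = { idx(x-1,y,z), idx(x+1,y,z), idx(x,y-1,z),
                                idx(x,y+1,z), idx(x,y,z-1), idx(x,y,z+1) };
            for (int j : nb)
                if (!blocked[j]) { sum += light[j]; ++n; }
            // Keep part of the cell's own energy, gather the rest from neighbours.
            next[i] = 0.5f * light[i] + (n ? 0.5f * sum / n : 0.0f);
        }
        light.swap(next);
    }
};
// With 0.5 m cells and one diffuseStep() per 60 Hz frame, light fronts
// advance roughly 0.5 m * 60 = 30 m/s, which is exactly the lag discussed above.
```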

I think the technique is interesting for volumetric lighting. At least i have no better idea than that. I like his work.

EDIT: You'll see lag often, but in this case it is worse than in others and has different causes.

You can use alternative data structures with DXR. Use triangles (and therefore the RT cores) where they count and something else for the rest of the scene. That's flexibility.
Yep. But the price is high for maintaining a second BVH just for a few sharp reflections in the image. If only i could share this. But i'll do so - no choice.
 
But i've said this much too often already. Just because of one new guy on the thread it all came up again - my fault, admitted. It's as exhausting for me as it is for you. So i'll no longer defend myself - it's just rays ;D
I think I'm trying to get a better grasp of what you're trying to achieve. I apologize if it came out that I was attacking you. I'm just looking at how something could be so easily overlooked.
No, and i did not say so. I only made an example based on an arbitrarily chosen factor of 2, based on the given argument that the TitanV has twice the die size vs. the 2060 but similar or worse RT performance.
I see where your numbers are coming from.
It may be critically flawed to assume this, because the Titan V has 640 tensor cores, ~110 teraflops of tensor performance (I suppose MADD). Without knowing how Nvidia is leveraging tensor cores for RT processing (we only assume it's used for ML), we could be entirely wrong.
Nvidia specifically restricts DXR to Volta and Turing: Volta without RT cores, but with Tensor Cores; Turing with both.

This might be an important distinction all of us have been missing here.

Perhaps this is a more useful benchmark:
TitanV vs TitanRTX.

https://www.techradar.com/news/last...frame-rates-with-ray-tracing-in-battlefield-v

The Titan V may not have RT cores, but it's a very powerful GPU which actually has more shader units than the Titan RTX: 5,120 as opposed to 4,608, which seemingly helps the cause when it comes to dealing with real-time ray tracing.

That’s only a mere 7 fps slower than the Turing card, a pretty impressive result – but then, as Wccftech (which spotted this) points out, that particular map is a snow-fest, and there aren’t many reflective surfaces around (so it’s less taxing in terms of ray tracing demands).

On the Rotterdam map, the difference between the two cards is more pronounced, with the Titan V hitting 56 fps with ray tracing on high (and ultra details) compared to 81 fps for the Titan RTX. And with ray tracing on medium, the last-gen Titan managed 67 fps compared to 97 fps for the Titan RTX.
So on busier maps, where ray tracing comes into effect, we can see the result of including RT cores, as the BVH hardware acceleration is going to be necessary here: approximately a 41-45% improvement.

A more scientific testing method comparing the 2080TI RTX / Titan RTX / Titan V:
https://www.overclock3d.net/news/gp..._rtx_2080_ti_when_ray_tracing_battlefield_v/1
 
There's something important to separate here:

Dostal's work is a diffusion approach. It works by having a large grid with geometry inserted as obstacles.
(... thus inheriting all the limitations we know from VCT, like light leaking. I tried this too, and it's one of my many failures, because it can't handle thin walls etc. Also i never liked static grids. I don't like SDF either for this reason... just to clarify, i likely already know all the limitations you'll point out to me. I've learned them the hard way.)
The light is then advanced one cell per iteration. So if you have a cell size of 0.5 m and do only one iteration per frame, light moves 0.5 m per frame, i.e. 30 m/s at 60 fps. The speed of light is thus a bit slower than in reality :)
But the beauty here is: it is one of the very few techniques that avoids the need to calculate visibility, which is usually the most expensive part of the rendering equation and is mostly done by raytracing.

I think the technique is interesting for volumetric lighting. At least i have no better idea than that. I like his work.

EDIT: You'll see lag often, but in this case it is worse than in others and has different causes.


Yep. But the price is high for maintaining a second BVH just for a few sharp reflections in the image. If only i could share this. But i'll do so - no choice.
Even in the Minecraft videos you can see a lot of lag. Dostal's video just shows it more clearly.

If detail is not a concern you could just use a lower LOD for the BVH and use it for everything. Ray trace at a low resolution and upsample as well.
 
I apologize if it came out that I was attacking you.
No, no. Never perceived it like that. Discussion is always good, and i can understand your doubt very well. (Not just against my claims, but against RT in games without HW backing in general.) Taking a step back and looking at it from a distance, i even doubt myself too, haha :)
Truth is somewhere in the middle, most likely.
It may be critically flawed to assume this, because the Titan V has 640 tensor cores, ~110 teraflops of tensor performance (I suppose MADD). Without knowing how Nvidia is leveraging tensor cores for RT processing (we only assume it's used for ML), we could be entirely wrong.
Nvidia specifically restricts DXR to Volta and Turing: Volta without RT cores, but with Tensor Cores; Turing with both.

This might be an important distinction all of us have been missing here.
I don't think tensors are related yet. Initially i had assumed NV would offer denoising via tensors as a GameWorks lib, but it seems that's not done yet. BFV definitely does not use them, and it was the game spurring the TitanV vs. GTX discussion.
BFV will use them for upscaling soon, but there's no word about denoising. I was also surprised that Schied's work uses regular shader cores here. There is no hint that tensors have been used for denoising anywhere yet. Likely also not in the older Star Wars demo.
The Star Wars demo is the only one which allowed GTX 10XX as well. So you can compare nothing vs. the advanced compute scheduling of Volta vs. Turing. We discussed this somewhere here too, involving a lot of confusion about native res vs. MLAA, some math errors from me and conflicting benchmark results.
Did they use Tensors for the dancing robot or Porsche? Who knows?
(we only assume it's used for ML)
You cannot use them with DX12 or VK, so certainly no game can use them yet other than via NV libs like MLAA. Pretty sure of that. We found an interface for CUDA but nothing else some weeks ago.
 
No, no. Never perceived it like that. Discussion is always good, and i can understand your doubt very well. (Not just against my claims, but against RT in games without HW backing in general.) Taking a step back and looking at it from a distance, i even doubt myself too, haha :)
Truth is somewhere in the middle, most likely.

I don't think tensors are related yet. Initially i had assumed NV would offer denoising via tensors as a GameWorks lib, but it seems that's not done yet. BFV definitely does not use them, and it was the game spurring the TitanV vs. GTX discussion.
BFV will use them for upscaling soon, but there's no word about denoising. I was also surprised that Schied's work uses regular shader cores here. There is no hint that tensors have been used for denoising anywhere yet. Likely also not in the older Star Wars demo.
The Star Wars demo is the only one which allowed GTX 10XX as well. So you can compare nothing vs. the advanced compute scheduling of Volta vs. Turing. We discussed this somewhere here too, involving a lot of confusion about native res vs. MLAA, some math errors from me and conflicting benchmark results.
Did they use Tensors for the dancing robot or Porsche? Who knows?

You cannot use them with DX12 or VK, so certainly no game can use them yet other than via NV libs like MLAA. Pretty sure of that. We found an interface for CUDA but nothing else some weeks ago.
DXR makes API calls to the drivers. The drivers can leverage whatever hardware Nvidia wants; the developers will not have access to it. I mean, Nvidia has screwed over data scientists since they discovered their mistake with Maxwell. The 2080TI has nearly 104 teraflops of tensor power, but its FP32 accumulate runs at half rate. They did this with previous generations too, making sure only the Titan series had full rate on certain aspects. The 2080TI and Titan RTX are very close in video game performance, but actual neural network training performance is a different story entirely for prosumer work.

Here is the video comparing 2080TI vs 2080 vs 1080TI

Effectively 1-3fps vs ~30fps.
 
Even in the Minecraft videos you can see a lot of lag. Dostal's video just shows it more clearly.
No - there's a difference. Personally i have not spotted lag in static Minecraft. But it must be there due to caching. With caching you typically resolve one bounce per frame, so there is a lag of one frame for each of the other infinite bounces.
You can also say: after running it for 1000 frames, i have 1000 bounces. Only after infinite frames do you have infinite bounces. (We use marketing lies too, ha!) It goes unnoticed because after 5-10 bounces the energy falls below what humans can see, and a bit later it falls below what floating-point precision can represent.
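A quick back-of-the-envelope check of that 5-10 bounces claim, assuming an average surface albedo of 0.5 (just an assumed number, not measured from any game):

```cpp
// The energy carried by bounce n is roughly albedo^n; with albedo = 0.5 it
// falls below an 8-bit quantization step (1/255) after 8 bounces.
#include <cstdio>
#include <cmath>

int main()
{
    const float albedo = 0.5f;
    for (int bounce = 1; bounce <= 12; ++bounce) {
        float energy = std::pow(albedo, static_cast<float>(bounce));
        std::printf("bounce %2d: %.5f %s\n", bounce, energy,
                    energy < 1.0f / 255.0f ? "(below 8-bit precision)" : "");
    }
}
```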

But because lighting is mostly static, you can go even further and update changing areas more frequently than static areas. So stochastic updates. Which is another form of lag then.

You also see lag from denoising in screen space, like noise becoming blurry reflections only after some time, or ghosting around characters. That's just the same trick, but here you reconstruct from fragments of information over time (path tracing, or classic raytracing -> not cache efficient).
In contrast, a radiosity method as above typically calculates the complete environment of a sample in one go (which can be much more cache efficient), but it does not necessarily update all samples each frame.

So it is clearer in Dostal's video because it simulates how light diffuses over time at reduced speed, while everywhere else the lag is just the price for realtime and light reflects instantly over any distance.

The question is: in which method is the lag most acceptable?
RT denoising in screen space? No - it causes temporary noise at some spots - drags you out of the illusion all the time.
RT denoising in texture space? Likely much better but can't fix it - we have not seen it yet.
Diffusion in world space? Good as long as no fast action is going on.
Stochastic radiosity method? Most acceptable because it changes smoothly. Totally acceptable if you inject direct lighting the traditional way (limited to ugly point lights, but enough for flashlights, car lights, stuff that moves).

Make your choice, but lag is unavoidable and is the key to making anything realtime.
 
Effectively 1-3fps vs ~30fps.
Somehow most impressive at 3 fps - you can see it's a lot of work! :D
But there's this: the 1080 has no MLAA so it renders at native 4K, while RTX renders only half the pixels. Some benchmarks gave the 1080Ti 5 fps, others 10 fps. (Just to let you know.)
DXR makes API calls to the drivers. The drivers can leverage whatever hardware Nvidia wants.
It is not thaaaat blackboxed. You have to implement your denoisers with regular compute shaders; the driver cannot translate this to tensors, you'd need to code for this explicitly.
Unfortunately i do not know the cost of denoising. (One could test with Q2.) But we can likely expect a future speedup from tensors. Maybe NV's GameWorks relies on DirectML too, which is not ready yet?
 
No - there's a difference. Personally i have not spotted lag in static Minecraft. But it must be there due to caching. With caching you typically resolve one bounce per frame, so there is a lag of one frame for each of the other infinite bounces.
You can also say: after running it for 1000 frames, i have 1000 bounces. Only after infinite frames do you have infinite bounces. (We use marketing lies too, ha!) It goes unnoticed because after 5-10 bounces the energy falls below what humans can see, and a bit later it falls below what floating-point precision can represent.

But because lighting is mostly static, you can go even further and update changing areas more frequently than static areas. So stochastic updates. Which is another form of lag then.

You also see lag from denoising in screen space, like noise becoming blurry reflections only after some time, or ghosting around characters. That's just the same trick, but here you reconstruct from fragments of information over time (path tracing, or classic raytracing -> not cache efficient).
In contrast, a radiosity method as above typically calculates the complete environment of a sample in one go (which can be much more cache efficient), but it does not necessarily update all samples each frame.

So it is clearer in Dostal's video because it simulates how light diffuses over time at reduced speed, while everywhere else the lag is just the price for realtime and light reflects instantly over any distance.

The question is: in which method is the lag most acceptable?
RT denoising in screen space? No - it causes temporary noise at some spots - drags you out of the illusion all the time.
RT denoising in texture space? Likely much better but can't fix it - we have not seen it yet.
Diffusion in world space? Good as long as no fast action is going on.
Stochastic radiosity method? Most acceptable because it changes smoothly. Totally acceptable if you inject direct lighting the traditional way (limited to ugly point lights, but enough for flashlights, car lights, stuff that moves).

Make your choice, but lag is unavoidable and is the key to making anything realtime.
The reason it is less noticeable in Minecraft is that the environments are static. Rapidly moving light sources like the one in Dostal's video give it away instantly. It's basically an accumulation buffer for lighting and suffers from the same artifacts as any other accumulation-buffer technique.
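To illustrate the trade-off (the blend factor below is an arbitrary example value, not taken from any engine): each cached lighting value is effectively an exponential moving average of new samples, so noise is hidden at the price of reaction time.

```cpp
// Accumulation-buffer lag in one line: blend the cached lighting toward the
// new (noisy) estimate. A small blend hides noise but reacts slowly; a large
// one reacts fast but shows noise.
struct LightingTexel {
    float accumulated = 0.0f;

    void update(float newSample, float blend = 0.05f)
    {
        // blend = 0.05 -> roughly 1/0.05 = 20 frames to converge after a change,
        // i.e. about a third of a second of visible lag at 60 Hz.
        accumulated = (1.0f - blend) * accumulated + blend * newSample;
    }
};
```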

In terms of which one is worse, I'd take some noise over extreme lag any day. But that's just me.
 
Somehow most impressive at 3 fps - you can see it's a lot of work! :D
Yeah, I was still going to say that this was very impressive.

If we assume hypotheticals for a second here, say Turing was somehow 2x more performant than Pascal, the most we could expect from Turing is 6 fps. But in a flat-out Titan V vs. Titan RTX comparison, we see similar numbers even without RT cores. In heavier scenes we see up to a 50% performance increase with RT cores.

So... in theory a 1080TI with RT cores would only have improved that demo to 10-15 fps. The tensors must be making up the remaining amount.
It's not a complete story, granted, but it's something to consider. Using benchmarks and die size is not enough to calculate relative possible performance. There's just too much we have not yet accounted for.
 
"Overly specific". It isn't T&L. It's just an extra operation you can do. What you use those rays for is entirely up to you. Lighting, antialiasing, audio, physics... It gives you the performance to try things you couldn't do otherwise. It opens doors, it doesn't close them.

It's still specifically about shooting rays through a specific kind of BVH and finding hits against polys.
In the world of different acceleration structures, scene representations, traversal types (etc.) that we live in today, this flavor of classic ray tracing Nvidia chose to accelerate is too specific for modern graphics devs.
 
Practical example:
The RT core is 2x faster than software on the previous gen. [...]

A factor-of-2 speedup is something you expect from everyday software optimization. For FF hardware it is more disappointing than impressive.
The problem is that GPUs have a wide field of application, even within a game. So FF that can do only one specific task needs to be very fast to justify itself, because the alternative would be a smaller, cheaper chip or more general-purpose performance.

Of course it's just the beginning of RT in games and we can't expect everything to be perfect, but just accepting all the issues and being happy is not the way of progress, so it's worth mentioning and discussing.

RT Cores are up to 10x faster for the specific workload they are designed for. Overall, 2x faster on the same process is a huge improvement. AMD achieved only 30% more performance with 7nm, and they have to use 1 TB/s of bandwidth to make up for the lack of specific fixed-function units.
I don't see how a smaller and cheaper chip would beat an RTX2060 when even the TitanV has no chance. Fixed-function units make it possible to reduce processing units and get the same performance from a smaller, cheaper chip. Now people can enjoy TitanV raytracing performance for $350. Sure, the acceleration part is limited in functionality, but does it really matter when the alternative is out of reach for 99.9% of consumers?
 
RT Cores are up to 10x faster for the specific workload they are designed for.
I've never seen a factor of 10 in practice, and the specific workload is not always what we need for games.
The goal is not to 'beat' the 2060 with a cheaper chip; the goal would have been to achieve similar quality and performance using algorithms tailored to the specific needs games have.
None of this is out of reach for consumers, or would have been. RT would have happened in any case, i'm sure of that. Not the classical approach from the 80's, but something tailored to realtime requirements.
And i know exactly what this 'something' would look like for me - there is no bottleneck, no incoherent memory access - it would be blazingly fast.
The latter is the reason our perspectives on this are so different. I'm not sure if i could convince you - whether you would be happy with reflections that show only an approximation of the scene. For GI it makes no difference, for sure.

Look at the Danger Planet video i've posted some posts above and then look at Q2 RTX. And tell me which looks better, and whether there is a significant difference that justifies a GPU costing three times as much as a 10-year-old console which did this in real time already. (I'm really interested in your opinions here.)

I won't repeat my opinion here any longer - whether we want RTX or not depends on our wallets, not our opinions. It's pointless to repeat ourselves without new arguments. But we can still discuss the alternatives, which still make sense even in the presence of RTX!
I use RT as well for GI. The RT part takes 1-2 milliseconds. I don't need RTX to speed this up - most likely it would only slow it down. This is why i am not excited about the 'new era' here, because i have been in this era for ten years. This is not arrogance on my side, but of course it affects my point of view.
If there are faster algorithms for GI than pathtracing and denoising, and there are, then we can still use them. Most of them involve RT, some do not. (Danger Planet does not.)
What we want is to find the most effective combination of tech, now including RTX. RTX alone does not bring photorealism to games - we would have seen this already. It can help a lot, but we need to find its strength, which is accuracy, not performance, IMO.

(Also i'm no hardware enthusiast, so forgive me any inaccuracies in the example numbers - exactly how big the speedup is doesn't matter so much to me.)
 
I don't see how a smaller and cheaper chip would beat an RTX2060 when even the TitanV has no chance. Fixed-function units make it possible to reduce processing units and get the same performance from a smaller, cheaper chip. Now people can enjoy TitanV raytracing performance for $350. Sure, the acceleration part is limited in functionality, but does it really matter when the alternative is out of reach for 99.9% of consumers?
The argument is that the short-term gains come with notable long-term losses in reduced R&D and dead-end/restricted algorithm developments.

I don't know whether that's true or not, but hopefully people can understand the argument and talk about it in terms of the choices, rather than just trying to make everything about how to get RT out there now. Hypothetically, if the choices were:

1) Get fixed function HW and realtime RT hybrid games out there now, and lose access to the more efficient Future Methods for 10 years, resulting in slower RT effects in games for 10 years
2) Have a slower introduction of RT technologies that are more flexible, getting weaker gains now, but through those flexible solutions enabling more significant Future Methods, resulting in significantly faster solutions 5 years from now and going forwards

Which would people choose? And how about different timelines? Iroboto's argument that you need iterative advances to uncover new tech is a strong one, but we also have compute enabling exploration in an 'almost fast enough' way without the need for specific hardware, and that could lead to the best game solutions rather than following a 1970s image construction concept.

My theory is that it's fine to put in these RT cores for professional imaging. I'm not sure they're ideal for gaming. It also doesn't matter one way or another - lives aren't in the balance. ;) Worst case, nVidia set a precedent for how acceleration is handled and we get 5 years of slower reflections in games. Big deal. For those working in the field though, I can see them feeling more animated about the choices and implementation.
 