Next gen lighting technologies - voxelised, traced, and everything else spawn

OlegSH · Dec 31, 2018

JoeJ said:
i could use the same math and come to a different conclusion

Sure, I took 15% out of total 47%, that's (7 ms/47)*15 = 2.23 ms in my calculations
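
A minimal sketch of that proportion, assuming the 7 ms is the time taken by the 47% share of the frame and tracing is the 15% share. The function name is made up for illustration and the reading of the two percentages is an assumption.

```python
def share_ms(measured_ms, measured_pct, part_pct):
    """Scale a measured time to another percentage share of the same frame."""
    return measured_ms / measured_pct * part_pct

# Figures from the post: 7 ms for a 47% share, 15% of the frame spent tracing.
tracing_ms = share_ms(7.0, 47.0, 15.0)
print(f"{tracing_ms:.2f} ms")  # 2.23 ms
```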

JoeJ said:
but he has 80 fps average on his TitanV (not knowing what level and such...)

That's the issue, without knowing test scene, settings, etc, we can not compare these results with Turing ones.
I remember there were press drivers with RTX enabled on the GTX 1080 Ti (or was it the fallback layer?), and the difference was drastic - https://www.ixbt.com/img/x780x600/r30/00/02/11/02/swreflectionsdemo.png
Pretty sure Volta would still lag by at least 2x in heavy regimes

JoeJ said:
Can it be that fixed function RT core is not worth it and they would have achieved the same with more shader cores instead?

There is a power wall: it would be problematic to achieve more flops with more shader cores on the same process node. GV100 is way wider, yet it achieves fewer flops than high-end TU102 SKUs because both chips are power limited.
Pretty sure more general SMs would cost way more transistors, frequencies would be lower because of the mentioned power limitation, and even with special SM instructions to accelerate ray-triangle intersection tests, it would still be much slower in RTX games. C'mon, NVIDIA has been working on optix for a decade, do you really think they cannot model and calculate such simple things? Give a little bit of credit to them.

JoeJ said:
According to this 2070 is only slightly faster than 1080Ti IIRC

Nope - https://www.ixbt.com/img/r30/00/02/14/58/swreflectionsdemo.png

Scott_Arm · Dec 31, 2018

I never really considered the power limits of the GPU before. Are RT cores and tensor cores a way to add relatively low-power transistors?

Is that Star Wars thing available freely as a benchmark? Is it customizable at all?

OlegSH · Dec 31, 2018

Scott_Arm said:
Are RT cores and tensor cores a way to add relatively low-power transistors?

Sure, TCs provide much more flops at the same power due to higher data reuse. RT Cores are much more area and power efficient at ray-triangle intersection tests, as these are specialized processors (it would have been insane to make them as wide as standard SMs)

DavidGraham · Jan 1, 2019

OlegSH said:
That's the issue, without knowing test scene, settings, etc, we can not compare these results with Turing ones.

Yes, I feel like without controlled like-for-like scenes, the comparison with a Titan V is not really useful. There is also the possibility that the Titan V is not really running "proper" DXR in BFV.

JoeJ said:
but it agrees with similar results about the Star Wars demo, of which i have posted a screenshot earlier here somewhere.

Could you provide that link again? I searched for it and I couldn't find anything on Titan V. However it was known that the Star Wars demo required 4 Titan Vs to run, but only one RTX 2080Ti.

Scott_Arm said:
I'm not sure if this driver fix is complete,

Yeah I read that the new driver fixed this problem, can't find the link though right now.

JoeJ · Jan 1, 2019

DavidGraham said:
Yes, I feel like without controlled like-for-like scenes, the comparison with a Titan V is not really useful. There is also the possibility that the Titan V is not really running "proper" DXR in BFV.

He gave quite accurate specs in the following post and assured that the comparison is fair, other than differing CPUs. He says he is able to tell the difference between RTX and SSR.
I hope for some more guys with Titans and BFV. Maybe it's a New Year's joke

DavidGraham said:
Could you provide that link again? I searched for it and I couldn't find anything on Titan V. However it was known that the Star Wars demo required 4 Titan Vs to run, but only one RTX 2080Ti.

https://forum.beyond3d.com/posts/2050012/
Notice that DLSS is available only on Turing, older GPUs render at full 4K. And check my math. Maybe i did it wrong and it would then fit the png @OlegSH has posted.

DavidGraham · Jan 1, 2019

JoeJ said:
https://forum.beyond3d.com/posts/2050012/

But that test doesn't have any Titan V results, am I missing something?

JoeJ said:
Notice that DLSS is available only on Turing, older GPUs render at full 4K.

Here are more results:

https://pclab.pl/art78828-20.html

https://pclab.pl/art79052-17.html

JoeJ · Jan 1, 2019

OlegSH said:
Sure, I took 15% out of total 47%, that's (7 ms/47)*15 = 2.23 ms in my calculations

I disagree and get 7 ms of RTX time in total, and 15% tracing is only 1 ms.
But no matter: if we take 2.23 ms tracing cost on Turing, and Turing is 10 times faster, then the tracing cost alone would be 22 ms. Plus 7 ms raster would be even worse, just 30 FPS. But if the 10 times speedup were true, we should get back to the initial 68 fps (or only 15 ms).
We differ because you used RTX and i used Titan maybe, but in any case: Turing does not have 10 times faster tracing. Otherwise both GPUs would not end up equally fast. We don't need any math for this - it's just obvious.
It seems Titan with a few hundred more compute threads (?) compensates for the missing RT cores, or not? I don't see how power targets are related here, other than distracting from the obvious.

OlegSH said:
C'mon, NVIDIA has been working on optix for a decade, do you really think they cannot model and calculate such simple things? Give a little bit of credit to them.

So you think you owe them credit because they are.. awesome??
No! They want my money, and i want to know what i get for it.

A decade of experience... i have that myself, and many others too. That's just the necessary minimum. And from that i know efficient raytracing is NOT a simple thing. If you think it's about a fast triangle test you are just wrong.
We know GCN has issues utilizing its CUs with rasterization. (some insiders know it has insane compute power instead, hihi)
There is a rumor Vega has a broken binning rasterizer.
Is it so unbelievable that awesome NV could make a 'mistake' too?

Well... could be all fake. I hope it turns out.

JoeJ · Jan 1, 2019

DavidGraham said:
But that test doesn't have any Titan V results, am I missing something?

No Titan, but here we can compare GTX 10X0 GPUs vs RTX.
I see the tests you posted show similar results.
Now please convert FPS to time, and also consider GTX can NOT do DLSS, so it traces at much higher resolution than Turing. PCGH has pointed this out in text, otherwise i would not have spotted this myself.
You should end up with the same result as i did (work score = real performance). I have ignored the DLSS cost here, and also the fact that the BVH update is resolution independent, etc. So it's not that bad but still shocking.

You really need to do the math yourself. FPS can be extremely misleading, and together with the resolution mismatch the result seems unremarkable when just looking at benchmark bars, but it is not. (assuming i did it correctly - i tend to mistake divisions and multiplications)

JoeJ said:
i tend to mistake divisions and multiplications

Oh, i'm so sorry. Yes this was my mistake:

2070: 19.8 fps = 50ms per frame x (3.8 x 2.1 res) = 399 work score
1080ti: 10.1 fps = 100ms per frame x (1.4 x 2.5 res) = 350 work score

should be:

2070: 19.8 fps = (3.8 x 2.1 res) / 50ms per frame = 0.23 work score
1080ti: 10.1 fps = (1.4 x 2.5 res) / 100ms per frame = 0.035 work score

So RTX is 6.5 times faster here. Makes much more sense! Embarrassing...

Apologies for the noise, guys! Feel free to clean up some of that BS...

Sigh - both are wrong! all wrong... the 1080 renders at 4K, not the other way around.
I end up with both being equally fast, but i do not dare to post the math... please help! Too many drinks already and better get some sleep... Happy new year!

Malo · Jan 1, 2019

JoeJ said:
Now please convert FPS to time, and also consider GTX can NOT do DLSS, so it traces at much higher resolution than Turing. PCGH has pointed this out in text, otherwise i would not have spotted this myself.
You should end up with the same result as i did (work score = real performance). I have ignored the DLSS cost here, and also the fact that the BVH update is resolution independent, etc. So it's not that bad but still shocking.

I think the biggest difference between Pascal and Volta is the Tensors, which not only means DLSS but denoising, the latter being a significant factor in RTRT performance, not DLSS. I don't think Pascal gives us any hints as to Titan V performance in the Star Wars demo.

DavidGraham · Jan 1, 2019

JoeJ said:
No Titan, but here we can compare GTX 10X0 GPUs vs RTX.

Titan V can't be directly compared to Pascal, because Titan V has DLSS and the denoising part of RTX.

In general, a 2080Ti is 35% faster than a 1080Ti. At 4K the 1080Ti scores 4 fps, which means the 2080Ti without DLSS and RT will score 6 fps at best; however, it does 33 fps with them. That is a 5.5-fold increase directly from using DLSS + RT acceleration.

If you don't like that math, you can compare the 1080Ti at 1440p to the 2080Ti at 4K DLSS results, though these results are not entirely representative as DLSS still incurs a performance hit on the 2080Ti, and DLSS only improves performance by 25~30%, but we will do it anyway:

1080Ti @1440p: 10fps
2080Ti @4K DLSS: 33 fps

Now add 10 fps or more to the 2080Ti to offset the DLSS hit (which is about 30% or more), and you end up with 40 fps at the very least, which still means a minimum 4-fold increase purely from the RT cores.
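
Both estimates can be reproduced with a rough sketch. It uses only the fps figures quoted in this post; the function and variable names are made up for illustration, and the 6 fps and 40 fps values are the post's own roundings rather than measurements.

```python
def fold_increase(fps_fast, fps_slow):
    """How many times higher fps_fast is than fps_slow."""
    return fps_fast / fps_slow

# Estimate 1: scale the 1080Ti's 4K result by the general 35% advantage.
gtx_1080ti_4k_fps = 4.0
rtx_2080ti_projected_fps = gtx_1080ti_4k_fps * 1.35  # rounded up to 6 fps in the post
rtx_2080ti_measured_fps = 33.0                        # 4K DLSS with RT
estimate_1 = fold_increase(rtx_2080ti_measured_fps, 6.0)

# Estimate 2: 1080Ti at 1440p vs. 2080Ti with the DLSS hit added back (40 fps floor).
gtx_1080ti_1440p_fps = 10.0
rtx_2080ti_without_dlss_hit_fps = 40.0
estimate_2 = fold_increase(rtx_2080ti_without_dlss_hit_fps, gtx_1080ti_1440p_fps)

print(f"projected 2080Ti without DLSS/RT: {rtx_2080ti_projected_fps:.1f} fps")
print(f"estimate 1: {estimate_1:.1f}x, estimate 2: {estimate_2:.1f}x")
```

Neither figure isolates the RT cores exactly, since the first one still includes the DLSS gain and the second relies on an assumed size for the DLSS hit.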

pixeljetstream · Jan 1, 2019

Happy new year. Unfortunately PCGH unnecessarily jumped the gun and prematurely made a lot of fuss.
The latest numbers, as well as different users trying to compare on the Rotterdam map, show improved perf for Turing.

Shifty Geezer · Jan 1, 2019

Malo said:
I think the biggest difference between Pascal and Volta is the Tensors, which not only means DLSS but denoising, the latter being a significant factor in RTRT performance, not DLSS. I don't think Pascal gives us any hints as to Titan V performance in the Star Wars demo.

Isn't BFV using Frostbite's own denoising though? Or was that only the other demos like SW?

JoeJ · Jan 1, 2019

Just woke up, head hurts, but...

2070: 19.8 fps = 50ms per frame x (3.8 x 2.1 res) = 399 work score
1080ti: 10.1 fps = 100ms per frame x (1.4 x 2.5 res) = 350 work score

should be simply

2070: 19.8 fps = (1.4 x 2.5 res) / 50ms per frame = 0.07 RT work per frame
1080ti: 10.1 fps = (3.8 x 2.1 res) / 100ms per frame = 0.079 RT work per frame
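
The corrected comparison can be checked with a short sketch. It takes the rounded figures above at face value and assumes the 2070 traces at 1440p (with DLSS upscaling to 4K) while the 1080ti traces at native 4K; the function name is hypothetical. As noted elsewhere in the thread, it ignores the DLSS cost and the resolution-independent BVH update.

```python
def megapixels_per_ms(width_kpx, height_kpx, frame_ms):
    """Rendered megapixels per millisecond of frame time."""
    return width_kpx * height_kpx / frame_ms

# Rounded inputs from the post (resolution in thousands of pixels per axis).
rtx_2070 = megapixels_per_ms(1.4, 2.5, 50.0)     # 1440p at ~50 ms per frame
gtx_1080ti = megapixels_per_ms(3.8, 2.1, 100.0)  # 4K at ~100 ms per frame

ratio = gtx_1080ti / rtx_2070
print(f"2070:   {rtx_2070:.4f} MP/ms")
print(f"1080ti: {gtx_1080ti:.4f} MP/ms")
print(f"1080ti / 2070 = {ratio:.2f}")
```

With these inputs the two cards land within about 15% of each other, which is the "equally fast" result mentioned earlier.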

DavidGraham said:
Titan V can't be directly compared to Pascal, because Titan V has DLSS and the denoising part of RTX.

My goal here is to remove DLSS from the equation, because we don't need tensor cores to upscale reflections. Bilateral filter and TAA will do. Likely as flickery as we saw in the GT demo, but this can be fixed with some extra work which is not expensive.
We could start the obvious 'Do we need tensor cores for games' discussion now, but my long answer is: 'I can't use tensors for games at all yet - neither Cuda nor GameWorks is an option for me.' And my short answer is just no. You don't need NNs to upscale.

Thanks NV, for showing me RT cores are not necessary. I think i understand why those demos are not public, and mostly why GTX GPUs have been disabled from running BFV. Likely those loyal customers better get a new RTX - they surely do not want to see upscaled reflections on their already too high res screens.
I now know enough. Targeting consoles anyways, i will focus on compute traced reflections. You showed me it will work, as i have assumed anyways.
Also thanks for showing people who are willing to see that i'm right with my claim 'we don't need fixed function for RT'. I do not need to prove it to them at all. You already did.

Draw your own conclusions. I'll remove RTX from my todo list for now. (And work on my math skills instead)

DavidGraham · Jan 1, 2019

JoeJ said:
for showing me RT cores are not necessary.

I just showed you a minimum 4-fold (400%) increase in performance just due to RT cores! How are they not needed?!

JoeJ said:
2070: 19.8 fps = (1.4 x 2.5 res) / 50ms per frame = 0.07 RT work per frame

Your math here is wrong! You are assuming DLSS is cost free, but it's not: it incurs a performance hit even when upscaling from 1440p. The hit varies depending on the original resolution; some have compared it to running the game at 1800p.

Shifty Geezer said:
Isn't BFV using Frostbite's own denoising though? Or was that only the other demos like SW?

Yes, they use an in-house denoising solution.

BRiT · Jan 1, 2019

I'm getting a bit confused by the terminology and the discussion now. I thought the following were distinct and separate items in hardware, but some of what's written here uses them interchangeably ( @JoeJ ):

Tensor Cores
RayTracing Cores

Here is what NVIDIA says about it: https://developer.nvidia.com/rtx

@DavidGraham , are those two cores physically implemented as separate sections of hardware?

For my sanity and for clarity in discussions can we please keep the terms straight?

DavidGraham · Jan 1, 2019

BRiT said:
@DavidGraham , are those two cores physically implemented as separate sections of hardware?

Yeah, separate specific fixed hardware for both.

Volta has only Tensor cores, which can be used for AI upscale and for AI denoising for ray tracing, which means Volta can do DLSS and denoising at the hardware level for ray tracing, so it can theoretically run DXR although no acceleration will happen, except for the denoising part.

Turing has Tensor cores and Ray Tracing cores (RT), which means it can accelerate BVH, and do DLSS and denoising at the hardware level.

pixeljetstream · Jan 1, 2019

@JoeJ have you not seen SEED's slides, they showcase Turing vs Volta
https://www.ea.com/seed/news/siggraph-2018-picapica-nv-turing

Imo you underestimate the power and area benefits of FF units.

JoeJ · Jan 1, 2019

DavidGraham said:
Your math here is wrong! You are assuming DLSS is cost free

Yes, i assume it is cost free because upscaling can be done cheaply. I also assume the BVH building to be related to pixel count, which is wrong too.
But i am only interested in a coarse ratio to draw my conclusions. As a dev i am interested in speedups of orders of magnitude, not percentages. I have achieved such speedups multiple times through the years - it's common in software development.

BRiT said:
I'm getting a bit confused by the terminology and the discussion now. I thought the following were distinct and separate items in hardware, but some of what's written here uses them interchangeably ( @JoeJ ):

Tensor Cores

RayTracing Cores

Here is what NVIDIA says about it: https://developer.nvidia.com/rtx

Yes, they are separate. Someone here on the forum pointed out they cannot be programmed with game APIs, only with CUDA, which rules them out of my interest, so i never brought them up.
I'm no AI guy anyways and leave it to others to discuss tensors.

pixeljetstream said:
@JoeJ have you not seen SEED's slides, they showcase Turing vs Volta
https://www.ea.com/seed/news/siggraph-2018-picapica-nv-turing

Imo you underestimate the power and area benefits of FF units.

I have seen all of this. But what we are talking about here are real-world end results, rather than individual speedups looked at in isolation.

I do not assume those cores to be useless, but for now it is just a vendor extension.
RT is the future, part of it. I agree with this, but with my target platforms i am happy to make lower res stuff which works for everything first. (I may change my mind...)

DavidGraham · Jan 1, 2019

JoeJ said:
Yes, i assume it is cost free because upscaling can be done cheaply.

Unfortunately it's not. The latest testing on Final Fantasy 15 reveals a hit of about 35% from native 1440p; however, it gives 25% more performance than native 4K with TAA.
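
To put those two percentages side by side, here is a small sketch. It assumes the 35% hit and the 25% gain are both framerate ratios, and it uses a made-up 100 fps baseline for native 1440p; none of these concrete fps values come from the linked test.

```python
def dlss_relative_fps(native_1440p_fps, hit_vs_1440p=0.35, gain_vs_4k=0.25):
    """Derive 4K DLSS and native 4K fps from a native 1440p baseline."""
    dlss_fps = native_1440p_fps * (1.0 - hit_vs_1440p)
    native_4k_fps = dlss_fps / (1.0 + gain_vs_4k)
    return dlss_fps, native_4k_fps

# 100 fps is a hypothetical baseline, chosen only to make the ratios readable.
dlss_fps, native_4k_fps = dlss_relative_fps(100.0)
print(f"native 1440p: 100.0 fps, 4K DLSS: {dlss_fps:.1f} fps, native 4K: {native_4k_fps:.1f} fps")
```

Under these assumptions, 4K DLSS sits noticeably closer to native 4K than to native 1440p, which is why treating the upscale as free skews the comparison.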

http://i68.tinypic.com/20fure9.jpg

OlegSH · Jan 1, 2019

JoeJ said:
Plus 7 ms raster would be even worse, just 30 FPS

That matches quite well with 29 FPS I calculated (+- for rounding error)

JoeJ said:
We differ because you used RTX and i used Titan maybe

I was talking about 2560x1440 resolution (I mentioned this several times), while 80 FPS with RTX Ultra can be achieved on Titan V only at 1080p and below, that's 1.78x fewer rays to trace

Moreover, there is a geometry processing cost (skinning, other transformations, etc., both in raster and RT), which is mostly constant across resolutions (except for adaptive LODs, tessellation and early CS-based backface/frustum/subpixel triangle culling in raster). Hence Titan V will lose more as resolution grows (since the tracing fraction grows with resolution), so Titan V can easily be 1.7x slower at 1440p depending on how heavy the ray-triangle intersection part is.

2080 Ti is just 1.26x slower going from 1080p to 1440p and 1.62x slower going from 1440p to 2160p, that's a 2x slowdown going from 1080p to 4K.

As for Titan V, there can easily be a 4x slowdown because the ray-triangle intersection part will dominate at higher resolutions, which will result in a 2x difference between 2080 Ti and Titan V at higher resolutions
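
The resolution arithmetic can be checked with a short sketch. It assumes ray count scales with pixel count, uses the standard 16:9 pixel dimensions, and takes the 2080 Ti slowdown factors from this post; the names are made up for illustration.

```python
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "2160p": (3840, 2160),
}

def pixel_ratio(low, high):
    """How many times more pixels `high` has than `low`."""
    lw, lh = RESOLUTIONS[low]
    hw, hh = RESOLUTIONS[high]
    return (hw * hh) / (lw * lh)

# Slowdown factors quoted for the 2080 Ti in the post.
slowdown_1080_to_1440 = 1.26
slowdown_1440_to_2160 = 1.62
combined_slowdown = slowdown_1080_to_1440 * slowdown_1440_to_2160

print(f"1080p -> 1440p: {pixel_ratio('1080p', '1440p'):.2f}x pixels")
print(f"1440p -> 2160p: {pixel_ratio('1440p', '2160p'):.2f}x pixels")
print(f"1080p -> 2160p: {pixel_ratio('1080p', '2160p'):.2f}x pixels")
print(f"2080 Ti combined slowdown: {combined_slowdown:.2f}x")
```

The gap between the 4x pixel increase and the roughly 2x measured slowdown is what the argument attributes to resolution-independent work.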

So all this comes down to a simple thing: how heavy the ray-triangle intersection part is. Obviously, the heavier it is, the more RTX GPUs will win, and the acceleration factor ("x-times faster framerate"), as I mentioned in the very beginning, comes down to Amdahl's law applied to the accelerated part. So no surprises here at all.
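
A minimal sketch of Amdahl's law as applied here. The 10x factor echoes the figure debated earlier in the thread, while the fractions of frame time spent in the accelerated part are purely illustrative, not measurements.

```python
def amdahl_speedup(accelerated_fraction, acceleration_factor):
    """Overall speedup when only a fraction of the frame time is accelerated."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / acceleration_factor)

# Hypothetical shares of frame time spent in ray-triangle intersection.
for fraction in (0.15, 0.5, 0.9):
    overall = amdahl_speedup(fraction, 10.0)
    print(f"{fraction:.0%} of the frame accelerated 10x -> {overall:.2f}x overall")
```

The overall gain stays small until the accelerated part dominates the frame, so the same hardware speedup can look very different from one scene or resolution to the next.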
