"Best vs best" kinda make sense when looking at what h/w can do.I still don't get why people are comparing the £1500+ RTX4090 to the 7900XTX.
Doesn't make much sense when choosing what to buy with your cash of course.
"Best vs best" kinda make sense when looking at what h/w can do.I still don't get why people are comparing the £1500+ RTX4090 to the 7900XTX.
Yep, flagship vs flagship.
when I first confronted the beast in the Blood and Wine expansion, I was awed by the lighting inside the building where the fight takes place. I was equally awed by some other moments in the game where lighting looked so close to real life.

^^ There are no lower settings; each ray tracing option is either on or off. GI is the most expensive one, the others cost much less. It's also the most visually impressive, depending on the area of course. After that, shadows make the largest visual impact. AO has a near-zero performance hit from what I saw. I would say outside of cities, in nature, GI and shadows make the most visual impact. In the cities, AO might be more noticeable, but I didn't bother to check it individually.
The performance in Novigrad and other cities is atrocious.
AO vs No AO and no shadows
I would not replay this game unless you have a 4090; it just runs too badly.
I dunno. I am still impressed with how much AMD has improved despite being a generation behind in performance due to not having all the fancy additions Nvidia has (and obviously starting a generation later with RT). Eventually they will get to where Nvidia is, although by that point Nvidia will be even higher.

When the fps is that low, it doesn't mean much. The most shocking part is the 3090 Ti at 10 fps and the 4090 at 45.
In addition to my previous post, @Dictator seems to think a big part of Portal's performance on AMD isn't even on the AMD side but rather that it's not optimized in the first place beyond Nvidia GPUs. So it's probably not even a fair comparison.

How do you optimize path tracing for AMD? This is just pure brute force, and AMD has far fewer cores than Nvidia. The RT cores alone do more work, and they do it faster than AMD's compute units... and these can run concurrently with the CUDA cores on Nvidia hardware. Then there is the better BVH compression, better async functionality, better overall compute architecture, etc.
I don't know. I am just parroting Alex. He would know a lot more than I do about this kind of thing.
The ReSTIR stuff that Portal is using is also very shader heavy. It's not all about casting rays. It's very likely that a bit of optimization would greatly benefit RDNA.
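For anyone wondering what "shader heavy" means here: below is a rough CPU-side sketch of the weighted reservoir update at the heart of ReSTIR-style resampling (names and the stubbed pHat are illustrative, not Portal RTX's code). Every candidate light gets evaluated in shader math before any visibility ray is traced, which is a lot of ALU work per pixel on top of the ray casting itself.

```cpp
// Rough sketch of the weighted reservoir update at the core of ReSTIR-style
// resampling. Illustrative CPU code with a stubbed target function; not
// Portal RTX's actual implementation.
#include <cstdint>
#include <random>

struct LightSample {
    uint32_t lightIndex;   // stand-in for position/radiance/etc.
};

// Target function p-hat: an unshadowed contribution estimate, evaluated for
// every candidate in shader code. Stubbed with a dummy value here.
static float pHat(const LightSample& s) {
    return 1.0f + 0.1f * static_cast<float>(s.lightIndex);
}

struct Reservoir {
    LightSample sample{};   // the one candidate currently kept
    float wSum = 0.0f;      // running sum of resampling weights
    float M    = 0.0f;      // number of candidates seen
    float W    = 0.0f;      // final contribution weight

    void update(const LightSample& x, float w, float u) {
        wSum += w;
        M    += 1.0f;
        if (wSum > 0.0f && u < w / wSum)   // keep x with probability w / wSum
            sample = x;
    }
};

// One pixel's initial resampling pass: every candidate is picked, weighted
// and evaluated (pHat) before a single visibility ray is traced for the
// survivor. This loop, plus temporal/spatial reuse passes, is the shader
// heavy part.
Reservoir resampleLights(int numCandidates, std::mt19937& rng) {
    std::uniform_real_distribution<float> uni(0.0f, 1.0f);
    Reservoir r;
    const float sourcePdf = 1.0f / static_cast<float>(numCandidates); // uniform pick
    for (int i = 0; i < numCandidates; ++i) {
        LightSample x{static_cast<uint32_t>(i)};
        r.update(x, pHat(x) / sourcePdf, uni(rng));
    }
    if (r.wSum > 0.0f)
        r.W = (r.wSum / r.M) / pHat(r.sample);   // standard RIS weight
    return r;
}
```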
Portal RTX, 4K native (fps):
6950XT: 0.4
7900XTX: 6
3090Ti: 10
4080: 30
4090: 45
In other path traced games like Minecraft RTX/Quake 2 RTX, the 3090Ti is about 40% to 50% faster than the 7900XTX.
And considering the 3090Ti scores 10fps here, I think even after optimizations the 7900XTX would go from 6 fps to maybe 7 fps? The 6950XT would definitely see the greatest benefits here, and would go from sub 1 fps to maybe 3 or 4 fps?
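Rough arithmetic behind that guess, just applying the 40-50% gap from those other titles to the 10 fps the 3090Ti gets here (assumed numbers, not benchmarks):

```cpp
// Back-of-the-envelope estimate only: if an optimized 7900XTX trailed the
// 3090Ti by the same 40-50% seen in other path-traced titles, it would land
// at roughly 10 / 1.4 to 10 / 1.5 fps.
#include <cstdio>

int main() {
    const double fps3090Ti = 10.0;        // Portal RTX, 4K native
    const double gaps[] = {1.40, 1.50};   // 3090Ti lead seen elsewhere
    for (double gap : gaps)
        std::printf("7900XTX estimate at +%.0f%% 3090Ti lead: %.1f fps\n",
                    (gap - 1.0) * 100.0, fps3090Ti / gap);
    return 0;   // prints ~7.1 and ~6.7 fps
}
```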
Yeah, but percentages at native 4K nonetheless, and since DLSS is essential for the experience here, I guess a small percentage at native 4K would yield much bigger gains at native 1080p (DLSS P) or 1440p (DLSS Q).

Yeah, we're talking percentages here. Clearly the absolute fps numbers are pointless on most cards, even from Nvidia.
Witcher 3 with just RTXGI on is in the over-200% range: from 119 fps (GPU limit) to 28 fps.
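Converting those fps figures to frametimes makes the actual cost clearer (my arithmetic, using only the two numbers above), which is also why the millisecond breakdown that follows is so useful:

```cpp
// fps -> frametime for the RTXGI numbers above (assumed arithmetic only):
// percentages understate how much GPU time the feature actually adds.
#include <cstdio>

int main() {
    const double fpsOff = 119.0, fpsOn = 28.0;
    const double msOff = 1000.0 / fpsOff;   // ~8.4 ms per frame
    const double msOn  = 1000.0 / fpsOn;    // ~35.7 ms per frame
    std::printf("RT off: %.1f ms, RTXGI on: %.1f ms, added: %.1f ms\n",
                msOff, msOn, msOn - msOff); // roughly +27 ms per frame
    return 0;
}
```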
Thank you for measuring in ms, that is indeed very helpful. Great analysis.

Curiosity got the better of me. The first scene in the game is a good RT test and I measured the following frametimes on my 3090 @ 3840x1600.
DX12 RT off: 8.1ms
GI: +14.7ms
AO: +2.3ms
Shadows: +4.3ms
Reflections: +4.3ms
The GI cost includes all of the fixed RT overhead of BVH construction etc but it is still very, very high. In the OG RTXGI presentation a 2080 Ti needed 2.5ms to trace and update 16,000 probes with 144 rays per probe which is pretty intense. Something is truly borked with RTXGI on PC in TW3.
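To put that RTXGI presentation figure in perspective (back-of-the-envelope arithmetic on the quoted numbers, nothing more): 16,000 probes at 144 rays each is about 2.3 million rays per update, so 2.5 ms on a 2080 Ti works out to roughly 0.9 Grays/s, which makes a 14.7 ms GI cost on a 3090 look far too high for the probe tracing alone.

```cpp
// Sanity check on the quoted RTXGI figure (16,000 probes, 144 rays/probe,
// 2.5 ms on a 2080 Ti). Assumed arithmetic only, not measured data.
#include <cstdio>

int main() {
    const double probes = 16000.0, raysPerProbe = 144.0, traceMs = 2.5;
    const double rays = probes * raysPerProbe;             // ~2.3 million rays
    const double raysPerSec = rays / (traceMs / 1000.0);   // ~0.92 Grays/s
    std::printf("%.2f Mrays per update, ~%.2f Grays/s\n",
                rays / 1e6, raysPerSec / 1e9);
    return 0;
}
```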
I am going out on a limb here and am gonna say the GI cost is normal actually, but your number there in milliseconds is predominantly the BVH build/refit cost on the GPU, which is probably huge in TW3.
If you look at the RT shadow and RT reflection application distance, it is massive in TW3. It includes very expensive things too, like trees, grass and bushes with their leaves.
Perhaps not a big deal, but! Since it is DX11-on-12, it might not use async compute... which means no hiding the GPU BVH build/refit costs.
Anyone wanna run a trace to see if async compute is in use? I am getting the feeling it might be a "No".
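For reference, this is roughly what "hiding the BVH build on async compute" looks like in D3D12. It's a minimal sketch only, with made-up variable names and no error handling, and obviously not CDPR's code: the build/refit is recorded on a compute queue so it can overlap graphics work, and the graphics queue only waits on a fence right before it needs the TLAS.

```cpp
// Minimal D3D12 sketch: run the TLAS build/refit on a compute queue so it
// overlaps graphics work. Illustrative only; device, queues, buildDesc and
// the TLAS buffer are assumed to exist, and error handling is omitted.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void BuildTlasAsync(ID3D12Device5* device,
                    ID3D12CommandQueue* graphicsQueue,
                    const D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_DESC& buildDesc,
                    ID3D12Resource* tlasBuffer)
{
    // A dedicated compute queue lets the build run alongside rasterization.
    ComPtr<ID3D12CommandQueue> computeQueue;
    D3D12_COMMAND_QUEUE_DESC qDesc = {};
    qDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&qDesc, IID_PPV_ARGS(&computeQueue));

    ComPtr<ID3D12CommandAllocator> alloc;
    ComPtr<ID3D12GraphicsCommandList4> cmd;
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_COMPUTE, IID_PPV_ARGS(&alloc));
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_COMPUTE, alloc.Get(),
                              nullptr, IID_PPV_ARGS(&cmd));

    // Record the build/refit plus a UAV barrier on the acceleration structure.
    cmd->BuildRaytracingAccelerationStructure(&buildDesc, 0, nullptr);
    D3D12_RESOURCE_BARRIER uav = {};
    uav.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
    uav.UAV.pResource = tlasBuffer;
    cmd->ResourceBarrier(1, &uav);
    cmd->Close();

    // Submit on the compute queue; it can overlap g-buffer/shadow work the
    // graphics queue is chewing through at the same time.
    ID3D12CommandList* lists[] = { cmd.Get() };
    computeQueue->ExecuteCommandLists(1, lists);

    // The graphics queue only waits right before DispatchRays needs the TLAS.
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
    computeQueue->Signal(fence.Get(), 1);
    graphicsQueue->Wait(fence.Get(), 1);   // GPU-side wait; the CPU isn't blocked
}
```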
That’s a solid theory but it doesn’t explain why the cost is still so high indoors with static geo.
Edit - OK, someone else went through it with Nsight and yeah, it looks like the BVH construction time is the major downside here, up to 15ms on a 5800X3D/3090.
If it weren't for that, the performance would be pretty typical "tacked-on Nvidia SDK" RT stuff: not focused on optimization, but workable. Looks like CDPR's quick and dirty 11-on-12 hack has borked performance. I'm guessing the interior/exterior thing doesn't matter because it's a seamless open world (well, per map) and there are still animated characters doing stuff/foliage outside.
BVH build takes 1.14ms on my card so it's not that. It runs async with a lot of other work on the fixed function pipeline and async compute is in full swing so that's not the problem either. According to Nsight the entire RT command list completes in under 3.2ms and that's in parallel with other graphics work.
I'm baffled as to where the extra 14ms is coming from.
Definitely not seeing 15ms build times on my card.