Nvidia GeForce RTX 4090 Reviews

Or is it that in mixed rendering the RT hardware is simply not being taxed as much (in other words, the RT hardware is now overkill for mixed rendering, excluding CP2077 at Psycho settings)? I still think it's more about the workload than anything else. Are there any benchmarks showing energy consumption of mixed rendering versus path tracing? It would be interesting to see if there is a significant difference.
Well, there are benchmarks which show 3-4x gains in PT over Ampere, but they are synthetics, and such gains don't seem to carry over to Q2RTX and Minecraft. Dunno why. When does Portal RTX come out, btw?
 
Scaling seems fine to me. Most 4090 benchmark results are CPU limited, even on a 5800X3D.
Looking at the 3090 Ti vs the 2080 Ti, you appear to be correct. The 3090 Ti has more than twice the shader cores and higher clocks but is only about 60-65% faster than the 2080 Ti, I think. Seems Lovelace actually scales better relative to its predecessor than Ampere did.
 
Well, there are benchmarks which show 3-4x gains in PT over Ampere, but they are synthetics, and such gains don't seem to carry over to Q2RTX and Minecraft. Dunno why. When does Portal RTX come out, btw?
In the end we are talking about software, which can be written more or less efficiently. None of that software was originally built with ray tracing in mind either, so maybe it's just not the most efficient at using the hardware? It may just brute-force its way through.
 
3DMark's DXR feature test shows a 2.5x improvement over Ampere. But this is a "pure" raytracing test and UL describes the implementation:
Implementation
The test measures the peak ray-traversal performance of the GPU. All other work, such as illumination and post processing, is kept to a minimum. The ray tracing acceleration structure is built only once. As the scene is static and non-animated, there is no need to update the acceleration structure during the test. The test casts primary rays only. The rays are approximately sorted by direction on the CPU during the test initialization, which is possible because the sampling pattern in screen space is known beforehand. Generating the optimal ray order during initialization allows more coherent ray traversal for out-of-focus areas without the run-time cost of sorting.

So it's only primary rays, and the sorting has mostly happened on the CPU side.
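For anyone curious what "approximately sorted by direction" might look like in practice, here's a rough sketch of the idea (my own illustration, not UL's code; the Ray struct and the quantisation scheme are made up). Rays sharing a coarse direction key end up adjacent in memory, so traversal visits similar BVH nodes back to back:

```cpp
// Illustrative sketch only: pre-sorting primary rays by direction on the CPU
// so that neighbouring rays in the queue point roughly the same way.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct Ray {
    float ox, oy, oz;   // origin
    float dx, dy, dz;   // normalised direction
};

// Coarse sort key: the direction octant in the top bits, then a few
// quantised bits per component for ordering inside the octant.
static uint32_t directionKey(const Ray& r) {
    uint32_t octant = (r.dx < 0.f ? 1u : 0u) | (r.dy < 0.f ? 2u : 0u) | (r.dz < 0.f ? 4u : 0u);
    auto q = [](float v) { return static_cast<uint32_t>(std::abs(v) * 255.f) & 0xFFu; };
    return (octant << 24) | (q(r.dx) << 16) | (q(r.dy) << 8) | q(r.dz);
}

// Sort once during initialisation; since the camera and scene are static,
// the order stays valid for the whole test run.
void sortRaysForCoherence(std::vector<Ray>& rays) {
    std::sort(rays.begin(), rays.end(), [](const Ray& a, const Ray& b) {
        return directionKey(a) < directionKey(b);
    });
}
```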
 
3DMark's DXR feature test shows a 2.5x improvement over Ampere. But this is a "pure" raytracing test and UL describes the implementation:


So it's only primary rays, and the sorting has mostly happened on the CPU side.
It's also a static, non-animated scene, unlike a game, so it's less taxing. @DegustatoR, are the synthetics you talked about like this? No wonder they give much higher results than the games then. It's just a best-case scenario, not real world.
 
That's definitely a 3080 10GB card and not the 12GB card?
I mean price-wise? The 4080/12 fits against the 3080 Ti while providing a 3090 Ti(-ish) performance level.
The obvious elephant in the room is the fact that the 3080/10, 3080/12 and 3080 Ti are all within 10% of one another in performance while their prices range from $700 to $1200.
Hence comparisons to the 3080/10 seem kinda valid.

It's also a static, non-animated scene, unlike a game, so it's less taxing. @DegustatoR, are the synthetics you talked about like this? No wonder they give much higher results than the games then. It's just a best-case scenario, not real world.
Yeah. PT games can be CPU limited too I guess?

So we're potentially going to have games with issues like this then, as not all implementations are created equal.
Just as it was with DLSS2 and FSR2, and likely will be with XeSS.
 
So we're potentially going to have games with issues like this then, as not all implementations are created equal.
It sort of makes sense though. DLSS will need to be informed of a camera cut to know when and when not to generate intermediate frames.

Just perform it during a camera view change or a cutscene change. Should be okay to flag this without too much difficulty.
 
It sort of makes sense though. DLSS will need to be informed of a camera cut to know when and when not to generate intermediate frames.

Just perform it during a camera view change or a cutscene change. Should be okay to flag this without too much difficulty.
I think that most games with TAA already have this implemented as otherwise there would be severe ghosting on camera cuts.
 
It sort of makes sense though. DLSS will need to be informed of a camera cut to know when and when not to generate intermediate frames.

Just perform it during a camera view change or a cutscene change. Should be okay to flag this without too much difficulty.
@Dictator did point to this earlier ...
I expect camera cut issues to be an early teething / per-game implementation issue - DLSS 3 should have the info (motion vectors), based on its design, to make informed decisions about camera cuts.

There are flags in DLSS 2.x for example to make it camera cut aware.
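For illustration, something like this is roughly how an engine would plumb a cut through to the upscaler. The names here are hypothetical, not the actual DLSS/NGX API; the only real point is a per-frame reset flag that tells the temporal pass to drop its history instead of blending or interpolating across the cut:

```cpp
// Hypothetical sketch: signalling a camera cut to a temporal upscaler /
// frame-generation pass. UpscalerFrameParams and evaluateUpscaler() are
// made-up names, not the real DLSS/NGX interface.
struct UpscalerFrameParams {
    bool  resetHistory;      // true -> discard accumulated history / skip generated frame
    float jitterX, jitterY;  // sub-pixel jitter used this frame
    // ... motion vectors, depth, exposure, etc. would go here
};

class Camera {
public:
    void cutTo(/* new view parameters */) {
        // Any discontinuous change (cutscene shot change, teleport, respawn)
        // marks the next frame as a cut.
        cutPending_ = true;
    }
    bool consumeCutFlag() {
        bool was = cutPending_;
        cutPending_ = false;
        return was;
    }
private:
    bool cutPending_ = false;
};

void renderFrame(Camera& cam, UpscalerFrameParams& params) {
    // On a cut, the previous frame's history and motion vectors are meaningless,
    // so the upscaler is told to reset rather than reuse them.
    params.resetHistory = cam.consumeCutFlag();
    // evaluateUpscaler(params);  // hypothetical call into the upscaler
}
```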
 
Maybe not the right place to ask, but given the 4090 reviews and the CPU limitation, what can Nvidia do for the next generation to help with that? More hardware offloading of tasks currently done by the CPU?
Or is it more of an engine problem, where engines need to be better threaded for better CPU utilisation, or an API problem, etc.?
 
Maybe not the right place to ask, but given the 4090 reviews and the CPU limitation, what can Nvidia do for the next generation to help with that? More hardware offloading of tasks currently done by the CPU?
Yes, this has been the main thrust of their efforts for the last several years. Everything they add to their GPUs aims at lowering the CPU part of the rendering equation. The big one right now is BVH building/refitting, and I kinda expected Lovelace to have something on this, but alas that doesn't seem to be the case outside of two specific features which should help make BVHs simpler - OMMs and DMMs.
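For context, DXR already lets engines refit an existing BVH instead of rebuilding it from scratch each frame, which is the main tool today for keeping that per-frame cost down. A minimal sketch of a BLAS refit, assuming the buffers and command list are set up elsewhere (not a complete build path):

```cpp
// Minimal DXR refit sketch: update an existing BLAS in place instead of
// rebuilding it. The BLAS must originally have been built with ALLOW_UPDATE,
// and the scratch buffer sized for an update.
#include <d3d12.h>

void refitBlas(ID3D12GraphicsCommandList4* cmdList,
               const D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_INPUTS& inputsWithUpdatedGeometry,
               D3D12_GPU_VIRTUAL_ADDRESS blasBuffer,
               D3D12_GPU_VIRTUAL_ADDRESS scratchBuffer)
{
    D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_DESC desc = {};
    desc.Inputs = inputsWithUpdatedGeometry;
    // PERFORM_UPDATE refits the existing BVH rather than rebuilding it.
    desc.Inputs.Flags |= D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_ALLOW_UPDATE |
                         D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PERFORM_UPDATE;
    desc.SourceAccelerationStructureData  = blasBuffer;  // read the old BVH...
    desc.DestAccelerationStructureData    = blasBuffer;  // ...and update it in place
    desc.ScratchAccelerationStructureData = scratchBuffer;

    cmdList->BuildRaytracingAccelerationStructure(&desc, 0, nullptr);
}
```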

Or is it more of an engine problem, where engines need to be better threaded for better CPU utilisation, or an API problem, etc.?
It is also an engine problem of course, but mostly a problem of legacy code preventing said engines from taking full advantage of what's possible in new APIs and new h/w. There are examples of well-built / remade engines which scale near perfectly even at lower resolutions - the IW engine used in CODV is one.

But we should also account for absolute framerates while looking at this. Some engines obviously use CPUs badly - WDL and FC6 are two prime examples; no CPU can pass 120 fps in WDL, for example, no matter the resolution. But there are also engines which only become CPU limited at framerates of 200+ fps, and it's unclear whether that can even be considered an "issue".
 
or an API problem, etc.?
NVIDIA does have an API problem with DX12: it causes their big GPUs to quickly become CPU limited at resolutions below 4K. Getting around this is hard, though, as DX12 spawns lots of unnecessary instructions and calls on their hardware. They need to get this sorted out to partially relieve the bottleneck.

Then there are the currently available CPUs, which honestly show abysmal single-threaded improvements each generation. They are simply not adequate.

Then there are the game engines: they often have hard fps caps related to their physics systems and their interactions with the CPU; the engine for Doom Eternal needed to be modified to allow extremely high fps, for example. Other games need to do that too, but most are console ports, so developers don't feel the need to allow more than 200 fps in complex games, and often even less than that.
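The usual way around that kind of cap is to run the simulation on a fixed timestep and let rendering interpolate between states. A generic sketch of the idea (not how Doom Eternal or any specific engine actually does it; the simulate/render calls are placeholders):

```cpp
// Generic sketch: physics runs at a fixed 60 Hz step while rendering runs as
// fast as it can, interpolating between the two most recent simulation states.
// This avoids baking an fps cap into gameplay code.
#include <chrono>

void gameLoop() {
    using clock = std::chrono::steady_clock;
    const double fixedDt = 1.0 / 60.0;   // simulation step, independent of fps
    double accumulator = 0.0;
    auto previous = clock::now();

    while (true /* !quitRequested() - placeholder exit condition */) {
        auto now = clock::now();
        accumulator += std::chrono::duration<double>(now - previous).count();
        previous = now;

        // Step the simulation zero or more times, always with the same dt.
        while (accumulator >= fixedDt) {
            // simulate(fixedDt);       // placeholder physics/gameplay update
            accumulator -= fixedDt;
        }

        // Render at whatever rate the CPU/GPU allows, blending between the
        // previous and current simulation states.
        double alpha = accumulator / fixedDt;
        // render(interpolate(previousState, currentState, alpha)); // placeholder
        (void)alpha;
    }
}
```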

DLSS3 Frame Generation seems like a massive workaround for these issues, but more still needs to be done.
 
I mean price-wise? The 4080/12 fits against the 3080 Ti while providing a 3090 Ti(-ish) performance level.
I'm questioning whether the 4080 12GB may be getting boosted in Nvidia's graphs by games where the 10GB 3080 is running out of memory, so I was curious whether you knew for sure that Nvidia is using the 10GB card.
 
I'm questioning whether the 4080 12GB may be getting boosted in Nvidia's graphs by games where the 10GB 3080 is running out of memory, so I was curious whether you knew for sure that Nvidia is using the 10GB card.
This I don't know, but besides Requiem the other two titles should not have any issues running at 4K on a 3080/10.
Requiem is shown there without RT though, in which case I think it's also unlikely to cause VRAM issues on 8-10GB cards.
 