Nvidia GeForce RTX 4090 Reviews

Or is it that in mixed rendering the RT hardware is simply not being taxed as much (in other words, the RT hardware is now overkill for mixed rendering, excluding CP2077 at Psycho settings)? I still think it's more about the workload than anything else. Are there any benchmarks showing energy consumption of mixed rendering versus path tracing? It would be interesting to see if there is a significant difference.
Well, there are benchmarks which show 3-4x gains in PT over Ampere, but they are synthetics, and such gains don't seem to carry over to Q2RTX and Minecraft. Dunno why. When does Portal RTX come out, btw?
 
Scaling seems fine to me. Most 4090 benchmark results are CPU limited, even on a 5800X3D.
Looking at the 3090 Ti vs the 2080 Ti, you appear to be correct. The 3090 Ti has more than twice the shader cores and higher clocks but is only about 60-65% faster than the 2080 Ti, I think. Seems Lovelace actually scales better relative to its predecessor than Ampere did.
 
Well, there are benchmarks which show 3-4x gains in PT over Ampere, but they are synthetics, and such gains don't seem to carry over to Q2RTX and Minecraft. Dunno why. When does Portal RTX come out, btw?
In the end we are talking about software, which can be written more or less efficiently. None of that software was originally built with ray tracing in mind either, so maybe it's just not the most efficient at using the hardware? It may just brute-force its way through.
 
3DMark's DXR feature test shows a 2.5x improvement over Ampere. But this is a "pure" raytracing test and UL describes the implementation:
Implementation
The test measures the peak ray-traversal performance of the GPU. All other work, such as illumination and post processing, is kept to a minimum. The ray tracing acceleration structure is built only once. As the scene is static and non-animated, there is no need to update the acceleration structure during the test. The test casts primary rays only. The rays are approximately sorted by direction on the CPU during the test initialization, which is possible because the sampling pattern in screen space is known beforehand. Generating the optimal ray order during initialization allows more coherent ray traversal for out-of-focus areas without the run-time cost of sorting.

So it's only primary rays, and the sorting has mostly happened on the CPU side.
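For anyone curious what "approximately sorted by direction" might look like in practice, here's a rough sketch of the idea (my own illustration, not UL's code; the Ray struct and the quantisation scheme are made up). Rays sharing a coarse direction key end up adjacent in memory, so traversal visits similar BVH nodes back to back:

```cpp
// Illustrative sketch only: pre-sorting primary rays by direction on the CPU
// so that neighbouring rays in the queue point roughly the same way.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct Ray {
    float ox, oy, oz;   // origin
    float dx, dy, dz;   // normalised direction
};

// Coarse sort key: the direction octant in the top bits, then a few
// quantised bits per component for ordering inside the octant.
static uint32_t directionKey(const Ray& r) {
    uint32_t octant = (r.dx < 0.f ? 1u : 0u) | (r.dy < 0.f ? 2u : 0u) | (r.dz < 0.f ? 4u : 0u);
    auto q = [](float v) { return static_cast<uint32_t>(std::abs(v) * 255.f) & 0xFFu; };
    return (octant << 24) | (q(r.dx) << 16) | (q(r.dy) << 8) | q(r.dz);
}

// Sort once during initialisation; since the camera and scene are static,
// the order stays valid for the whole test run.
void sortRaysForCoherence(std::vector<Ray>& rays) {
    std::sort(rays.begin(), rays.end(), [](const Ray& a, const Ray& b) {
        return directionKey(a) < directionKey(b);
    });
}
```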
 
3DMark's DXR feature test shows a 2.5x improvement over Ampere. But this is a "pure" raytracing test and UL describes the implementation:


So it's only primary rays, and the sorting has mostly happened on the CPU side.
It's also a static, non-animated scene, unlike a game, so it's less taxing. @DegustatoR, are the synthetics you talked about like this? No wonder they give much higher results than the games then. It's just a best-case scenario, not real world.
 
That's definitely a 3080 10GB card and not the 12GB card?
I mean price-wise? The 4080/12 fits against the 3080 Ti while providing a 3090 Ti(-ish) performance level.
The obvious elephant in the room is the fact that the 3080/10, 3080/12 and 3080 Ti are all within 10% of one another in performance while their prices range from $700 to $1200.
Hence comparisons to the 3080/10 seem kinda valid.

It's also a static, non-animated scene, unlike a game, so it's less taxing. @DegustatoR, are the synthetics you talked about like this? No wonder they give much higher results than the games then. It's just a best-case scenario, not real world.
Yeah. PT games can be CPU limited too I guess?

So we're potentially going to have games with issues like this then, as not all implementations are created equal.
Just as it was with DLSS2 and FSR2, and likely will be with XeSS.
 
So we're potentially going to have games with issues like this then, as not all implementations are created equal.
It sort of makes sense though. DLSS will need to be informed of a camera cut to know when and when not to generate intermediate frames.

Just perform it during a camera view change or a cutscene change. Should be okay to flag this without too much difficulty.
 
It sort of makes sense though. DLSS will need to be informed of a camera cut to know when and when not to generate intermediate frames.

Just perform it during a camera view change or a cutscene change. Should be okay to flag this without too much difficulty.
I think that most games with TAA already have this implemented as otherwise there would be severe ghosting on camera cuts.
 
It sort of makes sense though. DLSS will need to be informed of a camera cut to know when and when not to generate intermediate frames.

Just perform it during a camera view change or a cutscene change. Should be okay to flag this without too much difficulty.
@Dictator did point to this earlier ...
I expect camera cut issues to be an early teething / per-game implementation issue - DLSS 3 should have the info (motion vectors), based on its design, to make informed decisions about camera cuts.

There are flags in DLSS 2.x for example to make it camera cut aware.
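For illustration, something like this is roughly how an engine would plumb a cut through to the upscaler. The names here are hypothetical, not the actual DLSS/NGX API; the only real point is a per-frame reset flag that tells the temporal pass to drop its history instead of blending or interpolating across the cut:

```cpp
// Hypothetical sketch: signalling a camera cut to a temporal upscaler /
// frame-generation pass. UpscalerFrameParams and evaluateUpscaler() are
// made-up names, not the real DLSS/NGX interface.
struct UpscalerFrameParams {
    bool  resetHistory;      // true -> discard accumulated history / skip generated frame
    float jitterX, jitterY;  // sub-pixel jitter used this frame
    // ... motion vectors, depth, exposure, etc. would go here
};

class Camera {
public:
    void cutTo(/* new view parameters */) {
        // Any discontinuous change (cutscene shot change, teleport, respawn)
        // marks the next frame as a cut.
        cutPending_ = true;
    }
    bool consumeCutFlag() {
        bool was = cutPending_;
        cutPending_ = false;
        return was;
    }
private:
    bool cutPending_ = false;
};

void renderFrame(Camera& cam, UpscalerFrameParams& params) {
    // On a cut, the previous frame's history and motion vectors are meaningless,
    // so the upscaler is told to reset rather than reuse them.
    params.resetHistory = cam.consumeCutFlag();
    // evaluateUpscaler(params);  // hypothetical call into the upscaler
}
```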
 
Maybe not the right place to ask, but given the 4090 reviews and the CPU limitation, what can Nvidia do for the next generation to help with that? More hardware offloading of tasks currently done by the CPU?
Or is it more of an engine problem, where engines need to be better threaded for better CPU utilisation, or an API problem, etc.?
 
Maybe not the right place to ask, but given the 4090 reviews and the CPU limitation, what can Nvidia do for the next generation to help with that? More hardware offloading of tasks currently done by the CPU?
Yes, this has been the main thrust of their efforts for the last several years. Everything they add to their GPUs aims at lowering the CPU part of the rendering equation. The big one right now is BVH building/refitting, and I kinda expected Lovelace to have something on this, but alas that doesn't seem to be the case outside of two specific features which should help make BVHs simpler - OMMs and DMMs.
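For context, DXR already lets engines refit an existing BVH instead of rebuilding it from scratch each frame, which is the main tool today for keeping that per-frame cost down. A minimal sketch of a BLAS refit, assuming the buffers and command list are set up elsewhere (not a complete build path):

```cpp
// Minimal DXR refit sketch: update an existing BLAS in place instead of
// rebuilding it. The BLAS must originally have been built with ALLOW_UPDATE,
// and the scratch buffer sized for an update.
#include <d3d12.h>

void refitBlas(ID3D12GraphicsCommandList4* cmdList,
               const D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_INPUTS& inputsWithUpdatedGeometry,
               D3D12_GPU_VIRTUAL_ADDRESS blasBuffer,
               D3D12_GPU_VIRTUAL_ADDRESS scratchBuffer)
{
    D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_DESC desc = {};
    desc.Inputs = inputsWithUpdatedGeometry;
    // PERFORM_UPDATE refits the existing BVH rather than rebuilding it.
    desc.Inputs.Flags |= D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_ALLOW_UPDATE |
                         D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PERFORM_UPDATE;
    desc.SourceAccelerationStructureData  = blasBuffer;  // read the old BVH...
    desc.DestAccelerationStructureData    = blasBuffer;  // ...and update it in place
    desc.ScratchAccelerationStructureData = scratchBuffer;

    cmdList->BuildRaytracingAccelerationStructure(&desc, 0, nullptr);
}
```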

Or is it more of an engine problem, where engines need to be better threaded for better CPU utilisation, or an API problem, etc.?
It is also an engine problem of course, but mostly a problem of legacy code preventing said engines from taking full advantage of what's possible in new APIs and new h/w. There are examples of well-built / remade engines which scale near perfectly even at lower resolutions - the IW engine used in CODV is one.

But we should also account for absolute framerates while looking at this. Some engines obviously use CPUs badly - WDL and FC6 are two prime examples; no CPU can pass 120 fps in WDL, for example, no matter the resolution. But there are also engines which only become CPU limited at framerates of 200+ fps, and it's unclear whether that can even be considered an "issue".
 
or an API problem, etc.?
NVIDIA does have an API problem with DX12: it causes their big GPUs to quickly become CPU limited at resolutions below 4K. Getting around this is hard, though, as DX12 spawns lots of unnecessary instructions and calls on their hardware. They need to get this sorted out to partially relieve the bottleneck.

Then there are the currently available CPUs, which honestly show abysmal single-threaded improvements each generation. They are simply not adequate.

Then there are the game engines: they often have hard fps caps related to their physics systems and their interactions with the CPU; the engine for Doom Eternal needed to be modified to allow extremely high fps, for example. Other games need to do that too, but most are console ports, so developers don't feel the need to allow more than 200 fps in complex games, and often even less than that.
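The usual way around that kind of cap is to run the simulation on a fixed timestep and let rendering interpolate between states. A generic sketch of the idea (not how Doom Eternal or any specific engine actually does it; the simulate/render calls are placeholders):

```cpp
// Generic sketch: physics runs at a fixed 60 Hz step while rendering runs as
// fast as it can, interpolating between the two most recent simulation states.
// This avoids baking an fps cap into gameplay code.
#include <chrono>

void gameLoop() {
    using clock = std::chrono::steady_clock;
    const double fixedDt = 1.0 / 60.0;   // simulation step, independent of fps
    double accumulator = 0.0;
    auto previous = clock::now();

    while (true /* !quitRequested() - placeholder exit condition */) {
        auto now = clock::now();
        accumulator += std::chrono::duration<double>(now - previous).count();
        previous = now;

        // Step the simulation zero or more times, always with the same dt.
        while (accumulator >= fixedDt) {
            // simulate(fixedDt);       // placeholder physics/gameplay update
            accumulator -= fixedDt;
        }

        // Render at whatever rate the CPU/GPU allows, blending between the
        // previous and current simulation states.
        double alpha = accumulator / fixedDt;
        // render(interpolate(previousState, currentState, alpha)); // placeholder
        (void)alpha;
    }
}
```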

DLSS3 Frame Generation seems like a massive workaround for these issues, but more still needs to be done.
 
I mean price-wise? The 4080/12 fits against the 3080 Ti while providing a 3090 Ti(-ish) performance level.
I'm questioning whether the 4080 12GB may be getting boosted in Nvidia's graphs by games where the 10GB 3080 is running out of memory, so I was curious whether you knew for sure that Nvidia is using the 10GB card.
 
I'm questioning whether the 4080 12GB may be getting boosted in Nvidia's graphs by games where the 10GB 3080 is running out of memory, so I was curious whether you knew for sure that Nvidia is using the 10GB card.
This I don't know, but besides Requiem the other two titles should not have any issues running at 4K on a 3080/10.
Requiem is shown there without RT though, in which case I think it's also unlikely to cause VRAM issues on 8-10GB cards.
 