Current Generation Games Analysis Technical Discussion [2023] [XBSX|S, PS5, PC]

Some comparisons from me:

Hardware Lumen vs. Full Ray Tracing 4 at DLSS Ultra Performance (720P): https://imgsli.com/MjAyNzUw
Hardware Lumen vs. Full Ray Tracing 5 at DLSS Ultra Performance (720P): https://imgsli.com/MjAyNzUx
Hardware Lumen vs. Full Ray Tracing 6 at DLSS Ultra Performance (720P): https://imgsli.com/MjAyNzU5

What you don't see in the pictures is that Lumen is also more pulsating and restless (noisier) even while standing still.

Even with DLSS Ultra Performance the image quality is mostly good on a 65-inch screen at a viewing distance of 2.7 meters. This makes it playable on many more GPUs.

More comparison shots with the DLSS Quality mode (1440p) are here:
Wow. Some of these screenshots definitely show a generational improvement. Number 1 is especially impressive.
 
A 4090 only has 85 theoretical FP32 TFLOPS. Due to the nature of how Nvidia does its 2xFP32, you're never going to get anywhere close to 2x the performance from that arrangement, since games still make decent use of INT32 instructions. It was not some magic way to actually get twice the compute performance in a game. I think I even looked at this a couple weeks ago here and found that 2xFP32 seems to achieve something like 20-25% better performance in some of the better-case scenarios. Which is still good, but nothing remotely close to 100%. This alone cuts the actual difference between these two GPUs massively.
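A back-of-the-envelope way to see why, purely at the instruction-issue level (my own toy model, not measured data): treat the SM as 64 FP32-only lanes plus 64 lanes shared between FP32 and INT32 (Ampere/Ada style) and compare against a dedicated 64 FP32 + 64 INT32 split (Turing style), using Nvidia's oft-quoted figure of roughly 36 INT32 instructions per 100 FP32 in games.

```python
# Toy issue-rate model (assumptions mine): ignores memory, occupancy, SFU, etc.

def cycles_dual_issue(fp_ops, int_ops):
    # "2xFP32": INT32 only runs on the 64 shared lanes, FP32 can use all 128.
    return max(int_ops / 64, (fp_ops + int_ops) / 128)

def cycles_split_pipes(fp_ops, int_ops):
    # Older split: dedicated 64 FP32 + 64 INT32 lanes, co-issued.
    return max(fp_ops / 64, int_ops / 64)

fp, int32 = 100, 36  # ~36 INT32 per 100 FP32 instructions (Nvidia's figure)
speedup = cycles_split_pipes(fp, int32) / cycles_dual_issue(fp, int32)
print(f"ALU-only speedup from doubled FP32: {speedup:.2f}x")  # ~1.47x, not 2x
```

Even this idealised ALU-only model tops out well under 2x; once memory, scheduling and fixed-function bottlenecks enter the picture, the measured ~20-25% figure above looks about right.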

The 6900 xt also shares its FP32 pipes with INT32. All of the same caveats apply to both architectures. So no it doesn’t cut the actual difference at all. Somehow this myth that INT32 is free on AMD architectures lives on. I have no idea where it came from.

Beyond that, gaming performance generally doesn't scale with TFLOPS like this anyway. You don't really get 30% more performance from 30% more TFLOPS, even within the same general architecture.

Right.

It's also a weird argument, cuz you're kind of making the case that this 'unoptimized' factor only applies to the highest end parts? If I can demonstrate relatively predictable scaling from a 6700XT to a 6900XT, what does that then suggest about this supposed lack of optimization?

All software is bottlenecked by something whether it’s optimized or not. As long as the 6900 xt is faster at that particular thing (e.g. bandwidth) you will see improved performance. That does not mean the software is efficiently utilizing either card.
 

200W for a 4090 is weird, for sure. I know in a lot of games it won't hit its max, but 200W is definitely low.
This isn’t uncommon. DX12 + Nvidia.

Have you played through Gears 4? I wouldn't blame you if not, it's not the best game ever made, but I think you may have a misconception about how it looks from screenshots or clips. It's not as consistent artistically as Uncharted or Doom, but in its best scenes it definitely looks just as good.
I have seen a lot of footage. The cutscenes tend to hold up well, but actual gameplay not as much, and gameplay is what I prefer to judge. Also, mediocre doesn't mean bad, just to clarify.
 
What if every game going forward does this, though? Is the population of AAA software poorly optimized, or are developers just taking the hardware to task now? I would think the latter.
No, APIs have just gotten very complicated and hard to use efficiently on multiple platforms. Everyone avoids this discussion, but DirectX 12, Vulkan, Unreal Engine, etc. all suck because, while they open up much lower-level control, they do so without sane defaults.

Back in the day I could design a simple to moderately complex scene and be assured it ran reasonably well across a wide range of devices. These days everything from the GPU vendor to memory speed plays a huge factor. Even a simple algorithm for occlusion culling might work great on an RTX 3090 but fail hard on a Radeon or a 4090. Why? Because Microsoft has largely delegated API extensions/development to NVIDIA and AMD. They have zero interest in making a better experience. They defer to the "experts", who have zero desire to make the playground a safer place.

That is the reason we don't have a programmable RT pipeline, for example.
 
Compute still requires bandwidth to operate. So is the issue the ALU or the lack of bandwidth to feed the ALU? Does it have 4X more bandwidth?

Secondly, you're redoing lighting and shadow calculations every single frame; the higher the resolution, the more compute and bandwidth are required.
You're seeing loads never done in previous generation games.
Raytracing, and ultimately pathtracing, is much more compute-heavy than a simple rasterizing engine. In these workloads a 4090 is 3x+ faster than a 6900 XT.

And what "load never done in previous generation games" can I see in Starfield? Next generation Pathtracing - something better than Cyberpunk? Next generation Raytracing - something better than Control? It is still a rasterizing engine.
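On the "does it have 4X more bandwidth" question above, a quick ratio check using approximate paper specs (numbers are mine, rounded):

```python
# Approximate paper specs (my figures): RTX 4090 ~82.6 TFLOPS FP32, ~1008 GB/s;
# RX 6900 XT ~23 TFLOPS FP32, ~512 GB/s (plus 128 MB Infinity Cache on top).
tflops_4090, bw_4090 = 82.6, 1008   # TFLOPS, GB/s
tflops_6900, bw_6900 = 23.0, 512

print(f"Compute ratio:   {tflops_4090 / tflops_6900:.1f}x")  # ~3.6x
print(f"Bandwidth ratio: {bw_4090 / bw_6900:.1f}x")          # ~2.0x
print(f"FP32 ops per byte of DRAM bandwidth: "
      f"4090 ~{tflops_4090 * 1000 / bw_4090:.0f}, "
      f"6900 XT ~{tflops_6900 * 1000 / bw_6900:.0f}")
```

So the compute gap is roughly 3.6x but the raw DRAM bandwidth gap is only about 2x, which means the 4090 needs either a very compute-heavy workload or a cache-resident one to actually stretch its legs.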
 
Raytracing, and ultimately pathtracing, is much more compute-heavy than a simple rasterizing engine. In these workloads a 4090 is 3x+ faster than a 6900 XT.

The 4090 has additional hardware units to assist with RT in Cyberpunk; on AMD that work runs on the CUs alone.

And what "load never done in previous generation games" can I see in Starfield? Next generation Pathtracing - something better than Cyberpunk? Next generation Raytracing - something better than Control? It is still a rasterizing engine.

Are you sure iroboto was talking about Starfield there? I followed the quote chain back and it becomes ambiguous.

Finally, could you please, please, please keep the worst of the PC GPU vendor warz out of the Gaming forums.
 
Raytracing, and ultimately pathtracing, is much more compute-heavy than a simple rasterizing engine. In these workloads a 4090 is 3x+ faster than a 6900 XT.

And what "load never done in previous generation games" can I see in Starfield? Next generation Pathtracing - something better than Cyberpunk? Next generation Raytracing - something better than Control? It is still a rasterizing engine.
DF covers a large list of real-time calculations for Starfield. No dedicated hardware units are used, so it's just compute. The major advantage of Nvidia's hardware in this case isn't as massive. While I wholeheartedly agree that the 4090 has extremely impressive hardware to run path tracing and ray tracing, when you remove that you're just left with compute units, cache, bandwidth and fixed-function units.

I assume that developers would try their best to reduce bandwidth usage for compute, so the majority of the optimizations should be around hitting as much cache as possible. That could possibly give a fairly decent boost to AMD cards with Infinity Cache.
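As a minimal sketch of what "hitting the cache" means in practice (assuming, purely for illustration, a full-screen pass touching roughly 20 bytes per pixel against Navi 21's 128 MB Infinity Cache):

```python
# Rough working-set check (the bytes-per-pixel figure is a made-up illustration).
def working_set_mb(width, height, bytes_per_pixel=20):
    return width * height * bytes_per_pixel / (1024 * 1024)

CACHE_MB = 128  # Infinity Cache on Navi 21
for w, h in [(1920, 1080), (2560, 1440), (3840, 2160)]:
    ws = working_set_mb(w, h)
    verdict = "fits in cache" if ws <= CACHE_MB else "spills to DRAM"
    print(f"{w}x{h}: ~{ws:.0f} MB working set -> {verdict}")
```

Keep the per-pass working set inside the cache and the 6900 XT's modest DRAM bandwidth matters much less; blow past it (e.g. at 4K with fat buffers) and you're back to being bandwidth-bound.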

I am fairly positive better drivers will come out for Nvidia's cards, so I think it's too early to assume the game is not optimized.

But really, you're talking about a company that has had an extra year to optimize this title. It runs on 6 GB of system RAM and 4 GB of VRAM. I have a hard time believing this is a poorly optimized game.
 
With Starfield and UE5 the time has come when software is so bad that modern GPUs like Ampere and Lovelace can't be fully used. More and more it is obvious that unoptimized software is holding GPUs back. A 4090 has over 4x the compute performance of a 6900 XT and yet it is only 55% faster. At the same time, when you fully use a 4090, >5x more performance is possible (Cyberpunk pathtracing).

Is it the software or the hardware? Can the cache expansion really make up for the limited increase in bandwidth?
 
This isn’t uncommon. DX12 + Nvidia.


I have seen a lot of footage. The cutscenes tend to hold up well, but actual gameplay not as much, and gameplay is what I prefer to judge. Also, mediocre doesn't mean bad, just to clarify.

200W on a 4090, common? I don't think so. In lots of games you'll see low GPU usage on a 4090 because games are CPU-limited. In this pic the 4090 is at 98% GPU usage and only 203W. So it's GPU-limited and only drawing ~200W of its 450+W budget. It's not utilizing the GPU well.
 
Is it the software or the hardware? Can the cache expansion really make up for the limited increase in bandwidth?

The 4090 also has something like 2x the memory bandwidth, so with the 55% performance increase cited it wouldn't be fully utilizing that advantage either.

But really, you're talking about a company that has had an extra year to optimize this title. It runs on 6 GB of system RAM and 4 GB of VRAM. I have a hard time believing this is a poorly optimized game.

This has always been my problem with the term "optimization": it's used in a way that is too generic (as if all optimization choices benefit all scenarios with zero drawbacks for others) and assumes zero bias (and that zero bias is even possible). Take the low memory usage as an example: it is likely a by-product of heavy optimization catering to MS and the XBS.
 
200W on a 4090, common? I don't think so. In lots of games you'll see low GPU usage on a 4090 because games are CPU-limited. In this pic the 4090 is at 98% GPU usage and only 203W. So it's GPU-limited and only drawing ~200W of its 450+W budget. It's not utilizing the GPU well.
No, but poor utilization and lower power draw in DX12 aren't uncommon, 200 watts being extreme of course. Even on my 1080 Ti, DX12 titles often draw less power and run at lower temps despite not being CPU-limited. This almost never happens in DX11.
 
I personally would not assume all IHVs have had the same amount of pre-release input or dev influence over IHV-specific path optimisations in a game that was advertised as having a "PC exclusive partnership". The playing field may not be exactly level at launch, and I would not assume it is given such wording. Remember AC Valhalla? An extreme outlier at launch? Remember how it changed over time?
 

200W for a 4090 is weird, for sure. I know in a lot of games it won't hit its max, but 200W is definitely low.

The 200W pic is an anomaly; the person he's replying to is posting Tom's numbers, where the 4090 uses 270W on average at 1080p. The 7900 XTX uses 358W at the same settings.

The funny thing is that this can be turned into a criticism of AMD for drawing high power at lower resolutions, which I think is the more valid angle since it has been raised before as one of RDNA3's failings: it doesn't power-gate well at lower resolutions and with settings like a 60fps VSync cap.

 
I personally would not assume all IHVs have had the same amount of pre-release input or dev influence over IHV-specific path optimisations in a game that was advertised as having a "PC exclusive partnership". The playing field may not be exactly level at launch, and I would not assume it is given such wording. Remember AC Valhalla? An extreme outlier at launch? Remember how it changed over time?
I think it's still an outlier, just not as crazy as before. Last I checked, AMD GPUs still performed quite a bit better than their NVIDIA counterparts. It's not the crazy 30% it was before, but it's what now, 15-20% or something?
 
@gamervivek that Optimum Tech video is showing CPU-limited scenarios in esports-type games at low settings, where you'd expect power draw nowhere near the maximum. Overwatch 2, for example, is pulling over 200W at 1440p low with GPU usage at 72%, and that's on an RTX 4080. If Starfield is hitting full GPU usage on a 4090, even at 1080p, that power number should be higher if it's actually utilizing the GPU well.
 
The 6900 xt also shares its FP32 pipes with INT32. All of the same caveats apply to both architectures. So no it doesn’t cut the actual difference at all. Somehow this myth that INT32 is free on AMD architectures lives on. I have no idea where it came from.

All software is bottlenecked by something whether it’s optimized or not. As long as the 6900 xt is faster at that particular thing (e.g. bandwidth) you will see improved performance. That does not mean the software is efficiently utilizing either card.
Huh? Where did I say INT was free on AMD? :/

The 6900XT does NOT have the same setup as Ampere/Lovelace at all and it's bizarre you're trying to suggest it does. So yes, it massively affects this when you're trying to rely on theoretical TFlop figures.

A 6800XT has 21TF, while the similarly performing 3080 has 30TF. So is this cuz of lack of optimization? But only for the 3080?

Your argument doesn't really make any sense. It just seems like you're confused about why theoretical TFLOP figures, especially across two very different architectures, don't correlate 1:1 with performance, and are somehow taking that as proof that games aren't optimized.
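For what it's worth, a quick sanity check on the 6800 XT vs 3080 point, taking the paper figures quoted above and assuming (roughly) equal delivered raster performance:

```python
# Paper TFLOPS from the post above; the equal-performance assumption is mine.
tf_6800xt, tf_3080 = 21.0, 30.0
print(f"Performance per paper TFLOP, 6800 XT vs 3080: {tf_3080 / tf_6800xt:.2f}x")
# ~1.43x in RDNA2's favour -- about what you'd expect if Ampere's "2xFP32"
# buys well under 2x in practice, as discussed earlier in the thread.
```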
 
Huh? Where did I say INT was free on AMD? :/

The 6900XT does NOT have the same setup as Ampere/Lovelace at all and it's bizarre you're trying to suggest it does. So yes, it massively affects this when you're trying to rely on theoretical TFlop figures.

A 6800XT has 21TF, while the similarly performing 3080 has 30TF. So is this cuz of lack of optimization? But only for the 3080?

Your argument doesn't really make any sense. It just seems like you're confused about why theoretical TFLOP figures, especially across two very different architectures, don't correlate 1:1 with performance, and are somehow taking that as proof that games aren't optimized.

Please read my post again as you seem to have misunderstood nearly everything I said.
 
Please read my post again as you seem to have misunderstood nearly everything I said.
Well I didn't realize you weren't the original person I was talking to, so my bad.

In that light, I really don't know what you were trying to say then. My response was mainly intended for the arguments troyan was making.
 