CPU Limited Games, When Are They Going To End?

you want to use more CPU and decided to make buildings in your game destructible .. You can't just make an option to make buildings indestructible on older computers
This is understandable, but the subject at hand is scalable things, and view distance is very, very scalable, much more than most things in graphics. Users shouldn't have their 4090 sleeping on the job (40% utilization, and downclocked) just because they selected max view distance settings in ArmA 3, only to get 46fps in the end!


Or this user shouldn't drop from 60fps in World of Warcraft, to literally 10fps, just because he managed to make the game render at max view distance.


GPU physics and dynamic destruction are dead features now despite being alive a decade ago, and massive view distances are dead too. Not because they aren't scalable, but because something has to give in the way we code games.

A couple of years ago, Ubisoft migrated two of their DX11 games, Ghost Recon Breakpoint and Rainbow Six Siege, to Vulkan to improve CPU utilization and overall game performance. They succeeded: both gained at least 20% more performance in CPU-limited scenarios. Few game developers do that nowadays.
 
I kinda think that part of the issue is with GPUs, which have simply gotten a lot faster while CPUs have been stagnating for some years, and even now CPUs are still nowhere close to GPUs in gen-on-gen performance gains.

This leads to more and more users running into CPU limitations with new GPUs, even though they upgrade their CPUs too. Those upgrades just aren't enough to keep pace with the performance gains on the GPU side.

This is the most likely reason people think we are now CPU limited despite nothing having changed in how games use the CPU. The truth is that we are now CPU limited at 120+ fps, whereas before we were GPU limited at 30-60. CPU performance has never scaled linearly, and the per-frame budget shrinks from roughly 33ms at 30fps to roughly 8ms at 120fps, so it is much easier to hit CPU bottlenecks at higher framerates.
 
Vulkan has had Conditional Rendering for quite a while now. I’d be surprised if DX12 didn’t have something similar.
Sadly, conditional rendering handles only draws or dispatches, but not barriers.
If you compute something and the result affects later processing, e.g. the workload of a later indirect dispatch, you need a barrier on that result.
It's beyond me why they missed this. The new feature feels more like growing bloat of useless features shortening the lifespan of the API than a proper step forward. : (

Iirc, in Mantle the feature was called 'conditional execution', and it did not only include barriers, it could even do loops. So you could repeat some solve until the error is small enough, for example.
In VK or DX12 you can only add 100 dispatches for the solve into your command buffer/list to be sure it's enough, and if the error is fine after just 5 iterations, the GPU will execute 95 further zero-work dispatches and barriers for nothing.
Or you download the error result after each iteration to the CPU and do a new dispatch only if needed. But that's even worse due to the cost of CPU/GPU sync.
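
For illustration, here is a rough C++ sketch of that pattern, assuming VK_EXT_conditional_rendering is enabled (the names cmd, solverPipeline, conditionBuffer and MAX_ITERATIONS are made up). The dispatches can be predicated on a flag the error-check pass writes to conditionBuffer, but the barriers between them cannot, so every recorded iteration still pays for its barrier:

```cpp
#include <vulkan/vulkan.h>

// Sketch only: records a fixed worst-case number of solver iterations.
// Assumes VK_EXT_conditional_rendering is enabled and that the solver's
// error-check pass writes 0 into conditionBuffer once it has converged
// (the buffer needs CONDITIONAL_RENDERING usage; in real code the *EXT
// entry points are fetched via vkGetDeviceProcAddr).
void recordSolver(VkCommandBuffer cmd, VkPipeline solverPipeline,
                  VkBuffer conditionBuffer, uint32_t groupCount)
{
    const uint32_t MAX_ITERATIONS = 100; // worst case, recorded up front

    vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, solverPipeline);

    VkConditionalRenderingBeginInfoEXT cond = {};
    cond.sType  = VK_STRUCTURE_TYPE_CONDITIONAL_RENDERING_BEGIN_INFO_EXT;
    cond.buffer = conditionBuffer; // 32-bit value: non-zero = keep iterating
    cond.offset = 0;

    VkMemoryBarrier barrier = {};
    barrier.sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

    for (uint32_t i = 0; i < MAX_ITERATIONS; ++i)
    {
        // The dispatch is predicated: once the shader clears the flag,
        // the remaining dispatches become zero work.
        vkCmdBeginConditionalRenderingEXT(cmd, &cond);
        vkCmdDispatch(cmd, groupCount, 1, 1);
        vkCmdEndConditionalRenderingEXT(cmd);

        // The barrier is NOT predicated: all MAX_ITERATIONS of these run
        // regardless of how early the solver converged.
        vkCmdPipelineBarrier(cmd,
                             VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                             VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                             0, 1, &barrier, 0, NULL, 0, NULL);
    }
}
```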

It's a shame, because with CUDA / OpenCL 2.0, GPUs can generate work directly from compute, and there is no need to mess with command buffers or conditions.
Imo, this API flaw is just as bad as blackboxing the RT BVH data structures. Actually it's worse, because there is no technical or political reason for this limitation anymore.

However, I have not worked on GPUs for years, so if you think I'm missing something, please let me know.
 
I have been optimizing data structures on both the CPU and GPU side. Embedding data in the structure is always a big win compared to storing a pointer or array index in the structure and doing an indirect read (which hits cold cache lines).
This topic is one of the main uncertainties for me. I agree, but your point only holds if you need all the data in the struct. If a lot of the processing only needs some of the data, it might be better to split the data across multiple structs or arrays.

The problem is that we have no tools to test different variations quickly. The languages don't help us there, so all we can do is write multiple code paths.
But we don't have time for that, so we make a decision up front from gut feeling and experience, and we can only hope our choice was good.
However, since we don't have time to try multiple memory layouts frequently, the chance that we have any proper experience at all is quite small. : )

I always think about creating some abstraction for the memory layout, so I could experiment without needing to change the implementation of the functionality, but then I never do it.
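
For what it's worth, a minimal sketch of what such an abstraction could look like (hypothetical C++, all names made up): the functionality is written once against a small accessor interface, and switching between an array-of-structs and a struct-of-arrays layout is just a matter of which type you instantiate, so different layouts can actually be benchmarked without rewriting the code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: the same update written against two interchangeable
// layouts. Only the layout type changes; the hot loop stays identical.

struct ParticlesAoS {                        // array-of-structs layout
    struct Item { float px, py, pz, vx, vy, vz, mass; };
    std::vector<Item> items;

    std::size_t size() const { return items.size(); }
    float& px(std::size_t i) { return items[i].px; }
    float& vx(std::size_t i) { return items[i].vx; }
};

struct ParticlesSoA {                        // struct-of-arrays layout
    std::vector<float> pxs, pys, pzs, vxs, vys, vzs, masses;

    std::size_t size() const { return pxs.size(); }
    float& px(std::size_t i) { return pxs[i]; }
    float& vx(std::size_t i) { return vxs[i]; }
};

// Functionality is written once, against the accessor interface.
template <class Layout>
void integrateX(Layout& p, float dt)
{
    for (std::size_t i = 0; i < p.size(); ++i)
        p.px(i) += p.vx(i) * dt;   // touches only px/vx: SoA keeps the cache
                                   // lines hot, AoS drags the whole struct in
}
```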
 
Outside of outliers like Spiderman (with RT on) and The Witcher 3 (with RT on), there are very few games where the 3930k cannot do a locked 60fps.

If you look at CPU reviews, even low to mid-range CPUs hit 100fps+ in pretty much every game: https://www.techpowerup.com/review/amd-ryzen-5-5600/14.html

If all you want is 60fps, then a 10-year-old CPU will do.

If you want high refresh you'll need something newer.

If you want to play modern games with RT enabled you'll need something released in the last 2-3 years.
Cyberpunk, Battlefield 2042, Hogwarts, Flight Sim, Watch Dogs Legion, Gotham Knights, Forspoken, etc. There are plenty of games requiring huge amounts of CPU performance.
 
Cyberpunk, Battlefield 2042, Hogwarts, Flight Sim, Watch Dogs Legion, Gotham Knights, Forspoken, etc. There are plenty of games requiring huge amounts of CPU performance.

And yet it will be able to do at least 60fps in those games.

I had a 3930k and, overclocked to 4.4+ GHz, it's at Zen+ levels.
 
And yet it will be able to do at least 60fps in those games.

I had a 3930k and, overclocked to 4.4+ GHz, it's at Zen+ levels.
Maybe in certain scenes. It will not come close to maintaining 60fps throughout playtime. It takes Zen 4/Rocket Lake to hold a vsynced, locked 60, if it's even possible at all.
 
Maybe in certain scenes. It will not come close to maintaining 60fps throughout playtime. It takes Zen 4/Rocket Lake to hold a vsynced, locked 60, if it's even possible at all.
No it doesn't.

Have you recently used a 3930k?

6 months ago I was gaming on a 4GHz i7 4770k and was getting 60fps in every game bar the odd exception.

And the 3930k has an extra two cores, which modern games absolutely will make use of.
 
6 months ago I was gaming on a 4GHz i7 4770k and was getting 60fps in every game bar the odd exception.
I have a 3770K with a 2080 Ti, and I can testify to that: 60fps is attainable in almost any game, unless CPU limited (due to RT, view distance, bad console ports) or GPU limited (4K, heavy RT, path tracing).

Still, we are wasting CPU cycles on who knows what .. a game like Uncharted 4 ran on the potato CPU of the PS4, yet on PC it requires a 4770K as a minimum, which is what, a CPU three times as powerful as the PS4's? And that's just the minimum, mind you; to run properly on a modern GPU you would need to go higher and higher.
 
I have a 3770K with a 2080 Ti, and I can testify to that: 60fps is attainable in almost any game, unless CPU limited (due to RT, view distance, bad console ports) or GPU limited (4K, heavy RT, path tracing).

Still, we are wasting CPU cycles on who knows what .. a game like Uncharted 4 ran on the potato CPU of the PS4, yet on PC it requires a 4770K as a minimum, which is what, a CPU three times as powerful as the PS4's? And that's just the minimum, mind you; to run properly on a modern GPU you would need to go higher and higher.

PS4's OS only uses 1 CPU core (or is it 1.5 cores?), so that helps massively.

PC has a heavier OS and a heavier API, and it all adds up.

The latest version of Linux runs loads faster on older hardware than Windows 11 does, as it's just a lighter environment.
 
No it doesn't.

Have you recently used a 3930k?

6 months ago I was gaming on a 4GHz i7 4770k and was getting 60fps in every game bar the odd exception.

And the 3930k has an extra two cores, which modern games absolutely will make use of.
I have a 7700k @ 4.5GHz, so probably faster in practice even with 2 fewer cores. Getting a consistent 60fps is miles away in the games I listed.

I have a 3770K with a 2080 Ti, and I can testify to that: 60fps is attainable in almost any game, unless CPU limited (due to RT, view distance, bad console ports) or GPU limited (4K, heavy RT, path tracing).

Still, we are wasting CPU cycles on who knows what .. a game like Uncharted 4 ran on the potato CPU of the PS4, yet on PC it requires a 4770K as a minimum, which is what, a CPU three times as powerful as the PS4's? And that's just the minimum, mind you; to run properly on a modern GPU you would need to go higher and higher.
I mean you can hand wave away the reason but it still exists.
 
PS4's OS only uses 1 CPU core (or is it 1.5 cores?), so that helps massively.

PC has a heavier OS and a heavier API, and it all adds up.

The latest version of Linux runs loads faster on older hardware than Windows 11 does, as it's just a lighter environment.
It's 1.5 to 1.6 cores for the OS, depending on whether VR is used. Reference: https://forum.beyond3d.com/threads/...-x-and-xbox-series-x-s-2019-12-2020-03.61513/

Around late 2015, Sony changed the system reservation from 100% of CPU core #7 to only 50% of core #7 (60% if VR), giving developers use of 50% of core #7 (40% if VR).
 
I have a 7700k @ 4.5GHz, so probably faster in practice even with 2 fewer cores. Getting a consistent 60fps is miles away in the games I listed.

Nope, that extra clock speed is no substitute for 2 extra physical cores.

But I speak from experience of using a 3930k.

But the original argument is that CPU requirements have increased while games aren't really doing anything more than before.

And as can be demonstrated with 10-year-old CPUs, that's simply false.

Even a dual-core CPU with SMT can get a locked 60fps in A Plague Tale: Requiem.
 

Attachment: A-Plague-Tale-Requiem-CPU-benchmarks.png
Nope, that extra clock speed is no substitute for 2 extra physical cores.

But I speak from experience of using a 3930k.

But the original argument is that CPU requirements have increased while games aren't really doing anything more than before.

And as can be demonstrated with 10-year-old CPUs, that's simply false.

Even a dual-core CPU with SMT can get a locked 60fps in A Plague Tale: Requiem.
It's also several architecture revisions ahead of your Sandy Bridge CPU. Not every game struggles, but plenty do. Older Intel CPUs in particular have also seen performance heavily impacted in various workloads over the years due to all the software patching required to fix the many security exploits that surfaced.
 
It's also several architecture revisions ahead of your Sandy Bridge CPU. Not every game struggles, but plenty do. Older Intel CPUs in particular have also seen performance heavily impacted in various workloads over the years due to all the software patching required to fix the many security exploits that surfaced.

Intel architecture revisions brought next to nothing until AMD Ryzen.

Most of the performance increase came from increased clock speeds.

3.8GHz boost clock for the 2600k vs 4.5GHz for the 7700k.

Clock the 2600k to 4.5GHz (which is a very easy overclock to achieve) and most of the 7700k's advantage goes away.

The 3930k can hit 4.5GHz too and has aged much better than the older quad cores.
 
On a related note: for many years now, as clock speed increases and die shrinks got difficult, the death of Moore's law kept being brought up.
Yesterday, Moore himself actually passed away. Much respect for the pioneers of what we are debating here.
 
Intel architecture revisions brought next to nothing until AMD Ryzen.

Most of the performance increase came from increased clock speeds.

3.8GHz boost clock for the 2600k vs 4.5GHz for the 7700k.

Clock the 2600k to 4.5GHz (which is a very easy overclock to achieve) and most of the 7700k's advantage goes away.

The 3930k can hit 4.5GHz too and has aged much better than the older quad cores.

And this is essentially the problem with the software. CPU gains have slowed, especially during the last ten years. Parallelism/multi-threading is hard. Legacy code bases that were written with the expectation that CPU gains would continue on the same path gen over gen basically will not scale well to new hardware. So to remove CPU bottlenecks, engineers have to focus more and more on mapping the software to the hardware, rather than on abstractions that do not.
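
To make that concrete, here is a toy C++17 sketch (not from any particular engine; Entity and the update are invented) of the kind of restructuring usually needed: the same per-entity update expressed as a data-parallel pass instead of a single-threaded loop, so it scales with core count rather than with clock speed, provided the updates are actually independent.

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// Toy example: a flat array of entities updated either serially or in
// parallel with the C++17 parallel algorithms.
struct Entity {
    float x = 0.0f;
    float v = 1.0f;
};

// Bound by single-core performance, which has stagnated.
void updateSerial(std::vector<Entity>& entities, float dt)
{
    for (Entity& e : entities)
        e.x += e.v * dt;
}

// Scales with core count, but only because each update is independent;
// legacy code with shared mutable state cannot be converted this easily.
void updateParallel(std::vector<Entity>& entities, float dt)
{
    std::for_each(std::execution::par_unseq,
                  entities.begin(), entities.end(),
                  [dt](Entity& e) { e.x += e.v * dt; });
}
```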
 
Sandy Bridge to Skylake is probably some 20-40% more performance at identical clocks. Especially when you factor in the higher spec memory Skylake supports.
 
Sandy Bridge to Skylake is probably some 20-40% more performance at identical clocks. Especially when you factor in the higher spec memory Skylake supports.

It won't be that high; Intel spent many years offering 3-4% increases in IPC until Ryzen showed up and forced them to start delivering decent gen-on-gen IPC gains.

But in modern games that's not enough to allow a Quad-Core to overcome two extra physical cores.
 