It is hyperbolic, but it was intended to be hyperbolic? I thought that was implied.
But why resort to hyperbole at all? We're supposed to be having a technical discussion on the relative merits of two solutions. Being deliberately hyperbolic in praise of your preferred solution is directly counterproductive to that discussion, which is why I suggested at the start that we avoid it.
I didn't assert that GPU upload heaps wouldn't mitigate some of the fundamental disadvantages? I merely said that, despite that, GPU upload heaps would still be inferior.
I agree it's not the same as hUMA, because the dev still has to manage which datasets reside on the CPU side and which on the GPU side in order to maximise performance. That means more work for the developer, and more chance of making the wrong decision and ending up with sub-optimal performance. All that said, as long as it's implemented well, it should greatly mitigate the biggest complaint the recently linked source has about the PC: the copying of data back and forth between memory pools.
With that copying potentially mitigated, the relative balance of benefits versus costs between split and unified memory architectures shifts significantly, particularly in raw performance terms.
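To make that concrete, here's a minimal sketch of what the new path looks like in D3D12, assuming a recent Agility SDK that exposes D3D12_HEAP_TYPE_GPU_UPLOAD and a system with Resizable BAR enabled (the helper name CreateCpuWritableVramBuffer is mine, not part of the API). The CPU maps and writes a VRAM-resident buffer directly, instead of the traditional pattern of writing to a system-memory upload heap and then recording a copy into a default-heap resource:

```cpp
// Minimal sketch: a CPU-writable buffer that lives in VRAM.
#include <d3d12.h>

ID3D12Resource* CreateCpuWritableVramBuffer(ID3D12Device* device, UINT64 size)
{
    // GPU upload heaps need driver/OS support (OPTIONS16) plus Resizable BAR.
    D3D12_FEATURE_DATA_D3D12_OPTIONS16 opts = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS16,
                                           &opts, sizeof(opts)))
        || !opts.GPUUploadHeapSupported)
    {
        return nullptr; // fall back to the classic upload-heap-and-copy path
    }

    D3D12_HEAP_PROPERTIES heap = {};
    heap.Type = D3D12_HEAP_TYPE_GPU_UPLOAD; // VRAM, but CPU-mappable

    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension        = D3D12_RESOURCE_DIMENSION_BUFFER;
    desc.Width            = size;
    desc.Height           = 1;
    desc.DepthOrArraySize = 1;
    desc.MipLevels        = 1;
    desc.Format           = DXGI_FORMAT_UNKNOWN;
    desc.SampleDesc.Count = 1;
    desc.Layout           = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

    ID3D12Resource* buffer = nullptr;
    if (FAILED(device->CreateCommittedResource(
            &heap, D3D12_HEAP_FLAG_NONE, &desc,
            D3D12_RESOURCE_STATE_COMMON, nullptr, IID_PPV_ARGS(&buffer))))
    {
        return nullptr;
    }
    // Map() the buffer and memcpy into it directly; no staging resource,
    // no CopyBufferRegion on a copy queue, no second copy of the data.
    return buffer;
}
```

Note that this doesn't remove the residency decision itself: the developer still chooses which resources go in which heap type, which is exactly the extra work versus hUMA mentioned above.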
It wasn't an argument, it was a mere observation. As to whether a theoretical PC can outperform a theoretical hUMA system, you might as well ask me to find the limit as both functions approach infinity: you could make a whole host of arguments for either. My initial point was that, as of today, the PC's memory subsystem is inferior to the console's, and that hasn't changed yet.
And I would counter that by saying hUMA is simpler and easier to extract maximum performance from, but not necessarily superior when it comes to maximum performance in a well-developed application for a given memory bandwidth, especially given that PCs typically feature more overall memory in comparable configurations (e.g. the oft-cited 3600X + 2070S combo, which would generally come with a total of 24GB: 16GB of system RAM plus 8GB of VRAM), let alone higher-end ones. And I expect this statement to become more pertinent moving forwards, with recent developments around GPU upload heaps helping to remove many of the memory copies that are currently needed.
As it stands, ignoring storage bottlenecks, the limiting bandwidth on PC systems is the PCIe bus. Increasing memory bandwidth in non-hUMA systems today does not bypass this limitation.
What is your evidence that this is a limitation, and why? All evidence that I've seen (i.e. actual benchmarks) suggests that there is no speed-up when moving from the current fastest PCIe interface to a newly released iteration.
But hey, if you can show me benchmarks from when PCIe4 first launched that demonstrate games at that time saw a sudden performance boost going from PCIe3 to PCIe4, then I'd be interested to see them.
Similarly, PCIe5 exists now on motherboards. Why has neither Nvidia nor AMD chosen to use it in their latest, just-released GPU lines if this is bottlenecking the system? Surely that would be a relatively cheap way to gain a competitive advantage if that were the case?
And even if PCIe were a bottleneck (and again, I'm curious to understand your reasoning for thinking this), have you considered how GPU-based decompression will significantly reduce the load on PCIe?
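The numbers below are illustrative assumptions on my part (PCIe4 x16 at roughly 31.5GB/s usable, and a ~2:1 compression ratio on game assets), but the mechanism is simple: if data is only decompressed after it has crossed the bus, the link's effective asset bandwidth is multiplied by the compression ratio:

```cpp
// Illustrative only: effect of GPU-side decompression on bus traffic.
#include <cstdio>

int main()
{
    const double pcie4_x16 = 31.5;   // ~usable GB/s, PCIe4 x16 (assumption)
    const double ratio     = 2.0;    // assumed ~2:1 asset compression

    // Sending compressed bytes and decompressing in VRAM means each byte
    // crossing the bus delivers 'ratio' bytes of usable asset data.
    printf("raw link bandwidth:       %5.1f GB/s\n", pcie4_x16);
    printf("effective asset delivery: %5.1f GB/s\n", pcie4_x16 * ratio);
}
```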
One final point to consider: if PCIe bus bandwidth is a bottleneck in PCs, and, as you claim, increasing VRAM bandwidth does not bypass that bottleneck, then why do we see performance go up when we increase VRAM bandwidth (in line with GPU compute resources)? Surely if PCIe is a true bottleneck there, then performance should not increase at all. And yet when we swap out an already very big GPU (let's say a 4080) tethered to the end of this apparently bottlenecking PCIe interface for an even bigger GPU (let's say a 4090), we see a big performance gain.
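For reference, here are the rounded spec-sheet bandwidth figures behind that 4080 vs 4090 point. The VRAM side outruns the bus by well over an order of magnitude, which is exactly what you'd expect if the working set lives in VRAM and the bus mostly carries uploads:

```cpp
// Rounded, publicly listed bandwidth figures. The takeaway: VRAM bandwidth
// is ~20-30x the PCIe link, so performance scaling with a bigger GPU is
// consistent with the bus not being the limiting factor in steady state.
#include <cstdio>

int main()
{
    struct { const char* name; double gbps; } bw[] = {
        { "PCIe3 x16",      15.8 },   // ~usable GB/s after encoding overhead
        { "PCIe4 x16",      31.5 },
        { "PCIe5 x16",      63.0 },
        { "RTX 4080 VRAM",  716.8 },
        { "RTX 4090 VRAM", 1008.0 },
    };
    for (const auto& b : bw)
        printf("%-14s %7.1f GB/s\n", b.name, b.gbps);
}
```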