General Next Generation Rumors and Discussions [Post GDC 2020]

These "rumours" sound like a few of the voiced assumptions here how this could play out for Sony until they have a 5nm console. Doesn't make them valid though:)
 
One thing he mentions in that interview is that PS5 seems to be very easy to develop for. Which would be in line with Jim Ryan's comments not so long ago.

"One thing that makes me particularly optimistic that what we're hearing from developers and publishers, is the ease in which they are able to get code running on PlayStation 5 is way beyond any experience they've had on any other PlayStation platform."

Good for early launch games.
 
The 6GB portion has an effective bandwidth of 336 GB/s. Just as a hypothetical: if the GPU is accessing this memory more than Microsoft expected, I guess there is potential to be bandwidth-limited. Ideally you want accesses spread across all memory channels for full bandwidth, but I guess if you hit that particular range too much you'll lower your effective bandwidth. This goes back to how the memory interleaving is set up, and I'm by no means an expert.
It is ~25% of the game's allocation, i.e. ~75% of the space is accessible at the highest rate. There would have to be a significant amount of the 10GB space consumed by the buffers or structures that are part of the highest-demand targets, and somehow a developer could not find at least some data that doesn't need the same bandwidth as the ROPs or CUs running full-bore with streaming compute. Anything expected to be handled by the CPU during the frame will not need the higher number, and could not even saturate the lower one. Any graphics resources that don't hit the same peaks as the pixel-shading pipeline could still get along decently with what is still a pretty generous amount of bandwidth.
Even when evenly distributing accesses across all portions, a simple weighted average gives (10/16 × 560) + (6/16 × 336) ≈ 476 GB/s, so it should still average out at close to 500 GB/s.
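
As a sanity check on that figure, here's the weighted-average model in a few lines of Python. This is a back-of-envelope sketch only; real behaviour depends on access patterns and how the memory controller interleaves, and the uniform-distribution weighting is itself an assumption:

```python
# Back-of-envelope model of Series X memory bandwidth under a uniform
# access distribution. Peak figures are the published ones; the
# weighting model is a simplifying assumption, not measured behaviour.
fast_gb, fast_bw = 10, 560.0   # GB, GB/s (GPU-optimised region)
slow_gb, slow_bw = 6, 336.0    # GB, GB/s (standard region)

total = fast_gb + slow_gb
avg_bw = (fast_gb / total) * fast_bw + (slow_gb / total) * slow_bw
print(f"{avg_bw:.0f} GB/s")    # -> 476 GB/s
```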

Can the GPU reach that slower portion? I thought it was 10 GB addressable by the GPU at full speed, and 16 GB addressable by everything at the slower speed.
My interpretation is that any client can read anywhere; there's just more performance to be had in the 10GB GPU-optimized section, and only the GPU can create that kind of demand. Since the OS and other apps seem to be in the slow section, I would assume they'd like it if the GPU were able to read their buffers or otherwise know their requests exist.
 
Can we please not set things up to play the 'victim' angle later on? It's really not something that should come up in friendly water-cooler discussions about gaming. So let's keep things civil and not treat this as a debate; let's treat it as a friendly, open discussion.
 
Maybe that weird Roman-numeral-V-shaped devkit had heat issues?

For what it's worth, there were rumors some weeks back that the silver-colored V-shaped devkit was hot and noisy. A newer black revision was much cooler and quieter.
 
Better tools and APIs, for example. Compare Vulkan with DX9/10.

It doesn't make any sense: an explicit low-overhead API only helps in CPU-limited cases, not in GPU-limited scenarios, where all APIs will have roughly the same performance.

Moreover, in this case you're comparing APIs more than 10 years apart; here we're talking about DirectX 12, which is mature and quite competitive with Vulkan.
 
You create a SW API that is HW-agnostic. Data in, data out. Preferably data out to RAM at a specific address, like Sony does. The API can be implemented in the CPU, GPU, or SSD. Implementing it in the SSD is never a great idea due to the PCIe bus and the limited number of lanes on regular CPUs. You want the implementation in the CPU/GPU, or in AMD's case in the I/O controller, to save the PCIe lanes for better use.
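
To make the "data in, data out" idea concrete, here is a minimal sketch of what such a HW-agnostic interface could look like. The names and shapes are hypothetical, not any real console or AMD SDK:

```python
# Hypothetical sketch only: a HW-agnostic streaming API whose
# implementation could live on the CPU, the GPU, or a dedicated I/O block.
from abc import ABC, abstractmethod

class StorageBackend(ABC):
    @abstractmethod
    def read_decompress(self, src_offset: int, size: int, dest_addr: int) -> None:
        """Read `size` compressed bytes starting at `src_offset` on disk
        and write the decompressed result to RAM at `dest_addr`."""

def load_asset(backend: StorageBackend, src_offset: int, size: int, dest_addr: int) -> None:
    # The caller neither knows nor cares which hardware fulfils this;
    # swapping the backend swaps the implementation, not the API.
    backend.read_decompress(src_offset, size, dest_addr)
```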

Right now there is no interconnect between device drivers in the Windows kernel; this segmentation is deliberate. If you want the SSD to serve the GPU with as little overhead (as few bottlenecks) as possible, then ideally you want to connect these two device drivers and let them manage I/O between themselves without the kernel having to.
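
As an illustration of why that matters, here are the two I/O paths being contrasted, written out as copy-step lists. This is purely illustrative; these are not real Windows driver interfaces:

```python
# Illustrative only: data-copy hops on the conventional path vs a
# hypothetical driver-to-driver (peer-to-peer) path.
conventional_path = [
    "SSD -> kernel buffer (storage driver)",
    "kernel buffer -> application memory in RAM",
    "RAM -> GPU VRAM over PCIe x16 (graphics driver)",
]
direct_path = [
    "SSD -> GPU VRAM over PCIe (kernel only sets up the DMA)",
]
print(len(conventional_path), "hops vs", len(direct_path))
```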


Yep, I understand that. But if we're talking about an uncompressed data stream rather than a compressed one (as I can't see how PCs can work with a compressed one without dedicated decompression hardware, which they won't get any time soon outside of the GPU), then the steps you describe above are the same for either a PC or the PS5, with the exception of the additional step of data going over the PCIe x16 bus to the GPU.

Having to read data from the SSD into RAM (over the southbridge) and then write some of it over the PCIe bus to the GPU (over the northbridge) is a big bottleneck. People talk about these buses' maximum theoretical bandwidth as if that is what is typically observed, which is never the case with maximum theoretical performance. Consoles have few performance advantages over a modern PC, but unified memory solves the issue of having to move data so it's in the right place. If you need data accessible to both CPU and GPU, a PC is the worst-case scenario.

But on consoles, devs will certainly be taking advantage of compressed data: compressed data means less data to read from the SSD and less space used on the SSD. You can't discount it in a fair comparison, because then it's no longer a fair comparison.

So yes, that's an extra step not required by consoles, which will add latency, but from a bandwidth perspective it's not a concern, as that interface is 4x wider than the one between the SSD and the APU/CPU in either the PS5 or the PC.

Maximum theoretical transfer rates are almost never achieved.
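
For reference, the "4x wider" claim is just lane count. A quick sketch of the theoretical PCIe 4.0 numbers (the per-lane figure is the usual ~2 GB/s after encoding overhead, and, as noted above, real-world rates come in below these peaks):

```python
# Theoretical PCIe 4.0 link bandwidth; real transfers land below this.
GBPS_PER_LANE = 1.97           # ~GB/s usable per PCIe 4.0 lane

nvme_x4 = 4 * GBPS_PER_LANE    # SSD link:  ~7.9 GB/s
gpu_x16 = 16 * GBPS_PER_LANE   # GPU link: ~31.5 GB/s
print(gpu_x16 / nvme_x4)       # -> 4.0, hence "4x wider"
```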

I'd be curious to know how much extra compression the consoles' custom formats gain you over the formats already used natively by GPUs today. Is 5GB/s on a modern PC drive already the equivalent of something higher using those formats?

Native GPU compressed texture formats and Kraken solve different problems. Kraken is designed to save storage space; I don't think anybody is suggesting PS5's APU can natively process Kraken-compressed images, and if it could, why care about decompressing them?
 
Something is bothering me in all this talk about compression and effective bandwidths.
Aren't textures already compressed data? We aren't transferring plain text files, but structures that are already stored in compressed formats, so how much can we realistically expect an additional general compression scheme to yield? It's related to the question above; I feel confused about what people are really saying, bandwidth-wise.
 
The compression ratio of GPU texture formats is kept low so that they can be decompressed in real time inside the GPU. You can compress further on disk.
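
A quick worked example of why there's still headroom. The BCn block sizes are fixed and well documented; the "further ~20-30%" figure comes up later in the thread:

```python
# BCn formats trade ratio for fixed-rate random access, so plenty of
# redundancy remains for a general-purpose codec to remove on disk.
RGBA8_BITS_PER_TEXEL = 32
BC1_BITS_PER_TEXEL   = 4    # 8 bytes per 4x4 block  -> 8:1 vs RGBA8
BC7_BITS_PER_TEXEL   = 8    # 16 bytes per 4x4 block -> 4:1 vs RGBA8

print(RGBA8_BITS_PER_TEXEL / BC1_BITS_PER_TEXEL)  # 8.0
print(RGBA8_BITS_PER_TEXEL / BC7_BITS_PER_TEXEL)  # 4.0
# A lossless pass such as Kraken on top of this then buys the extra
# ~20-30% quoted later in the thread.
```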
 
Having to read data from the SSD into RAM (over the southbridge) and then write some of it over the PCIe bus to the GPU (over the northbridge) is a big bottleneck.

But there is no southbridge or northbridge in the context of what we're discussing. The Zen 2 platform has the I/O controller and the PCIe controllers for both NVMe and GPUs directly on the CPU die itself. So in that respect it's exactly the same as a PS5. I believe if it's a non-M.2 NVMe drive then it would indeed communicate with the CPU via the chipset, but that's not the case for these high-speed SSDs; it's direct-to-CPU communication. I'm not sure how modern Intel platforms handle it, but they're already at a significant disadvantage given they won't support PCIe 4.0 until Rocket Lake later this year.

Consoles have few performance advantages over a modern PC, but unified memory solves the issue of having to move data so it's in the right place. If you need data accessible to both CPU and GPU, a PC is the worst-case scenario.

There's no argument there. I do understand that there's an extra step in a PC architecture to get data into graphics memory. I think the question of how much additional latency that adds, though, depends heavily on whether, when using DirectStorage, the GPU is able to bypass the CPU/system memory altogether. If not, then that essentially prevents the PC in its current form from benefiting from one of the key features of the new consoles, i.e. using the SSD as additional graphics memory.

But on consoles, devs will certainly be taking advantage of compressed data: compressed data means less data to read from the SSD and less space used on the SSD. You can't discount it in a fair comparison, because then it's no longer a fair comparison.

I don't see why it's not a fair comparison if we're comparing the consoles' compressed throughput vs the PC's uncompressed throughput. 4.8GB/s is the XSX's compressed throughput. We acknowledge its use of compression by comparing that figure with 5GB/s uncompressed on a modern NVMe SSD. I guess to be completely fair we then also have to acknowledge that install sizes on the PC would need to be larger. Same with PS5: we're comparing 7GB/s for the fastest PC drives with 8-9GB/s in the PS5.

Although, again, this doesn't take into account the native compression formats already handled by modern GPUs. So in reality that 5GB/s uncompressed throughput of the PC drive is effectively higher, although not by as high a ratio as the new consoles achieve with their custom compression.
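
Putting the figures being compared above side by side (all are the vendor/thread numbers, not measurements):

```python
# Throughput figures as quoted in this thread (vendor claims).
xsx_raw, xsx_compressed = 2.4, 4.8        # GB/s, Series X
ps5_raw, ps5_compressed_typ = 5.5, 9.0    # GB/s, PS5 ("8-9 typical")
pc_nvme_raw_fast = 7.0                    # GB/s, fastest PCIe 4.0 drives

print(f"XSX effective gain: {xsx_compressed / xsx_raw:.1f}x")      # ~2.0x
print(f"PS5 effective gain: {ps5_compressed_typ / ps5_raw:.1f}x")  # ~1.6x
```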

Native GPU compressed texture formats and Kraken solve different problems. Kraken is designed to save storage space; I don't think anybody is suggesting PS5's APU can natively process Kraken-compressed images, and if it could, why care about decompressing them?

But the existing native GPU compression formats will still be saving both storage space and transfer bandwidth, so it's valid to consider their impact.

The compression ratio of GPU texture formats is kept low so that they can be decompressed in real time inside the GPU. You can compress further on disk.

Do you know what the typical compression ratio is, and what proportion of all data transferred from disk in a game is made up of those compressed data types? If we're only talking about a 5% benefit then it's probably not worth considering from the perspective of this conversation, but say 20% or more would be significant.
 
Do you know what the typical compression ratio is, and what proportion of all data transferred from disk in a game is made up of those compressed data types? If we're only talking about a 5% benefit then it's probably not worth considering from the perspective of this conversation, but say 20% or more would be significant.

Kraken alone only gives a 20 to 30% improvement. RDO encoding of textures plus Kraken, or BCPack, can give a 50% compression improvement for BC7 textures, from what Rich Geldreich said on his Twitter account; maybe other texture formats (BC1-5) can compress more.
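
Translating those percentages into ratios (straightforward arithmetic on the figures above, reading "X% improvement" as the output being X% smaller):

```python
# Convert "X% smaller" savings into compression ratios.
def ratio(saving: float) -> float:
    return 1.0 / (1.0 - saving)

print(ratio(0.20), ratio(0.30))  # Kraken alone: ~1.25x to ~1.43x
print(ratio(0.50))               # RDO + Kraken / BCPack on BC7: 2.0x
# Stacked on BC7's fixed 4:1 vs RGBA8, a 2x pass would mean roughly
# 8:1 end-to-end for those textures (rough arithmetic, not a benchmark).
```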
 
The combination of Kraken and lossy BCn seems to reach almost 4:1 with real game data, and I wonder how much RDO improves this. But once they insert lossy compression it becomes difficult to make a comparison. Any claimed compression ratio is meaningless without a subjective evaluation of the artifacts at the arbitrary level of data loss chosen for the test. And obviously any claim of a guaranteed lossless ratio is snake oil.

http://cbloomrants.blogspot.com/2018/02/oodle-260-leviathan-detailed.html

http://cbloomrants.blogspot.com/2018/03/improving-compression-of-block.html
 
RDO + Kraken and BCPack seem to be equivalent; this is why he chose this comparison. It's his comparison, not mine; you can ask him on Twitter. He is a pioneer in compression and decompression of textures and images, so I think he probably has a good idea of the subject.


 
But there is no southbridge or northbridge in the context of what we're discussing. The Zen 2 platform has the I/O controller and the PCIe controllers for both NVMe and GPUs directly on the CPU die itself.

North/southbridge functionality isn't gone in Zen 2; it's now on-chip and referred to with AMD nomenclature. The bus arrangement still exists, and you're still moving things from here, to here, to here. If Zen 2's architecture can allow the SSD to route data directly over PCIe to the GPU, without the existing steps, then that is news to me.

I don't see why it's not a fair comparison if we're comparing the consoles' compressed throughput vs the PC's uncompressed throughput. 4.8GB/s is the XSX's compressed throughput.

I guess this depends on what you're trying to measure. If you're trying to compare how each arrangement would benefit a game's performance, you cannot discount compression. If you want to measure base throughput, then discount compression, but you must accept this is nonsense, because devs will be compressing everything on console with zlib at a minimum, since decompression is free.

We acknowledge its use of compression by comparing that figure with 5GB/s uncompressed on a modern NVMe SSD. I guess to be completely fair we then also have to acknowledge that install sizes on the PC would need to be larger. Same with PS5: we're comparing 7GB/s for the fastest PC drives with 8-9GB/s in the PS5.

The relatively small SSD sizes in next-gen consoles will hopefully incentivise devs to be more discerning about what data they include in their game install. I would very much expect to see far fewer pre-rendered videos, and there may be less need for them because a) you're not using videos to hide loading times, and b) you can load much faster to set up the scene and do it in-engine. I'm not saying these methods are technically equivalent, because they aren't, but now in-engine will be a real option without worrying about load times.

Even on PC, digging through packs of assets in many games you'll find duplicates, although probably not as many as on today's consoles. It'll be interesting to compare install sizes of 4K base assets on next-gen consoles vs. PC. Both next-gen consoles promise more granular install options, so those small SSDs may go a lot further.
 
North/southbridge functionality isn't gone in Zen 2; it's now on-chip and referred to with AMD nomenclature. The bus arrangement still exists, and you're still moving things from here, to here, to here.

Agreed, but my point is that the PS5 arrangement is the same if we're talking about moving data from the SSD to system memory/CPU. Both systems take the same path, with the same number of steps, over the same buses. As I said earlier, though, there is an additional step for PCs with dGPUs: pushing the data off the APU/CPU and over the PCIe x16 bus to the GPU. That won't impact overall bandwidth but will impact latency, especially if the data has to go via system memory first.

If Zen 2's architecture can allow the SSD to route data directly over PCIe to the GPU, without the existing steps, then that is news to me.

3dilettante made some comments on this a few days back, but I can't find the post now. If I recall correctly, he said the hardware does support point-to-point requests (from GPU to SSD, for example) but that the SSD itself would need to support that, and presumably there would need to be some software API to allow it. Whether DirectStorage will fill that gap, who can say.

I guess this depends on what you're trying to measure. If you're trying to compare how each arrangement would benefit a game's performance, you cannot discount compression.

I think we both agree on this. This isn't something I'm attempting to do.

Even on PC, digging through packs of assets in many games you'll find duplicates, although probably not as many as on today's consoles. It'll be interesting to compare install sizes of 4K base assets on next-gen consoles vs. PC. Both next-gen consoles promise more granular install options, so those small SSDs may go a lot further.

Yes, agreed. I understand the current consoles use zlib for optical-drive storage (I'm not sure about the HDDs), so this seems like an extension of that. It's a shame that technology isn't available in the PC arena.
 