Yes, it's still a unified memory system. The GPU can access the full range. At least that's how I understand it.
I'm just not up on personalities, so I didn't know if they were on major publications or ran a rumor Twitter or something.
Whatever you want them to be.
It'd be worth updating your message, as it leaves the wrong impression, even though the source has explained that it gave the wrong impression, and I've not heard of it having a problem.
You asked for the source, I gave it.
It is 25% of the allocation, or 75% of the space is accessible at the highest rate. There would have to be a significant amount of the 10GB space consumed by the buffers or structures that are the highest-demand targets, and somehow a developer unable to find at least some data that doesn't need the same bandwidth as the ROPs or CUs running full-bore with streaming compute. Anything expected to be handled by the CPU during the frame won't need the higher number, and couldn't even reach the lower one. Any graphics resources that don't hit the same peaks as the pixel shading pipeline could still get along decently with what is still a pretty generous amount of bandwidth.

The 6GB has an effective bandwidth of 336 GB/s. Just as a hypothetical, if the GPU is accessing this memory more than Microsoft would have expected, I guess there is potential to be bandwidth limited. Ideally you want accesses spread across all memory channels for full bandwidth, but I guess if you hit that particular range too much you'll lower your effective bandwidth. This goes back to how the memory interleaving is set up, and I'm by no means an expert.
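For reference, a back-of-the-envelope on where those two numbers come from, assuming the widely reported Series X configuration (ten GDDR6 chips at 14 Gbps on a 320-bit bus, six of them 2GB and four of them 1GB): the 10GB range stripes across all ten chips, while the extra 6GB only lives on the six 2GB chips.

```python
# Rough Series X bandwidth split, assuming the widely reported memory layout.
GBPS_PER_PIN = 14      # GDDR6 data rate per pin, in Gbps
BITS_PER_CHIP = 32     # each chip contributes a 32-bit channel

fast_chips = 10        # the 10GB "GPU optimal" range interleaves across all ten chips
slow_chips = 6         # the 6GB "standard" range only uses the six 2GB chips

fast_bw = fast_chips * BITS_PER_CHIP * GBPS_PER_PIN / 8   # GB/s
slow_bw = slow_chips * BITS_PER_CHIP * GBPS_PER_PIN / 8   # GB/s
print(fast_bw, slow_bw)   # 560.0 336.0
```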
My interpretation is that any client can read anywhere, just that there's more performance to be had in the 10GB GPU-optimized section, and only the GPU can create that kind of demand. Since the OS and other apps seem to be in the slow section, I would assume they'd like it if the GPU were able to read their buffers or otherwise know their requests exist.

Can the GPU reach that slower portion? I thought it was 10GB addressable by the GPU at full speed, and 16GB addressed by everything at the slower speed.
Maybe that weird Roman numeral V shaped devkit had heat issues?
Better tools and APIs, for example. Compare Vulkan and DX9/10.
You create a SW API that is HW agnostic. Data in, data out. Preferably data out to RAM at a specific address, like Sony does. The API can be implemented on the CPU, the GPU, or the SSD. Implementing it in the SSD is never a great idea due to the PCIe bus and the limited number of lanes on regular CPUs. You want the implementation on the CPU/GPU, or in AMD's case in the IO controller, to save the PCIe lanes for better use.
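Just to illustrate the shape of what I mean, a minimal sketch (all names made up, not any real API): the caller only says "decompress this blob to this address", and whether a CPU thread, a GPU compute path, or a dedicated block does the work is hidden behind the interface.

```python
from abc import ABC, abstractmethod

class Decompressor(ABC):
    """HW-agnostic interface: compressed data in, decompressed data out to a target address."""

    @abstractmethod
    def decompress(self, src: bytes, dst_address: int) -> None:
        ...

class CpuDecompressor(Decompressor):
    def decompress(self, src: bytes, dst_address: int) -> None:
        # software path running on CPU cores (e.g. zlib/Kraken in software)
        ...

class GpuDecompressor(Decompressor):
    def decompress(self, src: bytes, dst_address: int) -> None:
        # compute-shader path writing the output straight into VRAM
        ...
```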
Yep, I understand that. But if we're talking about an uncompressed data stream rather than a compressed one (as I can't see how PCs can work with a compressed one without dedicated decompression hardware, which they won't get any time soon outside of the GPU), then the steps you describe above are the same for either a PC or the PS5, with the exception of the additional step of the data going over the PCIe 16x bus to the GPU.
So yes, that's an extra step not required by consoles which will add latency, but from a bandwidth perspective it's not a concern, as that interface is 4x wider than the one between the SSD and the APU/CPU in either the PS5 or the PC.
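Rough numbers behind the "4x wider" point, assuming PCIe 4.0 at ~16 GT/s per lane with 128b/130b encoding:

```python
# Usable bandwidth per PCIe 4.0 lane, ignoring protocol overhead beyond line encoding.
GB_PER_LANE = 16 * 128 / 130 / 8      # ~1.97 GB/s per lane

ssd_link = 4 * GB_PER_LANE            # NVMe x4 link: ~7.9 GB/s
gpu_link = 16 * GB_PER_LANE           # GPU x16 link: ~31.5 GB/s
print(round(ssd_link, 1), round(gpu_link, 1))
```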
I'd be curious to know how much extra compression the consoles' custom formats gain you over the formats already in use natively by GPUs today. Is 5GB/s on a modern PC drive already the equivalent of something higher using those formats?
Something is bothering me in all this talk about compression and effective bandwidths.
Aren’t textures already compressed data? We aren’t transferring plain text files, but structures that are already stored in compressed formats - so how much can we realistically expect an additional general compression scheme to yield? It’s related to the question above - I feel confused about what people are really saying, bandwidth wise.
Having to read data from the SSD into RAM (over the southbridge) and then write some of it over the PCIe bus to the GPU (over the northbridge) is a big bottleneck.
Consoles have few performance advantages over a modern PC, but unified memory solves the issue of having to move data so it's in the right place. If you need data accessible to both the CPU and GPU, a PC is the worst-case scenario.
But on consoles, devs will certainly be taking advantage of compressed data - compressed data is less data to read from the SSD and less data stored on the SSD. You can't discount it in a fair comparison, because then it's no longer a fair comparison.
Native GPU compressed texture formats and Kraken solve different problems. Kraken is designed to save storage space. I don't think anybody is suggesting the PS5's APU can natively process Kraken-compressed images; if it could, why care about decompressing them?
The compression ratio is kept low so it can be decompressed in real time inside the GPU. You can compress further on disk.
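A toy sketch of that two-layer idea, using zlib purely as a stand-in for the stronger disk-side codec (Kraken, BCPack or whatever): the GPU-native BC blocks are never altered, the extra layer only exists on disk and is stripped off at load time.

```python
import zlib

def pack_for_disk(bc_blocks: bytes) -> bytes:
    # Disk-only lossless layer wrapped around already GPU-native BCn data.
    return zlib.compress(bc_blocks)

def unpack_at_load(disk_bytes: bytes) -> bytes:
    # Remove the disk layer; the GPU then samples the BC blocks as-is.
    return zlib.decompress(disk_bytes)
```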
But there is no southbridge or northbridge in the context of what we're discussing. The Zen2 platform has the IO controller and the PCIe controllers for both NVMe and GPUs directly on the CPU die itself. So in that respect it's exactly the same as a PS5. I believe if it's a non-M.2 NVMe drive then it would indeed communicate with the CPU via the chipset, but that's not the case for these high-speed SSDs; it's direct-to-CPU communication. I'm not sure how modern Intel platforms handle it, but they're already at a significant disadvantage given they won't support PCIe 4.0 until Rocket Lake later this year.
There's no argument there. I do understand that there's an extra step in a PC architecture to get data into graphics memory. I think the question of how much additional latency that will add, though, depends heavily on whether the GPU is able to bypass the CPU/system memory altogether when using DirectStorage. If not, then that essentially prevents the PC in its current form from benefiting from one of the key features of the new consoles, i.e. using the SSD as additional graphics memory.
I don't see why it's not a fair comparison if we're comparing the consoles' compressed throughput vs the PC's uncompressed throughput. 4.8GB/s is the XSX's compressed throughput. We acknowledge its use of compression by comparing that figure with 5GB/s uncompressed on a modern NVMe SSD. I guess to be completely fair we'd then also have to acknowledge that install sizes on the PC would need to be larger. Same with the PS5: we're comparing 7GB/s for the fastest PC drives with 8-9GB/s on the PS5.
Although again, this doesn't take into account the native compression formats already handled by modern GPUs. So in reality that 5GB/s uncompressed throughput of the PC drive is effectively higher, although not by as high a ratio as the new consoles with their custom compression.
But the existing native GPU compression formats will still be saving both storage space and transfer bandwidth so it's valid to consider their impact.
Do you know what the typical compression ratio is, and what proportion of all data transferred from disk in a game is made up of those compressed data types? If we're only talking about a 5% benefit then it's probably not worth considering from the perspective of this conversation, but 20% or more would be significant.
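Just to make that framing concrete (the fractions here are made-up placeholders, not measurements): the overall win is roughly the share of the stream that benefits multiplied by the per-item saving.

```python
def overall_saving(fraction_of_stream: float, per_item_saving: float) -> float:
    # Overall reduction in bytes read, assuming only part of the stream compresses.
    return fraction_of_stream * per_item_saving

print(overall_saving(0.25, 0.20))  # 0.05 -> only ~5% overall if a quarter of the data saves 20%
print(overall_saving(0.80, 0.30))  # 0.24 -> ~24% overall if most of the stream saves 30%
```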
Kraken alone only gives a 20 to 30% improvement. RDO encoding of textures plus Kraken or BCPack can give a 50% compression improvement for BC7 textures, from what Rich Geldreich said on his Twitter account; maybe other texture formats (BC1-5) can compress more.
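Taking those percentages at face value, and reading an "X% improvement" as the same assets shrinking to (100-X)% of their size, the effect on effective SSD throughput would be roughly:

```python
def effective_throughput(raw_gb_s: float, saving: float) -> float:
    # If the data is 'saving' smaller on disk, the same link delivers proportionally more decompressed bytes.
    return raw_gb_s / (1.0 - saving)

raw = 2.4  # GB/s, using the Series X's quoted raw figure as an example
print(effective_throughput(raw, 0.30))  # ~3.4 GB/s with Kraken-alone-style savings
print(effective_throughput(raw, 0.50))  # 4.8 GB/s with RDO + BCPack/Kraken-style savings
```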
The combination of Kraken and lossy BCn seems to reach almost 4:1 with real game data, and I wonder how much RDO improves this. But once they insert lossy compression it becomes difficult to make a comparison. Any claim of an XYZ compression ratio is meaningless without a subjective evaluation of artifacts at the arbitrary level of data loss they have chosen for the test. And obviously any claim of a guaranteed lossless ratio is snake oil.
http://cbloomrants.blogspot.com/2018/02/oodle-260-leviathan-detailed.html
http://cbloomrants.blogspot.com/2018/03/improving-compression-of-block.html
But there is no southbridge or northbridge in the context of what we're discussing. The Zen2 platform has the IO controller and the PCIe controllers for both NVMe and GPUs directly on the CPU die itself.
I don't see why it's not a fair comparison if we're comparing the consoles' compressed throughput vs the PC's uncompressed throughput. 4.8GB/s is the XSX's compressed throughput.
We acknowledge its use of compression by comparing that figure with 5GB/s uncompressed on a modern NVMe SSD. I guess to be completely fair we'd then also have to acknowledge that install sizes on the PC would need to be larger. Same with the PS5: we're comparing 7GB/s for the fastest PC drives with 8-9GB/s on the PS5.
North/southbridge functionality isn't gone in Zen2; it's now on-chip and referred to with AMD nomenclature. The bus arrangement still exists, and you're still moving things from here, to here, to here.
If Zen2's architecture allows the SSD to route data directly over PCIe to the GPU, without the existing steps, then that is news to me.
I guess this depends on what you're trying to measure. If you're trying to compare how each arrangement would benefit a game's performance, you cannot discount compression.
Even on PC, digging through packs of assets in many games you'll find duplicates, although probably not as many as on today's consoles. It'll be interesting to compare install sizes of 4K base assets on next-gen consoles vs. PC. Both next-gen consoles promise more granular install options, so those small SSDs may go a lot further.