You're still not getting it. Processing power is meaningless if the data is struggling to get where it needs to be. That is the core issue right now, the main factor that has changed this new generation, and what has narrowed the relative performance gap between home consoles and PC. Do your best to stop thinking by outdated standards and understand why and how things will be different going forward.
Why would data be struggling to get where it needs to be on a properly equipped PC running a properly written application? I'm seriously asking you to explain the bottleneck here in terms of interfaces, bandwidth and processing capability vs workload, because all I'm hearing are wild claims about PS5 superiority without any technical details to back them up.
Obviously a PS5 is going to outperform an ill-equipped PC in this respect (and, as we've seen, if the application isn't properly utilising a correctly equipped PC, the result will be the same), but let's take a PCIe 4.0 NVMe-equipped system using DirectStorage 1.1 with GPU decompression as a baseline, along with a CPU and GPU that are at least a match for those in the console. As a preview I will say that a little more CPU power should be required on the PC side for a similar result, but I'll leave you to explain the detail of why...
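For anyone unsure what that baseline actually looks like in practice, here's a rough sketch of a DirectStorage 1.1 request using GPU (GDeflate) decompression. The file name, sizes and the pre-created D3D12 objects are placeholders for illustration, not taken from any real title, and error handling is omitted:

```cpp
// Sketch of the PC I/O path under discussion: NVMe -> staging buffer ->
// GDeflate decompression on the GPU -> destination buffer in VRAM, with no
// CPU-side copy or decompression in between. Assumes dstorage.h / dstorage.lib
// from the DirectStorage 1.1 SDK and pre-existing D3D12 objects.
#include <d3d12.h>
#include <dstorage.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void StreamAssetWithGpuDecompression(ID3D12Device* device,
                                     ID3D12Resource* destBuffer,
                                     ID3D12Fence* fence, UINT64 fenceValue,
                                     UINT32 compressedSize, UINT32 uncompressedSize)
{
    ComPtr<IDStorageFactory> factory;
    DStorageGetFactory(IID_PPV_ARGS(&factory));

    // One file-sourced queue; DirectStorage batches and schedules the requests itself.
    DSTORAGE_QUEUE_DESC queueDesc{};
    queueDesc.Capacity   = DSTORAGE_MAX_QUEUE_CAPACITY;
    queueDesc.Priority   = DSTORAGE_PRIORITY_NORMAL;
    queueDesc.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
    queueDesc.Device     = device;
    ComPtr<IDStorageQueue> queue;
    factory->CreateQueue(&queueDesc, IID_PPV_ARGS(&queue));

    // Placeholder asset path; a real engine would read from a packed archive.
    ComPtr<IDStorageFile> file;
    factory->OpenFile(L"assets/level.gdeflate", IID_PPV_ARGS(&file));

    DSTORAGE_REQUEST request{};
    request.Options.SourceType        = DSTORAGE_REQUEST_SOURCE_FILE;
    request.Options.DestinationType   = DSTORAGE_REQUEST_DESTINATION_BUFFER;
    request.Options.CompressionFormat = DSTORAGE_COMPRESSION_FORMAT_GDEFLATE;
    request.Source.File.Source        = file.Get();
    request.Source.File.Offset        = 0;
    request.Source.File.Size          = compressedSize;
    request.UncompressedSize          = uncompressedSize;
    request.Destination.Buffer.Resource = destBuffer;
    request.Destination.Buffer.Offset   = 0;
    request.Destination.Buffer.Size     = uncompressedSize;

    queue->EnqueueRequest(&request);
    queue->EnqueueSignal(fence, fenceValue); // signalled once the data is resident in VRAM
    queue->Submit();
}
```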
That's not entirely true though, because it comes down to the minimum PC specs they would aim for.
And if DirectStorage really is the savior of PC I/O...
If DirectStorage can't keep up (on PCs around PS5 spec or below) with fully utilized PS5 I/O, then it becomes difficult.
Why would DirectStorage be unable to keep up? As with my point above, please explain this in terms of interfaces, bandwidth and processing capability vs workload. We already have benchmarks showing decompression throughput far in excess of the known limits of the PS5's hardware decompressor, so what is it that you think will not be able to keep up, and why?
Perhaps it's the GPU's ability to keep up with the decompression workload at the same time as the rendering, which raises the question of how much data you're expecting to be streamed in parallel with actual gameplay.
Even a modest GPU of around PS5 capability can decompress enough data to fill the entire VRAM of a standard 8GB GPU in less than a second, and you're never going to completely refresh your VRAM like that mid-gameplay. If I recall correctly, the Matrix Awakens demo was streaming less than 150MB/s, and I think
@HolySmoke presented some details for Rift Apart here before which show even that has modest streaming requirements on average.
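To put some rough numbers on that, here's the back-of-the-envelope maths. Both inputs are assumptions for illustration: roughly 10GB/s of GPU GDeflate decompression (the ballpark of the benchmarks discussed in this thread) and roughly 150MB/s of in-gameplay streaming (the Matrix Awakens figure above):

```cpp
// Back-of-the-envelope check on the claim above. Both inputs are illustrative
// assumptions, not measurements from any specific title.
#include <cstdio>

int main()
{
    const double decompressGBps   = 10.0;   // assumed sustained GPU decompression rate
    const double vramGB           = 8.0;    // a standard 8GB card
    const double gameplayStreamMB = 150.0;  // assumed in-gameplay streaming, MB/s

    std::printf("Time to fill %.0fGB of VRAM from scratch: %.2f s\n",
                vramGB, vramGB / decompressGBps);
    std::printf("Gameplay streaming as a share of decompression throughput: %.1f%%\n",
                100.0 * (gameplayStreamMB / 1024.0) / decompressGBps);
    return 0; // prints roughly 0.80 s and ~1.5%
}
```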
Of course there will be full scene changes involving load screens or animations (including ultra-short load animations like the rift transitions in Rift Apart) where a relatively large amount of data will be loaded from disk in a very short period, but you aren't actually rendering much of anything on the GPU at those points, so the GPU's entire resources can be dedicated to the decompression, much in the same way that the hardware block on the PS5 is used.
For normal, much more modest streaming requirements, async compute is used, targeting the spare compute resources on these GPUs, as a GPU is very rarely 100% compute limited.
If for some reason that cobbled-together software solution is not quite on par with the PS5's fixed-function hardware array, it could really hurt PS5 exclusives' development...
You realise that describing one as a "cobbled together software solution" and the other as a "hardware array" (lol) isn't going to make anyone here think that one is faster than the other? For that we need details, specifications, and ideally tests.
Sure, but you also need to look at the resources used (VRAM, GPU) and latency. PCs will need to brute force it if they want to stay competitive against PS5 I/O.
We know the VRAM used. As stated
here, the default staging buffer size is 32MB, but the optimal size is 128MB. More can be allocated for larger VRAM pools. So it's very minimal compared with the available VRAM on any modern GPU.
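For reference, bumping the staging buffer to that optimal size is essentially a one-liner. A minimal sketch, assuming an already-created DirectStorage factory, with 128MB being the figure from the linked docs:

```cpp
// The staging buffer is the only fixed VRAM cost of the decompression path,
// so 128MB is roughly 1.5% of an 8GB card.
#include <dstorage.h>

void ConfigureStagingBuffer(IDStorageFactory* factory)
{
    constexpr UINT32 kStagingBufferBytes = 128u * 1024u * 1024u; // up from the 32MB default
    factory->SetStagingBufferSize(kStagingBufferBytes);
}
```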
We can also draw some conclusions about the GPU resources required, given the link I posted above shows the 6600XT pushing in excess of 10GB/s. So if we were to assume a HUGE streaming speed of 1GB/s, then that would take at most around 10% of the compute resources of a 6600XT, via async compute. Obviously faster loading speeds could be used during initial or fast-travel loads, which would use more GPU resources, but at those times the GPU isn't needed for rendering, meaning 100% of it is available for decompression.
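The same estimate as a quick sketch, with the ~10GB/s benchmark figure and the deliberately generous 1GB/s streaming rate as the assumptions:

```cpp
// Rough upper bound on the async-compute share needed for mid-gameplay
// streaming, assuming decompression throughput scales roughly with the
// compute devoted to it. Inputs are the assumptions from the post above.
#include <cstdio>

int main()
{
    const double benchmarkedDecompressGBps = 10.0; // approx. 6600XT GDeflate result from the linked benchmark
    const double assumedStreamGBps         = 1.0;  // deliberately generous in-gameplay streaming rate

    std::printf("Approx. compute share for %.1fGB/s of streaming: %.0f%%\n",
                assumedStreamGBps,
                100.0 * assumedStreamGBps / benchmarkedDecompressGBps);
    return 0; // ~10%; during load or fast-travel transitions the full GPU is available instead
}
```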
As to latency: the latencies of the various buses and memory components (DDR, GDDR) in this solution are orders of magnitude lower than that of the NVMe drive in the PS5 (and PC), so there's no reason to expect them to add any appreciable latency over the PS5's end-to-end transfer latency, which will be almost entirely made up of the NVMe drive's own latency.
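To make the orders of magnitude concrete, here's a quick sketch using generic ballpark figures. These are illustrative assumptions, not measurements of any specific drive or platform:

```cpp
// Order-of-magnitude latency comparison: the NVMe access dominates the
// end-to-end path, with buses and memory contributing a tiny fraction.
// All three numbers are generic ballpark assumptions for illustration.
#include <cstdio>

int main()
{
    const double dramAccessUs = 0.1;   // ~100ns for a DDR/GDDR access
    const double pcieHopUs    = 1.0;   // ~1us order of magnitude for a PCIe transfer hop
    const double nvmeReadUs   = 80.0;  // tens of microseconds for an NVMe random read

    const double totalUs = nvmeReadUs + pcieHopUs + dramAccessUs;
    std::printf("NVMe read as a share of end-to-end latency: %.0f%%\n",
                100.0 * nvmeReadUs / totalUs);
    return 0; // ~99%: the drive itself dominates, on PS5 and PC alike
}
```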
So to address your final point, what's your definition of "brute forcing" here? 128MB more VRAM? 10% more GPU compute power in the corner cases where a GPU's compute is already 100% maxed out? Doesn't sound like much to worry about to me.