It is hyperbolic, but it was intended to be hyperbolic? I thought that was implied.
But why resort to hyperbole at all? We're supposed to be having a technical discussion on the relative merits of two solutions. Being deliberately hyperbolic in praise of your preferred solution is directly counterproductive to that discussion, which is why I suggested at the start that we avoid it.
I didn't assert that GPU upload heaps wouldn't mitigate some of the fundamental disadvantages? I merely said that, despite that, GPU upload heaps would still be inferior.
I agree it's not the same as hUMA, because the dev still has to manage which datasets reside on the CPU side and which on the GPU side in order to maximise performance. That means more work for the developer, and more chance of making the wrong decision and ending up with sub-optimal performance. All that said, as long as it's implemented well, it should greatly mitigate the biggest complaint the recently linked source has about the PC: the copying of data back and forth between memory pools.
With that copying potentially mitigated, the relative balance of benefits versus costs between split and unified memory architectures shifts significantly, particularly in raw performance terms.
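To make that concrete, here's a minimal sketch of what the new path looks like in D3D12, assuming a recent Agility SDK that exposes D3D12_HEAP_TYPE_GPU_UPLOAD and a system with Resizable BAR enabled (the helper name CreateCpuWritableVramBuffer is mine, not part of the API). The CPU maps and writes a VRAM-resident buffer directly, instead of the traditional pattern of writing to a system-memory upload heap and then recording a copy into a default-heap resource:

```cpp
// Minimal sketch: a CPU-writable buffer that lives in VRAM.
#include <d3d12.h>

ID3D12Resource* CreateCpuWritableVramBuffer(ID3D12Device* device, UINT64 size)
{
    // GPU upload heaps need driver/OS support (OPTIONS16) plus Resizable BAR.
    D3D12_FEATURE_DATA_D3D12_OPTIONS16 opts = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS16,
                                           &opts, sizeof(opts)))
        || !opts.GPUUploadHeapSupported)
    {
        return nullptr; // fall back to the classic upload-heap-and-copy path
    }

    D3D12_HEAP_PROPERTIES heap = {};
    heap.Type = D3D12_HEAP_TYPE_GPU_UPLOAD; // VRAM, but CPU-mappable

    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension        = D3D12_RESOURCE_DIMENSION_BUFFER;
    desc.Width            = size;
    desc.Height           = 1;
    desc.DepthOrArraySize = 1;
    desc.MipLevels        = 1;
    desc.Format           = DXGI_FORMAT_UNKNOWN;
    desc.SampleDesc.Count = 1;
    desc.Layout           = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

    ID3D12Resource* buffer = nullptr;
    if (FAILED(device->CreateCommittedResource(
            &heap, D3D12_HEAP_FLAG_NONE, &desc,
            D3D12_RESOURCE_STATE_COMMON, nullptr, IID_PPV_ARGS(&buffer))))
    {
        return nullptr;
    }
    // Map() the buffer and memcpy into it directly; no staging resource,
    // no CopyBufferRegion on a copy queue, no second copy of the data.
    return buffer;
}
```

Note that this doesn't remove the residency decision itself: the developer still chooses which resources go in which heap type, which is exactly the extra work versus hUMA mentioned above.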
It wasn't an argument, it was a mere observation. As to whether a theoretical PC can outperform a theoretical hUMA system, you might as well ask me to find the limit as both functions approach infinity: you could make a whole host of arguments for either. My initial point was that, as of today, the PC's memory subsystem is inferior to the console's, and that hasn't changed yet.
And I would counter that by saying hUMA is simpler and easier to extract maximum performance from, but not necessarily superior when it comes to maximum performance in a well-developed application for a given memory bandwidth, especially given that PCs typically feature more overall memory in comparable configurations (e.g. the oft-cited 3600X + 2070S combo, which would generally come with a total of 24GB: 16GB of system RAM plus 8GB of VRAM), let alone higher-end ones. And I expect this statement to become more pertinent moving forwards, with recent developments around GPU upload heaps helping to remove many of the memory copies that are currently needed.
As it stands, ignoring storage bottlenecks, the limiting bandwidth on PC systems is the PCIe bus. Increasing memory bandwidth in non-hUMA systems today does not bypass this limitation.
What is your evidence that this is a limitation, and why? All evidence that I've seen (i.e. actual benchmarks) suggests that there is no speed-up when moving from the current fastest PCIe interface to a newly released iteration.
But hey, if you can show me benchmarks from when PCIe4 first launched that demonstrate games at that time saw a sudden performance boost going from PCIe3 to PCIe4, then I'd be interested to see them.
Similarly, PCIe5 exists now on motherboards. Why has neither Nvidia nor AMD chosen to use it in their latest, just-released GPU lines if this is bottlenecking the system? Surely that would be a relatively cheap way to gain a competitive advantage if that were the case?
And even if PCIe were a bottleneck (and again, I'm curious to understand your reasoning for thinking this), have you considered how GPU-based decompression will significantly reduce the load on PCIe?
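The numbers below are illustrative assumptions on my part (PCIe4 x16 at roughly 31.5GB/s usable, and a ~2:1 compression ratio on game assets), but the mechanism is simple: if data is only decompressed after it has crossed the bus, the link's effective asset bandwidth is multiplied by the compression ratio:

```cpp
// Illustrative only: effect of GPU-side decompression on bus traffic.
#include <cstdio>

int main()
{
    const double pcie4_x16 = 31.5;   // ~usable GB/s, PCIe4 x16 (assumption)
    const double ratio     = 2.0;    // assumed ~2:1 asset compression

    // Sending compressed bytes and decompressing in VRAM means each byte
    // crossing the bus delivers 'ratio' bytes of usable asset data.
    printf("raw link bandwidth:       %5.1f GB/s\n", pcie4_x16);
    printf("effective asset delivery: %5.1f GB/s\n", pcie4_x16 * ratio);
}
```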
One final point to consider: if PCIe bus bandwidth is a bottleneck in PCs, and, as you claim, increasing VRAM bandwidth does not bypass that bottleneck, then why do we see performance go up when we increase VRAM bandwidth (in line with GPU compute resources)? Surely if PCIe is a true bottleneck there, then performance should not increase at all. And yet when we swap out an already very big GPU (let's say a 4080) tethered to the end of this apparently bottlenecking PCIe interface for an even bigger GPU (let's say a 4090), we see a big performance gain.
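For reference, here are the rounded spec-sheet bandwidth figures behind that 4080 vs 4090 point. The VRAM side outruns the bus by well over an order of magnitude, which is exactly what you'd expect if the working set lives in VRAM and the bus mostly carries uploads:

```cpp
// Rounded, publicly listed bandwidth figures. The takeaway: VRAM bandwidth
// is ~20-30x the PCIe link, so performance scaling with a bigger GPU is
// consistent with the bus not being the limiting factor in steady state.
#include <cstdio>

int main()
{
    struct { const char* name; double gbps; } bw[] = {
        { "PCIe3 x16",      15.8 },   // ~usable GB/s after encoding overhead
        { "PCIe4 x16",      31.5 },
        { "PCIe5 x16",      63.0 },
        { "RTX 4080 VRAM",  716.8 },
        { "RTX 4090 VRAM", 1008.0 },
    };
    for (const auto& b : bw)
        printf("%-14s %7.1f GB/s\n", b.name, b.gbps);
}
```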