So your position is that Sandybridge was obviously "orders of magnitudes" faster and better than Westmere in ways that never manifested themselves in measurable ways? And you think it's sensible to measure the transition point of the northbridge logic being incorporated into the CPU die against modern AMD technologies, as if there had been no evolution of technology in the last eleven years? Rather than Westmere to Sandybridge - the actual transition point?
Please quote where I referenced Sandybridge and Westmere in that way? That's right, I did not. In the context of discussing modern PC architectures you claimed, and I quote, "the way the processor part of the die connects to the external bus part of the die is not that different from when the separate processor chip was connected to the separate northbridge via the FSB". To which I responded that the bandwidth and latency of that connection are orders of magnitude better than the old FSB, and thus it is very different. I made absolutely no claims about the real-world performance implications of that interconnect improvement because it had absolutely no bearing on the core argument.
Trying to twist that into me claiming that "Sandybridge was obviously orders of magnitudes faster and better than Westmere" is disingenuous at best. It's also entirely irrelevant to the core argument, which was your incorrect implication that the Northbridge was some obstacle that the PC had to contend with which the consoles did not.
Nobody, myself included, said the northbridge - or the need for data to move over buses - was an obstacle; just the path the data must take.
Here's what you said:
DSoup said:
but Microsoft's API cannot change the fundamental data flow across hardware on your average PC for which there are two effective setups:
2) for drives using NVMe/PCIe connections - your data is read off the storage cells by the drive controller, passed to the bus controller in the north-bridge, then has to be routed to either main memory or the graphics card memory. If the GPU is decompressing data it's doing that from GDDR then writing it back to the GDDR for graphics use or redirecting it across the north bridge controller to main memory for use by the CPU.
Current generation consoles have very simple (and limited) architectures. They read data off the storage cells by a single I/O controller which decompresses automatically - and is written to one pool of shared memory. So even where PC components and drives are much faster, they are still moving data around a lot more.
You're clearly framing the journey over the Northbridge as an additional step the PC has to manage that the console does not. Again, this is wrong. You misrepresent both data flows above to make one seem significantly more complicated than the other. Here is what the console flow would look like if written in exactly the same style as you used for the PC flow:
"your data is read off the storage cells by the drive controller, passed to the decompression unit where it's written to the local cache, decompressed and then written back to that cache before being directed across the north-bridge, to main memory"
Why the inconsistency? If you're going to mention the Northbridge in one description, why omit it from the other?
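To make the comparison concrete, here are both flows written out at the same level of detail as a rough sketch. The step labels are purely my own illustrative shorthand, not real API calls or exact hardware stages:

```python
# Both data flows at the same level of detail. Step labels are
# illustrative shorthand only, not real APIs or exact hardware stages.

pc_nvme_dgpu_flow = [
    "drive controller reads the storage cells",
    "data crosses PCIe via the integrated 'northbridge' logic",
    "compressed data is written to GDDR",
    "GPU decompresses, reading from and writing back to GDDR",
    "CPU-bound data is routed back across the same logic to main memory",
]

console_flow = [
    "drive controller reads the storage cells",
    "I/O block decompresses the data in its local cache",
    "data crosses the on-die fabric (the same 'northbridge' role)",
    "decompressed data is written to the unified memory pool",
]

for name, flow in (("PC (NVMe + dGPU)", pc_nvme_dgpu_flow),
                   ("Console", console_flow)):
    print(f"{name}: {len(flow)} steps")
    for step in flow:
        print(f"  - {step}")
```

Written this way, neither flow is dramatically more complicated than the other - both cross the same integrated bus logic on the way to memory.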
I get that you like to pretend things that aren't on separate chips don't exist, despite all lithography analysis of Intel CPUs and Intel's own logic diagrams very clearly showing discrete logic functions still existing, and a clear logic path (the bus) between the CPU and those blocks, but you... ok. You can die on that hill.
Again, stop trying to wildly misrepresent what I've said. Clearly I've never tried to claim that the functions that used to be handled by the Northbridge "no longer exist". I have referenced multiple times over the past several posts how they were integrated into the CPU, including in the very first post on this subject. The issue here is you framing the Northbridge as something the PC has to deal with and the console does not. Which, once again, is wrong. Why are you so resistant to just admitting this and moving on? These last several pages of argument have been completely unnecessary. A simple acknowledgment that the "Northbridge" functionality/requirement is the same in both console and modern PC is all that was required, rather than endless posts arguing about increasingly off-topic minutiae.
All people are saying is that on consoles, the I/O controller decompresses data (without any need to read and write it to RAM) during the process of transferring it from storage to memory. The data undergoes two stages: first it goes to the I/O controller, where decompression happens in the on-die cache, then it's written direct to memory. It doesn't get any simpler, smarter or more efficient than that. Having to move data around a bunch of places - reading compressed data from RAM and writing decompressed data back to RAM - is a less efficient approach. But it's the only one that exists on PC right now.
No, it isn't written direct to memory. In your own framing it is directed across the Northbridge to main memory. I realise that's just semantics but it's the framing that's important because you're trying to represent the simplicity of one data flow vs the complexity of another.
To illustrate this, take a PC that's using an APU instead. By your description above, the data undergoes the same two stages, simply in reverse: data is written direct to memory from the SSD, where it is decompressed by the GPU (or CPU) ready for use. Just as simple as the console route you describe above. For PCs with a dGPU there is just one additional step - moving the GPU data from main memory to the GPU - and RTX-IO may even address that. Far from "moving data around a bunch of places" as you describe it above.
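And written in the same illustrative shorthand as the sketch above, the APU flow matches the console's step count:

```python
# The APU flow in the same illustrative shorthand as the earlier sketch.
apu_pc_flow = [
    "drive controller reads the storage cells",
    "compressed data crosses the on-die fabric to main memory",
    "GPU (or CPU) decompresses in place, ready for use",
]
print(f"PC (APU): {len(apu_pc_flow)} steps")
for step in apu_pc_flow:
    print(f"  - {step}")
```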
Also, what evidence do you have to support your suggestion that writing the data to GPU memory for decompression introduces any kind of performance penalty compared to doing it in the local memory of the console's hardware decompressor? And if you're not saying that, then why mention it at all? We've been told that GPU decompression is more than capable of keeping up with the fastest available NVMe drives, so why does it matter which memory pool is used for that? Except, of course, to frame it as somehow more complicated/inferior/likely to offset the benefits of faster components.
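For a sense of scale, here's a back-of-envelope calculation. The numbers are assumed round figures of my own, not benchmarks: roughly 7 GB/s for a top PCIe 4.0 NVMe drive, an assumed 2:1 compression ratio, and a ballpark 448 GB/s for a dGPU's GDDR6:

```python
# Back-of-envelope only: assumed round numbers, not measured benchmarks.
nvme_read_gb_s = 7.0          # roughly a top PCIe 4.0 NVMe sequential read
compression_ratio = 2.0       # assumed average compression ratio
gddr6_bandwidth_gb_s = 448.0  # ballpark midrange dGPU memory bandwidth

# Decompression must emit this much output to keep pace with the drive.
output_gb_s = nvme_read_gb_s * compression_ratio

# Reading the input plus writing the output is the extra GDDR traffic.
decompression_traffic_gb_s = nvme_read_gb_s + output_gb_s

print(f"decompressed output needed: {output_gb_s:.0f} GB/s")
print(f"added GDDR traffic: {decompression_traffic_gb_s:.0f} GB/s "
      f"({decompression_traffic_gb_s / gddr6_bandwidth_gb_s:.1%} of bandwidth)")
```

On those assumptions the decompression traffic works out to a few percent of the GDDR bandwidth, which is why the choice of memory pool shouldn't be the bottleneck.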