This theory rests on a number of assumptions.
The first is that AMD's cache hierarchy would distinguish page table data from other cached data. The normal behavior is that page table data is readily cached, and moves up and down the hierarchy.
Since the OS needs to be able to change page table bits, it certainly helps to be able to access that memory normally.
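As an illustration of why ordinary cacheable access matters, "changing page table bits" is just plain loads and stores on PTE words. The bit positions below are the standard x86-64 layout (nothing Jaguar-specific), and the physical address is made up:

```python
# x86-64 page-table entry flag bits the OS routinely flips in place,
# which is why page tables live in ordinary cacheable memory.
PTE_PRESENT  = 1 << 0   # translation is valid
PTE_WRITABLE = 1 << 1   # page may be written
PTE_USER     = 1 << 2   # user-mode accessible
PTE_ACCESSED = 1 << 5   # set by hardware on first access
PTE_DIRTY    = 1 << 6   # set by hardware on first write

# Hypothetical PTE: some 4 KB-aligned physical frame plus flags.
pte = 0x0000_0001_2345_6000 | PTE_PRESENT | PTE_WRITABLE

# Unmapping the page is just a store that clears the present bit...
pte &= ~PTE_PRESENT
# ...followed by a TLB shootdown, since cached translations may linger.
```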
AMD has experience with what happens when TLB management and caching don't play nice, so mucking with this is higher-risk, and there has been no indication that Jaguar changes this part of the system architecture.
This also requires that the extra SRAM be cache-coherent with the modules, since page table traffic and updates are just memory traffic.
Another challenge: how would this SRAM know that a line filling or spilling from the modules is page-table data, short of itself housing and looking up the relevant page directory and table data on every access?
This storage isn't significantly larger than the Jaguar L2s, so without that filtering it would quickly be thrashed.
The actual page tables for an AMD64 address space backed by a multi-gigabyte physical memory can readily exceed on-die storage as well.
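A quick back-of-the-envelope sketch makes the point. Assuming a flat 8 GB mapped with standard 4 KB x86-64 pages (the exact mapped footprint on either console is an assumption here, not a disclosed figure):

```python
# Rough page-table footprint for an AMD64 (4-level x86-64) address space.
# Assumption: 8 GB of memory fully mapped with 4 KB pages, 8-byte entries,
# 512 entries per 4 KB table -- the standard x86-64 layout.
PAGE_SIZE = 4 * 1024           # 4 KB pages
ENTRY_SIZE = 8                 # 64-bit page-table entries
ENTRIES_PER_TABLE = 512        # 4 KB table / 8 B entry

mapped_bytes = 8 * 1024**3                     # 8 GB mapped
num_pages = mapped_bytes // PAGE_SIZE          # 2,097,152 leaf pages
leaf_tables = num_pages // ENTRIES_PER_TABLE   # 4,096 last-level tables
leaf_bytes = leaf_tables * PAGE_SIZE           # leaf tables alone

print(leaf_bytes // 1024**2)   # -> 16 (MiB), before counting upper levels
```

16 MiB of last-level tables for a single fully-mapped address space already dwarfs the Jaguar L2s, before upper-level tables or additional address spaces are counted.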
http://www.eurogamer.net/articles/digitalfoundry-the-complete-xbox-one-interview
Perhaps this sheds light on the topic.
<><><><><><><><><><><><><
Digital Foundry: You're running multiple systems in a single box, in a single processor.
Was that one of the most significant challenges in designing the silicon?
Nick Baker: There was a lot of bitty stuff to do.
We had to make sure that the whole system was capable of virtualisation, making sure everything had page tables, the IO had everything associated with them. Virtualised interrupts.... It's a case of making sure the IP we integrated into the chip played well within the system. Andrew?
Andrew Goossen: I'll jump in on that one. Like Nick said there's a bunch of engineering that had to be done around the hardware but the software has also been a key aspect in the virtualisation. We had a number of requirements on the software side which go back to the hardware.
To answer your question Richard, from the very beginning the virtualisation concept drove an awful lot of our design. We knew from the very beginning that we did want to have this notion of this rich environment that could be running concurrently with the title. It was very important for us based on what we learned with the Xbox 360 that we go and
construct this system that would disturb the title - the game - in the least bit possible and so to give as varnished an experience on the game side as possible but also to innovate on either side of that virtual machine boundary.
We can do things like update the operating system on the system side of things while retaining very good compatibility with the portion running on the titles, so we're not breaking back-compat with titles because titles have their own entire operating system that ships with the game. Conversely it also allows us to innovate to a great extent on the title side as well. With the architecture, from SDK to SDK release as an example we can completely rewrite our operating system memory manager for both the CPU and the GPU, which is not something you can do without virtualisation. It drove a number of key areas... Nick talked about the page tables. Some of the new things we have done - the GPU does have two layers of page tables for virtualisation. I think this is actually the first big consumer application of a GPU that's running virtualised. We wanted virtualisation to have that isolation, that performance. But we could not go and impact performance on the title.
We constructed virtualisation in such a way that it doesn't have any overhead cost for graphics other than for interrupts. We've contrived to do everything we can to avoid interrupts... We only do two per frame. We had to make significant changes in the hardware and the software to accomplish this. We have hardware overlays where we give two layers to the title and one layer to the system and the title can render completely asynchronously and have them presented completely asynchronously to what's going on system-side.
System-side it's all integrated with the Windows desktop manager but the title can be updating even if there's a glitch - like the scheduler on the Windows system side going slower... we did an awful lot of work on the virtualisation aspect to drive that
and you'll also find that running multiple systems drove a lot of our other systems. We knew we wanted to be 8GB and
that drove a lot of the design around our memory system as well.
><><><><><><><><><><><><><><
They keep referring back to significant alterations, with the whole chip designed around virtualisation to facilitate a 'rich environment that could be running concurrently with the title [game]'.
Half-speed L2 arrays might add several core clock cycles of latency.
Numbers ranging from 188 to 192 don't seem like a big thing to quibble over.
And as we both know, that tiny difference could be down to the slightly different layout and the slightly greater distance between the CPU clusters on the PS4 versus the Xbox. Given how sensitive access times are to interconnect length (which is why L1 is so much faster than L2), it's not hard to see it as a slight difference.
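To put 188 vs 192 cycles in wall-clock terms (the 1.6 GHz clock below is an assumed Jaguar-class frequency, not a figure from this thread, and neither number is attributed to a specific console here):

```python
# Convert the quoted latency figures from cycles to nanoseconds.
# Assumption: a 1.6 GHz core clock (Jaguar-class).
CLOCK_GHZ = 1.6

low_ns  = 188 / CLOCK_GHZ   # 117.5 ns
high_ns = 192 / CLOCK_GHZ   # 120.0 ns

print(round(high_ns - low_ns, 1))  # spread is 2.5 ns
```

A 2.5 ns spread is small enough to plausibly come down to layout and wire length rather than any architectural difference.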