EDRAM
This information about the PS2 and its use of eDRAM is amazing to read.
"I have my doubts as to whether it can ever really happen per se. This is a case where technical hurdles are every bit as large as the financial hurdles.
I mean, that eDRAM framebuffer and texture cache gave you not only extremely high bandwidth but also extremely low latency, which in turn meant that a single render pass was extremely fast. It was many times faster than on any chip made today, where the architecture focuses on doing more work per pixel in a single pass (at the cost of making each pixel many times slower and more complicated to draw than it was on the GS).
The PS2 quite simply has more framebuffer and texture bandwidth than the PS3. Even the eDRAM of something like Xenos doesn't really help here, because its latency is higher and it's explicitly a transient holding buffer rather than a final output field. The GS not only let you work in eDRAM; the eDRAM was the main VRAM itself. It stored the front and back buffers as well as the working textures. On current chips, the back buffer and front buffer have to go through a resolve step and are always stored in main VRAM.
All these sorts of things are exploited rather explicitly in the vast majority of PS2 games.
Even aside from the peculiarities of the GS, there are peculiarities in the interaction between the EE and the GS. For instance, it's quite common for the render loop to be in a race condition with the GPU. We'll issue a display list to the GS telling it to start processing a vertex list... except that the vertex list hasn't been filled yet. Instead, the VU runs the equivalent of a vertex shader and fills vertex data into that list while the GS is reading the stream.
The timing is such that the VU always stays a few clock cycles ahead of the GS. That kind of tight timing resolution and synchronicity is 100% impossible on all current architectures, because the components simply don't have that kind of relationship, nor is any component's performance that predictable (i.e. nothing *always* takes exactly x cycles anymore).
Also, synchronization primitives on modern bus architectures are very strict, so some of these techniques, which were perfectly valid on a PS2, would not work at all: the bus would detect that a memory page is dirty and wait for a cache write-back from the CPU before doing anything. This is of course done because it makes for a far more stable machine, but that wasn't really a priority for the PS2, which was more of a pure console. This restriction means codependent read-write operations must wait for the CPU to finish, a delay of many hundreds of thousands of clock cycles, whereas the PS2 only had to wait about 5 or 6 clock cycles."
http://psinsider.e-mpire.com/index.php?categoryid=17&m_articles_articleid=1315
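The quote's bandwidth argument can be made concrete with some back-of-the-envelope arithmetic. This Python sketch computes peak bandwidth as bus width times clock, then how many full-screen passes that budget could cover per frame. The figures are assumptions, not measurements: a 2560-bit GS eDRAM bus (1024-bit read, 1024-bit write, 512-bit texture) at roughly 150 MHz, which gives the commonly quoted 48 GB/s, and a hypothetical workload of 640x448 at 60 fps costing 12 bytes per pixel per pass (32-bit colour read, colour write and texel, ignoring Z and overdraw).

```python
# Back-of-the-envelope bandwidth model. All figures are assumptions for
# illustration, not measured hardware numbers.

def peak_bandwidth(bus_width_bits: int, clock_hz: float) -> float:
    """Peak bytes per second for a bus that moves bus_width_bits every clock."""
    return bus_width_bits / 8 * clock_hz


def full_screen_passes_per_frame(bandwidth: float, width: int, height: int,
                                 fps: int, bytes_per_pixel_per_pass: int) -> float:
    """How many full-screen passes fit in one frame's share of the bandwidth."""
    bytes_per_pass = width * height * bytes_per_pixel_per_pass
    bytes_per_frame = bandwidth / fps
    return bytes_per_frame / bytes_per_pass


if __name__ == "__main__":
    # Assumed GS figures: 2560-bit eDRAM bus at roughly 150 MHz.
    gs = peak_bandwidth(2560, 150e6)
    # Assumed workload: 640x448 at 60 fps, 4 B colour read + 4 B colour write
    # + 4 B texel per pixel per pass; Z traffic and overdraw are ignored.
    passes = full_screen_passes_per_frame(gs, 640, 448, 60, 12)
    print(f"peak: {gs / 1e9:.1f} GB/s, theoretical passes/frame: {passes:.0f}")
```

Under these assumptions the model gives a theoretical ceiling of roughly 230 full-screen passes per frame. Z traffic, overdraw and page misses eat into that in practice, but the headroom is consistent with the quote's point that individual passes on the GS were very fast.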
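The difference the quote draws between presenting straight out of eDRAM and going through a resolve step can be sketched as two toy classes. Both are hypothetical and only track buffer offsets and bytes copied: `FlipInPlace` keeps front and back buffers in one memory pool, as the quote describes for the GS, so presenting is an offset swap; `ResolveToMainVram` renders into a transient scratch buffer that must be copied into main VRAM before the flip, as the quote describes for current chips. Real hardware does this with display registers and DMA, not Python objects.

```python
# Toy contrast between a GS-style flip and a resolve-based present.
# Purely illustrative; class names are hypothetical.

class FlipInPlace:
    """Front and back buffers both live in the same (eDRAM-like) memory."""

    def __init__(self, frame_bytes: int):
        self.frame_bytes = frame_bytes
        self.memory = bytearray(2 * frame_bytes)  # two buffers, one pool
        self.front = 0                # byte offset the display reads from
        self.back = frame_bytes       # byte offset we render into
        self.bytes_copied = 0

    def present(self) -> None:
        # Presenting just swaps which offset is displayed: nothing is copied.
        self.front, self.back = self.back, self.front


class ResolveToMainVram:
    """Rendering happens in a transient scratch buffer that must be copied out."""

    def __init__(self, frame_bytes: int):
        self.frame_bytes = frame_bytes
        self.scratch = bytearray(frame_bytes)        # eDRAM-like render target
        self.main_vram = bytearray(2 * frame_bytes)  # front + back live here
        self.front = 0
        self.back = frame_bytes
        self.bytes_copied = 0

    def present(self) -> None:
        # Resolve: copy the finished image from scratch into the main-VRAM back buffer...
        self.main_vram[self.back:self.back + self.frame_bytes] = self.scratch
        self.bytes_copied += self.frame_bytes
        # ...and only then flip the main-VRAM buffers.
        self.front, self.back = self.back, self.front
```

Each present on the resolve path costs a full frame of copying, traffic that the in-place flip never generates.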
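The VU/GS race described in the quote can be modelled as a cycle-stepped producer and consumer. In this Python sketch (all names hypothetical) the "VU" writes one vertex per cycle, the "GS" starts reading `lead` cycles later at one vertex per cycle, and neither side checks anything; correctness rests entirely on the timing. The function returns the indices the consumer read before they were written.

```python
# Cycle-stepped toy model of a producer racing a consumer through a vertex list.
# Assumptions (hypothetical): the producer writes one vertex per cycle unless
# the cycle is in `stalls`; the consumer starts at cycle `lead` and reads one
# vertex per cycle without ever checking whether it has been written.

def simulate(n_vertices: int, lead: int, stalls: frozenset = frozenset()) -> list:
    """Return indices of vertices the consumer read before they were written."""
    written = [False] * n_vertices
    produced = 0      # next vertex the producer will write
    consumed = 0      # next vertex the consumer will read
    stale_reads = []
    cycle = 0
    while consumed < n_vertices:
        # Producer half of the cycle: write one vertex unless stalled.
        if produced < n_vertices and cycle not in stalls:
            written[produced] = True
            produced += 1
        # Consumer half: it just reads on schedule, relying on the lead.
        if cycle >= lead:
            if not written[consumed]:
                stale_reads.append(consumed)
            consumed += 1
        cycle += 1
    return stale_reads
```

With fixed timing, `simulate(10, 3)` returns an empty list, and a one-cycle stall such as `simulate(10, 3, frozenset({4}))` is absorbed by the three-cycle lead. A four-cycle stall against a two-cycle lead, `simulate(10, 2, frozenset({3, 4, 5, 6}))`, makes every read from vertex 3 onward stale, which is why the technique needs components whose cycle counts never vary.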
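On a modern system the same producer/consumer pattern needs an explicit handshake instead of timing. This sketch uses Python's `threading.Condition` purely as an analogy for the strict synchronization the quote attributes to modern buses; the class and function names and the vertex tuple format are hypothetical. The consumer blocks until the producer has published each vertex, so the result is correct regardless of scheduling, at the cost of waiting.

```python
# Explicitly synchronized producer/consumer: the reader may only touch vertex i
# after the writer has published it. threading.Condition stands in, by analogy
# only, for hardware coherency and fence mechanisms.

import threading


class VertexStream:
    def __init__(self, n_vertices: int):
        self.data = [None] * n_vertices
        self.published = 0                  # vertices [0, published) are safe to read
        self.cond = threading.Condition()

    def publish(self, index: int, vertex) -> None:
        # Assumes vertices are published in index order.
        with self.cond:
            self.data[index] = vertex
            self.published = index + 1
            self.cond.notify_all()          # wake any waiting reader

    def read(self, index: int):
        with self.cond:
            # Block until the producer has published this vertex.
            self.cond.wait_for(lambda: self.published > index)
            return self.data[index]


def run(n_vertices: int) -> list:
    stream = VertexStream(n_vertices)
    out = []

    def producer():
        for i in range(n_vertices):
            stream.publish(i, (i, i * 2.0, 0.0))  # hypothetical vertex format

    def consumer():
        for i in range(n_vertices):
            out.append(stream.read(i))

    # Start the consumer first so it genuinely has to wait on the producer.
    threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Unlike the timing-based race, this version can never read a stale slot, but every read may block, so the consumer can no longer run just a few cycles behind the producer on trust alone.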