In the case in question, we are talking about a max memory bandwidth in the range of 75-100 GB/s, and a realistic bandwidth on the order of 70-80% of peak, for a range of 56-75 GB/s. Which gives a per frame bandwidth of roughly 1 GB (up to about 1.25 GB) at 60 FPS.
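To make the arithmetic concrete, here is a minimal back-of-envelope sketch. The peak bandwidths and the 75% utilization factor are the assumptions from above, not measurements:

```cpp
#include <cstdio>

int main() {
    // Assumed peak bandwidths (GB/s) and realistic utilization from above.
    const double peak_low = 75.0, peak_high = 100.0;
    const double utilization = 0.75;   // middle of the 70-80% range
    const double fps = 60.0;

    // Realistic bandwidth and the resulting per-frame budget.
    double realistic_low  = peak_low  * utilization;  // ~56 GB/s
    double realistic_high = peak_high * utilization;  // ~75 GB/s
    printf("Per frame: %.2f - %.2f GB\n",
           realistic_low / fps, realistic_high / fps); // ~0.94 - 1.25 GB
}
```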
I thought we were talking about known current systems and current unified memory architectures (Trinity, Sandy/Ivy Bridge, maybe even Xbox 360), not some rumored future ones. We only have actual facts and benchmark data for existing systems (everything else is pure speculation, and as I said earlier, I am not interested in participating in next gen speculation).
Please don't tell me you think a 70-100 GB/s unified memory architecture is considered "slow" by today's standards. Not even Intel's highest end 12 thread Sandy Bridge E, or the fully enabled 16 thread Xeon server versions of the chip, are equipped with a memory system that fast. Quad channel DDR3-1600 is the fastest officially supported configuration, and it provides 51.2 GB/s of theoretical bandwidth (37 GB/s in benchmarks, not far from AMD's utilization percentages:
http://www.anandtech.com/show/5091/...-bridge-e-review-keeping-the-high-end-alive/4). These chips cost 1000$+, and the motherboards supporting quad channel memory aren't cheap either.
Let's look at the highest end desktop APUs available with unified memory. Dual channel DDR3-1600 is the maximum officially supported memory for Intel's flagship desktop APU (Ivy Bridge). Dual channel DDR3-1866 is the maximum officially supported memory for AMD's flagship desktop APU (Trinity). Memory bandwidths are 25.6 GB/s and 29.9 GB/s respectively. These figures match perfectly with my calculations for the "slow" memory system (common DDR3 memory at the highest commonly available clocks).
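All the theoretical figures above come from the same simple formula: channels × 8 bytes per transfer × transfer rate. Nothing is assumed here beyond standard DDR3 bus widths:

```cpp
#include <cstdio>

// Theoretical DDR3 bandwidth: channels * 8 bytes per transfer * MT/s.
double ddr3_gb_per_s(int channels, int mega_transfers) {
    return channels * 8.0 * mega_transfers / 1000.0;
}

int main() {
    printf("Quad channel DDR3-1600: %.1f GB/s\n", ddr3_gb_per_s(4, 1600)); // 51.2
    printf("Dual channel DDR3-1600: %.1f GB/s\n", ddr3_gb_per_s(2, 1600)); // 25.6
    printf("Dual channel DDR3-1866: %.1f GB/s\n", ddr3_gb_per_s(2, 1866)); // 29.9
}
```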
Of course you can find memory kits designed for CPU overclockers. I actually bought this kind of premium memory for my old Q6600 based desktop. The problem with these enthusiast kits is that they are produced in very low quantities (cherry picked parts), and thus the price is very high. For example, the cheapest DDR3-2400 kit (2 x 4 GB) I found on newegg.com was a G.SKILL Ripjaws Z series at 96.99$. In comparison, you will find standard DDR3-1600 kits (2 x 4 GB) for 40.99$. As DDR3-1600 is the highest speed officially supported on Intel platforms, it is commonly used in brand new high end gaming desktops, and thus is the most relevant high volume product that we can still somehow qualify as "slow and cheap".
Then if the design has a high speed temporary buffer of reasonable size (32 MB+), this also reduces the amount of non-static texture data that must be stored and read, further increasing the relative share of bandwidth available for texture data and therefore the streaming texture cache space required.
Relatively large manual high speed "caches" such as the Xbox 360 EDRAM are very good for reducing redundant bandwidth usage (especially for GPU rendering). EDRAM removes all the memory bandwidth waste you get from blending, overdraw, MSAA and z-buffering; basically you get all these for free. The bandwidth free overdraw of course helps with shadowmaps as well, but since the Xbox 360 cannot sample from EDRAM, you eventually have to copy the shadowmap to main memory (consuming memory bandwidth) and sample it from there (consuming memory bandwidth just like any static texture). The same is true for g-buffer rendering and sampling (the g-buffer must eventually be copied to main memory and sampled from there, consuming memory bandwidth).
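To show the shape of that accounting, here is a minimal back-of-envelope sketch. The shadowmap size, texel format and overdraw factor are made-up assumptions, and the model is deliberately simplified (it ignores z-test reads and any compression):

```cpp
#include <cstdio>

int main() {
    // Hypothetical shadowmap: 1024x1024, 4 bytes per texel, 4x average overdraw.
    const double texels = 1024.0 * 1024.0;
    const double bytes_per_texel = 4.0;
    const double overdraw = 4.0;

    // Without EDRAM: every overdrawn write goes to main memory.
    double render_traffic = texels * bytes_per_texel * overdraw;

    // With EDRAM: rendering traffic stays on-chip; main memory only sees
    // the resolve copy (write) and the later sampling (read).
    double resolve_write = texels * bytes_per_texel;
    double sample_read   = texels * bytes_per_texel;

    printf("Main memory traffic without EDRAM: %.1f MB\n",
           (render_traffic + sample_read) / (1024.0 * 1024.0));  // 20.0 MB
    printf("Main memory traffic with EDRAM:    %.1f MB\n",
           (resolve_write + sample_read) / (1024.0 * 1024.0));   //  8.0 MB
}
```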
However, no matter how excellent EDRAM is, it cannot increase the maximum total accessible unique memory per frame. It can "only" (drastically) reduce the waste from double (or even higher) access counts to the same memory regions, and thus get us closer to the theoretical maximum (= 200 MB of unique memory per frame, assuming we still use the current highest end desktop APU unified memory systems as our "system of choice"). I have already stated in many threads how much I like the EDRAM in Xbox 360, so I won't do that again.
The available memory bandwidth per frame is only interesting as it applies to total memory if you can predict with some degree of certainty which 200 MB or so you're going to touch, and you can actually get it off disk before you need it.
Of course. Without exact knowledge of your access patterns, excellent prediction algorithms, and good fallback plans (stalling doesn't count), you need to keep considerable extra overhead data in your memory (just in case).
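As a sketch of what a "good fallback plan" can look like in practice: if a predicted page hasn't arrived from disk yet, drop to a coarser mip that is guaranteed resident instead of stalling. The PageCache type and its members here are hypothetical, loosely modeled on a virtual texturing residency check:

```cpp
#include <cstdint>
#include <cstdio>
#include <set>
#include <utility>

// Minimal sketch of a page cache with a guaranteed-resident coarsest mip.
// A real system would back this with an async disk streaming queue.
struct PageCache {
    std::set<std::pair<uint32_t, int>> residentPages; // (pageId, mip) in memory
    int coarsestMip = 4;                              // always kept resident

    bool resident(uint32_t page, int mip) const {
        return mip >= coarsestMip || residentPages.count({page, mip}) != 0;
    }
    void requestAsync(uint32_t page, int mip) {
        // Stand-in for queuing a disk read; data arrives on a later frame.
        printf("queue disk read: page %u mip %d\n", page, mip);
    }
};

// Pick the best mip usable *this* frame without stalling.
int selectMip(PageCache& cache, uint32_t page, int wantedMip) {
    for (int mip = wantedMip; mip < cache.coarsestMip; ++mip) {
        if (cache.resident(page, mip)) return mip;
        cache.requestAsync(page, mip);  // prefetch for future frames
    }
    return cache.coarsestMip;  // the "just in case" fallback data
}

int main() {
    PageCache cache;
    cache.residentPages.insert({7, 2});
    printf("page 7: using mip %d\n", selectMip(cache, 7, 0)); // falls back to 2
}
```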
So while, above some threshold, more memory doesn't help you with higher res textures, that doesn't make it useless.
Extra memory is of course always a good thing to have. It allows you to keep some (hard to predict) data components permanently in memory, and it saves development time as well. That's not an insignificant gain. More is always better, unless it means we have to compromise somewhere else. Aaronspink stated he would prefer to have 2 GB of extra memory instead of a 3-4x faster GPU, and that's something I cannot agree with (especially if the GPU is 3-4x slower because of bandwidth limitations that, in turn, also limit the usability of the extra 2 GB I would get in the trade).
What most would refer to as procedural content: at some level you can consider parametric content to be data compression with extreme compression ratios.
Really the memory is still just a cache, but it's a cache for computation rather than disk reads. It's one area I'd be seriously looking at going forwards.
Parametric content (artist controlled) will be very important in the future. However, I also see it as a way to reduce memory accesses. Why would you store the results to memory if you can recalculate them every time into the L1 cache instead and waste no bandwidth at all? ALU is basically free (compared to memory accesses), and it will become even cheaper in the future (while memory accesses will remain expensive).
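A minimal CPU-side illustration of the tradeoff (the parametric function here is a made-up stand-in for any cheap formula): the baked path streams a table through memory on every use, while the recompute path keeps everything in registers and L1:

```cpp
#include <cstdint>
#include <vector>

// Made-up parametric formula: a few ALU ops, no memory traffic.
inline float parametric(uint32_t x) {
    float t = x * (1.0f / 4096.0f);
    return t * t * (3.0f - 2.0f * t);  // smoothstep-style curve
}

// Option A: bake the results and read them back later.
// Costs 4 bytes of bandwidth per element on every use.
float useBaked(const std::vector<float>& baked, uint32_t i) {
    return baked[i];
}

// Option B: recompute on demand. A handful of ALU ops, which are
// basically free compared to a cache-missing memory access.
float useRecomputed(uint32_t i) {
    return parametric(i);
}

int main() {
    std::vector<float> baked(4096);
    for (uint32_t i = 0; i < baked.size(); ++i) baked[i] = parametric(i);
    float a = useBaked(baked, 123);
    float b = useRecomputed(123);
    return (a == b) ? 0 : 1;  // identical results, very different bandwidth
}
```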
However, if the parametric generation consumes more bandwidth than accessing the generated data, then I am a huge supporter of caching it. For example, in our virtual texturing system the terrain texture is generated (blended with a complex formula) from a huge amount of artist placed decals. In the worst case areas there are almost 10 layers of decals on top of each other, but we burn that data once to the virtual texture cache, and during terrain rendering a single texture lookup is enough (the generated data gets reused 60 times per second, just like data loaded from HDD).
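A rough sketch of the "burn once, sample many" idea, simplified to a single channel and CPU code (the real system does this on the GPU, with decals covering regions rather than whole pages, and a far more complex blend formula):

```cpp
#include <cstdio>
#include <vector>

struct Decal { float value; float alpha; };  // simplified: one channel + blend weight

// Burn pass: blend all decal layers once into the cached page.
// In the worst case this touches ~10 layers per texel, but it runs
// only when the page enters the cache, not every frame.
void burnPage(std::vector<float>& page, const std::vector<Decal>& layers) {
    for (float& texel : page) {
        for (const Decal& d : layers)
            texel = texel * (1.0f - d.alpha) + d.value * d.alpha;
    }
}

int main() {
    std::vector<float> page(128 * 128, 0.0f);
    std::vector<Decal> layers = {{0.8f, 1.0f}, {0.3f, 0.5f}, {0.6f, 0.25f}};
    burnPage(page, layers);  // expensive blend, done once

    // Per-frame rendering: a single "texture lookup" per texel, 60 times
    // a second, regardless of how many decal layers were blended.
    printf("texel value: %f\n", page[0]);
}
```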
As an aside, the Sony paper is interesting but doesn't age well; you can still kill yourself with virtual function calls.
That's not the main point of the paper. Yes, it's nice that you can evade some branches and virtual calls, but the main point (and the main performance gain) is the improved memory access pattern. The component model is a good approach, and many developers are using it in their newest engines.
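For reference, the memory access pattern argument in a nutshell (a generic sketch, not the actual code from the paper): instead of walking a list of heterogeneous objects through virtual calls, update each component type as a tightly packed linear array:

```cpp
#include <vector>

// Traditional: heterogeneous objects behind virtual dispatch, scattered
// in memory, so every update chases pointers and loads vtables.
struct GameObject {
    virtual ~GameObject() {}
    virtual void update(float dt) = 0;
};

// Component model: one tightly packed array per component type.
// The update walks memory linearly, which is what the cache wants.
struct Transforms {
    std::vector<float> posX, posY, velX, velY;

    void update(float dt) {
        for (size_t i = 0; i < posX.size(); ++i) {
            posX[i] += velX[i] * dt;   // pure streaming access, no
            posY[i] += velY[i] * dt;   // pointer chasing, no vtable loads
        }
    }
};

int main() {
    Transforms t;
    t.posX.assign(1024, 0.0f); t.posY.assign(1024, 0.0f);
    t.velX.assign(1024, 1.0f); t.velY.assign(1024, 0.5f);
    for (int frame = 0; frame < 60; ++frame)
        t.update(1.0f / 60.0f);
}
```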