Yeah, but only with a 4K texture versus the 16K PC version.
Are you absolutely sure that the PS3 uses only 256 MB?
I don't want to doubt his information, but do you have a link to support that?
I wonder if rendering the feedback buffer as a cubemap would work.
It would use more memory for the cache, but it should reduce loading spikes when turning around.
I don't think we can determine how Rage uses the two main memory banks on the PS3. But I'd say it's a safe bet that the VRAM is larger than what the various render targets and the textures used for rendering need - so at least some cache related data is probably in there as well.
Edit: also, the fact that Rage probably has to use both main RAM and VRAM to cache megatexture data is most likely one of the main reasons why Carmack has some problems with the architecture; it must be a crazy job to manage two memory pools for the same kind of task...
It seems that there are many misconceptions about how virtual texturing actually works...
I have programmed the virtual texturing system we use in our next Xbox 360 game, so I have pretty deep knowledge of all the gritty fine details of this particular streaming technology. According to the released technical information about id Software's system, our system is very similar to theirs. Lionhead's virtual texturing system also seems to be very similar to ours and id's. So the information I am going to post is pretty general, and should describe id's and Lionhead's systems pretty well too.
The basic idea of virtual texturing (or basically any fine grained on demand texture streaming) is that you only need a single texel of texture data to draw a single pixel on the screen (if filtering is not counted). So if you had an optimal streaming system, you would only need a single 1280x720 resolution texture (1) in memory, and nothing more.
Assuming console gaming at 720p: 1280x720 = 921k pixels, so the screen actually has slightly fewer pixels than a single 1024x1024 texture (= 1048k pixels). Console games without real time streaming tend to use mainly 256x256 and 512x512 textures for each single object in the game world. Some key objects (such as the main player characters) might use a 1024x1024 texture. Keeping hundreds of textures in memory at all times is a huge waste of memory. 200 MB of memory wasted just for textures sounds like a lot when you could theoretically manage with a single 1024x1024 texture (less than 4 megabytes depending on the texture format).
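Just to put those numbers side by side (a throwaway sketch, not engine code; the 4 bytes per texel is an assumed uncompressed format):

#include <cstdio>

int main()
{
    const int screenPixels  = 1280 * 720;   // 921,600 pixels at 720p
    const int texturePixels = 1024 * 1024;  // 1,048,576 pixels in one 1024x1024 texture
    const int bytesPerTexel = 4;            // assumed uncompressed 32-bit format

    printf("screen pixels:     %d\n", screenPixels);
    printf("1024x1024 texture: %d pixels, %.1f MB uncompressed\n",
           texturePixels, texturePixels * bytesPerTexel / (1024.0 * 1024.0));
    // Hundreds of resident 256x256..1024x1024 textures are where the ~200 MB goes.
    return 0;
}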
The ideal case of keeping just a single texel per pixel on screen is of course not possible, since loading single pixels scattered around the DVD/HDD would be really slow. Spinning media has very long seek times compared to solid state media (memory chips or flash based devices). All three virtual texturing systems combat this by loading 128x128 pixel tiles instead of individual pixels (16384 pixels are loaded at once). When a pixel is required, the whole tile containing it is loaded into the tile cache. Usually textures are mapped quite regularly around the objects, so if you need a certain pixel, you very likely also need the pixels around it. Also scenes tend to animate in a way that surfaces near the surfaces that used to be visible become visible in the next frame. Of course there are geometry boundaries that make some loaded tile contents partially map areas that are not currently visible. The bigger the tiles are, the more pixels in the tile cache get "wasted" this way. A 128x128 tile size seems to be a really good compromise between wasted memory and storage device seek latency (since all three virtual textured engines that I know of use the same tile size).
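As a rough illustration of the tile mapping described above (a minimal sketch with assumed names; only the 128x128 tile size comes from the text):

#include <cstdint>

// Identifies one 128x128 tile inside the huge virtual texture.
struct TileId
{
    uint32_t x;    // tile column in the virtual texture
    uint32_t y;    // tile row in the virtual texture
    uint32_t mip;  // mip level the tile belongs to
};

static const uint32_t kTileSize = 128;

// Given a virtual texture texel coordinate (already scaled to the requested mip
// level), return the tile that must be resident before that pixel can be shaded.
TileId TexelToTile(uint32_t texelX, uint32_t texelY, uint32_t mip)
{
    TileId id;
    id.x   = texelX / kTileSize;  // integer division: 16384 texels share one tile
    id.y   = texelY / kTileSize;
    id.mip = mip;
    return id;
}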
We and id Software (on consoles) use a 4096x4096 texture as the tile cache. Lionhead uses a 4096x2048 texture as their tile cache. You can fit 1024 of the 128x128 tiles in a 4096x4096 tile cache (it's basically a simple texture atlas). All the tiles used to texture the currently visible scene need to be loaded into the tile cache, since the tile cache represents only the visible part of the whole (huge) virtual texture. In our system we have measured that in a common scene there are usually around 200 to 300 unique tiles visible at once. So on average around 1/4 of the cache is used to render the currently visible scene. This translates roughly to the pixel count of a single 2048x2048 texture. Compared to the theoretical minimum of having just a single texel per screen pixel in memory, our system requires around four times as many texture pixels. This is really good, since a 128x128 tile contains 16384 pixels (all must be loaded at once even if only a single one is needed), and in reality textures are also sampled at fractional positions (bilinear needs four texel samples, and trilinear needs eight, four from each of two mip levels).
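For reference, addressing such a 4096x4096 atlas is simple; a sketch with assumed naming (32x32 slots of 128x128 texels = 1024 tiles):

#include <cstdint>

static const uint32_t kTileSize     = 128;
static const uint32_t kCacheSize    = 4096;                        // tile cache is 4096x4096 texels
static const uint32_t kTilesPerRow  = kCacheSize / kTileSize;      // 32 tiles per row
static const uint32_t kTileCapacity = kTilesPerRow * kTilesPerRow; // 1024 tiles total

// Top-left texel of a cache slot; slot indices run 0..1023.
void SlotToTexelOffset(uint32_t slot, uint32_t& outX, uint32_t& outY)
{
    outX = (slot % kTilesPerRow) * kTileSize;
    outY = (slot / kTilesPerRow) * kTileSize;
}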
The 4096x4096 tile cache is enough to render a scene (at 720p) with as much texture detail as you want, since the texture detail is irrelevant to virtual texturing (assuming reasonable uv mapping of course). You only need a single texel per screen pixel + the "wasted" area from the 128x128 tiles in memory to draw the scene, no matter how detailed the scene's textures are. For higher resolutions than 720p you of course need a larger tile cache. id Software has stated that they are using an 8192x8192 texture for their tile cache in the PC version of Rage. 1080p would require a tile cache 2.25x the size of the 720p one, and 2560x1600 one 4.44x the size. The required tile cache size scales linearly with screen resolution (actually slightly sub-linearly, since the tiles become smaller in proportion to the screen resolution and this means slightly fewer wasted pixels). So it's completely natural that id Software uses a larger tile cache on PC, since PC gamers tend to play at higher resolutions. An 8192x8192 tile cache on consoles (720p) would not improve the texture detail at all. However the (4x) larger tile cache would reduce the data streaming from the game media (but not drastically, since increasing cache size usually gives only logarithmic gains).
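Those scale factors are just the pixel count ratio against 720p (a tiny sketch; it ignores the slight sub-linear advantage mentioned above):

#include <cstdio>

// Ratio of screen pixels compared to the 720p baseline.
double CacheScaleVersus720p(int width, int height)
{
    return (double)(width * height) / (1280.0 * 720.0);
}

int main()
{
    printf("1080p:     %.2fx\n", CacheScaleVersus720p(1920, 1080)); // 2.25x
    printf("2560x1600: %.2fx\n", CacheScaleVersus720p(2560, 1600)); // 4.44x
    return 0;
}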
On Xbox 360, our system uses a combination of texture formats that makes our material 2.5 bytes per pixel (2xBC3+BC4). As our tile cache is 4096x4096 pixels, the total amount of texture (material) data we have in memory is always 40 megabytes. In addition to the tile cache, our system has a single 2048x2048 indirection texture with a full mip chain (16 bits per pixel, 5551 format). The indirection texture is around 10 megabytes, and is used by the GPU to do a fast lookup to find the proper tile in the tile cache (based on texture coordinates and mip level). Also we have eight loader buffers of 128x128 pixels (2.5 bytes per pixel = 320 KB total) for background data loading on the CPU (we have a background CPU thread doing our texture loading all the time). So in total the system takes around 50 megabytes of GPU memory, and less than a megabyte of CPU memory. I don't see any problems with a system like this on PS3, since the loader buffers are stored in CPU memory, and all new tiles could simply be copied to the GPU tile cache at the start of each frame. Also the 256 megabyte size of both memory pools is not in any way a problem (actually virtual texturing makes it much easier to live with smaller memory).
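The memory figures add up like this (only the arithmetic; the 2.5 bytes per texel, the texture sizes and the buffer count are the ones stated above, and the ~4/3 mip chain overhead is the usual approximation):

#include <cstdio>

int main()
{
    const double MB = 1024.0 * 1024.0;

    // Tile cache: 4096x4096 texels, 2.5 bytes per texel (2xBC3 + BC4 material).
    double tileCache   = 4096.0 * 4096.0 * 2.5 / MB;                 // 40 MB

    // Indirection texture: 2048x2048, 16 bits per texel, full mip chain (~4/3 overhead).
    double indirection = 2048.0 * 2048.0 * 2.0 * (4.0 / 3.0) / MB;   // ~10.7 MB

    // Eight 128x128 loader buffers at 2.5 bytes per texel.
    double loadersKB   = 8.0 * 128.0 * 128.0 * 2.5 / 1024.0;         // 320 KB

    printf("tile cache:       %.1f MB\n", tileCache);
    printf("indirection:      %.1f MB\n", indirection);
    printf("loader buffers:   %.0f KB\n", loadersKB);
    printf("total (GPU side): %.1f MB\n", tileCache + indirection);  // ~50 MB
    return 0;
}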
Let's analyze the rotating camera "problem" a bit. For simplicity, let's assume a 90 degree field of view and a full 360 degree turn. When you turn around you see 360/90 = 4 completely different views that do not share any surfaces with each other. With a perfect streaming system, you would only need to load four 720p images worth of data during the full turn. Virtual texturing systems use texture compression similar to the most advanced image codecs (jpeg2k, ptc, etc), so it's not hard to believe that you can easily stream four frames of full screen compressed image data in the time that the fastest thumbstick turn takes. Earlier I stated that on average it takes 1/4 of our tile cache (around 250 tiles out of 1024) to store the currently visible surface tiles. In the average case you could keep rotating around with no HDD activity at all, since our cache would be large enough to fit all the data of the four 90 degree views. However, if we have a stressful situation (complex scene with lots of overlapping narrow geometry), around 1/3 of the cache could be needed to store a single view. In this situation the system starts to constantly stream data when the camera is rotated. Now if we assume that a single view is around 340 tiles (1/3 of the cache), and we see approximately four full views during the rotation, we would need to load 1360 tiles during the rotation (assuming our cache doesn't help at all). If we assume a compressed tile size of 10 KB (on HDD), the total data required during the turn becomes 13.6 MB. A 360 degree thumbstick turn in a console shooter could take 2 seconds, so we need a 6.8 MB/s transfer rate (assuming no seek penalties = obviously a wrong assumption); the arithmetic is spelled out in the small sketch below. Slow 5400 RPM notebook hard drives have around 100 MB/s sustained transfer rates and DVD drives have 20+ MB/s, so transfer rate is clearly not a possible bottleneck in any scenario.

Seek latency however can be a big bottleneck on DVDs, since a worst case seek can take 100 ms. So the engines use different methods to reduce seeking. id hasn't said much about their methods, but they surely have implemented some state of the art methods to optimize the virtual texture page ordering in their DVD images. Lionhead's tech papers reveal that they store nearby objects and terrain close to each other in the virtual texture (maximizing the tile usage in lower mips), and they group four 128x128 tiles in a single contiguous compressed region on disc (and have a small additional 2x2 macrotile loader cache). We also try to keep similar objects nearby in the virtual texture and optimize our loads in many ways, but as our game is guaranteed to be installed to the HDD or memory units (flash based memory), our seeking problems (and solutions) are limited compared to these two other disc based games.
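The turn bandwidth arithmetic, spelled out (only the numbers used above; the 10 KB compressed tile and the 2 second turn are the stated assumptions, and seek penalties are ignored):

#include <cstdio>

int main()
{
    const int    tilesPerView     = 340;   // ~1/3 of a 1024-tile cache in a stressful scene
    const int    viewsPerTurn     = 4;     // 360 degrees / 90 degree field of view
    const double tileSizeOnDiskKB = 10.0;  // assumed compressed tile size on HDD
    const double turnSeconds      = 2.0;   // a full thumbstick turn in a console shooter

    // Using decimal megabytes, as above.
    double totalMB = tilesPerView * viewsPerTurn * tileSizeOnDiskKB / 1000.0;
    printf("data per full turn: %.1f MB\n", totalMB);                 // 13.6 MB
    printf("required rate:      %.1f MB/s\n", totalMB / turnSeconds); // 6.8 MB/s
    return 0;
}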
A six way render (cubemap) could be used to detect tiles that will be required in the near future, but this only helps in the camera rotation case, and would waste a lot of cache space and bandwidth loading stuff that might never be used (players usually move forward a lot more than backward). A better system renders the tile query using a slightly wider field of view and an approximated future camera. We calculate an approximation based on the current camera speed (and acceleration) to determine where it would be in six frames' time. We also jitter the tile query camera a bit from frame to frame to be sure to hit all narrow objects and to have a little extra data in the cache for sudden unpredictable movements. Lionhead's system also has some prediction logic, and they do periodic randomly rotated queries. id hasn't revealed the inner workings of their tile queries yet, but I am sure they are using some sophisticated prediction methods to determine the data they need ahead of time.
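A minimal sketch of that kind of look-ahead query camera (the six frame extrapolation comes from the description above; the jitter magnitude and all names here are purely illustrative):

#include <cstdlib>

struct Vec3 { float x, y, z; };

static Vec3 Add(Vec3 a, Vec3 b)    { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
static Vec3 Scale(Vec3 a, float s) { return { a.x * s, a.y * s, a.z * s }; }

// Extrapolate the camera ~6 frames ahead using current velocity and acceleration,
// then jitter slightly so narrow geometry gets hit over consecutive queries.
Vec3 PredictQueryCameraPosition(Vec3 pos, Vec3 velocity, Vec3 accel, float frameTime)
{
    const float t = 6.0f * frameTime; // look ahead ~6 frames
    Vec3 predicted = Add(pos, Add(Scale(velocity, t), Scale(accel, 0.5f * t * t)));

    // Small per-frame jitter (magnitude here is illustrative, not our actual value).
    const float jitter = 0.05f;
    predicted.x += jitter * ((rand() / (float)RAND_MAX) - 0.5f);
    predicted.z += jitter * ((rand() / (float)RAND_MAX) - 0.5f);
    return predicted;
}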
I personally think the biggest problem for virtual texturing is the currently popular disc based (DVD/Blu-ray) storage, which has awful seek times. As digital distribution becomes more popular these problems are slowly fading away. Hard drives are fast enough to load everything before you can notice the missing detail, and new flash based (SSD) storage makes virtual texturing even better (super low seek times). Fortunately flash memory based devices have become really popular recently (ultraportables, tablets, smartphones). On the other hand we have the cloud, and network based virtual texture streaming sounds like a really great idea. As 720p requires only around 6.8 MB/s to stream all required texture data, even current network connections have more than enough bandwidth for streaming... however, hiding the constant 200 ms network latency would require a lot of additional research.
---
(1) By a "single texture" I actually mean a single material = a color map and a normal map with the same unwrapped texture coordinates. Many engines also pack some material properties into the texture channels (specular properties, etc). When I am talking about a visible color texture, I always mean a material with all the needed texture layers.