spin off RAM & Cache Streaming implications

manux · Sep 7, 2011

I already wrote some posts in another thread, but I would just reiterate that if a next-gen console had decent flash-based memory with good read speed (write speed is not relevant), it would completely change the dynamics of streaming. Suddenly you could actually count on getting a lot of stuff and detail on demand, whereas with optical you might never be able to load significantly higher-detail models/textures on demand. Using spectacularly fast and expensive RAM as a huge cache just because PCs do that would be stupid on consoles.

Assuming the flash RAM is optimized mainly for read speed, it makes optimizations possible that are not feasible in the PC space (a cheaper IO controller that can be cut down even from today's SSD specs). Also, because at the OS and driver level we would know we always use an SSD, we can optimize the hell out of it in software without caring about regular HDDs. Better yet, even 50MB/s or slower write speed would be enough, because almost all writes come from the optical drive, which is not that fast to begin with.

Even if flash RAM is fairly expensive in 2 years for, let's say, 64GB capacity, flash prices scale better than regular HDDs. Also, having a really good baseline for streaming allows developers to rely on streaming in a whole new way. Just imagine if you could load 100MB models on demand for a football game. On an SSD it would take a fraction of a second, on optical it would probably take 4 seconds or more, and on a good 2.5" HDD maybe 1-2 seconds. Really good streaming speed would allow insane graphics in close-ups and variety in gameplay, as pulling in models and textures would be brilliantly fast. Imagine what pulling in 400MB of unique content every second would do for Grand Theft Auto (instead of 10MB per second, or less if seeks are needed).

Edit: I don't think 400MB/s read, 50MB/s write and a simplified IO controller would be out of the question for a console launching in 2 years' time. Better stuff is already on the market; the question is just lowering the current spec and making it affordable at some reasonable capacity.
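A rough sketch of the load-time arithmetic above, assuming sustained throughput and no seeks. Only the 400 MB/s flash figure comes from the post; the HDD and optical rates are assumed round numbers, and `load_time_s` is a hypothetical helper.

```python
# Back-of-envelope load times for a 100 MB asset, ignoring seeks.
# 400 MB/s is the flash figure proposed in the post; the HDD and optical
# throughputs are assumed round numbers chosen for illustration.
ASSUMED_THROUGHPUT_MB_S = {
    "flash (proposed)": 400.0,
    "2.5in HDD (assumed)": 75.0,
    "optical (assumed)": 20.0,
}


def load_time_s(size_mb: float, throughput_mb_s: float) -> float:
    """Seconds needed to read size_mb at a sustained rate, with no seeking."""
    return size_mb / throughput_mb_s


for device, rate in ASSUMED_THROUGHPUT_MB_S.items():
    print(f"{device:>20}: {load_time_s(100.0, rate):.2f} s")
```

Seeks are left out on purpose: they hurt the optical drive most, so adding them would only widen the gap.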

manux · Sep 7, 2011

Some fun numbers. Assuming we increase memory from 512MB to 4GB next gen, that is an 8x improvement. Assuming we jump to 400MB/s flash RAM read speed, streaming speed would increase 40x compared to optical (Xbox 360 baseline, and only on the outer rim). And I suppose the even bigger factor would be the much lower seek times and the insane number of IOPS that could be done next gen.

Also, optimizing the console OS for flash RAM makes a lot of sense. The once-cursed swapping could be used super efficiently to pull in OS resources and services on demand without halting everything due to excessive IO contention (i.e. seeks). This would nicely reduce the amount of RAM used just to cache data in case the user invokes some service at any given time.
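The ratios above can be checked with a few lines. This is a sketch assuming a 10 MB/s optical baseline, which is simply the rate that makes 400 MB/s a 40x gain; the "refill" comparison (time to stream a full RAM's worth of data) is an added illustration, not a figure from the post.

```python
# Ratios behind the numbers above. The 10 MB/s optical baseline is an
# assumption: it is the rate that makes 400 MB/s come out as a 40x gain.
CURRENT_RAM_MB = 512
NEXT_RAM_MB = 4096
OPTICAL_MB_S = 10.0   # assumed current-gen optical baseline
FLASH_MB_S = 400.0    # proposed next-gen flash read speed

ram_gain = NEXT_RAM_MB / CURRENT_RAM_MB      # 8.0
stream_gain = FLASH_MB_S / OPTICAL_MB_S      # 40.0

# Seconds to stream in a full RAM's worth of data, ignoring seeks.
refill_now_s = CURRENT_RAM_MB / OPTICAL_MB_S    # 51.2
refill_next_s = NEXT_RAM_MB / FLASH_MB_S        # 10.24

print(ram_gain, stream_gain, refill_now_s, refill_next_s)
```

Because streaming would grow 40x while memory grows 8x, refilling all of RAM gets about 5x faster, which is the core of the argument for leaning on streaming rather than on a huge cache.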

TheWretched · Sep 7, 2011

Well... it's not just graphics that need RAM. All CPU operations that aren't directly graphics-related need RAM too, though probably less, especially in bigger worlds with more interactive elements, say pedestrians, cars and "objects", all of which are physics-enabled. Comparing PC GTA4 to console GTA4, bumping the traffic count up to 100 gets it nearly where I want it to be. A city like New York is always congested, so games should mirror this to a degree (as long as it doesn't get in the way of fun).

More RAM should also enable more... randomness. In GTA4 (again) there aren't too many different cars in any given scene. As the game already has a LOT of different cars, it isn't a problem of assets in this case.

It should also enable more... persistence in these games. Crash your car, turn around, walk 100m, and everything behind the view frustum is lost. There are games that do save these states, but mostly they don't, which breaks immersion a bit.

manux · Sep 7, 2011

TheWretched said:
Well... it's not just graphics that need RAM. All CPU operations that aren't directly graphics-related need RAM too, though probably less, especially in bigger worlds with more interactive elements, say pedestrians, cars and "objects", all of which are physics-enabled. Comparing PC GTA4 to console GTA4, bumping the traffic count up to 100 gets it nearly where I want it to be. A city like New York is always congested, so games should mirror this to a degree (as long as it doesn't get in the way of fun).

More RAM should also enable more... randomness. In GTA4 (again) there aren't too many different cars in any given scene. As the game already has a LOT of different cars, it isn't a problem of assets in this case.

It should also enable more... persistence in these games. Crash your car, turn around, walk 100m, and everything behind the view frustum is lost. There are games that do save these states, but mostly they don't, which breaks immersion a bit.

The persistence you talk about requires some tens/hundreds of bytes of ram per character/object. Trivially small stuff compared to the textures, models and buffers required by rendering. Persistence is more about computing power to actually simulate everything at the same time. Imagine if GTA4 tried to simulate ALL the cars in the game on all the streets at the same time and also calculate physics for all the collisions every second... Or similarly, if all of Oblivion were at war at the same time and wolves were attacking an NPC 10 miles away from you.

MUDs would be a good way to see what persistence requires from RAM. Not that much.
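To put "tens of bytes per object" into numbers, here is a sketch with a hypothetical per-object record (position, orientation quaternion, type id, state bits, flags). The field layout is an illustrative assumption, not taken from any real engine.

```python
import struct

# Hypothetical persistent record for one world object (a wrecked car, a crate,
# a debris chunk). Illustrative layout only: position (3 floats), orientation
# quaternion (4 floats), asset/type id (uint16), state bits (uint16),
# owner/flags (uint32). '<' means little-endian with no padding.
OBJECT_RECORD = struct.Struct("<3f4fHHI")


def persistence_bytes(object_count: int) -> int:
    """Total bytes needed to remember object_count objects."""
    return object_count * OBJECT_RECORD.size


for count in (1_000, 100_000, 1_000_000):
    mib = persistence_bytes(count) / (1024 * 1024)
    print(f"{count:>9} objects x {OBJECT_RECORD.size} B = {mib:6.2f} MiB")
```

Even a million such records stay in the tens of megabytes, so the bottleneck for simulating them is compute rather than storage.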

TheWretched · Sep 7, 2011

I think you're intentionally driving my point ad absurdum.

I am not talking about a fully realized city like you describe here, but about your actions actually having an impact, as I described above. And though it is correct that it is just "tens/hundreds of bytes of ram per character/object", it's also a question of how many objects. Those can easily number several hundreds or thousands, or even millions, in your vicinity. Just look at something like Red Faction and imagine the destruction going further, so that the stuff you destroy stays there and produces particles and objects. Destroying a house wouldn't mean it breaks into 3 big pieces, but probably into millions of them, all of which need physics. And these particles shouldn't vanish either, because that just looks stupid in most games today.

manux · Sep 7, 2011

TheWretched said:
I think you're intentionally driving my point ad absurdum.

I am not talking about a fully realized city like you describe here, but about your actions actually having an impact, as I described above. And though it is correct that it is just "tens/hundreds of bytes of ram per character/object", it's also a question of how many objects. Those can easily number several hundreds or thousands, or even millions, in your vicinity. Just look at something like Red Faction and imagine the destruction going further, so that the stuff you destroy stays there and produces particles and objects. Destroying a house wouldn't mean it breaks into 3 big pieces, but probably into millions of them, all of which need physics. And these particles shouldn't vanish either, because that just looks stupid in most games today.

None of that consumes much more memory, but it puts considerably more strain on the CPU to simulate the world and on the GPU to draw it. If you have a 100k polygon model for a house and split it into ten pieces of 10k polygons each, it's still 100k polygons altogether. But to achieve the splitting you need a substantial amount of CPU power (physics), and more GPU power, as you have more objects to feed to the GPU.

I really don't think more dynamic and more persistent worlds are a memory limitation. To me that really is a computing power limitation. It also might depend a lot on what kind of games the masses buy, and therefore where devs/publishers want to put their money.
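The house example can be sketched like this: splitting redistributes polygons (treated here as triangles) rather than adding them, ignoring the new faces created along the fracture. The bytes-per-triangle figure and the one-draw-call-per-piece rule (no batching) are assumptions for illustration.

```python
# Splitting a mesh redistributes triangles rather than adding them (ignoring
# the new faces created along the fracture), so geometry memory stays flat
# while the number of simulated and drawn objects grows. The bytes-per-
# triangle figure and the one-draw-call-per-piece rule are assumptions.
ASSUMED_BYTES_PER_TRIANGLE = 32


def mesh_stats(total_triangles: int, pieces: int) -> dict:
    per_piece = total_triangles // pieces
    return {
        "triangles_per_piece": per_piece,
        "geometry_bytes": per_piece * pieces * ASSUMED_BYTES_PER_TRIANGLE,
        "rigid_bodies": pieces,  # physics cost grows with this
        "draw_calls": pieces,    # submission cost grows with this (no batching)
    }


intact = mesh_stats(100_000, 1)
shattered = mesh_stats(100_000, 10)
print(intact)
print(shattered)
```

Memory is identical in both cases; only the rigid body and draw call counts, which cost CPU and GPU time, go up tenfold.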

Arwin · Sep 7, 2011

I actually think it may even be a QA problem.

Heinrich4 · Sep 7, 2011

fehu said:
still it's running on the 256MB of the ps3

Yeah, but only with 4K textures versus 16K in the PC version.

Are you absolutely sure that the ps3 uses only 256MB?

I don't want to doubt his information, but do you have a link to support that?

Heinrich4 · Sep 7, 2011

liolio said:
Actually that's the best argument I've read in favor of 4GB of RAM

So 8 cores sounds like a good bet; assuming Charlie is right and MS uses an SoC, a straight POWER A2 or a derivative sounds like a decent possibility.

Please, let's dream a little bit.

It would be interesting indeed. With the customization required for a closed box (less cache, lower wattage, removing pipeline/die features not used in a game console, etc.), would you prefer 8 A2 cores or 4 POWER7-like cores?

assurdum · Sep 8, 2011

Heinrich4 said:
Yeah, but only with 4K textures versus 16K in the PC version.

Are you absolutely sure that the ps3 uses only 256MB?

I don't want to doubt his information, but do you have a link to support that?

I wouldn't be so surprised after the CryEngine 3 surprise... By the way, Carmack has claimed to have figured out how to use the split RAM on the PS3, so it would sound pretty bizarre. Is there any written report about Carmack's last Q&A?

patsu · Sep 8, 2011

Laa-Yosh said:
There were some mentions of texture pop-in, but no further word about the circumstances, like whether it was a full HDD install on an X360 or something less capable, or maybe a fast PC. Which is why I'd prefer to withhold any judgment at least until all versions are released and Digital Foundry can do a proper analysis.

Arwin said:
I think I read Carmack himself saying that it was hard to eliminate pop-in on HDD-less systems, and then the journalist mentioned that the HDD-enabled systems also showed some pop-in, though less. We'll see what the final version does.

In Carmack's keynote, he mentioned that in hindsight he would like to try another streaming approach (for the PS3's BR + HDD config). Instead of loading the low-quality textures first, he wanted to load the desired quality right away to avoid the pop-in. Too bad they ran out of time. It would be an interesting experiment.

jlippo · Sep 8, 2011

I wonder if rendering the feedback buffer as a cubemap would work.
It would use more memory for cache, but it should reduce loading spikes when turning around.

Rolf N · Sep 8, 2011

Heinrich4 said:
Yeah, but only with 4K textures versus 16K in the PC version.

Are you absolutely sure that the ps3 uses only 256MB?

I don't want to doubt his information, but do you have a link to support that?

We're all aware that the PS3 has 512MB total RAM, in split pools, right?

Heinrich4 · Sep 9, 2011

Rolf N said:
We're all aware that the PS3 has 512MB total RAM, in split pools, right?

Let's relax a little ....

No way... the PS3 is built around a 1024-qubit processor capable of delivering unimaginable photonic technobabble capacity...

Of course it is ...

We formed the hypothesis that the developer used only 256MB (out of 512MB, clearly shared with several other processes) for textures, doing specialized streaming work from the drive/media to the HDD (all PS3s have one). What do you think?

Laa-Yosh · Sep 9, 2011

I don't think we can determine how Rage uses the two main memory banks on the PS3. But I'd say it's a safe bet that the VRAM is bigger than what the various render targets and the textures used for rendering need, so at least some cache-related data is probably in there as well.

Edit: also, the fact that Rage probably has to use both main RAM and VRAM to cache megatexture data is most likely one of the main reasons why Carmack has some problems with the architecture; it must be a crazy job to manage two memory pools for the same kind of task...

liolio · Sep 9, 2011

Let's change the beat a little bit. The HDD was already standard on the PS3; is it impossible that MS will follow suit next gen?
We know that the Wii U won't. A standard HDD on both the Sony and MS systems would make quite a difference between those products; it would impact what games can do and what the OS and the system as a whole can do.
This gen we see mandatory installs on the PS3 (not the full game) and optional installs on the 360 (the full game). It would also be interesting to see statistics on how many 360 games get installed by choice. I install every single game I play, and uninstall only due to storage constraints. Don't you all believe that at this point a clear policy on the matter would be the way to go?
I envision something cleverer than what MS did with the 360: how about only installing the data that needs the kind of fast access an HDD provides? I see no point in wasting storage on things such as music and CGI cutscenes.
It may not be much of a saving, but if installation is standard it will still help with storage issues and slightly shorten install times.

It's a bit like a wider 256-bit bus vs. an extra eDRAM chip, etc. At some point I believe streamlining the whole thing is the way to go. Storage is critical to online content and the overall business model, so go for it instead of considering multiple SKUs (flash only, flash + HDD, etc.). It certainly isn't free, so spend less elsewhere; if the rumor about an SoC for MS is true, they are already cutting elsewhere (silicon budget and cooling system, among other things).

Andrew Lauritzen · Sep 9, 2011

hoho said:
The physical memory percentage at the bottom of task manager is pretty meaningless as well.

Too true. People take a look at the little task manager graph and figure they understand RAM in terms of filling a bottle with a limited capacity. That's not how it works at all on PC.

You really need to dive into the performance monitor with a good knowledge of what some of the memory metrics mean to get a reasonable idea of the working set. More realistically, to diagnose problems caused by too little RAM you need to look at reactive metrics like page faults.

Shifty Geezer · Sep 10, 2011

I did not know this.

sebbbi · Sep 10, 2011

Heinrich4 said:
Yeah, but only with 4K textures versus 16K in the PC version.
Are you absolutely sure that the ps3 uses only 256MB?
I don't want to doubt his information, but do you have a link to support that?

jlippo said:
I wonder if rendering the feedback buffer as a cubemap would work.
It would use more memory for cache, but it should reduce loading spikes when turning around.

Laa-Yosh said:
I don't think we can determine how Rage uses the two main memory banks on the PS3. But I'd say it's a safe bet that the VRAM is bigger than what the various render targets and the textures used for rendering need, so at least some cache-related data is probably in there as well.

Edit: also, the fact that Rage probably has to use both main RAM and VRAM to cache megatexture data is most likely one of the main reasons why Carmack has some problems with the architecture; it must be a crazy job to manage two memory pools for the same kind of task...

It seems that there are many misconceptions about how virtual texturing actually works...

I have programmed the virtual texturing system we use in our next Xbox 360 game, and I have pretty deep knowledge of all the gritty fine details of this particular streaming technology. According to the released technical information about id Software's system, our system is very similar to theirs. Lionhead's virtual texturing system also seems to be very similar to ours and id's. So the information I am going to post is pretty general, and should describe id's and Lionhead's systems pretty well too.

The basic idea of virtual texturing (or basically any fine-grained on-demand texture streaming) is that you only need a single texel of texture data to draw a single pixel on the screen (if filtering is not counted). So with a perfectly optimal streaming system, you would only need a single 1280x720 resolution texture (1) in memory, and nothing more.

Assume console gaming at 720p: 1280x720 = 921k pixels, so the screen actually has slightly fewer pixels than a single 1024x1024 texture (1048k pixels). Console games without real-time streaming tend to use mainly 256x256 and 512x512 textures for each single object in the game world. Some key objects (such as the main player characters) might use a 1024x1024 texture. Keeping hundreds of textures in memory at all times is a huge waste of memory. Spending 200 MB of memory just on textures sounds like a lot when you could theoretically manage with a single 1024x1024 texture (4 megabytes or less, depending on texture format).
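The pixel and byte counts above can be checked directly. This sketch covers a single mip level only (a full mip chain adds about a third); the per-texel sizes are the standard ones for uncompressed RGBA8 (4 bytes), BC3/DXT5 (1 byte) and BC1/DXT1 (0.5 bytes), and `texture_bytes` is a hypothetical helper.

```python
MIB = 1024 * 1024

screen_px = 1280 * 720     # 921,600 pixels at 720p
tex_1k_px = 1024 * 1024    # 1,048,576 texels


def texture_bytes(width: int, height: int, bytes_per_texel: float) -> float:
    """Size of one mip level; bytes_per_texel depends on the format."""
    return width * height * bytes_per_texel


print(screen_px < tex_1k_px)                  # True
print(texture_bytes(1024, 1024, 4.0) / MIB)   # 4.0 MiB as uncompressed RGBA8
print(texture_bytes(1024, 1024, 1.0) / MIB)   # 1.0 MiB as BC3/DXT5
print(texture_bytes(1024, 1024, 0.5) / MIB)   # 0.5 MiB as BC1/DXT1
```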

The ideal case of keeping just a single texel per screen pixel is of course not possible, since loading single pixels scattered around the DVD/HDD would be really slow. Spinning media has very long seek times compared to solid-state media (memory chips or flash-based devices). All three virtual texture systems combat this by loading 128x128 pixel tiles instead of individual pixels (16384 pixels are loaded at once). When a pixel is required, the whole tile containing it is loaded into the tile cache. Usually textures are mapped quite regularly around the objects, so if you need a certain pixel, you very likely also need the pixels around it. Also, scenes tend to animate in a way that the surfaces becoming visible in the next frame lie near the surfaces that were visible before. Of course there are geometry boundaries that make some loaded tile contents partially map areas that are not currently visible. The bigger the tiles are, the more pixels in the tile cache get "wasted" this way. A 128x128 tile size seems to be a really good compromise between wasted memory and storage device seek latency (all three virtual textured engines that I know of use the same tile size).
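A minimal sketch of why tiles pay off: mapping texels to their 128x128 tile shows that a block of neighbouring texels usually costs one tile load, while scattered texels cost one load each. It covers a single mip level only; a real system also keys tiles on mip level. The helper names are hypothetical.

```python
TILE = 128  # tile edge in texels, as in the post


def tile_of(u: int, v: int) -> tuple[int, int]:
    """(tile_x, tile_y) containing integer texel coordinate (u, v), one mip level."""
    return (u // TILE, v // TILE)


def tiles_for_texels(texels) -> set[tuple[int, int]]:
    """Unique tiles that must be resident to sample all the given texels."""
    return {tile_of(u, v) for u, v in texels}


# 64x64 neighbouring texels, aligned inside one tile: a single tile load.
patch = [(1024 + x, 2048 + y) for x in range(64) for y in range(64)]
# The same patch shifted so it straddles a tile corner: four tile loads.
straddling = [(1024 - 32 + x, 2048 - 32 + y) for x in range(64) for y in range(64)]
# 64 texels scattered far apart: one tile load each.
scattered = [(i * TILE * 3, i * TILE * 5) for i in range(64)]

print(TILE * TILE)                        # 16384 texels per tile
print(len(tiles_for_texels(patch)))       # 1
print(len(tiles_for_texels(straddling)))  # 4
print(len(tiles_for_texels(scattered)))   # 64
```

The straddling case is the "wasted" area the post describes: four tiles (65,536 texels) are resident to serve 4,096 visible ones.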

We and id Software (on consoles) use a 4096x4096 texture as the tile cache. Lionhead uses a 4096x2048 texture as their tile cache. You can fit 1024 128x128 tiles in a 4096x4096 tile cache (it's basically a simple texture atlas). All the tiles used to texture the currently visible scene need to be loaded into the tile cache, since the tile cache represents only the visible part of the whole (huge) virtual texture. In our system we have measured that in a common scene there are usually around 200 to 300 unique tiles visible at once. So on average around 1/4 of the cache is used to render the currently visible scene. This translates roughly to the pixel count of a single 2048x2048 texture. Compared to the theoretical minimum of having just a single texel per pixel in memory, our system requires around four times as many texture pixels. This is really good, since a 128x128 tile contains 16384 pixels (all must be loaded at once even if only a single one is needed), and in reality textures are also sampled at fractional positions (bilinear needs four texel samples, and trilinear needs eight, four from each of two mip levels).
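The cache-occupancy figures above follow from a few multiplications. This sketch uses only numbers from the post (128x128 tiles, a 4096x4096 cache, 720p, 200 to 300 visible tiles); the helper names are hypothetical.

```python
TILE = 128
CACHE_SIDE = 4096
SCREEN_PX = 1280 * 720

cache_slots = (CACHE_SIDE // TILE) ** 2   # 32 * 32 = 1024 tiles


def cache_share(visible_tiles: int) -> float:
    """Fraction of the tile cache occupied by the visible set."""
    return visible_tiles / cache_slots


def texels_per_pixel(visible_tiles: int) -> float:
    """Resident texels per screen pixel; the theoretical minimum is about 1."""
    return visible_tiles * TILE * TILE / SCREEN_PX


for tiles in (200, 250, 300):
    print(tiles, f"{cache_share(tiles):.0%}", f"{texels_per_pixel(tiles):.1f}x")
```

For 250 tiles this gives about 24% of the cache and about 4.4 texels per screen pixel, in line with the "around 1/4" and "around four times" figures; 256 tiles is exactly the texel count of one 2048x2048 texture.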

The 4096x4096 tile cache is enough to render a scene (at 720p) with as much texture detail as you want, since texture detail is irrelevant to virtual texturing (assuming reasonable UV mapping of course). You only need a single texel per screen pixel, plus the "wasted" area from the 128x128 tiles, in memory to draw the scene, no matter how detailed the scene's textures are. For resolutions higher than 720p you of course need a larger tile cache. id Software has stated that they are using an 8192x8192 texture for their tile cache in the PC version of Rage. 1080p would require a tile cache 2.25x the size of the 720p one, and 2560x1600 one 4.44x the size. The required tile cache size scales linearly with screen resolution (actually slightly sublinearly, since the tiles become smaller in proportion to the screen resolution, which means slightly fewer wasted pixels). So it's completely natural that id Software uses a larger tile cache on PC, since PC gamers tend to play at higher resolutions. An 8192x8192 tile cache on consoles (720p) would not improve texture detail at all. However, the (4x) larger tile cache would reduce data streaming from the game media (but not drastically, since increasing cache size usually gives only logarithmic gains).
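A sketch of the linear scaling rule above, using the 1024-slot 720p cache from the post as the baseline. Linear scaling is treated as a slightly pessimistic upper bound, as the post notes; `slots_needed` is a hypothetical helper.

```python
import math

BASE_W, BASE_H = 1280, 720
BASE_SLOTS = (4096 // 128) ** 2   # 1024 tiles in the 720p cache


def cache_scale(width: int, height: int) -> float:
    """Screen pixel count relative to 720p."""
    return (width * height) / (BASE_W * BASE_H)


def slots_needed(width: int, height: int) -> int:
    """Tile slots needed under linear scaling (a slightly pessimistic rule)."""
    return math.ceil(BASE_SLOTS * cache_scale(width, height))


for w, h in ((1280, 720), (1920, 1080), (2560, 1600)):
    print(f"{w}x{h}: {cache_scale(w, h):.2f}x -> {slots_needed(w, h)} slots")

print("8192x8192 cache:", (8192 // 128) ** 2, "slots")
```

Under the linear rule, 1080p (2304 slots) fits comfortably in an 8192x8192 cache (4096 slots), while 2560x1600 asks for about 4550 slots, slightly more; the sublinear effect the post mentions narrows that gap.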

On Xbox 360, our system uses a combination of texture formats that makes our material 2.5 bytes per pixel (2xBC3+BC4). As our tile cache is 4096x4096 pixels, the total amount of texture (material) data we have in memory is always 40 megabytes. In addition to the tile cache, our system has a single 2048x2048 indirection texture with a full mip chain (16 bits per pixel, 5551 format). The indirection texture is around 10 megabytes, and is used by the GPU to do a fast lookup to find the proper tile in the tile cache (based on texture coordinates and mip level). We also have eight 128x128 pixel loader buffers (2.5 bytes per pixel = 320 KB total) for background data loading on the CPU (we have a background CPU thread doing our texture loading all the time). So in total the system takes around 50 megabytes of GPU memory, and less than a megabyte of CPU memory. I don't see any problems with a system like this on PS3, since the loader buffers are stored in CPU memory, and all new tiles could simply be copied to the GPU tile cache at the start of each frame. Also, the 256 megabyte size of each of the two memory pools is not a problem in any way (actually, virtual texturing makes it much easier to live with smaller memory).
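The memory budget above can be reproduced from the stated formats. This sketch assumes the standard block-compression sizes (BC3 at 1 byte per texel, BC4 at 0.5) and counts the indirection mip chain from 2048x2048 down to 1x1.

```python
KIB, MIB = 1024, 1024 * 1024

BYTES_PER_TEXEL = 2 * 1.0 + 0.5   # 2x BC3 (1 B/texel each) + BC4 (0.5 B/texel)

tile_cache = 4096 * 4096 * BYTES_PER_TEXEL                  # material tile cache
# 2048x2048 indirection texture, 16 bpp (5551), full mip chain 2048 .. 1.
indirection = sum((2048 >> m) ** 2 * 2 for m in range(12))
loader_buffers = 8 * 128 * 128 * BYTES_PER_TEXEL            # CPU-side staging

print(f"tile cache   {tile_cache / MIB:6.2f} MiB")
print(f"indirection  {indirection / MIB:6.2f} MiB")
print(f"loaders      {loader_buffers / KIB:6.0f} KiB")
print(f"GPU total    {(tile_cache + indirection) / MIB:6.2f} MiB")
```

The mip chain adds about a third to the 8 MiB base level, which is where the "around 10 megabytes" indirection figure and the roughly 50 MB GPU total come from.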

Let's analyze the rotating camera "problem" a bit. For simplicity, let's assume a 90 degree field of view and a full 360 degree turn. When you turn around you see 360/90 = 4 completely different views that do not share any surfaces with each other. With a perfect streaming system, you would only need to load four 720p images in the time of the full turn. Virtual texturing systems use compression techniques similar to the most advanced image codecs (JPEG 2000, PTC, etc.), so it's not hard to believe that you should easily be able to stream four frames of full-screen compressed image data in the time the fastest thumbstick turn takes. Earlier I stated that on average it takes 1/4 of our tile cache (250 tiles out of 1024) to store the currently visible surface tiles. In the average case you could keep rotating around with no HDD activity at all, since our cache would be large enough to fit all the data of the four 90 degree views. However, if we have a stressful situation (a complex scene with lots of overlapping narrow geometry), around 1/3 of the cache could be needed to store a single view. In this situation, the system starts to stream data constantly when the camera is rotated. Now if we assume that a single view is around 340 tiles (1/3 of the cache), and we see approximately four full views during the rotation, we would need to load 1360 tiles during the rotation (assuming our cache doesn't help at all). If we assume a compressed tile size of 10 KB (on HDD), the total data loaded during the turn becomes 13.6 MB. A 360 degree thumbstick turn in a console shooter could take 2 seconds, so we need a 6.8 MB/s transfer rate (assuming no seek penalties, which is obviously a wrong assumption). Slow 5400 RPM notebook hard drives have around 100 MB/s sustained transfer rates and DVD drives have 20+ MB/s, so transfer rate is clearly not a possible bottleneck in any scenario. Seek latency, however, can be a big bottleneck on DVDs, since a worst-case seek can take 100ms.
So the engines use different methods to reduce seeking. id hasn't said much about their methods, but they have surely implemented some state-of-the-art methods to optimize the virtual texture page ordering in their DVD images. Lionhead's tech papers reveal that they store nearby objects and terrain close to each other in the virtual texture (maximizing tile usage in lower mips), and they group four 128x128 tiles into a single contiguous compressed region on disc (and have a small additional 2x2 macrotile loader cache). We also try to keep similar objects nearby in the virtual texture and optimize our loads in many ways, but as our game is guaranteed to be installed to HDD or memory units (flash-based memory), our seeking problems (and solutions) are limited compared to these two other disc-based games.
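The turn-around budget above in a few lines. The tile counts, tile size, turn time and 100 ms worst-case seek come from the post; treating every seek as worst-case and KB as 1000 bytes (to match the post's 13.6 MB) are simplifying assumptions.

```python
TILES_PER_VIEW = 340          # stressful scene: ~1/3 of the 1024-slot cache
VIEWS_PER_TURN = 360 // 90    # 90-degree FOV gives 4 disjoint views
TILE_KB_ON_DISK = 10          # assumed compressed tile size from the post
TURN_S = 2.0                  # fast 360-degree thumbstick turn
WORST_SEEK_S = 0.100          # worst-case DVD seek from the post

tiles_loaded = TILES_PER_VIEW * VIEWS_PER_TURN        # 1360 tiles
megabytes = tiles_loaded * TILE_KB_ON_DISK / 1000     # 13.6 MB
rate_mb_s = megabytes / TURN_S                        # 6.8 MB/s

# Bandwidth is easy; seeks are the real budget. At 100 ms per worst-case
# seek, only about 20 seeks fit in the turn, so tiles must be laid out so
# that each seek brings in many of them.
max_seeks = TURN_S / WORST_SEEK_S
tiles_per_seek = tiles_loaded / max_seeks

print(tiles_loaded, round(megabytes, 2), round(rate_mb_s, 2),
      round(max_seeks), round(tiles_per_seek))
```

Needing roughly 68 tiles per seek is why disc-based engines invest so heavily in tile ordering and grouping.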

A six-way render (cubemap) could be used to sense tiles that will be required in the near future, but this only helps in the camera rotation case, and would waste a lot of cache space and bandwidth loading stuff that might never be used (players usually move forward a lot more than backward). A better system renders the tile query using a slightly wider field of view and an approximated future camera. We calculate an approximation based on the current camera speed (and acceleration) to determine where the camera will be six frames from now. We also jitter the tile query camera slightly from frame to frame to make sure we hit all narrow objects and to keep a bit of extra data in the cache for sudden, unpredictable movements. Lionhead's system also has some prediction logic, and they do periodic randomly rotated queries. id hasn't revealed the inner workings of their tile queries yet, but I am sure they are using some sophisticated prediction methods to determine the data they need ahead of time.
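A minimal sketch of a predicted, widened, jittered tile-query camera along the lines described above. Only the six-frame horizon comes from the post. The constant-acceleration extrapolation, the 30 fps frame time, the FOV margin, the jitter size and all names are assumptions, and this version predicts position only, not rotation.

```python
import random

FRAME_S = 1.0 / 30.0   # assumed 30 fps console frame time
LOOKAHEAD_FRAMES = 6   # prediction horizon from the post


def predict_position(pos, vel, acc, frames=LOOKAHEAD_FRAMES, dt=FRAME_S):
    """Constant-acceleration extrapolation per axis: p + v*t + 0.5*a*t^2."""
    t = frames * dt
    return tuple(p + v * t + 0.5 * a * t * t for p, v, a in zip(pos, vel, acc))


def tile_query_camera(pos, vel, acc, yaw_deg, fov_deg, rng,
                      fov_margin_deg=10.0, jitter_deg=2.0):
    """Camera used only for the tile-feedback render, not for display:
    predicted position, a slightly wider FOV, and a small random yaw jitter
    so narrow objects are eventually hit. Margin and jitter are hypothetical."""
    return {
        "pos": predict_position(pos, vel, acc),
        "yaw_deg": yaw_deg + rng.uniform(-jitter_deg, jitter_deg),
        "fov_deg": fov_deg + fov_margin_deg,
    }


rng = random.Random(1)  # seeded so the jitter sequence is reproducible
cam = tile_query_camera(pos=(0.0, 1.8, 0.0), vel=(3.0, 0.0, 0.0),
                        acc=(0.0, 0.0, 0.0), yaw_deg=90.0, fov_deg=90.0,
                        rng=rng)
print(cam)
```

Because the query camera differs slightly every frame, tiles missed by one query tend to be picked up by the next, at the cost of a little extra cache use.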

I personally think the biggest problem in virtual texturing is the currently popular disc-based (DVD/BR) storage, which has awful seek times. As digital distribution becomes more popular, these problems are slowly fading away. Hard drives are fast enough to load everything before you can notice the missing detail, and new flash-based (SSD) memories make virtual texturing even better (super low seek times). Fortunately, flash memory based devices have become really popular recently (ultraportables, tablets, smartphones). On the other hand we have the cloud, and network-based virtual texture streaming sounds like a really great idea. As 720p requires only around 6.8 MB/s to stream all required texture data, even current network connections have more than enough bandwidth for streaming... however, hiding the constant 200ms network latency would require a lot of additional research.

---

(1) By a "single texture" I actually mean a single material = a color map and normal map with the same unwrapped texture coordinates. Many engines also pack some material properties into the texture channels (specular properties, etc). When I talk about a visible color texture, I always mean a material with all the needed texture layers.

Shifty Geezer · Sep 10, 2011

Superb post! Thanks muchly for taking the time for that. A common request and expectation for next-gen is built-in flash as a drive cache, and it should enable best-case virtual texturing.

I have to say, having you spell the numbers out like that makes it so very clear how grossly inefficient massive textures are! It even suggests a 2GB next-gen console won't be such a bad option if virtual texturing does as well as we could hope.
