John Carmack: Doom 4 will support Partially Resident Textures (Tiled Resources)

Cyan

John Carmack has said that he will implement Partially Resident Textures in Doom 4. Both the PlayStation 4 and the Xbox One support this technology in hardware.

This technology is expected to feature in most next-gen engines, which is a big deal considering most game designers will now be more familiar with AMD tech. It should also help improve the performance of AMD cards on the PC, so it's good for both the consoles and the PC.

One of the touted advantages of the technology is that texture sizes can now be up to 32 TB :oops:, and that fast small pools of RAM like the Xbox One's 32 MB of eSRAM can supposedly store up to 6 GB worth of textures alone ( http://www.giantbomb.com/forums/xbox-one-8450/x1-esram-dx-11-2-from-32mb-to-6gb-worth-of-texture-1448545/ ). It can be described as an ultra-high-resolution streaming technology for the PS4 and Xbox One.

This means massive textures without the use of massive amounts of memory, which is going to help both consoles tremendously in the long run.



@ID_AA_Carmack Are you going to add support in Rage for the 7970's hardware PRT Mega texturing anytime soon?

@david20player 7970 hardware virtual textures are limited to 32k, so it requires layout changes to avoid tex crossing. Doom will support


https://twitter.com/ID_AA_Carmack/status/157512179749371907
 
I think the problem will be streaming the data from the HDD/ODD fast enough? Rage had the well known texture morphing problem during fast camera pans.
 
I think the problem will be streaming the data from the HDD/ODD fast enough? Rage had the well known texture morphing problem during fast camera pans.
The thing is that if you're using a very low amount of memory to store those textures you could have a lot of free memory -RAM- to utilise it as some kind of temporal storage without having to *touch* the HDD or disc at all, which is a significant advantage.
 
So not everybody has read sebbbi's figures about virtual texturing. Some good soul should post links to his most enlightening posts on the matter here, just to save some time.
 
One of the touted advantages of the technology is that texture sizes can now be up to 32 TB :oops:, and that fast small pools of RAM like the Xbox One's 32 MB of eSRAM can supposedly store up to 6 GB worth of textures alone
No. It seems every time you link to an article or post, it's got some very wrong conclusion. Tiled resources allow you to address more texture space than you can fit in RAM. They do not allow you to fit more textures in RAM. There's no correlation between RAM size and virtual texture resolution: the limiting factor is bandwidth (RAM and storage). There's no sane logic that fits the way the technology works that turns 32 MB of eSRAM into 6 GB of texture data. 32 MB of eSRAM, or eDRAM, or RAM, can fit 32 MB of textures to be accessed by the GPU at any given moment. The system can swap out those textures for new ones, so the texture data isn't static, but the capacity is unchanged.
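The capacity point can be sketched in a few lines. This is a toy Python model with made-up tile and pool sizes, not any real API: a fixed-size pool caps how many tiles are resident at once, no matter how large the virtual texture is.

```python
# Minimal sketch of virtual-texture residency (illustrative only):
# a huge virtual texture is addressable, but only a fixed-size pool
# of tiles is ever resident in fast memory at one time.

TILE_BYTES = 64 * 1024          # assumed 64 KB per tile
POOL_BYTES = 32 * 1024 * 1024   # e.g. a 32 MB pool: this is capacity, not "6 GB"

class TilePool:
    def __init__(self, pool_bytes=POOL_BYTES):
        self.capacity = pool_bytes // TILE_BYTES  # max resident tiles
        self.resident = []                        # LRU order, oldest first

    def touch(self, tile_coord):
        """Request a tile; evict the least-recently-used one if the pool is full."""
        if tile_coord in self.resident:
            self.resident.remove(tile_coord)      # refresh LRU position
        elif len(self.resident) >= self.capacity:
            self.resident.pop(0)                  # evict the oldest tile
        self.resident.append(tile_coord)

pool = TilePool()
print(pool.capacity)  # 512 tiles resident at most, however big the virtual texture
```

Swapping tiles in and out changes *which* 32 MB of texture data the GPU sees, never *how much* is resident, which is the distinction the quoted article missed.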

As for eSRAM's ultra-fast BW being useful, the limiting factor will be HDD access speed. Or, if you've cached a load of tiles in RAM, the bottleneck for tiled textures will be DDR3 > eSRAM, or rather DDR3 > GPU, because you wouldn't waste time and BW copying textures to eSRAM only to read them again in the GPU, meaning a 68 GB/s cap on texture bandwidth. Which doesn't matter, because you don't need that much with tiled resources! You only load and use the tiles you need to fit the pixels on screen.

I heartily recommend in future that you don't believe any article or forum post you find on the 'net without running it by B3D first to learn if it's legitimate or hogwash. ;)
 
As for eSRAM's ultra-fast BW being useful, the limiting factor will be HDD access speed. Or, if you've cached a load of tiles in RAM, the bottleneck for tiled textures will be DDR3 > eSRAM, or rather DDR3 > GPU, because you wouldn't waste time and BW copying textures to eSRAM only to read them again in the GPU, meaning a 68 GB/s cap on texture bandwidth. Which doesn't matter, because you don't need that much with tiled resources! You only load and use the tiles you need to fit the pixels on screen.

So if I understand what you're saying, the use of PRT means that 68 GB/s is ample bandwidth for reading textures into the GPU? Even at 60 fps, and with other tasks contending for the bus, you could still comfortably load 500 MB of textures a frame into the GPU? And even that seems a huge amount if you only need enough texture tiles to cover the pixels on the screen... a few MB?
So 68 GB/s of DDR3 bandwidth is ample (not going to be a bottleneck) and the real bandwidth requirement is to the eSRAM as the GPU does its thing? And there are bucketloads of that, with low latency to boot.
 
As for eSRAM's ultra-fast BW being useful, the limiting factor will be HDD access speed. Or, if you've cached a load of tiles in RAM, the bottleneck for tiled textures will be DDR3 > eSRAM, or rather DDR3 > GPU, because you wouldn't waste time and BW copying textures to eSRAM only to read them again in the GPU, meaning a 68 GB/s cap on texture bandwidth. Which doesn't matter, because you don't need that much with tiled resources! You only load and use the tiles you need to fit the pixels on screen.
Now I think that bolded part might not necessarily be true. Considering that, for typical scenes, the displayed picture doesn't change much between consecutive frames, it could be well worth having a small PRT "cache" in the eSRAM. It would be more efficient to have fast, low-latency access to the parts unchanged since the last frame in the eSRAM, while only loading the missing parts from external memory whenever they need to be updated.
I haven't played with PRT programming myself yet, but I guess that having it in eSRAM could be beneficial if you don't want to stall your pipeline with many external lookups to the DRAM on each and every frame.
 
So if I understand what you're saying, the use of PRT means that 68 GB/s is ample bandwidth for reading textures into the GPU? Even at 60 fps, and with other tasks contending for the bus, you could still comfortably load 500 MB of textures a frame into the GPU?
That's not the right way to think of it. There are ~2 million pixels in a 1080p image. That requires 2 million individual texture samples (excluding AA), for a total of 6 megabytes a frame (no transparency, 24-bit colour, keeping things simple), or 360 MB/s of bandwidth at 60 fps to texture every pixel. However, we are unable to access textures at per-pixel granularity, so we have to load in tiles which conveniently span multiple pixels. Theoretically, 360 MB/s is all it would take to texture a 1080p screen in perfect, per-pixel fidelity. How a game actually performs depends on the textures and engine, and I won't hazard a guess at the real-world in-game BW consumption of an average virtually textured game. Sebbbi probably covered it in his insightful posts on the subject. The amount of texture you can access depends on how quickly you can load tiles. You can have a 32 TB texture map for the world and only need, I dunno, 1 GB/s to texture via PRT. What we won't need is every texture transferred in full to the GPU; we won't need hundreds of 1k and 2k textures streaming across the bus.
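The arithmetic here is easy to check back-of-envelope (Python, using the same simplifying assumptions: one texture sample per pixel, 24-bit colour, no AA or transparency):

```python
# Back-of-envelope check of the ideal-case texel bandwidth figure.
width, height = 1920, 1080
bytes_per_texel = 3              # 24-bit colour, no alpha, uncompressed
fps = 60

texels_per_frame = width * height            # one sample per pixel, no AA
bytes_per_frame = texels_per_frame * bytes_per_texel
bandwidth = bytes_per_frame * fps            # bytes per second

print(bytes_per_frame / 2**20)   # ~5.9 MiB of unique texel data per frame
print(bandwidth / 10**6)         # ~373 MB/s ideal-case texture bandwidth
```

Real tile granularity, filtering, overdraw and compression move the number around, but it stays orders of magnitude below the 68 GB/s DDR3 figure being discussed.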

Now I think that bolded part might not necessarily be true. Considering that, for typical scenes, the displayed picture doesn't change much between consecutive frames, it could be well worth having a small PRT "cache" in the eSRAM. It would be more efficient to have fast, low-latency access to the parts unchanged since the last frame in the eSRAM, while only loading the missing parts from external memory whenever they need to be updated.
I haven't played with PRT programming myself yet, but I guess that having it in eSRAM could be beneficial if you don't want to stall your pipeline with many external lookups to the DRAM on each and every frame.
If you fill the eSRAM up with textures, you won't have any room for anything else like render buffers, and so you'll be capped at 68 GB/s for all your FB ops. ;) The eSRAM is there for read/write bandwidth and it'll be used as such. Conceptually one could cache frequently used tiles there, but given the BW requirement is so low for PRT, I don't see the advantage. Passes using the textures will be few and far between relative to the total workload, so low latency wouldn't gain you much benefit at all.
 
No. It seems every time you link to an article or post, it's got some very wrong conclusion. Tiled resources allow you to address more texture space than you can fit in RAM. They do not allow you to fit more textures in RAM. There's no correlation between RAM size and virtual texture resolution: the limiting factor is bandwidth (RAM and storage). There's no sane logic that fits the way the technology works that turns 32 MB of eSRAM into 6 GB of texture data. 32 MB of eSRAM, or eDRAM, or RAM, can fit 32 MB of textures to be accessed by the GPU at any given moment. The system can swap out those textures for new ones, so the texture data isn't static, but the capacity is unchanged.

As for eSRAM's ultra-fast BW being useful, the limiting factor will be HDD access speed. Or, if you've cached a load of tiles in RAM, the bottleneck for tiled textures will be DDR3 > eSRAM, or rather DDR3 > GPU, because you wouldn't waste time and BW copying textures to eSRAM only to read them again in the GPU, meaning a 68 GB/s cap on texture bandwidth. Which doesn't matter, because you don't need that much with tiled resources! You only load and use the tiles you need to fit the pixels on screen.

I heartily recommend in future that you don't believe any article or forum post you find on the 'net without running it by B3D first to learn if it's legitimate or hogwash. ;)
First of all, thanks for the clarifications. Ok, I shall take your advice from now on. Some of the info is just too juicy to ignore, though. It's not that easy to discern what constitutes legitimate information and what's hogwash when you find great-looking articles on a particular subject.

I think it is going to be an essential feature for the PS4 and Xbox One to run such attractive games, not to mention it is also a highly-touted feature of Windows 8.1. I think you can achieve unprecedented levels of detail with it, as shown in some games using the new id engine.

As I understand it, it basically works like a kind of culling: the code instructs your console to render the fully detailed textures in the areas your character is focusing on, while shedding detail in other areas of the game that you can't immediately see. When you move around, you need a very fast pool of memory or cache to keep maximum detail at all times. It's pretty fascinating stuff, if you ask me.

As for eSRAM's ultra fast BW being useful, the limiting factor will be HDD access speed. Or, if you've cached a load of tiles in RAM, the bottleneck for tiled textures will be DDR3 > ESRAM, or rather DDR3 > GPU because you wouldn't waste time and BW copying textures to eSRAM to read them again in the GPU, meaning a 68 GB/s cap to texture bandwidth. Which doesn't matter because you don't need that much with tiled resources! You only load and use the tiles you need to fit the pixels on screen.
That part is what I don't understand about this technology.

Do you mean that you store, let's say, a very large 4 GB texture in main RAM to be accessible at all times, and divide it into very small tiles which can fit in small pools of RAM or save memory usage?

...or simply that you have, say, a 32 TB texture, and it gets cached from the disk to the main RAM, using only the parts or tiles that it needs at any given time?

I am not trying to be flippant here, but in order to have that incredible, massive amount of texture data in memory, you first need some kind of massive storage for such a large texture. I mean, if you want to use a texture that weighs 32 TB, you need physical storage or RAM to store it fully first, right?

Somehow, I just don't get this part, regardless of how creative I try to be in thinking of the possibilities.
 
...or simply that you have, say, a 32 TB texture, and it gets cached from the disk to the main RAM, using only the parts or tiles that it needs at any given time?
This. Take for example a texture for a person: a high-resolution 4096x4096 texture. Conventional texturing would store that entire texture in RAM. PRT chops it into little tiles and only loads the tiles you're seeing. So if you are looking at the character from the front, and only their top half, only the tiles covering the face and front torso would load; the back of the head and body and the leg textures wouldn't be loaded. As you then change viewpoint, those other tiles have to be loaded. If they are on the HDD, you need to load them from there, which can cause pop-in (see RAGE). If there is plenty of RAM, you can cache them there. RAM is plenty fast enough to serve up as many tiles as needed given the limits rendering resolution imposes, so as long as all the tiles are cached, you effectively get 'perfect' texturing.
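The tile-selection idea can be sketched like this. It's a toy Python example: the 128x128 tile size is an assumption, and the UV rectangle standing in for "front of the character, top half" is made up.

```python
# Sketch: which tiles of a 4096x4096 texture does a view actually need?
TEX_SIZE = 4096
TILE = 128   # assumed tile size in texels

def tiles_for_region(u0, v0, u1, v1):
    """Return the tile coordinates covering a visible UV rectangle (in texels)."""
    return {(u // TILE, v // TILE)
            for u in range(u0, u1, TILE)
            for v in range(v0, v1, TILE)}

# Pretend the visible part of the character maps to the top-left quarter
# of the texture; only those tiles need to be resident.
visible = tiles_for_region(0, 0, 2048, 2048)
total = (TEX_SIZE // TILE) ** 2
print(len(visible), "of", total, "tiles resident")  # 256 of 1024
```

A real engine derives the needed tiles (and mip levels) from a feedback pass over the rendered frame rather than from a hand-picked rectangle, but the saving works the same way: you only fetch the quarter of the texture the camera can see.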

Also, PRT as a concept isn't limited to textures. Lionhead Studios has been experimenting with the same concept for meshes, so you can have higher-resolution models without needing to store them fully in RAM. Likewise, volumes can be stored as tiled data for lighting, allowing SVO lighting to fit in RAM without massive overheads. PRT is a significant optimisation over the brute-force method of storing everything you are using in RAM.

I am not trying to be flippant here, but in order to have that incredible, massive amount of texture data in memory, you first need some kind of massive storage for such a large texture. I mean, if you want to use a texture that weighs 32 TB, you need physical storage or RAM to store it fully first, right?
Indeed! Although the 32 TB figure quoted would be uncompressed data in its raw form. That'd get compressed down to whatever size (which would reduce some of the texture clarity, so I guess we won't quite get perfect textures just yet ;)).
 
This. Take for example a texture for a person: a high-resolution 4096x4096 texture. Conventional texturing would store that entire texture in RAM. PRT chops it into little tiles and only loads the tiles you're seeing. So if you are looking at the character from the front, and only their top half, only the tiles covering the face and front torso would load; the back of the head and body and the leg textures wouldn't be loaded. As you then change viewpoint, those other tiles have to be loaded. If they are on the HDD, you need to load them from there, which can cause pop-in (see RAGE). If there is plenty of RAM, you can cache them there. RAM is plenty fast enough to serve up as many tiles as needed given the limits rendering resolution imposes, so as long as all the tiles are cached, you effectively get 'perfect' texturing.

Also, PRT as a concept isn't limited to textures. Lionhead Studios has been experimenting with the same concept for meshes, so you can have higher-resolution models without needing to store them fully in RAM. Likewise, volumes can be stored as tiled data for lighting, allowing SVO lighting to fit in RAM without massive overheads. PRT is a significant optimisation over the brute-force method of storing everything you are using in RAM.

Indeed! Although the 32 TB figure quoted would be uncompressed data in its raw form. That'd get compressed down to whatever size (which would reduce some of the texture clarity, so I guess we won't quite get perfect textures just yet ;)).
Once again, thanks for your enlightening reply to my post; it was very informative. And thanks for all your candid and thoughtful explanations and comments overall, and for politely suggesting some things to me.

I guess from your posts that developers need to work out the details of how it functions, but what I get from reading you is this: say we want to draw a character on screen, so we load into RAM the parts of the huge texture involved in the scene and ask the game to draw the eyes, the nose, the chin, and so on. So we have a lookup table, and the texture is coded in a special way where every tile has a unique name.

So let's say we want to draw both eyes: we tell the console to load the tile named "Left eye" (from pixel X to pixel Y within the massive texture), the "Right eye" tile (from pixel X to pixel Y), the "Nose" tile, and so on. It sounds as if you were cropping a picture in an image editor; excuse me if I am wrong.

This is a very exciting technology, I think, and Carmack has always been ahead of his time and really smart. I'd like to play Rage someday, but it seems like a game more suited to the PS4 and XB1's technology, tbh. The console technology wasn't ready back then.

Cheers Shifty.
 
I guess from your posts that developers need to work out the details of how it functions, but what I get from reading you is this: say we want to draw a character on screen, so we load into RAM the parts of the huge texture involved in the scene and ask the game to draw the eyes, the nose, the chin, and so on. So we have a lookup table, and the texture is coded in a special way where every tile has a unique name.
Yep.
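The lookup-table idea the quoted post describes can be sketched as below. In practice the "names" are just tile coordinates: implementations keep a page table (or indirection texture) mapping virtual tile coordinates to slots in the physical pool, with non-resident tiles falling back to a lower-resolution mip. All names and numbers here are made up for illustration.

```python
# Toy model of virtual-texture indirection: a page table maps virtual
# tile coordinates to physical pool slots.
page_table = {}            # (tile_x, tile_y) -> slot index in the tile pool
FALLBACK = -1              # sentinel: sample a low-res fallback mip instead

def resolve(tile_x, tile_y):
    """Look up where a virtual tile lives, or report it as non-resident."""
    return page_table.get((tile_x, tile_y), FALLBACK)

page_table[(12, 7)] = 3    # pretend the "left eye" tile sits in pool slot 3
print(resolve(12, 7))      # 3  -> sample full-res texels from slot 3
print(resolve(30, 30))     # -1 -> tile not resident, use the fallback mip
```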
 
If you fill the eSRAM up with textures, you won't have any room for anything else like render buffers, and so you'll be capped at 68 GB/s for all your FB ops. ;) The eSRAM is there for read/write bandwidth and it'll be used as such. Conceptually one could cache frequently used tiles there, but given the BW requirement is so low for PRT, I don't see the advantage. Passes using the textures will be few and far between relative to the total workload, so low latency wouldn't gain you much benefit at all.

I can't help feeling that eVolvE might be onto something. With virtual texturing there can be a lot of baking of decals and transcoding. By making sure that all writes (and reads, in the case of baking transparent decals into the tile buffer) are done using the eSRAM, I bet you could improve performance a great deal, especially if you're using trilinear + high aniso during rendering.

With more power at your disposal you could even do expensive per-texel lighting on the tiles. Despite having to light texels that wouldn't necessarily be used in each frame, the re-usability could still be a net win. You could almost start to treat the tile cache as an intermediate buffer and look at the cost that way. And the cost of copying your tile cache out, if you wanted to use the eSRAM for something else, would be tiny relative to the available BW.
 
That's not the right way to think of it. There are ~2 million pixels in a 1080p image. That requires 2 million individual texture samples (excluding AA), for a total of 6 megabytes a frame (no transparency, 24-bit colour, keeping things simple), or 360 MB/s of bandwidth at 60 fps to texture every pixel. However, we are unable to access textures at per-pixel granularity, so we have to load in tiles which conveniently span multiple pixels. Theoretically, 360 MB/s is all it would take to texture a 1080p screen in perfect, per-pixel fidelity. How a game actually performs depends on the textures and engine, and I won't hazard a guess at the real-world in-game BW consumption of an average virtually textured game. Sebbbi probably covered it in his insightful posts on the subject. The amount of texture you can access depends on how quickly you can load tiles. You can have a 32 TB texture map for the world and only need, I dunno, 1 GB/s to texture via PRT. What we won't need is every texture transferred in full to the GPU; we won't need hundreds of 1k and 2k textures streaming across the bus.

If you fill the eSRAM up with textures, you won't have any room for anything else like render buffers, and so you'll be capped at 68 GB/s for all your FB ops. ;) The eSRAM is there for read/write bandwidth and it'll be used as such. Conceptually one could cache frequently used tiles there, but given the BW requirement is so low for PRT, I don't see the advantage. Passes using the textures will be few and far between relative to the total workload, so low latency wouldn't gain you much benefit at all.

Ok, so I think the question I was really asking was: "if PRT has low DDR3 bandwidth requirements, where is the bottleneck going to be?"

For work I run large, mission-critical Oracle databases on big tin. Performance (e.g. end-user response time) is always bottlenecked somewhere; if the SQL is optimised, I like this to be the CPU. Then if the clock speed of the CPU is increased, the end user gets their data faster (assuming the bottleneck doesn't move elsewhere...).

So with the X1's architecture, using PRT, where do you (or anyone else, for that matter!) see the bottleneck being? To my untrained eye it looks like it will be keeping the various parts of the GPU supplied with data, which, if the eSRAM is employed correctly, will mostly be serviced from there. Which makes this very high-bandwidth, low-latency RAM the bottleneck... which seems to me to be a spot-on design decision. Or am I completely up the pole?
 
I can't help feeling that eVolvE might be onto something. With virtual texturing there can be a lot of baking of decals and transcoding. By making sure that all writes (and reads in the case of baking transparent decals to the tile buffer) are done using the esram I bet you could improve performance a great deal - especially if you're using trilinear + high aniso during rendering.
Are you thinking of allocating a portion of ESRAM to dedicated texture cache, or writing and reading in tiles per frame?

Ok, so I think the question I was really asking was: "if PRT has low DDR3 bandwidth requirements, where is the bottleneck going to be?"
Depends on the game. ;) We'll also still have simple textures in addition to PRT textures, so there will be BW demands for conventional texturing as well as everything else. PRT is definitely a big win though, and for all platforms going forwards, PC and PS4 included. We don't know what advantage, if any, the XB1 has over the PS4 regarding PRT.
 
Depends on the game. ;)

Ha! Yes, it always depends on the software... not all devs were created equal, after all! But if MS have targeted bits of their architecture towards getting a lot of the grunt work involved in PRT done in hardware (move engines for tile/untile, eSRAM, etc.) to the point where it effectively nullifies the PS4's on-paper spec advantage, then hats off to them. Judging by comments they have made ("there's no way we are giving up a 30% performance difference to the PS4", "games look the same if not better", blah blah blah), they think they have achieved it.
 
This isn't a versus thread. Also, GCN supports DMA and tiled memory access (AFAIK), so it's not like the XB1 has PRT and the PS4 hasn't. The XB1 may have a PRT advantage, which is a discussion in itself and one which hasn't borne fruit. Don't jump to conclusions about which box can do what, though.
 
Are you thinking of allocating a portion of ESRAM to dedicated texture cache, or writing and reading in tiles per frame?

I was thinking of a dedicated, fixed-size texture tile cache in eSRAM (for either all tiles immediately required, or some constant proportion of them), with the possibility of having a non-render period of the update cycle for compute, when you could transfer them all out.

I agree that dedicating a portion of eSRAM to textures seems expensive, but with the number of reads and writes that could be involved on a relatively small set of data, it just seems like it might be a good fit. And with the news today from Digital Foundry that the Xbox One can split a single render target across both eSRAM and DDR, maybe you could look on the BW saved by locating tiles permanently in eSRAM as freeing up DDR bandwidth for partial render targets...? Not to mention that it could (should?) speed up GPU modification and management of the tile cache.
 
Function said:
I agree that dedicating a portion of esram to textures seems expensive, but with the number of reads and writes that could possibly be involved on a relatively small set of data it just seems like it might be a good fit.

So could the texturing benefit from the eSRAM, considering that rendering graphics involves a predictable memory access pattern and the DDR bandwidth is high enough to keep up with the required (compressed) texel rates? Or is the XB1's DDR bandwidth far from optimal for this virtual tiling thing?

I think Shifty had a good point in mentioning to skip the eSRAM, though I could be misunderstanding the whole thing completely, of course. :)
 