Kristof's comments on render to texture

OpenGL guy

Kristof said:
There is quite a good reason why rendered surfaces are slower to render from than regular uploaded textures. When you upload a texture it's not just uploaded, the data is re-ordered aka twiddled/swizzled/whatever. This data re-ordering improves the cache hit ratio and makes bilinear/trilinear operations pretty much guaranteed to be free. When you render to a texture you do so in a scanline way, you do not render twiddled. So the data order for a rendered surface is not optimal for accessing as a texture.
You are assuming that the memory tiling for textures is different from the tiling for backbuffers. This doesn't have to be the case, and your own argument is a good reason why :)

Chips with tiled memory, in my experience, don't render in a "scanline" way: That would defeat the purpose of having tiled memory. The rendering is done tile by tile in order to get the most memory bandwidth.

Just wanted to clear that up.
 
I was just going to add something that I had noticed in the comments made in the previous thread before it was closed...

The question of the shadows was for the objects at the side of the road. AFAIK, render-to-texture shadows are used for the moving objects only. I'm not 100% certain of this (the text reads "dynamic shadows for non-static objects only"), but if the shadows of the scenery were produced in some other way then perhaps R-t-T is not the cause for the attention.
 
OpenGL guy said:
Chips with tiled memory, in my experience, don't render in a "scanline" way: That would defeat the purpose of having tiled memory. The rendering is done tile by tile in order to get the most memory bandwidth.

Just wanted to clear that up.

There is still a potential difference between scanline->tiled scanlines and twiddled order. Not that I am excluding the possibility for current or future hardware to support hardware twiddling and rendering to a twiddled format.

But generally current hardware AFAIK does not render in twiddled form and thus sees a slowdown when texturing from render targets.

K-
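
For anyone following along, here is a minimal sketch of one common meaning of "twiddled": Morton (Z-order) addressing, where the bits of x and y are interleaved. This is only an illustration. The function names are made up for this example, and real chips use vendor-specific tile layouts that nobody in this thread has spelled out.

```python
def twiddle(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (Morton / Z-order) into one texel offset.

    Hypothetical helper for illustration; not any particular chip's layout.
    """
    offset = 0
    for i in range(bits):
        offset |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        offset |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return offset


def linear(x: int, y: int, width: int) -> int:
    """Plain scanline (row-major) texel offset."""
    return y * width + x


if __name__ == "__main__":
    # The 2x2 footprint a bilinear fetch needs at texel (0, 0).
    footprint = [(0, 0), (1, 0), (0, 1), (1, 1)]
    print([twiddle(x, y) for x, y in footprint])
    print([linear(x, y, 512) for x, y in footprint])
```

For that footprint the twiddled offsets are four consecutive entries, while in a 512-wide scanline layout the second row sits 512 texels away. Not every neighbouring pair stays this close in Morton order (crossing a power-of-two boundary causes a jump), but the average locality is better, which is the cache argument being made above.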
 
Kristof said:
But generally current hardware AFAIK does not render in twiddled form and thus sees a slowdown when texturing from render targets.
I wouldn't have said what I did if I didn't have strong evidence to the contrary :)
 
Kristof said:
I know but you only talk for one specific party :)
So how far back does this functionality go ? ;)
At both graphics companies I've worked at, there has been no difference between texture tiling and surface tiling. It's bit depth, not content, that determines the tile layout.
 
Kristof:

Thanks for the explanation, but why would they make the feature 'intentionally' slow? If framebuffer writes are "swizzled" or whatever when written to optimize for bandwidth, why aren't texture render targets treated the same way? They are just like a small framebuffer after all!

It seems illogical to introduce special case scenarios, especially if all it'll accomplish is to slow things down...


-FaaR-
 
There is another reason why render targets are usually slow when used as a texture. Generally they are not mipmapped which means that even though they might be swizzled on some hardware, the texture cache is still going to thrash due to lacking lower mipmap levels.

I am trying to find a twiddled data diagram for textures but can't seem to find one... sigh... I know they are out there... somewhere :)

K-
 
Kristof said:
There is another reason why render targets are usually slow when used as a texture. Generally they are not mipmapped which means that even though they might be swizzled on some hardware, the texture cache is still going to thrash due to lacking lower mipmap levels.
K-
That's why many developers also create mipmaps from their render targets. One can even use a mipmapped RT without generating real lower mipmap levels to do some neat effects. (E.g. render shadow maps and fill the lower mipmap levels with white: if your shadows are always cast far from the viewpoint, you see the shadow's intensity decrease as it gets farther from the observer, even when the light is near the camera. Of course, trilinear filtering should be enabled :) )

ciao,
Marco
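
A rough sketch of why the white-mipmap trick fades the shadow, assuming a shadow texel stored as 0.0 (fully dark) in level 0 and 1.0 (white, no shadow) in every lower level. The function name and the 0/1 convention are assumptions made for this example, and the LOD is taken as given rather than derived from screen-space derivatives as the hardware would do.

```python
def trilinear_shadow(lod: float, level0: float = 0.0, lower: float = 1.0) -> float:
    """Value a trilinear fetch would return from a chain whose level 0 holds
    the shadow and whose lower levels are all filled with white.

    Hypothetical model: only the blend between mip levels is simulated,
    the bilinear part within each level is ignored.
    """
    if lod <= 0.0:
        return level0          # magnified or 1:1, pure level 0
    if lod >= 1.0:
        return lower           # levels 1 and below are all white
    # Between level 0 and level 1, trilinear filtering blends linearly.
    return level0 + (lower - level0) * lod


if __name__ == "__main__":
    for lod in (0.0, 0.25, 0.5, 0.75, 1.0, 2.0):
        print(lod, trilinear_shadow(lod))
```

As the LOD rises with distance, the sampled value moves from dark to white, so the shadow fades out without any extra shader work.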
 
There is another reason why render targets are usually slow when used as a texture. Generally they are not mipmapped which means that even though they might be swizzled on some hardware, the texture cache is still going to thrash due to lacking lower mipmap levels.
Presumably this must depend on what the render target is being created for in the first place. For something like cube maps, surely it would make sense to have all mipmap levels possible, unless specifically told not to.
 
I'd just like to say that things like swizzling really would make it seem like a darned good idea to have a separate texture-management unit on the GPU (Well, they already need to have something like this for TC...but it would be nice to use for render-to-texture situations).

Personally, it just seems obvious to me that swizzling of render targets will just be a natural optimization that will come when such things are common. Hopefully some high-end hardware will have such optimizations earlier.

Additionally, does anybody know if AGP textures are stored in swizzled format? That also just seems like a natural optimization to make. It would also be nice if virtual AGP texturing were implemented in upcoming video cards.

One final thing. Might not rendering in swizzled format actually improve memory bandwidth efficiency while rendering? (While at the same time possibly reducing it during buffer switch or display)
 
Yes, you can create a mipmap tree when rendering, but then the question is how do you do that? Or more specifically, if the driver has support for it, how does the driver/hardware do that?

In the end :

uploaded texture = twiddled + mipmap pre-generated and uploaded once.

render target texture = twiddled (not guaranteed) + mipmap missing or generated at a cost and uploaded possibly for each frame.

MipMap generation can be done either by rendering the same scene multiple times at the various MipMap resolutions, possibly only a couple of levels. Or by downsampling the rendered topmap, similar to how AA is done. However it's done, it's going to be an expensive operation.

K-
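
To make the second option concrete, here is a minimal CPU-side sketch of downsampling the top map with a 2x2 box filter, assuming a square, power-of-two, single-channel image stored as nested lists. The helper names are invented for this example. It says nothing about how any driver or chip actually does it, only what the arithmetic looks like.

```python
def downsample(level):
    """Halve a square, power-of-two grayscale image with a 2x2 box filter."""
    n = len(level) // 2
    return [
        [
            (level[2 * y][2 * x] + level[2 * y][2 * x + 1]
             + level[2 * y + 1][2 * x] + level[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(n)
        ]
        for y in range(n)
    ]


def build_mip_chain(top):
    """Return [level 0, level 1, ...] down to a single texel."""
    chain = [top]
    while len(chain[-1]) > 1:
        chain.append(downsample(chain[-1]))
    return chain


if __name__ == "__main__":
    top = [
        [0, 0, 4, 4],
        [0, 0, 4, 4],
        [8, 8, 12, 12],
        [8, 8, 12, 12],
    ]
    for level in build_mip_chain(top):
        print(level)
```

Each level reads only the level above it, so the scene geometry is touched once, for the top map.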
 
Kristof said:
MipMap generation can be done either by rendering the same scene multiple times at the various MipMap resolutions, possibly only a couple of levels. Or by downsampling the rendered topmap, similar to how AA is done. However it's done, it's going to be an expensive operation.

K-

Rendering the same scene multiple times would be far too expensive to carry out (and the quality would probably be a fair bit lower), especially for highly complex scenes.

Downsampling the rendered top map would both carry with it automatic AA, and would be far less performance-intensive. For optimal performance, it would require dedicated hardware (which may already exist in current hardware...), but that shouldn't be a huge deal. The main question here is, can the video card continue rendering while this dedicated hardware is working on generating the MIP maps? That is, is the rendered texture needed immediately after it is rendered, stalling further rendering until the downsampling is done?

It seems to me that if either the drivers or the software inserts a delay between when the texture is rendered and when it is needed, there needn't be much performance hit at all from generating the MIP maps.
 
Using GL_SGIS_generate_mipmap (which is supported by R100 and up and GF3 and up + maybe some others) you can automatically generate mipmaps in hardware. It's quite fast actually, haven't noticed any significant performance reduction by using it. The rendering of the level 0 mipmap tends to be much more expensive than generating the mipmaps.
This is a feature I really miss in Direct3D. Having no autogeneration of mipmaps basically renders RTT more or less useless. Nice that it's going to be there in DX9, though.
 
Kristof said:
Or by downsampling the rendered topmap, similar to how AA is done. However it's done, it's going to be an expensive operation.
K-

Actually, in our engine it is not expensive at all with several shadow maps (all contained in a single big render target). We tested it on a GF3/GF4/8500 with mipmapping on and off and we couldn't detect any significant difference in performance, and we are not CPU limited.

ciao,
Marco
 
Kristof said:
Or more specifically if the driver has support for it how does the driver/hardware do that ?

With a little bit of thought? :p

Kristof said:
render target texture = twiddled (not guaranteed) + mipmap missing or generated at a cost and uploaded possibly for each frame.

Some older hardware can't always render to texture if the texture is not a power of 2. If it does, therein lies the twiddling/non-twiddling issue. I suppose this type of limitation will be outdated in a few years once the older hardware dies off.

Kristof said:
However it's done, it's going to be an expensive operation.

I wouldn't necessarily say expensive, unless it falls to some software mip generation method, or if you are mip generating and switching render targets for every other polygon.
 
Kristof said:
MipMap generation can be done either by rendering the same scene multiple times at the various MipMap resolutions, possibly only a couple of levels. Or by downsampling the rendered topmap, similar to how AA is done. However it's done, it's going to be an expensive operation.

K-

Have you ever tried this? It's very, very fast to generate mipmaps in hardware. A complete mipmap chain is 1.333 times as big as the top mip level, so a 512x512 texture only takes an additional ~87k pixels for the mipmaps. With 2 gpix fillrate, it's not a big deal.
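
The arithmetic is easy to check with a few lines. This is only a sketch: the function name is made up for this example, and it counts texels written for a square power-of-two texture, ignoring reads and any per-level setup cost.

```python
def mip_chain_texels(size: int):
    """Return (texels in the top level, texels in all lower levels)
    for a square, power-of-two texture of the given edge length."""
    top = size * size
    lower = 0
    edge = size // 2
    while edge >= 1:
        lower += edge * edge   # 256x256, 128x128, ... down to 1x1
        edge //= 2
    return top, lower


if __name__ == "__main__":
    top, lower = mip_chain_texels(512)
    print("top level texels:  ", top)
    print("lower level texels:", lower)
    print("chain / top ratio: ", (top + lower) / top)
    # Time to write the lower levels at an assumed 2 Gpixel/s fill rate.
    print("seconds at 2 Gpix/s:", lower / 2e9)
```

For 512x512 the lower levels add up to 87,381 texels, which matches the ~87k figure, and the full chain is about 1.333 times the top level.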
 
fresh said:
Have you ever tried this? It's very, very fast to generate mipmaps in hardware. A complete mipmap chain is 1.333 times as big as the top mip level, so a 512x512 texture only takes an additional ~87k pixels for the mipmaps. With 2 gpix fillrate, it's not a big deal.
I don't think Kristof was concerned about the fillrate. You'd have to send down the same geometry multiple times in order to generate the miplevels, and that could affect performance.
 