DX9 and deferred rendering without tiles

pascal

The other day Basic showed me the deferred rendering algorithm. http://64.246.22.60/~admin61/forum/viewtopic.php?t=891

After that I was wondering: is it necessary to have tiles to do deferred rendering? Especially now that many future (DX9) games will have many levels of multitexturing.

What about a simple algorithm like this:
Code:
// Deferred rendering without tiles
// Pass 1: visibility - keep only the nearest poly per pixel
for( all polys )
{
    for( all pixels in poly )
    {
        if( z closer than stored z )
        {
            store z
            store poly reference
        }
    }
}
// Pass 2: shading - texture/shade each visible pixel exactly once
for( all pixels )
{
    calc and store pixel color
}
edited: corrected the algorithm above
edited2: some correction again.
 
:-? CONFUSED :-?

Your code still contains:

Code:
for( all polys in tile )

Try explaining what you want to do... do you want to use one big tile = the screen?
 
Oops, sorry, my mistake :oops: I did some cut and paste from Basic's algorithm. I have corrected it.

The idea is to do deferred rendering without any tiles.
It will not save framebuffer bandwidth, but it will save texture bandwidth and all the multiple passes required by future games. On the other hand, all the potential problems that come from sorting a large number of polygons into tiles are gone.

The reasoning is that future games will probably use a large number of passes, and DX9 is designed for this high level of multipass rendering.

IIRC Carmack stated that he is looking forward to being able to do 100 passes in future engines.

Does it make sense?
 
It would only be advantageous for the bandwidth needed for the display list, because you would be building a smaller display list on the first pass and also reading back a smaller display list on the second pass. (If there was no occlusion, or you were rendering back to front, the display list built on the first pass would be the same as with tiling, although you would still be reading less of it on the second one ... removing invisible geometry from the list during the first pass would be very hard, because of fragmentation, and likely counter-productive.)

Sorting is NOT an issue though; it never was. Tiling the tris is only a bin sort, which means it is O(N), the same as vertex shading ... but compared to vertex shading its leading coefficient is insignificant (and it grows even less significant as vertex shaders become more complex and fewer tris overlap between tiles). You'd think that after all these years we would have gotten that down on this board :)
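
(For illustration, that bin sort amounts to roughly the following; a minimal sketch only, assuming each triangle already carries a clipped screen-space bounding box, with all names made up.)
Code:
#include <algorithm>
#include <vector>

// Hypothetical triangle record with a clipped screen-space bounding box.
struct Tri { int minX, minY, maxX, maxY; /* vertex data ... */ };

const int TILE = 16;  // tile edge in pixels (illustrative)

// One pass over the triangles, each appended to every tile bin it touches:
// O(N) in the triangle count, with a tiny constant per triangle.
std::vector<std::vector<const Tri*> > binTriangles(const std::vector<Tri>& tris,
                                                   int screenW, int screenH)
{
    const int tilesX = (screenW + TILE - 1) / TILE;
    const int tilesY = (screenH + TILE - 1) / TILE;
    std::vector<std::vector<const Tri*> > bins(tilesX * tilesY);

    for (size_t i = 0; i < tris.size(); ++i)
    {
        const Tri& t = tris[i];
        const int tx0 = std::max(0, t.minX / TILE);
        const int ty0 = std::max(0, t.minY / TILE);
        const int tx1 = std::min(tilesX - 1, t.maxX / TILE);
        const int ty1 = std::min(tilesY - 1, t.maxY / TILE);
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                bins[ty * tilesX + tx].push_back(&t);
    }
    return bins;
}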
 
Hey Mfa, remember I am a layman :)

Let's say that the future (2005) average pixel needs 20 passes; what would the potential texture and multipass savings be then (in layman's terms)?
 
It's hard to say; almost all of those passes would be lighting passes ... and I don't want to wager a bet on how lighting will be done in 3 years.
 
Joe:
Well, by then all "traditional" DX9 pipelines will be able to combine 20 textures in a single pass anyway.
Then this is one more motivation to use deferred rendering without tiles, because we don't need to worry about framebuffer bandwidth; we will only save texture bandwidth and fillrate.

SCENARIO:
By Xmas 2002 we will have new games like U2 and Doom3; then the companies (Epic, id) will concentrate on the development of their next engines. We can expect a three-year cycle, so by Xmas 2005 we will have new game engines based on 2002 technology (DX9-level technology as the common denominator; id will use OpenGL).

These new engines will be more complex and will require only (my guess) 3 or 4 times the number of passes of the U2 and Doom3 engines. This is because their new games will have to work on old 2002 DX9 cards too, at some minimal speed (the large installed consumer base).

People will want better lighting (many lights), n-patches, displacement mapping (large outdoor areas), better and larger textures, many texture layers, many samples (great filtering like 64-tap), everything pixel shaded, etc.

This will generate huge texture bandwidth demands and will need a lot of fillrate. On the other hand, the framebuffer traffic will be relatively smaller.

Mfa:
It's hard to say; almost all of those passes would be lighting passes ... and I don't want to wager a bet on how lighting will be done in 3 years.
Think about how you could do it today with DX9.
 
In a deferred renderer all polygons are tested for visibility at each pixel before being textured. Tiles are used to (greatly) reduce the number of polygons to check against. So making a one-tile deferred renderer simply does not make sense; it's quite hard to check against 150K polys per pixel at a reasonable speed. I believe KYRO tests against 32 polygons per pixel per cycle (Kristof?)
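
(Roughly, the per-tile visibility pass amounts to something like this; a minimal sketch only, with covers() and interpolateZ() as assumed helpers, and not a description of how KYRO actually implements it.)
Code:
#include <vector>

// Hypothetical per-pixel visibility pass for one tile.
struct Poly { /* screen-space vertices, Z plane, ... */ };

bool  covers(const Poly& p, int x, int y);        // assumed helper: point-in-poly test
float interpolateZ(const Poly& p, int x, int y);  // assumed helper: Z-plane evaluation

void resolveTile(const std::vector<const Poly*>& tilePolys,   // the bin for this tile
                 int x0, int y0, int x1, int y1, int screenW,
                 std::vector<float>& zbuf, std::vector<const Poly*>& winner)
{
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            for (size_t i = 0; i < tilePolys.size(); ++i)     // cost per pixel ~ polys in tile
                if (covers(*tilePolys[i], x, y))
                {
                    const float z = interpolateZ(*tilePolys[i], x, y);
                    if (z < zbuf[y * screenW + x])            // keep only the nearest poly
                    {
                        zbuf[y * screenW + x]   = z;
                        winner[y * screenW + x] = tilePolys[i];
                    }
                }
}
// With one screen-sized "tile", tilePolys would be the entire scene
// (e.g. 150K polys), which is exactly the problem described above.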

Cheers
Gubbi
 
If you really need 20 or 100 rendering passes, there is a simple way to save some texture bandwidth: just render all triangles without writing to the color buffer, only to the Z buffer. Later, render all the passes using the Z test "equal".
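
(In OpenGL terms that boils down to roughly this; a minimal sketch, with drawScene() and numPasses standing in for whatever the application actually submits.)
Code:
// Pass 0: lay down depth only, with color writes disabled.
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glEnable(GL_DEPTH_TEST);
glDepthMask(GL_TRUE);
glDepthFunc(GL_LESS);
drawScene();                                  // placeholder: submit all opaque geometry

// Passes 1..N: color writes back on, depth writes off; with GL_EQUAL only the
// fragments that won the depth pass get textured and blended, so hidden
// pixels never cost any texture bandwidth.
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthMask(GL_FALSE);
glDepthFunc(GL_EQUAL);
for (int pass = 0; pass < numPasses; ++pass)  // numPasses: however many are needed
    drawScene();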

Of course, since Doom 3 already uses a lot of stencil shadows, I suspect that it already uses a similar technique.
 
PCChen:
If you really need 20 or 100 rendering passes, there is a simple way to save some texture bandwidth: just render all triangles without writing to the color buffer, only to the Z buffer. Later, render all the passes using the Z test "equal".

Of course, since Doom 3 already uses a lot of stencil shadows, I suspect that it already uses a similar technique.
Yeah, but it will not be transparent to the programmer and will be slower.
Why not have it done by the hardware, using deferred rendering?

Gubbi:
In a deferred renderer all polygons are tested for visibility at each pixel before being textured. Tiles are used to (greatly) reduce the number of polygons to check against. So making a one-tile deferred renderer simply does not make sense; it's quite hard to check against 150K polys per pixel at a reasonable speed. I believe KYRO tests against 32 polygons per pixel per cycle (Kristof?)
I was thinking that tiles are used to reduce framebuffer bandwidth.
 
pascal said:
What about a simple algorithm like that:
Code:
...
       store poly reference 
...

Explain this step in detail and we can judge whether this is a simple algorithm.

Don't forget the >100 alpha-blended textures affecting a single pixel where there's a complex particle effect.
Explain how you want to chain the polygon references.
Explain how you want to store the interpolated color and texture-coordinate values for the pixels, or whether you want to (re)calculate them at rendering time.
Explain how you want to look up, read and change render states per pixel "fragment".

And don't forget to calculate the memory requirement of such storage.
 
pascal said:
Yeah, but it will not be transparent to the programmer and will be slower.
Why not have it done by the hardware, using deferred rendering?

Because without an internal buffer (eDRAM or small tiles), it will be slower than direct rendering if only a few passes are used.
 
PCChen:
Because without an internal buffer (eDRAM or small tiles), it will be slower than direct rendering if only a few passes are used.
I understand, but we are talking about a lot of passes and hardware for the future, not for now.

Hyp-X:
Explain this step in detail and we can judge whether this is a simple algorithm.

Don't forget the >100 alpha-blended textures affecting a single pixel where there's a complex particle effect.
Explain how you want to chain the polygon references.
Explain how you want to store the interpolated color and texture-coordinate values for the pixels, or whether you want to (re)calculate them at rendering time.
Explain how you want to look up, read and change render states per pixel "fragment".

And don't forget to calculate the memory requirement of such storage.
Do you want me to solve everything?
You explain to me how PowerVR does it.
 
pascal said:
I understand, but we are talking about a lot of passes and hardware for the future, not for now.

Then it is perhaps better to leave this work to the applications. At least that saves the 3D chip the burden of handling transparent polygons. In the near future, I expect many applications that need to render many passes will use a similar scheme, together with functions like NV_occlusion_query, to cull some invisible objects.
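
(A rough sketch of that kind of occlusion-query culling, assuming the NV_occlusion_query entry points are already loaded; drawBoundingBox() and drawObjectAllPasses() are hypothetical helpers.)
Code:
// After the depth has been laid down, test a cheap bounding volume; if no
// pixels would pass the Z test, skip all the expensive passes for the object.
GLuint query;
glGenOcclusionQueriesNV(1, &query);

glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);   // don't touch color...
glDepthMask(GL_FALSE);                                 // ...or depth
glBeginOcclusionQueryNV(query);
drawBoundingBox();                                     // hypothetical helper
glEndOcclusionQueryNV();

GLuint visiblePixels = 0;
glGetOcclusionQueryuivNV(query, GL_PIXEL_COUNT_NV, &visiblePixels);  // blocks until ready

glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthMask(GL_TRUE);
if (visiblePixels > 0)
    drawObjectAllPasses();                             // hypothetical helper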

In the more distant future, a scenegraph-like API may take over and applications will no longer have to care about these things.
 
pascal said:
Do you want me to solve everything?

No.
What I was trying to say is that it sounds very inefficient.

That's like a tile-based rendering method with 1x1 tiles.

It will require more data because you need to have one "bucket" for each pixel instead of for each tile. Say a TBR uses 16x16 tiles: that's up to 256x the data you need to store in the intermediate buffers.
And data storage is already getting to be a problem in TBRs. The PVR guys seem to keep developing newer and newer methods to solve this problem.
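
(To put rough, purely illustrative numbers on that, assuming a 1024x768 screen:)
Code:
// Purely illustrative numbers, assuming a 1024x768 screen.
const int width  = 1024, height = 768;
const int pixelBuckets = width * height;                // 786,432 bins at 1x1
const int tileBuckets  = (width / 16) * (height / 16);  //   3,072 bins at 16x16

// Each polygon reference gets duplicated into every pixel bucket the polygon
// covers (possibly thousands), instead of into the handful of 16x16 tile
// buckets it touches - hence the "up to 256x" figure.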

So all I can see is there are serious disadvantages compared to TBR.
 
Hmm, now I understand.
Is it not more or less the size of a framebuffer?

So all I can see is there are serious disadvantages compared to TBR.
Yeah, but is it better than IMR with future games?
 
Well, IMHO it has almost all the disadvantages of a TBR, but the advantage over an IMR is very small, perhaps cancelled out by the disadvantages. That's why I said it's better to let the applications do the job.
 
It's not the same as a bucket; if you are rendering front to back you only need 1 reference per pixel.

If you lay down the Z-buffer first you will of course have to transform everything twice.
 
pascal said:
Gubbi:
In a deferred renderer all polygons are tested for visibility at each pixel before being textured. Tiles are used to (greatly) reduce the number of polygons to check against. So making a one-tile deferred renderer simply does not make sense; it's quite hard to check against 150K polys per pixel at a reasonable speed. I believe KYRO tests against 32 polygons per pixel per cycle (Kristof?)
I was thinking that tiles are used to reduce framebuffer bandwidth.

In an IMR, tiles are used to increase the granularity of the (external) memory transactions. This is good because longer bursts increase bandwidth efficiency (under the assumption that you have good spatial locality).

In a deferred renderer, each pixel is only rendered once, and hence only written once (semi-transparent polys not included), so there really isn't much framebuffer bandwidth to save. Instead, the tile is used to keep the number of polygons in each bin (tile) manageable.

Cheers
Gubbi
 