HSR vs Tile-Based rendering?

Well, the deferred renderers from PowerVR have always expected that there won't be overflows. I think that the only possibly good tiler would be one that expects there to be overflows. Scene buffer overflows are the worst problem with the design, and there's just no way that the drivers are going to know for certain that the buffer is large enough for any scene a game would want to render, and even if it is, it may be too big, leading to a waste of video memory.

So, as I've suggested in the past, if you have a "tiler" that is designed to overflow, its maximum performance might not be so high, but its minimum won't be as low, either. The problem comes when it is assumed that overflows aren't going to happen.

Anyway, Xmas did point out a flaw in my logic. I was comparing only framebuffer traffic, which would multiply by a huge amount when there's an overflow. But I forgot that there are a lot of other things competing for bandwidth anyway.
 
Chalnoth said:
I'd say it has much more to do with the fillrate hit. If you're going to take an extra clock to do texture filtering, that's an extra clock in which you don't have a z-buffer or frame-buffer access.
Texture fetches, z fetches, and frame buffer accesses happen simultaneously. Unless there's not enough bandwidth. ;)
 
Thanks for all the info. Sounds like tilers are generally a much cleverer architecture and the only potential problem is from an overly complicated scene that overflows its fast internal buffer.

How much benefit does this provide over good programming practices, such as data structures that eliminate the need to draw occluded polygons, like the ones described in one of the posted articles?

Also, I'm not entirely sure about what techniques IMRs currently employ to reduce overdraw. It's been said that they still don't match TBDR's efficiency, but is there an explanation why?
 
3dcgi said:
Chalnoth said:
I'd say it has much more to do with the fillrate hit. If you're going to take an extra clock to do texture filtering, that's an extra clock in which you don't have a z-buffer or frame-buffer access.
Texture fetches, z fetches, and frame buffer accesses happen simultaneously. Unless there's not enough bandwidth. ;)
And if the texture filtering takes two clocks to complete on a pixel with one texture (due to the level of anisotropy), with adequate caching, your framebuffer accesses can be spread across two clocks. That's the point I was trying to make: lots of texture processing should tend to make the video card more fillrate limited than memory bandwidth limited.
 
Raqia said:
Also, I'm not entirely sure about what techniques IMRs currently employ to reduce overdraw. It's been said that they still don't match TBDR's efficiency, but is there an explanation why?
There are numerous techniques, but they all stem from one basic precept: early z rejection.

The basic idea is that if you have enough cache, you can check and see if the pixels that are to be processed will be visible or not before doing any processing on them. Because the visibility check (z test) takes some time, the architecture must queue up a few pixels for processing before actually starting to work on them. This allows architectures to spend a minimum amount of time on pixels that won't be visible.

But it requires that the frame buffer already has information about which pixels are hidden and which aren't. This can be achieved in one of two ways:
1. Render everything in a front-to-back manner. Quick approximations can be done very easily, but a full front-to-back sort is very challenging, and would tend to be too CPU-intensive to be useful.
2. Fill the z-buffer before rendering the scene. Some rendering algorithms will call for this anyway (such as DOOM3, due to its use of global stencil shadowing), but it may become a performance win for any game that uses very long shaders.
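The queue-and-reject idea above can be sketched in software. This is a hypothetical toy model of option 2, not any particular GPU's pipeline:

```python
# Toy software model of early z rejection (no real GPU works exactly like
# this): a depth-only pre-pass fills the z-buffer, then the shading pass
# skips any fragment that is not the nearest surface at its pixel.
W, H = 4, 4
zbuf = [[float("inf")] * W for _ in range(H)]

# Two full-screen layers: a far one at z=0.9 and a near one at z=0.5.
fragments = [(x, y, z) for z in (0.9, 0.5) for y in range(H) for x in range(W)]

# Pass 1: z only -- cheap, no shading at all.
for x, y, z in fragments:
    if z < zbuf[y][x]:
        zbuf[y][x] = z

# Pass 2: run the (expensive) shader only where the early z test passes.
shaded = sum(1 for x, y, z in fragments if z <= zbuf[y][x])
print(shaded)  # 16 of the 32 fragments get shaded; overdraw is eliminated
```

The longer the shader, the more that halving of shaded fragments is worth the cost of the extra geometry pass.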
 
Chalnoth said:
Raqia said:
Also, I'm not entirely sure about what techniques IMRs currently employ to reduce overdraw. It's been said that they still don't match TBDR's efficiency, but is there an explanation why?
There are numerous techniques, but they all stem from one basic precept: early z rejection.

The basic idea is that if you have enough cache, you can check and see if the pixels that are to be processed will be visible or not before doing any processing on them. Because the visibility check (z test) takes some time, the architecture must queue up a few pixels for processing before actually starting to work on them. This allows architectures to spend a minimum amount of time on pixels that won't be visible.

But it requires that the frame buffer already has information about which pixels are hidden and which aren't. This can be achieved in one of two ways:
1. Render everything in a front-to-back manner. Quick approximations can be done very easily, but a full front-to-back sort is very challenging, and would tend to be too CPU-intensive to be useful.
2. Fill the z-buffer before rendering the scene. Some rendering algorithms will call for this anyway (such as DOOM3, due to its use of global stencil shadowing), but it may become a performance win for any game that uses very long shaders.

These are both really programmer-reliant techniques! I'm starting to appreciate TBDR more and more... Sign me up for PowerVR's next card!
 
Raqia said:
Thanks for all the info. Sounds like tilers are generally a much cleverer architecture and the only potential problem is from an overly complicated scene that overflows its fast internal buffer.
It's worth bearing in mind that an overly complicated scene on a tiler is also a complicated scene on a standard IMR.
 
Raqia said:
These are both really programmer-reliant techniques! I'm starting to appreciate TBDR more and more... Sign me up for PowerVR's next card!
I was describing techniques you'd use for an IMR.
 
Chalnoth said:
Raqia said:
These are both really programmer-reliant techniques! I'm starting to appreciate TBDR more and more... Sign me up for PowerVR's next card!
I was describing techniques you'd use for an IMR.

Seems like tilers do a better job more efficiently, w/o relying on the programmer to organize things in these ways before pushing them onto the graphics card.
 
Raqia said:
Seems like tilers do a better job more efficiently, w/o relying on the programmer to organize things in these ways before pushing them onto the graphics card.
It does rely on the programmer not doing daft things, e.g. passing in opaque textures but saying that they are translucent! That used to happen, but now that IMRs have early Z, they don't want this sort of thing happening either.
 
Simple sorting is a relatively easy operation, and only needs to be programmed once. An initial z-pass is required for some rendering techniques, and is pretty much trivial.
 
Simon F said:
And what if you need to sort objects by material/state?
That's an optimization technique, too. There's obviously some balance that needs to be struck (if you're not going to be doing an initial z-pass). I would tend to think that state changes will become less important as world polycounts increase, such that, for programs not doing an initial z-pass, the shift will be towards depth sorting instead of state sorting.
 
Hrmm...

Your scene buffer overflows, so you are left with insufficient space to write your remaining scene data...

You don't know which tiles this data will be written to... or do you?

If you know *where* you need to write this scene data, then surely it could all be stored in a cache whilst the scene is rendered for the first pass, or just written to a new scene buffer.

Once this has happened, you have a number of tiles still containing data that needs to be rendered. But just because the scene has been split does not mean ALL tiles need another rendering pass, does it? Some of those tiles are already completely rendered, so why bother outputting their z-buffer, and why bother saving any other space for them?

Just a thought...

Also, how many triangles can you fit in a 32MB scene buffer? What's 32MB out of 128MB or 256MB?
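As a back-of-envelope answer to that question (all the per-triangle figures below are illustrative guesses, not PowerVR specifics):

```python
# All figures here are illustrative assumptions, not PowerVR specifics:
# three transformed vertices at 16 bytes (x, y, z, w floats) each,
# plus ~16 bytes of per-triangle state/pointers in the binned scene data.
bytes_per_triangle = 3 * 16 + 16            # 64 bytes per triangle
buffer_bytes = 32 * 1024 * 1024             # the 32MB scene buffer
print(buffer_bytes // bytes_per_triangle)   # 524288 triangles
print(32 / 128, 32 / 256)                   # 0.25 0.125 of a 128/256MB card
```

So on the order of half a million triangles, for a quarter of a 128MB card; sharing vertices between adjacent triangles (strips, indexed meshes) would push the real count higher.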
 
What about anisotropic filtering? For tilers without an external z-buffer, this would probably have to be handled internally. The number of samples that need to be taken might vary wildly from tile to tile. Is there a fast, internalizable method for doing aniso, or will a traditional z-buffer still be needed for the best results?
 
Raqia said:
What about anisotropic filtering? For tilers without an external z-buffer, this would probably have to be handled internally. The number of samples that need to be taken might vary wildly from tile to tile. Is there a fast, internalizable method for doing aniso, or will a traditional z-buffer still be needed for the best results?

There is no link between anisotropic filtering and the Z-buffer or any other external buffer. Anisotropic filtering takes a variable number of texels to create a value passed into the Pixel Shader. Nowhere near the buffers, and thus no difference between IMR and TBDR.
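To illustrate, here is a hypothetical software sketch of that idea: a variable number of samples taken along the pixel footprint's major axis, touching only texture memory and never any z-buffer. `point_sample` stands in for a real bilinear fetch, and the sample-count heuristic is invented for this sketch:

```python
# Toy sketch of why aniso needs no external buffer: it just averages a
# variable number of texture samples along the pixel's footprint axis.
def point_sample(texture, u, v):
    # Stand-in for a bilinear fetch: nearest texel, crudely clamped.
    h, w = len(texture), len(texture[0])
    return texture[min(int(v * h), h - 1)][min(int(u * w), w - 1)]

def aniso_sample(texture, u, v, du, dv, max_samples=16):
    # Footprint length (roughly, in texels) sets the sample count,
    # clamped to the hardware's maximum degree of anisotropy.
    length = (du * du + dv * dv) ** 0.5 * len(texture)
    n = max(1, min(max_samples, round(length)))
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n - 0.5          # step along the axis of anisotropy
        total += point_sample(texture, u + t * du, v + t * dv)
    return total / n                      # filtered value -> pixel shader
```

Everything it reads is texels; the loop never consults depth, which is the point being made above.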

K-
 
Kristof said:
Raqia said:
What about anisotropic filtering? For tilers without an external z-buffer, this would probably have to be handled internally. The number of samples that need to be taken might vary wildly from tile to tile, is there a fast internalizable method for doing aniso or will a traditional z-buffer still be needed for the best results?

There is no link between anisotropic filtering and the Z-buffer or any other external buffer. Anisotropic filtering takes a variable number of texels to create a value passed into the Pixel Shader. Nowhere near the buffers, and thus no difference between IMR and TBDR.

K-

Since the location of the sampled texels depends on the perspective of the polygon in question, z-values or some information about perspective must be involved... The reason I question this is that a certain OpenGL PlayStation emulator plugin doesn't feature anisotropic filtering and cites the PSX's lack of a z-buffer as the reason. Perhaps this is done differently; enlighten me? ;)
 
Raqia said:
Since the location of the sampled texels depends on the perspective of the polygon in question, z-values or some information about perspective must be involved...
Texture mapping is done by evaluating hyperbolic equations, i.e.

U_pixel = (A * X + B * Y + C) / (P * X + Q * Y + R)
V_pixel = (D * X + E * Y + F) / (P * X + Q * Y + R)

where X and Y are the coordinates of the screen pixel, and the nine constants A, B, ..., F, P, Q, and R are evaluated during triangle setup from the X, Y, and 1/W positions of the vertices and their associated U and V values.

In a sense the "/ (P * X + Q * Y + R)" is typically encoding the perspective depth but it doesn't have to be.
 