HSR vs Tile-Based rendering?

aZZa said:
Chalnoth, you are very knowledgeable, but when will you finally concede that TBDR has plenty of advantages vs IMR? If nVidia jumped ship, what would you do?? :eek:
I still think that TBDR has too many problems with high geometry densities to be useful going into the future. What I specifically don't support is the deferred part of this rendering algorithm. If a TBDR is released that fixes the problems I'm currently seeing gracefully, then fine.

Most of the disadvantages and problems with the style have long been overcome and solved across patents or by other means.
I'm sorry, but I have to make fun of that statement. Patents don't solve problems :) Patents are applied for when somebody solves a problem.
 
Well, let's get Series 5 out the door and show just how well higher-poly games run on PowerVR tech.


Remember the Dagoth Moor Zoological Gardens demo that Nvidia cited to show off its T&L, but which the KYRO ran faster?
 
If it gets out the door, yes, we will see.

And, of course, you have to remember that with DMZG much of the polycount was taken up by trees, which produce quite a bit of overdraw; that was definitely hard on the GeForce 256, which had essentially no overdraw-reducing techniques. Furthermore, DMZG was quoted as having around 80k polys/frame. I think the turnover point where TBDRs will start to lose effective fillrate will be closer to 300k-500k polys/frame (i.e. we're talking UE3 stuff).
 
Chalnoth said:
aZZa said:
What I specifically don't support is the deferred part of this rendering algorithm. If a TBDR is released that fixes the problems I'm currently seeing gracefully, then fine.

Hmmm, and the only way to fix the deferred part is if the developer ensures the code delivers the polygons in front-to-back order, with intersecting ones properly clipped? Heh, and if the developer does that, then an IMR will work just as well. Anyway, I definitely agree on that deferred part. It means you have a frame of latency (the best you can do, as far as I can see, is to be rendering the last frame while doing the entire geometry setup for the next frame). I always liked ideas like S-buffers and such, but those ideas never scaled up to using polygons, unfortunately, and even Carmack couldn't figure out how to get it to work reasonably :p Too much trouble to write something completely elegant versus just using brute force with a little overdraw.
 
No, a TBDR pretty much doesn't care in what order primitives are sent. The only thing is that you're substituting a z-buffer for a scene buffer, and each vertex carries quite a bit more information than one z-buffer value, so when triangles get small, a TBDR can start to get inefficient.
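To make that trade-off concrete, here's a back-of-envelope sketch in C. The binned-vertex layout is purely hypothetical (real parameter formats vary and are usually compressed); it just shows how, as triangles shrink, the scene-buffer bytes per covered pixel quickly overtake the 4 bytes a z-buffer entry needs.

```c
/* Illustrative only: the field layout is a guess, not any real chip's format. */
#include <stdio.h>

/* Hypothetical per-vertex record a binned scene buffer might hold. */
typedef struct {
    float x, y, z, w;      /* post-transform position          */
    float u, v;            /* one set of texture coordinates   */
    unsigned int colour;   /* packed diffuse colour            */
} BinnedVertex;            /* 28 bytes here; real formats vary */

int main(void) {
    const double scene_bytes_per_tri = 3.0 * sizeof(BinnedVertex);
    const double z_bytes_per_pixel   = 4.0;   /* one 32-bit depth value */

    /* Cost per covered pixel as triangles shrink. */
    for (double pixels_per_tri = 64.0; pixels_per_tri >= 1.0; pixels_per_tri /= 4.0) {
        double scene_bytes_per_pixel = scene_bytes_per_tri / pixels_per_tri;
        printf("%5.1f px/tri: scene buffer %6.2f B/px vs z-buffer %4.1f B/px\n",
               pixels_per_tri, scene_bytes_per_pixel, z_bytes_per_pixel);
    }
    return 0;
}
```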
 
What I meant was: how can it not be deferred rendering if primitives are sent in any order? I agree with you on the issues, though.
 
Chalnoth said:
No, a TBDR pretty much doesn't care in what order primitives are sent.
Umh.. a TBDR may care about primitive submission order..
Imagine a TBDR that builds some kind of hierarchical representation of the scene at the same time geometry is submitted.
Such an architecture could reject geometry batches, thus avoiding binning/storing them, if it knows that a single batch would be occluded by some other batch it has already binned.
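A speculative sketch of how such a submission-time reject might look. Everything here - the tile counts, the depth convention (larger z = farther), the conservative per-tile occluder depth - is an assumption for illustration, not a description of any actual hardware:

```c
/* Reject a whole geometry batch at submission time if its bounding volume
 * lies behind a conservative per-tile occluder depth built up from batches
 * already binned. Purely hypothetical structure. */
#include <float.h>
#include <stdbool.h>

#define TILES_X 40
#define TILES_Y 30

/* For each tile: a depth beyond which everything is known to be hidden
 * (e.g. the far depth of the nearest fully covering opaque batch). */
static float tile_occluder_z[TILES_Y][TILES_X];

void occlusion_reset(void)
{
    for (int ty = 0; ty < TILES_Y; ++ty)
        for (int tx = 0; tx < TILES_X; ++tx)
            tile_occluder_z[ty][tx] = FLT_MAX;   /* no occluder known yet */
}

typedef struct {
    int   tx0, ty0, tx1, ty1;  /* tile-space bounding rectangle of the batch */
    float min_z;               /* nearest depth of the batch's bounding box  */
} BatchBounds;

/* True if the batch is hidden in every tile it touches and need not be binned. */
bool batch_can_be_rejected(const BatchBounds *b)
{
    for (int ty = b->ty0; ty <= b->ty1; ++ty)
        for (int tx = b->tx0; tx <= b->tx1; ++tx)
            if (b->min_z < tile_occluder_z[ty][tx])
                return false;  /* batch may be visible in this tile */
    return true;
}
```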

The only thing is that you're substituting a z-buffer for a scene buffer, and each vertex carries quite a bit more information than one z-buffer value, so when triangles get small, a TBDR can start to get inefficient.
I have the feeling Simon is not going to agree with that statement..;)
 
"I have the feeling Simon is not going to agree with that statement.."

So am I...

To a TBDR, many small polygons mean taking up more space in the scene buffer and more bandwidth reading it back. The 6MB buffer in the KYRO could almost handle a 500,000 polygon scene (the 3DMark2001 high polygon count demo), so it stands to reason that the buffer isn't going to get much bigger than, say, 12MB-15MB. This isn't really going to affect performance that much IMO. To read a 15MB buffer every frame requires 900MB/s @ 60 fps. Bandwidth is measured in GB/s on modern graphics boards.
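As a quick sanity check of those figures, here is the arithmetic for reading a scene buffer once per frame at 60 fps, for the 6MB and 15MB sizes above plus a larger hypothetical 32MB buffer:

```c
/* Back-of-envelope check: scene buffer size x 60 fps = read traffic. */
#include <stdio.h>

int main(void) {
    const double fps = 60.0;
    const double sizes_mb[] = { 6.0, 15.0, 32.0 };   /* 32MB is hypothetical */

    for (int i = 0; i < 3; ++i) {
        double bytes_per_sec = sizes_mb[i] * 1024.0 * 1024.0 * fps;
        printf("%5.1f MB scene buffer @ 60 fps: %.2f GB/s read traffic\n",
               sizes_mb[i], bytes_per_sec / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```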

On an IMR, small polygons mean its overdraw reduction methods don't work as well for polygons behind these tiny ones. It also means that all memory accesses associated with these tiny triangles use the memory bus at less than full efficiency. Also, they cannot be rendered as quads by the IMR, so the texture accesses will have less spatial locality and hence poorer cache hit rates.

Dave
 
G'day aZZa
aZZa said:
This is a very informative thread, and thanks to the guys from PowerVR for contributing.
No worries...
Speaking of latencies... a couple of (probably trivial) questions to the PVR guys - I'm interested in what sort of latency improvements were possible by synchronising the core clock & memory on KYRO and older architectures?
Disclaimer: I'm primarily a SW/Algorithms bloke who dabbles in some minor HW design...
There are probably two reasons for this, neither of which is specific to PVR or even 3D graphics:
  • My understanding is that *DRAM technology has some fixed-time delays so if you clock something faster (and stay in spec) I believe you have to increase some cycles somewhere. You'll get higher data throughput but perhaps also higher latency.
  • If the external and internal busses aren't synchronised you have to include a lot more pipeline/FIFO stages in a synchroniser circuit which means (a) more latency and (b) more gates for the synchroniser and (c) even more gates to try to hide the latency. You might be better off using that gate budget for something else <shrug>
 
Thanks Simon. I was kinda thinking along some of those lines.

With regard to binning space, Dave B: would it be better to provide, say, 24-32MB rather than 15-16MB - essentially miles more than enough, especially as most modern cards have 256MB (which would still leave ~224MB) - to cover any absolute extreme conditions (for a few years) with ease? That still only needs about 2GB/s at 60fps, and the 256-bit DDR buses that cards have used for the past couple of years should have little trouble delivering that.

Or would this be better designed to utilise a dynamic scene buffer - say, when it gets 85-90% full, it allocates another 25% or something? Surely it would be easier just to set up a huge buffer which should never entirely fill up in the foreseeable future, unless dealing with extreme non-realtime work.
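For what it's worth, a toy host-side sketch of that grow-on-demand policy. The ~90% trigger and 25% growth factor are just the numbers from the post; using plain realloc and omitting error handling are simplifications for illustration:

```c
/* Toy sketch of a "grow the scene buffer when nearly full" policy. */
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *data;
    size_t         used;
    size_t         capacity;
} SceneBuffer;

static void scene_buffer_reserve(SceneBuffer *sb, size_t extra)
{
    /* Grow by 25% once occupancy passes ~90%, as suggested above. */
    if (sb->used + extra > (sb->capacity * 9) / 10) {
        size_t new_cap = sb->capacity + sb->capacity / 4;
        if (new_cap < sb->used + extra)
            new_cap = sb->used + extra;
        sb->data = realloc(sb->data, new_cap);   /* error handling omitted */
        sb->capacity = new_cap;
    }
}

static void scene_buffer_append(SceneBuffer *sb, const void *bytes, size_t n)
{
    scene_buffer_reserve(sb, n);
    memcpy(sb->data + sb->used, bytes, n);
    sb->used += n;
}
```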

Could you provide a separate (possibly high speed/bandwidth) memory (like some workstation graphics use) - but for the specific purpose of providing this buffer? Or would this just complicate things more with extra memory logic on chip?
 
Dave B(TotalVR) said:
To a TBDR, many small polygons mean taking up more space in the scene buffer and more bandwidth reading it back. The 6MB buffer in the KYRO could almost handle a 500,000 polygon scene (the 3DMark2001 high polygon count demo), so it stands to reason that the buffer isn't going to get much bigger than, say, 12MB-15MB. This isn't really going to affect performance that much IMO. To read a 15MB buffer every frame requires 900MB/s @ 60 fps. Bandwidth is measured in GB/s on modern graphics boards.
It also has to be written each frame, and you will also have overlap between tiles (particularly with high anisotropy, which will depend upon the geometry and view point). As far as memory space is concerned, you'll also want to double-buffer the scene buffer if you want to be able to sort/bin at the same time as rendering, doubling that memory requirement.
 
As far as memory space is concerned, you'll also want to double-buffer the scene buffer if you want to be able to sort/bin at the same time as rendering, doubling that memory requirement.

I think there is something along the lines of meta-tiling. During the render process the renderer is processing one tile while tile - 1 is being sorted; once the sort of that tile is done, the memory space can be reclaimed. With meta-tiling you can begin binning the next frame in some of the reclaimed memory space, which means that two frames do not need to be buffered.
 
Chalnoth said:
you will also have overlap between tiles (particularly with high anisotropy, which will depend upon the geometry and view point).

How do aniso and tile overlap relate to each other? What aniso are we talking about - aniso texture sampling?

K-
 
DaveBaumann said:
I think there is something along the lines of meta-tiling. During the render process the renderer is processing one tile while tile - 1 is being sorted; once the sort of that tile is done, the memory space can be reclaimed. With meta-tiling you can begin binning the next frame in some of the reclaimed memory space, which means that two frames do not need to be buffered.
I think you're talking about the depth sorting, not storing the scene buffer and binning the triangles in their respective tiles. Anyway, you may be able to begin storing the scene buffer for the next frame, but you won't be able to begin binning. So if you want to parallelize the binning, you'll still need to double-buffer.
 
Chalnoth said:
Anyway, you may be able to begin storing the scene buffer for the next frame, but you won't be able to begin binning. So if you want to parallelize the binning, you'll still need to double-buffer.

Storing = Binning!

It's all about memory management...

K-
 
Okay, so let's imagine you do it this way:
1. Have a small buffer that stores pointers to triangle information for each tile.
2. Have a larger buffer that stores the actual triangle information.

The pointer buffer you'd need to double-buffer, since you cannot know the tile location of the incoming triangles before sending them.

The triangle buffer you'd probably still want to double-buffer, since it is a nontrivial problem to store triangles in a non-linear access pattern. That is, since triangles incoming to the scene buffer are going to come in in a different order than they are "cleared" from it, you're going to have massive fragmentation of the available memory, and that fragmentation will magnify if you attempt to make use of that freed memory.

No, unless you decide you can suck up the performance hit of binning/storing the scene buffer, which would destroy the parallelism of geometry and pixel processing, you'd pretty much need to double-buffer.
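A rough sketch of the arrangement described above - per-tile pointer lists into one linearly filled triangle store, with the whole structure double-buffered so the next frame can be binned while the current one is rendered. All names, sizes and the fixed per-tile list length are hypothetical:

```c
/* Double-buffered pointer buffer + triangle buffer, per the description above. */
#include <stddef.h>

#define TILES             (40 * 30)
#define MAX_TRIS          (64 * 1024)
#define MAX_TRIS_PER_TILE 256

typedef struct { float data[9]; } TriParams;    /* stand-in for binned triangle data */

typedef struct {
    TriParams  tris[MAX_TRIS];                       /* filled in draw-call order        */
    size_t     tri_count;
    TriParams *tile_list[TILES][MAX_TRIS_PER_TILE];  /* read back in screen-space order  */
    size_t     tile_len[TILES];
} SceneBuffer;

static SceneBuffer scene[2];   /* one being binned, one being rendered */
static int         bin_index;  /* scene[bin_index] receives new geometry */

/* Called at the frame boundary: geometry submission switches buffers while
 * the rasteriser drains the one just completed. */
void swap_scene_buffers(void)
{
    bin_index ^= 1;
    scene[bin_index].tri_count = 0;
    for (size_t t = 0; t < TILES; ++t)
        scene[bin_index].tile_len[t] = 0;
}
```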
 
Do I need to spell out how memory management works?

You have pages; when you need to store something, you request a page. When you no longer need a page, you free the memory... you can do this both for pointers and geometry. It's that simple, and hence nothing is really double-buffered - it might be, but it's unlikely to ever be fully double-buffered given that pages are released as soon as possible. And fragmentation is no issue, since it's all based on standard-sized pages; a page that becomes free can be reused.
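A minimal sketch of that page-based scheme (pool size, page size and names are all illustrative, not taken from any actual PowerVR design): fixed-size pages come from a free list, and because every page is the same size, a page freed as soon as its tile has been rendered is immediately reusable, so fragmentation in the usual sense doesn't arise.

```c
/* Fixed-size page pool with a free list, for binning pointers and geometry. */
#include <stddef.h>

#define PAGE_SIZE  4096
#define PAGE_COUNT 8192          /* ~32 MB pool of binning memory */

typedef struct Page {
    struct Page  *next;          /* free-list link, or next page of a tile's list */
    size_t        used;
    unsigned char bytes[PAGE_SIZE];
} Page;

static Page  pool[PAGE_COUNT];
static Page *free_list;

static void pool_init(void)
{
    free_list = NULL;
    for (int i = 0; i < PAGE_COUNT; ++i) {
        pool[i].next = free_list;
        pool[i].used = 0;
        free_list = &pool[i];
    }
}

/* Request a page when a tile's parameter list needs more space. */
static Page *page_alloc(void)
{
    Page *p = free_list;
    if (p) { free_list = p->next; p->next = NULL; p->used = 0; }
    return p;                    /* NULL means the binning pool is exhausted */
}

/* Release a page as soon as the renderer has finished reading it; it goes
 * straight back on the free list and can be reused by the next frame. */
static void page_free(Page *p)
{
    p->next = free_list;
    free_list = p;
}
```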

K-
 
It would be simple if the storage and retrieval were done in the same order. But no, geometry is stored in the order of draw calls, and is retrieved in screen-space order. You may be able to gain some memory space efficiency by doing what you're describing, but it won't alleviate the need for double buffering to prevent pipeline stalls.
 
Kristof said:
Chalnoth said:
you will also have overlap between tiles (particularly with high anisotropy, which will depend upon the geometry and view point).

How do aniso and tile overlap relate to each other? What aniso are we talking about - aniso texture sampling?
Of course they don't, except, say, in the Chal42 graphics chip.

Chalnoth, once visible pixels** are determined, they can be textured individually (or at worst, as 4x4 blocks). There are no boundaries to worry about.

Kristof said:
Do I need to spell out how memory management works?
K-
Give up Kristof, it's not worth it.


**slightly simplified.
 
Kristof said:
How do aniso and tile overlap relate to each other? What aniso are we talking about - aniso texture sampling?

K-
If you have long, thin triangles, you'll need to read in that triangle data for more than one tile. I was using the term high anisotropy to describe this situation, since, for example, a uniformly-tessellated mesh viewed at high angles will exhibit this problem.
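A small illustration of that point (the 32-pixel tile size and the bounding-box binning are assumptions): a compact triangle and a long, thin sliver of similar area land in very different numbers of tile lists, so the sliver's parameters have to be referenced, and read back, for many more tiles.

```c
/* Count how many tiles a triangle's screen-space bounding box touches. */
#include <stdio.h>
#include <math.h>

#define TILE 32   /* assumed tile size in pixels */

typedef struct { float x, y; } Vec2;

static int tiles_touched(Vec2 a, Vec2 b, Vec2 c)
{
    float minx = fminf(a.x, fminf(b.x, c.x)), maxx = fmaxf(a.x, fmaxf(b.x, c.x));
    float miny = fminf(a.y, fminf(b.y, c.y)), maxy = fmaxf(a.y, fmaxf(b.y, c.y));
    int tx = (int)(maxx / TILE) - (int)(minx / TILE) + 1;
    int ty = (int)(maxy / TILE) - (int)(miny / TILE) + 1;
    return tx * ty;   /* conservative: bounding-box binning */
}

int main(void) {
    /* A compact triangle vs a long, thin sliver of similar area. */
    Vec2 a0 = {100, 100}, b0 = {116, 100}, c0 = {100, 116};
    Vec2 a1 = {100, 100}, b1 = {400, 104}, c1 = {100, 104};
    printf("compact triangle touches %d tile(s)\n", tiles_touched(a0, b0, c0));
    printf("sliver triangle touches  %d tile(s)\n", tiles_touched(a1, b1, c1));
    return 0;
}
```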
 