If NV30 uses tile-based rendering, will ATI convert too?

Chalnoth said:
MfA said:
Geometry is not sent very often as it is now, and it will almost never be sent in the future. Geometry will be referenced, or created on the fly by the GPU. If you know beforehand, from the bounding volume, which tile the geometry is going to end up in without creating and/or transforming every vertex of it, then you can defer the creation and/or transformation until you are rendering the tile (in which case, just like an immediate mode renderer, you can use the result and throw it away). So your scenebuffer just has to store the rendering commands and the geometry references, instead of the transformed vertices, which constitutes vastly less data.
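The scheme described here can be sketched roughly as follows (a minimal illustration; all names and sizes are my own, not from any real API). The point is that a scenebuffer of draw commands referencing pre-built vertex buffers is far smaller than one storing every transformed vertex:

```cpp
#include <cstdint>
#include <vector>

// What a classic binning tiler would have to store per vertex.
struct TransformedVertex {
    float pos[4], normal[3], uv[2];  // 9 floats = 36 bytes per vertex
};

// What the proposal stores instead: one command per object, referencing
// geometry that was created well ahead of time.
struct DrawCommand {
    uint32_t vertexBufferId;          // reference to a pre-built vertex buffer
    uint32_t firstIndex, indexCount;  // which index range to draw
    float    worldMatrix[16];         // per-object transform (state)
};

// Scene buffer for one frame: just commands, no transformed geometry.
using SceneBuffer = std::vector<DrawCommand>;
```

For a 10,000-vertex object, one ~76-byte command replaces roughly 360 KB of transformed vertex data, which is the "vastly less data" being claimed.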

But to do this completely, you'd need to transform every vertex twice

If you have the (hierarchical) bounding volumes and they are sufficiently fine-grained (compared to the tile size), then no, you don't have to do that.
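To make this concrete, here is a small sketch (my own simplification) of binning by bounding volume: only the corners of the bounding box need projecting to a screen-space rectangle, not every vertex of the object, and the rectangle then maps directly to a set of tiles:

```cpp
#include <vector>

// Screen-space bounds of a projected bounding volume, in pixels.
struct Rect { int x0, y0, x1, y1; };

// Which tiles does a bounding rect touch, for tileSize x tileSize tiles
// laid out tilesPerRow to a row? No per-vertex work involved.
std::vector<int> touchedTiles(const Rect& r, int tileSize, int tilesPerRow) {
    std::vector<int> tiles;
    for (int ty = r.y0 / tileSize; ty <= (r.y1 - 1) / tileSize; ++ty)
        for (int tx = r.x0 / tileSize; tx <= (r.x1 - 1) / tileSize; ++tx)
            tiles.push_back(ty * tilesPerRow + tx);
    return tiles;
}
```

A rect spanning 10..70 horizontally and 10..40 vertically with 32-pixel tiles touches a 3x2 block of tiles; the geometry behind it is only created/transformed when one of those tiles is rendered.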

UT2k3 has the ability to reference the same geometry multiple times, but it turns out that it is slower to do it that way.

I am not going to touch that :)

Let me ask a simple question ... do you think that future engines (of the kind which could push enough polygons that storing a scenebuffer would be a problem) would not 99% of the time be using drawing commands referencing vertex buffers created well ahead of time?

I can't imagine us disagreeing on this really ... to be able to use the available performance in the future, engines will have to almost never touch a vertex themselves (not to change it, anyway). That is just how it will have to be, and since from DX9 onward vertex shaders will be flexible enough to handle all animation/deformation/etc. needs, it won't be a problem for developers either.

Marco
 
MfA said:
Let me ask a simple question ... do you think that future engines (of the kind which could push enough polygons that storing a scenebuffer would be a problem) would not 99% of the time be using drawing commands referencing vertex buffers created well ahead of time?

I can't imagine us disagreeing on this really ... to be able to use the available performance in the future, engines will have to almost never touch a vertex themselves (not to change it, anyway). That is just how it will have to be, and since from DX9 onward vertex shaders will be flexible enough to handle all animation/deformation/etc. needs, it won't be a problem for developers either.

Marco

Right. It's really not possible for good performance to have anything but this for dynamic geometry. I was describing static geometry. Since UT2k3 is the game that currently appears to have the most geometry, it seems that lumping all static geometry into one buffer is better than splitting it up in the interest of data storage. Obviously this could change as geometry counts get higher and higher.

I still have a hard time seeing how a tiling technique can be flawless and still operate on such a high level. It would require one heck of a lot of processing.
 
Chalnoth,

Instead of constantly moving in the theoretical realm with issues concerning tiling/deferred rendering, I'd suggest you wait a bit until there finally is a representative sample on the market that can be compared to IMRs, one way or the other.

It's all nice in theory, even using countless mathematical equations, but I'd like to see it proven in real time first. Tilers have advantages and disadvantages, yet that's an oversimplified reality that IMRs don't escape either (no matter whether APIs are on their side or not).

OT: I was having a long hard laugh the other day, because I thought there isn't much out there that can bring my Ti4400@300/325 to its knees. Until I ran into the main room of DM-Borel(DS)/UT. Give it a shot. Just a couple of bots and you're immediately running in the teens.
 
Chalnoth said:
Right. It's really not possible for good performance to have anything but this for dynamic geometry. I was describing static geometry. Since UT2k3 is the game that currently appears to have the most geometry, it seems that lumping all static geometry into one buffer is better than splitting it up in the interest of data storage. Obviously this could change as geometry counts get higher and higher.

Static/dynamic is of little importance; what exactly do you think UT2K3 is doing? It has to be able to do culling effectively, view frustum and occlusion, so it can't put the entire level in one big vertex buffer and draw it with a single command ... if all it does is put all the vertices in one big buffer and use a lot of indexed drawing commands, that is pretty much inconsequential. The bounding volumes would be associated with the index sets instead.
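The arrangement described here can be sketched like this (a toy version; names and the stand-in visibility test are mine): one shared vertex buffer, many indexed draw ranges, each range carrying its own bounding volume so it can be culled independently:

```cpp
#include <vector>

struct AABB { float min[3], max[3]; };

// One drawable chunk of the level: a slice of the shared index buffer,
// plus the bounding volume used to cull it.
struct IndexRange {
    unsigned firstIndex, count;
    AABB     bounds;
};

// Trivial stand-in for frustum/occlusion culling: is any part of the box
// in front of the plane z = 0?
bool visible(const AABB& b) { return b.max[2] > 0.0f; }

// Cull chunks; the survivors are the only indexed draw commands issued
// against the one big vertex buffer.
std::vector<IndexRange> cullChunks(const std::vector<IndexRange>& chunks) {
    std::vector<IndexRange> out;
    for (const auto& c : chunks)
        if (visible(c.bounds)) out.push_back(c);
    return out;
}
```

Whether the vertices live in one buffer or many is inconsequential, as the post says: culling granularity comes from the index ranges, not the buffer layout.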

I still have a hard time seeing how a tiling technique can be flawless and still operate on such a high level. It would require one heck of a lot of processing.

Not really, not compared to the vertex/pixel processing costs ... view frustum culling is a trivial part of the 3D pipeline, and this would just amount to doing view frustum culling a little more often than usual.

Marco
 
MfA said:
Static/dynamic is of little importance; what exactly do you think UT2K3 is doing? It has to be able to do culling effectively, view frustum and occlusion, so it can't put the entire level in one big vertex buffer and draw it with a single command ... if all it does is put all the vertices in one big buffer and use a lot of indexed drawing commands, that is pretty much inconsequential. The bounding volumes would be associated with the index sets instead.

All that I know is what Daniel Vogel stated. It could be that subsets of the entire level are sorted and rendered separately. It could be that the entire level is rendered in one command. I don't really know. I'm sure there's a happy medium in performance between overdraw limitations, CPU load, and switching overhead. However, I do know that he stated that the load from switching between objects was larger than transforming all objects into worldspace before rendering.

As for using bounding volumes, I still see it as a nontrivial solution to a complex problem. Regardless of how it's looked at, however, there are significant problems with moving to a deferred renderer as we go into the future. And what you have to realize is that we may never get a choice between which comes into widespread use, so we may never get to see a good comparison between an immediate mode renderer and a deferred renderer for highly-complex scenarios.

Quite simply, the decision needs to be made now, based entirely upon theory, as to which path is taken. It is very true that I don't have all of the information, nor have I done the research required to make a final decision. I just see that the fillrate of immediate mode renderers is not really much of a concern. Moving to deferred rendering just isn't necessary. However, since triangle counts are what need to increase most over the next couple of years, immediate mode rendering just seems like the natural way to go, as it presents fewer problems toward development in that direction.
 
Chalnoth said:
However, I do know that he stated that the load from switching between objects was larger than transforming all objects into worldspace before rendering.

Sounds to me like he is just talking about the difference between instanced and fully static geometry. Instanced geometry needs an extra matrix update for each object, which indeed comes with a cost. This is irrelevant to the granularity at which the geometry is drawn; it is the state change caused by the matrix update that he is referring to as being costly ...
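A toy illustration of the contrast being drawn here (entirely my own construction; a per-object offset stands in for the per-object matrix update): instancing pays a state change per object at draw time, while fully static geometry is baked into world space once, ahead of time:

```cpp
#include <vector>

struct Vec3 { float x, y, z; };

// "Instanced" path: the mesh stays in object space and the per-object
// offset is applied at draw time (stand-in for a matrix update per object).
Vec3 drawInstanced(Vec3 v, Vec3 objectOffset) {
    return { v.x + objectOffset.x, v.y + objectOffset.y, v.z + objectOffset.z };
}

// "Static" path: bake the offset into the vertices once; the draw loop
// then needs no per-object state changes at all.
std::vector<Vec3> bakeToWorldSpace(std::vector<Vec3> verts, Vec3 objectOffset) {
    for (auto& v : verts) v = drawInstanced(v, objectOffset);
    return verts;
}
```

Both paths produce identical vertices; the difference is purely where the cost lands: a state change per object per frame, versus extra memory for pre-transformed copies.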

As for using bounding volumes, I still see it as a nontrivial solution to a complex problem.

It is a necessary step for any of the more advanced occlusion culling methods; you can't even use bad old occlusion queries without them.

Regardless of how it's looked at, however, there are significant problems with moving to a deferred renderer as we go into the future.

Problems which can be solved at low implementation cost; the only significant part of it would be overcoming stubbornness from developers. As I said before though, if we all used tilers it would not be a problem ... those who could adapt would, those who wouldn't would perish. The end effect would still be that tilers were not an impediment to increasing polygon counts.

And what you have to realize is that we may never get a choice between which comes into widespread use, so we may never get to see a good comparison between an immediate mode renderer and a deferred renderer for highly-complex scenarios.

It is not my choice to make, and it was not what this argument was about. You said that fundamentally tilers could not deal with "high" polygon counts, and that argument is bunk.

Quite simply, the decision needs to be made now, based entirely upon theory, as to which path is taken. It is very true that I don't have all of the information or done the research required to make a final decision. I just see that the fillrate of immediate mode renderers is not really much of a concern. Moving to deferred rendering just isn't necessary. However, since triangle counts are what need to increase most of the next couple of years, immediate mode rendering just seems like the natural way to go, as it presents fewer problems toward development in that direction.

I just saw a nice pic from a new engine on <A HREF=http://www.gamedev.net/community/forums/topic.asp?topic_id=120983>the gamedev board</A> (in the post from Yann L). Effects like this are what the future is about.

Marco
 
MfA said:
Sounds to me like he is just talking about the difference between instanced and fully static geometry. Instanced geometry needs an extra matrix update for each object, which indeed comes with a cost. This is irrelevant to the granularity at which the geometry is drawn; it is the state change caused by the matrix update that he is referring to as being costly ...

Right, but I thought that's what you were describing earlier (I guess this does make me wrong about the possible overdraw hit, however...). After all, remember you were talking about caching rendering commands and not geometry. The primary way to make this more efficient would be to have less geometry in the rendering commands than after going through the vertex pipeline, though there will be natural efficiencies due to values that don't always exist before vertex processing (such as color values calculated from lighting).

It is a necessary step for any of the more advanced occlusion culling methods; you can't even use bad old occlusion queries without them.

Right, but what is the granularity supposed to be with these? I would think that for them to be useful for a tiler, you'd need a relatively fine-grained test, one that's done on the CPU. For optimal rendering with an IMR, you really only need to properly order the rendering, while only having very basic occlusion techniques (such as portal-based technologies).

As an example, UT had one of the most comprehensive occlusion detection systems of any recent renderer. In software mode, no z-buffer was used, and all rendering was done using occlusion detection techniques (this worked most of the time...but not for meshes intersecting world polygons). UT also was limited to just a couple hundred polygons per frame.

Problems which can be solved at low implementation cost,

But the problem I have with this is that you have to solve these problems at all. Quite plainly, I just don't see any problem with IMRs. Fillrate is increasing fast enough. Triangle rate isn't increasing quickly enough, though hopefully that will change with the advent of new HOS techniques. I only see tilers decreasing possible advancement in triangle rates. While it is true that it is just a problem to be overcome, I really do not see how a tiler can ever be more efficient at producing higher polycounts. It really seems to me that the tiler will always be less than or equal to an IMR in triangle counts. So, while in a best-case scenario it may be just as good, in a worst-case scenario, it can be much worse.

It is not my choice to make, and it was not what this argument was about. You said that fundamentally tilers could not deal with "high" polygon counts, and that argument is bunk.

I believe I said that fundamentally tilers have problems with high polygon counts. Why deal with such problems if you don't have to?

I just saw a nice pic from a new engine on <A HREF=http://www.gamedev.net/community/forums/topic.asp?topic_id=120983>the gamedev board</A> (in the post from Yann L). Effects like this are what the future is about.

Marco

That does look quite nice. Lighting is most definitely one of the things that needs to improve vastly for realtime graphics. I hope I didn't give the impression that I thought improving geometry was the only thing needed for improved graphics.
 
Chalnoth said:
After all, remember you were talking about caching rendering commands and not geometry. The primary way to make this more efficient would be to have less geometry in the rendering commands than after going through the vertex pipeline

In neither case is any geometry at all being sent to the card, whether the vertices are in object space (allowing easy instancing at random locations) or in world space (allowing you to use a single transform matrix). In both cases those vertices will have been stored in memory well ahead of time.

Right, but what is the granularity supposed to be with these? I would think that for them to be useful for a tiler, you'd need a relatively fine-grained test, one that's done on the CPU.

You could do it there, but it is bad PR ... you would probably do it on the GPU.

For optimal rendering with an IMR, you really only need to properly order the rendering, while only having very basic occlusion techniques (such as portal-based technologies).

Only as long as you stick to situations where basic occlusion culling methods work well; occlusion queries haven't been invented for nothing, you know ... they were meant for IMRs.

There is nothing simple about what you call basic occlusion techniques ... screen-space occlusion culling methods are the simplest and most efficient occlusion culling methods, period. All the object-space techniques, which have been growing in complexity since good old BSP+PVSs, won't ever be able to deal as well (or well at all, for that matter) with dynamic occlusion, or with dense occlusion caused by lots of small objects.

Beware: even if we stick with IMRs, hardware occlusion culling above the polygon level is a problem which has to be tackled there too, to allow developers enough freedom to use all those polygons you want for something other than just building prettier boxes connected by tunnels.

That there is an overlap between solving this and removing one of the problems of tilers is merely serendipitous.

It really seems to me that the tiler will always be less than or equal to an IMR in triangle counts. So, while in a best-case scenario it may be just as good, in a worst-case scenario, it can be much worse.

I disagree; worst case, such a tiler might do a little more vertex processing than the IMR ... more importantly though, I don't think the worst-case scenarios where IMRs break down (and where a tiler wouldn't) will disappear either. Bandwidth will remain a bottleneck.

That does look quite nice. Lighting is most definitely one of the things that needs to improve vastly for realtime graphics. I hope I didn't give the impression that I thought improving geometry was the only thing needed for improved graphics.

My point was more the way it was implemented ... yes, there is some smart shader work going on, but on the flipside there is a lot of rendering which relies on nothing but the most basic rasterization. The simple pixels provide the data to be used by the smart pixels; the same is true for Doom3. I think it evens out, and in the end fillrate/bandwidth will just have to keep increasing to make longer shaders useful (and vertex rates will have to increase even more, because polygon sizes definitely are decreasing).

Marco
 
Quite plainly, I just don't see any problem with IMRs.

Another small interruption: I don't think anyone ever said or implied anything along those lines.

TBRs are alternative solutions, with their advantages and disadvantages. Compared to an IMR with the exact same specs/abilities, and averaged out, I doubt there would be a clear winner either way.

I personally prefer tilers just because I'm a passionate FSAA freak, that being one department where tilers can still have worth-mentioning advantages (not limited to just that, but it's at the top of my list anyway).
 
MfA, you are referring to HZ (or something like it) to do the occlusion culling at scene-graph traversal time, then?

If I'm getting this, it's essentially a z-only first pass with near-optimal drawing order and "node" rejection, which results in a rough set of visible "nodes" per tile...

How would you defer shading? How would you avoid retransforming geometry data?

Anyways, I'm just curious... you didn't reply to my post, so I wondered if this means: I have the right idea; I am way, way off the mark; or you are busy :)
 
You just traverse the scenegraph front to back and in tile order, so at every level in the graph you depth sort the bounding volumes ... then you tile them ... select the ones which are in the tile you are rendering at the moment ... check the visibility of the leaf nodes against the Z-buffer (possibly, but not necessarily hierarchical) and render them if visible ... check the visibility of the other nodes and descend further if so ... repeat till you are out of stuff to do for the present tile and then move on to the next till you are out of tiles :)
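The traversal just described can be sketched as follows (a minimal version; all names are mine, the visibility test is a caller-supplied stand-in for the Z-buffer check, and tile membership is reduced to a bitmask):

```cpp
#include <algorithm>
#include <functional>
#include <vector>

struct Node {
    float  depth;                 // nearest depth of the bounding volume
    int    tileMask;              // which tiles the bounding volume touches
    bool   leaf;
    int    id;                    // leaf geometry reference
    std::vector<Node> children;
};

// Render one tile: skip nodes whose bounding volume misses the tile or
// fails the visibility test, render visible leaves, and descend into
// visible interior nodes in front-to-back order.
void renderTile(const Node& n, int tileBit,
                const std::function<bool(const Node&)>& visible,
                std::vector<int>& rendered) {
    if (!(n.tileMask & tileBit) || !visible(n)) return;  // not in tile / occluded
    if (n.leaf) { rendered.push_back(n.id); return; }
    std::vector<const Node*> kids;
    for (const auto& c : n.children) kids.push_back(&c);
    std::sort(kids.begin(), kids.end(),                  // depth-sort children
              [](const Node* a, const Node* b) { return a->depth < b->depth; });
    for (const Node* c : kids) renderTile(*c, tileBit, visible, rendered);
}
```

Running this once per tile, with a real bounding-volume-versus-Z-buffer test plugged in for `visible`, gives the "repeat till you are out of tiles" loop from the post.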

Marco

PS. I'm skipping the issue of overlap, to the point that I am not going to describe what it is for the people who didn't get it by now, because some would clutch onto it as a big problem ... and really it isn't that big of a deal IMO. As long as the tiles are "large enough" and the average size of bounding volumes "small enough", a small amount of temporary storage should nearly always be enough to deal with it. When it isn't, you just do some stuff twice. Hmm, this turned out a little too long for what was meant as a cryptic remark ... oh well.
 
You could do it there, but it is bad PR ... you would probably do it on the GPU.

I don't think it's just bad for PR. It's also quite bad for games. In particular, I want games that have more realistic physics and AI. Besides that, just passing graphics instructions to the graphics card can take quite a bit of processor time today. I think it would be very counterproductive to attempt to use more CPU power to make the video card more effective.

That said, unless I'm reading you wrong, what you're describing can only be effectively done at the scene graph level. This seems, to me, to mean that we would require a whole new API structure to move such calculations to the video card.

I disagree, worst case such a tiler might do a little more vertex processing than the IMR ... more importantly though, I dont think the worst case scenarios where IMRs break down (and where a tiler wouldnt) will dissapear either. Bandwith will remain a bottleneck.

Memory bandwidth isn't nearly as much of a bottleneck for IMRs as it was two years ago, and it's going in the direction of being less of a bottleneck in the near future. Obviously you can still find situations where it's a significant limitation for today's hardware, but we're far from an optimal implementation of immediate-mode rendering.

I'll attempt to again bring up my argument that the primary issue at hand is not so much total memory bandwidth usage, but the ratio between computational power and memory bandwidth requirements. I really don't see how memory bandwidth requirements can increase faster than computation requirements. If anything, it certainly looks like it's going to go the other way. With the rapid pace of memory developments recently, it doesn't seem like memory bandwidth is going to be much of a problem heading into the future.

My point was more the way it was implemented ... yes there is some smart shader work going on, but on the flipside there is a lot of rendering which relies on nothing but the most basic rasterization. The simple pixels profide the data to be used by the smart pixels, same is true for Doom3. I think it evens out, and in the end fillrate/bandwith will just have to keep increasing to make longer shaders usefull (and vertex rates will have to increase even more, because polygon sizes definetely are decreasing).

Marco

Right, both fillrate and bandwidth will need to keep increasing. However, what I was attempting to state was that fillrate (pixel processing power) will have to increase faster than memory bandwidth, as it should.
 
Chalnoth said:
You could do it there, but it is bad PR ... you would probably do it on the GPU.

I don't think it's just bad for PR. It's also quite bad for games. In particular, I want games that have more realistic physics and AI. Besides that, just passing graphics instructions to the graphics card can take quite a bit of processor time today. I think it would be very counterproductive to attempt to use more CPU power to make the video card more effective.

This just shows how bad a PR problem it is; I did not make a guess at what percentage of CPU cycles it would take ... you don't hazard a guess either, but still have no problem attacking it anyway.

On second thought though, if you want the 3D card to be able to do its own occlusion culling you would need feedback if you did it on the CPU ... and that would defeat the whole purpose.

That said, unless I'm reading you wrong, what you're describing can only be effectively done at the scene graph level.

No. If the immediate mode APIs were extended with a bounding volume occlusion culling method which could be entirely hardware accelerated, and the developers used it, that would be enough.

This seems, to me, to mean that we would require a whole new API structure to move such calculations to the video card.

I think in the case of OpenGL glCallListIfVisible( GLuint list , GLuint boundingVolume ) and some coding conventions would be enough.
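Since `glCallListIfVisible` is a hypothetical extension (it does not exist in OpenGL), here is a toy CPU-side emulation of what it might do (all names and the 1-D depth buffer are my own simplifications): test the bounding volume against the current depth buffer, and only execute the display list if any part of it could be visible:

```cpp
#include <functional>
#include <vector>

// 1-D stand-in for a screen-space bounding volume: a column range plus
// the nearest depth of the volume.
struct BoundingVolume { int x0, x1; float nearDepth; };

struct ToyGL {
    std::vector<float> depthBuffer;                    // depth per column
    std::vector<std::function<void()>> displayLists;   // stand-in for GL lists

    // Emulated glCallListIfVisible: run the list only if the bounding
    // volume passes the depth test somewhere on screen.
    void callListIfVisible(unsigned list, const BoundingVolume& bv) {
        for (int x = bv.x0; x < bv.x1; ++x)
            if (bv.nearDepth < depthBuffer[x]) {       // could be visible
                displayLists[list]();                  // execute the list
                return;
            }
        // Fully occluded: the list (and all its geometry) is skipped.
    }
};
```

The coding convention implied in the post is that developers wrap each object's draw calls in a list and supply a bounding volume, so the hardware can reject whole objects without the API having to understand a scene graph.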

Right, both fillrate and bandwidth will need to keep increasing. However, what I was attempting to state was that fillrate (pixel processing power) will have to increase faster than memory bandwidth, as it should.

If a large percentage of pixels are "simple", there isn't a whole lot of difference.
 