Old 12-Mar-2002, 01:29   #1
3dcgi
Senior Member
 
Join Date: Feb 2002
Posts: 2,019
Default TBR bandwidth vs Immediate EZR

Does a tile based renderer like the Kyro reduce texture bandwidth more than an immediate mode renderer with an early z reject unit or does it just save z bandwidth?

I'm thinking that a TBR doesn't have any extra texture benefit in this case, but I haven't really thought about it too hard yet.
3dcgi is offline  
Old 12-Mar-2002, 02:48   #2
Dave
Member
 
Join Date: Jan 2002
Posts: 167
Default

The only way that the two can be comparable is if the scene has a strict front-to-back rendering order, and even then a deferred architecture will have a slight benefit.
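
To make the ordering argument concrete, here is a minimal Python sketch that counts how many opaque fragments get textured at a single pixel. It is a toy model, not a description of any real chip: the function names are made up for illustration, depth values are assumed to be "smaller is closer", and only texturing work is counted (Z traffic and any other source of the "slight benefit" mentioned above are ignored).

```python
def shaded_fragments_early_z(depths):
    """Fragments an immediate-mode renderer with early Z reject would texture
    at one pixel, given opaque fragment depths in submission order."""
    nearest = float("inf")
    shaded = 0
    for z in depths:
        if z < nearest:  # passes the early depth test, so it is textured
            nearest = z
            shaded += 1
    return shaded


def shaded_fragments_deferred(depths):
    """A deferred tiler resolves visibility first and textures only the
    visible opaque fragment, whatever the submission order."""
    return 1 if depths else 0


layers = [1.0, 2.0, 3.0, 4.0]  # four opaque layers covering the pixel
print(shaded_fragments_early_z(layers))                # front-to-back
print(shaded_fragments_early_z(layers[::-1]))          # back-to-front
print(shaded_fragments_early_z([2.0, 4.0, 1.0, 3.0]))  # mixed order
print(shaded_fragments_deferred(layers))
```

With strict front-to-back submission the two counts match; any other order makes the early-Z renderer texture fragments that end up hidden.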
Dave is offline  
Old 12-Mar-2002, 07:34   #3
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,227
Default

For framebuffer bandwidth it will always win out, ignoring very small tiled textures ... though personally I would not like hardware for which performance would break down as soon as you have lots of individual texturing, so you need the bandwidth anyway.

Bandwidth needed for geometry is shifting ... increasing against the tiler's favour (since it can triple it). With immediate mode APIs tiling has its greatest advantage with relatively low poly counts and relatively simple pixel operations with multi-pass rendering, lots of back-to-front overdraw and with anti-aliasing. Or in other words their greatest advantage would have been in the past (they fucked up royal).

The time is running out for them, they have to hit it big soon and introduce some major API extensions to be able to even try to compete in the future. (For instance, if developers start heavily using NVIDIA's occlusion culling support where will that leave IMG? They would need to include in the API the ability to associate bounding volumes with geometry to be able to do the same thing as developers do in software with immediate mode rendering and feedback ... and when I say API I of course mean D3D, so they need leverage with m$, this would also nicely solve the geometry problem since it could just tile bounding volumes.)
MfA is offline  
Old 12-Mar-2002, 14:58   #4
Entropy
Senior Member
 
Join Date: Feb 2002
Posts: 1,865
Default

MfA wrote:
"Bandwidth needed for geometry is shifting ... increasing against the tiler's favour (since it can triple it). With immediate mode APIs tiling has its greatest advantage with relatively low poly counts and relatively simple pixel operations with multi-pass rendering, lots of back-to-front overdraw and with anti-aliasing. Or in other words their greatest advantage would have been in the past (they fucked up royal)."

As far as I can see your list of pros and cons seems OK.
But while poly counts are most definitely rising, I can also see multi-pass rendering increasing, overdraw increasing as complex outdoor areas and lots of mobile players/creatures/objects gain more widespread use, and anti-aliasing that gives good framerates is on most everybody's wish list.

So by my limited understanding, some of the benefits of TBRs will continue to be valued and perhaps even increase a bit in importance.

Not that it matters one whit if no one brings compelling hardware to the market.

Entropy
Entropy is offline  
Old 13-Mar-2002, 01:37   #5
elimc
Member
 
Join Date: Feb 2002
Location: Eastern Washington
Posts: 85
Default

So will TBR renderers actually slow down games in the future if they have high enough polygon counts?
elimc is offline  
Old 13-Mar-2002, 03:23   #6
3dcgi
Senior Member
 
Join Date: Feb 2002
Posts: 2,019
Default

Quote:
Originally Posted by Dave
The only way that the two can be comparable is if the scene has a strict front-to-back rendering order, and even then a deferred architecture will have a slight benefit.
I guess a flaw in my logic was that I was automatically thinking of strict front-to-back ordering, but of course this isn't the case in the real world.

I've been thinking of a feature that deferred architectures could implement; maybe you all have an idea whether it would work or not, or whether it can already be done.

Currently deferred renderers transform all polygons to screen space before rasterization. Could this data be read by the CPU so host effects could be performed, with the results being written back to the graphics card? I'm sure developers could think of something to do with this kind of flexibility. Of course, an argument against this is to just make pixel shaders flexible enough to do everything someone might think of doing.
3dcgi is offline  
Old 13-Mar-2002, 04:09   #7
Nexus
Junior Member
 
Join Date: Feb 2002
Posts: 87
Default

Quote:
Originally Posted by elimc
So will TBR renderers actually slow down games in the future if they have high enough polygon counts?
No, because just like an IMR it can be built to satisfy future high poly needs. The two disadvantages a TBR has with high poly counts are storage space for the polys (several MB, not a problem with today's 64MB+ cards) and more work for the hidden surface removal unit (ISP). The latter does highly parallel work, so you can easily throw more transistors at it to give it enough power to cope with more polys.

You may find Kristof's PowerVR article interesting:
http://216.12.218.25/domain/www.beyo...ing/index1.php
Nexus is offline  
Old 15-Mar-2002, 00:07   #8
fremin
Junior Member
 
Join Date: Feb 2002
Location: NJ, USA
Posts: 15
Default

Quote:
storage space for the polys (several MB, not a problem with today's 64MB+ cards)
This isn't a problem now, but it will probably become one in the future if PVR doesn't address the issue. Don't get me wrong, I don't think it will affect their currently available boards, or boards set for the immediate future for that matter, but the fact is that if they want to survive in this market they will eventually need to fix this problem, since poly counts will increase far more than memory in the future (once we hit 128MB or 256MB I foresee us sticking with it for a while... or ditching it altogether for a different memory architecture). I don't think this will be a problem for years, I just believe that it will have to be addressed sooner or later. (I heard 3dfx/gigapixel had a solution to this... does anyone know whether there is any validity to this claim?)
fremin is offline  
Old 15-Mar-2002, 02:22   #9
elimc
Member
 
Join Date: Feb 2002
Location: Eastern Washington
Posts: 85
Default

What about when we get to DX9? Won't it introduce more HOS? This would help reduce the polys by quite a bit, wouldn't it?
elimc is offline  
Old 15-Mar-2002, 10:35   #10
mboeller
Member
 
Join Date: Feb 2002
Location: Germany
Posts: 845
Default

Based on Kristof's article, the scene-buffer requirements seem really high, so I made a crude calculation:

20 million vertices/sec @ 60 fps => ~334000 vertices/frame
One vertex is around 64 bytes
One pointer is a 32-bit number

=>

2 x 64byte x 334000 + (32/8)byte x 334000 = ~ 42 Mbyte !!

2 x 42Mbyte @ 60fps (read + write ) = ~ 5 GB/sec bandwidth !!
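
As a sanity check on the arithmetic above, here is a small Python sketch that recomputes the estimate. The function and parameter names are made up for illustration. The factor of 2 on the vertex size is taken from the post as written (its origin is not stated there), and the second factor of 2 is the read + write assumption from the last line of the calculation.

```python
def scene_buffer_estimate(verts_per_sec=20_000_000, fps=60,
                          vertex_bytes=64, pointer_bytes=32 // 8,
                          vertex_factor=2):
    """Return (vertices per frame, scene buffer bytes, bytes/sec of traffic)
    using the crude model from the post."""
    verts_per_frame = verts_per_sec / fps
    buffer_bytes = (vertex_factor * vertex_bytes + pointer_bytes) * verts_per_frame
    traffic = 2 * buffer_bytes * fps  # buffer written once and read once per frame
    return verts_per_frame, buffer_bytes, traffic


verts, size, traffic = scene_buffer_estimate()
print(f"{verts:,.0f} vertices/frame")
print(f"{size / 2**20:.1f} MiB scene buffer")
print(f"{traffic / 1e9:.2f} GB/s read + write")
```

The result lands at roughly 42 MiB and a little over 5 GB/s, so the arithmetic itself holds; whether the inputs are realistic is a separate question.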


These numbers are rather high. Could this be correct?

Based on these numbers (if correct!) I can understand why m$ uses an IMR in the XBox, because the bandwidth demand for the scene buffer alone would exceed the useful bandwidth of the XBox. And the scene buffer would need most of the memory of the XBox, so only a small amount would be left for the game and content.


Manfred
mboeller is offline  
Old 15-Mar-2002, 10:47   #11
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 12,950
Default

Quote:
What about when we get to DX9? Won't it introduce more HOS? This would help reduce the polys by quite a bit, wouldn't it?
The problem is: what will they store in the bin? Will the bin store post-tessellated polygon information, or the pre-tessellated algorithm?

Storing the pre-tessellated algorithm data in the bin will reduce poly load for the bin, but then you have the question of what to do when you actually start rendering the tiles. If the HOS data covers many tiles then you either have to tessellate all the HOS data for all the tiles it covers once you meet the first tile that contains some HOS information, or you re-tessellate several times for each individual tile, which could increase the poly load on the T&L.

Of course, this assumes you have an architecture that can tessellate and transform HOS in hardware; KYRO, for instance, could only ever store the post-tessellated information in the bin as the CPU has to tessellate and transform everything.
__________________
Expand. Accelerate. Dominate.
Tweet Tweet!
Dave Baumann is offline  
Old 15-Mar-2002, 11:36   #12
Kristof
Senior Member
 
Join Date: Jan 2002
Location: Abbots Langley
Posts: 732
Default

Quote:
Originally Posted by mboeller
These numbers are rather high. Could this be correct?
You're forgetting quite a few things like clipping, backface culling, actual storage technique/format, realistic throughput of today's and tomorrow's TnL engines etc...

K-
Kristof is offline  
Old 15-Mar-2002, 12:28   #13
mboeller
Member
 
Join Date: Feb 2002
Location: Germany
Posts: 845
Default

Quote:
Originally Posted by Kristof
Quote:
Originally Posted by mboeller
These numbers are rather high. Could this be correct?
You're forgetting quite a few things like clipping, backface culling, actual storage technique/format, realistic throughput of today's and tomorrow's TnL engines etc...

K-
I thought so myself.

Can you give a better, more accurate example (based on the 20 million vertices/sec)? Or is this proprietary knowledge? A real example would help to see the possibilities and drawbacks of TBRs better. I think quite a lot of people will use simplified examples like mine above to calculate the storage and bandwidth demands of a TBR and come to the same wrong assumptions.


Manfred
mboeller is offline  
Old 15-Mar-2002, 16:06   #14
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,227
Default

For a new console it would be trivial to shift the burden from storage to computation ... just let the developer send stuff in tile order, no storage needed, and hierarchical frustum culling ain't that expensive.
MfA is offline  
Old 15-Mar-2002, 19:48   #15
Roger Kohli
Member
 
Join Date: Feb 2002
Posts: 99
Default

Would that work if you wanted to use the GPU's T&L unit?
Roger Kohli is offline  
Old 15-Mar-2002, 21:34   #16
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,227
Default

Yes.
MfA is offline  
Old 15-Mar-2002, 21:45   #17
Humus
Crazy coder
 
Join Date: Feb 2002
Location: Stockholm, Sweden
Posts: 3,216
Default

Exactly what is "just let the developer send stuff in tile order" supposed to mean?
Humus is offline  
Old 15-Mar-2002, 22:12   #18
mboeller
Member
 
Join Date: Feb 2002
Location: Germany
Posts: 845
Default

Quote:
Originally Posted by MfA
For a new console it would be trivial to shift the burden from storage to computation ... just let the developer send stuff in tile order, no storage needed, and hierarchical frustum culling ain't that expensive.
OK, my example was not good, because on a closed platform you could optimise for the specific chip; but how can you work around this on the PC? Is it possible to do it in drivers (I suppose not)?
mboeller is offline  
Old 15-Mar-2002, 23:26   #19
Roger Kohli
Member
 
Join Date: Feb 2002
Posts: 99
Default

MfA:
Quote:
Yes.
How do you know which tile something will appear in before you have applied the transformations?
Roger Kohli is offline  
Old 16-Mar-2002, 00:22   #20
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,227
Default

You don't, but the developer can make a conservative guess ... that's what frustum culling is about. A conservative guess is all you need to make it work.

To make it work well you need a good enough guess, of course. I said nothing about overhead, I don't want to get into it really ... I just wanted to say that it's easy to transform it into a (fairly tractable) computational problem instead of a storage one for a closed platform. I think computational cost will drop fast enough with increasing tile size to make it an attractive option; I'm sure others will feel otherwise.
MfA is offline  
Old 16-Mar-2002, 04:42   #21
arjan de lumens
Senior Member
 
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,256
Default

Having the developer pass polygon data in tile order doesn't really sound like a good idea. The problem is that the exact tile set covered by each polygon cannot be computed until after at least the transform part of T&L is performed. You can do a conservative approximation by computing, for each object in a scene, the tile set covered by its bounding box/sphere/whatever, and using that data to perform object-level binning, but the result will be that you pass every polygon in the object for every tile covered by the object's bounding box, which is incredibly wasteful in terms of memory bandwidth and T&L (as you keep reading in and T&L-ing the same polygons over and over again) once you get objects covering more than about 2 tiles; at that point, it would be cheaper wrt memory bandwidth usage to just do traditional post-T&L polygon binning.

Of course, you can split objects into smaller sub-objects to get around the efficiency problem, but then you need either really small sub-objects or really large tile buffers (so that the average sub-object dimensions are less than 1/2 of the average tile dimensions). Small sub-objects will cause large memory requirements, and large tile-buffers are expensive. Also, you still have to do a lot of software transforms to get the bounding boxes for all the sub-objects.
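
The "more than about 2 tiles" break-even can be illustrated with a rough Python sketch. Everything here is a simplifying assumption for illustration: the function names are invented, the bounding box is taken to be already in screen space with inclusive pixel coordinates, the 32-pixel tile size is arbitrary, and the polygon-binning cost is modelled as nothing more than one write plus one read of each transformed vertex.

```python
def tiles_for_bbox(x_min, y_min, x_max, y_max, tile_size=32):
    """Conservative tile set touched by a screen-space bounding box."""
    tx0, ty0 = x_min // tile_size, y_min // tile_size
    tx1, ty1 = x_max // tile_size, y_max // tile_size
    return [(tx, ty)
            for ty in range(ty0, ty1 + 1)
            for tx in range(tx0, tx1 + 1)]


def vertices_fetched_object_binning(num_vertices, bbox, tile_size=32):
    """Object-level binning: the whole object is re-read (and re-transformed)
    once for every tile its bounding box touches."""
    return num_vertices * len(tiles_for_bbox(*bbox, tile_size=tile_size))


def vertices_moved_polygon_binning(num_vertices):
    """Post-T&L polygon binning, simplified: each transformed vertex is
    written to the scene buffer once and read back once."""
    return 2 * num_vertices


bbox = (10, 10, 100, 70)  # hypothetical object bounds in pixels
print(len(tiles_for_bbox(*bbox)))                  # tiles touched
print(vertices_fetched_object_binning(500, bbox))  # object-level cost
print(vertices_moved_polygon_binning(500))         # polygon-level cost
```

Under this model the two costs are equal when the bounding box touches exactly two tiles, and object-level binning falls behind linearly beyond that.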
arjan de lumens is offline  
Old 16-Mar-2002, 07:47   #22
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,227
Send a message via ICQ to MfA
Default

You would use hierarchies, and memory requirements would always be swamped by the lowest level of the hierarchy (the actual vertices), so I don't quite see how that's an issue.

And you don't have to transform stuff multiple times; you can always store what belongs to other tiles until you render them, if it makes sense. Storage cost would be minimal compared to a full-screen display list.
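
One way to read the hierarchy idea is sketched below in Python. This is only a guess at the general shape, not anyone's actual design: `Node`, `tiles_touched` and `bin_hierarchy` are hypothetical names, bounding boxes are assumed to be already projected to screen space (inclusive pixel coordinates), and a node is binned as a whole as soon as it fits in one tile or has no children left to descend into.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    bbox: tuple  # (x_min, y_min, x_max, y_max) in pixels, inclusive
    children: list = field(default_factory=list)


def tiles_touched(bbox, tile_size):
    x0, y0, x1, y1 = bbox
    return [(tx, ty)
            for ty in range(y0 // tile_size, y1 // tile_size + 1)
            for tx in range(x0 // tile_size, x1 // tile_size + 1)]


def bin_hierarchy(node, tile_size, bins):
    """Descend until a node fits in a single tile (or is a leaf), then bin it."""
    tiles = tiles_touched(node.bbox, tile_size)
    if len(tiles) == 1 or not node.children:
        for tile in tiles:  # a leaf spanning several tiles is still duplicated
            bins[tile].append(node.name)
    else:
        for child in node.children:
            bin_hierarchy(child, tile_size, bins)
    return bins


root = Node("root", (0, 0, 63, 31), [
    Node("left", (0, 0, 31, 31)),
    Node("right", (32, 0, 63, 31)),
])
print(dict(bin_hierarchy(root, 32, defaultdict(list))))
```

Only the per-node bounding boxes need to be tested against the tile grid up front; the vertices under a node are touched when its tile is actually rendered.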
MfA is offline  
Old 16-Mar-2002, 14:07   #23
Dave B(TotalVR)
Member
 
Join Date: Feb 2002
Location: Essex, UK (not far from IMGTEC:)
Posts: 491
Default

I suppose this would be a good point to turn people to my 'Tilers and High poly Counts' article at www.powervr.org.uk.

Well, I would, but me bloody site has run out of bandwidth again. Should be up soon.

Dave
Dave B(TotalVR) is offline  
Old 17-Mar-2002, 02:43   #24
arjan de lumens
Senior Member
 
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,256
Default

Some comments on the article (which I hadn't read before):

12 bytes per vertex sounds awfully little when you have to keep Gouraud colors and texture coordinates around on a per-vertex basis. I'd expect backface culling to cull a little more than 50% of all polygons and thus a little less than 50% of all vertices, so you could more reasonably assume 24 bytes per vertex (and half of them culled; this would probably apply equally to the 3dmark2001 test and 'gloom3'). You don't seem to take into account that the buffers for vertex data have to be written to as well as just read from, which doubles the figure from 1.4 GBytes/sec to 2.8, which still does not take into account vertex pointers and vertices read more than once (which could double the number again). Also: 1600 * 1200 pixels * 32 BYTES per pixel * 60 fps * 2 = 7 GBytes/sec matches the number you state for framebuffer traffic, but shouldn't that be 32 BITS rather than 32 BYTES ...? The same applies to the texture bandwidth number as well.
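
The bytes-versus-bits point is easy to check numerically. The Python sketch below simply re-evaluates the quoted formula both ways; the function name and the `factor` parameter are made up for illustration, and the factor of 2 is taken from the formula as quoted without assuming what it stands for.

```python
def framebuffer_traffic(width, height, bytes_per_pixel, fps, factor=2):
    """Bytes per second for the quoted framebuffer traffic formula."""
    return width * height * bytes_per_pixel * fps * factor


as_written = framebuffer_traffic(1600, 1200, 32, 60)      # 32 BYTES per pixel
with_32_bits = framebuffer_traffic(1600, 1200, 32 // 8, 60)  # 32 BITS per pixel

print(f"{as_written / 1e9:.2f} GB/s with 32 bytes per pixel")
print(f"{with_32_bits / 1e9:.2f} GB/s with 32 bits per pixel")
```

With 32 bytes per pixel the formula gives about 7.4 GB/s, matching the article's figure; with 32 bits it gives about 0.9 GB/s, an eighth of that.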

And to MfA:
Actually, when I think about it, doing binning of objects larger than single polygons may be a rather good idea - you could then defer most of T&L (all of it except T for bounding boxes for each 'object') until you actually start to render each tile, much as in IMRs. This would reduce memory usage and traffic substantially, as you would no longer need to buffer all T&Led vertices in off-chip memory all the time. Bin sizes would be much smaller too. With some caching of vertices (before or after T&L), you may even get near-IMR level geometry performance, even in the case of memory bandwidth being the main bottleneck. And it might not require developer support either - the driver could very well process vertex arrays into suitable 'object' hierarchies.

A variant of this method would be to defer tessellation of Higher-Order Surfaces until after binning, till just before rendering; such a scheme may be needed to keep TBRs from choking on the huge polygon counts that HOS tessellation tends to produce.
arjan de lumens is offline  
Old 17-Mar-2002, 02:59   #25
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,227
Default

Without the developer being able to give bounding volume hints you aren't going to be able to deduce on the fly where HOSs or vertex buffers will end up on the screen; analyzing the vertex shader to see what the hell it does is too much work :(
MfA is offline  

 
