#1
Senior Member
Join Date: Feb 2002
Posts: 2,019

Does a tile-based renderer like the Kyro reduce texture bandwidth more than an immediate-mode renderer with an early-Z reject unit, or does it just save Z bandwidth?
I'm thinking that a TBR doesn't have any extra texture benefit in this case, but I haven't really thought about it too hard yet.

#2
Member

The only way the two can be comparable is if the scene is rendered in strict front-to-back order, and even then a deferred architecture will have a slight benefit.
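To make the ordering argument concrete, here is a minimal Python sketch that counts how many fragments get shaded (and therefore textured) at a single pixel. It assumes opaque geometry, an ideal per-pixel early-Z test with a LESS comparison and no coarse-granularity losses, and a deferred renderer that resolves visibility for the tile before shading. The function names are illustrative only, not any vendor's API.

```python
def shaded_early_z(depths):
    """Fragments shaded at one pixel by an IMR with an ideal early-Z reject.

    depths: fragment depths in submission order (smaller = closer).
    A fragment is shaded (and its textures fetched) only if it passes the
    depth test against the closest fragment seen so far.
    """
    nearest = float("inf")
    shaded = 0
    for d in depths:
        if d < nearest:      # passes early-Z: shade it and update the Z buffer
            nearest = d
            shaded += 1
    return shaded


def shaded_deferred(depths):
    """Fragments shaded at one pixel by a deferred (tile-based) renderer.

    Visibility is resolved for the whole tile first, so only the single
    visible opaque fragment is shaded, whatever the submission order.
    """
    return 1 if depths else 0


layers = [1.0, 2.0, 3.0, 4.0]                                         # depth complexity 4
print(shaded_early_z(layers), shaded_deferred(layers))                # prints 1 1
print(shaded_early_z(layers[::-1]), shaded_deferred(layers))          # prints 4 1
print(shaded_early_z([2.0, 4.0, 1.0, 3.0]), shaded_deferred(layers))  # prints 2 1
```

In this idealized model the two only tie under strict front-to-back order; any other order makes the early-Z IMR shade extra fragments. The "slight benefit" a deferred design keeps even in the front-to-back case would come from effects not modelled here, such as coarse early-Z granularity or geometry that cannot be perfectly sorted.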

#3
Regular

For framebuffer bandwidth it will always win out, ignoring very small tiled textures ... though personally I would not like hardware whose performance breaks down as soon as you have lots of individual texturing, so you need the bandwidth anyway.

Bandwidth needed for geometry is shifting ... increasing against the tilers' favour (since tiling can triple it). With immediate-mode APIs, tiling has its greatest advantage with relatively low poly counts, relatively simple pixel operations with multi-pass rendering, lots of back-to-front overdraw, and anti-aliasing. In other words, their greatest advantage would have been in the past (they fucked up royally). Time is running out for them; they have to hit it big soon and introduce some major API extensions to even try to compete in the future.

(For instance, if developers start heavily using NVIDIA's occlusion culling support, where will that leave IMG? They would need to include in the API the ability to associate bounding volumes with geometry, to be able to do the same thing developers do in software with immediate-mode rendering and feedback ... and when I say API I of course mean D3D, so they need leverage with m$. This would also nicely solve the geometry problem, since it could just tile bounding volumes.)

#4
Senior Member
Join Date: Feb 2002
Posts: 1,865

MfA wrote:
"Bandwidth needed for geometry is shifting ... increasing against the tilers' favour (since it can triple it). With immediate mode APIs tiling has its greatest advantage with relatively low poly counts and relatively simple pixel operation with multi-pass rendering, lots of back to front overdraw and with anti-aliasing. Or in other words their greatest advantage would have been in the past (they fucked up royally)."

As far as I can see, your list of pros and cons seems OK. But while poly counts are most definitely rising, I can also see multi-pass rendering increasing, overdraw increasing as complex outdoor areas and lots of mobile players/creatures/objects come into more widespread use, and anti-aliasing with good framerates is on almost everybody's wish list. So by my limited understanding, some of the benefits of TBRs will continue to be valued and perhaps even increase a bit in importance. Not that it matters one whit if no one brings compelling hardware to the market.

Entropy

#5
Member
Join Date: Feb 2002
Location: Eastern Washington
Posts: 85

So will tile-based renderers actually slow down games in the future if polygon counts get high enough?

#6
Senior Member
Join Date: Feb 2002
Posts: 2,019

Quote:
I've been thinking of a feature that deferred architectures could implement; maybe you all have an idea whether it would work, or whether it can already be done. Currently, deferred renderers transform all polygons to screen space before rasterization. Could this data be read back by the CPU so that host effects could be performed, with the results written back to the graphics card? I'm sure developers could think of something to do with this kind of flexibility. Of course, an argument against this is to just make pixel shaders flexible enough to do everything someone might think of doing.

#7
Junior Member
Join Date: Feb 2002
Posts: 87

Quote:
You may find Kristof's PowerVR article interesting: http://216.12.218.25/domain/www.beyo...ing/index1.php

#8
Junior Member

Quote:

#9
Member
Join Date: Feb 2002
Location: Eastern Washington
Posts: 85

What about when we get to DX9? Won't it introduce more HOS (higher-order surfaces)? That would help reduce the polys by quite a bit, wouldn't it?

#10
Member
Join Date: Feb 2002
Location: Germany
Posts: 845

Based on Kristof's article, the scene-buffer requirements seem really high, so I made a crude calculation:

20 million vertices/sec @ 60 fps => ~334,000 vertices/frame
One vertex is around 64 bytes; one pointer is a 32-bit number
=> 2 x 64 bytes x 334,000 + (32/8) bytes x 334,000 = ~42 MByte !!
2 x 42 MByte @ 60 fps (read + write) = ~5 GB/sec bandwidth !!

These numbers are rather high. Could this be correct?? Based on these numbers (if correct!) I can understand why m$ uses an IMR in the XBox, because the bandwidth demand of the scene buffer alone would exceed the usable bandwidth of the XBox. And the scene buffer would need most of the XBox's memory, so only a small amount would be left for the game and its content.

Manfred
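Manfred's arithmetic can be reproduced with a short Python function. It uses exactly the inputs stated in the post (64-byte vertices, 32-bit pointers, the "2 x 64byte" vertex factor, one write plus one read per frame); as the follow-up in this thread suggests, these inputs are likely an oversimplification of how a real TBR lays out its scene buffer. The function and parameter names are hypothetical.

```python
def scene_buffer_estimate(vertices_per_sec=20_000_000, fps=60,
                          bytes_per_vertex=64, pointer_bytes=4,
                          vertex_copies=2, passes=2):
    """Reproduce the crude scene-buffer estimate from post #10.

    vertex_copies=2 mirrors the "2 x 64byte" factor in the post (the post
    does not say what the second copy is for); passes=2 is one write plus
    one read of the buffer per frame.
    Returns (bytes per frame, bytes per second).
    """
    verts_per_frame = vertices_per_sec // fps  # 333,333 (the post rounds to 334,000)
    per_frame = (vertex_copies * bytes_per_vertex + pointer_bytes) * verts_per_frame
    per_second = passes * per_frame * fps
    return per_frame, per_second


frame_bytes, bw = scene_buffer_estimate()
print(f"{frame_bytes / 2**20:.1f} MiB per frame")  # prints 42.0 MiB per frame
print(f"{bw / 1e9:.2f} GB/s")                      # prints 5.28 GB/s
```

Using 333,333 instead of the rounded 334,000 changes nothing material: both land at roughly 42 MiB per frame and roughly 5 GB/s.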

#11
Gamerscore Wh...
Join Date: Jan 2002
Posts: 12,950

Quote:
Storing the pre-tessellation (algorithmic) HOS data in the bin will reduce the poly load for the bin, but then you have the question of what to do when you actually start rendering the tiles. If the HOS data covers many tiles, then you either have to tessellate all the HOS data for all the tiles it covers as soon as you meet the first tile that contains some HOS information, or you re-tessellate it once for each individual tile, which could increase the poly load on the T&L. Of course, this assumes you have an architecture that can tessellate and transform HOS in hardware; KYRO, for instance, could only ever store the post-tessellation information in the bin, as the CPU has to tessellate and transform everything.

#12
Senior Member
Join Date: Jan 2002
Location: Abbots Langley
Posts: 732

Quote:
K-

#13
Member
Join Date: Feb 2002
Location: Germany
Posts: 845

Quote:
Can you give a better, more accurate example (based on the 20 million vertices/sec)? Or is this proprietary knowledge? A real example would help to show the possibilities and drawbacks of TBRs better. I think quite a lot of people will use simplified examples like mine above to calculate the storage and bandwidth demands of a TBR and come to the same wrong assumptions.

Manfred

#14
Regular

For a new console it would be trivial to shift the burden from storage to computation ... just let the developer send stuff in tile order: no storage needed, and hierarchical frustum culling ain't that expensive.

#15
Member
Join Date: Feb 2002
Posts: 99

Would that work if you wanted to use the GPU's T&L unit?

#16
Regular

Yes.

#17
Crazy coder

Exactly what is "just let the developer send stuff in tile order" supposed to mean?

#18
Member
Join Date: Feb 2002
Location: Germany
Posts: 845

Quote:

#19
Member
Join Date: Feb 2002
Posts: 99

Mfa:
Quote:

#20
Regular

You don't, but the developer can make a conservative guess ... that's what frustum culling is about. A conservative guess is all you need to make it work.
To make it work well you need a good enough guess, of course. I said nothing about overhead, and I don't really want to get into it ... I just wanted to say that it's easy to transform it into a (fairly tractable) computational problem instead of a storage one on a closed platform. I think computational cost will drop fast enough with increasing tile size to make it an attractive option; I'm sure others will feel otherwise.
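A minimal Python sketch of what "sending stuff in tile order" could look like on the application side, assuming each object comes with a conservative screen-space bounding rectangle (for example, a projected bounding box) that is already clipped to the screen. The 32x32 tile size and the function names are hypothetical choices for illustration, not anything specified in the thread.

```python
import math
from collections import defaultdict


def tiles_covered(rect, tile_w=32, tile_h=32):
    """Tiles touched by a screen-space rectangle (x0, y0, x1, y1).

    The rectangle is treated as half-open, [x0, x1) x [y0, y1), and is
    assumed to be already clipped to the screen. Because it bounds the
    object, the tile list is conservative: it may include tiles the
    object's actual polygons never touch.
    """
    x0, y0, x1, y1 = rect
    tx0, ty0 = math.floor(x0) // tile_w, math.floor(y0) // tile_h
    tx1 = (math.ceil(x1) - 1) // tile_w
    ty1 = (math.ceil(y1) - 1) // tile_h
    return [(tx, ty) for ty in range(ty0, ty1 + 1)
                     for tx in range(tx0, tx1 + 1)]


def bin_by_tile(objects, tile_w=32, tile_h=32):
    """Group object names by tile so they can be submitted in tile order."""
    bins = defaultdict(list)
    for name, rect in objects:
        for tile in tiles_covered(rect, tile_w, tile_h):
            bins[tile].append(name)
    return dict(bins)


scene = [("rock", (0, 0, 10, 10)), ("tree", (20, 20, 40, 40))]
for tile, names in sorted(bin_by_tile(scene).items()):
    print(tile, names)
```

Here "tree" lands in four bins even if its polygons only touch one of them; that over-inclusion is the price of a conservative guess, and it shrinks relative to object size as tiles get larger, which is the trade-off MfA alludes to.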

#21
Senior Member
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,256

Having the developer pass polygon data in tile order doesn't really sound like a good idea. The problem is that the exact set of tiles covered by each polygon cannot be computed until at least the transform part of T&L has been performed. You can do a conservative approximation by computing, for each object in a scene, the tile set covered by its bounding box/sphere/whatever, and using that data to perform object-level binning. But the result will be that you pass every polygon in the object for every tile covered by the object's bounding box. That is incredibly wasteful in terms of memory bandwidth and T&L (as you keep reading in and T&L-ing the same polygons over and over again) once objects cover more than about 2 tiles; at that point it would be cheaper, with respect to memory bandwidth, to just do traditional post-T&L polygon binning.
Of course, you can split objects into smaller sub-objects to get around the efficiency problem, but then you need either really small sub-objects or really large tile buffers (so that the average sub-object dimensions are less than 1/2 of the average tile dimensions). Small sub-objects will cause large memory requirements, and large tile buffers are expensive. Also, you still have to do a lot of software transforms to get the bounding boxes for all the sub-objects.
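Entropy's efficiency argument can be put into numbers with a simple expected-value model. The Python sketch below assumes polygons are spread evenly over an object's bounding box, that objects sit at uniformly random offsets relative to the tile grid, and that every polygon of a (sub-)object is resent once for each tile its bounding box overlaps; under those assumptions an extent of length s on a grid of spacing T touches 1 + s/T cells on average. It ignores the per-sub-object overhead (storing and transforming bounds) that the post also mentions, and the names are hypothetical.

```python
def expected_cells(extent, cell):
    """Expected number of grid cells an interval of length `extent` touches
    when its offset relative to a grid of spacing `cell` is uniformly random."""
    return 1.0 + extent / cell


def polygon_sends(polys, width, height, tile_w, tile_h, split=1):
    """Expected polygons fetched/T&L'd under object-level binning.

    The object (bounding box width x height, `polys` polygons spread evenly)
    is cut into split x split equal sub-objects; every sub-object's polygons
    are resent once per tile its bounding box overlaps.
    """
    sub_w, sub_h = width / split, height / split
    per_sub = polys / (split * split)
    tiles_per_sub = expected_cells(sub_w, tile_w) * expected_cells(sub_h, tile_h)
    return split * split * per_sub * tiles_per_sub


# A 1000-polygon object with a 64x64-pixel bounding box on 32x32 tiles.
for k in (1, 2, 4, 8):
    sends = polygon_sends(1000, 64, 64, 32, 32, split=k)
    print(f"split {k}x{k}: {sends / 1000:.2f}x the polygons")
```

With sub-objects at half the tile dimensions (the 4x4 split here) the redundancy is still (1 + 1/2)^2 = 2.25x, and it only approaches 1x as sub-objects become much smaller than tiles, which is why the post concludes you need either tiny sub-objects or large tile buffers.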

#22
Regular

You would use hierarchies, and memory requirements would always be swamped by the lowest level of the hierarchy (the actual vertices), so I don't quite see how that's an issue.
And you don't have to transform stuff multiple times; you can always store what belongs to other tiles until you render them, if it makes sense. The storage cost would be minimal compared to a full-screen display list.
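The hierarchy idea can be illustrated with a per-tile traversal of a bounding-volume tree: for each tile, walk the tree and descend only into nodes whose conservative screen-space bounds overlap that tile, so a whole subtree is rejected with one test. This is a minimal Python sketch assuming such bounds are already available for every node (the thread notes that obtaining them is the hard part); the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    rect: tuple                     # conservative screen bounds (x0, y0, x1, y1)
    children: list = field(default_factory=list)
    name: str = ""                  # set on leaves: the vertex batch to draw


def overlaps(a, b):
    """Half-open rectangle overlap test; touching edges do not count."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def batches_for_tile(root, tile_rect):
    """Collect leaf batches whose bounds overlap one tile.

    A subtree whose bound misses the tile is rejected with a single test,
    so the number of tests grows with what is near the tile rather than
    with the size of the whole scene. Returns (batch names, bound tests).
    """
    found, tests = [], 0
    stack = [root]
    while stack:
        node = stack.pop()
        tests += 1
        if not overlaps(node.rect, tile_rect):
            continue
        if node.children:
            stack.extend(node.children)
        else:
            found.append(node.name)
    return found, tests


left = Node((0, 0, 32, 32), [Node((0, 0, 16, 16), name="a"),
                             Node((16, 16, 32, 32), name="b")])
right = Node((32, 0, 64, 32), [Node((32, 0, 48, 16), name="c"),
                               Node((48, 16, 64, 32), name="d")])
scene = Node((0, 0, 64, 32), [left, right])
print(batches_for_tile(scene, (0, 0, 32, 32)))
```

For the left tile, the right subtree is rejected with a single test, so leaves "c" and "d" are never examined.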

#23
Member

I suppose this would be a good point to direct people to my 'Tilers and High poly Counts' article at www.powervr.org.uk.
Well, I would, but me bloody site has run out of bandwidth again.

Dave

#24
Senior Member
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,256

Some comments on the article (which I hadn't read before):

12 bytes per vertex sounds awfully little when you have to keep Gouraud colors and texture coordinates around on a per-vertex basis. I'd expect backface culling to cull a little more than 50% of all polygons, and thus a little less than 50% of all vertices, so you could more reasonably assume 24 bytes per vertex (with half of them culled; this would probably apply equally to the 3DMark2001 test and 'gloom3'). You don't seem to take into account that the vertex-data buffers have to be written to as well as read from, which doubles the figure from 1.4 GBytes/sec to 2.8, and that still does not take into account vertex pointers or vertices read more than once (which could double the number again).

Also: 1600 * 1200 pixels * 32 BYTES per pixel * 60 fps * 2 = 7 GBytes/sec matches the number you state for framebuffer traffic, but shouldn't that be 32 BITS rather than 32 BYTES...? The same applies to the texture bandwidth number as well.

And to MfA: actually, when I think about it, binning objects larger than single polygons may be a rather good idea. You could then defer most of T&L (all of it except the T for each 'object's' bounding box) until you actually start to render each tile, much as in IMRs. This would reduce memory usage and traffic substantially, as you would no longer need to buffer all T&L'd vertices in off-chip memory all the time. Bin sizes would be much smaller too. With some caching of vertices (before or after T&L), you may even get near-IMR-level geometry performance, even when memory bandwidth is the main bottleneck. And it might not require developer support either; the driver could very well process vertex arrays into suitable 'object' hierarchies.

A variant of this method would be to defer tessellation of Higher-Order Surfaces until after binning, just before rendering; such a scheme may be needed to keep TBRs from choking on the huge polygon counts that HOS tessellation tends to produce.
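The unit check on the article's framebuffer figure is easy to reproduce. The Python sketch below evaluates the quoted formula with 32 bytes per pixel and with 32 bits (4 bytes) per pixel; the factor of 2 accesses per pixel per frame is taken from the quoted formula as-is, and the function name is illustrative.

```python
def framebuffer_traffic(width, height, bytes_per_pixel, fps, accesses=2):
    """Bytes per second for touching every pixel `accesses` times per frame
    (2 here, matching the factor in the quoted formula)."""
    return width * height * bytes_per_pixel * fps * accesses


as_bytes = framebuffer_traffic(1600, 1200, 32, 60)      # 32 BYTES per pixel
as_bits = framebuffer_traffic(1600, 1200, 32 // 8, 60)  # 32 BITS per pixel
print(f"{as_bytes / 1e9:.2f} GB/s vs {as_bits / 1e9:.2f} GB/s")  # prints 7.37 GB/s vs 0.92 GB/s
```

Only the 32-byte reading reproduces the ~7 GBytes/sec figure; at 32 bits per pixel the same formula gives an eighth of that, which is why the post suspects a bits/bytes mix-up.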

#25
Regular

Without the developer being able to give bounding-volume hints, you aren't going to be able to deduce on the fly where HOSs or vertex buffers will end up on screen; analyzing the vertex shader to see what the hell it does is too much work :(