Next-gen specs

DarkBlu,
Naomi 2 does the distributed tile method (using 2 rendering chips), and AFAICS, the load balancing was pretty good.

You might get slightly better balancing with a scan line approach but, IMHO, you lose far more due to the decrease in data reuse (i.e. texture caching effectiveness decreases) and the fact that nearly all triangles have to be processed by both rendering chips.
 
Simon,

i know naomi2 did distributed tiling, and that it was said to be pretty good at that. but that does not change the fact that, purely statistically, using finer-granularity frame subdivision elements should produce better load balancing. AAMOF, distributing 1 pixel to a chip would produce the best possible load balancing, and that's what multiple pixel pipes on a chip do -- they achieve the optimal load balance* across the chip.

it appears people seem to think of SLI in terms of 3dfx's particular implementation, which had its flaws. if naomi2 could be fairly efficient at doing tiles of, erm, 32x16, then i see no reason an SLI architecture should not be able to do the same, only at higher load balance efficiency.



*optimal load balance: when no pixel pipeline stays idle when there are still pixels to be drawn.
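
To put rough numbers on the granularity argument, here is a minimal sketch of a toy model, not a description of any real chip. Everything in it is an assumption made up for illustration: the 64x64 screen, the `pixel_cost` scene with one expensive corner, and the checkerboard rule that assigns regions to two chips. It only reports how far the busier chip ends up above an even split.

```python
# Toy model: 64x64 screen, two chips, static checkerboard assignment of
# tw x th pixel regions. All numbers are hypothetical, chosen for illustration.
W = H = 64

def pixel_cost(x, y):
    # Hypothetical scene: one expensive 32x32 region in the top-left corner.
    return 8 if (x < 32 and y < 32) else 1

def imbalance(tw, th, chips=2):
    """Return the busiest chip's load divided by the ideal (even) share."""
    load = [0] * chips
    for y in range(H):
        for x in range(W):
            # Regions alternate between chips in a checkerboard pattern.
            chip = ((x // tw) + (y // th)) % chips
            load[chip] += pixel_cost(x, y)
    return max(load) / (sum(load) / chips)

if __name__ == "__main__":
    layouts = {"1x1": (1, 1), "scanline": (W, 1),
               "32x16": (32, 16), "32x32": (32, 32)}
    for name, (tw, th) in layouts.items():
        print(name, round(imbalance(tw, th), 3))
```

In this particular scene the 1x1 and alternating-scanline splits come out perfectly even, while 32x32 regions leave one chip with most of the expensive area. A different scene would give different numbers, so treat it as a way to experiment rather than as evidence either way.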
 
Once again I have to retort with the fact that a tile is smaller than a scanline :) Statistically speaking, if we assume an infinite buffer and no stalls, only the number of pixels in each screen division determines the distribution of the difference in work between the two chips.

BTW if you use tilers the different chips would not need to render the screen in strict order, if a tile is done you move on to the next tile ... no stalls no hassle.

Marco

[ This Message was edited by: MfA on 2002-03-06 18:24 ]
 
Marco wrote: BTW if you use tilers the different chips would not need to render the screen in strict order, if a tile is done you move on to the next tile ... no stalls no hassle.

ok, under the premise that a chip could move on to a pending tile at any time _and_ tile size is reasonably small then yes, that'd be close to optimal load balance (optimal reached with tiles of 1x1).
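
The "move on to a pending tile at any time" premise can be sketched as a shared work queue. This is a minimal illustration under made-up assumptions: tile costs are known numbers, switching tiles is free, and the names `static_makespan` and `dynamic_makespan` are hypothetical helpers, not anything from Naomi 2 or SLI hardware.

```python
import heapq

def static_makespan(tile_costs, chips=2):
    """Tiles are pre-assigned round-robin; return when the slowest chip finishes."""
    load = [0] * chips
    for i, cost in enumerate(tile_costs):
        load[i % chips] += cost
    return max(load)

def dynamic_makespan(tile_costs, chips=2):
    """Whichever chip becomes free first takes the next pending tile."""
    free_at = [(0, c) for c in range(chips)]  # (time the chip is free, chip id)
    heapq.heapify(free_at)
    finish = 0
    for cost in tile_costs:
        t, c = heapq.heappop(free_at)  # earliest-free chip
        t += cost
        finish = max(finish, t)
        heapq.heappush(free_at, (t, c))
    return finish

if __name__ == "__main__":
    costs = [5, 1, 1, 1, 1, 1]  # hypothetical per-tile costs, one heavy tile
    print(static_makespan(costs), dynamic_makespan(costs))
```

With one heavy tile and five light ones, the fixed assignment leaves one chip waiting while the pull-from-queue version finishes both chips at the same time. The smaller the tiles, the smaller the worst-case leftover, which is the 1x1 limit mentioned above.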
 
DarkBlu,
I know what you are trying to achieve by insisting on having the smallest possible level of 'unit of work' granularity, but you are ignoring a competing factor.

What is the point of, say, saving 5% of the frame rendering time due to the (possibly better) load balancing of SLI if the relative bandwidth requirements double, as would (typically) happen with the texturing?
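
A rough way to see where the doubling could come from is a sketch under strong simplifying assumptions: the texture maps 1:1 onto a 64x64 screen, texels are fetched in 4x4 blocks, each chip has its own cache, and a chip fetches each block it touches exactly once. None of these figures come from real hardware, and `blocks_fetched` is a hypothetical helper.

```python
W = H = 64
BLOCK = 4  # assumed texture fetch unit: 4x4 texels

def blocks_fetched(tw, th, chips=2):
    """Total texture blocks fetched by all chips for a tw x th region split."""
    touched = [set() for _ in range(chips)]
    for y in range(H):
        for x in range(W):
            # Same checkerboard assignment of regions to chips as a tiler or SLI split.
            chip = ((x // tw) + (y // th)) % chips
            touched[chip].add((x // BLOCK, y // BLOCK))
    return sum(len(blocks) for blocks in touched)

if __name__ == "__main__":
    print("32x32 tiles:", blocks_fetched(32, 32))
    print("alternating scanlines:", blocks_fetched(W, 1))
```

With alternating scanlines every 4x4 block straddles lines owned by both chips, so in this model each chip fetches the whole texture and the total traffic is twice that of the tiled split.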

Marco wrote: BTW if you use tilers the different chips would not need to render the screen in strict order, if a tile is done you move on to the next tile ... no stalls no hassle.
I'm not sure that's a good idea because then you'd either have to distribute the database to both chips' local memory or have a shared memory system.
_________________
"Your work is both good and original. Unfortunately the part that is good is not original and the part that is original is not good." - Samuel Johnson


[ This Message was edited by: Simon F on 2002-03-07 09:26 ]
 
That depends on where the tris have to come from. To the tiler it doesn't really matter whether they were sent to it beforehand or whether it gets them on the fly during rendering.
 