Opinions needed on this Interview

neliz said:
Reverend said:
HDR doesn't mean just the sky; it should affect the entire scene.

Yes, but in the FC example it would be most noticeable and work intensive in the top half of your screen, say around trees...
Well, if that's the way you see it...

What about full-scene spatial anti-aliasing then? Less jagged edges would probably be most noticeable... but is specifically getting rid of the jaggies more "work intensive" for a chip when FSAA is specified?

Perhaps you don't have the correct perception of how HDR works (or how it relates to tiling)?
 
Tridam said:
neliz said:
Reverend said:
HDR doesn't mean just the sky; it should affect the entire scene.

Yes, but in the FC example it would be most noticeable and work intensive in the top half of your screen, say around trees..

I just want tiling..

It's not because something is more noticeable that it requires more work and it's not because something is not noticeable that it requires no work.

Exactly, HDR works on the entire scene regardless of what can be noticed or not. Tiling will cause issues with HDR, as it is a huge texture. On SLI systems, or any system for that matter, it is preferred to use a multiple render target system; this will cut down on bandwidth requirements as smaller textures are used.

This is the reason why you don't see a substantial performance increase on SLI systems with Far Cry with HDR on. Secondly, their levels weren't made with HDR in mind, which causes page flipping.
 
neliz said:
Let's say that one card can't finish all its tiles; would it be able to drop the output for one frame to work on the next one, simply because it has no time to finish that frame?
It would then signal the master card that workload is too high and it wants less work for the next frame.

Dropping frames seems highly inefficient (and can introduce all sorts of oddities I would imagine) as you would've wasted the time spent on the other card finishing its portion of the work. I also don't see how this improves minimum or average framerates. Using the previous frame's completion times to determine the workload split of the upcoming frame is still the best compromise I've seen so far regarding load-balancing.
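
A minimal sketch of that idea, assuming a two-card split-frame setup in which each card reports how long its portion of the last frame took. The function name and parameters are made up for illustration and are not taken from any driver:

```python
def next_split(split, time_a, time_b):
    """Return card A's share of the next frame (0..1).

    split  : fraction of the screen card A rendered last frame (0 < split < 1)
    time_a : time card A needed for its portion of the last frame
    time_b : time card B needed for its portion of the last frame
    """
    # Observed throughput: screen fraction rendered per unit of time.
    rate_a = split / time_a
    rate_b = (1.0 - split) / time_b
    # Hand each card a share proportional to its observed throughput,
    # so both would finish together if the next frame costs the same.
    return rate_a / (rate_a + rate_b)


if __name__ == "__main__":
    # Even split last frame, but card B took twice as long:
    # card A should get roughly two thirds of the next frame.
    print(next_split(0.5, 10.0, 20.0))
```

This only reacts after the fact, so a sudden change in scene cost still leaves one card idle for a frame, which is why it is a compromise rather than a fix.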
 
Reverend said:
Well, if that's the way you see it...

What about full-scene spatial anti-aliasing then? Less jagged edges would probably be most noticeable... but is specifically getting rid of the jaggies more "work intensive" for a chip when FSAA is specified?
Bad example actually, because more edges are indeed more work.
 
Xmas said:
Reverend said:
Well, if that's the way you see it...

What about full-scene spatial anti-aliasing then? Less jagged edges would probably be most noticeable... but is specifically getting rid of the jaggies more "work intensive" for a chip when FSAA is specified?
Bad example actually, because more edges are indeed more work.
Even if we're talking SS?
 
trinibwoy said:
Dropping frames seems highly inefficient (and can introduce all sorts of oddities I would imagine) as you would've wasted the time spent on the other card finishing its portion of the work. I also don't see how this improves minimum or average framerates. Using the previous frame's completion times to determine the workload split of the upcoming frame is still the best compromise I've seen so far regarding load-balancing.

Sorry, meant to say "tile" instead of frame. You don't want to drop frames of course, but some tiles (in a 16x16 environment) would not be noticeable...

Anyway.. we'll see.. just waiting to see a final set-up and ATI demonstrate its favourite Colin McRae '05..
 
[standard "I r dumb" disclaimer]

Going back to the tiling thing mentioned a few pages back, wrt multiple cards with different performances etc, surely the easy solution is just for the driver to have a table listing per-quad effective fill-rate (or whatever measure you want to use) for each card and a simple algorithm to work out the tile order based on this? So if you have card X and card Y, and card X has four quads at speed 1.25 while card Y has three quads at speed 1.0, then card X can do 4*1.25 for every 3*1 card Y can do, giving you a 5:3 workload ratio between cards. That way, LtR your ideal order for (X) tiles will be something like this:
Q1Q5Q2Q3Q6Q4Q7Q1|Q2Q5Q3Q6Q4Q1Q7Q2|
And so on, where Q1-Q4 are card X's quads and Q5-Q7 are card Y's. (I'm simply laying quads from each card sequentially; there may be a more optimal way). That should give you instant load-balancing between cards of different powers with zero overhead while it's running and while keeping the balancing benefits of tiling. Or did I miss something important?
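
The ratio and ordering above can be sketched roughly as follows. This is only an illustration under the post's own numbers (four quads at 1.25 versus three quads at 1.0); the function name, the tuple layout and the "furthest behind its share" rule are my assumptions, and the exact order it produces differs slightly from the hand-written one:

```python
def tile_order(cards, n_tiles):
    """Assign tiles, left to right, to (card, quad index) pairs.

    cards: list of (name, quad_count, per_quad_speed) tuples (hypothetical layout).
    Each card receives tiles in proportion to quad_count * per_quad_speed.
    """
    weights = [quads * speed for _, quads, speed in cards]
    assigned = [0] * len(cards)
    order = []
    for _ in range(n_tiles):
        # Pick the card that would be least ahead of its fair share
        # after taking one more tile; ties go to the first card listed.
        i = min(range(len(cards)), key=lambda k: (assigned[k] + 1) / weights[k])
        name, quads, _ = cards[i]
        # Cycle through that card's quads in turn.
        order.append((name, assigned[i] % quads))
        assigned[i] += 1
    return order


if __name__ == "__main__":
    cards = [("X", 4, 1.25), ("Y", 3, 1.0)]  # weights 5.0 and 3.0, i.e. 5:3
    for card, quad in tile_order(cards, 16):
        print(card, quad)
```

The table is built once, so nothing needs recalculating while rendering, which is the "zero overhead" part of the argument.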
 
Why bother with a lookup table? Instead just detect if the boards have changed and then run a fill-rate test before allowing the user to activate dual board rendering and that test will give a performance ratio.
 
It'll be a close approximation regarding fillrate but what about bandwidth and other efficiency differences? There's also the issue of how flexible such an arrangement would be. Would they restrict it to within a certain generation or will R6xx cards be able to run with R5xx? If ATI goes with tiling across cards, I bet they'll require that similar cards be used.

Dave, that will give you a good approximation of the relative power of each board. But how does that help with load-balancing where the workload is not predictable?
 
Dave's method would really only work with the same generation of chips (e.g. an X800 XL combined with an X800 XT). I think it would not be so valid when dealing with other generations (e.g. X800 XT with an R520).

While fillrate is all well and good, I think those different products will show a big difference in shader processing power, which would therefore render their fillrates moot.
 
I can't think how many times I've said this before: it is highly unlikely that two different generations of boards will operate together (if they decide they want to enable two different boards in the same system in the first place, which is not that great for AFR), because there could well be quality differences between them: shader precisions, filtering algorithm differences, FSAA differences, etc. Even though those differences may be small, they may be noticeable when multiple chips/boards are rendering parts of the same frame or alternate frames.

As for calculating a performance ratio and apportioning the workload on a tiled system, you can still try to distribute the tiles between the boards across the screen so that it gives a reasonable inherent load balance (i.e. don't give one board an enormous block in one corner of the screen, but distribute the tiles rendered around the screen for each of the render devices).
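
One way to picture "distribute the tiles around the screen" is a fixed pattern that gives each board its share in every row and column. The sketch below assumes two boards called A and B, a 5:3 split and a per-row shift of 3 tiles; all of those names and numbers are placeholders, not anything ATI has described:

```python
def owner(tx, ty, share_a=5, total=8, row_shift=3):
    """Return "A" or "B" for the tile at column tx, row ty.

    Board A gets share_a out of every `total` consecutive tiles, spread
    evenly; each row is shifted by row_shift so that the columns do not
    turn into solid vertical stripes.
    """
    n = tx + row_shift * ty
    # Bresenham-style spreading: A owns tile n whenever its running
    # quota crosses an integer boundary at n.
    crosses = ((n + 1) * share_a) // total > (n * share_a) // total
    return "A" if crosses else "B"


def tile_map(cols, rows, **kwargs):
    """Build a text picture of the ownership pattern, one string per row."""
    return ["".join(owner(x, y, **kwargs) for x in range(cols)) for y in range(rows)]


if __name__ == "__main__":
    for row in tile_map(8, 8):
        print(row)
```

With these defaults every row and every column of an 8x8 grid holds five A tiles and three B tiles, so an expensive corner of the screen is shared between the boards in the same ratio as the whole frame.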
 
Well I posted this comparison of fill-rates earlier:

b3d15.gif


You can see the raw data it's based on here:

http://www.beyond3d.com/forum/viewtopic.php?p=505031#505031

I'm not sure where the threshold lies (in terms of percentages), beyond which you simply want to ignore the weaker card. You've also got to compare pixel shader efficiency and hope that you're not vertex limited on the weaker card.

If a generation of cards (e.g. X8xx) lasts 2 years, I suppose it could be attractive to mix an early mid-range card with an end-of-line high-end card that only costs mid-range money 2 years after the generation started.

MVP, if it comes in at the start of R520, might be the best generation to test this with. With M$ making noises about Longhorn "definitely being here by Christmas 2006" (oh dear!) it seems likely to me you'll be able to buy a high-end R5xx based card in the spring of 2007 for mid-range money. Question is, will you want to?...

Jawed
 
DaveBaumann said:
Why bother with a lookup table? Instead just detect if the boards have changed and then run a fill-rate test before allowing the user to activate dual board rendering and that test will give a performance ratio.

Fair enough; doesn't matter particularly how the data's acquired, and if it's fairly straightforward to do automatically I agree it makes sense to do it that way rather than rely on device IDs and such. The main point is that if you can work out performance ratios between cards and between all the available quads, it should be reasonably trivial to have a tiling system which runs all quads/cards at full capacity while "load-balancing" everything (by virtue of the tiling system) without any additional calculations during rendering (in contrast with load-balanced SFR). Seems like a pretty useful arrangement to me.

[edit] Even if it's only doing trivial amounts of work (say 10% of total rendering), if there's no real overhead involved I don't see why you'd ever want to ignore the weaker card, as in the 10% example you should probably still get a 5-10% speed boost.
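
For what it's worth, the arithmetic behind that guess can be sketched like this. It assumes the weaker card finishes its share no later than the stronger one, and the `overhead` parameter is a hypothetical stand-in for whatever recombining the image costs:

```python
def tiled_speedup(weak_share, overhead=0.0):
    """Speed-up over the strong card rendering the whole frame alone.

    weak_share : fraction of the frame's work handed to the weaker card (0..1)
    overhead   : extra time per frame, as a fraction of the single-card
                 frame time (hypothetical; 0.0 means a free recombine)
    """
    # The strong card now renders only (1 - weak_share) of the frame,
    # plus whatever fixed cost the multi-card setup adds.
    return 1.0 / ((1.0 - weak_share) + overhead)


if __name__ == "__main__":
    print(tiled_speedup(0.10))        # no overhead
    print(tiled_speedup(0.10, 0.05))  # some overhead
    print(tiled_speedup(0.10, 0.10))  # overhead equal to the work saved
```

With no overhead a 10% share works out to roughly an 11% boost, and the gain disappears entirely once the overhead is as large as the share handed off.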

Presumably recombining the image is not a big problem? I'd always assumed that the horizontal split-line was to make it easier to recombine signals*, but it sounds like this was more to do with it being a better solution.

*Originally formulated with the Alienware setup, in which I'd imagine horizontal combining would be more problematic as, as far as I could tell, they were just recombining output signals, which would probably be easier to do horizontally as that's the way scan lines run


Anyway, feel free to ignore any of this if it's not interesting, I'm just rambling at this point.
 
Charmaka said:
[edit] Even if it's only doing trivial amounts of work (say 10% of total rendering), if there's no real overhead involved I don't see why you'd ever want to ignore the weaker card, as in the 10% example you should probably still get a 5-10% speed boost.

Such a relatively slow card is useless for AFR.
 
trinibwoy said:
Charmaka said:
[edit] Even if it's only doing trivial amounts of work (say 10% of total rendering), if there's no real overhead involved I don't see why you'd ever want to ignore the weaker card, as in the 10% example you should probably still get a 5-10% speed boost.

Such a relatively slow card is useless for AFR.

When you're tiling the two cards together to make one frame, AFR isn't even in the picture.

On the other hand, sending the data from the weak card, if it's slow enough, might be slower than the faster card just generating those tiles itself. e.g. imagine RV515 and R580, 1 quad and 6 quads respectively, the former with a 64-bit bus, the latter with 256-bit bus (yeah, I'm making all this up). Looks doomed to me.

Jawed
 
Jawed said:
trinibwoy said:
Such a relatively slow card is useless for AFR.

When you're tiling the two cards together to make one frame, AFR isn't even in the picture.

Yeah I know. What I meant was that nobody would bother with a card combination that works with tiling but isn't viable for AFR.
 
It's highly unlikely that ATI will apportion themselves more work with a multi-participant consumer solution than they absolutely have to.

It'll most likely be pairing identical SKUs and that'll be that, at least initially. Either way, we'll see soon.
 
If it's ok, bumpski for this Hexus article which I can't see being discussed anywhere obvious. Obviously it's not "confirmed" as such, but then what is?

Interesting points:
-The name Hexus is putting forward is Multi Video Processing
-It's tile-based (although may support AFR too), and incidentally min tile size is 32 pixels
-It requires a "master" card which must have the appropriate technology enabled - not available on the market yet
-It will work with any R480-derived "PCIe" card, including ATI onboard graphics
-Obviously this means they don't have to be the same model, as they're only talking about X850 masters at this stage
-The hardware to do this (master? or just slave?) has been "built into every top-end card since the 9700 Pro"
-It may accommodate three or more VPUs (for example two PCIe cards and an ATI onboard graphics chip)
-It will include an external dongle connector for at least some card combinations
-Core logic compatibility hasn't been announced yet
 