AFR: Preferred SLI Rendering Mode

pcchen said:
It automatically solves many problems, including load balancing, geometry scaling problem, and render-to-texture problem.
The render-to-texture problem can still exist for effects requiring persistent buffers. Motion trails (not real motion blur, but the fake kind, e.g. NFS:U), simulations for water or cloth, cube maps that are updated every few frames, etc. Ironically, these are exactly the type of effects that make the 6x00 series more attractive to me than the Xx00 series.
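
To make that dependency concrete, here's a minimal sketch of AFR frame assignment, assuming two GPUs and an effect that reads the buffer written the previous frame (e.g. a fake motion-trail accumulator); it's illustrative only, not any driver's actual logic:

```python
# Minimal sketch of why persistent render targets hurt AFR.
# Assumes 2 GPUs and a simple "frame N reads the buffer written in frame N-1"
# dependency (e.g. a fake motion-trail accumulation buffer).

NUM_GPUS = 2

def afr_gpu(frame):
    """Alternate Frame Rendering: even frames on GPU 0, odd frames on GPU 1."""
    return frame % NUM_GPUS

def needs_cross_gpu_copy(frame):
    """The trail buffer written last frame lives on the *other* GPU,
    so it must be transferred (or the GPUs must serialize) before use."""
    if frame == 0:
        return False
    return afr_gpu(frame) != afr_gpu(frame - 1)

for frame in range(6):
    print(f"frame {frame}: rendered on GPU {afr_gpu(frame)}, "
          f"cross-GPU copy of trail buffer needed: {needs_cross_gpu_copy(frame)}")
```

With two GPUs the copy is needed on every frame after the first, which is exactly the inter-frame dependency that defeats AFR's "each chip works independently" assumption.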

MfA is right. This is not a true multichip solution, just a way to get some cream from hardcore gamers with lots of cash. The problems I mentioned don't come up that often, so this is the easiest and cheapest route for scalable performance.

Rev, the only reason to make multiple chips for single board solutions is to save costs so that you can use the same chip across your product line. However, you increase costs simultaneously through increased board costs and engineering time to get synchronization sorted out.

In the end, the route ATI and NVidia took with disabling quads is the most sensible approach, as you get just about the same savings in cost, and you don't have to worry about the engineering aspects of two separate chips. The only problem is that you can't control how many chips you get with 8, 12, or 16 functional pipes, so you can't get enough low end chips very easily. But you usually want a different design for low end chips anyway.
 
I just thought I'd add that when you are rendering a frame with your GPU, your CPU can already be starting to calculate data for the graphics card. So if you're CPU limited, then any latency associated with this will be hidden by pipelining.
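
As a rough illustration (the per-frame millisecond costs and the extra-latency figure below are made up purely for the example):

```python
# Rough sketch of CPU/GPU pipelining. While the GPU renders frame N,
# the CPU is already building frame N+1, so extra submission latency is
# hidden as long as the CPU is the bottleneck. All numbers are assumed.

CPU_MS = 20.0            # assumed CPU time to prepare one frame
GPU_MS = 12.0            # assumed GPU time to render one frame (CPU-limited case)
EXTRA_LATENCY_MS = 2.0   # assumed extra latency from the multi-GPU setup

def frame_time_serial():
    # No overlap: everything adds up.
    return CPU_MS + GPU_MS + EXTRA_LATENCY_MS

def frame_time_pipelined():
    # With overlap, throughput is set by the slowest stage only.
    return max(CPU_MS, GPU_MS + EXTRA_LATENCY_MS)

print("serial frame time:   ", frame_time_serial(), "ms")
print("pipelined frame time:", frame_time_pipelined(), "ms")  # latency hidden behind the 20 ms CPU stage
```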
 
Reverend said:
MfA said:
The problem is that all these methods are driver hacks more than anything else, just an afterthought to push a few more boards to niche markets for which it isn't worth spending any real money during hardware design.
So 3dfx's "SLI design from the get go (best on a single board)" should be adopted by the current IHVs? Wonder why neither ATI nor (especially) NVIDIA has adopted this since 1999...

Because nVidia doesn't have the rights. SLI (Scanline Interleave) belongs to an individual who does not work for nVidia. This is why we're seeing "SLI" in a split-frame rendering mode, instead of the "expected" rendering of alternate lines of the frame.
 
Lecram25 said:
Reverend said:
MfA said:
The problem is that all these methods are driver hacks more than anything else, just an afterthought to push a few more boards to niche markets for which it isn't worth spending any real money during hardware design.
So 3dfx's "SLI design from the get go (best on a single board)" should be adopted by the current IHVs? Wonder why neither ATI nor (especially) NVIDIA has adopted this since 1999...

Because nVidia doesn't have the rights. SLI (Scanline Interleave) belongs to an individual who does not work for nVidia. This is why we're seeing "SLI" in a split-frame rendering mode, instead of the "expected" rendering of alternate lines of the frame.

I wouldn't really consider SLI the best; lots of work is being recalculated. It's pretty close to the best when T&L isn't being handled on chip but elsewhere, so T&L operations aren't redundantly occurring (resulting in a T&L bottleneck). SLI also makes final frame composition more of a pain (either it breaks hierarchical framebuffers or it requires a lot of memory bandwidth each frame to recopy from both render buffers, alternating between lines). And if the pixel rendering part requires any triangle setup, then that work is also being redone (hardware implementations vary on whether any triangle setup occurs within the pixel shader/TMU/pixel pipeline segment).
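
A quick back-of-the-envelope sketch of that redundant setup (the random triangle heights and the 10,000-triangle count are assumptions, purely for illustration): any triangle that touches both even and odd scanlines has to be set up on both chips.

```python
# Sketch of why scanline interleave duplicates per-triangle work:
# two chips own the even and odd scanlines, so any triangle spanning
# more than one line needs setup on both. Numbers are illustrative only.
import random

random.seed(0)
HEIGHT = 1024

def scanlines_touched(y0, y1):
    """Range of scanlines a triangle's vertical extent covers."""
    return range(int(y0), int(y1) + 1)

triangles = []
for _ in range(10000):
    y = random.uniform(0, HEIGHT - 1)
    h = random.uniform(1, 40)          # assumed typical on-screen triangle height
    triangles.append((y, min(y + h, HEIGHT - 1)))

both = 0
for y0, y1 in triangles:
    owners = {line % 2 for line in scanlines_touched(y0, y1)}
    if len(owners) == 2:               # triangle spans even *and* odd lines
        both += 1

print(f"{both / len(triangles):.1%} of triangles need setup on both chips")
```

With one-scanline interleaving essentially every triangle lands on both chips, which is why the setup work is nearly all duplicated.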
 
HierZ / onchip ZCULL routines shouldn't be broken by SLI - in fact, if you have stepped band or tile sizes (not the load-balanced method in this case), then the HierZ / ZCULL can just operate on those regions, allowing higher frame buffer sizes to still get the full benefit of the onboard storage.

Fixed / known (when an app starts / changes resolution) split screen / tile sizes can also save on framebuffer space, allowing for more texture space, as the buffer is sized to the portion of the screen being rendered on each board.
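
As a rough illustration of the saving (the resolution, bit depths and double buffering below are assumed, not any particular board's memory layout):

```python
# Sketch of the framebuffer-space saving with a fixed screen split.
# Assumes 1600x1200, 32-bit colour + 24/8 depth-stencil, double buffered;
# these figures are illustrative, not any specific board's layout.

WIDTH, HEIGHT = 1600, 1200
BYTES_PER_PIXEL = 4 + 4          # colour + depth/stencil
BUFFERS = 2                      # double buffering

def framebuffer_bytes(height_fraction):
    rows = int(HEIGHT * height_fraction)
    return WIDTH * rows * BYTES_PER_PIXEL * BUFFERS

full = framebuffer_bytes(1.0)    # each board holds buffers for the whole screen
half = framebuffer_bytes(0.5)    # each board holds only its fixed half

print(f"full-screen buffers per board: {full / 2**20:.1f} MiB")
print(f"fixed-half buffers per board:  {half / 2**20:.1f} MiB")
print(f"memory freed for textures:     {(full - half) / 2**20:.1f} MiB")
```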

Yes, the other disadvantages are still there and these benefits can't be realised with the load balanced method either.
 
DaveBaumann said:
HierZ / onchip ZCULL routines shouldn't be broken by SLI - in fact, if you have stepped band or tile sizes (not the load-balanced method in this case), then the HierZ / ZCULL can just operate on those regions, allowing higher frame buffer sizes to still get the full benefit of the onboard storage.

Thanks, I see my fallacy was that I was for some reason tying the data structure of the Z-buffer to the framebuffer. I was therefore thinking you need to keep the Z-buffer contiguous for the framebuffer to be contiguous. Not sure why I was thinking that at the time (probably because I was working on a project proposal, thinking about the data structures I'll need to implement, and had z and color associated as being in the same structure).
 
DaveBaumann said:
Yes, the other disadvantages are still there and these benefits can't be realised with the load balanced method either.
Well, I don't know about that. One of the problems with SLI is that, due to proximity, there's a lot of texture data that ends up being shared between the two chips/cards. That is, the texture cache isn't used to maximum effect because there's too much texture data on the boundary between the two chips (this is obviously worst when the lines are one pixel wide, as I believe was the case with the original SLI implementation, and gets better with wider lines).
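
A crude model of that boundary sharing (the screen height and filter-footprint numbers are assumptions, just to show the trend): the thinner the strips, the more texture rows get fetched by both chips.

```python
# Rough model of duplicated texture fetches vs. strip height in interleaved/banded SLI.
# Assumes roughly screen-aligned texturing and a small filter footprint; purely illustrative.

HEIGHT = 1024          # screen height in scanlines
FOOTPRINT = 2          # assumed rows of texels pulled in around each strip boundary

def duplicated_fraction(strip_height):
    """Fraction of texture rows fetched by *both* chips.

    Each strip boundary causes roughly FOOTPRINT rows of texels to be
    fetched on both sides, so thinner strips (more boundaries) waste more
    cache and bandwidth.
    """
    boundaries = HEIGHT // strip_height
    duplicated_rows = min(boundaries * FOOTPRINT, HEIGHT)
    return duplicated_rows / HEIGHT

for strip in (1, 2, 8, 32, 128):
    print(f"strip height {strip:4d}: ~{duplicated_fraction(strip):.0%} of texture rows fetched twice")
```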

The #1 benefit of an SLI method is that load balancing is pretty much automatic. On almost every frame, each video card will be computing almost the exact same amount of data. So, the technique is basically a way of sacrificing efficiency for keeping both chips active as much as possible in a trivial fashion.

So, in essence, if you can get another method to work that is fundamentally more efficient, and can overcome the load balancing issues to a large enough degree, it's going to be more efficient than basic SLI.
 
DaveBaumann said:
HierZ / onchip ZCULL routines shouldn't be broken by SLI - in fact, if you have stepped band or tile sizes (not the load-balanced method in this case), then the HierZ / ZCULL can just operate on those regions, allowing higher frame buffer sizes to still get the full benefit of the onboard storage.
True, but it is going to be less efficient, unless the SLI lines are as wide or wider than the tiles.
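
A quick sketch of that, assuming 8-scanline HierZ tiles (an illustrative figure, not any specific chip's tile size): once the SLI bands are at least as tall as the tiles and aligned to them, no tile is split between chips.

```python
# Fraction of hierarchical-Z tiles that straddle both chips' regions,
# as a function of SLI band height. Tile size is an assumed 8 scanlines.

HEIGHT = 1024
TILE = 8               # assumed HierZ tile height in scanlines

def straddling_tiles(band_height):
    straddle = 0
    tiles = HEIGHT // TILE
    for t in range(tiles):
        rows = range(t * TILE, (t + 1) * TILE)
        owners = {(row // band_height) % 2 for row in rows}   # which chip owns each row
        if len(owners) == 2:
            straddle += 1
    return straddle / tiles

for band in (1, 2, 4, 8, 16, 64):
    print(f"band height {band:3d}: {straddling_tiles(band):.0%} of HierZ tiles split between chips")
```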
 