More SLI

That's pretty interesting, since it seems like 3DMark03 is so highly geometry-limited on these boards that you wouldn't think that you could get that much higher. Perhaps nVidia's drivers are effectively sharing geometry power between the two boards (Maybe the SLI link that connects the cards allows nVidia to only send each piece of geometry once? That would require a good bit of caching...).
 
You might want to consider what modes are available in the SLI drivers and what is most likely to yield the highest gains in geometry limited situations.
 
Well, yeah, I suppose they could simply be using alternate frame rendering. This isn't the best solution for a real game, unfortunately, but may offer the best performance here.
 
max-pain said:
http://www.gamespot.com/news/2004/10/08/news_6110146.html

Nvidia SLI benchmark results

...
The Opteron 250 GeForce 6800 GT SLI system scored an impressive 8,081 points in the recently released 3DMark05 benchmark. In comparison, a single GeForce 6800 GT scored 4,452 points on a slower 2.8GHz Intel Pentium 4 processor in a recent GameSpot 3DMark05 test. The SLI system also scored 18,176 points in 3DMark03, roughly 7,000 points higher than the average score for a single GeForce 6800 GT card.
...

How did they go from 7000 on 2xUltra to 8000 on 2xGT ?
 
trinibwoy said:
How did they go from 7000 on 2xUltra to 8000 on 2xGT ?

Changed SLI mode to Alternate Frame Rendering ?

 
Bjorn said:
Changed SLI mode to Alternate Frame Rendering ?

I saw that but that would be a ridiculous difference in efficiency between modes. Based on several reviews a single Ultra is approx 10-15% faster than the GT in 3dmark05. So I would expect 2xUltra to be approx 10% faster than 2xGT. Now you're saying that AFR mode on 2xGT not only overcomes that 10% deficit but adds an additional 14% on top of that ? And how do we know that the Ultra score wasn't in AFR also? And how do we know that the GT score was even AFR in the first place? Aaaaargh!!!!
 
Bjorn said:
trinibwoy said:
.. And how do we know that the GT score was even AFR in the first place? Aaaaargh!!!!

Could just be immature SLI drivers of course.

The age-old plaster for every 3D sore :) I'm happy to see this though - the earlier score on 2xUltra was a bit disheartening. Also, I would think that Nvidia's SLI should work best on Nvidia's SLI-capable chipset, so maybe the platform is a big factor too.
 
trinibwoy said:
I saw that but that would be a ridiculous difference in efficiency between modes. Based on several reviews a single Ultra is approx 10-15% faster than the GT in 3dmark05. So I would expect 2xUltra to be approx 10% faster than 2xGT. Now you're saying that AFR mode on 2xGT not only overcomes that 10% deficit but adds an additional 14% on top of that ? And how do we know that the Ultra score wasn't in AFR also? And how do we know that the GT score was even AFR in the first place? Aaaaargh!!!!
If indeed the difference was a choice between AFR and split-screen rendering, I would definitely expect a large difference between the scores. Since it appears that 3DMark05 is heavily geometry-limited, the fact that with AFR you're not going to be processing any geometry twice could lead to a tremendous increase in performance.

Alternatively, as Bjorn stated, it could simply be that the SLI drivers have been updated to remove some inefficiencies that existed when that previous score from the SLI'd Ultras was released.
 
Chalnoth said:
If indeed the difference was a choice between AFR and split-screen rendering, I would definitely expect a large difference between the scores. Since it appears that 3DMark05 is heavily geometry-limited, the fact that with AFR you're not going to be processing any geometry twice could lead to a tremendous increase in performance.

Understandable but I still have some doubts.

1. Wouldn't culling significantly reduce the overlap in geometry processed by both cards? I can't see such a small overlap accounting for such a large boost. Or is there additional overhead associated with split-screen rendering?
2. I saw an earlier comment about AFR not being ideal for games. Why is that?
 
1.
The only culling that can be done is post vertex shader processing, and will thus not save geometry calculation. (If a higher level API was used, it could be done better, but that won't happen.)

2.
AFR will increase framerate, but not decrease latency. So while it might look better for a spectator, it won't help the feeling while playing.
 
Basic said:
1.
The only culling that can be done is post vertex shader processing, and will thus not save geometry calculation. (If a higher level API was used, it could be done better, but that won't happen.)

Huh? I thought culling was one of the earliest and most basic operations in the 3D pipeline?

2.
AFR will increase framerate, but not decrease latency. So while it might look better for a spectator, it won't help the feeling while playing.

I think I understand this but care to add some color?
 
trinibwoy said:
2.
AFR will increase framerate, but not decrease latency. So while it might look better for a spectator, it won't help the feeling while playing.

I think I understand this but care to add some color?
Because the length of time it takes to render a frame is not diminished (the frames are just staggered a bit so that one card renders one frame and the other renders the next), it's going to feel as if you're still playing at 20FPS even when the counter reads 40FPS--everything will still have that 20FPS delay.
 
With AFR you are always deferring the rendering by a (another) frame.

With "immediate mode rendering" (i.e. most normal rasterisers and the SLI setup in split screen or true SLI solutions) the CPU starts sending the relevant data for that frame and the board will start processing the geometry as it comes in (more or less; I believe there is a bit less than a frame of delay under normal circumstances). With AFR, however, the rendering is delayed a frame - on the first frame the CPU sends the data to board 1, and when that frame of data is finished board 1 begins rendering while the CPU begins sending the data for the next frame to board 2. When board 1 has finished rendering frame 1 and the CPU finishes sending the data for frame 2 to board 2, board 2 renders that frame and the CPU starts sending the data for frame 3 to board 1.

The issue here is that you are always seeing one frame behind the rendering.
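The staggering described above can be sketched with a toy timeline. This is purely an illustration with made-up numbers: a 50 ms frame per card, and the optimistic assumption that split-screen rendering perfectly halves frame time (which, as discussed earlier in the thread, it won't when geometry-limited):

```python
# Toy timeline comparing split-frame rendering (SFR) and alternate
# frame rendering (AFR) on two GPUs. Assumes each GPU needs 50 ms
# for a full frame, and that SFR perfectly halves that by splitting
# the screen -- both figures are illustrative, not measured.
GPU_FRAME_MS = 50.0

def sfr_latency(n_frames):
    # Both GPUs work on the same frame, so each frame takes ~25 ms
    # and is displayed as soon as it is done.
    frame_time = GPU_FRAME_MS / 2
    return [(i * frame_time, i * frame_time + frame_time) for i in range(n_frames)]

def afr_latency(n_frames):
    # GPUs alternate whole frames: frame i starts as soon as the card
    # that owns it (i % 2) is free. Dispatch is staggered so a new
    # frame can begin every 25 ms, but each still takes the full 50 ms.
    free_at = [0.0, 0.0]
    start_interval = GPU_FRAME_MS / 2
    out = []
    for i in range(n_frames):
        start = max(i * start_interval, free_at[i % 2])
        finish = start + GPU_FRAME_MS
        free_at[i % 2] = finish
        out.append((start, finish))
    return out

if __name__ == "__main__":
    for (s0, f0), (s1, f1) in zip(sfr_latency(4), afr_latency(4)):
        print(f"SFR: start {s0:5.1f} done {f0:5.1f} | AFR: start {s1:5.1f} done {f1:5.1f}")
```

Both setups finish a frame every 25 ms (i.e. the same framerate), but each AFR frame spends 50 ms in flight versus 25 ms for SFR - the extra frame of delay Dave describes.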
 
trinibwoy said:
Basic said:
1.
The only culling that can be done is post vertex shader processing, and will thus not save geometry calculation. (If a higher level API was used, it could be done better, but that won't happen.)

Huh? I thought culling was one of the earliest and most basic operations in the 3D pipeline?
For applications it's easy enough to do, because they (should) know about "objects" and "models". A 3D application can rather trivially cull a "model" by checking its bounding spheres or some such.

The driver does not have this bounding volume information. It can deduce it by looking at the geometry, but that's going to get ugly very quickly. The driver only knows about vertex buffers, stream sources and draw calls.

A vertex buffer can hold vertex data for multiple "objects", or multiple vertex buffers can be combined to source the vertex data for a single "object", or you can make a weird mixture of both. You could extract bounding volume information for some subsets of the live vertex buffers, but please trust my judgement when I tell you that it's both complicated and computationally expensive.

In "real world" situations drivers generally won't try to do that. During the time that's used to extract volume information, the chip could have just pulled a few extra millions of vertices, and there's no guarantee that the gained information is going to be reused. The vertex buffer might be discarded by the app immediately after the deed. It's just not a win. It might be a viable avenue for benchmark specific "optimization" though, but even then it would be much easier to just do in-house analysis instead of implementing all that crud in shipping driver code.
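The application-side bounding-volume test described above is cheap precisely because the app knows where its models are. A minimal sketch - the function names, plane convention, and numbers are all illustrative assumptions, not any particular engine's API:

```python
# Application-level culling sketch: the app knows each model's bounding
# sphere (center, radius) in world space, so it can reject a whole model
# against a plane with a single dot product, before any per-vertex work.
# Convention assumed here: points p on the visible side of the plane
# satisfy dot(normal, p) + d >= 0.

def signed_distance(plane_normal, plane_d, point):
    return sum(n * p for n, p in zip(plane_normal, point)) + plane_d

def sphere_visible(plane_normal, plane_d, center, radius):
    # Cull only if the sphere lies entirely on the far side of the plane.
    return signed_distance(plane_normal, plane_d, center) >= -radius

# Example: a horizontal plane y = 0, visible side y >= 0.
plane_n, plane_d = (0.0, 1.0, 0.0), 0.0
print(sphere_visible(plane_n, plane_d, (0.0, 5.0, 0.0), 1.0))   # well above: True
print(sphere_visible(plane_n, plane_d, (0.0, -0.5, 0.0), 1.0))  # straddling: True
print(sphere_visible(plane_n, plane_d, (0.0, -3.0, 0.0), 1.0))  # fully below: False
```

The driver would have to reconstruct `center` and `radius` from raw vertex buffers to do the same test, which is exactly the expensive analysis dismissed above.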
trinibwoy said:
2.
AFR will increase framerate, but not decrease latency. So while it might look better for a spectator, it won't help the feeling while playing.
I think I understand this but care to add some color?
AFR does not make rendering of a single frame any faster. To get any performance gain, the two chips must work on consecutive frames at the same time. The two frames will however not have been dispatched at the same time ...
 
Much thanks for all the responses. Nothing more humbling than a barrage of explanations on a technical subject :oops: - now I know how digi must feel :LOL:
 
Basic said:
1.
The only culling that can be done is post vertex shader processing, and will thus not save geometry calculation. (If a higher level API was used, it could be done better, but that won't happen.)
Not necessarily. You could, for example, transform the plane that divides the top from the bottom portion of the screen into world space, and determine with a simple dot product whether or not each vertex is above or below this plane. The only problem with this is that if you are geometry limited, that's quite a bit of processing to do. It's much more efficient to do it at a higher level where the comparison can be done via bounding boxes.
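The plane trick above can be sketched concretely. The key fact is that if points map to clip space through a matrix M, a clip-space plane pulls back to world space through M's transpose. The identity camera and the clip-space y = 0 split below are illustrative assumptions:

```python
import numpy as np

# Sketch: the plane splitting the top half from the bottom half of the
# screen is y = 0 in clip space. If points transform as
# p_clip = M @ p_world, then a clip-space plane n satisfies
# n . (M @ p_world) = (M^T @ n) . p_world, so the world-space plane is
# simply M^T @ n. Each vertex is then classified with one dot product.

def split_plane_world(view_proj):
    clip_plane = np.array([0.0, 1.0, 0.0, 0.0])  # y >= 0 is the top half
    return view_proj.T @ clip_plane

def in_top_half(plane, vertex_world):
    # vertex_world is homogeneous: (x, y, z, 1)
    return bool(plane @ vertex_world >= 0.0)

vp = np.eye(4)  # identity camera, just for the demo
plane = split_plane_world(vp)
print(in_top_half(plane, np.array([0.0,  2.0, 0.0, 1.0])))  # True
print(in_top_half(plane, np.array([0.0, -2.0, 0.0, 1.0])))  # False
```

As noted above, this costs a dot product per vertex, which is exactly the work you were trying to avoid when geometry-limited - hence the preference for bounding-box tests at a higher level.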

Basic said:
2.
AFR will increase framerate, but not decrease latency. So while it might look better for a spectator, it won't help the feeling while playing.
This is only partially true. You are only adding one extra frame of latency, but there are already a number of frames of latency in normal rendering anyway (about 2-3 in the driver, and 1-2 due to double or triple buffering). And since AFR roughly halves the frame interval, each of those existing frames of latency takes half as long, so total latency in milliseconds can actually drop even though it grows by one frame. So it will still be a benefit even in the case where the framerate without SLI would be low.
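To put rough numbers on this argument (both the frame time and the buffering depth below are made-up figures, in line with the "about 2-3 in the driver, 1-2 from buffering" estimate above):

```python
# Back-of-the-envelope: assume N frames of latency already exist in the
# pipeline (driver queue + buffering) and a single card takes T ms per
# frame. AFR adds one frame of latency but roughly halves the frame
# interval, so total latency in milliseconds can still drop.
def total_latency_ms(latency_frames, frame_interval_ms):
    return latency_frames * frame_interval_ms

T = 50.0  # single-card frame time: 50 ms (20 fps) -- illustrative
N = 4     # ~2-3 frames in the driver + ~1-2 from buffering

single = total_latency_ms(N, T)          # 4 frames * 50 ms = 200 ms
afr    = total_latency_ms(N + 1, T / 2)  # 5 frames * 25 ms = 125 ms
print(single, afr)  # 200.0 125.0
```

With these (invented) numbers the AFR rig still responds sooner overall, despite carrying one more frame of delay.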

No, the real problem isn't this, but rather syncing rendering between the cards. If you remember framerate graphs of the ATI Rage Fury MAXX, for example, with double buffering enabled it tended to sort of "ring," that is, every other frame was slow. That sort of effect would be very noticeable. Hopefully nVidia enables triple buffering by default with AFR enabled.
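The "ringing" can be pictured as alternating short/long frame intervals. A throwaway sketch with invented timings:

```python
# Sketch of the frame-pacing "ring" described above for the Rage Fury
# MAXX: with AFR and only double buffering, frames reach the screen in
# bunched pairs, so intervals alternate short/long instead of holding a
# steady cadence. All numbers are made up for illustration.
def present_times(n_frames, short_ms=10.0, long_ms=40.0):
    t, times = 0.0, []
    for i in range(n_frames):
        # one card's frame arrives right behind the other's, then a gap
        t += long_ms if i % 2 == 0 else short_ms
        times.append(t)
    return times

times = present_times(8)
intervals = [b - a for a, b in zip(times, times[1:])]
print(intervals)  # alternates 10/40 ms rather than a steady 25 ms
```

The average rate looks healthy, but smoothness is dominated by the long intervals - which is why every-other-frame stutter is so noticeable.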

Another problem with AFR is, of course, that you don't get the memory size savings of splitting the framebuffer between the two video cards. This will, in some games, make it a bit harder to run at the higher resolution/FSAA setting that you would expect SLI to get you.
 