More SLI

It has to. You don't want the CPU and the GPU to be working on the same frame, so you have to have a double (or more)-buffering scenario between the CPU and GPU, just like you do between the GPU and monitor. Here's basically what you want:

Every time the GPU finishes a frame, there should be a frame from the CPU ready to render. So, the driver needs a buffer of frame data to be sent to the GPU. Additionally, when the CPU is done with its work, it wants to have an empty buffer of data so that it can continue working on the next frame.

Thus, for smooth operation, you want to have:
1) The frame the GPU is working on
2) The frame waiting to be sent to the GPU
3) The frame the CPU just finished
4) The frame the CPU is currently working on

If you examine this, you should notice that the situation I posted above really is an optimal situation, with buffer (3) perpetually empty. But also keep in mind that the situation I posted was 33 fps, which is on the low edge of interactive framerates. Better framerates obviously mean lower latencies.
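
Just to put rough numbers on that (a toy model of my own, not anything a driver actually does): if the CPU and GPU each take a full frame time, input sampled in stage (4) shows up on screen roughly four frame times later.

Code:
FPS = 33
frame_time_ms = 1000 / FPS            # ~30 ms per frame at 33 fps

stages_in_flight = 4                  # (4) CPU, (3) done, (2) queued, (1) GPU
latency_ms = stages_in_flight * frame_time_ms

print(f"~{latency_ms:.0f} ms from input to the end of GPU rendering")
# -> ~121 ms, before the frame even reaches scanout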

As a side comment, a classic situation where this particular way of rendering became horribly noticeable was with the original Unreal. One of the problems with the original Unreal, early on, was that some Direct3D drivers produced what was reported in the game as "mouse lag," which was really just too many frames being buffered before rendering, resulting in a very noticeable lag between input and display. DirectX has since set limits on the number of frames that can be buffered, which I seem to recall is in the range of 5-6, hence my previous statement.

With this said, what you should notice is that if the framerate is low, you really want to have as little latency as possible, and thus fewer frames cached. But if the framerate is high, it becomes very desirable to have more frames cached so that the framerate stays high.

And lastly, do not forget that the ping seen in online multiplayer games really isn't a good indication of what sort of latency is noticeable to us, since bandwidth is also a significant factor.
 
Chalnoth said:
It has to. You don't want the CPU and the GPU to be working on the same frame, so you have to have a double (or more)-buffering scenario between the CPU and GPU, just like you do between the GPU and monitor. Here's basically what you want:

Every time the GPU finishes a frame, there should be a frame from the CPU ready to render. So, the driver needs a buffer of frame data to be sent to the GPU. Additionally, when the CPU is done with its work, it wants to have an empty buffer of data so that it can continue working on the next frame.

Thus, for smooth operation, you want to have:
1) The frame the GPU is working on
2) The frame waiting to be sent to the GPU
3) The frame the CPU just finished
4) The frame the CPU is currently working on
0) The frame currently being displayed

This is far too much latency. If you take your 33 fps as an example, the player's input would not be shown for 4 frames according to your flow. That's 1/8 of a second latency and would give a very poor experience.

I believe that steps 2 and 3 in your flow should be the same step. That's enough queued frames to keep everyone busy.
 
Well, not quite, because each step doesn't necessarily need to take the full 33ms. Specifically, there's no a priori reason why buffers 2 and 3 must always have data. But if you don't have both, and the data load is roughly equal, then you're going to have more stalls. But you may want to have the driver optimize for the case where the system is perpetually GPU-limited, in which case you'd really want to cut out 3.

That said, I don't see why 1/8th of a second would be all that terrible. This is, after all, approximately the amount of time it takes for our eye to record and send one full picture. And don't forget that this is just 33 fps. 60 fps is often considered a much better framerate, and that's typically not because 33 fps would look bad in a non-interactive scenario. Most gameplay is just too chaotic for there to be much difference between 60 fps and 30 fps if it weren't interactive. Latency is what is typically noticed. Therefore, if we go by your claim that 1/8th of a second would be noticeable, then that would account for the significant difference people notice between 60 fps and 30 fps. Add to that that 20 fps is where latency starts getting really bad (I claim this would be a total latency averaging about 150-175 ms), and I think it makes sense.
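
For what it's worth, here is one way to arrive at that 150-175 ms figure (this is my reading of the argument, not arithmetic spelled out in the post):

Code:
frame_time_ms = 1000 / 20                 # 50 ms per frame at 20 fps

base_latency_ms = 3 * frame_time_ms       # roughly three frames in flight: 150 ms
avg_input_delay_ms = frame_time_ms / 2    # input lands mid-frame on average: +25 ms
total_ms = base_latency_ms + avg_input_delay_ms

print(base_latency_ms, total_ms)          # 150.0 175.0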
 
Chalnoth said:
Well, not quite, because each step doesn't necessarily need to take the full 33ms.
It doesn't matter. Everything is gated by the slowest step. You can't queue up more frames if the HW is backed up, and you can't generate more frames if you are CPU limited. In fact, you'll get lower latency if you are CPU limited, because then the HW can't get behind the CPU. Framerate won't be improved, of course.
Specifically, there's no a priori reason why buffers 2 and 3 must always have data. But if you don't have both, and the data load is roughly equal, then you're going to have more stalls. But you may want to have the driver optimize for the case where the system is perpetually GPU-limited, in which case you'd really want to cut out 3.
If you're not GPU limited, then you won't be queuing up extra frames as the HW will be waiting for the CPU.
That said, I don't see why 1/8th of a second would be all that terrible. This is, after all, approximately the amount of time it takes for our eye to record and send one full picture. And don't forget that this is just 33 fps.
It makes a huge difference. I can sense much smaller latencies than that, because my mouse cursor or crosshair won't track with my inputs.
60 fps is often considered a much better framerate, and that's typically not because 33 fps would look bad in a non-interactive scenario. Most gameplay is just too chaotic for there to be much difference between 60 fps and 30 fps if it weren't interactive. Latency is what is typically noticed. Therefore, if we go by your claim that 1/8th of a second would be noticeable, then that would account for the significant difference people notice between 60 fps and 30 fps. Add to that that 20 fps is where latency starts getting really bad (I claim this would be a total latency averaging about 150-175 ms), and I think it makes sense.
20 fps can be OK as long as you are not queuing up lots of frames. Again, being CPU limited means you'll have fewer frames queued, so latency should be lower; of course, framerates will stink at 20 fps.
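
A toy steady-state model of that point (entirely my own sketch, with a made-up two-slot queue between the CPU and GPU): throughput is set by the slower side, and the queue only fills up - adding latency - when the GPU is the slower side.

Code:
def steady_state(cpu_ms, gpu_ms, queue_slots=2):
    # Throughput is gated by the slowest step.
    frame_time = max(cpu_ms, gpu_ms)
    # GPU-limited: the CPU races ahead and keeps the queue full.
    # CPU-limited: the GPU drains the queue and then waits on the CPU.
    queued = queue_slots if gpu_ms >= cpu_ms else 0
    latency = cpu_ms + queued * frame_time + gpu_ms
    return frame_time, latency

print(steady_state(cpu_ms=20, gpu_ms=30))  # GPU-limited: (30, 110) - more lag
print(steady_state(cpu_ms=30, gpu_ms=20))  # CPU-limited: (30, 50) - less lag, same framerate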
 
Wouldn't the CPU lag/delay be diminished with a dual-core CPU? Both AMD and Intel seem to be on that path.

epic
 
It is exactly why I think SLI is not for gamers, but it should be very useful and maybe widely adopted in workstations.
 
PatrickL said:
It is exactly why I think SLI is not for gamers, but it should be very useful and maybe widely adopted in workstations.
Except SLI still decreases latency, even when working in AFR mode. Remember that AFR mode also isn't the only mode that nVidia's products work in (as I've stated, it's probably not the mode you'd want to use for games anyway).
 
According to the INQ, there's only going to be a $50 premium for SLI nForce4 over Ultra nForce4, whatever that is. It's supposedly available for Socket 754 also, but nothing definite on which flavors of nForce4 will be available for that socket. If there's an SLI 754, that's one less thing to whine about. No Soundstorm, though.

link
 
Why the hell should you bother paying any attention to the Inquirer? Any idiot could have guessed SLI boards would cost approximately $50 more.
 
trinibwoy said:
If there's an SLI 754, that's one less thing to whine about.

Thinking about it from NVIDIA's marketing perspective should enlighten everyone. Or you could just think about it like this: do NVIDIA's actions perpetually give people reason to whine? If so, why change the habit of a recent lifetime :LOL:

Rys
 
Chalnoth said:
Why the hell should you bother paying any attention to the Inquirer? Any idiot could have guessed SLI boards would cost approximately $50 more.

Rough day? Tell that to those who were foreseeing $100+ premiums.

Rys said:
Thinking about it from NVIDIA's marketing perspective should enlighten everyone. Or you could just think about it like this: do NVIDIA's actions perpetually give people reason to whine? If so, why change the habit of a recent lifetime :LOL:

Maybe, but whining about Nvidia is also a hard habit to break ;)
 
Richthofen said:
The opportunity to add a second card later on for a cheaper price, once the first one gets too slow, is driving sales.

Isn't this the trick a lot of people will run into problems with?
As far as I understood, you need two exactly identical cards for SLI to work.
I think finding the same card a few months later will be very difficult. (Or am I wrong?)
 
hjs said:
Richthofen said:
The opportunity to add a second card later on for a cheaper price, once the first one gets too slow, is driving sales.

Isn't this the trick a lot of people will run into problems with?
As far as I understood, you need two exactly identical cards for SLI to work.
I think finding the same card a few months later will be very difficult. (Or am I wrong?)

To some extent you're right. Obviously two cards from the same vendor will be preferable. But even then, you still need to make sure the BIOS revisions on both cards are the same. I foresee an NVIDIA-approved "make both boards identical, BIOS-wise, using this handy Windows flasher" tool in the future.

Rys
 
trinibwoy said:
Rough day? Tell that to those who were foreseeing $100+ premiums.

It is a $100+ premium over the vanilla nForce4. The vanilla nF4 still supports Gigabit Ethernet, RAID 0/1, and the hardware firewall (but not ActiveArmor). OK, the vanilla nF4 will be useless for overclocking (if the motherboard makers do not work around the limitations). We are most likely going to see relatively feature-rich (Gigabit Ethernet, RAID 0/1, etc.) sub-$100 motherboards based on VIA, SiS or ATI chipsets.

If you do not need the advanced features like RAID 5, ActiveArmor and 3Gb/s SATA, you are going to pay around $100 extra for SLI support.
 
The 5700 Ultra was released about a year ago (the DDR3 version even less than that) - try finding one now, let alone one from the same manufacturer with the same specs.
 
How do you squeeze in two Ultras?

Is any SLI motherboard going to have enough space between (and around) the two "16x" PCI Express slots so that you can fit two 6800 Ultras in? Is this going to be an option, physically?

Will SLI be limited to single-slot sized graphics cards?

It seems to me that anyone with a 6800 Ultra is out of luck...

Jawed
 
Texture Memory Virtualisation - Page Faults

It appears that SLI will halve the bandwidth available to each graphics card. How is this going to impact the virtual memory architecture that the graphics card industry is aiming to implement? I'm thinking specifically of the case where virtual memory is used to implement "texture caching", so that as artists create ever more complex and numerous textures, the need to increase GPU card memory to 512MB (or beyond) is ameliorated.

If each card in an SLI configuration has half the virtual memory bandwidth available to it (as compared with a single card) how much impact will this have on the efficiency of SLI?

If SLI works by splitting each frame into "load-balanced" halves for each card to render separately, then it could be argued that each card will require, on average, only half of the texture bandwidth, since only "half" of the textures in a given scene will be rendered by that card.

Alternatively, if each card renders an alternate frame, then the halving in virtual memory bandwidth will hurt doubly, because both cards will, on average, require all the textures in a scene.

The only amelioration of the virtualised texture memory problem would appear to be the relatively infrequent page faults experienced as gameplay progresses. On the other hand, the impact of a page fault will affect both cards equally, since the chances are that both cards will generate page faults for the same reasons (a perspective change as the player moves, or an orientation change as the player swings through 180 degrees, say). The net result is that the frame rate troughs due to texture page faults will typically be at least as deep as those experienced by a single card (and perhaps deeper, due to the doubled load on the virtualised texture memory architecture).

Since the true power of a graphics card is determined by its frame rate troughs, the effective ceiling on PCI Express bandwidth for virtualised texture memory accesses implies to me that SLI, by taking a big chunk out of that effective bandwidth, is only a short- to medium-term solution for gamers.
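
A back-of-the-envelope version of the bandwidth side of this (my own figures: PCIe 1.x is roughly 250 MB/s per lane per direction, and the SLI boards split one x16 link into two x8 links):

Code:
lane_mb_s = 250
single_card_mb_s = 16 * lane_mb_s      # ~4.0 GB/s to one card
per_card_sli_mb_s = 8 * lane_mb_s      # ~2.0 GB/s to each of two cards

fps = 60
budget_single = single_card_mb_s / fps # streamed texture data per frame, one card
budget_sli = per_card_sli_mb_s / fps   # per frame, per card under SLI

# Split-frame: each card renders ~half the scene, so ~33 MB may still cover it.
# Alternate-frame: each card renders whole frames, so the full per-frame
# texture traffic has to fit into ~33 MB instead of ~67 MB.
print(round(budget_single), round(budget_sli))  # 67 33 (MB per frame)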

A bit like good old-fashioned SLI...

Jawed
 
Tim said:
If you do not need the advanced features like RAID 5, ActiveArmor and 3Gb/s SATA, you are going to pay around $100 extra for SLI support.

Good point. It will be a bad decision by Nvidia if they don't provide something similar to a vanilla nForce4 + SLI solution. I really don't want to pay for all that Ultra stuff just to get SLI.

Jawed said:
It appears that SLI will halve the bandwidth available to each graphics card. How is this going to impact the virtual memory architecture that the graphics card industry is aiming to implement?

Streaming texture data from main memory is just a pipe dream at the moment, and it may prove to be too inefficient for real-time rendering, especially considering the bandwidth requirements of upcoming engines.
 
Virtualised systems will be very much a part of WGF and Longhorn - current specifications for Avalon require either PEG x16 or AGP8X. 8-lane PCIe will come in at AGP8x bandwidths (or very slightly less), but you are at the lower end of the requirements then.
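
For reference, the rough numbers behind that comparison (my figures, counting one direction only):

Code:
agp8x_mb_s = 8 * 266             # AGP 8x: ~2.1 GB/s
pcie_x8_mb_s = 8 * 250           # eight PCIe 1.x lanes: 2.0 GB/s each way
pcie_x16_mb_s = 16 * 250         # a full PEG x16 slot: 4.0 GB/s each way

print(agp8x_mb_s, pcie_x8_mb_s, pcie_x16_mb_s)  # 2128 2000 4000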
 
DaveBaumann said:
Virtualised systems will be very much a part of WGF and Longhorn - current specifications for Avalon require either PEG x16 or AGP8X. 8-lane PCIe will come in at AGP8x bandwidths (or very slightly less), but you are at the lower end of the requirements then.

Hurrah! How is that going to help us run games on UE3?
 