Investigating GeForce4 4XS AA

Shark, no it's not the same thing as the 4x9 tap FSAA mode on the GF3. That was a "blur" filter, like Quincunx. This doesn't blur additionally... take a look at the shots, the textures look better, not worse. It's hybrid super and multisampling. I would speculate the reason the edges look better is because it's also effectively 2x RGMS on one axis, and 1x2xOGSS on the other (if I'm understanding it right). Look at the details provided on digit-life... when the 2 modes are downfiltered, the result is offset pixel sample locations... not quite as good as truly rotated, but better than a pure ordered grid solution.
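To make the sample-position argument a bit more concrete, here is a rough Python sketch. The diagonal 2x multisample offsets and the 1x2 supersample layout are assumptions for illustration only (the actual hardware positions are not confirmed anywhere in this thread), and all the names are made up. It just counts how many distinct sample coordinates you get on each axis after downfiltering.

```python
def hybrid_sample_positions(ms_offsets, ss_cols, ss_rows):
    """Sample positions inside one final pixel (unit square).

    ms_offsets: multisample offsets inside one supersampled sub-pixel.
    ss_cols, ss_rows: supersample grid that is downfiltered to one pixel.
    """
    positions = []
    for row in range(ss_rows):
        for col in range(ss_cols):
            for ox, oy in ms_offsets:
                positions.append(((col + ox) / ss_cols, (row + oy) / ss_rows))
    return positions


def distinct_axes(positions):
    """Number of distinct x and y coordinates among the samples."""
    xs = {x for x, _ in positions}
    ys = {y for _, y in positions}
    return len(xs), len(ys)


# ASSUMPTION: 2x multisampling places its two samples on a diagonal.
DIAGONAL_2X = [(0.25, 0.25), (0.75, 0.75)]

# Hypothetical 4xS: 2x MS inside a 1x2 supersampled buffer.
four_xs = hybrid_sample_positions(DIAGONAL_2X, ss_cols=1, ss_rows=2)

# Plain 2x2 ordered grid supersampling for comparison.
ordered_2x2 = hybrid_sample_positions([(0.5, 0.5)], ss_cols=2, ss_rows=2)

print(distinct_axes(four_xs))      # (2, 4)
print(distinct_axes(ordered_2x2))  # (2, 2)
```

Under those assumptions the hybrid pattern has four distinct positions on one axis versus two for the ordered grid, which would fit with edges looking better in one direction than a pure ordered grid gives you.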
 
Ram's explanation is by far the best I've seen so far - it's definitely plausible

4xs2.gif


In the case of GF4/GF3 this is effectively halving the number of pixel pipes, as two will actually be working on the same pixel; with GF4MX it becomes a single 'pixel' per clock.

If you take this a couple of steps further on GF4 you could actually achieve an 8XS mode with 4 texture samples (rotated grid), 8XS with 2 texture samples (ordered grid) or even 16XS with 4 texture samples (ordered grid) - but naturally bandwidth will be an issue!
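A quick way to tabulate those combinations, as a rough Python sketch. It assumes each "NxS" mode is simply (multisample count) x (supersample count), that the number of texture samples equals the supersample count, and that each supersample ties up one pixel pipe, so pixels per clock is pipes divided by supersamples. The pipe counts (4 for GF4, 2 for GF4MX) are what the post above implies; the function name and fields are hypothetical.

```python
def hybrid_mode(ms_samples, ss_samples, pixel_pipes):
    """Describe a hypothetical hybrid MS + SS mode.

    ASSUMPTION: one pixel pipe is consumed per supersample of a pixel.
    """
    return {
        "label": f"{ms_samples * ss_samples}XS",
        "texture_samples": ss_samples,
        "pixels_per_clock": pixel_pipes / ss_samples,
    }


GF4_PIPES = 4    # assumed from "halving the number of pixel pipes"
GF4MX_PIPES = 2  # assumed from "a single 'pixel' per clock"

for ms, ss in [(2, 2), (2, 4), (4, 2), (4, 4)]:
    print(hybrid_mode(ms, ss, GF4_PIPES))
```

Read this way, 4XS on a GF4 leaves 2 pixels per clock and on a GF4MX only 1, and the 4-texture-sample modes drop a GF4 to 1 pixel per clock before bandwidth is even considered.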
 
On 2002-02-09 20:57, rhink wrote:
Shark, no it's not the same thing as the 4x9 tap FSAA mode on the GF3. That was a "blur" filter, like Quincunx. This doesn't blur additionally... take a look at the shots, the textures look better, not worse. It's hybrid super and multisampling. I would speculate the reason the edges look better is because it's also effectively 2x RGMS on one axis, and 1x2xOGSS on the other (if I'm understanding it right). Look at the details provided on digit-life... when the 2 modes are downfiltered, the result is offset pixel sample locations... not quite as good as truly rotated, but better than a pure ordered grid solution.

There are 2 mystery modes: one is 9-tap Quincunx and the other is MS + SS. 4XS is the MS + SS mode!
 
I am still wondering whether, in 4xS mode, the downsampling happens the "usual" way (1x2 back buffer -> 1x1 front buffer) or on the GF4's RAMDAC, i.e. whether nVIDIA, as Wavey suggested, got rid of the back buffer entirely.

Does anybody here have any further information?

ta,
.rb

 
How do you completely eliminate the back buffer? There has to be some temporary storage area, you can't render straight to the front buffer, and I thought that area, by definition, was the back buffer?
 
On 2002-02-09 11:04, aths wrote: i guess GF3 uses a line cache

After thinking about it for some time, I would say no. Of course I'm open to ideas from you as to why that should already be the case with the GeForce3.

tabelle.gif


This table by Nvidia shows the framebuffer requirements for the GeForce3's FSAA. Here you see that for 2x AA they need exactly 15.36 MB at 1024x768x32bit.

1024x768x32bit = 3 MB for the Front Buffer
1024x768x32bitx2 = 6.1 MB for the Back Buffer
1024x768x32bitx2 = 6.1 MB for the Z-Buffer

With line caching, your back buffer with the color values could be half the size, as you combine the two subsamples on chip and only save the final color value in memory.
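The buffer arithmetic above can be checked with a few lines of Python. This is a rough sketch, assuming 4 bytes per pixel for color and Z alike, and assuming the table's "MB" means bytes / 1024 / 1000 (that convention is inferred from the numbers, since it reproduces 15.36 exactly). The function and parameter names are made up. The `line_cache` flag models the idea of storing only one combined color value per pixel in the back buffer; whether any GeForce actually works that way is speculation.

```python
def buffer_kib(width, height, bytes_per_pixel=4, samples=1):
    """Size of one buffer in KiB (1024 bytes)."""
    return width * height * bytes_per_pixel * samples // 1024


def fsaa_2x_footprint(width, height, line_cache=False):
    """Front, back and Z buffer sizes in KiB for 2x AA.

    ASSUMPTION: with line caching the back buffer holds one already
    combined color value per pixel instead of two subsamples.
    """
    front = buffer_kib(width, height)
    back = buffer_kib(width, height, samples=1 if line_cache else 2)
    z = buffer_kib(width, height, samples=2)
    return {"front": front, "back": back, "z": z, "total": front + back + z}


plain = fsaa_2x_footprint(1024, 768)
cached = fsaa_2x_footprint(1024, 768, line_cache=True)

# Divide by 1000 to match the mixed "MB" unit the table seems to use.
print(plain["total"] / 1000)   # 15.36
print(cached["total"] / 1000)  # 12.288
```

So if the GeForce3 did line caching as described, the table would be expected to show roughly 12.3 MB for 2x AA rather than 15.36 MB.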

[ This Message was edited by: ram on 2002-02-11 10:55 ]
 
rhink,
How do you completely eliminate the back buffer? There has to be some temporary storage area, you can't render straight to the front buffer, and I thought that area, by definition, was the back buffer?
[wild_speculation]With MSAA enabled, the GF4 could use multisample buffers for such cases.[/wild_speculation]

ta,
.rb

 
ahh, so sorta like the V5's multiple buffers? That would fall in line with the speculation the GF4 downfiltered at the ramdac stage.
 
If you used multiple buffers, you would have more random memory access, which would slow things down when rendering, though the crossbar memory might make up for it.

And yes, Kyro cards would probably look reasonable with single buffering, because unlike traditional renderers they only write final pixel values to the framebuffer. But you would still have tearing of course.
Traditional architectures need a back buffer.
 
And I'm still trying to work out what they are. My suspicion is they've got rid of the back buffer.

You mean the "front buffer", don't you? The back buffer in this case would be the upsampled buffer the GPU is rendering into.

The problem with this is ... how do you handle VSync? Not having a back buffer or front buffer makes it impossible to get the benefits of VSync ... you always read in sync with the monitor refresh from the upsampled buffer and therefore, tearing appears. Second, only having one buffer is not possible on traditional architectures, as in this buffer, there are temporary color values too. Double buffering further reduces "flickering".

The Voodoo 5 had a back buffer and a front buffer, although it did the combining in the ramdac.

There is still the line caching theory left. I still think this might be a reasonable approach, though I'm not sure how reading back filtered values could affect the image quality in reality. (?)

[ This Message was edited by: ram on 2002-03-03 19:09 ]
 