Has Nvidia Found a Way to Get a Free Pass in AA Comparisons?

WaltC said:
Russ, I think you've totally misunderstood the issue, at least as I see it. When nVidia first began emulating 3dfx's T-Buffer FSAA in its GFx products, it used supersampling--it rendered at a high resolution and scaled the result down. It was very slow, and looked not 1/10th as good as what 3dfx did with the T-Buffer. Indeed, 3dfx did *hardware jittering* in the V5; that is, they took more than one frame *at the same resolution* and hardware-jittered the result in the VSA-100 hardware. They blended in the T-Buffer, which was also in the V5 hardware. nVidia never, ever had anything similar (and the results showed that plainly, IMO.)

No one is saying or arguing that any of nVidia's AA implementations is similar in any way to the VSA-100, except that there is a point where the combination of two samples/images was performed "during scan-out".

You appear to be arguing against things that nobody said, that's all.
 
RussSchultz said:
And what exactly would you test with your V5 and driver CD to prove either one of us right or wrong? :?

Well, I would take a few screenshots using standard methods vs. the Hypersnap method and we could compare the results. Does that not make any sense to you?


Also, as for the reading comprehension comment, I think I will hold off for a second opinion. ;)
 
WaltC said:
RussSchultz said:
The t-buffer method is exactly that: using the ramdac to blend the samples on the output stage.

Sigh, wrong--I see I am speaking to someone who knew nothing about it when it shipped.
Walt, a word of advice for you, mate. There are some quite knowledgeable people on this forum, so it's best to make sure you really know what you're talking about before insulting someone.

The T-Buffer technology maintained N buffers, each of which could be rendered independently and then recombined on-the-fly (AFAIAA) in the DAC feed. The idea behind it was to re-order what could be done in an accumulation buffer (see SIGGRAPH in, err, about the early 90s) so that instead of doing N passes serially, the N passes could be done in parallel. Unfortunately, N was typically rather small, which meant that some effects, e.g. depth of field and motion blur, weren't done brilliantly well.
 
Simon F said:
WaltC said:
RussSchultz said:
The t-buffer method is exactly that: using the ramdac to blend the samples on the output stage.

Sigh, wrong--I see I am speaking to someone who knew nothing about it when it shipped.
Walt, a word of advice for you, mate. There are some quite knowledgeable people on this forum, so it's best to make sure you really know what you're talking about before insulting someone. <snipped rest of Simon's rather well-defined T-Buffer post>
Whoa, talk about telling it to them the way I have always wanted to.
 
Anyone ever think to use a camera to capture the image from the monitor? Simply stand still in the game and photograph the screen, then capture the same view with Hypersnap, display the capture, and photograph that as well. Then you can compare the two photographs and determine whether they are the same.

I have seen too many threads by GFFX owners that say the image captured by Hypersnap is much better (more FSAA) than the image they see on the screen and now even some reviewers are pointing out the same thing. Since the drivers could easily detect that Hypersnap is running and change the image sent to it, it seems reasonable to question the results.
 
This whole topic IS old news.. it's just a good thing if more and more people know about it since it may help them make better, informed decisions as consumers.

I have posted on this since the release of the GF4. I upgraded from a GF3 to a GF4 Ti4600 on the strength of the droves of websites that put together comparison screenshots of 2x, QC and xS modes, in which the GF4 screenshots were sharper and clearer and overall didn't have the banding/blending that the GF3 had. After snapping in the GF4 and playing Tribes and Flight Simulator, I was suddenly educated that no such improvements had occurred. Quincunx, albeit a tiny, *tiny* bit improved, was still the blur-mastery that it was on the GF3... the only really remarkable difference was that screenshots of 2x and QC were absolutely incomparable to the on-screen image.

I remember a thread (too lazy to look it up at the moment) where I think it was Dave who had some information that there is indeed some "magic" that occurs at the post-filter level, so screenshots would not be valid representations of final rendering quality. Mike at NVNEWS also carried out several experiments based on my examples and conceded the same point.

It sure helped to sell a lot of GF4s to have various websites posting very pretty, clear, unblurred imagery from AA/AF settings on these cards. It sounds like it may now be the opposite situation, if IQ is diminished between the framebuffer and the post-filtered screen result (à la 3dfx, whose screenshot behaviour was also used to sell a lot of GeForces).

In all, from the GF4 forward, I think it's safe to assume that screenshots are unreliable indicators of image quality, and consumers should take screenshot comparisons with a grain of salt. In days of old we had more trustworthy reviewers who would give lengthy and well-explained subjective findings concerning IQ (on decent monitors too!), but this has kind of faded over time as well... which is probably why forums have become so popular in the meantime. :)
 
Xmas said:
WaltC, you're only talking about the 3dfx post filter here, i.e. the one used to eliminate dithering artifacts occurring with 16-bit rendering (why it's '22 bit' is simple: it takes 4 pixels in RGB-565 format and adds them, which results in an RGB-787 color value. This combination is conditional, however, and depends on some contrast threshold)

Blurring the image just to remove dithering doesn’t seem like a good idea to me. Are you sure about this? I know that there have been some articles here on B3D suggesting that it might be done this way, but I always thought they were wrong. Here is how I would do it:

-Add two extra zero bits to each channel. (Effectively multiplying it by 4)

-Compare each pixel to the three pixels right, down, down-right.

-- If a pixel has a higher value than the pixel it's compared to, decrement the output value by one.

-- If a pixel has a lower value than the pixel it's compared to, increment the output value by one.

-- If a pixel has the same (input) value as the pixel it's compared to, do nothing to the output value.

Two examples: (imagine that the dithering patterns continue beyond the small 4x4 cut-out shown here)
Code:
3 4 3 4       12 16 12 16       15 15 15 15

4 4 4 4 ___|\ 16 16 16 16 ___|\ 15 15 15 15 
       |___  >           |___  >
3 4 3 4    |/ 12 16 12 16    |/ 15 15 15 15

4 4 4 4       16 16 16 16       15 15 15 15
-------------------------------------------
3 4 3 4       12 16 12 16       14 14 14 14

4 3 4 3 ___|\ 16 12 16 12 ___|\ 14 14 14 14
       |___  >           |___  >
3 4 3 4    |/ 12 16 12 16    |/ 14 14 14 14

4 3 4 3       16 12 16 12       14 14 14 14
First example: the 12's are compared to three 16's and are incremented three times to 15. The 16's see one 12 each and are decremented once; the two other 16's each one is compared to don't affect it. Second example: each 12 is compared to two 16's and incremented to 14. Each 16 is compared to two 12's and decremented to 14.

This method effectively removes dithering while only blurring very slightly. It could also include a...

-- If a pixel has a value that is very different from the pixel it's compared to, do nothing to the output value.

... to further reduce blurring.

I don’t know if I’ve been very clear in this description, but I’ll quit typing now, safe in the knowledge that I did my best... :)


Edit: To demonstrate what happens when an edge is encountered, I added an edge to the examples above. The first example doesn't use the "do nothing if the difference is large" rule; the second one does.
Code:
3 4 23 24       12 16 92 96       15 17 95 95

4 4 24 24 ___|\ 16 16 96 96 ___|\ 15 17 95 95
         |___  >           |___  >
3 4 23 24    |/ 12 16 92 96    |/ 15 17 95 95

4 4 24 24       16 16 96 96       15 17 95 95
---------------------------------------------
3 4 23 24       12 16 92 96       14 15 95 95

4 3 24 23 ___|\ 16 12 96 92 ___|\ 14 13 95 95
         |___  >           |___  >
3 4 23 24    |/ 12 16 92 96    |/ 14 15 95 95

4 3 24 23       16 12 96 92       14 13 95 95
Hopefully there aren’t too many errors in my examples.
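
For what it's worth, here's a minimal sketch in Python of the filter as I described it above, working on a single colour channel. The examples assume the dither pattern continues beyond the cut-out, so the demo simply wraps around at the borders; the function name, the wrap-around and the edge_threshold parameter are my own placeholders, not anything 3dfx specified.
Code:
import numpy as np

def dither_filter(channel, edge_threshold=None):
    src = np.asarray(channel, dtype=np.int32)
    out = src * 4                        # append two zero bits (multiply by 4)
    # Compare each pixel with its right, down and down-right neighbours.
    for dy, dx in ((0, 1), (1, 0), (1, 1)):
        neigh = np.roll(np.roll(src, -dy, axis=0), -dx, axis=1)
        diff = neigh - src
        if edge_threshold is not None:
            # Optional rule: ignore neighbours that differ too much (edges).
            diff = np.where(np.abs(diff) > edge_threshold, 0, diff)
        # +1 where the neighbour is higher, -1 where it is lower, 0 if equal.
        out += np.sign(diff)
    return out

pattern = [[3, 4, 3, 4],
           [4, 4, 4, 4],
           [3, 4, 3, 4],
           [4, 4, 4, 4]]
print(dither_filter(pattern))            # prints 15 everywhere, as in the first example
Running it on the second pattern gives 14 everywhere, matching the second example.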
 
andypski said:
It is down to the online journalists to do appropriate comparisons and analysis and to work with IHVs such as ourselves and nVidia to ensure that they are getting accurate information - reviews are a collaboration to some extent because journalists rely on IHVs to answer their questions, and IHVs rely on journalists to bring issues to their attention (preferably before publishing so they can get an informed response).

- Andy.

Given the timelines for reviews, typically a week, and the response time of companies, typically a week to a month, that is not likely to be an iterative process. :)
 
ChrisW said:
Anyone ever think to use a camera to capture the image from the monitor? Simply stand still and take a snapshot then capture it with hypersnap and take a picture of the resulting image. Then you can compare the two pictures and determine if they are the same.
You would need a very high quality camera to do that.

An easier way to do a comparison is to take advantage of the multi-monitor capabilities of almost all new cards. Take two identical monitors, preferably TFTs, run the 3D scene on one screen and show a screenshot of it on the other.

An even better way would be if developers implemented a comparison feature where you press a key to toggle between normal rendering and a mode that renders the scene, reads the framebuffer (like it would for taking a screenshot) and then shows that image, either by writing it back (but then you would still get quincunx blur if it's enabled) or by showing it in another, overlapping window through GDI.
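
Just to make that concrete, here's a rough sketch of the "write it back" variant, assuming a PyOpenGL-style application; render_scene and swap_buffers are placeholders for whatever the app normally does, and as noted, the written-back image would still pass through any post filter on scanout.
Code:
# Hypothetical sketch of a screenshot-comparison toggle in an OpenGL app.
from OpenGL.GL import (glReadPixels, glDrawPixels, glWindowPos2i,
                       GL_RGB, GL_UNSIGNED_BYTE)

def draw_frame(width, height, compare_mode, render_scene, swap_buffers):
    render_scene()                    # the application's normal rendering
    if compare_mode:
        # Read the back buffer exactly as a screenshot grabber would...
        grab = glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE)
        # ...and draw it straight back, so what you see is what a grab contains.
        # (This written-back image still goes through any post filter at scanout.)
        glWindowPos2i(0, 0)
        glDrawPixels(width, height, GL_RGB, GL_UNSIGNED_BYTE, grab)
    swap_buffers()                    # platform-specific buffer swap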
 
WaltC said:
Indeed, 3dfx did *hardware jittering* in the V5, that is, they took more than one frame *at the same resolution* and hardware-jittered the result in the VSA-100 hardware.

Just clarifying the jittering part here; hope you don't mind.

You seem to say that they rendered (for instance) 4 frames, then, after that, hardware jittered those subsample frames. I believe (and the diagram says) that they jittered the geometry already, getting jittered subsample frames as a result, and then they just had to combine those frames at output.

The small difference being that the jittering happened to the triangle data, not to the backbuffer pixel data. Righto?
 
RussSchultz said:
3dfx had the same thing with the V5.


Why do you keep saying that? It's just not true.

With the V5, when you switched to 32-bit display mode the post-filtering was *turned off.* You still got great FSAA, though.

3dfx only used the post filter to simulate a 22-bit color display mode *when the product was running in 16-bit display mode.* It was turned off in the V5 in 32-bit mode because the pseudo-22 bits they got through post-filter processing was inferior to their real 32-bit display mode (which should be obvious.)

3dfx pioneered post-filter processing in the V3 which did not do AA at all.

In the case of the V5 you are confusing the T-buffer with the post filter (haven't we been over this before?). The V3, of course, since it did no FSAA, had no T-buffer.

The use of the post filter as apparently employed by nV3x specifically for FSAA is unique to nVidia.

Also unlike nVidia, 3dfx was very up front in describing the general mechanics behind not only the T-buffer (which they did copiously, months ahead of the V5 shipping), but also their use of the post filter in the V3, on which they "writ large" (where, again, its use had nothing to do with FSAA). When this question was first put to nVidia by [H] at the nv30 launch--when it became apparent that their 2x FSAA screenshots were identical to their 0x FSAA screenshots, yet an on-screen difference was visible--nVidia promptly clamped a "trade secret" label on its use of the post filter for FSAA in its nv3x products and has refused to even generally describe it since.

Nope, I see zero comparison between what 3dfx did and what nVidia is now doing where post-filter blending is concerned.
 
Gunhead said:
The small difference being that the jittering happened to the triangle data, not to the backbuffer pixel data. Righto?

Yes, but with the V5 in 32-bit mode there was no post-filter blending going on. And there would have been none in the V5's plain-Jane 16-bit mode, either. There was a specific switch in the driver interface you could throw to activate post-filter blending for their pseudo-22-bit display mode, but it was automatically disabled when you selected 32-bit integer display, and you could of course also turn it off yourself. You could still do FSAA to your heart's content, however, with all post-filter blending turned off.
 
WaltC, why don't you just accept that there is one 'post filter' to alleviate dithering artifacts in 16bit (this was used by 3dfx only) and another 'post filter' to do the AA downsampling (and this is where NVidia copied from 3dfx)?

btw, great idea Thowllly.
 
Xmas said:
But there is another 'post filter' used to combine the AA samples for the final output to the screen. And that's why it was also difficult to grab an AAed screenshot on a V4/V5 at first.


I don't understand--now you guys are hypothesizing two post-filter types of blending... ;)

In the V3, the chip could only output 16 bits; however, internally everything was done at 32 bits and downsampled to 16 for output. The post-filter blending, when turned on, used 3dfx algorithms to upsample the 16-bit data to their near-22-bit level for output at the RAMDAC stage. As in the V5, operation of the post filter was not automatic; it was user-selectable in the drivers (i.e., forced).

The V5, however, was a full 32-bit integer display product. Thus, there was no need to upsample the data to a pseudo 22-bit level when running in 32-bits, obviously, since this would degrade the display. So, as 3dfx stated many times, use of the post filter for blending was automatically turned off in the V5's 32-bit display mode. It wasn't even used automatically for 16-bit display mode--you had to turn it on and could likewise turn it off.

Take the T-buffer away from the V5 and--guess what? No FSAA! None. Just like with the V3.

Now, are you hypothesizing that nV3x products are using a T-buffer along with post-filter blending to do their FSAA?...;) If not, I think you are mischaracterizing the V5.

So....if nV3x products do not employ a T-buffer, they are not utilizing the post filter as 3dfx utilized it in the V5.

Let's get to the purpose of post-filter blending as 3dfx employed it in the V3 and the V5. The purpose they used it for was to increase the rendering accuracy of their 16-bit display mode to that approaching 24-bit accuracy (near 22-bits of accuracy), while running at only slightly degraded 16-bit performance levels.

So, considering that like the V5 the nV3x products are capable of not only 32-bit integer precision natively, but also much higher levels of precision through floating point (up to 128-bits of accuracy), what similar purpose is the post filter being employed for in nv3x FSAA?

That's it--and why I don't think nv3x is using the post filter in a manner that remotely approaches 3dfx's use of it in the V3 and the V5.

Rather, my own theory is that they are using it not to enhance pixel accuracy as 3dfx did--since nV3x can natively do much higher levels of rendering precision and therefore doesn't need such a device to achieve higher levels of rendering precision--but to simulate FSAA blending, without the use of a T-buffer, in certain of its nV3x FSAA MSAA modes. Obviously, it isn't happening at 8x FSAA, and with nV30 was not happening at 4x FSAA. With the nV35 my question would be whether they've extended it to their 4x FSAA MSAA mode.

This kind of thing would necessitate something much different than anything 3dfx ever did with the post filter, IMO. One wonders why 3dfx did so much talking about their use of the post filter and the T-buffer, while nVidia won't talk about their FSAA at all, except to say it's a "trade secret."
 
WaltC, why are you still talking about the "22bit post filter" here? That is *not* the topic at hand.


This is about the circuitry used to blend the several AA samples of one pixel together, regardless of where they come from--whether from the 'T-Buffer' (which is similar to ATI's/NVidia's multisample buffer) or from just one giant frame buffer.

The thing which 3dfx did with the VSA-100, and which NVidia now uses in their newer chips, is to blend the several samples together at scanout, that is, when the RAMDAC / TMDS transmitter / TV encoder wants to send the picture to the output device.

Previous/other chips render one frame, downsample it and write the downsampled, antialiased image back to the frame buffer, from where it can be read later by the RAMDAC.

With downsampling at scanout, the antialiased image never gets written to the framebuffer. Therefore it is called a post-framebuffer filter, or, for short, a post filter.
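
A toy numpy illustration of that difference, purely to show why a framebuffer grab matches the screen in one case and not in the other; treating the resolve as a plain average and the grab as reading one raw sample buffer are my simplifying assumptions, not a description of the actual hardware.
Code:
# Toy model: two AA samples per pixel, resolved either before or at scanout.
import numpy as np

samples = np.random.randint(0, 256, size=(2, 4, 4))         # two sample buffers

# Conventional path: downsample into the frame buffer, then scan it out.
framebuffer = samples.mean(axis=0).astype(np.uint8)
grabbed = framebuffer                                        # a grab sees the AA'd image
displayed = framebuffer

# Post-filter path: the frame buffer keeps the raw samples; blending happens
# only while the picture is being sent to the display.
grabbed_pf = samples[0].astype(np.uint8)                     # a naive grab sees raw samples
displayed_pf = samples.mean(axis=0).astype(np.uint8)         # the monitor sees the blend

print(np.array_equal(grabbed, displayed))         # True  -- screenshot matches screen
print(np.array_equal(grabbed_pf, displayed_pf))   # almost certainly False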
 
According to Kristof and Dave Barron, here's what's up.

The front T-Buffers now contains the sub-samples of the scene we just rendered. The sub-samples now need to be combined to form the final anti-aliased image. This combining is done just before the RAMDAC by special video circuitry that mixes the various buffers together at the pixel level. The RAMDAC is a special component of a 2D/3D chip that translates the contents of the buffers into a signal that can be displayed by your monitor. Most monitors take analog signals as input, which explains the DAC part of the name: Digital to Analogue Converter. The RAM refers to the fact that the AD conversion is done using a table contained in RAM (this has to do with Gamma Correction).The main advantage of this approach is that no down-sampled version of the image has to be stored and the color depth at the output level is higher than the color depth of an individual buffer. The sub-sample T-Buffers can contain, for example,16-bit color, but the combining operation (mixing of the colors) is done at a higher accuracy by the video circuitry which leads to a final anti-aliased image with a color depth higher than the color depth of the individual buffers. This principle is similar to that of the post-filter technology found in the Voodoo2 and 3 designs [3].

http://www.beyond3d.com/articles/ssaa/index5.php

There's even a nice picture.
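
To put a number on the precision claim in that passage, here is a quick bit of arithmetic; plain averaging of four 5-bit red sub-samples is my assumption for the mixing step, just for illustration.
Code:
# Four 5-bit red sub-samples (each 0..31). Averaging them at full precision
# produces in-between values that no single 5-bit buffer can hold, so the
# combined output effectively has more colour depth than any one buffer.
subsamples = [31, 30, 29, 28]
blended = sum(subsamples) / 4      # 29.5 -- needs fractional (extra) bits
print(blended)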
 
Let me get this straight. The GFFX screenshot is made from a combination of the front and back buffers (two separate frames), i.e. they are anti-aliasing only half of the screen on one frame and anti-aliasing the other half on the other frame? The idea being that after they flash two separate frames in front of you fast enough, it would give the appearance of full-scene anti-aliasing while only doing half the work?
 
Well, I suggested this in the nvnews forum...

Not sure if this would work, but...

do a video-out to video-in (you can use 2 computers for this) and take a screen cap
 
do a video-out to video-in (you can use 2 computers for this) and take a screen cap

That would be absolutely horrible, given the incredible amount of signal quality that is lost... not to mention that most video-in/capture devices are limited to NTSC/PAL resolutions, which cap them at around 480 vertical lines tops.

There are ways to capture high resolutions with reduced image-quality loss, but such hardware is VERY expensive (true-color, real-time VGA capture versus the S-Video/composite modes of consumer stuff). I don't think most websites could afford such equipment.

The real solution is to handle this the way 3dfx and Hypersnap did--which is to provide an algorithm that can be applied to the framebuffer to match the process the post filter is performing.
 