Has Nvidia Found a Way to Get a Free Pass in AA Comparisons?

ChrisW said:
Let me get this straight. The GFFX screenshot is made from a combination of the front and back buffers (two separate frames), i.e. they are anti-aliasing only half of the screen on one frame and anti-aliasing the other half on the other frame? The idea being that after they flash two separate frames in front of you fast enough, it would give the appearance of full scene anti-aliasing while only doing half the work?

That doesn't seem like it would give the appearance of full antialiasing. Wouldn't that scenario look something like this?

[attached mockup image: screen.jpg]
 
Sharkfood said:
do a video-out to video-in (you can use 2 computers for this) and take a screen cap

That is absolutely horrible because of the incredible amount of signal quality that is lost... not to mention most video-in/capture devices are limited to NTSC/PAL resolutions, which caps them at around 480 vertical lines tops.

There are ways to capture at high resolution with reduced image-quality loss, but such hardware is VERY expensive (true-color, real-time VGA capture versus the S-Video/composite modes of consumer stuff..). I don't think most websites could afford such equipment.

The real solution is to handle this the way 3dfx and Hypersnap did: provide an algorithm that can be applied to the framebuffer to match the processing the post-filter performs.
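To make the idea concrete, here's a minimal sketch of that kind of software pass, assuming the missing step is nothing more than a 2x2 box average applied to a raw RGB565 framebuffer grab. The actual kernel and weights 3dfx used aren't reproduced here, so treat this as a stand-in for the technique, not the real filter.

Code:
#include <stdint.h>
#include <stddef.h>

static void unpack565(uint16_t p, unsigned *r, unsigned *g, unsigned *b)
{
    *r = (p >> 11) & 0x1F;
    *g = (p >> 5) & 0x3F;
    *b = p & 0x1F;
}

/* Apply a crude 2x2 box average to a captured 16-bit framebuffer and
 * expand it to 24-bit RGB for saving. Purely illustrative. */
void filter_16bit_grab(const uint16_t *fb, size_t w, size_t h, uint8_t *out_rgb)
{
    for (size_t y = 0; y < h; ++y) {
        for (size_t x = 0; x < w; ++x) {
            unsigned r = 0, g = 0, b = 0;
            for (size_t dy = 0; dy < 2; ++dy) {
                for (size_t dx = 0; dx < 2; ++dx) {
                    size_t sx = (x + dx < w) ? x + dx : w - 1;   /* clamp at */
                    size_t sy = (y + dy < h) ? y + dy : h - 1;   /* the edge */
                    unsigned pr, pg, pb;
                    unpack565(fb[sy * w + sx], &pr, &pg, &pb);
                    r += pr; g += pg; b += pb;
                }
            }
            uint8_t *o = out_rgb + (y * w + x) * 3;
            o[0] = (uint8_t)(r * 255 / (4 * 31));   /* average the 4 taps and */
            o[1] = (uint8_t)(g * 255 / (4 * 63));   /* rescale each channel   */
            o[2] = (uint8_t)(b * 255 / (4 * 31));   /* to 8 bits              */
        }
    }
}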

Or find a device that can capture frames from the DVI output (by now every high-end card should have a DVI output), but such a device might have to be custom built.

I think it's incredible that the 3dfx Voodoo5 is still being talked about after so many years.
 
Christ, is there some sort of timewarp going on here?

Walt: just stop arguing the point. You're absolutely talking about something completely different than the rest of us. If nothing else, you should at least reevaluate your position when nobody agrees with you.

You = talking about 22 bit filter
Us = talking about scan out of subsamples to the screen

You = talking about something that NVIDIA doesn't do.
Us = talking about something that 3dfx did and NVIDIA does.

Note once again, Us (being not you) are not talking about NVIDIA doing the 22 bit post filter thing. Us (being not you) agree with you, NVIDIA does not do this.

US (being not you), however, ARE talking about how screenshots of NVIDIA FSAA on their latest products that employ supersampling don't capture correctly, because there is a scanout mechanism similar in concept to the one that was employed on the 3dfx V5.
 
Not-Walts--

Are you guys saying there were two post-filters on the V5 and that Walt keeps pointing at the wrong one? Or are you saying that a post-filter is like a hammer, and that it doesn't really matter if you use the hammer to build a table (22-bit filter) or a house (FSAA), in either case it is still a hammer?

Is anybody besides me uncomfortable with the idea upstream that Hypersnap is using an algorithm to simulate the results of Nvidia FSAA? I mean, it seems to me it has to be characterized as a simulation if that is what they are doing. Maybe it is a good one and maybe it is a bad one, but it seems fraught with danger to me.
 
geo said:
Not-Walts--

Are you guys saying there were two post-filters on the V5 and that Walt keeps pointing at the wrong one? Or are you saying that a post-filter is like a hammer, and that it doesn't really matter if you use the hammer to build a table (22-bit filter) or a house (FSAA), in either case it is still a hammer?
I have no idea if they were physically two different units, but there were two separate functions being performed on the V5:

1) 16/'22' bit thing. (Not what we're talking about)
2) Combining the super sample buffers at scanout. (what we are talking about)

The NVIDIA solution also appears to implement a system similar to number 2 above, hence our (being not him) comparison and equating of the AA methods of current NVIDIA parts and the 3dfx V5.
 
I don't understand what it is that NVIDIA is supposed to be doing here.

Does it use several buffers and combine them in the ramdac? I.e., draw the scene simultaneously to N buffers with different offsets to get aliasing to average out via the ramdac? I would assume the ramdac thing would not be doing multiple samples per buffer but rather a straightforward linear scan of N progressing memory locations with simple averaging. So each refresh period, the ramdac has to scan N times as much as normal, generating the same filtering each time? Seems inefficient to me, especially as you increase resolution, but perhaps NVIDIA only uses 2 buffers and does regular antialiasing of each buffer first for some modes?
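As an aside, here is a rough software model of the data flow being asked about, assuming N equally weighted full-resolution buffers. The real hardware obviously does this in fixed-function display logic rather than a loop, and the actual sample weights and positions aren't public, so this is only a sketch of the concept.

Code:
#include <stdint.h>
#include <stddef.h>

/* Callback standing in for "send this pixel down the video feed". */
typedef void (*emit_pixel_fn)(size_t x, size_t y,
                              uint8_t r, uint8_t g, uint8_t b);

/* One refresh: walk the screen once, and for each pixel read the
 * corresponding sample from each of the N buffers and emit the average. */
void scanout_refresh(const uint8_t *const bufs[], size_t n,   /* N RGB buffers */
                     size_t width, size_t height, emit_pixel_fn emit)
{
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            unsigned r = 0, g = 0, b = 0;
            for (size_t i = 0; i < n; ++i) {
                const uint8_t *px = bufs[i] + (y * width + x) * 3;
                r += px[0]; g += px[1]; b += px[2];
            }
            emit(x, y, (uint8_t)(r / n), (uint8_t)(g / n), (uint8_t)(b / n));
        }
    }
}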
 
Himself said:
Does it use several buffers and combine them in the ramdac?
Yes. When it's doing super sampling, yes.

Some of their FSAA modes are multisampling, others are super sampling.
 
RussSchultz said:
Himself said:
Does it use several buffers and combine them in the ramdac?
Yes. When it's doing super sampling, yes.

Some of their FSAA modes are multisampling, others are super sampling.
How does this differ from the 2x RGMS mode? I thought they were also combining the MS buffers on scanout.
 
Himself said:
I don't understand what it is that NVIDIA is supposed to be doing here.

Does it use several buffers and combine them in the ramdac? I.e., draw the scene simultaneously to N buffers with different offsets to get aliasing to average out via the ramdac? I would assume the ramdac thing would not be doing multiple samples per buffer but rather a straightforward linear scan of N progressing memory locations with simple averaging. So each refresh period, the ramdac has to scan N times as much as normal, generating the same filtering each time? Seems inefficient to me, especially as you increase resolution, but perhaps NVIDIA only uses 2 buffers and does regular antialiasing of each buffer first for some modes?
It is totally irrelevant for the function of antialiasing where you put the samples, whether you put them in one buffer, several distinct buffers, or even a file on the hard disk interleaved with some Excel and Word data. The only thing that matters is that you calculate several samples per pixel and output some weighted average of them to the screen.

In some sense you could even argue that NVidia writes to one linear buffer and several smaller buffers at the same time! It depends on whether you see the memory bus as one 128-bit wide bus or four independent 32-bit busses.
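A tiny sketch of that point: the resolved pixel is just a weighted sum of its samples, so only the fetch changes with the storage layout, never the result. The equal weights below are an assumption for illustration, not NVidia's actual kernel.

Code:
#include <stdint.h>
#include <stddef.h>

/* Caller supplies how to fetch sample s, channel c of pixel (x,y) from
 * whatever storage layout is in use (one big buffer, N small ones, ...). */
typedef uint8_t (*fetch_fn)(const void *storage, size_t x, size_t y,
                            size_t sample, size_t channel);

/* The resolve itself never cares about layout: it just averages. */
uint8_t resolve_channel(const void *storage, fetch_fn fetch,
                        size_t x, size_t y, size_t n_samples, size_t channel)
{
    unsigned acc = 0;
    for (size_t s = 0; s < n_samples; ++s)
        acc += fetch(storage, x, y, s, channel);
    return (uint8_t)(acc / n_samples);   /* equal weights, for the sketch */
}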
 
US (being not you), however, ARE talking about how screenshots of NVIDIA FSAA on their latest products that employ supersampling don't capture correctly, because there is a scanout mechanism similar in concept to the one that was employed on the 3dfx V5.

Not to hurl myself in the middle of this debate, but from my memory, Walt's explanation was more sound.. unless there was some very tricky manipulation in the 3dfx drivers (from the onset) for frame buffer grabs..

32-bit + AA was never a problem to capture on the V5. Using CTRL-PRNTSCN then pasting into Paint, or even F12 in Quake engine games always captured a framebuffer with full AA. If a post-filter/video blend was used to merge the sample buffers, I would think raw framebuffer grabs would have been incomplete or single-sample or single-chip sample at the very best.

16-bit was always a problem, of course. You needed to use Hypersnap or (later) 3dfx's NUMLOCK env variable shell to apply the algorithm, which then wrote a custom filtered TGA in software to algorithmically reproduce the DAC blend. As it's post-process and video (not written to buffer), this was the only way to handle it.

So it seems to me that the real point is (completely throwing out 16/22-bit, as nobody seems to disagree this was obviously DAC or similar post-process) whether or not this "special circuitry" described in the white paper (notice: not the DAC, but somewhere before the DAC) actually wrote to buffer/memory space in its combine... or whether it was a circuitry/video output blend, as the wording would suggest.

Screenshots on 1.01->1.04 drivers would suggest there were buffer writes associated, since screenshots showed the banded/packed appearance in 16-bit but always showed correctly detailed FSAA, be it 16-bit or 32-bit with 2x or 4x AA. If 3dfx were kludging some sort of automatic brute-force/fast combine for AA buffer grabs, it doesn't figure that they wouldn't just do the same for 16-bit (which they instead approached with the NUMLOCK key trap in the 3dfx driver software to create, manually process and write bitmap files)... I think only the people who wrote the 3dfx drivers could honestly answer this one: whether the "special circuitry" wrote to the final buffer (thus explaining AA'd screenshots and buffer grabs), or whether some kludge in the drivers magically noticed the read-lock on the framebuffer for a buffer grab and quickly applied a write operation to do a brute-force combine/approximation... but decided not to apply the 16-bit algorithm.
 
It's not hard to 'magically' process the buffer for a screenshot. You have to explicitly lock the framebuffer for reading, which is the perfect cue to the drivers to "go do the processing".
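In driver pseudocode the trick might look something like the sketch below. Every name here is invented for illustration; it is not a quote of any real driver, just the "resolve on read-lock" idea.

Code:
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Toy stand-in for driver-owned surface state. */
struct framebuffer {
    uint8_t *linear;       /* what a screenshot tool ends up reading */
    uint8_t *samples[8];   /* AA sample buffers, toy layout          */
    size_t   n_samples;
    size_t   bytes;        /* size of each buffer                    */
};

/* Hypothetical combine step: fold the samples into the linear buffer. */
static void resolve_aa_samples(struct framebuffer *fb)
{
    for (size_t i = 0; i < fb->bytes; ++i) {
        unsigned acc = 0;
        for (size_t s = 0; s < fb->n_samples; ++s)
            acc += fb->samples[s][i];
        fb->linear[i] = (uint8_t)(acc / fb->n_samples);
    }
}

/* The explicit read-lock is the cue: do the processing before handing
 * the buffer to the application. Write-only locks skip it. */
void *fb_lock(struct framebuffer *fb, bool for_reading)
{
    if (for_reading)
        resolve_aa_samples(fb);
    return fb->linear;
}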

Why NVIDIA products don't do this, and why there isn't a tool to show the 'correct' output is a complete mystery to me, however.
 
Xmas said:
It is totally irrelevant for the function of antialiasing where you put the samples, whether you put them in one buffer, several distinct buffers, or even a file on the hard disk interleaved with some Excel and Word data. The only thing that matters is that you calculate several samples per pixel and output some weighted average of them to the screen.

In some sense you could even argue that NVidia writes to one linear buffer and several smaller buffers at the same time! It depends on whether you see the memory bus as one 128-bit wide bus or four independent 32-bit busses.

For the function of scanning memory and creating a video stream, I would think that how you access the ram would be quite important. If you are doing 8x antialiasing and you have 8 page misses per pixel in the worst case, I would think it would add up and limit how high a resolution you could generate (assuming some per-scanline buffering using hblank time for slack). Obviously, creating the video stream requires some known quantity of latency and regularity of ram usage; you are not going to stop creating the feed to wait for agp/disk access. :)

(Let's see, at 8x AA using 8 buffers at 1280x1024 that gives you around 10K ram accesses per line, multiplied by a worst-case page-miss latency of say 20ns == about 0.2 milliseconds per scan line, or roughly 0.2 seconds to draw 1024 lines, not including hblank time. lol :) Numbers are rough and it's a far-fetched case, but rather amusing to think of the refresh rate you would get. :))
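Redoing that back-of-the-envelope calculation, still under the deliberately pathological assumption that every sample fetch is a 20ns page miss with no burst reads or caching:

Code:
#include <stdio.h>

int main(void)
{
    const double width = 1280, height = 1024, samples = 8;
    const double page_miss_ns = 20.0;

    double accesses_per_line = width * samples;              /* 10,240      */
    double ns_per_line = accesses_per_line * page_miss_ns;   /* ~204,800 ns */
    double s_per_frame = ns_per_line * height * 1e-9;        /* ~0.21 s     */

    printf("%.0f accesses/line, %.3f ms/line, %.2f s/frame (about %.1f Hz)\n",
           accesses_per_line, ns_per_line / 1e6, s_per_frame, 1.0 / s_per_frame);
    return 0;
}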

If you are saying that it is always one regular ram buffer for the ramdac and it's all just algorithmic on NVIDIA's part, then I don't see the problem with captures. The driver should be able to handle its own formats.

From what I recall, 3dfx had a very segmented ram setup; you even had to copy bits between banks. I wonder if any of that has crossed over.
 
Simon F said:
The T-Buffer technology maintained N buffers, each of which could be rendered independently and then recombined on-the-fly (AFAIAA) in the DAC feed. The idea behind it was to re-order what could be done in an accumulation buffer (see SIGGRAPH in, err, about the early 90s) so that instead of doing N passes serially, the N passes could be done in parallel. Unfortunately, N was typically rather small, which meant that some effects, e.g. depth of field and motion blur, weren't done brilliantly well.

But the point I've been making is this:

Take out the T-buffer in the V5 and you don't get FSAA, period. The T-buffer was not an accumulation buffer in the strictest sense of the word--ie, it was not merely areas in local ram denoted as buffers. From what I recall it was a part of the VSA-100 architecture itself. It seems to me you are overly simplifying and revising what the T-buffer was in the V5. Not an A-buffer, in other words. The samples were hardware-jittered in the T-buffer, which is what distinguished it from a mere accumulation buffer. Everybody knew at the time that the T-buffer was *based on* accumulation buffer principles--it was not, however, ever an accumulation buffer of the type you point out.

As I recall at the time, nVidia PR headed up by Perez tried to do a number on the T-buffer and called it an "accumulation buffer" which nVidia then tried to emulate in its presently shipping products at the time with some demos and, I believe, drivers--the result being a complete failure all the way around. Not only was nVidia's a-buffer attempt a failure from an IQ point of view, it was also incredibly slow by comparison. IMO, characterizing the T-buffer as merely a run-of-the-mill software-based accumulation buffer scheme is a total and complete mischaracterization.

The point to me here is that it's been said that "3dfx did the same thing in the V5." Remove the T-buffer from the V5 and it doesn't matter what you do with the RAMDAC--you aren't getting FSAA. Therefore, unless you want to speculate that the nv3x has a T-buffer, it's clear that 3dfx did not do "the same thing" in the V5 at all.

It would be nice to know *what* nVidia is doing with the RAMDAC relative to nv3x FSAA--but we don't know because nVidia won't talk about it. I do not think asking why nVidia won't talk about it, even just to generally describe its mechanics, is an inappropriate or irrelevant question.

Where you are wrong is in assuming that a standard 1990s SIGGRAPH accumulation buffer and the V5 T-buffer circa 2000 are one and the same. They are not and never have been.
 
nVidia then tried to emulate (t-buffer) in its presently shipping products at the time with some demos and, I believe, drivers--the result being a complete failure all the way around. Not only was nVidia's a-buffer attempt a failure from an IQ point of view, it was also incredibly slow by comparison
Not to burst your bubble, but this is still the way it's (generally) done today. ATI included. Render into larger (or multiple) buffers, combine at 'flip' time (or not, in the case of NVIDIA's supersampling).

The method was unduly attacked at the time by 3dfx supporters because it was not 'tbuffer', but it has stood the test of time.
 
Sharkfood said:
Not to hurl myself in the middle of this debate, but from my memory, Walt's explanation was more sound.. unless there was some very tricky manipulation in the 3dfx drivers (from the onset) for frame buffer grabs..

32-bit + AA was never a problem to capture on the V5. Using CTRL-PRNTSCN then pasting into Paint, or even F12 in Quake engine games always captured a framebuffer with full AA. If a post-filter/video blend was used to merge the sample buffers, I would think raw framebuffer grabs would have been incomplete or single-sample or single-chip sample at the very best.

16-bit was always a problem, of course. You needed to use Hypersnap or (later) 3dfx's NUMLOCK env variable shell to apply the algorithm, which then wrote a custom filtered TGA in software to algorithmically reproduce the DAC blend. As it's post-process and video (not written to buffer), this was the only way to handle it....

Sharkfood, glad to see somebody else remembers it the way I do...;) (This is almost like old times, eh?...;))

Ditto, I don't recall any problems taking screenshots with the V5 set for 32-bit integer display. Ditto the recollection of the Hypersnap requirement for 16 bits with post-filter blending turned on. In fact, I can't remember 3dfx ever talking about the post filter in any context other than its 16/22-bit mode. And I recall the company stating more than once that when you went to 32-bit output on the V5 the post-filter blending was turned off automatically.


So it seems to me that the real point is (completely throwing out 16/22-bit, as nobody seems to disagree this was obviously DAC or similar post-process) whether or not this "special circuitry" described in the white paper (notice: not the DAC, but somewhere before the DAC) actually wrote to buffer/memory space in its combine... or whether it was a circuitry/video output blend, as the wording would suggest.

Exactly--people are assuming that the "special video circuitry" was in fact post-filter blending, and there's no evidence that this is what was meant by this description in the diagram. On a number of levels I find it hard to credit the "3dfx was doing it in the V5, too" school of thought. It seems more like an attempt at justification than anything else.

The kicker is that if nVidia would *explain itself* no one would be wondering about it or feel the need to justify it. Ah, well...as long as people are content with the "trade secret" mantra that probably won't happen.

... I think only the people who wrote the 3dfx drivers could honestly answer this one: whether the "special circuitry" wrote to the final buffer (thus explaining AA'd screenshots and buffer grabs), or whether some kludge in the drivers magically noticed the read-lock on the framebuffer for a buffer grab and quickly applied a write operation to do a brute-force combine/approximation... but decided not to apply the 16-bit algorithm.

Agreed...*chuckle*...I think it's a moot point. The fact that today I can recall from memory a lot more about V5 technology relative to FSAA than nVidia is willing to reveal about FSAA in its currently shipping nV3x products is a chilling thought, IMO. When a company won't talk about what it's doing in general terms it's not hard to postulate that they won't talk about it because they simply don't want what they are doing to be known and examined publicly. If they'd just *talk about it openly* this entire unfavorable aura would be dispelled if the technique was legitimate and interesting. If not legitimate and interesting--then it's easy to see why they refuse to talk about it.

I have no objections to what they are doing provided it doesn't promote performance in benchmarks at the expense of image quality, since FSAA is about image quality as much as performance. But that's a separate issue from wanting to know what it is they are doing with the post filter in their nv3x FSAA/MSAA modes, I think. Not talking about it only leads to possibly baseless opinion on the subject from all sides of the argument.
 
Himself said:
For the function of scanning memory and creating a video stream, I would think that how you access the ram would be quite important. If you are doing 8x antialiasing and you have 8 page misses per pixel in the worst case, I would think it would add up and limit how high a resolution you could generate (assuming some per-scanline buffering using hblank time for slack). Obviously, creating the video stream requires some known quantity of latency and regularity of ram usage; you are not going to stop creating the feed to wait for agp/disk access. :)
Of course it's important for speed how you store the samples, but not for the result, which is what I was trying to express. We can be quite sure NV3x uses a tiled framebuffer because of color compression.


Russ,
NVidia is using filter at scanout for their pure multisampling modes (NV25: for 2x and Quincunx, NV3x: for all MS modes), not the supersampling modes. The simple reason for this is that filter at scanout is more effective the higher the framerate to refresh rate ratio is.
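A toy illustration of that trade-off, with invented figures: resolving when a frame is finished costs one combine per rendered frame, while filter at scanout costs one combine per display refresh, so the ratio of the two rates decides which is cheaper.

Code:
#include <stdio.h>

int main(void)
{
    const double refresh_hz = 85.0;                  /* assumed monitor refresh */
    const double framerates[] = { 30.0, 85.0, 200.0 };

    for (int i = 0; i < 3; ++i) {
        printf("%5.0f fps rendered: resolve at flip = %5.0f combines/s, "
               "filter at scanout = %5.0f combines/s\n",
               framerates[i], framerates[i], refresh_hz);
    }
    return 0;
}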


WaltC,
do you agree that V5 does blending of the AA samples at scanout? If yes, do you further agree that NV25 (only 2xMS) and later chips do blending of the AA samples at scanout?
So would you agree that they do "the same" in this regard?
 
RussSchultz said:
Not to burst your bubble, but this is still the way it's (generally) done today. ATI included. Render into larger (or multiple) buffers, combine at 'flip' time (or not, in the case of NVIDIA's supersampling).

The method was unduly attacked at the time by 3dfx supporters because it was not 'tbuffer', but it has stood the test of time.

I would never argue with the idea that 3dfx started a good thing with FSAA on the V5, despite nVidia's attempts at the time to play down its own lack of such capability with statements of the "we think what gamers want is high-res gaming, not FSAA" calibre.

But I would also argue that vpu development along the lines of radical improvements in pixel fill rate per clock, radical improvements in general vpu MHz rates, and things like Z-compression and color compression have also had a major impact on "real-time" FSAA in 3D chips since the V5.

There's a big difference between the SIGGRAPH a-buffer principles and what 3dfx implemented in hardware in the V5--as big a difference as there is between the way a V5 did FSAA and the way an R3xx does it today, IMO. Back in the early 90s, with the software accumulation buffer schemes employed by studios with OpenGL, you might get a frame rendered every 10 minutes or so, depending on a number of variables.

Copying this approach directly would never have worked for 3dfx as they needed to do it with something approaching "real-time" frame rates. But isn't that a capsule picture of the 3D-chip industry as a whole? Heh--every time the software guys proclaim "You can't do studio-quality 3D rendering in real time" each generation of chip brings that goal progressively closer...;)

Actually, I think you have who attacked whom a bit backwards. 3dfx was "attacked" by nVidia over the issue of FSAA--and the company tried many times to diminish the significance of what 3dfx was doing by characterizing it as "nothing but software which anybody can do." The problem was that nVidia could never do it at the time--despite trying fervently, and fruitlessly. Sitting here three years after the V5 shipped, it's obvious nVidia could not have been more wrong in its initial "assessment" of what 3dfx did with the V5 and FSAA, isn't it? Heh--today, quality FSAA is not an optional component of a successful 3D chip--it's a necessity.
 
Xmas said:
Russ,
NVidia is using filter at scanout for their pure multisampling modes (NV25: for 2x and Quincunx, NV3x: for all MS modes), not the supersampling modes. The simple reason for this is that filter at scanout is more effective the higher the framerate to refresh rate ratio is.
Heh. I'll be quiet now. :)
 
WaltC said:
Simon F - who wrote this eons ago said:
The T-Buffer technology maintained N buffers, each of which could be rendered independently and then recombined on-the-fly (AFAIAA) in the DAC feed. The idea behind it was to re-order what could be done in an accumulation buffer (see SIGGRAPH in, err, about the early 90s) so that instead of doing N passes serially, the N passes could be done in parallel. Unfortunately, N was typically rather small, which meant that some effects, e.g. depth of field and motion blur, weren't done brilliantly well.

But the point I've been making is this:

Take out the T-buffer in the V5 and you don't get FSAA, period.
Well that comes as a big surprise, full stop. If you take out the texturing hardware you can also stop it doing texturing. :rolleyes:
The T-buffer was not an accumulation buffer in the strictest sense of the word--ie, it was not merely areas in local ram denoted as buffers.
I never said it was exactly the same as an Accumulation buffer or else I would have said so. It was a way of re-ordering 'accumulation buffer style' AA passes so that they could be done in parallel instead of serially.
The samples were hardware-jittered in the T-buffer, which is what distinguished it from a mere accumulation buffer. Everybody knew at the time that the T-buffer was *based on* accumulation buffer principles--it was not, however, ever an accumulation buffer of the type you point out.
There may have been some automatic offsets (for AA) but they could also do DOF/motion blur effects (not brilliantly well though). That therefore indicates that the N rendering engines could operate independently or, at least, the data could be disabled for particular engines.
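To make the re-ordering point concrete, here is a toy sketch (not 3dfx's hardware or code): accumulating N jittered passes one after another, and rendering the same N passes into separate buffers that are only combined at output time, produce the same resolved image; the second form simply lets the passes proceed in parallel and defers the combine.

Code:
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

#define W 4
#define H 4
#define N 4

/* Toy stand-in for rendering one jittered pass of the scene. */
static void render_pass(uint8_t out[W * H], int pass)
{
    for (size_t i = 0; i < W * H; ++i)
        out[i] = (uint8_t)((i * 37 + pass * 11) & 0xFF);
}

int main(void)
{
    /* Serial: classic accumulation buffer, add one pass at a time. */
    unsigned accum[W * H] = { 0 };
    uint8_t pass_buf[W * H];
    for (int p = 0; p < N; ++p) {
        render_pass(pass_buf, p);
        for (size_t i = 0; i < W * H; ++i)
            accum[i] += pass_buf[i];
    }

    /* Re-ordered: keep N buffers and combine only at output time. */
    uint8_t bufs[N][W * H];
    for (int p = 0; p < N; ++p)
        render_pass(bufs[p], p);

    for (size_t i = 0; i < W * H; ++i) {
        unsigned sum = 0;
        for (int p = 0; p < N; ++p)
            sum += bufs[p][i];
        assert(sum / N == accum[i] / N);   /* same resolved result */
    }
    return 0;
}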
 