How does R300 6X FSAA work?

3dfx used RGSSAA on the Voodoo 5 parts. Each chip computed one of the samples for each pixel.

Small correction – each pipeline produced a sample. Each VSA-100 had two pipes, so if it was doing 4X then there would be two buffers per chip.

If 3dfx called this multisampling, then I would disagree with them.

I tried to pin Gary T down on this. This was what he said:

Voodoo5 performs one texture sample per sub-pixel due to the nature of its architecture, which is a multi-chip architecture. So in that sense - Voodoo5 FSAA is like super-sampling. But in other ways, e.g. FSAA being a transparent operation to the application, it is closer to multi-sampling.
 
Interesting info, Dave.

I guess I shouldn't have explained that in so certain a manner. The problem is that I was explaining what the names *should* have meant, if they had to make sense. The current names do not make a lot of sense when you think about them from a hardware & technology POV, IMO.

Now, yes, currently used names are exactly what OpenGL guy explained. So, if he wants to stick with those names, it's obvious whatever I said didn't make sense. And I'm sorry if I wasn't clear enough.

The best naming system, IMO, would be:
for NV2x-like AA: UniColo(u)r MultiSampling
for V5-like AA: MultiColo(u)r MultiSampling
for NV25's 4xS: BiColo(u)r QuadSampling ( LOL, yeah, this sounds odd, might have to find something better )
for NV10's AA: Software SuperSampling ( it's useless to say it uses multiple colo(u)rs, because SSAA using a single color is so inefficient we're never going to see it )
for Radeon 8500 AA: MultiColo(u)r MultiSampling ( just like the V5, except it's semi-random instead of rotated )

In the current market situation, if ATI decided to call the 8500's AA MultiSampling, it would obviously be false marketing because, using current terms, it's SuperSampling. So, yes, even V5 AA, using current terms, is SuperSampling.

My point simply was it would be better to have terms explaining what those AA modes truly do instead of the current ones. But I'm not hoping to change those terms; they are way too old and have been used for a long time. It's too late to correct them...


Uttar
 
These are the names that have been around longer than 3dfx or NVIDIA. I'd say it was MS that started confusing things with DX8 by using the term 'Multisample buffer' for the DX API version of 3dfx's T-Buffer; it should have just been multi-buffer IMO.

[edit] And, AFAIK, even by your terminology you are not correct since NV2x does not use multiple buffers, but a single large buffer.
 
Bah, now I look really dumb... :oops:
I did know the current terms were used a loong time ago; that's probably why it's hopeless to change them...
I never realized nVidia did it all with a larger buffer for their NV2x AA. But their Accuview tech brief clearly states that. It does make sense when you want to take all the subpixels at the same time. But I wonder how they're managing that...
Probably by considering that 4X AA is simply 4X Horizontal, even though the samples aren't horizontal. Interesting idea...


Uttar
 
Uttar said:
I did know the current terms were used a loong time ago; that's probably why it's hopeless to change them...
Multisample could be interpreted that a single color sample is used for multiple depth samples.
Probably by considering that 4X AA is simply 4X Horizontal, even though the samples aren't horizontal. Interesting idea...
Since the memory is tiled, it doesn't have to be arranged anything like this at all. It may be that all the samples are very close together in memory.
 
Definitions are always a fun thing... :)

I've always viewed any algorithm that performs a texture sample for *EACH* sub-sample as a form of supersampling. NVIDIA started the "multisampling" misdefinition craze by dictating that an algorithm that only performs a single texture read for all sub-samples was "multisampling" (note- this is more the fault of websites defining these than the IHVs themselves, although some whitepapers/press info also seemed to lean towards this polarized view).

OpenGL Guy-
Since the memory is tiled, it doesn't have to be arranged anything like this at all. It may be that all the samples are very close together in memory.

The main point of interest concerning AA and memory arrangement I think goes back to the interesting information found in 3dfx's whitepaper on their FSAA. It stated basically two things:
1) The memory arrangement of the sample buffers was designed in such a way to allow the DAC to perform the downsample/averaging phase. This made logical sense since if the sample buffers are "tiled" similar to the normal framebuffer, some form of tricky DAC use might actually allow the downsample phase to be totally bypassed and instead done as a single DAC blend at output stage.
2) Sample buffer memory arrangement, if again similar to normal framebuffer segmentation, would require less exotic translations for applications that directly write to the framebuffer for HUDs, overlays and whatnot.
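To illustrate what the downsample/averaging phase in point 1 computes, here is a minimal sketch (my own illustration, not 3dfx's actual circuitry): blending N same-sized sample buffers into one output pixel, as a per-pixel DAC blend at scanout would do.

```python
def blend_sample_buffers(buffers, x, y):
    """Average the RGB samples for pixel (x, y) across all sample buffers.

    `buffers` is a list of 2D arrays of (r, g, b) tuples, one array per
    sample buffer; the result is the anti-aliased output color.
    """
    n = len(buffers)
    r = sum(buf[y][x][0] for buf in buffers) // n
    g = sum(buf[y][x][1] for buf in buffers) // n
    b = sum(buf[y][x][2] for buf in buffers) // n
    return (r, g, b)

# Two tiny 2x1 sample buffers: averaging (100, 0, 0) and (200, 0, 0)
# for pixel (0, 0) gives (150, 0, 0).
bufs = [[[(100, 0, 0), (0, 0, 0)]], [[(200, 0, 0), (0, 0, 0)]]]
print(blend_sample_buffers(bufs, 0, 0))  # → (150, 0, 0)
```

The point in the whitepaper is that if this blend happens at the output stage, no separate downsampled buffer ever needs to be written to memory.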

The thing that always confused me about 3dfx's AA claims and their whitepaper was how FSAA could be screenshotted when the downsample phase was allegedly a DAC blend of the sample buffers to the output signal. This would suggest some sort of "software kludge downsampling" for when capturing the final framebuffer or something similar.. Always a point of interest. :)
 
Well, the 3dfx screencaptures always needed help. Even before the T-Buffer.

They had "21" or "22" bit color by doing a special dither pattern on Voodoo2/3 that allowed the DAC to output more color than was available in the 16 bit image by guessing what the gradient should be from the dither pattern in a horizontal scan. This is why their 16 bit color was marketed as "22" bit. It worked quite well as long as everything in the scene could be done in one pass. Otherwise, the dithering artifacts would build up and the DAC filter wouldn't work. Quake III smoke trails were the favorite thing to note this on; Quake II didn't have much of a problem.

A special screen capture utility was required to get the "22 bit" output.

With the T-Buffer and 32 bit color, you wouldn't need a special screen capture utility (because a screen capture of 16 bit must return 16 bit data, not 22 bpp, but 32 bit color is returned with a 32 bit T-Buffer capture). The driver was designed to read all of the buffers and downsample when a screen capture was taken. Actually, I think the hardware itself was designed to "fake it" by making direct video memory accesses read/write from/to the multiple buffers in many cases. Thus, the OS and many APIs would think that they were dealing with a single 640x480 memory area, but the hardware was taking care of the multiple interleaved buffers in the background. This is much like how memory tiling/swizzling is done by hardware now. You access a backbuffer or texture with a linear address, but the hardware disguises the inner workings.
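The "fake it" behavior described above could look something like this sketch. This is purely hypothetical pseudocode of the idea (the class and method names are mine, not 3dfx register names): reads at a linear address return the blended sample, and writes land in every buffer so direct framebuffer writes (HUDs, overlays) stay consistent across samples.

```python
class VirtualLinearFramebuffer:
    """Toy model: N hidden sample buffers exposed as one linear region."""

    def __init__(self, width, height, num_samples):
        self.width = width
        # One flat pixel array per sample buffer (grayscale for simplicity).
        self.buffers = [[0] * (width * height) for _ in range(num_samples)]

    def read(self, linear_addr):
        # A read at a linear address blends the co-located samples.
        samples = [buf[linear_addr] for buf in self.buffers]
        return sum(samples) // len(samples)

    def write(self, linear_addr, value):
        # A write (e.g. an app drawing a HUD) is duplicated into all
        # sample buffers, so the pixel stays uniform across samples.
        for buf in self.buffers:
            buf[linear_addr] = value

fb = VirtualLinearFramebuffer(640, 480, 4)
fb.write(0, 128)
print(fb.read(0))  # → 128
```

To the OS or a capture tool the object behaves like one plain 640x480 surface, which matches the abstraction described in the post.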

swizzle swizzle :D
 
Scott-

The interesting tidbit was mainly referencing FSAA screenshot images, not 16/22 or other issues with taking 3dfx screenshots. :)

The point is as follows-
If the Voodoo5 used a DAC blend to merge/average the sample buffers, then what built the final image buffer for usage with screenshots WITH fsaa?

Normal, raw framebuffer grabber tools captured the FSAA result with no problems, which would suggest there was something besides a DAC blend performing the sub-sample averaging... either that or on handing a final buffer pointer to some application, the driver "quickly" did a "best guess" averaging stage to create a final, averaged buffer with FSAA from the combination of sample buffers in some way.

So if you envision 4 sample buffers, each at 640x480, assume the four sample buffers are blended by the DAC on output to perform the sample averaging stage, this opens a can of worms for screenshots, yet screenshots all showed nicely AA'd output. That was the puzzling part for me. :)
 
I could be wrong on this, but I think the original 3dfx drivers, even when using AA, output only one of the sample buffers, thus giving an aliased result.
That's why 3dfx decided to do a software merge of those sample buffers once reviewers began to ask for it. Pretty much like what nVidia did for the GF1/GF2 SSAA, but only for screenshots.

Not very sure of this, however.


Uttar
 
The thing that always confused me about 3dfx's AA claims and their whitepaper was how FSAA could be screenshotted when the downsample phase was allegedly a DAC blend of the sample buffers to the output signal. This would suggest some sort of "software kludge downsampling" for when capturing the final framebuffer or something similar.. Always a point of interest.

The hardware was capable of combining all samples when doing a Linear Frame Buffer access. There was a register that allowed turning this on and off. The master chip would automatically combine the samples from all the buffers if this was enabled.

The Glide screenshot key however did indeed use a software kludge to access 16 bit antialiased frame buffers. It would manually read each sample from each of the buffers and combine them to output a 32 bit image. If it let the hardware do it, there would be a loss in output precision.
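A sketch of what that software kludge has to do, as I understand the description (the helper names are mine): read each RGB565 sample from the buffers, expand to 8-bit channels, and average at full precision so the result can be written out as a 32 bit image without the precision loss of a 16 bit combine.

```python
def rgb565_to_rgb888(p):
    """Expand a packed 16-bit RGB565 pixel to 8-bit-per-channel RGB."""
    r = (p >> 11) & 0x1F
    g = (p >> 5) & 0x3F
    b = p & 0x1F
    # Replicate the high bits into the low bits to cover the full 0-255 range.
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def combine_16bit_samples(samples):
    """Average co-located 16-bit samples in 8-bit space, one per buffer."""
    rgbs = [rgb565_to_rgb888(s) for s in samples]
    n = len(rgbs)
    return tuple(sum(c[i] for c in rgbs) // n for i in range(3))

# Four identical white samples stay white after averaging.
print(combine_16bit_samples([0xFFFF] * 4))  # → (255, 255, 255)
```

Averaging after the 565→888 expansion is what preserves the extra precision; truncating back to 16 bit first would throw it away.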
 
Sharkfood said:
The thing that always confused me about 3dfx's AA claims and their whitepaper was how FSAA could be screenshotted when the downsample phase was allegedly a DAC blend of the sample buffers to the output signal. This would suggest some sort of "software kludge downsampling" for when capturing the final framebuffer or something similar.. Always a point of interest. :)

AFAIK, NV25 implements a similar approach to downsampling, which is why it's difficult to capture a Quincunx screenshot.
 
Sharkfood said:
Definitions are always a fun thing... :)

I've always viewed any algorithm that performs a texture sample for *EACH* sub-sample as a form of supersampling. NVIDIA started the "multisampling" misdefinition craze by dictating that an algorithm that only performs a single texture read for all sub-samples was "multisampling"
AFAIK, SGI introduced Multisampling, not Nvidia. See:
K. Akeley, "RealityEngine Graphics", Computer Graphics (SIGGRAPH 93 Proceedings), volume 27, pages 109-116, 1993.
 
Uttar said:
The best naming system, IMO, would be:
for NV2x-like AA: UniColo(u)r MultiSampling
for V5-like AA: MultiColo(u)r MultiSampling
for NV25's 4xS: BiColo(u)r QuadSampling ( LOL, yeah, this sounds odd, might have to find something better )
for NV10's AA: Software SuperSampling ( it's useless to say it uses multiple colo(u)rs, because SSAA using a single color is so inefficient we're never going to see it )
for Radeon 8500 AA: MultiColo(u)r MultiSampling ( just like the V5, except it's semi-random instead of rotated )

In the current market situation, if ATI decided to call the 8500's AA MultiSampling, it would obviously be false marketing because, using current terms, it's SuperSampling. So, yes, even V5 AA, using current terms, is SuperSampling.

My point simply was it would be better to have terms explaining what those AA modes truly do instead of the current ones. But I'm not hoping to change those terms; they are way too old and have been used for a long time. It's too late to correct them...
I don't know what you'd want to 'correct' here.
IMO 'multisampling' for a technique that calculates one color for all covered subsamples and 'supersampling' for a technique that calculates a color for each subsample are useful terms.

The important things to know about current AA implementations are IMO:

the sample pattern:
- OG
- RG
- sparse pattern
- alternating pattern
- etc.

the method of determining sample colors:
- multisampling (one color for all covered subsamples)
- supersampling (a color for each covered subsample)
- mixedsampling (anything in between MS and SS)

other things:
- trying to 'detect' edges: fragment AA
mostly performance-related:
- (using compression)
- (downsampling at the scanout stage)

The arrangement of samples in memory, compression aside, is not very important, and will always be chosen to give the best performance with a given memory interface.
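The multisampling/supersampling distinction in the list above can be sketched in a few lines. This is an illustration only (the function names and the toy `shade` stand-in are mine): the difference is how many times the color/texture computation runs per pixel.

```python
def shade(x, y):
    """Toy stand-in for a texture sample / color computation at a position."""
    return (x * 10 + y * 5) % 256

def multisample_pixel(center, subsample_offsets, coverage):
    # Multisampling: one color for all covered subsamples,
    # shaded once at the pixel center.
    color = shade(*center)
    return [color if covered else None for covered in coverage]

def supersample_pixel(center, subsample_offsets, coverage):
    # Supersampling: a color per covered subsample,
    # shaded at each subsample position.
    cx, cy = center
    return [shade(cx + dx, cy + dy) if covered else None
            for (dx, dy), covered in zip(subsample_offsets, coverage)]

offsets = [(-0.25, -0.25), (0.25, -0.25), (-0.25, 0.25), (0.25, 0.25)]
covered = [True, True, True, True]
print(multisample_pixel((10, 10), offsets, covered))  # four identical colors
print(supersample_pixel((10, 10), offsets, covered))  # four distinct colors
```

Coverage (and Z) is still tracked per subsample in both cases; only the color work differs, which is where multisampling gets its fill-rate advantage.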
 
OpenGL guy said:
Xmas said:
other things:
- trying to 'detect' edges: fragment AA
Edge detection is important for multisampling as well, since you don't want to do Z checks on points outside the polygon.
Well, I guess the rasterizer determines which pixels are covered by a triangle, and determines a coverage mask per pixel (which can be modified further by the application).

But what I meant is detecting edges specifically to determine where to use more than one sample per pixel, like Parhelia does.
 
Sharkfood:

I already answered your question. The hardware takes care of it. Raw screen captures "think" they are getting a single linearly addressed 640x480 region, but it is actually 4 (or 2) interleaved buffers that the chip translates.

If it is a 32 bit screenshot, then it is correct. If it is 16 bit, there is a problem because the DAC takes 4 16 bit buffers and makes a 22 bit output (it does not downsample and truncate to 16 bits, but blends in 24 bit 888 color space). Thus, even though you see AA in a screenshot of 16 bit color Voodoo4/5 FSAA, it isn't what you see on the screen.

This is why I brought up the "old" voodoo 1/2/3 16 bit color screen capture issue, because it is the same problem in 16 bit land. With 32 bit color though the DAC doesn't output 30 bit (8-8-8 + 2 bits each = 30 bit) so the screen captures are what you would see on the monitor.

The multiple buffers are hidden not just from the app but even from screen capture utilities (if so chosen, as Colourless points out). Like I mentioned, it can be like internal texture or framebuffer formats.
What about a tiled framebuffer? The screen capture thinks it is addressing a linear set of scanlines, but internally it is organized in blocks. With textures, they are swizzled in memory, but you can write to them linearly if you want through OGL or D3D and you don't have to know how they are ordered. Same thing with multiple buffers for FSAA, or framebuffer compression, or whatever.
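The tiling idea above can be made concrete with a toy address translation (the 4x4 tile size here is arbitrary; real hardware layouts vary and often use fancier swizzles like Morton order): a linear (x, y) access gets remapped to an address where the framebuffer is stored as small blocks.

```python
TILE_W = TILE_H = 4  # arbitrary toy tile size

def tiled_address(x, y, width):
    """Map a linear (x, y) pixel coordinate to its tiled memory address."""
    tiles_per_row = width // TILE_W
    tile_x, in_x = divmod(x, TILE_W)   # which tile column, offset inside it
    tile_y, in_y = divmod(y, TILE_H)   # which tile row, offset inside it
    tile_index = tile_y * tiles_per_row + tile_x
    return tile_index * (TILE_W * TILE_H) + in_y * TILE_W + in_x

# Pixel (0, 0) is the first pixel of tile 0; pixel (4, 0) starts tile 1.
print(tiled_address(0, 0, 16))  # → 0
print(tiled_address(4, 0, 16))  # → 16
```

The caller only ever supplies (x, y); the remapping, like the multi-buffer interleave, is invisible from the outside.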
 
Dave-
AFAIK, NV25 implements a similar approach to downsampling, which is why it's difficult to capture a Quincunx screenshot.

That's kind of what I thought. It's kludgey at best, and means the output result is a best-guess approximation, not truly 100% representative of the real output. Interesting.

Simon-
AFAIK, SGI introduced Multisampling not Nvidia:

I never said NVidia introduced Multisampling. Read it again. I said NVidia started the "multisampling" misdefinition craze, after which websites instantly categorized any form of AA that took multiple samples per single texture sample as "multisampling" :)

Scott-
I already answered your quesiton. The hardware takes care of it. Raw screencaptures "think" they are getting a single linearly addressed 640x480 region, but it is actually 4 (or 2) interleaved buffers that the chip translates.

Actually, going by the original depiction it couldn't be "the chip translates", but that depiction seems to be erroneous.

I've actually obtained a newer mirror of a different variation of the whitepaper (the Kristof/Dave Barron edition, heh), and it seems this one explains the process a little differently from the original 3dfx one, which makes a little more sense:
"The combining is done just before the RAMDAC by special video circuitry that mixes the various buffers together at the pixel level."

So the combining is indeed performed to a memory region and the result winds up in a "final" buffer (less some obvious RAMDAC finalization), rather than this step being bypassed as originally described.
 
Still wrong.

When Nvidia started to talk about multisampling, the current use was already the norm. If you want to find the company that introduced the multisampling term into 3D PC home gaming talk, look at Gigapixel.

[Added]
From reading the quote about T-buffer mixing, how can you get to the conclusion that there is a "final" buffer?
OK, maybe some small cache so it can read in larger blocks. (Not that the quote hints to any such thing, but it's reasonable to do so.)
 
No, there is no final buffer in main memory. The Ramdac reads 4x640x480 pixels per screen refresh for 640x480 with 4x AA on Voodoo 5.

In fact, there were several lengthy and lively discussions here about how this affects total memory bandwidth (since more bandwidth is needed for the ramdac scan than if there were a final buffer) and whether it was worth doing. (It depends on the framerate relative to the screen refresh.)
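A quick back-of-the-envelope check of that bandwidth point (my numbers: 640x480, 32-bit pixels, and an assumed 85 Hz refresh): scanning four full-size sample buffers per refresh versus one pre-downsampled buffer.

```python
# Scanout bandwidth: one downsampled buffer vs. four sample buffers.
width, height = 640, 480
bytes_per_pixel = 4      # 32-bit color
refresh_hz = 85          # assumed refresh rate
num_samples = 4          # 4X AA on Voodoo 5

one_buffer = width * height * bytes_per_pixel * refresh_hz
four_buffers = one_buffer * num_samples

print(f"single buffer scanout: {one_buffer / 1e6:.1f} MB/s")   # ~104.4 MB/s
print(f"4-sample scanout:      {four_buffers / 1e6:.1f} MB/s")  # ~417.8 MB/s
```

The trade-off mentioned above is that a final buffer would cut scanout bandwidth 4x, but would cost an extra downsample read+write per *rendered* frame, so which approach wins depends on framerate relative to refresh rate.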

The circuitry did in fact hide this behind-the-scenes T-Buffer stuff from the outside, and it was something that 3dfx marketing/engineering was proud of and talked about.

I remember this stuff, and just about all 3d hardware happenings from the Voodoo 1 / V1000 days through the original GeForce series before I started lurking more than posting.

3dfx always had some tricky custom ramdac thing going on from day one, and the T-buffer ended up being another expression of ramdac wizardry.

Multiple buffers, multiple chips rendering them, and the ramdac taking care of the blending of multiple buffers together for display.

Think about this: without FSAA, how would screen capture work? There were multiple chips on the Voodoo5, with separate memory domains, rendering alternate scanlines... why didn't a screenshot only capture half the screen?
If there was a target buffer for the final output, which of the two chips' memory was it written to?

The answer is that there was no final buffer. It was all abstracted by the hardware to appear to the outside (OS, APIs, etc.) as a single linear address space with "stuff" in it.
 