Cost of 3dfx RGSS AA on current hardware?

no-X said:
...It didn't consume additional fillrate...

Of course it did. Supersampling consumes fillrate.

As for the cost, it is much more expensive than MSAA, so high levels of it would consume significantly more rasterization resources (ALU, etc.). I think it's debatable whether 8xSS would be worth the cost. 8x-16xMSAA is much cheaper, and if you throw in some SS to handle alpha test, you're good to go.

But IMHO 8xSS would be completely wasting a card's computational power for marginal gains in IQ that are simply not worth it.
 
Of course it did. Supersampling consumes fillrate.
I was talking about the resolve, which is free on 3Dfx hardware; that's why I said "additional". Competing solutions consumed fillrate not only for rendering to a higher-resolution frame buffer, but also for the resolve.

As for RGSS, I think the best solution would be a hybrid mode: 2-4x RGSS plus 2-4 sparse multi-samples per super-sample. Many people play older games, and it's a pity if you've maxed out all the IQ settings, you're playing at >100 FPS, and you still can't improve IQ any further...
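To make that hybrid idea concrete, here's a minimal sketch in C, with all offsets purely illustrative (in pixel units from the pixel centre, not any real hardware's pattern): 2x RGSS, where each shaded super-sample carries 2 sparse coverage samples, yielding 4 geometry coverage points for 2x shading cost.

/* Hypothetical hybrid pattern: 2x RGSS + 2 sparse multi-samples
   per super-sample. All offsets are made-up illustrations. */
#include <stdio.h>

typedef struct { float x, y; } Sample;

int main(void) {
    Sample ss[2] = { { -0.25f, -0.25f }, {  0.25f,  0.25f } };   /* shaded   */
    Sample ms[2] = { { -0.125f, 0.125f }, { 0.125f, -0.125f } }; /* coverage */

    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            printf("shade (%+.3f,%+.3f) -> coverage (%+.3f,%+.3f)\n",
                   ss[i].x, ss[i].y,
                   ss[i].x + ms[j].x, ss[i].y + ms[j].y);
    /* Net effect: 2 shader invocations per pixel, but 4 coverage
       points for geometry edges, i.e. 4x edge AA at 2x shading cost. */
    return 0;
}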
 
But IMHO 8xSS would be completely wasting a card's computational power for marginal gains in IQ that are simply not worth it.
Like you said, it's your opinion, but I'm wondering what you're comparing 8xRGSS to when using the words "marginal gains in IQ"...

Also, by "a card's computational power" in the context of your comment (the way I read it anyway), you're not really takiing about silicon space and budgets, are you?

How much computational power would a video chip of today need for VSA100's 8xRGSS, and what percentage of the total would that be?
 
Like you said, it's your opinion, but I'm wondering what you're comparing 8xRGSS to when using the words "marginal gains in IQ"...

It would really only show big differences on alpha-tested edges. It would not be superior to 8xRGMS on regular edges, nor would it really be better than AF for surface interiors. And it would consume far more ALU cycles and far more bandwidth than AF.

Also, by "a card's computational power" in the context of your comment (the way I read it anyway), you're not really takiing about silicon space and budgets, are you?

How much computational power would a video chip of today need for VSA100's 8xRGSS, and what percentage of the total would that be?

Umm, you understand that it requires executing all shaders 8 times, right? Thus, for a G80, say, to do 8xRGSS, it would need 1024 stream processors to run the same scene that a 128-SP version can run with 8xRGMS. I don't know about you, but if I bought a card with 8x as many ALUs as the G80, I'd want them used for the game's shaders, not for transparency antialiasing at 8x.

The VSA100 did not run pixel shaders, so at most the fillrate cost of each additional sample was fetching and combining a few texels. But today's games are running shaders with tons of ALU arithmetic. Do you really want to run your skin shader through the ALUs 8 times?

The cost of NxSS compared to NxMSAA: N times the shader cycles consumed, N times the texture lookups, and if you want to do that with no serious performance hit, you need to duplicate resources. The VSA100 wasn't "efficient"; it duplicated the *entire GPU* to achieve it. For today's cores you could probably scale up the ALU arrays, but consider how one would rather allocate those resources.

Does someone playing Crysis really want to turn on 8xRGSS? The "cost" in terms of resources or silicon to make SS performant is not much different from the cost of rendering 8 times as many pixels. IMHO, someone who wants high levels of supersampling is going to buy a 2x or 4x SLI rig, and even then it will probably suck on games coming out this year. Or they could just cut their resolution by 8x, say to 640x480.
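To put rough numbers on that argument, here's a back-of-the-envelope sketch (the resolution and per-pixel shader cost are purely illustrative) of why shading cost scales with N under supersampling but not under multisampling:

/* Illustrative cost model: SSAA shades every sample, MSAA shades
   once per pixel. All figures are assumptions, not benchmarks. */
#include <stdio.h>

int main(void) {
    const double pixels     = 1920.0 * 1200.0; /* assumed resolution        */
    const double alu_per_px = 200.0;           /* assumed shader cycles/px  */
    const int    n          = 8;               /* 8x AA                     */

    double ssaa = pixels * alu_per_px * n;     /* shade all N samples       */
    double msaa = pixels * alu_per_px;         /* shade one sample per px   */

    printf("8xSSAA shading: %.3g cycles/frame\n", ssaa);
    printf("8xMSAA shading: %.3g cycles/frame (%.0fx cheaper)\n",
           msaa, ssaa / msaa);
    return 0;
}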
 
Silent_Buddha: Yes, but only the grid looked rotated; the scene itself wasn't. Each scene was rendered multiple times, and each version was sampled with a slight sub-pixel offset. It was great, because the pattern was fully programmable and adjustable from frame to frame, and the frame-buffer resolution wasn't affected, so the method was 100% compatible (the oversampling from ATi/nVidia wasn't). It didn't consume additional fillrate - blending was done in the RAMDAC... And the IQ was awesome: with adjusted mip-map LOD, textures were sharp and shimmer-free, and alpha textures looked great, edges too. In addition, it reduced dithering artifacts / increased the output colour depth (when using 16-bit rendering) to a 22-bit equivalent, which sometimes (especially when using Glide) looked better than the competitors' 32-bit output. (My favourite screenshot is from WoT with 3Dfx 2x RGSS @16bit and w/o AF - the VSA-100 didn't support it.)
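For anyone who hasn't seen the technique, a minimal sketch of the "render the scene several times with a sub-pixel offset" idea, not 3dfx's actual hardware path: nudging the projection matrix by a fraction of a pixel shifts all geometry by that sub-pixel amount before each pass. Helper name and the column-major matrix layout are my assumptions.

/* Sketch only: jitter a column-major 4x4 projection matrix by
   (dx, dy) pixels. A clip-space translation of 2*offset/resolution
   moves the image by exactly that many pixels after the perspective
   divide (sign conventions vary by API). */
void jitter_projection(float proj[16], float dx_px, float dy_px,
                       int width, int height) {
    proj[8] += 2.0f * dx_px / (float)width;   /* m[0][2], column-major */
    proj[9] += 2.0f * dy_px / (float)height;  /* m[1][2], column-major */
}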

zsouthboy:

PowerSlide (RGSS 4x) (very old glide game :) )

R7x00 4x (same as GF2)
3Dfx 4x
3Dfx 4x (adjusted)

As far as I know, the "22bit quality" had nothing to do with AA, and was in fact already supported by the Voodoo3, if not even earlier cards.
 
As far as I know, the "22bit quality" had nothing to do with AA, and was in fact already supported by the Voodoo3, if not even earlier cards.
That was supported on Voodoo 1, I think. It simply referred to the fact that 3dfx's 16-bit rendering was dithered to give what 3dfx claimed was the equivalent of 22-bit colour. (This became something they needed to emphasise once 24-bit or 32-bit colour cards began to appear).

They also used to make a fuss about the fact that they had a floating-point Z-buffer, which meant that their 16-bit Z-buffer was allegedly equivalent to a higher-bit integer Z-buffer.
 
Kaotik: It was supported since the V1 via a 4x1 post-filter. The Voodoo3 (or was it the Banshee?) added a 2x2 post-filter mode, but both modes were only an adaptive interpolation of the colours of surrounding pixels. 3 of the 4 colour samples were taken from adjacent pixels, so the result wasn't accurate. 3Dfx's SSAA had a similar 24->16->22bit post-filtering effect, but because all four colour samples were in fact 4 sub-samples lying within one pixel, the result was exact and didn't cause a loss of detail. I would link B3D's article, but I can't find it...

Put another way: if you blend 4 16-bit colour samples, the resulting colour depth increases up to 22 bits. And from this standpoint it's irrelevant whether those samples belong to one pixel (~FSAA) or to various pixels (blurring).
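The bit arithmetic is easy to check: averaging four 5-bit values carries two extra bits of precision in the sum, so RGB565 becomes roughly 7+8+7 = 22 bits. A trivial demonstration (sample values are arbitrary):

/* Four 5-bit red sub-samples of one pixel (each 0..31). Their sum
   fits in 7 bits (0..124), i.e. the average keeps two fractional
   bits. Applied per channel: 5+6+5 bits -> 7+8+7 = 22 bits. */
#include <stdio.h>

int main(void) {
    unsigned r[4] = { 10, 11, 11, 12 };
    unsigned sum  = r[0] + r[1] + r[2] + r[3]; /* 7-bit result */
    printf("average on a 7-bit scale: %u (range 0..124)\n", sum);
    return 0;
}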

//edit: yupeee, here:

http://web.archive.org/web/20050211203852/www.beyond3d.com/articles/v5_5500_IQ/index2.php
 
As far as I know, the "22bit quality" had nothing to do with AA, and was in fact already supported by the Voodoo3, if not even earlier cards.

That, IIRC, is correct. The 22bit quality was a post filter that was applied to 16-bit results in the framebuffer to try to mask the dithering artifacts.
 
How is the AA in rthdribl achieved? It even has a 16x mode, and the more samples, the better the motion blur looks, so I guess it must be working. Does it have its own version of anti-aliasing, emulating the method mentioned here?
http://www.daionet.gr.jp/~masa/rthdribl/


From what I read in this thread, the RGSS method for AA is quite a delicious one :) You could have mega AA modes without the memory impact of supersampling: each frame would be blended into the previous one at a given percentage depending on the AA level, if hardware could support this.
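The blend you describe is just a running average over jittered frames; a minimal sketch, with the function name and CPU-side loop purely illustrative:

/* Blend jittered frame i (0-based) into a running average. With
   weight 1/(i+1), after N frames every pixel is the exact mean of
   N sub-pixel samples. */
#include <stddef.h>

void accumulate(float *accum, const float *frame,
                size_t count, int frame_index) {
    float w = 1.0f / (float)(frame_index + 1);
    for (size_t i = 0; i < count; ++i)
        accum[i] = accum[i] * (1.0f - w) + frame[i] * w;
}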
 
They also used to make a fuss about the fact that they had a floating-point Z-buffer, which meant that their 16-bit Z-buffer was allegedly equivalent to a higher-bit integer Z-buffer.
When used correctly, a floating-point Z buffer is far superior to a fixed-point one. Most PowerVR chips also use FP Z.

Unfortunately, certain APIs got the Near/Far -> screen 0..1 mapping back-to-front for optimal FP-Z behaviour :rolleyes:
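For the curious: the fix is what's now called reversed-Z. The hyperbolic 1/z mapping crowds distant depths together, while float precision is densest near zero, so mapping far to 0 and near to 1 lets the two effects cancel. A sketch of the reversed mapping (the helper name is mine):

/* Maps eye-space depth so z_eye == znear -> 1.0 and
   z_eye == zfar -> 0.0, keeping the hyperbolic (1/z) form a
   projection matrix produces. The float format's dense range near
   0 then lands at the far end, where 1/z has compressed depths. */
float reversed_z(float z_eye, float znear, float zfar) {
    return znear * (zfar - z_eye) / (z_eye * (zfar - znear));
}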
 
The VSA100 did not run pixel shaders, so at most the fillrate cost of each additional sample was fetching and combining a few texels. But today's games are running shaders with tons of ALU arithmetic. Do you really want to run your skin shader through the ALUs 8 times?

Wouldn't that cure instances of shader aliasing?
And what really changes? It means your GPU performance scales down by a factor of 8, with all resources affected (bandwidth, fillrate, math). Isn't that the same as on the Voodoo?
Sure, it's going to be inefficient, but you get supersampling on multi-GPU. Remove the multi-GPU part and you still have a big-ass GPU, and supersampling is one way of using its brute force. Stupid, but so are 16x-sampled shadows, high res and all.


----------
Only the Voodoo5 6000 can do 8x; the V5 5500 is limited to 4x and the V4 to 2x.
As for the usefulness: games like Quake 3, CS etc. don't magically disappear from people's hard drives, and some people have a CRT or a "low-res" 1280 or 1440x900 LCD. While they need a high-end card to play Crysis, Oblivion and whatever else, why not put some of the excess power to use in older games? And by now, the games old enough to run well on a low-end PC are called HL2, Doom 3, Painkiller etc.
 
Sure, it's going to be inefficient, but you get supersampling on multi-GPU. Remove the multi-GPU part and you still have a big-ass GPU, and supersampling is one way of using its brute force. Stupid, but so are 16x-sampled shadows, high res and all.
Many-sample shadows aren't "stupid"; until recently they were the only known (and not stupidly slow) way to make shadows not look like ass. And I don't believe many people will agree with you when it comes to high res; most people would definitely prefer 1920x1200 + 2xAA to 960x600 + 8xAA, I think. At least, I would! ;)
 
I'll stop repeating the same boring things, as I've got an idea: a 16x mode consisting of 4x RGSS plus 4x CSAA. That could look and perform better than Nvidia's 16xS.
 
MMmmmm, a Humus demo featuring 3dfx style RGSS AA.../drool

4xRGSS would be fine with me in almost any scene, as that would still equate to better visuals than 4xMSAA + 4xTSAA + 8-16x AF, especially since it would also help with shader aliasing/crawling.

Back in the day, 3dfx's 4xRGSS was superior to 4xSSAA on Nvidia hardware and produced better texture filtering than Nvidia's highest AF algorithm at the time. Everything was less "blurry" and had less texture shimmer.

Except in RTS games that allowed a larger viewable area at higher resolution (as opposed to just scaling everything up with resolution), I would take 1680x1080, or even 1440x960, or heck, even 1280x768 with 4xRGSS over 1920x1200 with any amount of MSAA. That said, for any strategy game that actually allowed a larger view without scaling with resolution, I'd have to go with the higher resolution plus MSAA.

Regards,
SB
 
Regarding cost, there are 2 nice properties to take into account:

1. The cost of post-processing shaders is only paid once, since you only need to run them after the accumulation phase.

2. You get motion blur for free.

How could you only pay the postprocessing once on the V5? It blended the samples on scanout.
On the other hand, doing postprocessing after multisample resolve is standard practice.

Not sure how you'd get motion blur for free either.

Would it be possible to get any savings in rendering an image, given that much of it is similar between samples, or will the whole scene always have to be rendered for each sample in an RGSS implementation?

Well, that's the point of multisampling: why redo so much work? But supersampling cannot, by definition. That's the "super" part of supersampling: it samples everything at a higher rate. If it didn't, it wouldn't be supersampling.

Guys,
I think we need to get some terminology correct here. Rotated Grid Super Sampling (as opposed to sampling on a standard grid) is just a form of sparse sampling pattern.

Correct. There's nothing saying you can't use a rotated grid with multisampling, and in fact that's what both IHVs do these days for their 4x modes.
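For reference, a 4x rotated-grid pattern looks something like this; the offsets are illustrative, not any IHV's exact published pattern:

/* 4x rotated-grid sample offsets from the pixel centre, in
   1/8-pixel units. Every sample has a unique x and a unique y, so
   near-horizontal and near-vertical edges get 4 gradient steps
   instead of the 2 an ordered 2x2 grid would give. */
static const float rg4x[4][2] = {
    { -3.0f/8.0f, -1.0f/8.0f },
    {  1.0f/8.0f, -3.0f/8.0f },
    {  3.0f/8.0f,  1.0f/8.0f },
    { -1.0f/8.0f,  3.0f/8.0f },
};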

I was talking about resolve, which is free on 3Dfx hardware.

Free how? Doing it on scanout may be cheaper, but it's not free. You're still consuming a lot more bandwidth than with no AA: instead of reading a single sample, you read four. Still better than reading 5 and writing 1, which is the case with a resolve pass before scanout, but hardly free.
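Rough per-pixel bandwidth arithmetic for the 4x case, assuming 32-bit colour samples (numbers illustrative):

/* Per-pixel bytes moved per displayed frame, 4x AA, 4B samples. */
#include <stdio.h>

int main(void) {
    const int bpp = 4; /* assumed 32-bit colour sample */
    const int n   = 4; /* 4x AA */

    int no_aa   = 1 * bpp;                 /* scanout reads 1 sample      */
    int scanout = n * bpp;                 /* blend-on-scanout reads 4    */
    int prepass = (n + 1) * bpp + 1 * bpp; /* resolve: read 4 + write 1,
                                              then scanout reads 1        */

    printf("no AA: %dB, scanout resolve: %dB, resolve pass: %dB\n",
           no_aa, scanout, prepass);
    return 0;
}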
 