AA/AF enhancements

Heathen said:
PS: Never seen a game where it looks worse, always better but sometimes by more than others.

I'm not talking games here, I'm talking angles :)
As I said, 6x AA is often slightly better, but sometimes it's slightly worse.

So, overall, it'll always look better ( unless a company would want to specifically make ATI look bad. Oh, that's not impossible, it already happened with nVidia :rolleyes: ) - but in some specific places of the screen, it might look worse.
Not anything you'll see just fragging around, of course.


Uttar
 
AA : 9 samples (MSAA, of course) on a non-ordered grid (with full control for the developer to specify the position of his samples if he wants to) should be more than enough for some time, IMHO. After that, with a high enough resolution, we should have near perfect IQ. Of course, Matrox's Fragment AA is interesting too, but IIRC it messes with the Z-buffer or something (not sure here).

AF : more samples, better algorithms (no more "force bilinear" performance mode, thanks), better performance. I think we may evolve to the point where 16-tap AF texture lookups with no performance loss become the norm. I would like to see NV40 or Rxxx implement this (no performance loss for 16-sample AF at full quality).

IMHO, and especially considering the already very nice AA on the R3xx, AF is where most of the work is.
 
I don't notice FSAA or aniso filtering compared to big chunky cheesy models used in games. I think some developers can't see the polygons for the pixels. :)
 
Uttar said:
Heathen said:
PS: Never seen a game where it looks worse, always better but sometimes by more than others.

I'm not talking games here, I'm talking angles :)
As I said, 6x AA is often slightly better, but sometimes it's slightly worse.

When is it even slightly worse? We've had discussions on sparse sample selection criteria here, and from what I recall, I don't see how ATI's 6x can be said to be worse than the 4x sampling pattern. The coverage should be at least as good, and I also think ATI is even using more than one sample pattern, which might make that more likely (depending on the criteria for selection). Rather, I know it has been clearly stated that it can use more than one, but I'm not sure if it does in general.

It is less symmetrical and less "pretty" than 4x, but I'm not aware of situations with inferior coverage.

Perhaps there is an issue with blending precision (I don't know the blending precision for the AA), but as it stands I'm not aware of a sample case of problems to warrant that conclusion. What specifically are you thinking of?

Now, as for a lack of improvement over 4x, isn't that just because the 4x sample selection is just optimal quite often?

So, overall, it'll always look better ( unless a company would want to specifically make ATI look bad. Oh, that's not impossible, it already happened with nVidia :rolleyes: ) - but in some specific places of the screen, it might look worse.

Is this speculation, or do you have a sample case in mind? Trying to come up with one, I'd say that for a case where 4x OG sampling would achieve ideal coverage, and there are 4 different colors, any pattern with a sampling count that isn't a multiple of 4 would be inferior for that one pixel (that would include 9x), but that seems an extremely unlikely corner case for edge AA (more of a concern for texture sampling than edge AA, it seems to me). However, for varying angles at edges, in games or not, I'm not seeing where you're expecting problems compared to 4x RG sampling.

This type of theory for "possible to be slightly worse" doesn't seem useful, because it is a complaint about sample count not sampling pattern.
What is a better 6x sampling pattern, assuming ATI is using just the one AA analysis programs show?

Not anything you'll see just fragging around, of course.


Uttar

Was there a discussion on problems with 6x AA that I missed? If so, just point me towards it and/or refresh my memory.
 
Is there any performance advantage to using certain sample patterns over others, such as ordered-grid rather than rotated grid? It doesn't seem to make sense that NVidia would keep on using OG for 4x, which looks only slightly better than 2x RG (most of the time), unless there was a clear benefit. If there's no performance benefit, is it simply easier to implement?
 
For the topic question:

I have a pretty strong recollection of some sort of indication that significant improvement to AF was being planned by ATI. My recollection of that info is associated with the presumed-to-be-former R400.

I don't recall any info from nVidia, but the programmable slope calculation functions should indicate some possible improvement in this regard, shouldn't they? Would simply replicating that functionality allow a performance increase? If their AF method is computation bound, that seems likely to me, though it might depend more on parallel replication. Barring unknown issues with their implementation, their current AF might be good enough compared to R3xx-type AF (or an improvement on it that stays close in quality and performance) if the calculation capacity is scaled up so it can be applied to more pixels at once.

I don't recall any indication from ATI regarding their next AA implementation at the moment, though simple sample count increase seems likely.

nVidia seems like they need a serious revamping of their AA approach. They either will, or won't. It looks to me like nVidia is using a 4 pixel grid, and can select samples from it freely... increases beyond 4x sample count seem to be by "higher resolution" rendering and downsampling. The most straightforward improvement would be to increase the size of the "flexible" grid, and the challenge there seems to be transistor count.

Gamma correction seems obvious, and I don't know of anything that would prevent that from being a focus, unless they do choose to revamp and their new approach has new hurdles or offers significant improvement by itself.
 
Ostsol said:
Is there any performance advantage to using certain sample patterns over others, such as ordered-grid rather than rotated grid? It doesn't seem to make sense that NVidia would keep on using OG for 4x, which looks only slightly better than 2x RG (most of the time), unless there was a clear benefit. If there's no performance benefit, is it simply easier to implement?
On some polygon edges (near-horizontal, near-vertical, near-45-degree diagonal), an ordered grid will lead to fewer color transitions and thus fewer edge pixels being shared between 2 or more polygons than would be the case with rotated/sparse grids. Fewer shared pixels means higher performance, but lower image quality. Also, OG is a bit simpler/cheaper to implement than a rotated/programmable grid, in that you have to do fewer calculations per sample to determine whether it is inside/outside the polygon currently being rendered. The savings are small, though: on the order of a couple thousand transistors per renderer pipeline.
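A rough Python sketch of the first point, assuming textbook sample positions (the 2x2 ordered grid and the rotated offsets below are illustrative, not read out of any driver, and the function names are made up). It sweeps a perfectly vertical edge across one pixel and reports which coverage fractions each pattern can produce, and for what share of edge positions the pixel ends up shared between two polygons.

```python
# Illustrative sample positions inside one pixel, in pixel units (assumed).
ORDERED_4X = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
ROTATED_4X = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]


def sweep(samples, steps=1000):
    """Sweep a vertical edge (polygon covers x < t) across the pixel.

    Returns the sorted distinct coverage fractions and the share of edge
    positions at which the pixel is only partially covered.
    """
    levels = set()
    partial = 0
    for i in range(steps + 1):
        t = i / steps
        covered = sum(1 for x, _y in samples if x < t)
        levels.add(covered / len(samples))
        if 0 < covered < len(samples):
            partial += 1
    return sorted(levels), partial / (steps + 1)


og_levels, og_shared = sweep(ORDERED_4X)
rg_levels, rg_shared = sweep(ROTATED_4X)
print("ordered:", og_levels, round(og_shared, 2))
print("rotated:", rg_levels, round(rg_shared, 2))
```

With these assumed positions the ordered grid can only report 0, 0.5 or 1 for a vertical edge, and the pixel is shared for a narrower range of edge positions than with the rotated grid, which is the performance-versus-quality trade described above.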
 
Uttar said:
Heathen said:
PS: Never seen a game where it looks worse, always better but sometimes by more than others.

I'm not talking games here, I'm talking angles :)
As I said, 6x AA is often slightly better, but sometimes it's slightly worse.
Uttar


I'd love to see screenshots of "slightly worse", please. If you are just stating your opinion, say so; don't pass it off as fact. There is no AA mode comparable to ATI's 6x, so how can you judge that it looks "slightly worse" than anything out there?
 
I owned a Geforce 3 Ti 200 before I got myself a Radeon 9700. I have used 4x FSAA on both cards and they both have their strong and weak points.

The GF3 has its worst AA at near-horizontal and near-vertical edges. The R9700 has its worst AA at some rotated edges. And personally I still have to decide whether I like the GF3 or R9700 version better. I notice the weakest points in AA on both cards.

The only real improvement the R9700 has is the gamma corrected AA which makes the transitions in colors more 'smooth'.
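For anyone wondering why gamma correction changes the transitions, here is a small Python sketch. It assumes a plain 2.2 power curve for the display, which is a simplification and not necessarily what the R9700 does in hardware; the function names are hypothetical.

```python
GAMMA = 2.2  # assumed display gamma, simple power law


def to_linear(value):
    """Convert a gamma-encoded framebuffer value (0..1) to linear light."""
    return value ** GAMMA


def to_encoded(value):
    """Convert linear light (0..1) back to a gamma-encoded value."""
    return value ** (1.0 / GAMMA)


def resolve_naive(samples):
    """Average the encoded sample values directly."""
    return sum(samples) / len(samples)


def resolve_gamma_correct(samples):
    """Average in linear light, then re-encode for the display."""
    linear = sum(to_linear(s) for s in samples) / len(samples)
    return to_encoded(linear)


# An edge pixel half covered by white (1.0) and half by black (0.0).
edge_pixel = [1.0, 1.0, 0.0, 0.0]
print(resolve_naive(edge_pixel))          # stored value without correction
print(resolve_gamma_correct(edge_pixel))  # stored value with correction
```

Under that assumption the naive average stores 0.5, which the display shows as much darker than half brightness, while the gamma-correct resolve stores a higher value so the intermediate steps along an edge look evenly spaced.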
 
nVidia's AF with up to 16x samples
+
ATI's FSAA with up to 8x samples
=
a pretty ideal solution for the time being IMHO
 
Ostsol said:
Is there any performance advantage to using certain sample patterns over others, such as ordered-grid rather than rotated grid? It doesn't seem to make sense that NVidia would keep on using OG for 4x, which looks only slightly better than 2x RG (most of the time), unless there was a clear benefit. If there's no performance benefit, is it simply easier to implement?
It's computationally less expensive to always use an ordered grid format. In particular, it takes a more powerful triangle setup engine for any non-ordered-grid format.

Some examples of how architectures have handled it:
The Voodoo5 architecture actually duplicated each triangle, meaning that for 4x FSAA, the Voodoo5's triangle setup engine had to work four times as hard. This is the most basic, most flexible, and most costly form of non-ordered-grid FSAA.

The R300 architecture appears to handle it by first having the triangle setup engine generate a larger ordered grid, then only selecting specific samples. For example, for 6x FSAA, the R3xx will generate a 6x6 grid for each pixel. The triangle setup engine will then throw out all but one pixel in each row and column (for the final sparse-sampled pattern). This implementation may at first seem less efficient than 3dfx's, but I think that it is more open to optimization, and may be quite a bit cheaper in the number of transistors required to maintain the same performance.
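To make the selection step concrete, here is a small Python sketch of picking one sample per row and column from an N x N sub-pixel grid. The permutation used is made up for illustration and is not ATI's actual 6x pattern, and the function names are hypothetical.

```python
def sparse_pattern(columns):
    """Place one sample per row of an n x n grid; columns[row] is the chosen column."""
    n = len(columns)
    if sorted(columns) != list(range(n)):
        raise ValueError("columns must be a permutation of 0..n-1")
    # Put each sample at the centre of its sub-pixel cell, in pixel units.
    return [((col + 0.5) / n, (row + 0.5) / n) for row, col in enumerate(columns)]


def is_one_per_row_and_column(samples, n):
    """True if every grid row and every grid column holds exactly one sample."""
    cols = sorted(int(x * n) for x, _y in samples)
    rows = sorted(int(y * n) for _x, y in samples)
    return cols == list(range(n)) and rows == list(range(n))


# Hypothetical 6-sample pattern: 6 of the 36 grid cells survive.
HYPOTHETICAL_6X = sparse_pattern([2, 5, 1, 4, 0, 3])
print(HYPOTHETICAL_6X)
print(is_one_per_row_and_column(HYPOTHETICAL_6X, 6))
```

Any pattern that passes this check has six distinct x offsets and six distinct y offsets, so near-vertical and near-horizontal edges both get the full number of intermediate coverage steps.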
 
Chalnoth said:
The R300 architecture appears to handle it by first having the triangle setup engine generate a larger ordered grid, then only selecting specific samples. For example, for 6x FSAA, the R3xx will generate a 6x6 grid for each pixel. The triangle setup engine will then throw out all but one pixel in each row and column (for the final sparse-sampled pattern). This implementation may at first seem less efficient than 3dfx's, but I think that it is more open to optimization, and may be quite a bit cheaper in the number of transistors required to maintain the same performance.
Note that this is pure speculation and may be completely incorrect :D
 
Ante P said:
So we've got NV40 and R360 and what not on the incoming.

Has anyone heard anything about the future FSAA implementation that nVidia might have in store?

I'll throw a fit if NV40 sports the same lousy FSAA that the NV3X does. I mean nVidia hasn't done jack to AA quality since they launched the GF3 two and a half years ago. (Well ok they changed their Quincunx from "really crappy" to just "crappy", whoopeee)
And yet they sit on 3dfx IP... how ironic is that? :?

Very ironic--you aren't the only one with the same idea. What I find even more ironic is their apparent and mysterious use of post-filter blending for some degree of their FSAA in some of their modes. That's straight out of 3dfx technology if anything is (although 3dfx never used it for FSAA.) I would relish being more specific but nVidia simply doesn't want to talk about it (in stark contrast to 3dfx's talking up their use of post-filter blending with the V3 for their 16/pseudo-22 bit display mode.)

How well I recall nVidia's initial comments on how "they felt" that gaming at higher resolutions was more important than FSAA (because they didn't have any.) It was a while even before comparative reviews between the V5 and nVidia products even used FSAA--because what nVidia finally brought to the table was decidedly a poor-man's definition of what 3dfx was doing--all the while talking FSAA down out of the other side of its corporate mouth. Typical nVidia. They sour-grape what they can't do while at the same time study how to copy it--or cheat it--whichever is most convenient.

I was extremely disappointed to see that although nVidia claimed the nv30 was co-designed with 3dfx engineers absorbed by the company there was scant indication of that in the nv30 product reference design. I expected they would do much more with FSAA than they did. I still can't believe their use of the post filter relative to FSAA is something they refuse to describe even in a general, bare-bones manner.
 
their apparent and mysterious use of post-filter blending for some degree of their FSAA in some of their modes. That's straight out of 3dfx technology if anything is (although 3dfx never used it for FSAA.)
Walt, how many times are you going to say that?

It just isn't true. Saying it over and over again won't make it true either. Please stop spreading your misinformation.

3dfx's V5 FSAA depended on an intelligent scan out to blend the sample buffers together.

This is exactly what the current NVIDIA hardware does when super sampling.

They're essentially equivalent. Just because they won't explain it to you doesn't mean the rest of us are living in denial.
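A toy Python sketch of what "blend the sample buffers at scan out" means in practice. This is only an illustration of the idea, assuming equally weighted buffers of single-channel values; it is not a description of either vendor's actual hardware, and the function name is made up.

```python
def scan_out(sample_buffers):
    """Yield one blended scanline at a time from equally sized sample buffers.

    Each buffer is a full frame (list of rows) rendered with a different
    sub-pixel offset. Nothing blended is written back to memory; the average
    is computed only as the pixels are sent to the display.
    """
    count = len(sample_buffers)
    height = len(sample_buffers[0])
    for row in range(height):
        lines = [buf[row] for buf in sample_buffers]
        yield [sum(pixels) / count for pixels in zip(*lines)]


# Two tiny 2x2 single-channel sample buffers.
buffer_a = [[0, 0], [4, 4]]
buffer_b = [[2, 2], [0, 4]]
frame = list(scan_out([buffer_a, buffer_b]))
print(frame)
```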
 
Well, in terms of AF, improvements will be brought mainly in terms of performance. Why? Because right now aniso is OK in terms of what it provides as a quality feature. AA is somewhat more... quirky. NV has a bad implementation right now, and that is due to a number of things that I cannot discuss right now. The only modes that are interesting, IMO, are 2x, 4xS and 8xS, and as you can see, the higher-quality ones involve supersampling, and that's a performance hog. There are two directions for the short term: provide higher pure multisampling modes, or do something similar to Matrox, whose FAA is, IMHO, albeit having many errors, the nicest AA algorithm on the scene.
 
Testiculus Giganticus said:
Well, in terms of AF, improvements will be brought mainly in terms of performance. Why? Because right now aniso is OK in terms of what it provides as a quality feature. AA is somewhat more... quirky. NV has a bad implementation right now, and that is due to a number of things that I cannot discuss right now. The only modes that are interesting, IMO, are 2x, 4xS and 8xS, and as you can see, the higher-quality ones involve supersampling, and that's a performance hog. There are two directions for the short term: provide higher pure multisampling modes, or do something similar to Matrox, whose FAA is, IMHO, albeit having many errors, the nicest AA algorithm on the scene.
It's possible that Matrox has improved upon FAA in the new P-LX (P6/750 cards); still waiting for confirmation. (Or something to disprove it, for that matter.)
 
You'll know when a chip implements truly high quality AA and anisotropic filtering across the full range of customer desired framerates and resolutions when they are no longer options with performance/quality tradeoffs in the control panel, but are on all the time by default at the maximum quality settings and customers are very happy with the results.
 
OpenGL guy said:
Chalnoth said:
The R300 architecture appears to handle it by first having the triangle setup engine generate a larger ordered grid, then only selecting specific samples. For example, for 6x FSAA, the R3xx will generate a 6x6 grid for each pixel. The triangle setup engine will then throw out all but one pixel in each row and column (for the final sparse-sampled pattern). This implementation may at first seem less efficient than 3dfx's, but I think that it is more open to optimization, and may be quite a bit cheaper in the number of transistors required to maintain the same performance.
Note that this is pure speculation and may be completely incorrect :D
Well, as far as I know, the only other way to do it (aside from minor deviations from the method I laid out above) would be the much less efficient method of having totally programmable sample positions. With the above method, it's still relatively easy to use simplistic calculations on the linear interpolations for each pixel subsample.

Of course, since it is MSAA, I suppose that much efficiency may not be necessary: the only thing that needs to be interpolated for each subsample is the z value. So, it may be possible to have totally arbitrary pixel allocation.

But the sample patterns indicate that there is one sample taken (at 6x, as an example) at each row and column in a 6x6 grid, for a total of 6 samples.
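Here is a minimal Python sketch of the MSAA point above: one colour per polygon per pixel, with coverage and z evaluated separately at each subsample. The four sample positions, the greyscale colours and the helper names are all assumptions for illustration, not any chip's real behaviour.

```python
import math

# Hypothetical 4-sample rotated pattern, positions in pixel units.
SAMPLES = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]


def draw(samples, color_buf, depth_buf, inside, depth_at, color):
    """Multisample one pixel: a single colour, but coverage and z per subsample."""
    for i, (x, y) in enumerate(samples):
        if not inside(x, y):       # coverage test at the sample position
            continue
        z = depth_at(x, y)         # z is the only value interpolated per sample
        if z < depth_buf[i]:       # smaller z means closer to the viewer
            depth_buf[i] = z
            color_buf[i] = color   # the same colour goes to every passing sample


def resolve(color_buf):
    """Average the subsample colours into the final pixel value."""
    return sum(color_buf) / len(color_buf)


def demo():
    color_buf = [0.0] * len(SAMPLES)
    depth_buf = [math.inf] * len(SAMPLES)
    # Polygon A: covers the left half of the pixel, flat depth 0.5, colour 1.0.
    draw(SAMPLES, color_buf, depth_buf,
         lambda x, y: x < 0.5, lambda x, y: 0.5, 1.0)
    after_a = resolve(color_buf)
    # Polygon B: covers the whole pixel, depth rises with x, colour 0.2.
    draw(SAMPLES, color_buf, depth_buf,
         lambda x, y: True, lambda x, y: 0.4 + 0.4 * x, 0.2)
    return after_a, resolve(color_buf)


print(demo())
```

Because the depth test runs per subsample, polygon B wins at the sample where it is closer and loses at the other one inside polygon A, even though each polygon contributed only one colour to the pixel.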
 