R520 Running

geo said:
What about the Matrox approach to 16x? Would that be workable with a large dose of driver monkeys banging away at fixing its problems? Could you mix-mode it, laying it over 4xmsaa? That would give you 64x on most edges, and hide the worst of your edge-intersection issues as they'd still be 4xmsaa there?

Of course we are well into dangerous territory for my level of expertise, so I'll apologize in advance if that is an eye-rolling suggestion for those of you who actually know what the hell you're talking about. ;)

Why would you want to "mix" fragment AA with common multisampling anyway?

Fragment antialiasing might very well be an option for the future; vendors just need to figure out how to antialias poly intersections and stencil shadows, for instance, too. Matrox's FAA, despite having 16x samples, had in fact only a 4*4 EER, which you can get on both ATI's and NVIDIA's products today with "just" 4xMSAA. Increasing the sample density while keeping an ordered grid pattern might have its uses for some corner cases, yet it doesn't present a better antialiasing result overall and it burns away too many valuable resources. If 16x sample-whatever AA, then kindly at least on a rotated or sparse grid.

In terms of multisampling, ATI's 6x sparse (6*6 EER) MSAA is the highest-quality MSAA implementation thus far in the PC space.
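To make the EER point a bit more concrete, here's a toy sketch (my own illustration; the sample positions below are made up, not any vendor's actual pattern). The edge equivalent resolution boils down to how many distinct horizontal and vertical offsets the pattern uses:

Code:
#include <cstdio>
#include <set>
#include <utility>
#include <vector>

static void printEER(const char* name, const std::vector<std::pair<int,int>>& pattern) {
    std::set<int> xs, ys;                       // distinct columns and rows used
    for (const auto& s : pattern) { xs.insert(s.first); ys.insert(s.second); }
    std::printf("%-24s %2zu samples -> %zux%zu EER\n",
                name, pattern.size(), xs.size(), ys.size());
}

int main() {
    // 16 samples on an ordered 4x4 grid: only 4 distinct columns and 4 distinct rows.
    std::vector<std::pair<int,int>> ordered16;
    for (int y = 0; y < 4; ++y)
        for (int x = 0; x < 4; ++x)
            ordered16.push_back({x, y});

    // 4 samples on a rotated/sparse grid: every sample in its own column and row.
    std::vector<std::pair<int,int>> sparse4 = {{0,2}, {1,0}, {2,3}, {3,1}};

    // 6 sparse samples: 6 distinct columns and rows.
    std::vector<std::pair<int,int>> sparse6 = {{0,3}, {1,0}, {2,5}, {3,1}, {4,4}, {5,2}};

    printEER("16x ordered grid:", ordered16);
    printEER("4x sparse/rotated grid:", sparse4);
    printEER("6x sparse grid:", sparse6);
    return 0;
}

Sixteen ordered-grid samples end up with the same 4*4 EER as four well-placed sparse samples, while six sparse samples give 6*6, which is exactly the "burning resources for nothing" point above.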

All IMHO.
 
What about 3DLabs' antialiasing? I've never really seen it compared to newer MSAA algorithms. Has it changed since it was introduced on the Wildcat III?
 
No idea frankly; not only is 3dLabs' HW out of reach for most of us, but reviews and/or articles on their products in terms of AA are extremely rare.

The only one I remember is Wavey's own article here at B3D about Wildcat III; it sounds like some sort of pseudo-stochastic algorithm, which is actually multisampling in the end (as confusing as it may sound).

I'm not aware of the algorithm's performance penalties, yet have a look here:

Question:

What is the configuration of the FSAA sampling on Wildcat? For instance, NVIDIA employs multiple Z check units per pixel pipe, resulting in virtually fill-rate-free multisample FSAA (only an extra cycle is required per colour change on edges); does Wildcat feature 16 Z units such that all 16 samples can be output in one cycle, or does it take each sample individually, requiring 16 cycles per multisampled pixel?

Answer:

We have some tricks we apply here, but I'd rather not talk about that - we need to keep some of our magic secret!

http://www.beyond3d.com/articles/wildcatiii/index.php?page=page7.inc

I doubt that 16x samples can be output in one cycle (at least on WildcatIII, no idea about newer products). But since there's a mention somewhere in that article that it didn't make sense to use <8x sample AA, it could very well be that they use two cycles to gain the 16x samples.

Trouble is that these are usually high-end professional cards, where cost and hardware space are less of an issue. I'm sure that if the preconditions in the PC space were more favourable, vendors would have already opted for far more advanced logic and algorithms when it comes to AA. With all the constantly increasing shader whiz-bang we're seeing, I doubt we'll see massive breakthroughs any time soon, but that's of course just my own opinion, so don't count on it.
 
By the way I recall a small statement from Eric Demers himself, prior to the R420 release about AA in general:

3DCenter: With the 6x sparse multisampling offered by Radeon 9500+ cards, ATI set a new standard in antialiasing quality on consumer hardware. Do you feel there is a need for another increase in quality in the near future, or is the cost for even better modes too high to justify the result?

Eric Demers: There's still not a single PC solution out there that gives AA quality like we do. Not only do we have a programmable pattern (which is currently set to sparse), we are also the only company offering gamma-corrected AA, which makes for a huge increase in quality. Honestly, we've looked at the best 8x and even "16x" sampling solutions, in the PC commercial market, and nobody comes close to our quality. But we are always looking at ways to improve things. One thing people do have to realize is that if you use a lossless algorithm such as ours (lossy ones can only degrade quality), the memory used by the back buffer can be quite large. At 1600x1200 with 6xAA, our buffer consumption is over 100 MBs of local memory space. Going to 8xAA would have blown past 128MB. Consequently, with a lossless algorithm, the increase in subsamples must be matched with algorithm changes or with larger local storage. We've looked at things such as randomly altering the programmable pattern, but the low frequency noise introduced was worse than the improvement in the sampling position. Unless you have 32~64 subsamples, introducing random variations is not good. So we are looking at other solutions and algorithms. Current and future users will be happy with our solutions. Stay tuned.

It doesn't surprise me that IHVs have experimented, and probably still are, with random/stochastic patterns. It's not the first time I've read that it really only makes sense with very high sample densities, either. Framebuffer space for 8xAA is already there with 256MB and upcoming 512MB cards, yet there are other obvious considerations too. I doubt we'll find out until next-generation products appear, yet with a gun pointed at my head I'd speculate that we'll see fundamental changes rather with the advent of WGF2.0 compliant solutions...
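For what it's worth, sireric's figures check out with back-of-the-envelope math, assuming 32-bit colour and 32-bit Z/stencil per sample plus resolved front and back buffers (the actual buffer layout is of course not public):

Code:
#include <cstdio>

int main() {
    const double MB = 1024.0 * 1024.0;
    const int w = 1600, h = 1200;
    const int bytesPerSample = 4 /* colour */ + 4 /* Z + stencil */;   // assumption
    const int sampleCounts[] = { 6, 8 };

    for (int samples : sampleCounts) {
        double msaa     = double(w) * h * samples * bytesPerSample;    // multisampled buffer
        double resolved = 2.0 * w * h * 4;                             // front + back buffer
        std::printf("%dxAA at %dx%d: %.0f MB + %.0f MB resolved = ~%.0f MB\n",
                    samples, w, h, msaa / MB, resolved / MB, (msaa + resolved) / MB);
    }
    return 0;
}

That comes to roughly 103 MB for 6x (the "over 100 MBs" figure) and roughly 132 MB for 8x, which indeed blows past 128MB before textures and geometry even enter the picture.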
 
Ailuros said:
Why would you want to "mix" fragment AA with common multisampling anyway?

Fragment antialiasing might very well be an option for the future; vendors just need to figure out how to antialias poly intersections and stencil shadows, for instance, too.

I was being conservative/pessimistic about their ability to do the latter... which would of course be preferable.
 
I am disappointed that neither ATI nor NV have gone the 3dlabs route.

If you read Wavey's article, you'll see that they preallocate 2 memory slots (a slot has space for color, stencil, and Z) per pixel. Most pixels are covered by 1 or 2 primitives - those covered by more than 2 get additional dynamically allocated memory slots.

By dynamically allocating memory, they keep the average space requirements very low. Furthermore, I would assume that the blending HW probably spends time proportional to the number of slots per pixel to update/produce display pixels.
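Roughly, I picture the bookkeeping like this; purely my own software analogue of what Wavey's article describes, not 3Dlabs' actual hardware, and the field sizes are guesses:

Code:
#include <cstdint>
#include <vector>

struct Slot {
    uint32_t colour;     // packed RGBA
    uint32_t zValue;     // depth
    uint8_t  stencil;
    uint16_t coverage;   // which of the (up to 16) sample positions this fragment covers
};

struct Pixel {
    Slot              inlineSlots[2];   // preallocated: enough for most pixels
    uint8_t           used = 0;         // how many inline slots are occupied
    std::vector<Slot> overflow;         // only grows for pixels hit by >2 primitives
};

// Add a fragment to a pixel; real hardware would also merge fragments whose
// Z/coverage allow it, which is omitted here.
inline void addFragment(Pixel& p, const Slot& frag) {
    if (p.used < 2)
        p.inlineSlots[p.used++] = frag;
    else
        p.overflow.push_back(frag);     // dynamic allocation only when needed
}

The point of the two preallocated slots is that the common case (one or two visible surfaces per pixel) never touches the dynamically allocated storage at all.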

The interview states that 3dlabs no longer supports 2x and 4x AA because they didn't offer a noticeable performance improvement over 8x/16x. Also, their sample locations are apparently picked in programmable fashion from a 16x16 grid.

So IMO : 16X is most definitely not insane. It also looks much better than anything ATI/NV have to offer.
 
Hyp-X said:
FAA and Z3 can be viewed as a lossy compression for MSAA.
Above a certain number of samples lossy compression should provide a better quality/performance ratio than a lossless one.
Agreed. A question might be what that certain number is. There's probably no clear answer, but 16x might be a likely threshold.
 
3dcgi said:
Hyp-X said:
FAA and Z3 can be viewed as a lossy compression for MSAA.
Above a certain number of samples lossy compression should provide a better quality/performance ratio than a lossless one.
Agreed. A question might be what that certain number is. There's probably no clear answer, but 16x might be a likely threshold.

You saw the Sireric quote above, "lossy ones can only degrade quality"? Now, let's say he didn't mean it literally (and for all I know he did mean it literally). Let's say he meant "lossy ones at the number of samples we're likely to see anytime soon can only degrade quality". So, having taken that liberty with the actual text, and given that he was also talking about 8x in that quote, I find it difficult to believe that Sireric would agree that 16x lossy is better than what he has, since a 16x lossy mode is probably something doable in the relatively near term.

Doesn't mean he's right (or wrong), but I don't think he'd be on board at least with a 16x lossy algo. One could argue that the randomness factor he's pointing at later on as requiring at least 32-64 to be bearable would apply to a lossy algo as well.
 
lossy ones can only degrade quality

I said quality/performance (and in performance I include memory usage).

When you compress a .bmp into a .jpg it will degrade the quality.
But if you have a higher resolution source and compress it into a .jpg with the same size as the .bmp, the result will be of much higher quality than the .bmp.

With AA you can always have a "higher resolution source" - just take more samples.

Btw, compression will get more appealing when we move to FP16 targets.
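A crude numbers-only illustration of that quality/performance point; the byte counts are my own assumptions (32-bit colour, 32-bit Z/stencil, a 16-bit coverage mask), and real hardware layouts will differ:

Code:
#include <cstdio>

int main() {
    const int budget = 32;                        // bytes per pixel we're willing to spend

    // Lossless MSAA: every sample carries its own colour + Z/stencil.
    const int perSampleLossless = 4 + 4;          // 8 bytes per stored sample (assumed)
    std::printf("Lossless: %d samples per pixel\n", budget / perSampleLossless);

    // Fragment-based ("lossy") scheme: a few fragments, each with a 16-bit
    // coverage mask over 16 sample positions (assumed field sizes).
    const int perFragment = 4 + 4 + 2;            // colour + Z/stencil + coverage mask
    std::printf("Fragment-based: %d fragments covering 16 sample positions\n",
                budget / perFragment);
    return 0;
}

Same 32-byte budget either way, but the lossless buffer stores 4 samples while the fragment scheme describes 16 sample positions, and it only starts throwing information away once more than three visible surfaces meet in one pixel.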
 
madshi said:
...

Also, speed is not the only important thing. ATI needs to make sure that developers keep working with ATI cards. Probably lots of developers jumped back to NVidia for their main development card, because they want/need to do SM3.0. Heck, even some XBox2 devs were said to have changed their card to NVidia, cause XBox2 does SM3.0, while R420 does not.

I see no evidence at all supporting this premise...;) Developers are primarily centered around the API, and of course it would make no sense to support anything but SM2.x at present, since all the current 3d cards support it, and no one is going to risk dropping rendering support for 60% (or more) of their target market by going exclusively to SM3. (As such, I think a fair percentage of devs will be using SM1.x, even, as their base platform.)

Basically at present I would think the only devs structuring around 3 for their games are doing so as nV provides them financial incentive to do so, and even those devs are not supporting 3 at the expense of 2, but rather supporting 3 on top of 2 in an anecdotal fashion.

Last, as well, xBox2 will of course support SM2.x in addition to SM3.x (assuming it does support 3, of course), as hardware support for 3 does not preclude hardware support for 2.x or 1.x in any hardware of which I'm aware. I should also think as well that the best chip for xBox2 devs to spend their time supporting will be the xBox2 chip itself, don't you?....;) In general, developers target the broadest base of installed hardware they envision with the more advanced/recent feature support being layered in additionally--ie, let me know when the first game ships which requires SM3 to the exclusion of everything else. Can't think of a single example of that at the moment.
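Just to illustrate the "layered on top" point: in practice a D3D9 title checks the caps at startup and picks the richest path the hardware reports, falling back otherwise. A minimal sketch; the path names are made up, while the caps fields and version macros are the standard D3D9 ones:

Code:
#include <windows.h>
#include <d3d9.h>

enum ShaderPath { PATH_SM1, PATH_SM2, PATH_SM3 };

// Hypothetical helper: pick the richest shader path the adapter reports.
ShaderPath pickShaderPath(IDirect3D9* d3d, UINT adapter) {
    D3DCAPS9 caps;
    if (FAILED(d3d->GetDeviceCaps(adapter, D3DDEVTYPE_HAL, &caps)))
        return PATH_SM1;                                  // be conservative on failure

    if (caps.PixelShaderVersion  >= D3DPS_VERSION(3, 0) &&
        caps.VertexShaderVersion >= D3DVS_VERSION(3, 0))
        return PATH_SM3;                                  // NV4x-class hardware today
    if (caps.PixelShaderVersion  >= D3DPS_VERSION(2, 0))
        return PATH_SM2;                                  // R3xx/R4xx, NV3x, etc.
    return PATH_SM1;                                      // everything older
}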
 
WaltC, forgetting the PC for the moment, do you think that anybody developing a game for the console market will prefer to target said game for the latest hardware only?

Be gentle with me, I'm guessing that there would be incentives for game developers to do so in order to make the latest console a more "attractive" purchase.
 
WaltC said:
I see no evidence at all supporting this premise...;) Developers are primarily centered around the API, and of course it would make no sense to support anything but SM2.x at present, since all the current 3d cards support it, and no one is going to risk dropping rendering support for 60% (or more) of their target market by going exclusively to SM3. (As such, I think a fair percentage of devs will be using SM1.x, even, as their base platform.)

Basically at present I would think the only devs structuring around 3 for their games are doing so as nV provides them financial incentive to do so, and even those devs are not supporting 3 at the expense of 2, but rather supporting 3 on top of 2 in an anecdotal fashion.

Last, as well, xBox2 will of course support SM2.x in addition to SM3.x (assuming it does support 3, of course), as hardware support for 3 does not preclude hardware support for 2.x or 1.x in any hardware of which I'm aware. I should also think as well that the best chip for xBox2 devs to spend their time supporting will be the xBox2 chip itself, don't you?....;) In general, developers target the broadest base of installed hardware they envision with the more advanced/recent feature support being layered in additionally--ie, let me know when the first game ships which requires SM3 to the exclusion of everything else. Can't think of a single example of that at the moment.
Walt, I never said that devs are dropping SM1.x or SM2.x support. But there are some devs which are adding SM3.x paths in addition to the other paths. And of course they need to test those SM3.x paths. So for them an NV4x card is ideal, since they can test all paths without needing to swap the graphics card all the time. If you have an R420 in your PC, you can't test the SM3.x path, obviously. Of course devs need to test their stuff on all popular cards, but they still usually have a main dev card. And currently NV4x seems to be the best choice for that. That's not a good situation for ATI.

About XBox2: Of course R500 would be the ideal card to do XBox2 development on. But what can the devs do if no R500 cards are available? IIRC, someone knowledgeable on this forum (don't remember who it was) hinted that some XBox2 devs were ordering NV4x cards for their XBox2 development, because the ATI chips in the preliminary XBox2 dev machines they got from Microsoft aren't able to do SM3.x. It's not something I've pulled out of my behind; it was posted on these forums by a knowledgeable person.
 
psurge said:
So IMO : 16X is most definitely not insane. It also looks much better than anything ATI/NV have to offer.
The question is - how fast is it? You can do 128x and it will look stunning but will give you 0.05 fps, which makes it just insane for the mainstream market.
 
DegustatoR said:
psurge said:
So IMO : 16X is most definitely not insane. It also looks much better than anything ATI/NV have to offer.
The question is - how fast is it? You can do 128x and it will look stunning but will give you 0.05 fps, which makes it just insane for the mainstream market.
There's another thread here that talks about what it would take to eliminate jagged edges (and presumably nothing else).

So... what are we talking about here? What it would take, or what we can do currently?

I am always confused by some of the questions here...
 
Hyp-X said:
lossy ones can only degrade quality

I said quality/performance (and in performance I include memory usage).

When you compress a .bmp into a .jpg it will degrade the quality.
But if you have a higher resolution source and compress it into a .jpg with the same size as the .bmp, the result will be of much higher quality than the .bmp.

With AA you can always have a "higher resolution source" - just take more samples.

Btw, compression will get more appealing when we move to FP16 targets.

Sure, but you also get a heckuva lot better than 4:1 compression (today's 4x vs the 16x mentioned above that I was responding to) in going from a .bmp to a .jpg. That might be the key -- how much compression you are getting for relatively minor degradation of the (much larger) source. Anyone have a sense of that?
 
With a constant number of samples, you can only lose when you go lossy.
With a constant memory size, you will likely win when you go lossy. (Unless you're doing it in a dumb way.)
 
DegustatoR said:
psurge said:
So IMO : 16X is most definitely not insane. It also looks much better than anything ATI/NV have to offer.
The question is - how fast is it? You can do 128x and it will look stunning but will give you 0.05 fps, which makes it just insane for the mainstream market.

Well, again according to Wavey's article, 3dlabs ditched 2x and 4x because they didn't offer a noticeable performance increase over 8x/16x.


Speculation:
I don't know any details of their implementation, but I'm guessing that the performance of their method depends on the percentage of pixels which require more than the 2 preallocated memory slots. It sounds like it has a lot in common with Z3, in that each memory slot would hold a coverage mask, Z, stencil, and color. The difference between SuperScene and Z3 would be that SuperScene can increase the number of slots per pixel beyond the preallocated number (hence making it a non-lossy AA solution), whereas Z3 has to merge fragments to stay inside a fixed memory budget.
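To put that speculated difference into rough code (entirely my own sketch under those assumptions, not either scheme's real algorithm; Z3's actual merge heuristics are certainly more involved than a nearest-Z merge):

Code:
#include <algorithm>
#include <cstdint>
#include <vector>

struct Fragment {
    uint32_t colour;
    float    z;
    uint8_t  stencil;
    uint16_t coverage;   // bitmask over the 16 sample positions
};

// SuperScene-style (as I read the article): grow storage, stay lossless.
void addSuperScene(std::vector<Fragment>& pixel, const Fragment& f) {
    pixel.push_back(f);                       // extra slots are allocated on demand
}

// Z3-style: fixed budget of N fragments; merge the two closest-in-Z fragments
// when the budget is exceeded (a simplification of the real heuristics).
void addZ3(std::vector<Fragment>& pixel, const Fragment& f, size_t budget) {
    pixel.push_back(f);
    if (pixel.size() <= budget) return;

    // Sort by depth, then find the adjacent pair with the smallest Z difference.
    std::sort(pixel.begin(), pixel.end(),
              [](const Fragment& a, const Fragment& b) { return a.z < b.z; });
    size_t best = 0;
    for (size_t i = 1; i + 1 < pixel.size(); ++i)
        if (pixel[i + 1].z - pixel[i].z < pixel[best + 1].z - pixel[best].z)
            best = i;

    // Merge: union the coverage, keep the nearer fragment's colour/Z/stencil.
    pixel[best].coverage |= pixel[best + 1].coverage;
    pixel.erase(pixel.begin() + best + 1);    // information is lost here -> "lossy"
}

The erase at the end is where a Z3-style scheme becomes lossy; SuperScene avoids it by growing storage, which is also why its worst-case memory use is unbounded while Z3's is fixed.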
 