What is the current state of NV50?

Ailuros said:
jvd said:
Couldn't a tile-based GPU use different sample amounts per tile? So if tile 54 needs 16 samples compared to some that need none, it would increase the rendering speed with FSAA on?

If you use some sort of analytical anti-aliasing approach, then certain edges should get varying amounts of samples anyway. Those edges will in most cases cross more than one tile; what exactly have I missed here? (Not to mention that I don't see why the above shouldn't be more or less the same with IMRs and tiled back buffers.)
Just asking a question really, I don't know as much as the others here. I always figured an adaptive or on-the-fly scalable FSAA and aniso setup would be the ideal way to do these things.
 
I don't know that much either; I just try to use common sense. Think of the edge of a wall ahead of you in a scene, whose data will be divided over, let's say, tiles 56 up to 72. If the analytical algorithm determines that said edge of the wall needs 4x samples, why would there be a reason to have a different amount of samples per tile?

I always figured an adaptive or on-the-fly scalable FSAA and aniso setup would be the ideal way to do these things.

Yes, adaptive algorithms are probably the way of the future. Yet for the time being they still seem too expensive from the HW implementation side. I could of course be completely wrong on both accounts...
 
I do agree a stochastic algorithm is not certain at all; it's not extremely reliable information, far from it. Certainly the most reliable bits are FP32/FP16/FX16, the clocks, the transistor count ( 175M ), and so on.

Plus, saying it's stochastic is extremely vague. Is it really fully and purely stochastic? Or is it just imitating stochastic, with less of the advantage but also fewer problems and lower associated transistor costs?

With 48GB/s, I'd say 8x MSAA standard is a no-brainer, 4x ( or even 2x? :( ) for more demanding games. It'd be interesting to know how many Z units the NV40 has though, as it could indicate what number of samples it's engineered for.


Uttar
 
Ailuros said:
What we have seen, however, is that even with ATI's 6x sparse AA, there are angles where significant aliasing occurs (If I remember correctly, the most severe aliasing is on a diagonal line from top left to bottom right, or 135 degrees from the positive x axis), and somewhat less aliasing perpendicular to that line.
That's not the issue here. The issue here is to effectively surpass the competition (in this case ATI) - preferably whatever improvements they will have in their next generation products - on a quality/performance combination ratio. I don't see the NV3x having anything there to match ATI's 6x sparse solution from that perspective, let alone the more commonly used 4x sparse option.
I really don't understand what you're trying to say here.

I was merely stating that sparse sampling isn't the best way to go, and that a procedural or stochastic method can be better (at the very least, a procedural method would be better...a stochastic method may require more samples per pixel).
 
I bet it was hard to comprehend that, in the multisampling department, NV3x quality can only be compared with ATI's 2x sample MSAA.

What about 4x MSAA? Trying to criticize the anti-aliasing quality of 6x sparse on R3xx is pointless, since:

a) 4xAA is more commonly used IMO today.
b) it's 4xOG vs 4x sparse.

I was merely stating that sparse sampling isn't the best way to go, and that a procedural or stochastic method can be better (at the very least, a procedural method would be better...a stochastic method may require more samples per pixel).

For which I doubt that 8x sample stochastic actually makes sense over 8x sample sparse MSAA.

Even if it did, what about the gap between 2x and 8x sample AA (i.e. 4x, 6x samples)?

What I want to know is where and how a supposed stochastic algorithm makes sense as a whole. Is it going to be one new algorithm delivering 2x, 4x, 6x, 8x samples stochastic or just 8x sample stochastic while keeping the current algorithm in parallel for 2x and 4x MSAA?

Neither case makes sense to me, and I'd think NV was smart enough to opt for a far more effective, simpler and cheaper implementation than any of the above scenarios.
 
With 48GB/s, I'd say 8x MSAA standard is a no-brainer, 4x ( or even 2x? ) for more demanding games. It'd be interesting to know how many Z units the NV40 has though, as it could indicate what number of samples it's engineered for.

You're supposing ~750MHz DDR2 aren't you?

Even if the ROPs remain the same in the NV40, more samples aren't a problem. Remember you can loop samples, and do that with small performance penalties, given a clever implementation. Even if someone replies that the amount is the same, it's still not enough IMO to draw any conclusions.

Anyway it's only a couple of months until we find out...
 
Z-units schmee units. :)

I hope they dump MSAA altogether and switch to something that DOESN'T make alpha textures look like $¤!#...

Tons of games still use those and even more are in the pipe. Everything else looks nice, but those alpha textures turn into an instant grainy mess. :(

*G*
 
It sounds easier to selectively enable SSAA for alphas, procedural textures or wherever else it's needed, and continue to concentrate on MSAA in order to save tons of fillrate.
 
Ailuros said:
With 48GB/s, I'd say 8x MSAA standard is a no-brainer, 4x ( or even 2x? ) for more demanding games. It'd be interesting to know how many Z units the NV40 has though, as it could indicate what number of samples it's engineered for.

You're supposing ~750MHz DDR2 aren't you?
Well, with compression, 8x MSAA won't be a whole lot more expensive than 4x, since the only added bandwidth comes from pixels that include triangle edges (of course, more bandwidth is used, but the extra would be a lot less than twice the bandwidth).
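As a rough back-of-the-envelope sketch of that reasoning (the edge-pixel fraction, per-sample sizes and compression model below are purely illustrative assumptions, not NV40 figures):

```python
# Toy model of compressed framebuffer traffic for MSAA.  Assumes interior
# pixels compress down to a single sample's worth of colour+Z, while pixels
# containing a triangle edge store every sample uncompressed.  All numbers
# are illustrative assumptions.

BYTES_PER_SAMPLE = 4 + 4      # 32-bit colour + 32-bit Z per sample (assumed)
EDGE_PIXEL_FRACTION = 0.10    # assume ~10% of pixels contain a triangle edge

def bytes_per_pixel(samples):
    interior = (1 - EDGE_PIXEL_FRACTION) * BYTES_PER_SAMPLE
    edges = EDGE_PIXEL_FRACTION * samples * BYTES_PER_SAMPLE
    return interior + edges

for s in (4, 8):
    print(f"{s}x MSAA: ~{bytes_per_pixel(s):.1f} bytes/pixel")

# With these assumptions 8x costs ~13.6 bytes/pixel against ~10.4 for 4x,
# i.e. roughly 1.3x the traffic rather than 2x.
```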
 
Chalnoth said:
Ailuros said:
The point is that 8x sample stochastic would have to deliver better results than 8x sparse MSAA to justify the increased hardware cost of the former. So far I haven't seen or read a single commentary in these forums, on more than one occasion, suggesting that fewer than 16x samples are worth the effort. Experiments and their results so far don't seem to have shown anything different either.
What we have seen, however, is that even with ATI's 6x sparse AA, there are angles where significant aliasing occurs (If I remember correctly, the most severe aliasing is on a diagonal line from top left to bottom right, or 135 degrees from the positive x axis), and somewhat less aliasing perpendicular to that line.

A stochastic or procedurally-changing AA sample pattern could utterly eliminate any angle preference to the AA algorithm, which would make it close to perfect (instead of jagged lines, you'd get single-pixel aliasing issues that would be far less noticeable).

But, it may be that 8 samples is too few for a purely stochastic algorithm. It might be better, in terms of visual quality and hardware implementation, to instead use a procedural method (that would use one of several sparse patterns for each pixel).
WTH is this trip you have with "aliasing"?? Let's talk about the HUGE GAPING HOLES in Nvidia's current AA method, where it's a complete jaggy mess. Like virtually every game that uses angles that are not perfectly vertical. :rolleyes:

I am not seeing any "significant" AA aliasing even at 2x or 4x FSAA mode. I use 6x FSAA in a few games and it looks beautiful. Pristine perfection. Reading you post this kind of stuff in post after post is getting a little wearisome. :?
 
Hellbinder said:
I am not seeing any "significant" AA aliasing even at 2x or 4x FSAA mode. I use 6x FSAA in a few games and it looks beautiful. Pristine perfection. Reading you post this kind of stuff in post after post is getting a little wearisome. :?

Chalnoth obviously has his biases--as do others--but ATI's AA is far from pristine perfection. Yes, it's the best currently available for the home market, but Chalnoth's statement that a good stochastic implementation would be superior to ATI's 6x sparse is correct. The catch is that Ailuros is almost 100% correct in doubting that next gen. chips will support any form of stochastic AA.
 
Hellbinder said:
WTH is this trip you have with "aliasing"?? Let's talk about the HUGE GAPING HOLES in Nvidia's current AA method, where it's a complete jaggy mess. Like virtually every game that uses angles that are not perfectly vertical. :rolleyes:
I never said anything about nVidia's method.

Yes, ATI's method of AA with the R3x0 is better than anything nVidia has at the moment.

I was saying that it's not perfect. It was my reason for promoting a stochastic or procedural AA method. It had nothing to do with any sort of comparison between ATI and nVidia. It had to do with my thinking that a stochastic (or, more likely, a procedural) AA algorithm could look better.
 
John Reynolds said:
The catch is that Ailuros is almost 100% correct in doubting that next gen. chips will support any form of stochastic AA.
Well, I will admit that true stochastic is very unlikely (it would require storage of a huge amount of additional data, or else not getting the same sample positions every time a pixel is "re-used" for another primitive...which could result in artifacts, not to mention the hardware required for random number generation).

However, I think that a procedural method is very possible. If you can visualize it, one method I proposed a number of months ago involved taking a sparse AA pattern that is two samples larger than the number of samples currently used. This pattern would be mapped over the current pixel grid (every once in a while, a sample would need to be thrown out, but this should be easy to do in hardware), so that each pixel would get a piece of a sparse AA pattern, and this should break up the "jaggies" quite well, for all angles.

Another way to think of it is this. Imagine we're strobing from left to right across the screen. We're using 4x AA. Each column of a 6x pattern has a pre-set coordinate where the sample is positioned.

So, the first pixel would use a 5x5 grid, but only four samples would fall within this grid:
1,2,3,4,5 (for the sake of argument, let's say column 3 has the one sample that will fall "below" this pixel)
The next pixel would use these columns:
6,1,2,3,4
Next:
5,6,1,2,3
And then:
4,5,6,1,2

This last one would have all five samples fall within the pixel, so one would need to be thrown out.
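To make that walkthrough a bit more concrete, here's a rough Python sketch of the column-strobing idea. The column y offsets, the five-column window per pixel and the simple "drop the extras" rule are my own illustrative assumptions, not a claim about how the hardware would actually do it:

```python
# Sketch of the "strobing" procedural pattern described above.  A 6-column
# sparse pattern is slid across the screen; each pixel sees a 5-column
# window of it, and any samples beyond the 4-per-pixel budget are thrown out.
# All positions here are made up for illustration.

COLUMN_Y = [0.10, 0.45, 0.80, 0.25, 0.60, 0.95]   # pre-set y offset per column
NUM_COLUMNS = len(COLUMN_Y)
SAMPLES_PER_PIXEL = 4

def samples_for_pixel(px):
    """Return up to SAMPLES_PER_PIXEL (x, y) sample positions for pixel px."""
    start = (-px) % NUM_COLUMNS                    # window start shifts by one column per pixel
    window = [(start + i) % NUM_COLUMNS for i in range(NUM_COLUMNS - 1)]
    samples = []
    for i, col in enumerate(window):
        x = px + (i + 0.5) / (NUM_COLUMNS - 1)     # spread the window's columns across the pixel
        samples.append((x, COLUMN_Y[col]))
    return samples[:SAMPLES_PER_PIXEL]             # throw out any extra samples

# Neighbouring pixels get different pieces of the same sparse pattern:
for px in range(4):
    print(px, samples_for_pixel(px))
```

With a 6-column pattern this reproduces the column sequence above (1-5, then 6,1,2,3,4, and so on), so neighbouring pixels never share the exact same sample layout.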

Anyway, I think this algorithm would be relatively cheap in terms of transistors to implement, though I doubt it's the best way to do a procedural algorithm (in terms of a performance/logic ratio).

I guess I'm just eternally hopeful that we'll see something better than sparse AA very soon :)
 
I'm going to stick out my chin here and say that I don't want any stochastic sampling.

Every time I think of it, it seems a little bit worse. The only "stochastic" stuff that I think could be worthwhile is if you have a set of carefully designed(*) sparse patterns, and then pick one of those patterns per pixel in a semi-stochastic way. But I wouldn't be surprised if it would be better to carefully design a "pattern of patterns", where you hand pick various optimal sample patterns and place them in an NxN pixel tile. The sample patterns should be hand picked to work well next to the neighbouring patterns.
I'd be surprised if you got any benefits worth mentioning when going above 8x8 pixels, and I wouldn't be surprised if 4x4 or maybe even 2x2 pixels were enough.

The bad thing about stochastic sampling is that it can introduce errors that are worse than the errors it tries to remove. A major issue is that edge intensity may not be monotonic, and that's even more irritating than the steps. The pattern Chalnoth describes has that problem. And with true stochastic sampling, the problem could be even worse.


(*) By carefully designed sampling pattern for a pixel, I mean optimal for horizontal and vertical edges, and as good as possible for the rest.
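A minimal sketch of that "pattern of patterns" idea, assuming a 2x2 pixel tile and four hand-picked 4-sample patterns (all positions below are made up for illustration, not actual optimised patterns):

```python
# Minimal sketch of a "pattern of patterns": a small tile of pixels, each
# with its own hand-picked sparse pattern, repeated across the screen.
# Tile size and sample positions are illustrative assumptions only.

TILE_W = TILE_H = 2
TILE_PATTERNS = [   # one 4-sample pattern (sub-pixel x, y) per pixel in the tile
    [(0.125, 0.375), (0.375, 0.875), (0.625, 0.125), (0.875, 0.625)],  # pixel (0, 0)
    [(0.125, 0.625), (0.375, 0.125), (0.625, 0.875), (0.875, 0.375)],  # pixel (1, 0)
    [(0.125, 0.875), (0.375, 0.375), (0.625, 0.625), (0.875, 0.125)],  # pixel (0, 1)
    [(0.125, 0.125), (0.375, 0.625), (0.625, 0.375), (0.875, 0.875)],  # pixel (1, 1)
]

def pattern_for_pixel(x, y):
    """Look up which hand-picked pattern pixel (x, y) uses.

    The lookup is just a tile index into a small table, so there is no
    per-pixel random number generation and the result is fully repeatable."""
    return TILE_PATTERNS[(y % TILE_H) * TILE_W + (x % TILE_W)]

# Neighbouring pixels get different, but fixed, patterns:
for y in range(2):
    for x in range(4):
        print((x, y), pattern_for_pixel(x, y))
```

The important point, as opposed to true stochastic sampling, is that a given pixel always gets the same pattern, so nothing changes between frames or between primitives covering that pixel.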
 
Basic said:
A major issue is that edge intensity may not be monotonic, and that's even more irritating than the steps.
Oh yes. Nice straight lines become 'orrible lumpy things if you don't use the same AA pattern for every pixel - or lots of samples (where 'lots' is >> 8 ).
 
And with >>8 samples you could make a mighty fine sparse pattern.

But would you say that you can't even use well-designed patterns and vary which patterns to use next to each other without getting "lumpy" lines?

Hmm, maybe I can see that, but I think I must find myself an example before I form an opinion on that.
 
I would think not.

However, I'm not an AA guru. There's probably one out there that could prove me wrong...
 
Uttar said:
Here are 10 "NV40 facts" ( okay, some are not 100% sure, but some seem pretty darn reliable to me, I'd be surprised if 60% of it wasn't right )

NV40
---
1) 600MHz core on IBM's 0.13u technology, 48GB/s memory bandwidth with 256-bit GDDR2
2) 8x2 ( possibly 16x0 or 16x1 mode, although I'd find that rather stupid personally due to the focus on AA ).
3) FP32/FP16/FX16, this means PS1.4. is done in FX16 100% legally, while it would seem logical for PS2.0. partial precision to be done in FP16 unless MS decides to expose the HW better in an upcoming DX9 revision.
4) ( unsure ) HUGE die, NVIDIA is most likely artificially increasing die size to make cooling more efficient.
5) Slightly beyond PS3.0. / VS3.0. specifications ( not anywhere near as much as PS2.0.+ and VS2.0.+ were compared to the PS/VS2.0. standard though, I assume ).
6) Support of a Programmable Primitive Processor
7) The only units being shared between the VS and the PS are the texture lookup units ( NOT addressing units; addressing is still done on a standard FP32 unit ).
8 ) Most likely no 512MB version, that's still overkill IMO.
9) PCI-Express support, most likely ( but not certainly ) through a compatibility bridge between AGP and PCI-Express.
10) Completely new AA algorithm, most likely a stochastic approach.
---
Release: February-March 2004

And when it comes to the NV50...
---
1) Full ILDP; sharing of VS/PS units
2) 0.09u most likely
3) Not a TBDR!
---
Release: Mid 2005, most likely ( SIGGRAPH? )


Just because GPU:RW isn't online anymore doesn't mean we don't know anything about NVIDIA's next gen products ;) :p :)

 
Real stochastic sampling is very impractical: you may spend much more time generating suitable patterns (with some criteria, such as a minimum distance between any two points) than actually rendering anything.

A "pattern of patterns" as Basic described is more practical, but it still has some problems: DPCM-based Z compression becomes almost impossible. Of course, with a nice number of samples you may want to use other types of Z compression instead. However, I think you need more than 8 samples to show a clear advantage over regular patterns, otherwise you may just see an increase of noise.
 
Hellbinder said:
I am not seeing any "significant" AA aliasing even at 2x or 4x FSAA mode. I use 6x FSAA in a few games and it looks beautiful. Pristine perfection. Reading you post this kind of stuff in post after post is getting a little wearisome.

I can see aliasing in Neverwinter Nights on my 9800 with 4x FSAA on, as one example immediately off the top of my head.
 