Differences in AF quality between 6800 Ultra and 9800XT

demalion said:
I thought it was primarily from the approximation used for sampling selection? Wouldn't this allow a computational savings even when the output was the same?
GPUs have special hardware for this. This is why it costs more transistors to do it right. This shouldn't impact performance in any way.

What about allowing more efficient texture cache usage, which might allow bandwidth savings in the same circumstance?
Um. Of course. This comes naturally with the lower LOD you get on many surfaces.

The idea really is very simple. Transistors are saved by skimping on the LOD selection. Performance is saved by taking fewer texture samples. Image quality suffers.
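For concreteness, here is a hedged sketch (toy Python, not anyone's actual hardware) of what "skimping on the LOD selection" can mean. The "right" anisotropy degree comes from the major and minor axes of the pixel footprint ellipse in texel space; a cheaper selection that only compares the two screen-axis footprint lengths agrees on axis-aligned footprints but collapses on rotated ones:

```python
import math

def aniso_degree_exact(dudx, dvdx, dudy, dvdy):
    # "do it right": ratio of major to minor axis of the footprint
    # ellipse, i.e. the singular values of the 2x2 texel-space Jacobian
    a = dudx * dudx + dvdx * dvdx
    b = dudy * dudy + dvdy * dvdy
    c = dudx * dudy + dvdx * dvdy
    s = math.sqrt(max((a - b) ** 2 + 4 * c * c, 0.0))
    major = math.sqrt((a + b + s) / 2)
    minor = math.sqrt(max((a + b - s) / 2, 1e-12))
    return major / minor

def aniso_degree_cheap(dudx, dvdx, dudy, dvdy):
    # cheaper approximation: compare only the footprint lengths along the
    # two screen axes, dropping the cross term. exact for axis-aligned
    # anisotropy, but it underestimates the degree on rotated footprints
    ax = math.hypot(dudx, dvdx)
    ay = math.hypot(dudy, dvdy)
    return max(ax, ay) / max(min(ax, ay), 1e-6)

# an 8:1 footprint stretched along a 45-degree diagonal: the exact
# selection still sees 8:1, the cheap one sees an isotropic 1:1 footprint
# (so it takes one bilinear sample where eight were needed)
exact = aniso_degree_exact(4.5, 3.5, 3.5, 4.5)   # ~8.0
cheap = aniso_degree_cheap(4.5, 3.5, 3.5, 4.5)   # ~1.0
```

Whether any shipping chip uses exactly this shortcut is an assumption on my part; the point is only that the degree selection, not the sampling loop itself, is where both the transistor savings and the detail loss originate.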
 
demalion said:
Pardon? Where is your logic?

must have left it at home.

Of course lower LOD affects bandwidth (well, depending on your texture cache behavior and the texture), darkblu, I'm simply saying that "every ounce of speed that you get" from these aniso methods is not from less detail or more texture aliasing. It is quite literally that comment that I addressed, so I don't see how you missed it.

for a start, let's try to cut out some of the verbal ballast in this talk:
texture aliasing would mean the app upped the LOD, likely breaking caching efficiency in the process, and hampering performance; it's the lower detail that the bandwidth savings (read 'speed-ups') come from.

For example, for your question regarding detail, doesn't bilinear filtering result in a lower level of detail than point sampling?

nope. per a given LOD, bilinear sampling would reconstruct as well as or better than nearest sampling.
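To put the underlying question in concrete terms (a toy Python sketch with wrap addressing, no particular hardware implied): on a one-texel checkerboard, the highest frequency a texture can hold, bilinear at half-texel phase averages everything to flat grey, while point sampling keeps the full contrast along with the aliasing. That is the sense in which bilinear is a low-pass reconstruction:

```python
def point_sample(tex, u, v):
    # nearest-texel lookup with wrap addressing
    h, w = len(tex), len(tex[0])
    return tex[int(v) % h][int(u) % w]

def bilinear_sample(tex, u, v):
    # weighted blend of the 2x2 texel neighbourhood (wrap addressing)
    h, w = len(tex), len(tex[0])
    u0, v0 = int(u), int(v)
    fu, fv = u - u0, v - v0
    t00 = tex[v0 % h][u0 % w]
    t10 = tex[v0 % h][(u0 + 1) % w]
    t01 = tex[(v0 + 1) % h][u0 % w]
    t11 = tex[(v0 + 1) % h][(u0 + 1) % w]
    top = t00 * (1 - fu) + t10 * fu
    bot = t01 * (1 - fu) + t11 * fu
    return top * (1 - fv) + bot * fv

# one-texel checkerboard: alternating 0/1 values
checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]
flat = bilinear_sample(checker, 0.5, 0.5)   # 0.5: contrast gone
crisp = point_sample(checker, 0.5, 0.5)     # 0 or 1: contrast (and shimmer)
```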

Is it automatically faster, or is it the same speed because of architectural decisions made to make bilinear filtering as fast?

it's architectural.

What opportunities might exist for an implementation of a sampling determination approximation to spend transistors on bandwidth savings that don't depend on fewer texture samples for gain?

zero opportunities (for fewer texture samples per clock, not for fewer texture samples per se. that's what bandwidth saving is - you put through less data per unit of time.)

For Chalnoth's original question, for "either aliasing or loss of detail" in order to gain performance, I don't see any basis for his assumption other than a long string of selective perception and where he sees "issues". Aliasing and detail would be determined by how efficiently, per clock and resource usage, the implementation performs, and usability would be determined by how efficiently, per transistor, you could build it...what he says is impossible is not.

honestly, i'm having difficulty following you here. what you say is correct, yet how is that supposed to back up your point? i.e. where does the final sentence of the paragraph come from?

What about fast trilinear, or any of a host of other hardware implementations? Has this been proven to be a performance saving shortcut that loses image quality? I'm under the impression that it sets a bar that exceeds brilinear (both nVidia's and ATI's relatively unexposed RV versions) tradeoff without disqualifying disadvantages...am I mistaken?

which fast trilinear - PVR's one? if so - it does save on bandwidth by not reading in additional texels. the fact that it can (nearly) reconstruct that skipped data is a lucky property of this particular algorithm. if you can show me an algorithm for an Nth degree of anisotropic texture sampling which somehow reconstructs data from way lower LODs than would be required in the ideal case, then I'm all for it, and all power to you. until then - any degree-dropping-by-the-angle implementations will remain details-wise inferior. no matter anybody's beliefs.
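For reference, the gist of such a reconstruction scheme as I understand it (a hypothetical sketch; not PVR's or S3's actual circuit): derive the lower mip's texels on the fly by box-filtering texels already fetched from the higher mip, so the trilinear blend needs no second mip read. For mips that were themselves generated with a box filter, the reconstruction is exact, which is the "lucky property":

```python
def downsample_2x2(tex):
    # rebuild the next-lower mip level on the fly by box-filtering the
    # current one; a fast-trilinear scheme can blend against this
    # reconstruction instead of fetching the stored lower mip, roughly
    # halving texel reads for the trilinear case
    h, w = len(tex), len(tex[0])
    return [[(tex[2 * y][2 * x] + tex[2 * y][2 * x + 1] +
              tex[2 * y + 1][2 * x] + tex[2 * y + 1][2 * x + 1]) / 4
             for x in range(w // 2)] for y in range(h // 2)]

# if the real mip chain was built with the same box filter, the
# reconstruction matches the stored mip exactly
level0 = [[0, 2], [4, 6]]
level1 = downsample_2x2(level0)   # [[3.0]]
```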

What about the rest of my questions asking not to bypass any discussion of cost benefit simply because it is inconvenient to preference, for example floating point textures?

i'll leave that to chalnoth for the time being.
 
Chalnoth said:
The Baron said:
No, it's currently believed to be a simple driver "bug" as the highest quality AF mode is not working.
That's just an impossibility. The shape of the MIP borders on those screenshots is dependent solely upon the hardware's anisotropic degree selection algorithm. This is absolutely deliberate. It pisses me off, but it's deliberate. Well, actually, I'm mostly pissed off with the droves of people who supported ATI's crappy anisotropic filtering, calling it "adaptive," which I consider just plain idiotic: it was a worse approximation that used fewer transistors, and happened to have the side effect of better performance while reducing image quality.

Since nVidia got blasted for dropping the trilinear quality, they decided that the only way to improve performance with their next generation was to make the quality more similar to ATI's.
And Snookums, when NVIDIA said that the NV40 had angle-independent AF and angle-dependent AF (or at least I think they did, I was told that) in their Analysts' Day, and then a few editors told me that the 60.72 drivers couldn't enable the "High Quality" mode, and that it just didn't stick, I guess that means it's a total impossibility.
 
The Baron said:
And Snookums, when NVIDIA said that the NV40 had angle-independent AF and angle-dependent AF (or at least I think they did, I was told that) in their Analysts' Day, and then a few editors told me that the 60.72 drivers couldn't enable the "High Quality" mode, and that it just didn't stick, I guess that means it's a total impossibility.
Sorry, I thought you were saying that the angle-dependent anisotropic algorithm itself was a bug. That is an impossibility.

I do find it unlikely that they'd leave such a simple bug in review drivers, though, if the NV40 did keep the hardware around to do a better anisotropic degree selection algorithm. In other words, I'm still dubious that we'll ever see the NV4x produce an anisotropic degree selection algorithm similar to the GeForce4/FX.
 
And by the way, I really don't see how floating point texture filtering changes things at all. Obviously it will cost more bandwidth to filter them. So what?
 
Chalnoth said:
The Baron said:
And Snookums, when NVIDIA said that the NV40 had angle-independent AF and angle-dependent AF (or at least I think they did, I was told that) in their Analysts' Day, and then a few editors told me that the 60.72 drivers couldn't enable the "High Quality" mode, and that it just didn't stick, I guess that means it's a total impossibility.
Sorry, I thought you were saying that the angle-dependent anisotropic algorithm itself was a bug. That is an impossibility.

I do find it unlikely that they'd leave such a simple bug in review drivers, though, if the NV40 did keep the hardware around to do a better anisotropic degree selection algorithm. In other words, I'm still dubious that we'll ever see the NV4x produce an anisotropic degree selection algorithm similar to the GeForce4/FX.
Notice that I said that high quality couldn't be enabled and that the option didn't stick no matter where you enabled it. I didn't say it was a bug.

By the time of the retail launch, we'll probably see full AF.
 
Xmas said:
demalion said:
"Crappy anisotropic filtering"?

Please, point me to where the "crappy anisotropic filtering" is actually worse than not using anisotropic filtering. I'm currently under the impression that it is significantly better, and that it isn't like "brilinear" because it isn't mislabelled at all.
That is not the point.

Yes it is, because no image quality tradeoff is "crappy" without context.

Limited resolution? Limited samples instead of analytical determination that is best case for the given resolution?

Just because it is significantly better than something else doesn't make it good.

And just because it is sometimes worse than something else, and offers other benefits because of it, doesn't mean it is crappy. Why are you telling me the obvious when I've already recognized it in my discussion?

Or are you proposing that 1600x1200 at 60 fps is crappy regardless of context, and saying all my points are circumvented?

Bilinear filtering is significantly better than point sampling, still it's crappy (for the purpose of simple texturing, not math lookups).

And anisotropic filtering is better than bilinear filtering alone. And trilinear is better than bilinear. And 8x sparse sampled "gamma corrected AA" is better than 2x rotated grid "non-gamma corrected AA". And 48-bit and 64-bit are better than 32-bit and 24-bit.

How about we restrict discussion to addressing the points I made about benefit and cost, so this sequence of commentary can be remotely useful in discussing things other than hardware with infinite transistor budget or clock speed? Any chance of it? :-?

Angle-dependent AF is better than no AF at all, certainly, but seeing the huge impact AF has on IQ, it should be a priority.

And seeing the huge impact AF can have on performance, that should also be a priority, if you don't happen to consider performance part of image quality.

You seem to imply that angle-dependent AF doesn't deliver "huge" quality improvements; at least, that implication is needed for this line of discussion to have a point relative to what I actually said in addressing the label of "crappy".

To actually discuss your opinion and your basis, which disagrees with mine without being useless or requiring the aforementioned label as a shortcut:

I think the quality of a texture filtering method is inversely proportional to the obtrusiveness of the artifacts it exhibits.

OK, so asking you to discuss how the issues manifest while not ignoring how benefits might manifest should be fine with you, right?

Therefore, I think a "balanced" AF (that is hardly affected by rotation around the Z axis, texture rotation, or the position of the moon) even at 4x is better than some 1024x AF that regularly breaks down to 2x on some surfaces.

Well, it seems to me rather obvious you're ignoring some of the angles to make that assertion, because the lack in your comparison (of 4x to 16x angle dependent AF) would be even more obtrusive. Why don't you bother to recognize that, given your opinion on image quality?

Though it wouldn't look like a windmill in a colored mip level tunnel test.

I don't really think this escapes you, so I don't know why I have to repeatedly ask you not to be selective in perception. :-?

The question I'm shocked no one has asked (AFAICS): How about the savings for something like floating point texture filtering?
What do you mean? The AF algorithm shouldn't affect the data format used for filtering at all.

I'm asking about the expense in calculation for floating point textures in transistor/performance tradeoff, and the impact on the overall architecture. I also queried the impact this would have with large scale pipeline replication. This all relates to the NV40, and what the actual cost/benefit of this aniso method would be in terms of the hardware capabilities and performance.
My mention of a lack of infinite transistor budget/clock speed and considering what might have to be given up for an alternative relates to this.

If this AF method doesn't save transistors, then the question becomes whether offering a performance benefit for the various NV40 configurations (16 and 12 pipelines, as far as we know, right?) and the corresponding bandwidth available, justified however many extra transistors were required to implement a design capable of it. Is it a primarily computational savings? Bandwidth? How would that add up for all the quads/pipes in those configurations? Again, what about floating point texture filtering?
If it doesn't save transistors, it's not justified at all IMO.

Well, fine if that's your opinion, but my points and concerns for that particular possibility just happen to remain completely unaddressed by your statement.

I don't call this useful, not because you aren't entitled to your opinion or don't have valid reason for it, but because the opinion isn't news to me, nor are those reasons and their relation to the points I made actually made evident by it or any other commentary you've made yet.

It trades image quality for performance. So does reducing the AF degree, but not in a locally selective way.

OK, but I don't see the point of the comment with regard to what I discussed. Is being locally selective for angles that objectively can be evaluated to be rarer in a majority of game scenes "crappy" regardless of any context?
 
Chalnoth said:
demalion said:
I thought it was primarily from the approximation used for sampling selection? Wouldn't this allow a computational savings even when the output was the same?
GPUs have special hardware for this. This is why it costs more transistors to do it right. This shouldn't impact performance in any way.

So, AF is done in one clock cycle of calculation no matter the degree?

What about allowing more efficient texture cache usage, which might allow bandwidth savings in the same circumstance?
Um. Of course. This comes naturally with the lower LOD you get on many surfaces.

Are you being a sophist, or are you maintaining that lowering LOD is the only way to increase texture cache usage efficiency? On what basis? My current thinking is that (as one possibility) transistor savings in one place might allow transistors to be spent on better texture cache management.

For LOD, I'd say lowering LOD (sampling from lower resolution mip maps) reduces texture cache usage. I'd also say this increases texture cache effectiveness in removing bandwidth penalties. Efficiency as I meant it could do this as well, but this does not mean it has to be done the same way. I thought mentioning fast trilinear and my understanding of it would make this clear.

The idea really is very simple. Transistors are saved by skimping on the LOD selection.

This statement doesn't make sense to me, unless you mean in the context of sample selection in some way.

But, yes, it is very simple. :?:

Performance is saved by taking fewer texture samples. Image quality suffers.

I understand the simple proposition. I've responded to it already. Repeating it and ignoring the response looks more than slightly silly.

...

How about going back to the first post and the points I raised, Chalnoth? Or are you going to insist on extrapolating a conversation based on replying only to the one part you quoted initially and repeating your reply to it?
 
darkblu said:
demalion said:
Pardon? Where is your logic?

must have left it at home.

Of course lower LOD affects bandwidth (well, depending on your texture cache behavior and the texture), darkblu, I'm simply saying that "every ounce of speed that you get" from these aniso methods is not from less detail or more texture aliasing. It is quite literally that comment that I addressed, so I don't see how you missed it.

for a start, let's try to cut out some of the verbal ballast in this talk:
texture aliasing would mean the app upped the LOD,

That's one way to get texture aliasing. But your attack on my statement was simply about how I must have some really strong belief/bias in order to address Chalnoth's comment, because it is obvious that "saying a lower LOD does not affect bandwidth" is ludicrous, necessitating that I point out that I didn't say that. :-?

likely breaking caching efficiency in the process, and hampering performance; it's the lower detail that the bandwidth savings (read 'speed-ups') come from.

No, it is from maintaining texture cache efficiency and thereby saving bandwidth and avoiding having latency become visible. Just because reducing LOD is one way to avoid breaking texture cache efficiency doesn't mean reducing LOD and maintaining texture cache efficiency/increasing performance therefore become interchangeable.

Or is the only possibility that cache management remained identical for 3d cores, and only cache size increased? Why would this be the case?

For example, for your question regarding detail, doesn't bilinear filtering result in a lower level of detail than point sampling?

nope. per a given LOD, bilinear sampling would reconstruct as well as or better than nearest sampling.

But it would omit detail on angled surfaces in comparison to point sampling for a given LOD. Are you just ignoring angled surfaces? There is less aliasing, which is why I discussed the original comment separately from the basis for your accusation.

Is it automatically faster, or is it the same speed because of architectural decisions made to make bilinear filtering as fast?

it's architectural.

Right. Now, what architectural opportunities are enabled by reducing the transistors necessary for high performance AF? None? Some? What is your basis for your selection?

What opportunities might exist for an implementation of a sampling determination approximation to spend transistors on bandwidth savings that don't depend on fewer texture samples for gain?

zero opportunities (for fewer texture samples per clock, not for fewer texture samples per se. that's what bandwidth saving is - you put through less data per unit of time.)

Hmm? That ignores the impact of a cache on performance completely, by ignoring bandwidth savings on the throughput going into the cache while at the same time measuring samples retrieved from the cache as bandwidth throughput. This seems to effectively omit performance from the picture AFAIK. :?:

For Chalnoth's original question, for "either aliasing or loss of detail" in order to gain performance, I don't see any basis for his assumption other than a long string of selective perception and where he sees "issues". Aliasing and detail would be determined by how efficiently, per clock and resource usage, the implementation performs, and usability would be determined by how efficiently, per transistor, you could build it...what he says is impossible is not.

honestly, i'm having difficulty following you here. what you say is correct, yet how is that supposed to back up your point? i.e. where does the final sentence of the paragraph come from?

He says it is impossible to gain performance from angle dependent AF implementation except by reducing LOD or increasing aliasing. I've discussed my commentary on texture cache, computation, and transistor budget several times now. To me, it would have saved time if Chalnoth actually addressed where I already mentioned them in my original post. But perhaps that is just me.

What about fast trilinear, or any of a host of other hardware implementations? Has this been proven to be a performance saving shortcut that loses image quality? I'm under the impression that it sets a bar that exceeds brilinear (both nVidia's and ATI's relatively unexposed RV versions) tradeoff without disqualifying disadvantages...am I mistaken?

which fast trilinear - PVR's one?

Well, I was thinking of S3's, but this should suffice for now.

if so - it does save on bandwidth by not reading in additional texels.

Sort of like a more effective texture cache in comparison to a less effective or non-existent one? At least if you look at the places where the performance impact is.

the fact that it can (nearly) reconstruct that skipped data is a lucky property of this particular algorithm.

"Lucky property" and "(nearly)" you say? Wouldn't it seem to be spending transistors to be more efficient?

What is the quality loss tradeoff? How "obtrusive", borrowing from another discussion, is it? Is there less detail? More aliasing? How much?

if you can show me an algorithm for an Nth degree of anisotropic texture sampling which somehow reconstructs data from way lower LODs than would be required in the ideal case, then I'm all for it, and all power to you.

Well, that would be one tradeoff relationship, but not the only one, that would follow what I propose.

until then - any degree-dropping-by-the-angle implementations will remain details-wise inferior.

To what, and when?

Not to 4x "ideal AF" at common game angles. Not to no AF at all. And, probably, not in performance/transistor tradeoff relationship to the "ideal" AF.
Tell me, have I said angle-dependent AF is ideal, or have I said viewing even "4x GF4 AF" as ideal in comparison to any degree of angle-dependent AF depends on picking and choosing what you look at?

no matter anybody's beliefs.

I still don't know what beliefs you are accusing me of.

What about the rest of my questions asking not to bypass any discussion of cost benefit simply because it is inconvenient to preference, for example floating point textures?

i'll leave that to chalnoth for the time being.

The first post, darkblu. It was directed to everyone. Chalnoth's commentary is based on omitting quite a bit.
 
demalion said:
So, AF is done in one clock cycle of calculation no matter the degree?
The degree selection should be (the equation is analogous to MIP map level selection).
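For anyone following along, the MIP selection being referred to is the usual log2-of-footprint computation, sketched below in Python (the clamping epsilon is my own addition). An anisotropic degree pick works from the same screen-space derivatives, which is why it is plausible as a fixed, single-stage calculation:

```python
import math

def mip_level(dudx, dvdx, dudy, dvdy):
    # conventional MIP LOD: log2 of the larger screen-axis footprint
    # length in texels (a footprint of 2 texels per pixel -> level 1)
    rho = max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))
    return math.log2(max(rho, 1e-6))
```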

Are you being a sophist, or are you maintaining that lowering LOD is the only way to increase texture cache usage efficiency? On what basis? My current thinking is that (as one possibility) transistor savings in one place might allow transistors to be spent on better texture cache management.
Lowering LOD is the only way to increase texture cache usage efficiency given the same texture cache/filtering algorithm. You're not going to sacrifice texture cache to spend more transistors on the LOD calculation. If you want to take the route of sacrificing image quality to save transistors, then we might as well not bother with anisotropic filtering or AA at all. That'll save lots of transistors.

Transistor counts are increasing. The "texture processors" are becoming increasingly smaller portions of the die. Using a few more transistors here is less and less of a problem.

Efficiency as I meant it could do this as well, but this does not mean it has to be done the same way. I thought mentioning fast trilinear and my understanding of it would make this clear.
Again, no bearing. If you want to accelerate texture processing, you can spend transistors there. There's no reason to sacrifice other facets of texture processing.
 
demalion said:
Yes it is, because no image quality tradeoff is "crappy" without context.
I can see your point. However, I, for one, simply fail to see a realistic context where angle-dependent AF might be better than (almost) non-angle-dependent AF.

And if one method is worse than another in all aspects but one (transistor count), I tend not to like it. I don't have to call it "crappy", but IMO that fits pretty well.

Note that I'm only talking about the angle dependency. For all we know, NV40 could indeed be more efficient when applying 8xAF to a quad than NV3x is.

Why are you telling me the obvious when I've already recognized it in my discussion?
The truth is, I sometimes do not understand what you're discussing exactly. Really. That is not blaming you.

And seeing the huge impact AF can have on performance, that should also be a priority, if you don't happen to consider performance part of image quality.
I certainly do, but if someone decides to use a particularly bad quality/performance tradeoff, I'm not happy.

I think the quality of a texture filtering method is inversely proportional to the obtrusiveness of the artifacts it exhibits.

OK, so asking you to discuss how the issues manifest while not ignoring how benefits might manifest should be fine with you, right?

Therefore, I think a "balanced" AF (that is hardly affected by rotation around the Z axis, texture rotation, or the position of the moon) even at 4x is better than some 1024x AF that regularly breaks down to 2x on some surfaces.

Well, it seems to me rather obvious you're ignoring some of the angles to make that assertion, because the lack in your comparison (of 4x to 16x angle dependent AF) would be even more obtrusive. Why don't you bother to recognize that, given your opinion on image quality?
I don't ignore that. Well, at least not deliberately. It may be partial ignorance from my visual system.

First, the higher the AF level, the smaller are the differences to the previous one. So 4x to 16x isn't a bigger difference than 2x to 4x IMO.

Then, I'm not talking about side-by-side comparisons. Not "the floor is more detailed here, and the grass over there looks crispier on the left shot". Because that makes people tend to ignore the issues inside a given rendered scene. When you have two neighboring surfaces with differing degrees of AF, sometimes as much as 2x to 16x, this is glaringly obvious. And obtrusive. Far more obtrusive than the lack of detail of 4xAF compared to 16xAF on some surfaces in another screenshot, IMO.

I certainly agree that a surface with proper 16x AF applied to it looks better than one with 4x AF. And that certainly needs more performance. However, limiting the maximum level of AF applied to a surface on the basis of something as unrelated to the need of filtering as the angle of rotation around the Z-axis - that I consider just plain stupid.

OK, that is simplified, and I recognize the mathematical basis of AF supports an optimization like that. However, neither logic nor human vision does.


What do you mean? The AF algorithm shouldn't affect the data format used for filtering at all.

I'm asking about the expense in calculation for floating point textures in transistor/performance tradeoff, and the impact on the overall architecture. I also queried the impact this would have with large scale pipeline replication. This all relates to the NV40, and what the actual cost/benefit of this aniso method would be in terms of the hardware capabilities and performance.
My mention of a lack of infinite transistor budget/clock speed and considering what might have to be given up for an alternative relates to this.
Well, the more complex the texture sampling and filtering itself gets, the lower the relative cost of AF. The cost is per quad pipeline.
If they cut down AF to put in other features, then I think they got their priorities wrong.
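To put rough numbers on the "per quad pipeline" cost (back-of-the-envelope only; the pipe counts are the rumoured NV40 configurations from this thread, the rest is generic arithmetic): a worst-case, cache-less texel fetch per clock scales with pipes, taps, and bytes per texel, and FP16 textures double the last factor.

```python
def peak_texel_bytes(pipes, taps_per_pipe, bytes_per_texel):
    # upper bound on texel bytes requested per clock if every bilinear tap
    # missed the cache; real traffic is far lower, which is exactly why
    # texture cache behaviour dominates the picture
    return pipes * taps_per_pipe * bytes_per_texel

rgba8_16p = peak_texel_bytes(16, 4, 4)   # RGBA8, 16 pipes, bilinear: 256 B
fp16_16p = peak_texel_bytes(16, 4, 8)    # FP16 RGBA doubles it: 512 B
```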

If this AF method doesn't save transistors, then the question becomes whether offering a performance benefit for the various NV40 configurations (16 and 12 pipelines, as far as we know, right?) and the corresponding bandwidth available, justified however many extra transistors were required to implement a design capable of it. Is it a primarily computational savings? Bandwidth? How would that add up for all the quads/pipes in those configurations? Again, what about floating point texture filtering?
If it doesn't save transistors, it's not justified at all IMO.

Well, fine if that's your opinion, but my points and concerns for that particular possibility just happen to remain completely unaddressed by your statement.
You begin with "If it doesn't save transistors...". Well, if it doesn't save transistors (note again: I was talking about the angle dependency only, maybe you were not, hence the misunderstanding), then it only is a method to "distribute" the efforts of AF differently to certain surfaces. Thus, any computational or bandwidth savings could also be reached through lowering the degree of AF applied, but in a more sensible manner regarding the overall IQ impression.
 
Chalnoth said:
demalion said:
So, AF is done in one clock cycle of calculation no matter the degree?
The degree selection should be (the equation is analogous to MIP map level selection).

What, if all samples are in the cache, the operations to properly blend them all are performed in one clock cycle, with a trivial transistor budget, as you go on to assert? Explain please.

Are you being a sophist, or are you maintaining that lowering LOD is the only way to increase texture cache usage efficiency? On what basis? My current thinking is that (as one possibility) transistor savings in one place might allow transistors to be spent on better texture cache management.
Lowering LOD is the only way to increase texture cache usage efficiency given the same texture cache/filtering algorithm.

But we're not talking about the same texture cache/filtering algorithms, are we?

You're not going to sacrifice texture cache to spend more transistors on the LOD calculation.

Sacrifice in relation to what? What you could have done if you'd had unlimited transistor budget? :oops: I'm not seeing the point of your assertion, since it doesn't even follow along the lines of what I proposed.

If you want to take the route of sacrificing image quality to save transistors, then we might as well not bother with anisotropic filtering or AA at all. That'll save lots of transistors.

Of course, since obviously angle dependent AF is just like no AF, and this makes sense in response to any of my points.

Transistor counts are increasing. The "texture processors" are becoming increasingly smaller portions of the die. Using a few more transistors here is less and less of a problem.

And the number of pipelines and quads, and the bit depth of the textures being processed, are increasing.

If you can reply to this observation, you'll have caught up to one of the things I asked about in my initial post.

Efficiency as I meant it could do this as well, but this does not mean it has to be done the same way. I thought mentioning fast trilinear and my understanding of it would make this clear.
Again, no bearing. If you want to accelerate texture processing, you can spend transistors there. There's no reason to sacrifice other facets of texture processing.
Because everything is free in 3D, right?
 
demalion said:
Of course lower LOD affects bandwidth (well, depending on your texture cache behavior and the texture), darkblu, I'm simply saying that "every ounce of speed that you get" from these aniso methods is not from less detail or more texture aliasing. It is quite literally that comment that I addressed, so I don't see how you missed it.

for a start, let's try to cut out some of the verbal ballast in this talk:
texture aliasing would mean the app upped the LOD,

That's one way to get texture aliasing. But your attack on my statement was simply about how I must have some really strong belief/bias in order to address Chalnoth's comment, because it is obvious that "saying a lower LOD does not affect bandwidth" is ludicrous, necessitating that I point out that I didn't say that. :-?

doh, i tried to clean up a recurring misconception (i.e. increased aliasing being associated with bandwidth savings) and you come at me with the above? :rolleyes:

ok, let's get this aspect of the discussion straight: your original reply to chalnoth's pretty straightforward and logical statement (except for the 'aliasing' lead) didn't say anything at all except that you just disagreed with him. on what basis? - you didn't care to explain initially, at least not to a degree where a logical discussion could be led. actually, you just threw in an accusation of beliefs, which i just reflected back at you, nothing more, nothing less. so if you cut the blaming attitude, we could hope for the signal-to-noise ratio of this topic to eventually get better.

likely breaking caching efficiency in the process, and hampering performance; it's the lower detail that the bandwidth savings (read 'speed-ups') come from.

No, it is from maintaining texture cache efficiency and thereby saving bandwidth and avoiding having latency become visible. Just because reducing LOD is one way to avoid breaking texture cache efficiency doesn't mean reducing LOD and maintaining texture cache efficiency/increasing performance therefore become interchangeable.

i never implied reduced LOD and high cache efficiency were interchangeable. it's a uni-directional relation - i.e. decreasing LOD improves cache efficiency, not vice versa. apparently you're smart enough to know that, w/o the need from my side to step into details, so don't make me waste forum bandwidth.

apropos, if we care to introduce latencies into the picture, there are other factors (aside from caching) that affect memory read performance, and therefore the performance of successive operations. like page breaks, for example. so yes, you could decrease LOD, and even w/o any cache you could get higher texturing performance just because you end up accessing fewer texture pages overall.
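A toy number for that (hedged heavily: a made-up linear-layout cache-block model, nothing like a real core's tiling): count the distinct blocks the bilinear taps touch along a scanline. Halving the texel step, i.e. dropping one mip level, roughly halves block traffic even before page-break effects enter.

```python
def blocks_fetched(num_pixels, texels_per_pixel, block=4):
    # walk a scanline where each pixel advances `texels_per_pixel` in u,
    # and count the distinct block-aligned cache lines the two bilinear
    # taps along u fall into
    touched = set()
    u = 0.0
    for _ in range(num_pixels):
        base = int(u)
        touched.add(base // block)
        touched.add((base + 1) // block)
        u += texels_per_pixel
    return len(touched)

full_res = blocks_fetched(100, 1.0)   # LOD n: 26 blocks
half_res = blocks_fetched(100, 0.5)   # LOD n+1: 13 blocks
```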

Or is the only possibility that cache management remained identical for 3d cores, and only cache size increased? Why would this be the case?

let's stay on topic

For example, for your question regarding detail, doesn't bilinear filtering result in a lower level of detail than point sampling?

nope. per a given LOD bilinear sampling would reconstruct better or equally compared to nearest sampling.

But it would omit detail on angled surfaces in comparison to point sampling for a given LOD. Are you just ignoring angled surfaces? There is less aliasing, which is why I discussed the original comment separately from the basis for your accusation.

care to explain how bilinear sampling would omit details compared to point sampling per a given LOD?

Is it automatically faster, or is it the same speed because of architectural decisions made to make bilinear filtering as fast?

it's architectural.

Right. Now, what architectural opportunities are enabled by reducing the transistors necessary for high performance AF? None? Some? What is your basis for your selection?

what architectural opportunities are introduced by scrapping the whole bloody part and using the silicon left for children's electronic toy watches? what is your basis for your selection? you open up the vga box at home only to find out it's full of nice pinkie toy watches - how's that for an argument?

What opportunities might exist for an implementation of a sampling determination approximation to spend transistors on bandwidth savings that don't depend on less texture samples for gain?

zero opportunities (for fewer texture samples per clock, not for fewer texture samples per se. that's what bandwidth saving is - you put through less data per unit of time.)

Hmm? That ignores the impact of a cache on performance completely, by ignoring bandwidth savings on the throughput going into the cache while at the same time measuring samples retrieved from the cache as bandwidth throughput. This seems to effectively omit performance from the picture AFAIK. :?:

absolutely. i don't care what bombastic performance gains a bad approximation would produce if it's bad.

He says it is impossible to gain performance from angle dependent AF implementation except by reducing LOD or increasing aliasing. I've discussed my commentary on texture cache, computation, and transistor budget several times now. To me, it would have saved time if Chalnoth actually addressed where I already mentioned them in my original post. But perhaps that is just me.

he didn't say it's impossible - he said it's logical that a significant part of the increased AF performance must have come from the LOD decrease alone, which is bloody logical. as aniso degree computation would likely have been done in a clock, the approximation itself saves nothing but transistors. where those AF-saved transistors have eventually ended up is irrelevant - actually they may have not been used at all, for christ's sake. hypothesising that they could have been used to further increase the performance of the already 'naturally boosted' AF is nothing but pure sophistry - they could have ended up as personal profit in somebody's bank account just as well. but at the end of the day we have 16th degree of aniso turning up as 4th degree (or even worse) at certain angles.
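To make the disputed tradeoff concrete, here is a hypothetical sketch (not anyone's actual hardware algorithm) of an "ideal" anisotropy degree selection from the texture-coordinate derivatives, next to a cheaper bounding-box approximation whose selected degree collapses when the footprint is rotated roughly 45 degrees off the texture axes - the kind of angle dependence being argued about.

```python
import math

def aniso_degree_ideal(dudx, dvdx, dudy, dvdy, max_aniso=16):
    # "Ideal" selection: ratio of the long to short axis of the
    # pixel's footprint in texture space, from the derivatives.
    lx = math.hypot(dudx, dvdx)
    ly = math.hypot(dudy, dvdy)
    major, minor = max(lx, ly), max(min(lx, ly), 1e-6)
    return min(max_aniso, major / minor)

def aniso_degree_bbox(dudx, dvdx, dudy, dvdy, max_aniso=16):
    # Hypothetical cheap approximation: measure the footprint's
    # axis-aligned bounding box instead of its true axes. A thin
    # footprint rotated ~45 deg has a nearly square bounding box,
    # so the selected degree collapses toward 1x at those angles.
    eu = max(abs(dudx), abs(dudy))  # extent along u
    ev = max(abs(dvdx), abs(dvdy))  # extent along v
    major, minor = max(eu, ev), max(min(eu, ev), 1e-6)
    return min(max_aniso, major / minor)
```

With an axis-aligned 16:1 footprint both routines pick ~16x; rotate the same footprint 45 degrees and the approximation picks ~1x while the ideal selection still picks ~16x - saving texture samples at exactly the angles under complaint.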

if you can show me an algorithm for Nth degree of anisotropic texture sampling, which algorithm somehow reconstructs data from way lower LODs than would be required in the ideal case, then I'm all for it, and all power to you.

Well, that would be one tradeoff relationship, but not the only one, that would follow what I propose.

until then - any degree-dropping-by-the-angle implementations will remain details-wise inferior.

To what, and when?

Not to 4x "ideal AF" at common game angles. Not to no AF at all. And, probably, not in performance/transistor tradeoff relationship to the "ideal" AF.

actually yes, yes and possibly yes.

yes - it's inferior to ideal 4x, as with the latter you know what you get independently of z-axis angle, so it saves you unpleasant surprises, especially if you're a developer.

yes - it's inferior to no AF at all because regardless of how loose an approximation it has ended up, i can bet my testicles on it that it has consumed quite a bit of transistors.

possibly yes - because r&d money could have been invested in actually producing a better approximation, which could have eventually ended up as the 'ideal' implementation.

Tell me, have I said angle-dependent AF is ideal, or have I said viewing even "4x GF4 AF" as ideal in comparison to any degree of angle-dependent AF depends on picking and choosing what you look at?

it's the good old argument of meeting expectations, demalion. a z-angle-dependent aniso implementation fails to meet quite a few people's expectations. we could carry out a poll, if that would show you anything.
 
Chalnoth said:
The Baron said:
And Snookums, when NVIDIA said that the NV40 had angle-independent AF and angle-dependent AF (or at least I think they did, I was told that) in their Analysts' Day, and then a few editors told me that the 60.72 drivers couldn't enable the "High Quality" mode, and that it just didn't stick, I guess that means it's a total impossibility.
Sorry, I thought you were saying that the angle-dependent anisotropic algorithm itself was a bug. That is an impossibility.

I do find it unlikely that they'd leave such a simple bug in review drivers, though, if the NV40 did keep the hardware around to do a better anisotropic degree selection algorithm. In other words, I'm still dubious that we'll ever see the NV4x produce an anisotropic degree selection algorithm similar to the GeForce4/FX.
and that is a crying shame.
AF was the one area where the R3x0 could have been improved, imo - and the one area that nv did right, before all the brilinear crap and LOD adjustments, that is.
 
Xmas said:
demalion said:
Yes it is, because no image quality tradeoff is "crappy" without context.
I can see your point. However, I, for myself, simply fail to see a realistic context where angle-dependent AF might be better than (almost) non-angle-dependent AF.
In your 16x"ad"AF versus 4x"b"AF comparison, you don't see any place at all where, even just evaluating image quality, 16x"ad"AF is better than 4x"b"AF?
And if one method is worse than another in all aspects
(but one, which is transistor count), I tend to not like it. I don't have to call it "crappy", but IMO that fits pretty well.
...
But this isn't an accurate description of "ad"AF versus "b"AF. What about performance? What about what might be allowed by transistor savings for adequate performance having an impact on functionality and image quality?

Skipping over some...

...
Therefore, I think a "balanced" AF (that is hardly affected by rotation around the Z axis, texture rotation, or the position of the moon) even at 4x is better than some 1024x AF that regularly breaks down to 2x on some surfaces.

Well, it seems to me rather obvious you're ignoring some of the angles to make that assertion, because the lack in your comparison (of 4x to 16x angle dependent AF) would be even more obtrusive. Why don't you bother to recognize that, given your opinion on image quality?
I don't ignore that. Well, at least not deliberately. It may be partial ignorance from my visual system.

First, the higher the AF level, the smaller are the differences to the previous one. So 4x to 16x isn't a bigger difference than 2x to 4x IMO.

Well, in an FPS, where the benefit manifests most is where it is most noticeable, wouldn't you agree? This is why I mentioned different game types. You also completely ignore how the "bigger difference" from 2x to 4x manifests in a game scene.

You don't need to consider either to point to advantage of one or the other in a certain circumstance, just to justify an assertion about how "obtrusive" those advantages and disadvantages are.

Then, I'm not talking about side-by-side comparisons. Not "the floor is more detailed here, and the grass over there looks crispier on the left shot". Because that makes people tend to ignore the issues inside a given rendered scene.

On the one hand, people can only notice the difference between 16x and 4x when comparing to different screenshots, and on the other the difference between 16x and 2x on some angles on different surfaces would jump out at them? OK...

When you have two neighboring surfaces with differing degrees of AF, sometimes as much as 2x to 16x, this is glaringly obvious. And obtrusive. Far more obtrusive than the lack of detail of 4xAF compared to 16xAF on some surfaces in another screenshot, IMO.

...So the ideal tradeoff for you would be to limit to optimally implementing only up to 4x AF with high performance? What strikes me here is that it sounds like, since surfaces at 2x AF versus 4x AF would be a less obtrusive change, 4x"ad"AF would be better than 16x, because reducing the glaring difference between 16xAF and 2xAF outweighs the benefit of going from 4x to 16x?

I certainly agree that a surface with proper 16x AF applied to it looks better than one with 4x AF. And that certainly needs more performance. However, limiting the maximum level of AF applied to a surface on the basis of something as unrelated to the need of filtering as the angle of rotation around the Z-axis - that I consider just plain stupid.
Ok, that is simplified, and I recognize the mathematical basis of AF supports an optimization like that. However, neither logic nor human vision do.

I understand the gist of your complaint, I just can't see how it overrides the things you say it does and the questions I posed about evaluating the benefits the NV40 and its derivatives might gain from it.

What do you mean? The AF algorithm shouldn't affect the data format used for filtering at all.

I'm asking about the expense in calculation for floating point textures in transistor/performance tradeoff, and the impact on the overall architecture. I also queried the impact this would have with large scale pipeline replication. This all relates to the NV40, and what the actual cost/benefit of this aniso method would be in terms of the hardware capabilities and performance.
My mention of a lack of infinite transistor budget/clock speed and considering what might have to be given up for an alternative relates to this.
Well, the more complex the texture sampling and filtering itself gets, the lower the relative cost of AF. The cost is per quad pipeline.
But computation is per pixel, so won't the transistor/performance tradeoff be affected in any case?
If they cut down AF to put in other features, then I think they got their priorities wrong.
Even if those features were floating point texture filtering and/or work to speed it up?

...
Well, fine if that's your opinion, but my points and concerns for that particular possibility just happen to remain completely unaddressed by your statement.
You begin with "If it doesn't save transistors...". Well, if it doesn't save transistors (note again: I was talking about the angle dependency only, maybe you were not, hence the misunderstanding), then it only is a method to "distribute" the efforts of AF differently to certain surfaces. Thus, any computational or bandwidth savings could also be reached through lowering the degree of AF applied, but in a more sensible manner regarding the overall IQ impression.

Again, you say lower degree of AF being better than higher degree that lowers at some angles is the only non-"stupid" conclusion, and I don't see how that holds together.

EDIT: fixed quoting
 
Bambers said:
now what I would find amusing is if r420 had AF with no angle dependence. :?
Actually, I think that would be awesome. If this is true, then I expect to see many "FanATIcs" hailing the praises of angle-independent anisotropy, which should result in no future video cards using angle-dependent anisotropy.
 
Althornin said:
and that is a crying shame.
AF was the one area where the R3x0 could have been improved, imo - and the one area that nv did right, before all the brilinear crap and LOD adjustments, that is.
Wow, we agree (well, mostly....I think the R3x0 could have improved on quite a bit more....but that's another debate). That can't happen that often.
 
Chalnoth said:
Althornin said:
and that is a crying shame.
AF was the one area where the R3x0 could have been improved, imo - and the one area that nv did right, before all the brilinear crap and LOD adjustments, that is.
Wow, we agree (well, mostly....I think the R3x0 could have improved on quite a bit more....but that's another debate). That can't happen that often.
Us agreeing is rare, indeed.
your list of "improvements" on R3x0 is sure to be ...enlightening... though (especially considering your thoughts on nv3x and nv40) - but hey, we understand - it wasn't made by nVidia, it was made by ATI and is inherently inferior, so yeah, let's just bask in the moment of a tiny agreement, shall we?
 
darkblu said:
...
doh, i tried to clean up a recurring misconception (i.e. increased aliasing being associated with bandwidth savings) and you come up at me with the above? :rolleyes:
Isn't it Chalnoth who said that?
ok, let's get this aspect of the discussion straight: your original reply to chalnoth's pretty straightforward and logical statement (except for the 'aliasing' lead) didn't say anything at all except that you just disagreed with him.

No.

The line you quoted "didn't say anything except that I disagreed with him", and his assertion about "Every ounce of speed" as I'd just addressed in my reasons, stated above that.
on what basis? - you didn't care to explain initially, at least not to a degree where a logical discussion could be led.
Actually, what I think prohibited logical discussion was your reducing my reasons to the status of non-existence by cutting them out of what you quoted. I didn't pick what you quoted and replied to, you did.
actually, you just threw in an accusation of beliefs, which i just reflected back to you, nothing more, nothing less.
Actually, I made an accusation of bias after pointing out that Chalnoth had ignored almost every point I'd made on the issue and asked, specifically, how he summarily dismissed any question of computational performance cost and texture cache utilization.
so if you cut out the blaming attitude, we could hope for the signal-to-noise ratio of this topic to eventually get better.
darkblu, this is a pot and kettle situation.
likely breaking the caching efficiency in the course, and hampering performance; it's the less detail that bandwidth savings (read 'speed ups') come from.

No, it is from maintaining texture cache efficiency and thereby saving bandwidth and avoiding having latency become visible. Just because reducing LOD is one way to avoid breaking texture cache efficiency doesn't mean reducing LOD and maintaining texture cache efficiency/increasing performance therefore become interchangeable.

i never implied reduced LOD and high cache efficiency were interchangeable. it's a uni-directional relation - i.e. decreasing LOD improves cache efficiency, not vice versa.

Chalnoth was arguing that reducing LOD is the only way to gain performance, by the expedient of ignoring any consideration of other ways to improve cache efficiency. If you don't want to imply something, stop defending Chalnoth because he happens to agree with your personal preference in AF, and discuss the points of mine he sidestepped where you actually started your discussion. You probably want an aid for finding where the problematic and myopic defense of Chalnoth is occurring to prevent reasonable discourse... as an alternative to restating it all here, away from the examples, I'll color-code (yes, that was one above... more to follow).

apparently you're smart enough to know that, w/o the need from my side to step into details, so don't make me waste forum bandwidth.
apropos, if we care to introduce latencies into the picture, there are other factors (aside from caching) that affect memory read performance, and therefore the performance of successive operations - like page breaks, for example. so yes, you could decrease LOD, and even w/o any cache you could get higher texturing performance just because you end up accessing fewer texture pages overall.

So, if you go back to the initial post you quoted, and my comments above that quote, can you see my point about why Chalnoth's assertion about the only way to get performance from an AF method with angle-dependent issues doesn't hold?

Can we now go back to the rest of my points Chalnoth used his assertion to dismiss?
...

For example, for your question regarding detail, doesn't bilinear filtering result in a lower level of detail than point sampling?
nope. per a given LOD bilinear sampling would reconstruct better or equally compared to nearest sampling.
But it would omit detail on angled surfaces in comparison to point sampling for a given LOD. Are you just ignoring angled surfaces? There is less aliasing, which is why I discussed the original comment separately from the basis for your accusation.
care to explain how bilinear sampling would omit details compared to point sampling per a given LOD?
Pardon, I was thinking of bilinear with mip levels versus point sampling without, and how the first could take more transistors to offer less (color) detail. :-? I was focused on the idea of transistor tradeoffs and constructed a bad example in response to your only mentioning LOD alone in reply to me.
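For reference, the two filters being compared reduce to something like this one-dimensional sketch (assumed textbook definitions, not any particular chip's implementation):

```python
def point_sample(tex, u):
    # Nearest sampling: snap to the closest texel center.
    return tex[min(len(tex) - 1, int(u + 0.5))]

def bilinear_sample(tex, u):
    # Linear filtering: blend the two surrounding texels.
    i = int(u)
    j = min(i + 1, len(tex) - 1)
    f = u - i
    return tex[i] * (1.0 - f) + tex[j] * f

# Per darkblu's point: at a given LOD, linear filtering reconstructs
# at least as well as nearest; the example above only loses detail
# once mip-mapping (a LOD change) enters the comparison.
```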
...
what architectural opportunities are introduced by scrapping the whole bloody part and using the silicon left for children's electronic toy watches? what is your basis for your selection? you open up the vga box at home only to find out it's full of nice pinkie toy watches - how's that for an argument?
Well, I don't see how you'd get any benefit for 3D hardware out of your example, call me crazy. :oops: I have this strange idea that saving transistors in one place allows you to spend them somewhere else in a design when you have a limited transistor budget, as opposed to allowing you to make toy watches.
This seems to make one of the examples relevant to the topic under discussion, and the other noise.
What opportunities might exist for an implementation of a sampling determination approximation to spend transistors on bandwidth savings that don't depend on less texture samples for gain?
zero opportunities (for fewer texture samples per clock, not for fewer texture samples per se. that's what bandwidth saving is - you put through less data per unit of time.)
Hmm? That ignores the impact of a cache on performance completely, by ignoring bandwidth savings on the throughput going into the cache while at the same time measuring samples retrieved from the cache as bandwidth throughput. This seems to effectively omit performance from the picture AFAIK. :?:
absolutely. i don't care what bombastic performance gains a bad approximation would produce if it's bad.

And we enter the Twilight Zone, where completely performance-penalty-free 16xAF with angle dependency for FP32 textures would be bad, because "angle dependency" is "bad", and nothing else matters.

All you had to do to bypass my point was say something absolute in disagreement with it: your usage of "bandwidth savings" asserts that you get no bandwidth savings from avoiding texture cache misses, while my point was that you could gain performance without reducing texture samples (i.e., with more of them coming from a successfully managed texture cache).
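The cache argument can be made concrete with a toy fully-associative LRU cache (illustrative only; real texture caches differ): identical sample counts can produce very different external memory traffic depending on access locality.

```python
from collections import OrderedDict

def memory_fetches(sample_addrs, cache_lines=32, line_texels=8):
    """Toy fully-associative LRU texture cache: return how many
    cache lines must be fetched from memory to serve the samples."""
    cache = OrderedDict()
    fetches = 0
    for addr in sample_addrs:
        line = addr // line_texels
        if line in cache:
            cache.move_to_end(line)       # hit: no external traffic
        else:
            fetches += 1                  # miss: one line of bandwidth
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False) # evict least recently used
    return fetches

# 256 samples either way, but coherent access costs 32 line fetches
# while a scattered pattern costs 256 - an 8x bandwidth difference
# with no change in the number of texture samples taken.
coherent = memory_fetches(range(256))
scattered = memory_fetches([i * 64 for i in range(256)])
```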

He says it is impossible to gain performance from angle dependent AF implementation except by reducing LOD or increasing aliasing. I've discussed my commentary on texture cache, computation, and transistor budget several times now. To me, it would have saved time if Chalnoth actually addressed where I already mentioned them in my original post. But perhaps that is just me.
he didn't say it's impossible - he said it's logical that a significant part of the increased AF performance must have come from the LOD decrease alone, which is bloody logical.

Darkblu, read this with me:

Chalnoth said:
The speed of this crappy anisotropic degree selection algorithm comes only from selecting pixels to be at a lower degree of anisotropy than a more correct algorithm. Every ounce of speed that you get out of it is paid for by a pixel that is displayed with less detail (or more texture aliasing) than you'd get with, say, the GeForce4's anisotropic.

:oops:

as aniso degree computation would likely have been done in a clock, the approximation itself saves nothing but transistors.
"Would have likely"? You mean What if?
What if, for example, aniso degree computation was used to predict how the texture cache should be managed?
What if the transistor savings allowed it to be replicated to manage more blend operations in parallel?
What if a trick of organization of the mip level determination/storage allowed an approximation to more quickly pick samples and blend an AF result according to the relation to the screen, but the approximation failed to select usefully at other than 90 and 0 degrees? What if replicating some of the approximation algorithm twice, but with failure at an offset angle, maintained speed and improved image quality?

In such cases there is a benefit besides transistors, that might manifest with varying significance in the situations I initially outlined, and that are done a disservice by being simply dismissed as "crappy" or "stupid" without examination.

Your assertion that the only benefit is in transistor savings is simply "What if the only reason for angle dependency is to turn down AF degree at angles", and I don't think the occurrence of surfaces at the reduced-AF angles in common game scenes supports it. Does performance drop when looking down right-angled/45-degree-angled hallways, and rise in hallways angled differently with similar screen surface area? That's what Chalnoth proposes in response to what I mention, and what you keep defending.

where those AF-saved transistors have eventually ended up is irrelevant - actually they may have not been used at all, for christ's sake. hypothesising that they could have been used to further increase the performance of the already 'naturally boosted' AF is nothing but pure sophistry
No, it is simply having a recollection of AF performance behavior that indicates the tradeoff can be more significant than simply capping AF degree at certain angles and gaining performance from the reduced AF on certain angled surfaces. I welcome you to discuss that topic some time instead of...
- they could have ended up as personal profit in somebody's bank account just as well.
Adding noise. :-?
but at the end of the day we have 16th degree of aniso turning up as 4th degree (or even worse) at certain angles.
Yes, but I thought we agreed that the relationship between LOD and texture cache efficiency was unidirectional in dependency? If we have, why are you defending Chalnoth?
if you can show me an algorithm for Nth degree of anisotropic texture sampling, which algorithm somehow reconstructs data from way lower LODs than would be required in the ideal case, then I'm all for it, and all power to you.

Well, that would be one tradeoff relationship, but not the only one, that would follow what I propose.

until then - any degree-dropping-by-the-angle implementations will remain details-wise inferior.

To what, and when?

Not to 4x "ideal AF" at common game angles. Not to no AF at all. And, probably, not in performance/transistor tradeoff relationship to the "ideal" AF.

actually yes, yes and possibly yes.

yes - it's inferior to ideal 4x as you know what you get independently of z-axis angle, so it saves you unpleasant surprises, especially if you're a developer.
Are you playing a game by dropping "at common game angles" so you can disagree with me and restate your opinion, or did you just not finish reading the phrase?
yes - it's inferior to no AF at all because regardless of how loose an approximation it has ended up, i can bet my testicles on it that it has consumed quite a bit of transistors.
Ah, so now using up more transistors is the only determining factor. So "ideal" AF is inferior to no AF? What if ideal uses "quite a bit more transistors" than angle dependent AF...doesn't that make ideal inferior to angle dependent? Maybe it is simply that any degree of angle-dependent AF provides worse image quality than No AF?
Or maybe it is my mistake in thinking there was a point other than saying something contrary?
possibly yes - because r&d money could have been invested in actually producing a better approximation, which could have eventually ended up as the 'ideal' implementation.
Don't let consistency limit you. :-?
What if you decided the "ideal" implementation simply wasn't within your transistor budget with your existing R&D? A chip design has finite limits, right?
Tell me, have I said angle-dependent AF is ideal, or have I said viewing even "4x GF4 AF" as ideal in comparison to any degree of angle-dependent AF depends on picking and choosing what you look at?
it's the good old argument of meeting expectations, demalion.
And it's the good old argument of your own personal expectations not making other considerations irrelevant. I'm not proposing that my evaluation of angle-dependent AF makes your disapproval irrelevant, just that it doesn't make it useful to ignore consideration of what benefits might have resulted depending on details of the implementation beyond the "angle dependency".
a z-angle-dependent aniso implementation fails to meet quite a few people's expectations. we could carry out a poll, if that would show you anything.
For Pete's sake, I'm not disputing the existence of your personal expectations, just some of the things you've gone on to say to validate them. You'll find that I've quoted them and made a relevant reply in conjunction.
What I was doing was pointing out why Chalnoth's mandates he proposed to dismiss my points did not make sense, and then you interjected your defense of your expectations to defend Chalnoth's sharing of them, ignoring those points, and going off on a noisy tangent.
 