How can AF be implemented effectively on consoles/RSX?

Something else to consider is that Mintmaster is referring to the hit being "relative", not that AF will get worse in the future...unless somehow TMUs and memory get slower for some reason. Also, how fast the "need" for more AF grows should be contrasted with this relatively greater hit for using AF, and it should be shown whether there would be a shortfall in capability vs. what games need in the future. While giving graphics programmers more flexibility in using AF can only be a good thing, I really don't think there will be a significant gap between what devs need for decently filtered textures and the capability of the hardware...although "want" and "need" may never party together.
 
Mintmaster said:
darkblu, you say "the way it works is..." and "when you get more BW", but how often does a console get a sudden unexpected specification boost in bandwidth per clock? ;)

well, for starters i did not mean a particular console, i meant an arbitrary platform (again, pick a UMA and you get all the BW variations in the world). but the crux of this argument we're having is that you keep thinking in terms of memory consumption per clock, whereas in practice you have an FPS to shoot for, i.e. you have a certain number of fragments, each with a certain amount of memory consumption by its deadline, which is a time span (which, lo and behold, is the definition of BW); how many clocks that translates to is nothing but a setup detail. IOW, a clock boost in your console's video subsystem would automatically translate to increased BW, even if nothing changed per clock. and that's increased BW per fragment of your scene, as you're still shooting for the same finite framerate - you're not shooting for infinity! you don't care that your scene could've run at a gazillion FPS, you care about a minimum FPS. that's your job, that's what you get paid for.
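a rough back-of-the-envelope sketch of that point - the per-fragment budget at a fixed target FPS grows with any clock boost, even if nothing changes per clock (every figure below is invented purely for illustration):

```python
# per-fragment bandwidth budget at a fixed target framerate
# (all numbers here are made up; only the relationship matters)
bytes_per_clock     = 32              # what the memory subsystem moves per clock
clock_hz            = 500e6           # video subsystem clock
target_fps          = 60              # the framerate you're shooting for
fragments_per_frame = 1280 * 720 * 2  # fragments per frame, incl. some overdraw

def bytes_per_fragment(clock):
    bw_per_second = bytes_per_clock * clock
    bw_per_frame  = bw_per_second / target_fps
    return bw_per_frame / fragments_per_frame

print(bytes_per_fragment(clock_hz))        # baseline budget per fragment
print(bytes_per_fragment(clock_hz * 1.2))  # a 20% clock boost -> 20% more bytes
                                           # per fragment by its deadline
```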


Consider two independent scenarios where you missed your target, where scenario 1 got a 20% boost by disabling AF and scenario 2 got a 30% boost by disabling it. They have different targets (independent scenarios), but both miss by the same amount. Scenario 2 is not more likely to keep the feature.

maybe in a GPU pissing contest; in a title-shipping scenario both cases will be dead, as you don't reach your FPS.

EDIT: <smacks head> Stupid me, why argue when the data is there?
NoAF, 8xAF. Radeon 9700: 10% hit. 9500 Pro: 6% hit.
NoAF, 8xAF. Radeon 9700: 37% hit. 9500 Pro: 30% hit.

and i never disagreed an increased BW could make AF more expensive vis-a-vis a simpler sampling method. IIRC i agreed with you about this a couple of posts ago.

Regarding aniso implementations, sure, you could be foolish enough to add 80% of the logic needed for double speed texturing whilst only getting double speed AF on pixels that need it. ;) Let's say you get 100 fps with trilinear and 50 fps with aniso (i.e. a brutal texture heavy case). Doubling the texture units gets you up to, say, 90fps. On average, doubling aniso speed halves the extra cycles, so you're up to 66fps. Which looks like a more effective use of transistors? I really doubt you'll see any IHV do this, doubt it even more so than single-cycle trilinear.
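a quick sanity check of the arithmetic in that quote, using its own hypothetical, texture-limited numbers:

```python
# Mintmaster's texture-heavy example: 100 fps trilinear, 50 fps with AF
trilinear_ms = 1000.0 / 100   # 10 ms base frame time
aniso_ms     = 1000.0 / 50    # 20 ms with AF enabled
extra_ms     = aniso_ms - trilinear_ms   # 10 ms of extra AF cycles

# doubling only the AF rate halves the extra cycles
print(1000.0 / (trilinear_ms + extra_ms / 2))   # ~66.7 fps

# doubling the texture units halves *all* texturing time; in a purely
# texture-limited frame that would be 100 fps, hence the "say, 90 fps"
# once other limits are accounted for
print(1000.0 / (aniso_ms / 2))                  # 100 fps upper bound
```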

funny thing is, you can apply the same reasoning to massive tri-setup apparatus, or massive BW apparatus, and ask 'why provide for those monstrous figures (ergo, the associated trannies) when you'd use that only in < 10% of your frame'. well, you guessed it right - because certain fat pixels have to meet certain deadlines, and the easier you make that for devs the more they're gonna love you : ) apropos, you don't need to doubt IHVs implementing single-cycle trilinear - the original voodoo did that at the expense of multi-texturing.

i mean, your very own complaint has been that devs are unwilling to pay the extra cost of aniso and they skip using it even when appropriate, even selectively - so a 'foolish AF optimised' architecture might not have been so foolish for you as a consumer, after all (and would have been even less so for the devs of those AF-lacking titles : )
 
Mintmaster said:
You have to understand the purpose of the benchmarks I put. It's not to show the impact of AF on playability; rather, it's to show the impact of bandwidth on AF hit. The 9700 and 9500PRO are identical in all ways except bandwidth. Discussing minimum framerates as opposed to maximum framerates is quite irrelevant in this context.
It is not irrelevant by any means when the hit to minimum framerate from enabling AF on a 9700 can be 0. In such a case, the 9500 Pro, being equal in abilities aside from having less bandwidth, can take a hit to minimum framerate due to the added bandwidth requirements imposed by enabling AF.
 
Any architecture with free AF is just stupid.

As one could argue is any with free AA.

Why would you restrict the No-SSAA rendering to one quarter the speed the hardware is capable of?

You can make the same argument for spending a hundred million transistors or so on eDRAM- that is the point I was making.

Regarding eDRAM: Let's say MS didn't put it on there and used those 105M trannies for more shading power. Now all the data transfer - framebuffer, vertex, texture, CPU - is going over a tiny 128-bit bus.

Moving to a 256bit bus isn't as costly as a second die. I am not saying what they did was wrong- but if you are going to point out the 'waste' that free AF is, you should at least acknowledge that the same can be said about free AA. Running in HD, the difference between having AA enabled and not is much smaller than it is running in SD.

Every pixel needs to blend each subsample, unless the colour is the same for all subsamples in destination and source.

Which is the case the majority of the time.

Let me get this straight. You're comparing the AF hit of a 6600GT in Doom3 with that of a 7800GTX in 3DMark03?

I could have simply pointed you back to the 6600GT benches showing that a part with a 128bit memory bus becomes less bandwidth limited enabling 2x MSAA. Of course you can show benches where any part takes very little performance hit running AF, you can with AA too.

Yes, I am. Dave is saying the same.

No, Dave isn't. Dave is saying with a different hardware configuration performance characteristics will be different. That is a given.

If anything, more ROPs will increase the performance hit because a deficiency of ROPs could mask the extra cycles of AF.

That makes absolutely no sense. The extra cycles of AF are going to stall the sampling units of a pipe- no matter how many ROPs there are. If you have twice as many ROPs as you can write pixels it isn't going to matter if half of them are stalled at any given point- you can not write that many pixels anyway.

If you can take 4 free extra samples for AF, then you have 80% of the hardware needed to sample 4 different textures without AF. But instead, you're limiting yourself to only one sample without AF.

Not unless you are talking about the particular layout of the current NV4x based parts. As Marco made mention of, the sampling units right now are tied to the shader ALUs. Because of this, the amount of transistors you would end up using for free AF would be quite expensive with that type of configuration- but there is no need for it to be that costly or even close to it. The sampling hardware required in a non-restricted (ie- nothing tied together) architecture would be a relatively small percentage of overall die size (when compared to eDRAM or replicating all shader hardware). Due to the way the NV4x is configured, that isn't viable without using the majority of transistors needed for another ROP- which is why I have been talking about keeping the additional ROPs. That way, you have gained additional sampling units along with increased shader throughput and more raw fill (if you are forced to spend 80% of the transistor budget, then you might as well).

Also- you seem to want to compare a 16ROP part with one with fewer ROPs in terms of percentage performance hit and look at the relative performance. What you need to do is look at the absolute numbers of the 16ROP part compared to the 8ROP parts. The absolute performance enabling AF isn't close.
 
BenSkywalker said:
As one could argue is any with free AA.
You can make the same argument for spending a hundred million transistors or so on eDRAM- that is the point I was making.
Moving to a 256bit bus isn't as costly as a second die. I am not saying what they did was wrong- but if you are going to point out the 'waste' that free AF is you should at least acknowledge that the same can be said about free AA.

Except the eDRAM is not just for 'free' AA.

The eDRAM has a multitude of very important features to keep the system design balanced and efficient by removing bottlenecks, primarily by (1) providing a real fillrate to balance the system by not stalling your ROPs and (2) isolating all the framebuffer's large bandwidth clients away from the UMA.

So your argument does not hold any water unless the extra transistors to speed up AF have other uses that improve efficiency and performance across the board. As it is, the eDRAM is much more beneficial to the system's efficiency and performance than merely offering cheap AA. Hence the use of eDRAM in the PS2 and GCN, neither of which leveraged AA significantly.

Moving to a 256bit bus isn't as costly as a second die.

Splitting the die has short term benefits by improving yields, but I cannot say whether a 256bit bus is cheaper now or not.

What we do know is that the eDRAM will be on the same package as the GPU at some point. As has been noted dozens of times on the forum, a 256bit bus does not scale as quickly and you are stuck with a large pad. This will prohibit price reduction in the long run and would be more expensive.

In the long run a 256bit bus would have been more expensive, and that is one of the most important factors for any console: long term cost reduction. The eDRAM will eventually be ~1/8th its current size by the end of this generation, which will result in significant savings for MS. Memory modules manufactured by a 3rd party on a large 256bit pad don't offer the same cost cutting opportunities, nor do they offer the same bandwidth available in the eDRAM.
 
darkblu said:
but the crux of this argument we're having is that you keep thinking in terms of memory consumption per clock, whereas in practice you have a FPS to shoot for,
The question I am addressing is this: If Xenos had more bandwidth, would AF be used more?

If it had more bandwidth, it would have had it right from the beginning. Devs playing around with the hardware initially would think the system is faster and would set higher targets. They would make the same stupid mistake of not using AF from the get-go and making it a top priority. They would see a bigger hit, and thus would be less likely to enable it.

In practice, "the FPS to shoot for" is related to the performance of the hardware at hand, not the performance of some silly hypothetical hardware that does AF for free. BW per clock affects your performance, affects your FPS, and affects your ability to reach your target FPS. Increased BW per pixel, which is what you're discussing, is irrelevant to achieving "the FPS to shoot for" unless BW per clock exceeds what the system has.

maybe in a GPU pissing contest; in a title-shipping scenario both cases will be dead, as you don't reach your FPS.
Unless you take something else out of the renderer. e.g. use quarter size shadow map, use pseudo HDR, use a simpler lighting shader, lower resolution, etc. It's about tradeoffs. Whether you're willing to use AF depends on which tradeoffs you want to make.

I think devs have very screwed up priorities to not be using it, but what can we do? We're not the ones making games. In all honesty, I don't think devs are even trying to put AF in, given how easy it is to do. When was the last time a console game review mentioned this? When did an E3, GDC, or TGS reporter comment on AF? (Even with PC reviews, why test without AF? Higher resolution is pointless without AF.) We should pester devs with emails, and then maybe they'll care.

and i never disagreed an increased BW could make AF more expensive vis-a-vis a simpler sampling method. IIRC i agreed with you about this a couple of posts ago.
Then why are you bringing up all these points to prove me wrong on that very issue?

So can we agree now that a higher BW architecture increases the % hit you get from enabling aniso?




funny thing is, you can apply the same reasoning to massive tri-setup apparatus, or massive BW apparatus, and ask 'why provide for those monstrous figures (ergo, the associated trannies) when you'd use that only in < 10% of your frame'. well, you guessed it right - because certain fat pixels have to meet certain deadlines, and the easier you make that for devs the more they're gonna love you : )
Let's say that instead of 16 texture units, Xenos had 8 texture units capable of free 4xAF (IMO this would make the chip bigger, but let's pretend it didn't). You think devs would applaud this decision? They have half the texturing ability for 80+% of the pixels they'll draw! That would piss off a lot of devs, and I don't think any game would benefit.
apropos, you don't need to doubt IHVs implementing single-cycle trilinear - the original voodoo did that at the expense of multi-texturing.
Yes, the original Voodoo, the original Geforce, and early S3 cards did it. Note that 3Dfx, NVidia, and I think S3 too all stopped doing it! Read the first paragraph here and last paragraph here to see why it doesn't make sense.

i mean, your very own complaint has been that devs are unwilling to pay the extra cost of aniso and they skip using it even when appropriate, even selectively - so a 'foolish AF optimised' architecture might not have been so foolish for you as a consumer, after all (and would have been even less so for the devs of those AF-lacking titles : )
Such a foolish architecture is essentially penalizing non-AF pixels rather than making AF better. It's like stopping a child from doing math homework because he's not as good in English. Voila! More balanced grades! I certainly hope IHVs don't have to treat devs like children.
 
ok, Minty, one last post of mine in this thread as i really cannot afford any more time for the discussion ATM - i have a project at hand (not to mention i still think the whole discussion stemmed from pure semantics).


Mintmaster said:
The question I am addressing is this: If Xenos had more bandwidth, would AF be used more?

If it had more bandwidth, it would have had it right from the beginning. Devs playing around with the hardware initially would think the system is faster and would set higher targets. They would make the same stupid mistake of not using AF from the get-go and making it a top priority. They would see a bigger hit, and thus would be less likely to enable it.

yes, i hope AF would've been used more.

Minty, you do realise that by the logic you're promoting here we may never, ever see AF again in the future, as BW will increase and AF will become a bigger and bigger hit compared to trilinear (not to mention bilinear, and why stop there, untextured fragments)? basically, AF is a doomed feature if devs adopt that logic and just throw in more of the fastest fragments they have available... which, btw, is partially what happens in reality now and is what pissed me off immensely about this current console gen. and which was the reason for this very thread, i believe.

I think devs have very screwed up priorities to not be using it, but what can we do? We're not the ones making games. In all honesty, I don't think devs are even trying to put AF in, given how easy it is to do. When was the last time a console game review mentioned this? When did an E3, GDC, or TGS reporter comment on AF? (Even with PC reviews, why test without AF? Higher resolution is pointless without AF.) We should pester devs with emails, and then maybe they'll care.

yes, that's always a possibility. hell, having worked on enough game projects myself, i can safely say mixed-up priorities happen routinely in this business. i am not sure though we need to pester those titles' devs, more like their publishers, and most of all the clueless media who ooh and aah at racing titles without a drop of aniso in 2006. when blind men lead the blind-folded consumer masses, you, me and everybody else chained on the consumer's chain can expect a lot of head bumps from low ceilings and other hazards.


Then why are you bringing up all these points to prove me wrong on that very issue?

they can, but not necessarily. that's what i've been trying to tell you all this time (you OTOH saying it would not make sense). the architectures which would not exhibit that do not even have to be 'AF-foolish' - i gave you an example a couple of posts ago of a hypothetical architecture that would be neither 'AF-foolish' nor would take a clock hit at AF for certain scenarios, given sufficient BW available.

So can we agree now that a higher BW architecture increases the % hit you get from enabling aniso?

yes, for all presently available architectures.

out and over for the moment.
 
BenSkywalker said:
As one could argue is any with free AA.
Wrong. You completely ignored every reason I gave why this is false, and also ignored the analogy I gave.

BenSkywalker said:
Mintmaster said:
Why would you restrict the No-SSAA rendering to one quarter the speed the hardware is capable of?
You can make the same argument for spending a hundred million transistors or so on eDRAM- that is the point I was making.
No you can't, because eDRAM speeds up No-MSAA rendering, and eDRAM does not cost 4x the area of all non-vertex hardware.

Which is the case the majority of the time.
And this affects only a part of the 25M logic on the daughter die, let alone the full 105M. Maybe 3% of total transistors.

I could have simply pointed you back to the 6600GT benches showing that a part with a 128bit memory bus becomes less bandwidth limited enabling 2x MSAA.
Yet again you rant about the one single data point that deviates a mere 4% from the expected result.

Of course you can show benches where any part takes very little performance hit running AF, you can with AA too.
Yet that's exactly what you did with the 7800GTX to "prove" your point. Pick a benchmark with low AF hit for the 7800, pick one with a high hit for the 6600, and conclude that more ROPs mean lower hit. :rolleyes:

We don't have NVidia chips where ROP:TMU is the only difference. The best I can show you is 6600GT vs. 6800 (here and here). The latter has 3 times the ROPs, but has a higher AF hit.


No, Dave isn't. Dave is saying with a different hardware configuration performance characteristics will be different.
You have a short memory.
Dave Baumann said:
BenSkyWalker said:
If the chip needed extra cycles for AF then having additional ROPs could grant them 'free' AF under most circumstances.
No. If you are limited by the texture samples then you are limited by the sampling end of the chip (texture samplers) not by the pixel output end of the chip (ROPs), so having extra ROPs isn't going to make much difference in this case. If you want cheaper AF then you want more texture sampling capabilities.

That makes absolutely no sense. The extra cycles of AF are going to stall the sampling units of a pipe- no matter how many ROPs there are.
You've obviously been paying no attention to everything I've said.

4 ROP, 8 TMU, single texturing from one mipmap.
No AF: 4 pix/clock.
2xAF: 4 pix/clock.
0% hit.

8 ROP, 8 TMU, single texturing from one mipmap.
No AF: 8 pix/clock.
2xAF: 4 pix/clock.
50% hit.
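the same arithmetic as a tiny sketch - pixels per clock is whichever is lower, the ROP write rate or the TMU sample rate (2xAF assumed to cost two bilinear samples per pixel here):

```python
# pixel throughput limited by either the ROPs or the TMUs
def pixels_per_clock(rops, tmus, samples_per_pixel):
    return min(rops, tmus / samples_per_pixel)

for rops in (4, 8):
    no_af = pixels_per_clock(rops, tmus=8, samples_per_pixel=1)
    af_2x = pixels_per_clock(rops, tmus=8, samples_per_pixel=2)
    hit = 100 * (1 - af_2x / no_af)
    print(f"{rops} ROP, 8 TMU: {no_af:g} -> {af_2x:g} pix/clock, {hit:.0f}% hit")
```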



Not unless you are talking about the particular layout of the current NV4x based parts. As Marco made mention of, the sampling units right now are tied to the shader ALUs.
Nope, I'm talking about all hardware. For NV4x based parts, the address calculation is tied to the shader ALUs. In all hardware, the filtering is elsewhere, the texture cache is elsewhere, the memory controller is elsewhere. Read the last paragraph here (I'm talking about single cycle trilinear in that post, but the same argument applies to accelerated AF).

Also- you seem to want to compare a 16ROP part in terms of percentage performance hit with one of lesser ROPs and look at the relative performance. What you need to do is look at the absolute numbers of the 16ROP part compared to the 8ROP parts. The absolute performance enabling AF isn't close.
*sigh* That's because every 16ROP part has more TMUs than 8ROP parts. And relative performance is all that matters with respect to the thread topic: AF performance. You claim that we can speed up some non-AF pixels to increase absolute framerate. Bravo, Einstein.

-------------------------------------------------------------------------------------

BenSkyWalker, you're just arguing for the sake of arguing. You have ignored most of the things I'm telling you, and are just wasting my time. Unless you miraculously put forth an intelligent post, I'm going to ignore you from now on in this thread.
 
PeterT said:
Your arguments seem sound overall, but what, then, will that mean for the future? Are we looking at ever increasing (relative) costs of AF with increasing hardware capabilities? Is there any hardware part of GPUs (other than making AF cost fewer cycles in hardware) that can be improved and will actually decrease the AF hit?
Actually, I think in future hardware we'll see TMUs increase faster than ROPs and faster than bandwidth, or at worst the ratios will stay about the same. I don't think the AF hit will get worse. I think we'll see 16 ROPs for a while now, maybe even 3 generations. 10-20 GPix/s is simply overkill for 1-2MPix screens. It seems TMUs are staying constant in R600, but G80 might increase them or decouple them.

In terms of hardware improvements, there are a couple of things that might work, but it's possible that they're already being done. One idea I had is making the filtering units operate on two 2x1 blocks instead of a single 2x2 block of texels. This flexibility might allow fewer total samples for the same pixel footprint without causing shimmering. It's also possible that a larger texture cache would help. This is because you have a lot of pixels in flight, and each pixel has a footprint that covers up to 16 texels with AF; on the other hand, you probably don't need to cache the texels for all those pixels at once, so I could be wrong here. I haven't seen problems in R5xx, NV4x, or G7x that would suggest cache is a problem. If it was, you'd see a greater-than-linear hit with higher AF levels in synthetic tests of angled polygons.
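to put a very rough number on that cache pressure (all figures below are guesses for illustration, not actual R5xx/NV4x/G7x parameters):

```python
# worst-case texel footprint if every in-flight pixel needed its AF
# samples resident at once (all numbers are assumptions, not real specs)
pixels_in_flight = 512    # guessed number of pixels in flight
texels_per_pixel = 16     # footprint of a high-degree AF kernel
bytes_per_texel  = 4      # uncompressed RGBA8; DXT-compressed would be far less

print(pixels_in_flight * texels_per_pixel * bytes_per_texel / 1024, "KB")
# in practice only the pixels currently being filtered need their texels
# cached simultaneously, which is why a modest cache may still suffice
```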

I think it's up to the developers now to use it selectively instead of globally. But on the whole, it doesn't seem like AF hit is a problem nowadays with the math heavy shaders we're seeing. I was surprised that R580 doesn't take a much bigger hit than R520 with AF, and I think many B3D members were expecting the same.
 
darkblu said:
Minty, you do realise that by the logic you're promoting here we may never, ever see AF again in the future, as BW will increase and AF will become a bigger and bigger hit compared to trilinear (not to mention bilinear, and why stop there, untextured fragments)?
Nah, BW/TMU will likely decrease for the most part in the future. And devs will get some sense knocked into them that AF is worth the hit, especially if it's done on only a few textures that contribute to the majority of screen detail. Here's an interesting fact: If a shader doesn't alias, then edge quality is the only difference between 256x supersampling and 16xAF. They'll see the light soon enough.
 
Mintmaster said:
Here's an interesting fact: If a shader doesn't alias, then edge quality is the only difference between 256x supersampling and 16xAF. They'll see the light soon enough.

:oops: I did not know that. Is this from your own testing or somewhere else? This kind of puts the forward-looking emphasis for textures on not only being larger, but smarter (Jawed posted something interesting on this; specifically @ 25min about Features Based Texturing). Seeing a 16x16 texture more detailed than a 64x64 one was very impressive. Do you see a technique like that being adapted by the IHVs? This kind of puts the onus of image quality on edge aliasing and shader aliasing in many ways.
 
It's just the nature of AF and the way hardware picks the mipmap with and without AF. Early on when NVidia introduced MSAA, it was pointed out that SSAA from 3Dfx had slightly sharper textures. But if you added 2xAF to the 4xMSAA, then 4xSSAA looked the same (aside from 3Dfx's superior rotated grid). You can keep extending this concept further.

Of course, 256x AA is pretty pointless, but you get my drift. AF is very important for image quality. It is not nearly as well known as AA, though, and it only started getting a little respect around the time R300 was released. To this day I cannot understand why we have benchmarks without AF but still at many resolutions. The IQ improvement from one standard resolution step (generally resulting in a 35% perf hit) absolutely pales in comparison to that of 16xAF.
 
The eDRAM has a multitude of very important features to keep the system design balanced and efficient by removing bottlenecks, primarily by (1) providing a real fillrate to balance the system by not stalling your ROPs and (2) isolating all the framebuffer's large bandwidth clients away from the UMA.

By that token, why not shrink it to 1MB, increase the number of tiles and have a single die? They were targeting AA with the size of the eDRAM they utilized. That's not criticism, it is simply their design choice.

So your argument does not hold any water unless the extra transistors to speed up AF have other uses that improve efficiency and performance across the board. As it is, the eDRAM is much more beneficial to the system's efficiency and performance than merely offering cheap AA. Hence the use of eDRAM in the PS2 and GCN, neither of which leveraged AA significantly.

The extra sampling units and a significantly smaller amount of eDRAM would have both been possible on a single die if they had no concern for AA.

What we do know is that the eDRAM will be on the same package as the GPU at some point. As has been noted dozens of times on the forum, a 256bit bus does not scale as quickly and you are stuck with a large pad. This will prohibit price reduction in the long run and would be more expensive.

How long is MS going to keep the 360 in production? When will it be viable to move the daughter die to the parent die, and how much extra are they spending now that they may make up later? Judging by how quickly MS dropped the original Xbox, will there be a viable build process to mass-produce a chip of that complexity inside the timeframe they would need in order to switch over and save enough money to cover what they are losing now with that setup? The original XB did not come on strong late in its life cycle as the Sony offerings have done. Much like the GC, its movement was strongest in the mid portion of its life.

No you can't, because eDRAM speeds up No-MSAA rendering, and eDRAM does not cost 4x the area of all non-vertex hardware.

Neither do sampling units- or anything remotely close to it. Currently some designs have their TMUs tied to non fixed-function hardware- that is not required. Considering that the premise of the RSX being an 8 ROP design already rests on nV having reconfigured it into something we have nothing like in the PC space, what big difference would it make to offer up an alternative of what they *could* have done?

We don't have NVidia chips where ROP:TMU is the only difference.

NV10 had just additional sampling units, not even full TMUs. They have done it before.

You have a short memory.

No, I don't. Dave was talking about changing the configuration- I'm talking about actual performance.

You've obviously been paying no attention to everything I've said.

I think it is quite the opposite. Call up the ~150 million console users the world over and ask them if they care more about the relative performance drop their hardware suffers for a given amount of transistors utilized in a design, or about which is faster and looks better. I'm betting you'll find one side with far more than 99% falling in line- and that line has nothing to do with what you seem to care so much about.

*sigh* That's because every 16ROP part has more TMUs than 8ROP parts. And relative performance is all that matters with respect to the thread topic: AF performance.

16ROPs- higher AF performance. Considering that is what this thread is about- how is there any confusion? If you want to talk about what is most effective on a transistor basis, then why are we talking about any IMR? The entire premise of that is foolish. What matters is performance- period. Consumers do not care if their console has 1 ROP or 1024, it makes no difference. Releasing the RSX as a 16ROP design *would*- without a doubt- offer higher AF performance. Given the die space of the chip it is easily within Sony's capability to mass produce the chip in such a configuration. Whether they will or not I can't say, but given the current architecture of the chip the most effective way to reasonably offer better AF performance on the RSX is to "increase" the number of ROPs. Even in a 16ROP configuration it still comes in about 80 million transistors shy of Xenos.
 
Ben, regarding the relative effect of ROPs on AF performance, I'm not sure exactly what you think I've said, but I'm actually saying something more in line with Mintmaster's interpretation.

ROPs are largely independent of texture sampling capabilities. I'm not sure I see any logic for you believing that having 16 ROPs in RSX will increase the AF performance.
 
Mintmaster said:
Nah, BW/TMU will likely decrease for the most part in the future. And devs will get some sense knocked into them that AF is worth the hit, especially if it's done on only a few textures that contribute to the majority of screen detail. Here's an interesting fact: If a shader doesn't alias, then edge quality is the only difference between 256x supersampling and 16xAF. They'll see the light soon enough.

ok, back to the topic.

allow me to make a totally different prediction to yours - regardless of what happens in the immediate future, in the long run we'll see a transition to AF-optimised TMUs. why? for the same reason we nowadays have perspective-correct interpolators in the fragment part of the pipeline - they look nicer when discrete-stepping a projected space. in this regard, any kind of isotropic texture sampling is just as incorrect as non-perspective interpolation is. and you don't get to see many perspective-incorrect interpolators in the fragment part of the pipeline these days, do you? and before you say you can do aniso with bilinear units - yes, you can, but that's far from optimal (the higher the degree of aniso, the less optimal it is to use bilinear TMUs).

as for knocking sense into developers - let me tell you that developers are usually the most sane element in the game industry. other elements in there need much more sense-knocking.
 
Ben, regarding the relative effect of ROPs on AF performance, I'm not sure exactly what you think I've said, but I'm actually saying something more in line with Mintmaster's interpretation.

Then let me ask you, Dave- how is it possible that a 16ROP part is going to be slower than, or the same speed as, an 8ROP part in AF performance, all things being equal? Of course that is an absurd statement- but that is what he is saying.

I'm not sure I see any logic for you believing that having 16 ROPs in RSX will increase the AF performance.

Doubling the ROPs with all else equal will double the sampling hardware. How is that logic difficult to understand?

Edit- This is why I mentioned you are talking about a different configuration- you are talking about a 24TMU 8ROP part which we have not seen the likes of yet.
 
You are speaking to overall performance. MintMaster is speaking to the hit AF introduces alone.

TMUs handle texturing. ROPs handle other stuff...like blending etc. The sampling you reference is not the same for TMUs and ROPs. ROPs will not affect AF performance, as the bilinear samples used to construct AF are not a function of the ROPs but of the TMUs instead. Consequently, both Xenos and RSX have enough TMUs to handle AF comfortably, so as has been said...it's most certainly a developer issue vs. either part having to struggle handling AF.

You two are on different wavelengths methinks.
 
Which is why I have been stating that Dave is talking about a different configuration. Dave is saying that a 24TMU 8ROP part wouldn't be slower than a 24TMU 16ROP part, which is not a configuration we have seen for the NV4x as of yet. I have been saying that of the existing configurations that we know of, the 16ROP parts are hands down the fastest.

Of course Mint is talking about relative performance hits; why that is important is lost on me. Talking about a fixed hardware platform running a fixed resolution, with every game built to run explicitly on that exact combination, I don't understand why he assumes it is important what the relative performance hit would be compared to a completely different environment with a completely different set of design goals.
 
Just a side note, but is there any NV Card with 8 ROPs and a 256bit memory interface? Because that's what RSX seems to be (2x 128bit = 2*64Bit GDDR3, 2*64Bit XDR). Would be a totally different configuration for NV then...

just my 2 cents.
 
I don't think Mint ever said that the 16ROP chip wouldn't be faster than the 8ROP one. He and Dave merely said that the additional ROPs wouldn't make AF faster. They would make the whole chip faster (not always as it also depends on a lot of other aspects of the architecture like bandwidth), but not because 8 extra ROPs would make AF in particular faster.
 