How can AF be implemented effectively on consoles/RSX?

Ben, I'll start off with this:
BenSkywalker said:
Double the ROPs with all else equal will double the sampling hardware. How is that difficult to understand the logic?
ROPs double the output sampling rate; on modern hardware this is not linked to the input texture sampling rate, which is the critical element of anisotropic filtering. (Xenos is a fine example of this, as its ROPs are on a completely different chip from the texture samplers.)
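
To put rough numbers on that, here is a toy model of the pipeline, not a description of any real chip. It assumes each TMU returns one bilinear sample per clock and that a pixel needs a fixed number of such samples (`af_taps`, a made-up parameter) before it reaches a ROP. Real AF is adaptive, so the tap count varies per pixel.

```python
def pixels_per_clock(tmus, rops, af_taps):
    """Toy throughput model (assumption: 1 bilinear sample per TMU per clock)."""
    texture_limit = tmus / af_taps   # pixels/clock the samplers can feed
    return min(texture_limit, rops)  # ROPs can cap the rate, never raise it

# Compare a hypothetical 24 TMU part with 8 ROPs against one with 16 ROPs.
for taps in (1, 2, 4, 8, 16):
    eight = pixels_per_clock(24, 8, taps)
    sixteen = pixels_per_clock(24, 16, taps)
    print(f"{taps:2d} taps: 8 ROPs -> {eight:.2f} px/clk, 16 ROPs -> {sixteen:.2f} px/clk")
```

In this model the two parts only differ when very few texture samples are needed per pixel; once the samplers are the bottleneck, the extra ROPs sit idle.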

BenSkywalker said:
Which is why I have been stating that Dave is talking about a different configuration. Dave is saying that a 24TMU 8ROP part wouldn't be slower than a 24TMU 16ROP, which is not a configuration we have seen for the NV4x as of yet. I have been saying that of the existing configurations that we know of, the 16ROP parts are hands down the fastest.
There's a difference here in talking about "NV4x/G7x" and RSX, in that all high-end NV4x/G7x parts have a 256-bit local memory bus to deal with - i.e. anything with more than 16 texture units and with 16 ROPs has had a 256-bit bus giving it sufficient bandwidth to output more than 8 pixels per cycle. This is not the case for RSX - its local memory bus is 128-bit, so adding more than 8 ROPs is a total waste in all situations - it can't output more pixels because it's bandwidth limited.

This is irrespective of AF performance.

london-boy said:
They would make the whole chip faster (not always as it also depends on a lot of other aspects of the architecture like bandwidth), but not because 8 extra ROPs would make AF in particular faster.
No. Sony's doc already points out that 8 pixels are bandwidth limited at 420e/600m (which is a higher bandwidth to pixel ratio than at 550e/700m) - the rest of the operations are designed to be balanced (i.e. two 16-bit Z's = 1 32-bit colour in terms of bandwidth, or 1 FP16 colour = two 32-bit colours) so there will be few, if any, cases where adding more ROPs would improve the performance.
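
A quick sketch of the bandwidth to pixel arithmetic, for anyone who wants to check the ratio. The assumptions are mine: a 128-bit bus, double data rate memory (two transfers per memory clock), 8 ROPs each outputting one pixel per engine clock, and no compression. The function names are just for illustration.

```python
def bandwidth_gb_s(bus_bits, mem_mhz, transfers_per_clock=2):
    """Peak bus bandwidth in GB/s (assumes DDR: 2 transfers per clock)."""
    return bus_bits / 8 * mem_mhz * 1e6 * transfers_per_clock / 1e9

def bytes_per_pixel(bus_bits, mem_mhz, rops, engine_mhz):
    """Peak bytes of bandwidth available per pixel the ROPs could output."""
    bytes_per_s = bandwidth_gb_s(bus_bits, mem_mhz) * 1e9
    pixels_per_s = rops * engine_mhz * 1e6
    return bytes_per_s / pixels_per_s

# The two engine/memory clock pairs quoted above, 8 ROPs on a 128-bit bus.
for engine, mem in ((420, 600), (550, 700)):
    ratio = bytes_per_pixel(128, mem, 8, engine)
    print(f"{engine}e/{mem}m: {bandwidth_gb_s(128, mem):.1f} GB/s, "
          f"{ratio:.2f} bytes per pixel")
```

Under these assumptions the ratio comes out at roughly 5.7 bytes per pixel at 420e/600m and roughly 5.1 at 550e/700m. A 32-bit colour write alone is 4 bytes, before any Z or texture traffic.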
 
And relative performance is all that matters with respect to the thread topic: AF performance.

Given that he has been repeatedly cherry-picking benches to show that parts with more ROPs take a larger relative performance hit, how is that statement to be read? Obviously the assertion that relative performance is what matters is wrong for consumers in nearly every instance, but note that he mentions it as the indicator of what matters when discussing performance, and take a look at the numbers he is posting.

I have repeatedly talked about the number of ROPs, all else being equal. He seems to be trying to argue the point about ROPs that are attached to an entirely different configuration of chip (which I had mentioned that Dave was talking about: a different configuration than what we have seen). I don't necessarily agree entirely with what Dave was talking about, but I would agree that the performance would be considerably closer (i.e. a 24TMU 8ROP part vs. a 24TMU 16ROP part feeding a 128-bit bus) in AF-heavy situations.

ROPs double the output sampling rate; on modern hardware this is not linked to the input texture sampling rate, which is the critical element of anisotropic filtering. (Xenos is a fine example of this, as its ROPs are on a completely different chip from the texture samplers.)

So a ROP that is paired with a chip that IS NOT EQUAL in configuration will perform differently? A ROP with a NON EQUAL sampling configuration will not perform the way I have been saying one that IS EQUAL will in relative terms? Is that what you are saying?
 
Dave Baumann said:
No. Sony's doc already points out that 8 pixels are bandwidth limited at 420e/600m (which is a higher bandwidth to pixel ratio than at 550e/700m) - the rest of the operations are designed to be balanced (i.e. two 16-bit Z's = 1 32-bit colour in terms of bandwidth, or 1 FP16 colour = two 32-bit colours) so there will be few, if any, cases where adding more ROPs would improve the performance.

You mean 'Yes', right? 'Cause that's pretty much what I was saying in my brackets, that just adding ROPs won't make the chip faster unless you also change other aspects of the architecture :smile:
 
london-boy said:
You mean 'Yes', right? 'Cause that's pretty much what I was saying in my brackets, that just adding ROPs won't make the chip faster unless you also change other aspects of the architecture :smile:

Semantics, but you're saying sometimes it'll be faster, but not always; Dave's saying basically never.
 
Nemo80 said:
Just a side note, but is there any NV Card with 8 ROPs and a 256bit memory interface? Because that's what RSX seems to be (2x 128bit = 2*64Bit GDDR3, 2*64Bit XDR). Would be a totally different configuration for NV then...
7800 GS

BenSkywalker said:
I have repeatedly talked about the number of ROPs, all else being equal. He seems to be trying to argue the point about ROPs that are attached to an entirely different configuration of chip (which I had mentioned that Dave was talking about: a different configuration than what we have seen).
Even if RSX looks very similar to G70/G71 it will still be an entirely different configuration to any chip we've seen before as it is married to a 128-bit bus, which demands that the available bandwidths be taken into account.

BenSkywalker said:
So a ROP that is paired with a chip that IS NOT EQUAL in configuration will perform differently? A ROP with a NON EQUAL sampling configuration will not perform the way I have been saying one that IS EQUAL will in relative terms? Is that what you are saying?
I've no idea what you are saying there, Ben! But simply put, if you take a G71, remove half the bandwidth, then you are not likely to see any difference in performance between having 8 ROPs available or 16.
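
As a back-of-envelope sketch of that point: treat pixel output as the smaller of what the ROPs can emit and what the bus can carry. Everything here is a simplifying assumption on my part: the clocks are illustrative rather than a claim about any specific board, memory is double data rate, and each pixel costs 8 bytes (a 32-bit colour write plus a 32-bit Z write) with no compression or caching.

```python
def fill_rate_gpix_s(rops, engine_mhz, bus_bits, mem_mhz, bytes_per_pixel=8):
    """Simplified peak fill rate in Gpixels/s: min(ROP limit, bandwidth limit)."""
    rop_limit = rops * engine_mhz / 1000           # Gpixels/s the ROPs can emit
    bandwidth = bus_bits / 8 * mem_mhz * 2 / 1000  # GB/s, assuming DDR memory
    return min(rop_limit, bandwidth / bytes_per_pixel)

# Hypothetical part at 650 MHz engine / 800 MHz memory, full vs. halved bus.
for bus in (256, 128):
    for rops in (8, 16):
        rate = fill_rate_gpix_s(rops, 650, bus, 800)
        print(f"{bus}-bit bus, {rops:2d} ROPs: {rate:.1f} Gpix/s")
```

With the full bus the 16 ROP case comes out ahead in this model; with the bus halved, both cases hit the same bandwidth ceiling, so the extra 8 ROPs buy nothing.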
 
Dave Baumann said:
Even if RSX looks very similar to G70/G71 it will still be an entirely different configuration to any chip we've seen before as it is married to a 128-bit bus, which demands that the available bandwidths be taken into account.

But isn't it just the case that two of the 4x64bit channels have been routed to go via XDR instead of GDDR3? That wouldn't make that much of a different configuration...
 
GDDR memory is going to be the primary target for sample output (although I suspect this is configurable, just not likely to happen).
 
:???:

So, we've been hearing over and over about how the 128-bit bus will limit certain things and so forth, but we haven't really heard how nVidia plans to mitigate this. I mean, they're a bunch of smart guys, so they're obviously aware of the bandwidth issues and the need to keep this from being a problem. Maybe it's all the bandwidth/latency talk, but I'm starting to think that even with XDR the RSX is still severely capped, since it doesn't seem like the XDR will get anywhere near as much use as it should... But, again, nVidia is aware of this, so does anyone have any thoughts on how this will be mitigated?

Man I really have to sit down and re-read a lot of stuff... I feel so out of the loop.

Oh, also, how many transistors do 8 ROPs account for, and are there any new estimates for the RSX transistor count? Is it still 300 million?

Sorry for being so out of it.
 
RobertR1 said:
Why does nvidia care? They likely designed a chip to Sony's specs and handed it over.


It's really simple. NVIDIA "cares" because their reputation is on the line. They need to provide not only a chip that works well but also a whole bunch of libraries to make sure that the chip works well, considering its limitations. If they do a good job, then the possibility of getting future HUGE contracts like this will be greater, and they will grow as a company.
 
Dave Baumann said:
Even if RSX looks very similar to G70/G71 it will still be an entirely different configuration to any chip we've seen before as it is married to a 128-bit bus, which demands that the available bandwidths be taken into account.

That is what I was saying that you were commenting on. You were saying that a 3TMU:1ROP configuration with 8 ROPs total wouldn't perform any differently in terms of AF from a 1.5TMU:1ROP configuration with 16 ROPs total; I don't take issue with that. What I was taking issue with is the talk that increasing the ROPs will increase the performance hit, or that, of the current configurations, anything other than the 16ROP offerings would be the fastest, by a wide margin, in AF performance.


I've no idea what you are saying there, Ben! But simply put, if you take a G71, remove half the bandwidth, then you are not likely to see any difference in performance between having 8 ROPs available or 16.

You can make certain that your bandwidth is utilized to its full potential (isn't the RSX async?). Not that it would make much of a difference, but I'm not sure how much of a transistor budget leaving them in place would take. Look at the GS- 16ROPs with 8TMUs- it isn't like Sony is known for picking a configuration that most would consider rational. Leaving the 16ROPs could help them out in certain instances, even with a 128-bit bus, although whether developers would ever be interested in exploiting that is something else entirely.
 
BenSkywalker said:
Look at the GS- 16ROPs with 8TMUs- it isn't like Sony is known for picking a configuration that most would consider rational.
They were just the first to do it - a few years later similar approaches became quite commonplace (double Zixel fill on NV/ATI parts is really not very different - and it's targeted at an even more limited set of problems).

At any rate, if cost wasn't an issue GS would have shipped with 8MB of memory, and while cutting ROPs out isn't as big a cost saving, I gather it outweighs the tiny subset of problems where 16ROPs would actually benefit performance of RSX.
And besides, if they had that transistor budget to spare, they should spend it on something actually useful - say, an extra PS quad.
 
Saints Row is the first game with full AF (8x?). The graphics are exceptional. The framerate is very bad: 10 - 25 fps, almost always below 25 fps.

And sorry for my bad English...
 