How can AF be implemented effectively on consoles/RSX?

Fafalada said:
There are a number of things on the GS that cost a couple of cycles but typically take hundreds on other chips; I don't see how any consumer chip currently available could do 1:1 GS emulation at full speed.
Either you need some kind of workaround for GS-friendly stuff (which, if possible, would involve some kind of dynamic recompilation of display lists) - or you just accept that you will be incompatible with some stuff (this might be an option if PS3 was an MS machine).
In either case, matching stuff like the number of pixel pipelines is obviously irrelevant.
Very interesting. Care to elaborate? Are you talking about renderstate changes, and that recompilation of display lists is necessary to group primitives together?
Anyway, in normal rendering 8 ROPs @ 550 MHz has roughly 5x higher fill than the GS; if emulation isn't possible with that, I don't see how 16 would make a difference.
I thought PS2 was 1.2 GPix/s w/ single texturing and 2.4 GPix/s w/o. Peak fillrate is more than half that of 8 ROPs @ 550 MHz. Although I still agree with your conclusion, because it's near impossible for PS3 to do anything faster with 16 ROPs as opposed to 8 anyway.
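For what it's worth, the raw numbers being argued over work out like this - a back-of-the-envelope sketch using the nominal clocks, not measured figures:

```python
# Rough peak-fillrate comparison: PS2 GS vs. 8 ROPs @ 550 MHz.
GS_CLOCK = 150e6                     # Hz (nominal; GS actually runs at 147.456 MHz)
gs_fill_untextured = 16 * GS_CLOCK   # 16 pipes, no texture: 2.4 GPix/s
gs_fill_textured = 8 * GS_CLOCK      # 8 pipes when texturing: 1.2 GPix/s
rsx_fill = 8 * 550e6                 # 8 ROPs @ 550 MHz: 4.4 GPix/s

print(rsx_fill / gs_fill_textured)   # ~3.7x advantage over textured GS fill
print(rsx_fill / gs_fill_untextured) # ~1.8x vs. the untextured GS peak
```

So against the textured fill the gap is closer to 4x than 5x, and against the untextured peak it is under 2x, which is what the "more than half" comment is getting at.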
 
kyleb said:
Note the difference between not needing and not being an option.
If it's not needed, whether it is an option or not is totally unimportant and is only brought up by the needlessly pedantic. It's not an issue.
 
Mintmaster said:
Because on most hardware and most workloads, regular trilinear filtering will benefit more from increased bandwidth than anisotropic filtering.

ok, i see now, i originally thought you were referring to increasing the performance hit of aniso vis-a-vis aniso at limited bw, but you actually meant in relation to isotropic.

Now, I suppose there are some math-heavy cases where a decoupled TMU wouldn't take more cycles per pixel, and thus BW per clock would increase. But because you're math heavy, BW shouldn't really matter then.

actually they don't have to be so heavy. if you have a simple shader doing aniso and another simple one doing trilinear, you'd experience increased bw requirements in the aniso case as soon as both shaders take sufficiently similar clocks to execute (ideally the same clocks)

Since you're using multiple isotropic units, the number of pixels using those isotropic units simultaneously is reduced. In your system, more BW per fragment, but fewer fragments per clock. It's exactly the same as one isotropic unit per pixel being used over multiple cycles.

not necessarily. there's nothing wrong with a unified shader architecture where you have N universal shader units and N decoupled TMU units. if at a given moment you have 50:50 fragment vs vertex shaders, and the latter do not use any TMUs, you can end up with a 2 TMUs per fragment ratio.
 
Imbalanced how? Software used that raw fill to good advantage/effect - imbalanced would imply something else in the design made it impossible to utilize it.

First off, imbalanced doesn't imply that something makes it impossible to utilize. Second, I wrote "by typical standards", and you were the single person I had in mind when I wrote that :)

I recall very well you figuring out how to pull off Dot3 on the PS2 with what should have been decent performance, when pretty much everyone else said it was a pipe dream. I recall a lot of the different approaches you figured out how to do on the PS2 hardware - but those certainly weren't by the typical standards (i.e. check for support, enable).

So I take it you consider Xenos to be crippled as well?

Unless they made some major changes to the RSX, reducing the ROPs to 8 would indicate a major reduction to the ALUs also, seriously hurting shader performance. That is a side comment however - yes, I consider Xenos crippled. The lack of anisotropic filtering gives wonderful blurry textures and bad aliasing all at the same time, and that is getting old very quickly. I have a small section of the screen in a lot of games that looks tolerable - the rest is poor.

I couldn't disagree more - but then, what do I know about the internals of Sony consoles :p

Could you please explain the 2,560bit bus? :)
 
BenSkywalker said:
Unless they made some major changes to the RSX, reducing the ROPs to 8 would indicate a major reduction to the ALUs also, seriously hurting shader performance.
NV4x/G7x ROPs are not directly coupled to the shader quads, so 8 ROPs doesn't equate to 3 shader quads - hypothetically there could be just 4 ROPs but still 6 shader quads. In fact, the 7800 GS AGP has 8 ROPs and either 4 or 5 shader quads (I forget which, but it's a different configuration from any of the other parts based on G70). Sony have already publicly stated RSX to be "NV47" based (i.e. G70) with 24 texture units - the number of texture units is "hard" linked to the number of shader pipes in these designs, so that's already confirmation of 6 shader quads.

That is a side comment however - yes, I consider Xenos crippled. The lack of anisotropic filtering gives wonderful blurry textures and bad aliasing all at the same time, and that is getting old very quickly. I have a small section of the screen in a lot of games that looks tolerable - the rest is poor.
You'll need to talk to the software developers there.
 
Sony have already publicly stated RSX to be "NV47" based (i.e. G70) with 24 texture units - the number of texture units is "hard" linked to the number of shader pipes in these designs, so that's already confirmation of 6 shader quads.

3 TMUs per ROP? I thought nobody liked that setup the last time it was done (R100, IIRC).

Edit- If they do have 3 TMUs per ROP then that should give them an advantage when talking about AF compared to Xenos.
 
BenSkywalker said:
Could you please explain the 2,560bit bus? :)

What 2,560-bit bus? I just see a collection of 16x64-bit buses for reads, 16x64-bit buses for writes and 16x32-bit buses for texture reads (4 MB of e-DRAM organized in 16x2 Mbit DRAM macros), and we have 16 Pixel Pipelines/Pixel Engines... all normal to me ;).
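Summing those per-macro buses does give the 2,560-bit figure, and at the GS clock it matches the commonly quoted ~48 GB/s eDRAM bandwidth - a quick sanity check, assuming a nominal 150 MHz clock:

```python
# Sanity check of the GS eDRAM bus widths listed above.
MACROS = 16
READ_BITS, WRITE_BITS, TEX_BITS = 64, 64, 32  # per-macro buses

total_bits = MACROS * (READ_BITS + WRITE_BITS + TEX_BITS)
print(total_bits)                             # 2560

# At a nominal 150 MHz this is the familiar ~48 GB/s figure.
bandwidth_gb_s = total_bits / 8 * 150e6 / 1e9
print(bandwidth_gb_s)                         # 48.0
```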
 
Inane_Dork said:
If it's not needed, whether it is an option or not is totally unimportant and is only brought up by the needlessly pedantic. It's not an issue.
It is an option. The question was asked and that is the answer, and that exchange was by no means intended to offend you.
 
Mintmaster said:
Very interesting. Care to elaborate? Are you talking about renderstate changes, and that recompilation of display lists is necessary to group primitives together?
Renderstates are a big part of it - GS has two contexts to start with (so the first renderstate change is 'free'), but even without that, changing all states (rendertarget, texture, alpha/Z states etc.) will run you around 9 cycles from what I remember. Adding 3 clocks for the attributes of a textured primitive, you could theoretically change ALL render states on EVERY primitive and maintain ~12 Mpoly/sec (which is very high end for the last generation; most games were much lower than that, regardless of platform).
And then you also have things like flushing the texture cache on a per-primitive basis, addressing-mode magic, etc.
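The ~12 Mpoly/sec figure falls straight out of those cycle counts at the GS core clock - a sketch of the worst-case arithmetic (the 9- and 3-cycle numbers are the from-memory estimates above):

```python
# Worst case: change EVERY render state on EVERY textured primitive.
GS_CLOCK = 147.456e6         # Hz (GS core clock)

STATE_CHANGE_CYCLES = 9      # full state change: rendertarget, texture, alpha/Z...
PRIMITIVE_CYCLES = 3         # attributes of a textured primitive

prims_per_sec = GS_CLOCK / (STATE_CHANGE_CYCLES + PRIMITIVE_CYCLES)
print(prims_per_sec / 1e6)   # ~12.3 Mpoly/sec even in this pathological case
```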

I thought PS2 was 1.2 GPix/s w/ single texturing and 2.4 GPix/s w/o. Peak fillrate is more than half that of 8ROPs@550mhz.
Yea, an oopsie on my part - I meant around 4x. No-texture fill is something of a special case; it's hard to judge its impact in an average game (in some it can be really big, some might almost never use it). If the emu was smart enough to use Zixel fill in some places (where GS color fills get used for shadows), it could help too.

BenSkywalker said:
Could you please explain the 2,560bit bus?
That's just a collection of buses to the Page Buffer caches; the eDRAM->cache path was actually 8192 bits :p
But anyway - like the Xenos eDRAM, it was designed to be exactly enough to serve the bandwidth needs of the pixel pipelines without starving them.
Which is what I meant about balance - PS2 components aren't just thrown together; the interfaces connecting them are the way they are for good reasons. Balancing (mostly for cost reasons) was very present in every PS design, just like it is in every other console.
If anything was out of balance, it was perhaps their design ambitions :p
 
BenSkywalker said:
3 TMUs per ROP? I thought nobody liked that setup last time it was done(R100 IIRC).
At the time, no game other than 3DMark used more than 2 texture accesses per pixel (you can blame the ubiquity of GeForce and GeForce2 GPUs for that). Also, if you used 4 or 5 textures, fillrate was halved due to multipassing, but on RSX each TMU is working on a different pixel (first done with R300), so fillrate declines more gracefully (e.g. 4 tex is 6 pix/clk, 5 tex is 4.8 pix/clk, etc.).
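That graceful decline can be sketched as the minimum of the ROP limit and the TMU throughput - assuming 24 decoupled TMUs, 8 ROPs, and one texture fetch per TMU per clock (the per-clock fetch rate is an assumption here, not a quoted spec):

```python
# Pixels per clock vs. textures per pixel for a part with 24 decoupled
# TMUs and 8 ROPs, where each TMU can work on a different pixel.
TMUS, ROPS = 24, 8

def pixels_per_clock(textures_per_pixel: int) -> float:
    if textures_per_pixel == 0:
        return float(ROPS)                    # ROP-limited when not texturing
    return min(ROPS, TMUS / textures_per_pixel)

for n in range(1, 7):
    print(n, pixels_per_clock(n))
# 1-3 textures: capped at 8 pix/clk by the ROPs
# 4 textures: 6.0, 5 textures: 4.8, 6 textures: 4.0
```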

Now we're in the shader era, where RSX's ability to use a TMU to do most pixel shader ops makes it very useful. As long as your shader can't theoretically run faster than 8 pixels per clock, it doesn't matter what the TMU:ROP ratio is nowadays.

Finally, according to 7600GT tests, the 128-bit bus couldn't use more than 5 ROPs for the most common scenario (color+z) stripped of all limitations (no shader, no textures, no blending, simple fullscreen quad), so 8 is not a problem. FlexIO is not a factor, because you can't split your framebuffer traffic unless you put your colour buffer in XDR, which would reduce BW and performance. The only time I can imagine 16 ROPs being useful is with z/stencil-only rendering. But even here, the 4-ROP 6600GT is good for 40-60 fps at 2MPix resolutions in the stencil-heavy game Doom3 without AA. With 4xAA, the 8-ROP 7600GT gives you 60fps at 2MPix.
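The "no more than 5 ROPs" observation is consistent with the 7600GT's specs (560 MHz core, 128-bit bus at 700 MHz GDDR3 = 22.4 GB/s) if you assume 8 bytes of framebuffer traffic per pixel (32-bit colour write + 32-bit Z, no compression) - a rough sketch that treats Z as a simple write:

```python
# How many ROPs can the 7600GT's memory bus feed for plain color+Z?
CORE_CLOCK = 560e6            # Hz (7600GT core clock)
MEM_BW = 16 * 700e6 * 2       # bytes/s: 128-bit (16 B) DDR @ 700 MHz = 22.4 GB/s
BYTES_PER_PIXEL = 4 + 4       # 32-bit colour + 32-bit Z, no compression assumed

usable_rops = MEM_BW / BYTES_PER_PIXEL / CORE_CLOCK
print(usable_rops)            # 5.0 - the bus sustains about 5 of the 8 ROPs
```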

I don't see this being a problem or limitation at all given PS3's specs. Ignoring bandwidth, 8 ROPs matches Xenos in all scenarios except 4xAA, too.

EDIT:
That is a side comment however - yes, I consider Xenos crippled. The lack of anisotropic filtering gives wonderful blurry textures and bad aliasing all at the same time
:rolleyes:
 
True. I can never avoid the confusion, though, that a 64+64+32-bit interface to each of the 16x2 Mbit DRAM macros does not seem to make sense - only 160 bits per DRAM macro, while each DRAM macro in the PSP can transfer 256 bits (128 bits read + 128 bits write; 2x1 MB DRAM macros = a 512-bit interface to the 2 MB of VRAM)... btw...


http://www.hotchips.org/archives/hc16/3_Tue/8_HC16_Sess8_Pres1_bw.pdf

It covers some slides about the DRAM interface that I had not seen before in publicly available documents or other presentations (namely the Read-Modify-Write slide and the graph with the PlayStation 2 comment)...

It makes sense when you see it from the Pixel Engine side, though: they are connected to Page Buffers, and the GS docs do state that if you break the Page Buffer using textures that do not fit its limit, you get additional latency as the buffer has to be re-filled. That happens at 150 GB/s, which is just 8192 bits * 150 MHz, and which is also consistent with the page buffer size.

I am quite sure, though, that each DRAM macro is 2 Mbits and that there are 16 of them, and a 512-bit interface to each macro (256-bit reads + 256-bit writes) is not really impossible to think about. I hope my memory is not failing me, but I remember reading some technical IEEE or Hot Chips paper about the GS, and also discussing it here or on GAF with various people, and learning about the size and number of the DRAM macros and the way they were connected (do I remember well if I think about a crossbar switch for the DRAM macro array?).
 
At the time, no game other than 3DMark used more than 2 texture accesses per pixel

Actually, there were several that utilized more than two. Giants, Sacrifice and Evolva spring quickly to mind. It didn't seem to help out the R100 much at all though, IIRC.

Finally, according to 7600GT tests, the 128-bit bus couldn't use more than 5 ROPs for the most common scenario (color+z) stripped of all limitations (no shader, no textures, no blending, simple fullscreen quad)

Six ROPs based on those numbers. They peak with 2x MSAA enabled too - no matter how perfect the compression implementation, you aren't going to gain bandwidth by enabling MSAA.

As far as the filtering on Xenos - what platform released this millennium has more texture aliasing than the 360? I own them all, and I certainly haven't seen one that comes close. Take a title like 'The Outfit' (using that as I was just playing it for a few hours) - it has serious texture aliasing issues for the first couple of mip levels, then goes blurry, all while using bilinear filtering. If the 360 is supposed to be so developer friendly, why are they having these issues?

Which is what I meant about balance - PS2 components aren't just thrown together - the interfaces connecting them are the way they are for good reasons. Balancing (mostly for cost reasons) was very present in every PS design, just like it is in every other console.

You have an interesting take on balance, Faf - not a critique at all - but you certainly don't have the same perspective that a lot of others do. Absolutely no worries about FB bandwidth, and multi-pass bilinear... and that doesn't seem unbalanced to you?
 
BenSkywalker said:
As far as the filtering on Xenos - what platform released this millennium has more texture aliasing than the 360? I own them all, and I certainly haven't seen one that comes close. Take a title like 'The Outfit' (using that as I was just playing it for a few hours) - it has serious texture aliasing issues for the first couple of mip levels, then goes blurry, all while using bilinear filtering. If the 360 is supposed to be so developer friendly, why are they having these issues?

GCN and PS2.

The games on both platforms can vary in this regard, but I have seen enough poor filtering on both that I don't think it is even a contest. I have not played enough Xbox1 titles, but in MP some games on the Xbox1 can be very poor. And of course the low resolution of these consoles hides a lot as well.
 
The Outfit's framerate gets crappy on occasion even with the crappy texture filtering; turning on AF would only have made that worse. And yeah, plenty of games had crappy texture filtering last gen as well.
 
I thought it obvious, but apparently it needs to be spelled out.

The X360 is getting cross-generational games like the PS2 got when it launched. It took GT3 to really show off the PS2 (ala FNR3 & GRAW), but it's not like the ho-hum PS1 ports stopped overnight.

The only real mistake here is to take the X360's first games as indicative of what the platform's max is.
 
The games on both platforms can vary in this regard, but I have seen enough poor filtering

First off, the PS2 was not released this millennium :)

After that, the GCN and Xbox have horrific texture filtering without a doubt - what they lack is the amount of texture aliasing that the 360 exhibits in numerous titles. I can understand that they decided the 1% of screen real estate taken up by edges was far more important than the rest of the screen covered by textures - AA being more important than AF - and that was a choice they certainly had the right to make. What I am very bothered by is how horrific the aliasing is on the first couple of mip levels before it blurs out. Perhaps this is a base blend filtering issue? I hope not, as that can't be fixed, and some games are much better off than others (Oblivion doesn't have nearly the aliasing, as an example - not that it is great, but it isn't nearly as bad).

The X360 is getting cross-generational games like the PS2 got when it launched.

Overwhelmingly, those games seem to lack the problems of the 360-native titles. Burnout Revenge, as an example, doesn't have the kind of issues with filtering that the 360-native titles have - likely due to its native platform (although it has what is likely the poorest HDR implementation I have ever seen, but I digress). I would not use last-gen ports as examples of what I'm talking about.
 
BenSkywalker said:
First off, the PS2 was not released this millennium :)

PS2 Japanese launch = March 4th, 2000
PS2 American launch = October 26th, 2000
 
As you pointed out Acert, the PS2 was not launched in this millennium. 2001 is the start of the millennium.

As far as texture aliasing, there were a lot of games at E3 that didn't seem that bad, and actually looked like a step up. They may not have had anisotropic enabled, but I guess they had the correct mip mapping done.
 
Acert93 said:
PS2 Japanese launch = March 4th, 2000
PS2 American launch = October 26th, 2000

year 2000 was the last year of the past millennium, because the year count started from 1 AD, not 0 AD. so the year number is actually an index: 1st year, 2nd year, etc., with the 2000th year being the last index of the 2nd millennium.
 