What is the peak theoretical fill rate of RSX and Xenos at varying levels of AA?

Rockster said:
These figures really don't tell the whole story because each application is going to access the framebuffer differently and blends, access patterns, etc. are going to sap available bandwidth even more.

Of course, it was just an academic query. I know in any real application there'll be many kinds of overhead attached to every pixel that goes out.

Fox5 said:
Wait, are those G70 numbers right? I thought the fillrate of G70 was like 12 Gigapixels, and I didn't know it took a hit from AA.

'G70' is just the architecture's moniker; different implementations with varying ROP counts will obviously have different fillrate numbers. IIRC, there isn't a hit with 2xAA.

Fox5 said:
Even the xbox used that fact to advertise its fillrate at 4 gigapixels.

My memory of the Xbox's specs is hazy, but AFAIK that figure is just the number of samples when using AA, not pixels.

Fox5 said:
RSX sounds like it's going to get destroyed by Xenos if it doesn't have a massive fillrate advantage

Whether RSX had 16 or 8 ROPs would not matter. We've known for a long time now how much bandwidth is available, and that is what bounds RSX's fillrate, not the number of ROPs. Fillrate has not been an expected RSX advantage!
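To put rough numbers on that, here's a minimal back-of-the-envelope sketch in Python. Every figure in it is an illustrative assumption rather than a confirmed RSX spec: ~500 MHz core clock, ~22.4 GB/s of GDDR3 bandwidth, and ~8 bytes of framebuffer traffic per pixel (32-bit colour + 32-bit Z, no blending or AA).

# Rough sanity check of the "fillrate is bandwidth-bound" argument.
# All figures are assumptions for illustration only.
GDDR3_BANDWIDTH = 22.4e9   # bytes/s (assumed)
CLOCK = 500e6              # Hz (assumed)
BYTES_PER_PIXEL = 8        # 32-bit colour + 32-bit Z write (assumed)

def rop_limited(rops):
    """Peak theoretical fillrate set by the ROPs alone."""
    return rops * CLOCK

def bandwidth_limited():
    """Fillrate the memory bus could actually feed."""
    return GDDR3_BANDWIDTH / BYTES_PER_PIXEL

for rops in (8, 16):
    peak = rop_limited(rops)
    sustained = min(peak, bandwidth_limited())
    print(f"{rops} ROPs: {peak / 1e9:.1f} Gpix/s peak, "
          f"{sustained / 1e9:.1f} Gpix/s once bandwidth-limited")

Under those assumptions the bus caps both configurations at roughly 2.8 Gpix/s, which is why the ROP count alone tells you little.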
 
Fox5 said:
Even the xbox used that fact to advertise its fillrate at 4 gigapixels.

original (and misleading) Xbox XGPU / X-CHIP's (assumed to be GigaPixel GP4 or Nvidia NV25) pixel fillrate spec when it was going to be clocked at 300 MHz: 4.8 billion pixels aka 4.8 gigapixels aka 4800 Mpixels

this was touted as the pixel fillrate, but actually turned out to be the AA sample rate, IIRC.


NV2A GPU actual pixel fillrate *if* it had been clocked at 300 MHz:
1.2 billion pixels aka 1.2 gigapixels aka 1200 Mpixels

NV2A GPU actual pixel fillrate *if* it had been clocked at 250 MHz:
1 billion pixels aka 1 gigapixel aka 1000 Mpixels


final NV2A GPU actual pixel fillrate clocked at 233 MHz in the released Xbox:
932 Mpixels aka 0.932 gigapixels aka 0.932 billion pixels


since NV2A has twice as many TMUs as it does pixel-pipelines, naturally the texel fillrate is double at any given clockspeed.
at 233 MHz that gives NV2A 1864 Mtexels aka 1.864 billion texels aka 1.864 gigatexels
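All of the above is just pipelines x clock; a quick Python sketch, assuming the commonly quoted NV2A configuration of 4 pixel pipelines with 2 TMUs each, reproduces the numbers.

# Reproduce the NV2A fillrate figures above: fillrate = pipelines x clock.
# The 4-pipe / 2-TMU-per-pipe configuration is the commonly quoted one.
PIXEL_PIPES = 4
TMUS_PER_PIPE = 2

def pixel_fillrate(clock_hz):
    return PIXEL_PIPES * clock_hz

def texel_fillrate(clock_hz):
    return PIXEL_PIPES * TMUS_PER_PIPE * clock_hz

for mhz in (300, 250, 233):
    clk = mhz * 1e6
    print(f"{mhz} MHz: {pixel_fillrate(clk) / 1e6:.0f} Mpixels/s, "
          f"{texel_fillrate(clk) / 1e6:.0f} Mtexels/s")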



lol, Xbox's originally announced but misleading & untrue pixel fillrate as of Feb-March 2000
(for either the GigaPixel GP4 GPU or the Nvidia GPU assumed to be NV25)
was 4.8 billion pixels/sec. That is higher than the actual pixel fillrate of either Xbox 360
or PlayStation 3.


Ahhh, marketing figures. So easily twisted.
 
Megadrive1988 said:
original (and misleading) Xbox XGPU / X-CHIP's (assumed to be GigaPixel GP4 or Nvidia NV25) pixel fillrate spec when it was going to be clocked at 300 MHz: 4.8 billion pixels aka 4.8 gigapixels aka 4800 Mpixels

this was touted as the pixel fillrate, but actually turned out to be the AA sample rate, IIRC.

final NV2A GPU actual pixel fillrate clocked at 233 MHz in the released Xbox:
932 Mpixels aka 0.932 gigapixels aka 0.932 billion pixels


I wonder if devs could have made use of that fillrate had it been that high (which is about equivalent to a 12-way VSA-100 system). Interesting to see that fillrate really wasn't king, if we've only improved 4-fold since 2001 - 2-fold if you count the PS2's 2 gigapixel fillrate.
 
Megadrive1988 said:
original (and misleading) Xbox XGPU / X-CHIP's (assumed to be GigaPixel GP4 or Nvidia NV25) pixel fillrate spec when it was going to be clocked at 300 MHz: 4.8 billion pixels aka 4.8 gigapixels aka 4800 Mpixels

this was touted as the pixel fillrate, but actually turned out to be the AA sample rate, IIRC.

Gotta love PR. If we are counting sample rate, Xenos is 16 Gigasamples/s.
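The sample-rate arithmetic is just pixel fillrate times the AA sample count; a tiny Python sketch, assuming 8 ROPs at 500 MHz for Xenos, shows where the PR-friendly figure comes from.

# Sample rate = pixel fillrate x AA samples per pixel.
# Assumes 8 ROPs at 500 MHz for Xenos (4 Gpixels/s peak).
ROPS = 8
CLOCK = 500e6  # Hz

def sample_rate(aa_samples):
    return ROPS * CLOCK * aa_samples

for aa in (1, 2, 4):
    print(f"{aa}xAA: {sample_rate(aa) / 1e9:.0f} Gsamples/s")

4xAA is where the 16 Gsamples/s number comes from.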

Titanio said:
Whether RSX had 16 or 8 ROPs would not matter. We've known for a long time now how much bandwidth is available, and that is what bounds RSX's fillrate, not the number of ROPs. Fillrate has not been an expected RSX advantage!

I believe some of the PS3 devs mentioned moving some fillrate-intensive tasks (like alpha blends, if I remember...) to Cell because the bandwidth hit would be negligible.

So just as comparing fillrate numbers (RSX 4.4B/s; Xenos 4.0B/s) is misleading due to architectural and bandwidth concerns, in relation to some game designs the RSX fillrate may be misleading with regard to the end result on screen. Of course, that puts us back onto dev skill, budget, workflow, etc. as the determining factor of the quality of software - and IMO that is where the real difference in software quality (even graphically) will be.

The tech is fun, but whoever has the best developers in the end will have the best looking games as long as the consoles are in the same general ballpark. No substitute for good art and smart use of technology (not just better tech).
 
Fox5 said:
I wonder if devs could have made use of that fillrate had it been that high (which is about equivalent to a 12-way VSA-100 system). Interesting to see that fillrate really wasn't king, if we've only improved 4-fold since 2001 - 2-fold if you count the PS2's 2 gigapixel fillrate.

Fillrate was said to be one of the biggest bottlenecks on the Xbox and one of the PS2's biggest advantages. But even if NV2A did have a 4 Gigapixel/s fillrate, it still would not have had the memory system to sustain it, at least not without adding eDRAM or some crazy-huge memory bus.

But every generation changes. What is important one generation is less so the next. It really depends on game design and where the bottlenecks and workloads shift in games, and ultimately it can vary game-to-game. E.g. PC GPUs decoupled their ROPs and appear to have stopped, at least for now, at 16 ROPs, the reason being that any more would be overkill considering the bandwidth limitations and how resolution increases have slowed down. More is not always better. And that is what we see with RSX: given the PS3's memory architecture, 16 ROPs just are not necessary, because in most situations it lacks the bandwidth to sustain the fillrate of even 8 ROPs, let alone 16.
 
Acert said:
Fillrate was said to be one of the biggest bottlenecks on the Xbox and one of the PS2's biggest advantages.
Actually, for the most part the consensus was that Xbox was faster in the vast majority of rendering operations, and frankly I've never seen a proper discussion of the bandwidth implications for NV2A.

Not gonna debate why this kind of discussion became a big issue this time around (and not before, even though raw fillrate is far less important this generation) - but perhaps worth noting: NV2A had 2 times less relative bandwidth than RSX to split amongst the same tasks.
 
Fafalada, what exactly do you mean when you say 2x less "relative" BW? Relative to what? I do remember some talk about BW for NV2A, but the Celeron could only consume 1 GB/s peak anyway, and the remaining BW wasn't that much less than on a GF3/GF4.

For PS3, one thing I'd like to see is FlexIO usage. You have performance counters on PS3, right? During gameplay, how does the data transfer over FlexIO compare to the data transfer over GDDR3 and XDR?
 
I got one comment:

Let's assume for a moment that the Inquirer's rumor that RSX is acting as the southbridge is correct and, if I understood it correctly, that the graphics RAM is sitting on the NORTHBRIDGE.

http://www.theinquirer.net/?article=32159

Except this could just prove exactly how weird those dudes at the Inquirer are, but if it is true, then I'm not sure RSX is comparable to a PC graphics card after all.
 
Fafalada said:
Actually, for the most part the consensus was that Xbox was faster in the vast majority of rendering operations

Yes, but it is not like the vast minority of rendering operations could not still provide quite nice alternatives. I doubt that render state switching and texture cache flushing were as fast as on the GS or that render-to-texture operations had the same kind of impact (on NV2A you had to render to UMA and then read back and fill the texture cache... not a disastrous impact if the texture is heavily re-used for other objects, as you can still amortize the cost, but it is not easy to find quick alternatives... see, for example, the MGS2 port to Xbox). Still, I leave it to you, nAo, ERP, DeanoC, DeanA, Katsura, archie, etc... to correct me, put the dots on the i's and cross the t's :D.
 
kimg said:
I got one comment:

Let's assume for a moment that the Inquirer's rumor that RSX is acting as the southbridge is correct and, if I understood it correctly, that the graphics RAM is sitting on the NORTHBRIDGE.

http://www.theinquirer.net/?article=32159

Except this could just prove exactly how weird those dudes at the Inquirer are, but if it is true, then I'm not sure RSX is comparable to a PC graphics card after all.

Total bollocks - not that this should surprise anyone.

The southbridge acts as a southbridge and the graphics chip does the graphics. The graphics ram is connected - shock horror - to the graphics chip.
 
MrWibble said:
Total bollocks - not that this should surprise anyone.

The southbridge acts as a southbridge and the graphics chip does the graphics. The graphics ram is connected - shock horror - to the graphics chip.
Why can't RSX act as a southbridge? Gamecube contains Gekko and Flipper. Flipper is not just the graphics chip, but also contains the sound DSP, memory controller, and all manner of I/O (northbridge and southbridge functionality).

Maybe I'm just an efficiency nut, but it makes perfect sense to me to use this design approach. It will save space on the PCB, require fewer traces between chips, and possibly reduce the number of layers the PCB needs.

Since PS3 has two separate pools of memory, the best combination would be Cell + northbridge and RSX + southbridge. From what I understand, Cell has an on-chip XDR memory controller, and since there is no need for PCI or AGP support, the northbridge is essentially already integrated. Seeing as nVidia has plenty of experience integrating northbridges and GPUs, I can't see them having a tough time also putting in some southbridge functionality.
 
OtakingGX said:
Why can't RSX act as a southbridge? Gamecube contains Gekko and Flipper. Flipper is not just the graphics chip, but also contains the sound DSP, memory controller, and all manner of I/O (northbridge and southbridge functionality).

Maybe I'm just an efficiency nut, but it makes perfect sense to me to use this design approach. It will save space on the PCB, require fewer traces between chips, and possibly reduce the number of layers the PCB needs.

Since PS3 has two separate pools of memory, the best combination would be Cell + northbridge and RSX + southbridge. From what I understand, Cell has an on-chip XDR memory controller, and since there is no need for PCI or AGP support, the northbridge is essentially already integrated. Seeing as nVidia has plenty of experience integrating northbridges and GPUs, I can't see them having a tough time also putting in some southbridge functionality.
It's not so much that it couldn't be the southbridge, but that it was never intended to be.
[Image: kaigai02l0bd.gif - PS3 block diagram showing a discrete southbridge]

There was always a southbridge in the design, so news that RSX was all of a sudden going to take on that duty would be something of a surprise and a last-minute change.
 
I'm assuming USB and the controller I/O are also handled by the southbridge. Even with that, 5 GB/s total bandwidth between Cell and the southbridge seems like total overkill. I think it's design decisions like these that make the PS3 so unnecessarily expensive.

Even in the event you could maximize data transfer to every possible device connected to the southbridge, you wouldn't come close to 5 GB/s. Gigabit Ethernet (1000 Mbps) + Blu-ray 2x (~72 Mbps) + 6 x USB 2.0 (480 Mbps each, 2880 Mbps) + SATA-II (2400 Mbps) + 802.11g (54 Mbps) comes to roughly 6400 Mbps. That's about 800 MB/s of bandwidth, and the Cell <-> southbridge link is still more than six times that.
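A quick tally of those theoretical peaks in Python; the device list and per-device rates are assumptions based on announced specs, not a confirmed PS3 I/O layout.

# Sum the theoretical peak bandwidth of everything hanging off the southbridge.
# Device list and per-device rates are assumptions based on announced specs.
MBPS_TO_BYTES = 1e6 / 8  # 1 Mbps expressed in bytes/s

peripherals_mbps = {
    "Gigabit Ethernet": 1000,
    "Blu-ray 2x":       72,
    "USB 2.0 x6":       6 * 480,
    "SATA-II":          2400,
    "802.11g":          54,
}

total_mbps = sum(peripherals_mbps.values())
total_bytes = total_mbps * MBPS_TO_BYTES
print(f"Total: {total_mbps} Mbps = {total_bytes / 1e6:.0f} MB/s")
print(f"A 5 GB/s link is ~{5e9 / total_bytes:.1f}x the peak peripheral load")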
 
Acert93 said:
The implementation of the X1800 and Xenos architectures makes comparing their fillrates misleading, at best. In real-world scenarios in games, Xenos should have more fillrate than an X1800 with a 256-bit bus to GDDR3 when 4xMSAA is applied.

Then why did ATI say the Toy Shop demo would be faster on Xenos at lower resolutions because of higher framebuffer bandwidth and more shader power, but slower when a higher resolution is used?

Surely the higher resolution is putting more strain on the fill rate?
 
Even with that, 5 GB/s total bandwidth between Cell and the southbridge seems like total overkill.
It is massively overkill, but I believe that's just the speed of 1 FlexIO lane. It makes sense considering the transfer rates of everything else between RSX and Cell. The missing 5 GB/s in one direction suggests that there are 8 lanes in the controller and 1 is being given up to the southbridge.
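The lane arithmetic, spelled out in Python: the 20/15 GB/s Cell<->RSX split is the commonly quoted figure, while the 5 GB/s-per-lane value and the resulting lane count are the speculation above.

# Sanity check of the FlexIO lane speculation above.
# 20 GB/s + 15 GB/s Cell<->RSX are the commonly quoted figures;
# 5 GB/s per lane and the resulting lane count are speculation.
PER_LANE = 5.0       # GB/s per lane (assumed)
CELL_TO_RSX = 20.0   # GB/s
RSX_TO_CELL = 15.0   # GB/s
SOUTHBRIDGE = 5.0    # GB/s, the "missing" lane in one direction

total = CELL_TO_RSX + RSX_TO_CELL + SOUTHBRIDGE
print(f"{total:.0f} GB/s total -> {total / PER_LANE:.0f} lanes, "
      f"{SOUTHBRIDGE / PER_LANE:.0f} of which would serve the southbridge")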

It's not so much that it couldn't be the southbridge, but that it was never intended to be.
There was always a southbridge in the design, so news that RSX was all of a sudden going to take on that duty would be something of a surprise and a last-minute change.
Who says? Hiroshige Goto? He's speculating from slides and creating diagrams from that speculation. That's basically his job. The point that I believe Wibble was making was that whether it's a discrete chip or not, it's not doing anything other than being a Southbridge any more than the GPU does something other than be a GPU.
 
pjbliverpool said:
Then why did ATI say the Toy Shop demo would be faster on Xenos at lower resolutions because of higher framebuffer bandwidth and more shader power, but slower when a higher resolution is used?

Surely the higher resolution is putting more strain on the fill rate?

Higher resolutions are going to put more strain on the fillrate of both GPUs. I am not sure how you arrive at the conclusion that fillrate is the bottleneck ATI had in mind in those comments, though. Resolution increases affect more than fillrate, and since the rep did not mention being fillrate-bound, and the architectures indicate this probably would not be the case, I am not inclined to think it is either. If I had to guess, the ATI comments about performance degrading at much higher resolutions would be related to the small framebuffer (eDRAM) size on the Xbox 360.

What resolutions he had in mind (higher than 16x12? 21x15? 10x7?) we don't know; ditto whether he had AA in mind as well, or the difference between FP16 blending and filtering performance on the X1800 versus FP10 on Xenos (and whether that is a bottleneck to memory or ROPs at all), etc. And we don't have any numbers, even from ATI, for the performance hit from 3 tiles to 4, 5, 6, etc. (they claim none at 2 and 1-5% typically at 3). Ultimately the X1800's memory bandwidth (and realized fillrate) will be contending with buffers, texture and geometry assets, and I believe its fillrate takes a hit with 4xMSAA (but I could be wrong). Worst case, Xenos will always have enough bandwidth for its 4 Gigapixel/s fillrate. Anyhow, the only absolutes ATI gave were bandwidth and shader performance; guessing what bottleneck and what resolutions he had in mind is kind of difficult.
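To make the tiling point concrete, here is a rough Python sketch, assuming 10 MB of eDRAM and 8 bytes per sample (32-bit colour + 32-bit Z) and ignoring the real tile-alignment restrictions.

# Estimate how many eDRAM tiles a render target would need on Xenos.
# Assumes 10 MB of eDRAM and 8 bytes per sample (32-bit colour + 32-bit Z);
# real predicated tiling has alignment rules this ignores.
import math

EDRAM_BYTES = 10 * 1024 * 1024
BYTES_PER_SAMPLE = 8  # colour + Z (assumed)

def tiles_needed(width, height, aa_samples):
    target_bytes = width * height * aa_samples * BYTES_PER_SAMPLE
    return math.ceil(target_bytes / EDRAM_BYTES)

for w, h in ((1280, 720), (1600, 1200), (1920, 1080)):
    for aa in (1, 2, 4):
        print(f"{w}x{h} {aa}xAA: {tiles_needed(w, h, aa)} tile(s)")

Under those assumptions 720p with 4xAA fits in 3 tiles, while pushing toward 16x12 or 1080p with AA quickly lands in the 5-7 tile range, which is where the tiling cost in question would grow.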
 
ShootMyMonkey said:
Who says? Hiroshige Goto? He's speculating from slides and creating diagrams from that speculation. That's basically his job. The point that I believe Wibble was making was that whether it's a discrete chip or not, it's not doing anything other than being a Southbridge any more than the GPU does something other than be a GPU.
No, no. I chose the picture because it's nice and clean, unlike the garish ones they showed at E3:
[Image: 1239of.jpg - one of the E3 presentation slides]


I had always been under the impression that the PS3 would have a discrete southbridge, which is why this Inquirer talk (which, in its defence, did come from the same article that had a few factual leaks in it) piqued my interest. I asked in the other thread we had on the Inquirer articles at the time, but nobody answered, so I was just taking MrWibble's response as the answer to the question I asked a few weeks ago.
 
Mintmaster said:
Fafalada, what exactly do you mean when you say 2x less "relative" BW? Relative to what?
4x less fillrate, 8x less bandwidth.

I do remember some talk about BW for NV2A, but the Celeron could only consume 1 GB/s peak anyway, and the remaining BW wasn't that much less than on a GF3/GF4.
GF3s sold at the time had nearly 8 GB/s of local memory bandwidth, but anyway, that's been something of a precedent over the last 2 generations and still holds in this new one (PC GPUs having more local bandwidth than consoles have for the entire system).

For PS3, one thing I'd like to see is FlexIO usage.
Well, you can hope Sony does some semi-public disclosure of such benchmarks like they did with the PS2. I do think it'll be quite game/context dependent what and where the bus gets used, if the last two PlayStations are any indication.

Panajev said:
I doubt that render state switching and texture cache flushing were as fast as on the GS or that render-to-texture operations had the same kind of impact
Of course not - running GS-optimized render lists will break down even on much faster GPUs than NV2A was - but the reverse was also true.
 
Mmmkay said:
I had always been under the impression that the PS3 would have a discrete southbridge, which is why this Inquirer talk (which, in its defence, did come from the same article that had a few factual leaks in it) piqued my interest. I asked in the other thread we had on the Inquirer articles at the time, but nobody answered, so I was just taking MrWibble's response as the answer to the question I asked a few weeks ago.

Disclaimer:

I was just repeating the information from these publicly shown presentations - I haven't looked inside my latest devkit or anything... I'd probably get shouted at for attacking one with a screwdriver.
 