What is the peak theoretical fill rate of RSX and Xenos at varying levels of AA?

MBDF

Newcomer
What is the peak theoretical fillrate (color + z, and texel) for a 7900 GT with 8 ROPs at no AA, 2xAA, and 4xAA? And secondly, how does it compare to Xenos, with its eDRAM, rendering similarly?

I'm mainly interested in peak theoretical figures... I realize that they will never be achieved in game situations... but it's something useful to know.

I have heard that Xenos, with its daughter die, is much more capable of maintaining its 4 gigapixel fillrate at all levels of AA... is this true?

I've also heard that the RSX has more available fillrate at 720p with no AA, but less with 4xAA... is that true?
 
Oh, and does the drop from 16 to 8 ROPs mean that there is less available fillrate when anti-aliasing with 8 ROPs as opposed to 16?
 
To answer the OP's question, the paper number for Xenos is 4Gpixels/sec regardless of AA level. On an 8-ROP G70 it'd be 4.4Gpixels/sec with no AA and, I think, with 2xAA, and 2.2Gpixels/sec with 4xAA. A search should yield this info.
 
With bandwidths taken into account RSX is probably something very similar to this:

7600 GT FSAA Fill-Rate (M Pixel/s)
AA......Color Fill......Z Fill........Colour + Z
0x.......4333.6........9579.7......2866.4
2x.......4290.8........5297.4......2982.2
4x.......2239.8........2655.0......1522.5

Taken from here: http://www.beyond3d.com/previews/nvidia/g73/index.php?p=05#fill

Nothing for me to really compare Xenos to.
 
Ostepop said:
Compare it to a X1800?

No. Architecturally they are too different to compare directly; i.e. the X1800's fillrate and how it relates to its architecture won't give you any insight into Xenos. The X1800 has 16 ROPs (Xenos has 8); Xenos has eDRAM for the frame buffer while the X1800 uses GDDR3; Xenos' fillrate is not cut when MSAA is applied, etc.

The comparison breaks down in real-world use as well. Xenos has no fillrate penalty up to 4xMSAA (i.e. it can do 16 gigasamples/s with 4xMSAA), so 4 Gigapixel/s is available with or without MSAA. I believe the X1800's fillrate is cut when 4xMSAA is used.

And on bandwidth, the eDRAM is designed with enough peak bandwidth that even in worst-case scenarios the full 4 Gigapixel/s fillrate is available. The X1800 uses GDDR3 and will run into stalls and bottlenecks in fillrate-intensive scenarios. The X1800 does not even have enough bandwidth to feed the theoretical peak fillrate of 16 ROPs @ 650MHz.

The implementations of the X1800 and Xenos architectures make comparing their fillrates misleading, at best. In real-world game scenarios Xenos should have more fillrate than an X1800 with a 256-bit bus to GDDR3 when 4xMSAA is applied.
 
Rockster said:
With bandwidths taken into account RSX is probably something very similar to this:

7600 GT FSAA Fill-Rate (M Pixel/s)
AA......Color Fill......Z Fill........Colour + Z
0x.......4333.6........9579.7......2866.4
2x.......4290.8........5297.4......2982.2
4x.......2239.8........2655.0......1522.5

Taken from here: http://www.beyond3d.com/previews/nvidia/g73/index.php?p=05#fill

Nothing for me to really compare Xenos to.

Yeah, but the RSX has 24 pixel shaders, wouldn't it be double? Which of the figures above do the 8 ROPs actually affect?
 
Titanio said:
To answer the OP's question, the paper numbers for Xenos is 4Gpixels/sec regardless of AA level. On a 8-ROP G70 it'd be 4.4Gpixels/sec with no AA and I think with 2xAA, and 2.2Gpixels/sec with 4xAA. A search should yield this info.

So, with 16 ROPs it would still be at 4.4 with 4xAA applied? Wouldn't that be a good thing?
 
MBDF said:
So, with 16 ROPs it would still be at 4.4 with 4xAA applied? Wouldn't that be a good thing?

Not if you don't have the bandwidth to attain the peak.

An analogy: You have a sink that can drain 1 cup of water per second. Regardless of how much water you can get to the drain, it can never dispose of more than 1 cup of water per second. So whether you have a fire hydrant (10 gallons/second) or a garden hose (1 cup/second) piping water into the drain, you will never be able to flush more than 1 cup/second. So in this scenario a garden hose is just as good as a fire hydrant.

While not a direct parallel, I think that demonstrates the issue. This is true of many bottlenecks. Not all are created equal, of course (e.g. a bottleneck in one area may not make another piece of hardware useless overall, only useless in certain situations, in which case developers need to find other beneficial uses for it in their projects).
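The drain analogy maps directly onto fillrate arithmetic: effective throughput is the minimum of what the ROPs can write and what the memory bus can carry. A minimal Python sketch (function name is mine; the 4.4 Gpix/s and 22.4 GB/s figures are the RSX numbers from this thread):

```python
# Effective throughput is capped by the slowest stage: the "drain" (ROP
# write rate) or the "hose" (memory bandwidth divided by bytes per pixel).
def effective_fillrate_gpix(rop_peak_gpix, bandwidth_gb_s, bytes_per_pixel):
    bandwidth_cap = bandwidth_gb_s / bytes_per_pixel  # pixels/sec the bus can feed
    return min(rop_peak_gpix, bandwidth_cap)

print(effective_fillrate_gpix(4.4, 22.4, 8))  # color+z (8 bytes): 2.8 -> bandwidth-bound
print(effective_fillrate_gpix(4.4, 22.4, 4))  # color only (4 bytes): 4.4 -> ROP-bound
```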
 
MBDF said:
Yeah, but the RSX has 24 pixel shaders, wouldn't it be double? What exactly above does the 8 ROP's affect?


1.) we don't know that RSX has 24 pixel shaders. the number of pixel shaders active within RSX has not been disclosed by SCEI or Nvidia. they've been hush-hush about it since GDC, when they did not give out the number of vertex shaders or pixel shaders in RSX. many people assume RSX will have 8 vertex shaders + 24 pixel shaders just like the G70 and G71 PC GPUs, but it's possible that for RSX some of these units have either been cut out altogether, or are inactive, to increase yields (produce more chips) and perhaps to reduce power consumption. IIRC, Ken Kutaragi said the RSX GPU would have built-in redundancy like the CELL processor. the PS3 CELL processor has 8 SPEs, but only 7 will ever be active. so we might assume that RSX will NOT have all 8 vertex shaders and all 24 pixel shaders active, even *if* they're all physically there on the chip.

people in the know (here, and elsewhere) about exactly how many vertex shaders and pixel shaders RSX has are keeping quiet. we only recently got confirmation that RSX has 8 ROPs.


2.) pixel shaders do *not* determine the actual pixel fillrate of a GPU; the pixel pipelines (combined with core clockspeed) do. but what a pixel pipeline is has become more "blurred": the back-end units have been detached / decoupled from the shaders in modern GPUs (from what i understand), and are now usually referred to as ROPs (render output units, or raster operators). it is the number of ROPs times the core clockspeed that determines a modern GPU's peak pixel fillrate.

RSX has 8 ROPs. so at 550 MHz, that gives it 4,400 million pixels/sec fillrate, or 4.4 billion.
(8 ROPs x 550 MHz)

Xenos (Xbox 360's GPU) also has 8 ROPs. it's clocked at 500 MHz, that gives it 4,000 million pixels/sec or 4 billion.
(8 ROPs x 500 MHz)

the cool thing about Xenos is, when 4x anti-aliasing is used, the fillrate does not drop.

but on RSX, the fillrate will drop when 4x anti-aliasing is used.

thus, when 4x anti-aliasing is used, Xbox 360 is going to have a higher pixel fillrate than PlayStation3.
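the arithmetic above is simple enough to sketch in Python (function names are mine; the 4xAA halving for RSX assumes the 2-AA-samples-per-clock ROP behaviour discussed elsewhere in this thread):

```python
# peak pixel fillrate = number of ROPs x core clock (MHz), in megapixels/sec
def peak_fillrate_mpix(rops, clock_mhz):
    return rops * clock_mhz

rsx   = peak_fillrate_mpix(8, 550)  # 4400 Mpix/s, i.e. 4.4 billion
xenos = peak_fillrate_mpix(8, 500)  # 4000 Mpix/s, i.e. 4.0 billion

# With 4xAA, an RSX ROP writes 2 AA samples/clock, so 4 samples take 2 clocks:
rsx_4xaa = rsx // 2                 # 2200 Mpix/s
# Xenos resolves MSAA inside the eDRAM daughter die, so it keeps 4000 Mpix/s.
```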


also, there are rumors that RSX will not be clocked at 550 MHz. rumor has it that RSX will be clocked lower; others, however, speculate it'll be clocked higher. for now, I will assume the announced clockspeed of 550 MHz.

sorry for the long-winded reply.
 
Megadrive1988 said:
1.) we don't know that RSX has 24 pixel shaders. the amount of pixel shaders that are active within RSX have not been disclosed by SCEI or Nvidia.

There was a public slide recently from a Sony conference that did disclose 24 Pixel Shaders. Anyhow, RSX has 24 2D texture lookups per clock and performs 384 flops/clock. It also can perform 48 MAD/MUL/DP3 per clock on float3's and 48 scalar per clock (or 24 and 24 special functions). Sounds like the 24 Pixel Shaders in G70 to me ;)

RSX has 8 ROPs. so at 550 MHz, that gives it 4,400 million pixels/sec fillrate, or 4.4 billion.

Memory bandwidth will prevent RSX from reaching 4.4Gigapix/s.

That is one problem with spec sheets in general: they don't tell you about the architecture, or about the bottlenecks and limitations the architecture places within the workflow. Not that you are doing this, MD. It's like the old days of being able to push X number of polys/s; what happens when you want to add shading, lighting, or a texture to that polygon?
 
Acert93 said:
There was a public slide recently from a Sony conference that did disclose 24 Pixel Shaders. Anyhow, RSX has 24 2D texture lookups per clock and performs 384 flops/clock. It also can perform 48 MAD/MUL/DP3 per clock on float3's and 48 scalar per clock (or 24 and 24 special functions). Sounds like the 24 Pixel Shaders in G70 to me ;)

well, we'll see. it would be highly disappointing if RSX had anything less than 24 pixel shaders. what about the 8 vertex shaders?


Memory bandwidth will prevent RSX from reaching 4.4Gigapix/s.

That is one problem with spec sheets in general: they don't tell you about the architecture, or about the bottlenecks and limitations the architecture places within the workflow. Not that you are doing this, MD. It's like the old days of being able to push X number of polys/s; what happens when you want to add shading, lighting, or a texture to that polygon?

so, 22.4 GB/sec of bandwidth is not enough to support 4.4 billion pixels/sec? then what happens when 4xAA is turned on and the fillrate falls well below that spec? or won't it matter, since 4xAA costs bandwidth itself on RSX...
 
Megadrive1988 said:
also, there are rumors that RSX will not be clocked at 550 MHz. rumor has it that RSX will be clocked lower. however others are speculating it'll be clocked higher. for now, I will assume the announced clockspeed of 550 MHz.

sorry for the long-winded reply.

only rumors...........

Sony dismisses rumours of PlayStation 3 downgrade!

Speaking to our sister site, Eurogamer, a Sony spokesperson has dismissed rumours that the PlayStation 3's hardware specs are to be downgraded as "ridiculous."

A report on website Games Radar claimed that Sony was having trouble fitting all the PS3's components within the console case without risk of overheating. The article also suggested that the Cell processor could run at a lower speed than originally stated.

But Sony spokesperson Jonathan Fargher told Eurogamer: "The PS3 downgrade story is categorically not true.

"Developers have been working with PS3 dev kits for anywhere between eight and 12 months, and to suggest that we'd now take the decision to downgrade the hardware at such a late stage, is, well, ridiculous...

http://www.gamesindustry.biz/content_page.php?aid=17747
 
Rockster said:
With bandwidths taken into account RSX is probably something very similar to this:

7600 GT FSAA Fill-Rate (M Pixel/s)
AA......Color Fill......Z Fill........Colour + Z
0x.......4333.6........9579.7......2866.4
2x.......4290.8........5297.4......2982.2
4x.......2239.8........2655.0......1522.5

Taken from here: http://www.beyond3d.com/previews/nvidia/g73/index.php?p=05#fill

Nothing for me to really compare Xenos to.

sorry...maybe I'm missing something...

Tell me about the graphics chip...
NVIDIA's chip is codenamed RSX. The chip runs at 550MHz and is capable of rendering two 1080p signals simultaneously. It's touted to hit 1.8 TFLOPS of floating point performance and can perform 100 billion shader operations per second, or 136 shader operations per cycle. The RSX uses 128-bit precision for enhanced color definition, making the system capable of High Dynamic Range rendering. Programming-wise, it's based on OpenGL and NVIDIA's CG language.

NVIDIA recently released its GeForce 7900 GTX GPU for the PC, which provides a reasonable real-world approximation of what sort of effects the RSX and PlayStation 3 can handle. The RSX is said to be a step beyond the GeForce 7900 GTX however, making it faster than anything currently available for the PC.

http://ps3.ign.com/articles/636/636848p1.html

maybe you wrote 6 instead of 9....just a mistake....

however the full specs of RSX are not available...so...I think I can just use all these numbers to win something...
 
Bliss said:
sorry...maybe I'm missing something...



http://ps3.ign.com/articles/636/636848p1.html

maybe you wrote 6 instead of 9....just a mistake....

however the full specs of RSX are not available...so...I think I can just use all these numbers to win something...

No, it's an apt comparison (as I see now, thanks all) because the 7600 GT has 8 ROPs as well... and the comparison only concerns fillrate.

But, one more question... so is it true that 22.4 GB/s is not enough, or just enough for 4.4 gigapixels but no more? Is there some rough way of calculating this?

Thanks again.
 
The reason the 7600GT is valid for a fillrate comparison with RSX is that it's an NVIDIA product (meaning the same memory controller, ROP, and compression technology) with a very similar core clock (560 vs 550 MHz) and an equivalent memory configuration (128-bit 700MHz GDDR3 for both).

MBDF said:
But, one more question... so is it true that 22.4 GB/s is not enough, or just enough for 4.4 gigapixels but no more? Is there some rough way of calculating this?

There is. 22.4/8 (bytes per pixel for color+z) yields 2.8GPixels/sec, close to the real-world result for the 7600GT. But that doesn't take into account compression rates, which will vary depending on the scene, or variations in memory efficiency. There's no exact way to calculate it as a result; it's better to measure.
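That back-of-the-envelope division, as a sketch (helper name is mine; as noted above, this ignores compression and cache effects, so it's a rough ceiling, not an exact figure):

```python
# Bandwidth-limited fillrate ceiling: bytes/sec available on the bus divided
# by bytes written per pixel. Ignores compression and caches entirely.
def bw_limited_fillrate_gpix(bandwidth_gb_s, bytes_per_pixel):
    return bandwidth_gb_s / bytes_per_pixel

# 32-bit color + 32-bit z/stencil = 8 bytes written per pixel:
print(bw_limited_fillrate_gpix(22.4, 8))  # 2.8 GPix/s, matching the 7600GT bench
# color-only writes cost 4 bytes, so the ceiling rises above the 4.4 ROP peak:
print(bw_limited_fillrate_gpix(22.4, 4))  # 5.6 GPix/s
```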
 
Isn't the 4.4Gpixels/sec paper figure just for raw fillrate? Because the benched color fillrate per the numbers above comes very close at 0xAA and 2xAA (i.e. ~4.3Gpixels/sec).

Rockster said:
There is. 22.4/8 (bytes per pixel for color+z) yields 2.8GPixels/sec, close to real world result for the 7600GT. But, that doesn't take into account compression rates which will vary depending on the scene and variations in memory efficiency.

Nor cache, to be very precise about it.
 
Titanio said:
Isn't the 4.4Gpixels/sec paper figure just for raw fillrate? Because the benched color fillrate as per those number above comes very close at 0xAA and 2xAA (i.e. ~4.3Gpixels/sec).

Not sure what you are asking here. Each of the 7600GT's (or RSX's) 8 ROPs is capable of writing a 32-bit color and a 32-bit z+stencil value in the same clock cycle. However, the available bandwidth won't allow it to, so color+z gets ~2.8Gpixels/sec. By writing only a color value and no z (not common in games, but interesting for benchmarking) you reduce the bandwidth required per pixel from 8 bytes to 4 and can essentially reach the theoretical rate.

The same benchmark also shows that the ROP is capable of writing two z+stencil values in the same clock cycle, at a rate faster than its theoretical peak. This is likely due to the effects of Hier-Z and the fact that Z values get better compression rates. We also know it can write 2 AA samples per clock, so writing 4 AA samples takes 2 clock cycles, which isn't a problem because it's bandwidth-limited anyway. They can't, however, double Z with AA.

These figures really don't tell the whole story, because each application is going to access the framebuffer differently, and blends, access patterns, etc. are going to sap available bandwidth even more.
 
Titanio said:
To answer the OP's question, the paper numbers for Xenos is 4Gpixels/sec regardless of AA level. On a 8-ROP G70 it'd be 4.4Gpixels/sec with no AA and I think with 2xAA, and 2.2Gpixels/sec with 4xAA. A search should yield this info.

Wait, are those G70 numbers right? I thought the fillrate of G70 was like 12 gigapixels, and I didn't know it took a hit from AA (unless you're talking about actual achieved fillrate with bandwidth, but that's not an on-paper spec). I thought the whole purpose of using MSAA over SSAA is that there is no fillrate hit from using it. Even the Xbox used that fact to advertise its fillrate at 4 gigapixels.

RSX sounds like it's going to get destroyed by Xenos if it doesn't have a massive fillrate advantage, since Xenos is superior in feature-set.
 
Fox5 said:
Wait, are those G70 numbers right? I thought the fillrate of G70 was like 12 gigapixels, and I didn't know it took a hit from AA (unless you're talking about actual achieved fillrate with bandwidth, but that's not an on-paper spec). I thought the whole purpose of using MSAA over SSAA is that there is no fillrate hit from using it. Even the Xbox used that fact to advertise its fillrate at 4 gigapixels.

RSX sounds like it's going to get destroyed by Xenos if it doesn't have a massive fillrate advantage, since Xenos is superior in feature-set.



G70 - 430 MHz: ~6.8 gigapixels
G70 - 550 MHz: ~8.8 gigapixels
G71 - 650 MHz: ~10.4 gigapixels

peak, before bandwidth considerations, and before any 4xAA is applied

RSX has much less to work with.
 