G70 Benchmarks @500/350

DaveBaumann said:
Obviously this isn't a scientific test, but may give some thoughts.
The only thought it has given to me is that this test is meaningless regarding PS3 since it doesn't resemble how RSX will work in that system.
Except that I was expecting a bigger hit on AA performance; too bad we don't even know if RSX will be able to AA FP16 render targets (at this time I don't think it will..)
 
There is a lot of talk about how much BW will be available to the RSX for it to do AA, HDR and so on. In many of these discussions people try to assess how much the RSX<->FlexIO<->XDR path will "enhance" the BW for the RSX, with quotes ranging from 15GB/s to 25GB/s. To me this is just ridiculous.

To me, the "nextgen" SW that will live on the PS3 (XB360, Rev...) will be pushing an immense number of triangles. I'm assuming that we will start to see in excess of 1 mega tris a frame later on. Since the design of (seemingly, supposing the RSX is ~G70) the PS3 relies on dynamic processing with the SPE's as in generating geomtry and more, the CELL->RSX BW will be used a lot by _just_ pushing the geom to RSX..

Just to make a simple example of the "needed" BW to draw a scene with 1 million tris.

For every single vertex we want
1) Pos (X,Y,Z) -> 3 floats -> 12 bytes
2) Norm (X,Y,Z) -> 3 floats -> 12 bytes
3) Base tex coords (U,V) -> 2 floats -> 8 bytes
4) Diffuse light tex coords (U,V) -> 2 floats -> 8 bytes
5) Specular light tex coords (U,V) -> 2 floats -> 8 bytes
6) Gloss map? -> 2 floats -> 8 bytes

->12+12+8+8+8+8=56 bytes

We need 1 vert/tri, assuming we have a perfect convex hull mesh. This gives us 1 M tris = 1 M verts + 3 M indices. Each index would probably be 32 bits, i.e. 4 bytes, so each tri would be 3*4 + 56 = 12 + 56 = 68 bytes. 1 M tris * 68 bytes = 68 MB of vertex buffer needed. To achieve this at 60 Hz -> 68 MB * 60 = 4080 MB = ~4 GB/s of BW.
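Just to spell that arithmetic out, here's a minimal C++ sketch; the attribute layout, the 1 vert/tri and 32-bit index assumptions, and the 60 Hz target are all just taken from the estimate above, nothing measured:

```cpp
#include <cstdio>

int main()
{
    // Assumed per-vertex layout from the estimate above (all 32-bit floats).
    const int bytesPerVert = 12   // position (X,Y,Z)
                           + 12   // normal (X,Y,Z)
                           + 8    // base tex coords (U,V)
                           + 8    // diffuse light tex coords (U,V)
                           + 8    // specular light tex coords (U,V)
                           + 8;   // gloss map coords (U,V)   -> 56 bytes

    // Idealised mesh: 1M tris ~= 1M unique verts + 3M 32-bit indices.
    const double tris       = 1.0e6;
    const double vertexData = tris * bytesPerVert;    // 56 MB
    const double indexData  = tris * 3 * 4;           // 12 MB
    const double perFrame   = vertexData + indexData; // 68 MB per frame

    const double hz = 60.0;
    std::printf("bytes/vertex : %d\n", bytesPerVert);
    std::printf("per frame    : %.0f MB\n", perFrame / 1.0e6);
    std::printf("at %.0f Hz   : ~%.1f GB/s\n", hz, perFrame * hz / 1.0e9);
    return 0;
}
```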

In more realistic scenarios we probably need more than this, as on average you need more than one vert for every new tri. We'd probably need more tex coords to do interesting shaders, and the list just continues. Say we want to do a Z-pass first: that also has to be accounted for. In essence, I think most of the CELL<->RSX BW will be used for geometry/texture stuff.

Also, this was "just" 1M tri scene.. That is not much. Say that we want to play a nextgen FPS with 32 players. We want higly detailed characters. Give them ~25K tris -> 25K*32=800K tri's for _just_ the characters. One could argue that that's not realistic since ther would be LOD, not all characters visible at the same time. The thing though is that they _could_, i.e. would be worst case.

And I do believe that scenes > 1M tris are really realistic on the next gen consoles, as in the CELL case one has immense power to do bezier patches, skinning and so on.

Now, this also leads to another big question regarding the BW estimations that are used. Many try to measure the needed backbuffer BW against today's existing games. This, to me, seems a bit shortsighted. I don't believe the BW saving features that exist in modern GPUs will be effective for the "nextgen" games.

Let's do some more math: a 1 M tri scene, 5x overdraw, 1920*1080, and an even distribution of the tris. That would mean the average coverage for a tri would be:
1 M / 5 = 200 K tris
1920 * 1080 = 2073.6 K pixels
Avg. coverage = 2073.6 K / 200 K = ~10 pixels
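Or, as a one-off sketch of the same numbers (assuming the even triangle distribution and 5x overdraw from above):

```cpp
#include <cstdio>

int main()
{
    const double tris     = 1.0e6;           // triangles in the scene
    const double overdraw = 5.0;             // assumed average overdraw
    const double pixels   = 1920.0 * 1080.0; // 2073.6 K pixels

    // With 5x overdraw, roughly 1M / 5 = 200K tris end up in each depth layer,
    // so the average screen coverage per triangle is:
    const double avgCoverage = pixels / (tris / overdraw);

    std::printf("avg coverage: ~%.1f pixels per tri\n", avgCoverage);
    return 0;
}
```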

10-pixel-sized tris mean that practically every single 4xAA-sampled pixel will be touched by an edge, and the MSAA compression in today's GPUs will not work, at least not to my understanding. This is not really the case with current games as they push _far_ less geometry.

It would be interesting to make such a benchmark to really see the effect of the current BW saving features for MSAA with high geometry detail.
 
ChryZ said:
Was it confirmed yet that RSX is a carbon copy of G70? Did I miss something?
How's that saying go again... ah yes, guilty until proven innocent. :oops:

Anyway, I'm with nAo here. Even if these tests were run with an actual RSX chip on the board they'd still be pretty meaningless in relation to PS3 software.
 
Robbz said:
For every single vertex we want
You're completely underestimating the precision requirements. You should use 64bit FP as the very minimum, but to be on the safe side I'd go with extended doubles for all the attributes.
So realistically you're looking at 112-140bytes for your vertices!
 
Fafalada said:
Robbz said:
For every single vertex we want
You're completely underestimating the precision requirements. You should use 64bit as the very minimum, but to be on the safe side I'd go with extended doubles for all the attributes.
So realistically you're looking at 112-140bytes for your vertices!

I was trying to do a "bare" minimum estimation, sort of the least one can get "away" with. But as you say, if we want "hidef" this will radically go up, and this just makes the BW situation worse!
 
Robz..I don't know how to explain it to you but Faf was being sarcastic, have you ever heard about vertex compression?
Your estimation is way overinflated!
 
nAo said:
DaveBaumann said:
Obviously this isn't a scientific test, but may give some thoughts.
The only thought it has given to me is that this test is meaningless regarding PS3 since it doesn't resemble how RSX will work in that system.
Except that I was expecting a bigger hit on AA performance; too bad we don't even know if RSX will be able to AA FP16 render targets (at this time I don't think it will..)

That is not totally true. While I agree the actual NUMBERS are irrelevant, the IMPACT is interesting to at least consider. I think we would all expect, including Dave, that these games would be more optimized and have faster frame rates if designed for the PS3. But that does not mean analysing the potential impact is meaningless. Of course it is meaningful; it demonstrates that this CHIP, in this memory configuration, will require special care NOT to be memory bandwidth limited. Dave did say it was NOT SCIENTIFIC, meaning "the numbers are not important", but I think he was drawing our attention to the theme. While being in a closed box with dedicated software alleviates some problems, it is not a magic cure for AA, AF, HDR, etc.

As for whether or not the RSX will work this way in the PS3, it is fair to say SOME games WILL work this way. Obviously developers pushing the graphical end and willing to sacrifice some CPU performance can leech off the XDR pool so they can get fancy back buffer effects.

Which begs the question: Should not developers play to the strengths of the PS3? Would it not be best, in general, to NOT have AA or HDR, and instead use CELL to power a MASSIVELY complex game? Stealing bandwidth away from CELL (which is very memory sensitive) for backbuffer effects does not appear to play to the PS3's strengths. Why not instead build games around the PS3's STRENGTHS, which of course are massive CPU performance that can power complex geometry, massively interactive worlds, accurate physics, excellent animation, and so forth?

While I agree the specific numbers are irrelevant, the theory is at least sound for situations where RSX will be limited to the GDDR3 memory pool. And no one can say that no game will run in this configuration because I believe it was already stated some will.

Also, as Dave made no conclusions, it is unfair to be defensive.

And you are correct that this does not necessarily represent the total memory available to the RSX in the PS3. But this does indicate that for HDR and 4x MSAA RSX will most likely need that extra memory bandwidth in modern games.

So this test is not saying, "Oh no, PS3 is doomed!" Nothing of the sort. It does point out, in a very loose and unscientific way, that PS3 games will be bandwidth limited before they are shader limited at HD resolutions when HDR and AA are used. Meaning that to get the BEST performance the RSX will need (in most situations) to use some of the bandwidth from the XDR memory pool to get a better balance.

And I do not believe anyone realistically expected anything different. There is a reason that GPUs with 16-22GB/s of memory bandwidth (like the 6600GT and 9800Pro) have a hard time with those features at higher resolutions. Obviously not apples-to-apples, but they are an indication that even less powerful shader cores ARE bandwidth limited with those effects enabled.

And that is all I got out of Dave's post. That AA and HDR will most likely cause PS3 to be bandwidth limited at HD resolutions. So either 1.) developers will need to use memory bandwidth from the XDR (which they can!!) or 2.) they can skip the HDR and AA and instead focus on the strengths of their platform.

If I were a developer I would not be trying to match the 360's strengths. Instead I would be going, "Ok, 218 GFLOPs of CPU performance. A GPU that is MASSIVELY powerful at shader ops. Let's go with a LARGE world with a ton of geometry, everything interactive, dynamic, and breakable. Let's make some killer AI and breathtaking animation and physics. And then let's fill the screen with a huge monster 50 stories tall and thousands of little creatures. All on the screen at the same time, in a forest, for miles on end with no fog." Or advanced simulations (like for car racing) or whatever. There are things that the PS3 should be better at by a large margin.

I just do not understand the fascination with competing against the 360's strengths (strengths that are relatively free in its setting) when it means compromising the PS3's strengths. They are different design philosophies and we should enjoy that fact. Not to say some games won't have a lot of AA or HDR, because the PS3 can very well do that. The question is: is that the best approach for most games? What if it comes at the expense of CELL performance?

Because no one knows yet what will happen to CELL if you leave it with 5GB/s of bandwidth (my guess: not a good situation).

Ps- Why is the focus ALWAYS on RSX using XDR memory bandwidth? CELL can access the GDDR3 pool as well. So why is the assumption always the reverse situation? I think it says a lot about how graphic-centric we are :LOL:
 
nAo said:
Robz..I don't know how to explain it to you but Faf was being sarcastic, have you ever heard about vertex compression?
Your estimation is way overinflated!

I know he was.. doubles, especially extended doubles, are something you just don't use... no GPU that I know of can take doubles as input... I was just taking it for a spin.. :)

But, more seriously, how would you achieve vertex compression when doing tessellation with the SPEs? AFAIK, the SPEs' strong point is not bitwise compression..

Also, if my estimates are so overinflated, what would a "real world" estimate be in your point of view?

Btw, exactly what type of vertex compression are you talking about? And what ratios?
 
Assuming G70 is to RSX what GeForce3 is to NV2a in XB, have we got comparative numbers for PC implementations on a PC game with that GPU versus the XB game with that GPU?

This'll give a foggy guide on how similar hardware in a closed-box environment performs, and give an indication of how G70 would perform in PS3 (all things being equal) if RSX is but an upclocked G70.
 
Shifty Geezer said:
Assuming G70 is to RSX what GeForce3 is to NV2a in XB, have we got comparative numbers for PC implementations on a PC game with that GPU versus the XB game with that GPU?

This'll give a foggy guide on how similar hardware in a closed-box environment performs, and give an indication of how G70 would perform in PS3 (all things being equal) if RSX is but an upclocked G70.

Yeah, the PS3 is gonna kick the G70's @$$ !!!!!!!!!!111one!!111

And that is no joke or exaggeration.

Stupid PC games are still being designed to be compatible with DX7 and DX8 hardware. With SM 3.0 style HW as the bare minimum and a ton of RAM... oh yeah, bring it on!
 
Acert93: You're assuming a lot of stuff I never said/addressed in my reply. I just wrote that that test has no meaning at all regarding PS3. What should it demonstrate? That RSX will be bw limited?
One can't just hide behind the sentence "this is not a scientific test"; of course it isn't, it's a malformed test!
Try to find my first comments about RSX from the day Sony/NVidia talked about it: when I didn't know it had access to XDR ram and CELL's cache and local stores, I wrote that it sucked, and I didn't need to run tests to prove it. It's called common sense ;)
That test, IMHO, has no meaning because it doesn't represent in any way the environment where RSX will 'play'; one can't just cut away 35 GB/s of bw and pretend all that bandwidth is not there.
Sidenote: one could use ALL the FlexIO bw without stealing a single cycle from the XDR ram (thus not starving CELL).
 
Acert93 said:
Which begs the question: Should not developers play to the strengths of the PS3? Would it not be best, in general, to NOT have AA or HDR, and instead use CELL to power a MASSIVELY complex game?

That might be a smart approach, but still, as AA is such an essential feature it just feels like a bad deal on the whole. As an end user I would really, really desperately want to have AA. A tradeoff that loses AA and gains massive improvements somewhere else still ends up a bit disappointing. A bit like a game that is otherwise super-duper mega-impressive, but as a tradeoff has no sound at all...
 
nAo said:
Sidenote: one could use ALL the FlexIO bw without stealing a single cycle from the XDR ram (thus not starving CELL).

At a 3200MHz effective frequency, XDR is indeed quick.

XDR = 25GB/s
GDDR3 = 22GB/s
Total = ~47GB/s

RSX = [22GB/s (GDDR3)] + [20GB/s + 15GB/s (XDR)] = ~57GB/s

Basically the RSX could theoretically saturate the RAM's bandwidth limits before hitting the FlexIO limitations allotted for the RSX.
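Put as a quick sketch (using the nominal figures thrown around in this thread, not measured numbers):

```cpp
#include <algorithm>
#include <cstdio>

int main()
{
    // Nominal peak figures as quoted in this thread (GB/s).
    const double xdr      = 25.0; // CELL <-> XDR
    const double gddr3    = 22.0; // RSX  <-> GDDR3
    const double flexioWr = 20.0; // CELL -> RSX over FlexIO
    const double flexioRd = 15.0; // RSX  -> CELL over FlexIO

    const double ramTotal = xdr + gddr3;                 // ~47 GB/s the RAM pools can supply
    const double rsxLinks = gddr3 + flexioWr + flexioRd; // ~57 GB/s of links into RSX

    // The links into RSX add up to more than the RAM pools can deliver,
    // so the memory itself, not FlexIO, is the theoretical ceiling.
    std::printf("RAM total     : %.0f GB/s\n", ramTotal);
    std::printf("RSX link peak : %.0f GB/s\n", rsxLinks);
    std::printf("effective cap : %.0f GB/s\n", std::min(ramTotal, rsxLinks));
    return 0;
}
```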

Btw, how are you defining "starve"? Usually on the PC side, a CPU or GPU is considered starved at the point where a bottleneck elsewhere in the system causes a performance hit.

Do you have information on how much bandwidth CELL will require when running at full capacity (i.e. 7 SPEs & PPE at 100% utilization)? If so please share :D
 
Robbz said:
But, more seriously, how would you achieve vertex compression when doing tessellation with the SPEs? AFAIK, the SPEs' strong point is not bitwise compression..
You don't want to compress your vertices with Huffman or Lempel-Ziv like algorithms. On current platforms vertex compression is achieved through range and precision reduction. For example, vertex positions can be encoded with 3 signed shorts, or 3 signed bytes in some corner cases.
Also, if my estimates are so overinflated, what would a "real world" estimate be in your point of view?
Don't think about floats anymore: once your data range is bounded (and you can bound it 99% of the time) you can achieve very good quality by simply compressing it to a given range (-1..1, or 0..1) and converting it to a fixed point representation using far fewer bits.
On the title I'm working on I use shorts to represent position coordinates and texture mapping coordinates, and bytes to represent normals.
Regarding your estimation, it would be really simple to cut that figure and get something under 30 bytes per vertex.
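To make that concrete, here's a rough sketch of that kind of range/precision reduction; the specific ranges, attribute set and 16/8-bit formats are illustrative assumptions, not the layout from any actual title:

```cpp
#include <cmath>
#include <cstdint>

// Quantise a value known to lie in [-range, range] to a signed 16-bit integer.
// The decompressor reconstructs it as s * range / 32767.0f.
int16_t packSnorm16(float v, float range)
{
    float n = v / range;                                 // normalise to [-1, 1]
    n = n > 1.0f ? 1.0f : (n < -1.0f ? -1.0f : n);       // clamp
    return static_cast<int16_t>(std::lround(n * 32767.0f));
}

// Quantise a unit-normal component to a signed 8-bit integer.
int8_t packSnorm8(float v)
{
    v = v > 1.0f ? 1.0f : (v < -1.0f ? -1.0f : v);
    return static_cast<int8_t>(std::lround(v * 127.0f));
}

// An illustrative packed layout along these lines:
// shorts for positions and texture coordinates, bytes for the normal.
struct PackedVertex
{
    int16_t pos[3];   // 6 bytes, position within a known object-space bound
    int16_t uv[2];    // 4 bytes, base texture coordinates
    int16_t lmUv[2];  // 4 bytes, lightmap coordinates
    int8_t  norm[3];  // 3 bytes, unit normal
    int8_t  pad;      // 1 byte of padding
};                    // 18 bytes vs. 56 for the all-float layout

static_assert(sizeof(PackedVertex) < 30, "comfortably under 30 bytes per vertex");
```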
 
jimpo said:
Acert93 said:
Which begs the question: Should not developers play to the strengths of the PS3? Would it not be best, in general, to NOT have AA or HDR, and instead use CELL to power a MASSIVELY complex game?

That might be a smart approach, but still, as AA is such an essential feature it just feels like a bad deal on the whole. As an end user I would really, really desperately want to have AA. A tradeoff that loses AA and gains massive improvements somewhere else still ends up a bit disappointing. A bit like a game that is otherwise super-duper mega-impressive, but as a tradeoff has no sound at all...

If Sony had not emphasized 1080p, I think the AA issue would be a moot point.

720p/1080i have roughly half the pixels of 1080p, so AA is a much smaller hit, and HDR is very playable on G70 at this resolution as well. One could even make the argument that if G70 supported HDR+MSAA, some developers could design games around these features at 720p and do quite well.

Which in fact we MAY see. Just because PS3 supports 1080p does not mean everyone has to use it. Tradeoffs. A lot of AA and other effects at 720p and just upscaling for 1080i/p sounds like a good tradeoff from a developer and consumer standpoint.

In the real world I think devs will shock us with what they can do, and I also think many will be pragmatic about it. Just because a feature is there does not mean it must be used. With so few 1080p sets out on the market I would think a lot of early games would go for features/frame rate over catering to a VERY SMALL 1080p market.

That could change in 2008/2009. But by then developers will have a much better handle on the platform.
 
Acert93 said:
RSX = [22GB/s (GDDR3)] + [20GB/s + 15GB/s (XDR)] = ~57GB/s
Dunno if you got my point, I've just said you don't need to 'steal' XDR bw in order to make some use of the FlexIO bw
Basically the RSX could theoretically saturate the RAM's bandwidth limits before hitting the FlexIO limitations allotted for the RSX.
Yes it could.

Do you have information on how much bandwidth CELL will require when running at full capacity (i.e. 7 SPEs & PPE at 100% utilization)? If so please share :D
You don't know how much bw you need if you don't know what it will run
 
nAo said:
Robbz said:
But, more seriously, how would you achieve vertex compression when doing tessellation with the SPEs? AFAIK, the SPEs' strong point is not bitwise compression..
You don't want to compress your vertices with Huffman or Lempel-Ziv like algorithms. On current platforms vertex compression is achieved through range and precision reduction. For example, vertex positions can be encoded with 3 signed shorts, or 3 signed bytes in some corner cases.
Also, if my estimates are so overinflated, what would a "real world" estimate be in your point of view?
Don't think about floats anymore: once your data range is bounded (and you can bound it 99% of the time) you can achieve very good quality by simply compressing it to a given range (-1..1, or 0..1) and converting it to a fixed point representation using far fewer bits.
On the title I'm working on I use shorts to represent position coordinates and texture mapping coordinates, and bytes to represent normals.
Regarding your estimation, it would be really simple to cut that figure and get something under 30 bytes per vertex.

I was about to edit my post suggesting such a scheme, as is normally used in mesh compression. The thing I was a bit unsure about was whether you can actually decompress such things properly in the vertex shader.

Still, it's a pity that one has to do this "extra" work because of memory BW considerations. It'd be a lot nicer if the BW were "enough" to sustain the maximum precision, but that's just wishful thinking.. :D

Btw, this kind of compression is probably mandatory for the storage of the original meshes/base meshes and so on. I was just not convinced that it would be a good thing for realtime tessellated stuff.
 
Robbz said:
The thing I was a bit unsure about was whether you can actually decompress such things properly in the vertex shader.
XBOX and PS2 decompress data in vertex shaders all the time in current games ;)

Still, it's a pity that one has to do this "extra" work because of memory BW considerations. It'd be a lot nicer if the BW were "enough" to sustain the maximum precision, but that's just wishful thinking.. :D
Which do you prefer: more precision that almost no one would notice, or potentially more 'stuff' to display?

Btw, this kind of compression is probably mandatory for the storage of the original meshes/base meshes and so on. I was just not convinced that it would be a good thing for realtime tessellated stuff.
It would be..imho.
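For what it's worth, the decompression side is usually just a scale (and sometimes a bias) per attribute. A small sketch in the same illustrative 16/8-bit format as the packing example above, written as plain C++ rather than any particular shader language:

```cpp
#include <cstdint>

// Undo the snorm16 packing: one multiply per component, which is why this kind
// of decompression is cheap enough to do per vertex on the GPU (or on an SPE).
inline float unpackSnorm16(int16_t s, float range)
{
    return (static_cast<float>(s) / 32767.0f) * range;
}

inline float unpackSnorm8(int8_t s)
{
    return static_cast<float>(s) / 127.0f;
}

// Rebuild a packed position given the mesh's object-space bound.
void unpackPosition(const int16_t packed[3], float range, float out[3])
{
    for (int i = 0; i < 3; ++i)
        out[i] = unpackSnorm16(packed[i], range);
}
```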
 