PS3 GPU not fast enough.. yet?

DUALDISASTER said:
that 500 million poly count for the 360... isn't that with the 360's GPU rendering polys with no textures or pixels?

No, 500M is the setup limit, just like 275M is the RSX setup limit. Xenos can set up 1 vertex/cycle (500M/s) or 1 vertex every 2 cycles when tessellating (250M/s). As Megadrive said, the exact PR phrase was "500M with non-trivial shaders"; if all 48 ALUs were dedicated solely to vertex processing (and not setup limited), it would be in the 6B range (G71's vertex shaders are in the billion range as well).

Of course, this all assumes no limits on bandwidth or memory footprint. I don't think we will be seeing 500M poly/s games; but what has been said is that the poly counts this gen are more reasonable relative to screen output. Xbox1 had PR-speak of 100M+ tris/s, yet games probably hit about 10% of that; the Xbox1 PR numbers pretty much assumed minimal effects. This gen the GPU itself can push that many triangles; the questions are whether the rest of the system can keep up and whether this is the best budget of resources for a game design. So far games have been decidedly pixel heavy and have used normal mapping and parallax occlusion mapping to fake geometry and shadowing/lighting.
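A quick sketch of where these headline setup numbers come from, using only the clocks and per-cycle rates quoted in this thread (500 MHz Xenos, 550 MHz RSX):

```python
# Back-of-the-envelope setup rates from the clocks/per-cycle rates quoted in this thread.
XENOS_CLOCK_HZ = 500e6  # Xenos core clock
RSX_CLOCK_HZ = 550e6    # RSX core clock

xenos_setup = XENOS_CLOCK_HZ * 1   # 1 vertex per cycle -> 500M/s
xenos_tess = XENOS_CLOCK_HZ / 2    # 1 vertex per 2 cycles when tessellating -> 250M/s
rsx_setup = RSX_CLOCK_HZ / 2       # 1 triangle per 2 cycles -> 275M/s

print(f"Xenos setup: {xenos_setup/1e6:.0f}M/s, tessellating: {xenos_tess/1e6:.0f}M/s")
print(f"RSX setup:   {rsx_setup/1e6:.0f}M/s")
```

Nothing deeper than clock divided by cycles-per-primitive; the question for the rest of the thread is how often anything in a real frame runs at that rate.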

On the other hand, we have seen geometry increase this gen. Of the few non-PC ports, I think PGR3 is a good example. The cars were in the 100K range (of course much of this was culled/non-visible) and some of the world assets had staggering polycounts (e.g. the Brooklyn Bridge). Likewise, a game like Kameo had 3K individual characters on screen, and we have heard of massive GPU-based particle systems (1M particles).

Whether or not the setup limit will be an issue is something developers can tell us more about. From a technical standpoint, a high setup limit could help in situations like a Z-only pass or shadow passes, as Dave mentioned. It would be nice to get more direct information on vertex shader processing and the setup limit during the vertex-bound segments of a frame's rendering. I know Robobo (sp?) has some nice slides showing a typical frame's rendering; it would be nice to know more about what is happening when, and what it is being limited by.
 
It's too bad that these PR poly numbers don't factor in other limits in the system. But, hey, I realize it's a numbers race for the companies involved (7 controllers, etc.).

Maybe one day we will know the main performance hog in these machines.
 
Gubbi said:
275Mtris is 4.5 tris/pixel @1280x720x60fps.

The only thing that seems likely to be bound by this is the noise caused by certain people concerning the 500Mtris/s setup limit on Xenos.

/ passes per frame.
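Gubbi's figure, spelled out. (275M at 720p60 actually works out to ~4.97 tris/pixel; 4.5 corresponds exactly to a 250M rate, so the quoted 4.5 is presumably rounded down or based on the tessellation rate.)

```python
def tris_per_pixel(setup_rate, width, height, fps):
    """Triangles available per displayed pixel per frame at a given setup rate."""
    return setup_rate / (width * height * fps)

print(tris_per_pixel(275e6, 1280, 720, 60))   # ~4.97 at 720p60
print(tris_per_pixel(275e6, 1920, 1080, 60))  # ~2.21 at 1080p60, Panajev's figure
print(tris_per_pixel(250e6, 1280, 720, 60))   # ~4.52, i.e. where the "4.5" lands
```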
 
Panajev2001a said:
Well, running eight 16-cycle shaders at 550 MHz in parallel (and normally on NV4X/G7X chips we have seen the VS ALUs' domain clocked a little higher than the quoted clock-speed) would produce exactly that amount. And I do not think that all the vertex shading will be done only by those 8 VS ALUs, or that the SPEs will always be charged with rasterizing and texturing all the polygons they run vertex shader programs on locally.

We know that pixel shading is where you will spend a HUGE amount of your frame-time, but are we really assuming no T&L at all will be done on any SPE, and that if it is done, the SPEs will also always rasterize those polygons?

Last I heard, Xenos' ability to dedicate up to all 48 shader ALUs to churning through VS programs for some limited number of cycles did not seem so useless. So it is not like having a good amount of available shader ALUs never pays off :p.

I'm sure you could write a program that between the SPEs and VS would feed enough tris to the setup stage of the pipe such that it became a bound ;)

I'd simply question how reflective that would be of any game, or a typical game. RSX, I'm sure, is no different from other G70s with regard to its two-cycles-per-tri setup speed, and I don't think I've ever heard a dev talk about being bound by that, or indeed by any chip's setup rate. I think it's far more likely you'll be bound by something else like fragment shading or fillrate (even vertex shading, I've read, is rarely the bottleneck). My outlook here is fed mostly from books and presentations, so I'd defer to the comments of developers, but in reading various material on GPU optimisation for a current project, one source didn't even give any time to triangle setup as a possible bound (nVidia's GPU Gems, Chapter 28, gives a nice overview of the entire pipe in terms of optimisation, but setup doesn't even factor into it :p), and the other, which I referenced earlier, says no more than that it is never the bound.
 
Megadrive1988 said:
nope. not from what most of us have read.

the 500 million polygon count of the Xenon's (Xbox 360's) GPU (the Xenos) goes all the way back to mid-2004, when tech info first came out. it's the triangle setup limit.

X360's 500M polygons/sec is with textured polygons, most likely with all features on (an old-school term) and with "non-trivial shaders", meaning, probably, decent amounts of pixel shading going on, I would think.

the Xbox360's Xenos GPU, if using ALL of its 48 shader ALUs simply for geometry transforms, could theoretically hit
6 BILLION vertices / polygons per second.

or I suppose if you want to count 1 vert as 1/3 of a polygon, then 2 Billion polygons/sec.
that's raw computational performance, not what could be displayed on screen.

anyway, the Xbox 360's Xenos GPU is more likely to hit 500 million pixel-filled, textured, pixel-shaded polygons/sec with features on than the Xbox1 was to hit its ~116 million polygons/sec with the 233 MHz NV2A GPU.


now one or more of Beyond3D's real techheads can point out where i screwed up :p
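Megadrive's arithmetic, sketched out. The 6B verts/s figure is quoted in the thread; the 4-cycles-per-vertex divisor below is my own inferred assumption chosen so the quoted numbers line up, not an official spec:

```python
ALUS = 48
CLOCK_HZ = 500e6
CYCLES_PER_VERTEX = 4  # assumption: a minimal transform costing ~4 ALU-cycles per vertex

verts_per_sec = ALUS * CLOCK_HZ / CYCLES_PER_VERTEX  # 6e9, the quoted 6B verts/s
tris_per_sec = verts_per_sec / 3                     # 2e9, counting 3 verts per triangle

print(f"{verts_per_sec/1e9:.0f}B verts/s -> {tris_per_sec/1e9:.0f}B tris/s (as a triangle list)")
```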
Thanks. Now do that with the RSX!
 
Dave Baumann said:
/ passes per frame.
How important is multipass rendering going to be on next-gen consoles? It was bread-and-butter for PS2, but with complex single-pass shaders, how many times is overdraw going to need to be applied to the full scene?
 
Panajev2001a said:
Sure, the beauty is to see if, and by how much, each set of numbers drops when adding the real work done by "real applications", but devs here do not want to play nice and give the numbers out ;). J/K, no jobs risked is better :).

Well, judging from the games we actually saw at E3, the PS3 does not seem to have any problems with polycounts. In fact, games look like they have a lot more polys than current 360 games (and run @ 60fps).
 
Shifty Geezer said:
How important is multipass rendering going to be on next-gen consoles? It was bread-and-butter for PS2, but with complex single-pass shaders, how many times is overdraw going to need to be applied to the full scene?

With lots of post effects and offscreen rendering, it'll probably actually be more than last gen.
 
With so many polys the systems are supposed to be able to push, with more polys per screen than pixels, I do not understand why the apparent poly edges one can still see are not a thing of the past.

I almost hope that some dev will go the extremely-high-poly-count route, where for once we might get really smooth, poly-edge-free images, and I don't care if shaders and textures are more simplistic...
 
Shifty Geezer said:
How important is multipass rendering going to be on next-gen consoles? It was bread-and-butter for PS2, but with complex single-pass shaders, how many times is overdraw going to need to be applied to the full scene?
HDR, bloom, depth of field, reflection/refraction maps, shadow, Z, opaque color, alpha color and God knows what else increase overdraw. The opaque color pass is simply one of many passes taken every frame.

But yes, we should generally not be multipassing for opaque color anymore.
 
If nothing else, polys consume bandwidth - the data that defines them doesn't come out of thin air.

What's puzzling me is did we or didn't we know that NV4x sets-up 1 triangle every 2 clocks? Was this a surprise?

As to the Cell<->RSX bandwidths - well it seems to me that if you think of this like fast TurboCache, then everything's hunky dory. No big deal really.

Jawed
 
Jawed said:
What's puzzling me is did we or didn't we know that NV4x sets-up 1 triangle every 2 clocks? Was this a surprise?

It was to me, but I am not sure that is saying much ;) My experience at NV hardware launches was for them to focus on the abilities of the Vertex Shaders. e.g. the 7600GT is claimed to push 700M vertices/s, and the 7800GTX 512MB 1.1B vertices/s. I am not sure I have ever seen NV mention the setup limit for the NV40 or G70 series. But I could have just missed it. I am not aware of ATI's R5x0 series' setup limit either.

In the bigger picture, nothing has really been said officially about the RSX vertex shading abilities. There is some stuff going around now, but even 2 months ago there was some speculation that they still may not have VS or they may be augmented in some significant way.

If nothing else, polys consume bandwidth - the data that defines them doesn't come out of thin air.

Unless you are procedurally creating them and streaming them to the GPU :LOL: But geometry is pretty large. I think a dev once mentioned that 5-10M polys/frame would be 100-160MB in explicitly stored geometry, but that is from memory (my numbers could be off). And as mentioned, stuff like multi-texturing is going to increase the work on the GPU side (although it was mentioned instancing could alleviate some of this problem). But as you (and ERP and others) said, these are just peaks and are not really relevant to what we will see in games. I wonder how slow a 10M poly model would be to shadow in realtime :D

Btw, on the PC side of the B3D fence there has been some talk about small-triangles slowing down GPUs and how D3D10 will resolve some of this. Is this a factor on either of the current console GPUs?
 
Btw, on the PC side of the B3D fence there has been some talk about small-triangles slowing down GPUs and how D3D10 will resolve some of this. Is this a factor on either of the current console GPUs?

You're mixing up two different things here. Small polygons render significantly slower because the quads, memory buffers and ALU groupings are not effectively utilised.

D3D10 doesn't fix this.

What D3D10 fixes is the small batch problem, where submitting large numbers of draw calls with small numbers of primitives is expensive. But this isn't an issue on consoles anyway.
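To put rough numbers on ERP's quad point: most rasterizers shade pixels in 2x2 quads, so a triangle covering only a pixel or two still pays for four fragment-shader invocations per quad it touches. A toy model (the specific coverage figures below are illustrative, not measured):

```python
def quad_efficiency(pixels_covered, quads_touched):
    """Useful fragments / shaded fragments, assuming the rasterizer
    shades whole 2x2 quads (4 fragment invocations each)."""
    return pixels_covered / (quads_touched * 4)

# A sub-pixel triangle still lights up one full quad: 75% of the shading is wasted.
print(quad_efficiency(1, 1))     # 0.25
# A larger triangle only wastes "helper" pixels along its edges.
print(quad_efficiency(100, 40))  # 0.625
```

This is why lots of tiny triangles hurt pixel-shader throughput regardless of the setup rate, and why the D3D10 small-batch fix is a separate issue, as ERP says.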
 
Platon said:
With so many polys the systems are supposed to be able to push, with more polys per screen than pixels, I do not understand why the apparent poly edges one can still see are not a thing of the past.

I almost hope that some dev will go the extremely-high-poly-count route, where for once we might get really smooth, poly-edge-free images, and I don't care if shaders and textures are more simplistic...
This is my feeling too. Whenever this discussion comes up, someone screams that we don't need all those polys because that's 4 per pixel, blah, blah, blah, but I can think of a ton of games that could use a crapload more polys. If poly rendering isn't the problem, then why are we still seeing poly edges in almost every game? Where's the glitch in the system that's preventing devs from coming anywhere close to those 275M numbers?
 
ralexand said:
Where's the glitch in the system that's preventing devs from coming anywhere close to those 275M numbers?

And to be fair, the 500M number in the Xbox 360. Here are some of my thoughts. I am sure some are iffy, if not outright wrong. But hey, it may get some devs to post better reasons ;)

* Part of it can be chalked up to porting and getting a handle on the new hardware.
* The setup limit is only part of the story; you still have to transform all those polygons. Another issue is that vertex shading and pixel shading don't always run in parallel; if vertex shading runs heavily for 15% of a frame you are nowhere near your peak.
* High poly counts are not very performance-compatible with a lot of the current shadowing techniques.
* The current tools and middleware have been engineered and evolved with a pixel-shader focus in view.
* Geometry is resource intensive. If I quoted the numbers right, 5M triangles per frame is ~100MB; @ 60fps that is 6GB/s of bandwidth just for triangles. You still have to fit textures, animation, sound, the game engine, etc. and caching -- and balance your rendering engine (you still have to apply pixel shaders and textures to all those triangles).
* Without tessellation (or LOD with super-high-resolution models for close-ups), avoiding poly edges requires far more polys than you might expect (e.g. when a 40K poly object gets really close and is the only thing on screen, about half its polys are culled as back-facing, so your 1M-2M pixel display is showing a 20K poly model -- 1 poly for every 50-100 pixels).
* Heavy pixel shading currently has higher IQ returns.
* Traditional GPUs have a fixed ratio of pixel and vertex shaders (favoring the former) and we are reaping the effects of this legacy.
* As ERP kindly pointed out, small triangles are not as efficiently handled in current GPU designs.

But like you, I am curious where the bottlenecks are in current GPUs/consoles in general, and specifically what the hurdles are to reaching higher poly counts. Will these be things that we slowly overcome in 5 years?
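The bandwidth bullet above, worked through. The ~20 bytes/triangle is back-derived from the quoted ~100MB figure (it implies indexed, fairly compact vertex data), not a measured number:

```python
TRIS_PER_FRAME = 5e6
BYTES_PER_TRI = 20   # assumption: back-derived so that 5M tris ~= 100MB
FPS = 60

mb_per_frame = TRIS_PER_FRAME * BYTES_PER_TRI / 1e6  # 100.0 MB/frame
gb_per_sec = mb_per_frame * FPS / 1e3                # 6.0 GB/s, matching the bullet
print(mb_per_frame, "MB/frame ->", gb_per_sec, "GB/s just for geometry")
```

Fatter vertices (full-precision normals, multiple UV sets, skinning weights) push the per-triangle cost, and therefore the bandwidth bill, well past this.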
 
500 Million is just the setup engine limit on Xenos. ~8 billion polygons/sec is the theoretical limit for Xenos (untextured, unshaded, unlit)
 
Panajev2001a said:
And at 1080p that means 2.2 triangles/pixel... the little problem is understanding whether nVIDIA's specs mean what the specs SCE used for the GS (and many other vendors') mean, i.e. 1 vertex = 1 triangle when rating their graphics processors; otherwise I would have to believe that the GS's set-up engine would saturate at 225 MVertices/s (since they quote it at 75 MTriangles/s) ;).

You need 3 points to set up a triangle; or can the setup pipeline cache 2 of the previous points, like the transform pipeline?

I hope that I am getting this wrong, and that when nVIDIA says 2 cycles per triangle it really means three vertices every 2 cycles, not 2 cycles for each vertex of a triangle list, because RSX having a triangle set-up rate only 3.6x that of the GS seems lower than what people would expect from a console shipping in late 2006 at such a price point (though the console is still worth what's inside it, for me).

The rate is still the same 2 cycles per triangle. If you read all the NV performance presentations over the years, they always mentioned that while other things are improving, triangle setup improves mostly with clock speed.

Even if these seem PRman-like stats, I do not think that 2.2 or 4.5 vertices per pixel, counting overdraw and multi-pass rendering (which sometimes you just cannot avoid), is that high an amount. It reminds me of the answer SCE gave when asked about the fact that PSP's GE+GU do not support HW clipping on planes other than the front plane... oh, why worry... you have a coordinate system from [0;0] to [4096;4096], which is MORE than enough... and suddenly tons of developers rose in anger :p.

Lots of sub-pixel triangles are bad for the efficiency of the pixel shader. This was often mentioned in NV performance presentations over the years, too.
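One way to read Panajev's caching question: with indexed primitives or strips, the front end does not have to process 3 fresh vertices per triangle, because shared vertices get reused from the post-transform cache. A sketch of the idealized bookkeeping (perfect reuse, no cache misses assumed; whether the *setup* stage counts per-triangle or per-vertex is exactly the ambiguity being flagged):

```python
def verts_processed(num_tris, topology):
    """Vertices the transform stage must process for num_tris triangles,
    in the ideal case of perfect vertex reuse."""
    if topology == "list":
        return 3 * num_tris   # every triangle submits 3 fresh vertices
    elif topology == "strip":
        return num_tris + 2   # each new triangle reuses 2 cached vertices
    raise ValueError(f"unknown topology: {topology}")

print(verts_processed(1000, "list"))   # 3000
print(verts_processed(1000, "strip"))  # 1002 -> ~1 vertex per triangle on long strips
```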
 
DopeyFish said:
~8 billion polygons/sec is the theoretical limit for Xenos (untextured, unshaded, unlit)
You forgot the most important one - UnRendered.

Panajev said:
you have a coordinate system [0;0] till [4096;4096] which is MORE than enough and suddenly tons of developers rose in anger :p.
I imagine it has something to do with the fact that calling that "more than enough" implies they think of developers as complete idiots who never wrote a line of code for realtime 3D.
 
Fafalada said:
You forgot most important one - UnRendered.

:LOL:

The "I gotcha!" of PR-marks with a unified shading design hehehe

Faf, if you have the time and interest, what are your thoughts on ralexand's question about poly counts in games? What are some of the factors (bottlenecks) preventing us from getting closer to the high theoretical setup rates on current GPUs?
 