PS3 GPU not fast enough.. yet?

Fafalada said:
You forgot most important one - UnRendered.

I imagine it has something to do with the fact that calling that "more than enough" implies they think of developers as complete idiots who have never written a line of code for realtime 3D.

That's some developer love right there, you know; for SCE, PSP developers are their special little idiots ;).

SCE: aaaaw, look how cute they are...
 
Acert93 said:
Faf, if you have the time and interest, what are your thoughts on ralexand's question about poly counts in games? What are some of the factors (bottlenecks) preventing us from getting closer to the high theoretical setup rates on current GPUs?
Who says we're prevented? The last time I tested with a benchmark (on my GF2 MX) I was able to achieve the theoretical maximum actually rendered onscreen, ~20.6 million, with a reasonably sized window.
I don't know about current cards (since I don't have one, just a GFFX), but I wouldn't be surprised if the quoted figures from NVIDIA/ATI were achievable.
Of course, after I found out that 20 million was possible I gave up worrying about polygon counts, since polygon count had dropped down the list of main bottleneck causes.
 
Gubbi said:
275Mtris is 4.5 tris/pixel @1280x720x60fps.
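As a sanity check, that ratio can be computed directly from the figures quoted above (a minimal sketch; note the arithmetic actually comes out closer to 5 than 4.5):

```python
# Triangles-per-pixel implied by the quoted peak setup rate at 720p/60.
PEAK_TRIS_PER_SEC = 275_000_000       # quoted RSX peak figure
WIDTH, HEIGHT, FPS = 1280, 720, 60

pixels_per_sec = WIDTH * HEIGHT * FPS               # 55,296,000 pixels/s
tris_per_pixel = PEAK_TRIS_PER_SEC / pixels_per_sec
print(f"{tris_per_pixel:.2f} triangles per pixel")  # ~4.97
```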
Shifty Geezer said:
How important is multipass rendering going to be on next-gen consoles? It was bread and butter for the PS2, but with complex single-pass shaders, how many times will overdraw need to be applied to the full scene?
Dynamic environment maps - need to draw the scene's geometry 6 times (hopefully simplified geometry, but it's still there)

Shadow maps - need to draw the entire scene's geometry (non-simplified), possibly multiple times for cascaded shadow maps

Motion blur - depending on the technique used, you need another pass to fill the velocity texture

Gears of War uses multipassing (and I think all UE3 games do, actually), and they stated that they draw 10 million pixels per frame. 720p is 0.92 MPix.
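Taking that 10-million-pixel figure at face value, the implied overdraw factor is easy to work out (a rough sketch using only the numbers quoted above):

```python
# Overdraw factor implied by "10 million pixels per frame" at 720p.
PIXELS_DRAWN_PER_FRAME = 10_000_000   # figure quoted for Gears of War
FRAMEBUFFER_PIXELS = 1280 * 720       # 921,600, i.e. ~0.92 MPix

overdraw = PIXELS_DRAWN_PER_FRAME / FRAMEBUFFER_PIXELS
print(f"each visible pixel is drawn ~{overdraw:.1f} times")  # ~10.9
```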

Many (maybe over half) of your polygons will be either offscreen or backfacing. When processing those, the pixel shaders' only hope of staying busy is to feed off visible polygons sitting in the post-transform cache, which won't be many. This means a lot of idle time, so a figure like 4.5 tris/pixel assumes the vertex shaders are always working at full tilt and the pixel shaders never cause a VS stall (which is impossible, because you can't force all the triangles in a dynamic game to be small).


On top of all that, you have no guarantee of getting 275M polys per second anyway. RSX's vertex engine can only load one attribute (i.e. a 4x32-bit vector) per clock. For a simple bumpmapped surface, you need position, normal, tangent, and texture coords. More complicated shaders could need much more data.

Add this all up, and I think practically achieving even 1M visible polygons in the scene would be very tough.
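To put a number on the attribute-fetch limit described above: if one 4x32-bit attribute is loaded per clock, a four-attribute vertex caps throughput well below the peak setup rate. (A minimal sketch; the 550 MHz clock is the announced figure and is an assumption here.)

```python
# Vertex throughput when fetch-limited to one attribute per clock.
RSX_CLOCK_HZ = 550_000_000   # announced clock speed; assumed here
ATTRIBUTES = ["position", "normal", "tangent", "texcoord"]  # simple bump-mapped surface

verts_per_sec = RSX_CLOCK_HZ / len(ATTRIBUTES)
print(f"fetch-limited to {verts_per_sec / 1e6:.1f}M vertices/sec")  # 137.5M
```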
 
Mintmaster said:
Dynamic environment maps - need to draw the scene's geometry 6 times (hopefully simplified geometry, but it's still there)

Shadow maps - need to draw the entire scene's geometry (non-simplified), possibly multiple times for cascaded shadow maps

Motion blur - depending on the technique used, you need another pass to fill the velocity texture

Gears of War uses multipassing (and I think all UE3 games do, actually), and they stated that they draw 10 million pixels per frame. 720p is 0.92 MPix.

Many (maybe over half) of your polygons will be either offscreen or backfacing. When processing those, the pixel shaders' only hope of staying busy is to feed off visible polygons sitting in the post-transform cache, which won't be many. This means a lot of idle time, so a figure like 4.5 tris/pixel assumes the vertex shaders are always working at full tilt and the pixel shaders never cause a VS stall (which is impossible, because you can't force all the triangles in a dynamic game to be small).


On top of all that, you have no guarantee of getting 275M polys per second anyway. RSX's vertex engine can only load one attribute (i.e. a 4x32-bit vector) per clock. For a simple bumpmapped surface, you need position, normal, tangent, and texture coords. More complicated shaders could need much more data.

Add this all up, and I think practically achieving even 1M visible polygons in the scene would be very tough.
Now everything I knew about graphics is out the window... What the hell do they mean by an "x" amount of pixels in one pass if you still have to draw multiple polygons?
 
Mintmaster said:
On top of all that, you have no guarantee of getting 275M polys per second anyway. RSX's vertex engine can only load one attribute (i.e. a 4x32-bit vector) per clock. For a simple bumpmapped surface, you need position, normal, tangent, and texture coords. More complicated shaders could need much more data.

Add this all up, and I think practically achieving even 1M visible polygons in the scene would be very tough.
So when these guys talk 275M polys etc., they're too stupid to realize that they can only load one attribute per clock? Seems like all of those things would be factored into that 275M poly number along with bandwidth availability etc.
 
ralexand said:
So when these guys talk 275M polys etc., they're too stupid to realize that they can only load one attribute per clock? Seems like all of those things would be factored into that 275M poly number along with bandwidth availability etc.

Nope, the 275M is just a peak number.
 
Forget the numbers... when Sony presented the PS3 at E3 '05 they talked about CPU & GPU speeds, the 2 TFLOPS figure, memory speeds, GPU fillrate, but they didn't talk about polygon performance... they possibly hid it because those numbers would be worse than the Xbox 360's, and obviously they don't talk about it because they don't want to :devilish:

What do you think about it? ;)
 
[V] said:
Forget the numbers... when Sony presented the PS3 at E3 '05 they talked about CPU & GPU speeds, the 2 TFLOPS figure, memory speeds, GPU fillrate, but they didn't talk about polygon performance... they possibly hid it because those numbers would be worse than the Xbox 360's, and obviously they don't talk about it because they don't want to :devilish:

What do you think about it? ;)

Maybe I am wrong, but I seem to remember nVidia talking about 1.1 billion.

Anyway, that's not very interesting. I'm more interested in how Cell and RSX cooperate.
 
Update from Joystiq:

Update: The misunderstanding evident in the linked story relates to the distinction between local video memory and local system memory. The slow read speed under discussion is indicative of the feature's lack of utility. This is even reflected in the slide's statement: "No, this isn't a typo ..." A contact at Sony confirmed this, telling me, "Again, I cannot imagine a situation where you have any SPU reading from the RSX local memory." Nothing to see here, folks.

http://portable.joystiq.com/2006/06/05/rumor-ps3-hardware-slow-and-broken/
 
@ManuVlad: Hopefully no one here is still confused about the Inq article, though perhaps that 'update' will spread throughout the net since B3D seems to be the origin of a lot of other forums' talking points. ;)

@DarkRage: That was the vertex rate, not the setup rate.
 
On top of all that, you have no guarantee of getting 275M polys per second anyway. RSX's vertex engine can only load one attribute (i.e. a 4x32-bit vector) per clock. For a simple bumpmapped surface, you need position, normal, tangent, and texture coords. More complicated shaders could need much more data.
I don't see this as any real problem, since the work you do with an attribute typically occupies more cycles than the number of attributes anyway. In a best-case scenario where everything has already been processed in software, you still have to at least copy the input values into the output registers, which takes just as many cycles.
 
ShootMyMonkey said:
I don't see this as any real problem, since the work you do with an attribute typically occupies more cycles than the number of attributes anyway. In a best-case scenario where everything has already been processed in software, you still have to at least copy the input values into the output registers, which takes just as many cycles.

You can copy a lot more than one attribute per cycle across your vertex shader array.
It's quite possible to be attribute-read limited on RSX, to the point that I know of people packing multiple "real attributes" into a single "attribute" to try to avoid this.
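The packing trick described here can be illustrated in a few lines. This is a generic sketch of the idea, not a specific RSX vertex format: two unit vectors are quantised to signed bytes so that both fit in half of one 16-byte attribute slot, costing one fetch instead of two.

```python
import struct

def pack_snorm8(v):
    """Quantise a float in [-1, 1] to a signed 8-bit integer."""
    return max(-127, min(127, round(v * 127)))

def pack_normal_tangent(normal, tangent):
    """Pack a normal and a tangent (two 'real' attributes) into 8 bytes,
    i.e. half of a single 4x32-bit attribute slot; the vertex shader
    would unpack and renormalise them."""
    n = [pack_snorm8(c) for c in normal] + [0]   # pad each to 4 bytes
    t = [pack_snorm8(c) for c in tangent] + [0]
    return struct.pack("8b", *n, *t)

packed = pack_normal_tangent((0.0, 1.0, 0.0), (1.0, 0.0, 0.0))
print(len(packed))  # 8 bytes: two attributes for the price of one fetch
```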
 
ShootMyMonkey said:
I don't see this as any real problem, since the work you do with an attribute typically occupies more cycles than the number of attributes anyway. In a best-case scenario where everything has already been processed in software, you still have to at least copy the input values into the output registers, which takes just as many cycles.
The post-VS cache can screw with that. One transformed vertex can be reused to set up another triangle without processing it again.
 
You can copy a lot more than one attribute per cycle across your vertex shader array.
D'oh... for some reason, it went into my head that he meant per shader in the array.

It's quite possible to be attribute-read limited on RSX, to the point that I know of people packing multiple "real attributes" into a single "attribute" to try to avoid this.
Yeah, I can see it as one of the caveats of doing loads of work in software. But in general, there's enough work to be done inside a vertex shader (at least when you have loads to do in a corresponding pixel shader) that you can theoretically fill a good percentage of the stalls that might occur due to attribute read rates.

The post-VS cache can screw with that. One transformed vertex can be reused to set up another triangle without processing it again.
Yeah, but I think as the shaders get more and more complex, the ratio of cycle count per shader to attribute count per vertex goes up in general, so there are just that many more ALU slots to fill in the gaps. Also, just how big is the post-transform cache? I don't think I've heard of one that's all that huge.
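A toy FIFO model answers the reuse question concretely; the 24-entry cache size is an assumed typical figure for GPUs of that era, not a confirmed RSX number:

```python
from collections import deque

def vertex_shader_runs(indices, cache_size=24):
    """Count vertex shader invocations for an indexed mesh, assuming a
    simple FIFO post-transform cache (size is an assumed typical value)."""
    cache = deque(maxlen=cache_size)
    runs = 0
    for i in indices:
        if i not in cache:   # cache miss: transform the vertex again
            cache.append(i)
            runs += 1
    return runs

# A 2x2 grid of quads as an indexed triangle list: 9 vertices, 8 triangles.
grid = [0, 1, 3,  1, 4, 3,  1, 2, 4,  2, 5, 4,
        3, 4, 6,  4, 7, 6,  4, 5, 7,  5, 8, 7]
print(vertex_shader_runs(grid))     # 9: every vertex shaded exactly once
print(vertex_shader_runs(grid, 2))  # a tiny cache forces many re-shades
```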
 
Maybe I am wrong, but I seem to remember nVidia talking about 1.1 billion.

Anyway, that's not very interesting. I'm more interested in how Cell and RSX cooperate.

That 1.1 billion is the vertices/sec figure for the G70... but the specialized websites never mention it.

On the other hand... Cell and RSX could be a mess; this is what happens when you try to copy the competition. If Cell and RSX had a unified memory it would be better ;)
 
[V] said:
That 1.1 billion is the vertices/sec figure for the G70... but the specialized websites never mention it.

On the other hand... Cell and RSX could be a mess; this is what happens when you try to copy the competition. If Cell and RSX had a unified memory it would be better ;)

Why? Doing it this way is logical if you consider RSX the show-horse of the whole thing: it's all about graphics bandwidth, so let the graphics chip rule the memory bus. With isolated pools, RSX can play without caring what Cell is doing and vice versa; how often should Cell be reading graphics data out of the graphics chip's memory (this is the same as on the PC)? It's only when you try to be clever and do GPGPU things that the PC architecture becomes limiting. Cell provides enough horsepower that you shouldn't need to do this, *but* the option is there if you use the XDR to do compositing effects to buffers.

A unified architecture introduces a bottleneck when both chips want to access the memory bus at once. Since you want the graphics going full tilt, and the same for Cell, you really don't want them both playing in the same paddling pool (that often).
 
Kryton said:
Why? Doing it this way is logical if you consider RSX the show-horse of the whole thing: it's all about graphics bandwidth, so let the graphics chip rule the memory bus. With isolated pools, RSX can play without caring what Cell is doing and vice versa; how often should Cell be reading graphics data out of the graphics chip's memory (this is the same as on the PC)? It's only when you try to be clever and do GPGPU things that the PC architecture becomes limiting. Cell provides enough horsepower that you shouldn't need to do this, *but* the option is there if you use the XDR to do compositing effects to buffers.

A unified architecture introduces a bottleneck when both chips want to access the memory bus at once. Since you want the graphics going full tilt, and the same for Cell, you really don't want them both playing in the same paddling pool (that often).

The PS3 should be programmed like a PC, this is obvious... but since Sony said "RSX is built for Cell and vice versa", we could do the same as the Xbox 360 and have the CPU help the GPU when needed, and you can see the performance of that usage is really worse. You have to consider the best use of your system, and that matters when you're doing things that bottleneck your memory system.

A unified architecture gives good results: think of the original Xbox and tell me if it was a bad idea. In the 360 you have two memory controllers, one for the eDRAM and one for the system... you've also got a full-duplex line to the memory; if the memory controller does its job well, forget all the problems.

Sorry for my poor English, it's gotten rusty.
 
[V] said:
The PS3 should be programmed like a PC, this is obvious...
No it shouldn't, unless you want suckful performance. I'm saying the graphics hierarchy is the same as what we had with AGP, and no one had a problem with it until they tried to push it too hard (the PS3 interface is entirely different but follows a similar design).

but since Sony said "RSX is built for Cell and vice versa", we could do the same as the Xbox 360 and have the CPU help the GPU when needed, and you can see the performance of that usage is really worse.
So you're trying to say it's beneficial because the performance of RSX's local memory is worse than the XDR's? The whole point is that XDR provides common-ground bandwidth to both, enabling the kind of help you're envisioning. It isn't Cell's place to tell RSX how to manage its memory; they are distinct specialised processors. Sony has chosen 'use specialised hardware' against MS's choice to 'unify the entire graphics hierarchy' (as seen with the L2 cache locking, unified shading, shared memory, etc.). Both have benefits and are equally justified.

A unified architecture gives good results: think of the original Xbox and tell me if it was a bad idea.
And the PC has the opposite; which is better, I don't know.


In the 360 you have two memory controllers, one for the eDRAM and one for the system... you've also got a full-duplex line to the memory; if the memory controller does its job well, forget all the problems.

eDRAM is reserved for Xenos, as I understand it (much like the GDDR3 pool here). Making a good memory controller that can handle requests coming at it from two locations is tricky, because you have to give priority to someone, but to whom? Full duplex just means you can read and write, so I'm not sure what you mean here.

Sorry for my poor English, it's gotten rusty.
 
Mintmaster said:
Add this all up, and I think practically achieving even 1M visible polygons in the scene would be very tough.
Not tough, but is it desirable? You'll have horrible aliasing (even with 4xAA, perhaps even with 16x).
 