PS3 GPU not fast enough.. yet?

http://www.theinq.com/images/articles/PS3_memory_bandwidths.jpg

Since it says Devstation at the bottom, I'm assuming it refers to the Devstation conference held on March 1-2.

At that time, final devkits were not available, going by this news report. More importantly, neither Cell nor RSX was finished yet.

So it is not clear what the picture at theinquirer.net means. If it refers to the unfinished Cell & RSX, then this photo may be meaningless. If it refers to the final versions of Cell and RSX, then the figures would be worth discussing.

As for the vertex setup rate, I can only assume it's the same setup rate as the GF7900's.
 
Fafalada said:
How about the same way everything else gets there - namely, by the GPU initiating a read from memory.

Yeah, but I was thinking along the lines that the memory needs to be written at some point by the CPU.
 
So... this doesn't seem to mean much at all. Basically, Cell can't read from or write to GDDR very fast.

More interesting to me are the clockspeeds. RSX being an all-new design with new memory controllers, I wonder if they're struggling to get it up to speed. In other words, it may not clock the same as G71.

On the other hand, the Sony guys now have an excuse for less-than-great-looking in-progress titles, just like the Xbox 360 alpha kits :) "Just wait'll they get that extra 130 MHz clockspeed on RSX!!"

I assume this is why Sony is not divulging the speeds; it's not a planned upgrade or downgrade, they just don't know yet. The positive side is that 550 MHz is apparently still the plan.
 
The article makes me guess that they mean the SPUs' local memory gets that bandwidth. But those numbers are so peculiar that they're probably not a flaw; more likely it's a design feature we "commoners" don't know much about yet.
There shouldn't be any NDA'd stuff at this point, though; it would be interesting to hear what someone who works on it has to say.
 
kimg said:
The article makes me guess that they mean the SPUs' local memory gets that bandwidth. But those numbers are so peculiar that they're probably not a flaw; more likely it's a design feature we "commoners" don't know much about yet.
There shouldn't be any NDA'd stuff at this point, though; it would be interesting to hear what someone who works on it has to say.
If the SPUs were capped at 16 MB/s, you'd manage only 4 million float reads a second, so a processor capable of many gigaflops would be achieving single-figure megaflops :oops:

It strikes me as odd that RSX can access XDR at a high rate, but Cell can't access GDDR at anything like its actual rate. That slide is tricky to understand, with no definition of what Local Memory is. Main Memory appears to be XDR based on the quoted speeds, with RSX access going through FlexIO with the 35 GB/s aggregate figure we know. Local Memory appears to be GDDR from the figures, RSX having 22 GB/s, so the name Local Memory seems misapplied - GDDR is not local to Cell. Unless they're talking about RSX while showing this :???:

But how is the Cell access so restricted? We're looking at roughly 1/1000th of the read performance the memory is capable of! I wouldn't have thought any bus could be that slow. Even if the GDDR isn't supposed to be read by Cell, if you're going to have that ability at all, how do you manage to have it be so slow?
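
To put numbers on it, a quick back-of-the-envelope check (plain C; I'm using decimal MB/GB and 4-byte floats, so treat the exact figures loosely):

Code:
#include <stdio.h>

int main(void)
{
    double cell_read_bw = 16e6;   /* Cell read from GDDR3: 16 MB/s, per the slide */
    double gddr_peak_bw = 22.4e9; /* GDDR3 interface peak: 22.4 GB/s */
    double float_bytes  = 4.0;    /* bytes per single-precision float */

    printf("float reads/sec: %.1f million\n", cell_read_bw / float_bytes / 1e6);
    printf("read rate vs peak: 1/%.0f\n", gddr_peak_bw / cell_read_bw);
    return 0;
}

That prints 4.0 million float reads a second and a read rate of 1/1400 of the interface's peak, so "1/1000th" is, if anything, generous.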
 
The GDDR3 is the only 22.4 GB/s RSX memory I can think of. If 16 MB/s read and 4 GB/s write is the Cell-to-GDDR3 bandwidth, that means you should forget about Cell using anything stored there.

If you don't put the backbuffer in XDR memory, you won't be able to use Cell for post-processing.
 
For local memory, the measured-vs-theoretical bandwidth comparison is missing; I wonder why? RSX is at a solid 22.4 GB/s for both read and write - good job there, green team. Then comes the blue team with Cell. Local memory write is about 4 GB/s, 40% of the next-slowest bandwidth there. Then comes the bomb from hell: the Cell local memory read bandwidth is a stunning 16 MB/s - note that's a capital M to connote mega versus a capital G to connote giga. This is a three-order-of-magnitude oopsie, and it is an oopsie; as Sony put it, "(no, this isn't a typo...)".
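
For reference, here are the local-memory figures quoted above gathered into one place (only numbers already stated in this thread; nothing new):

Code:
Path                           Measured bandwidth
RSX  <-> GDDR3 (read/write)    22.4 GB/s each way
Cell  -> GDDR3 (write)         ~4 GB/s
Cell <-  GDDR3 (read)          16 MB/s ("no, this isn't a typo...")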

The poor memory performance between Cell and the GDDR3 (local memory) should be attributed to a broken RSX. The Inq article should be blaming the Green Team.

But it should be expected. Cell was designed for other Cells to tunnel through it to get data from its memory pool. RSX is a different story; you can probably tell as much from the places where NV's SLI performance falters.

However, the team probably expects the GDDR3 to be running at full capacity. Aside from all the discussion here, the GDDR3 is the pool that is going to need help; Cell, with its local store, might be able to spare some XDR bandwidth. So it's good to know that RSX is going to get a bit more bandwidth compared to the typical 128-bit-bus GPU. I was afraid NV was going to screw up that part too.
 
Graham said:
Yeah, but I was thinking along the lines that the memory needs to be written at some point by the CPU.
Well, I have a hard time imagining a design where the GPU reads directly from CPU registers, so it goes without saying that results have to be written out by the CPU to somewhere before the GPU can get them.
 
Fafalada said:
How about the same way everything else gets there - namely, by the GPU initiating a read from memory.

It is faster, but it steals cycles from memory operations between the CBE chip and XDR for data needed by programs running on the CBE chip itself.

A write initiated by CELL to GDDR3 could be a win if the data comes from an LS, as that could allow it to run in parallel with a memory operation on XDR memory.
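
For anyone who hasn't touched the SDK, here's a rough sketch of the kind of LS-sourced write Panajev means, using the standard Cell SDK MFC intrinsics. The GDDR3-mapped effective address is a hypothetical placeholder; on real hardware the mapping would come from the system software, and I'm not claiming this is how PS3 titles actually do it:

Code:
#include <spu_mfcio.h> /* Cell SDK MFC DMA intrinsics (SPU side) */

#define TAG 3

/* 16 KB staging buffer in this SPE's local store, aligned for DMA */
static char ls_buffer[16384] __attribute__((aligned(128)));

/* Push results from local store out to a (hypothetical) GDDR3-mapped
   effective address, leaving the XDR bus free for other traffic. */
void flush_to_gddr(unsigned long long gddr_ea)
{
    /* Queue the DMA put; the MFC moves the data while the SPU keeps computing */
    mfc_put(ls_buffer, gddr_ea, sizeof(ls_buffer), TAG, 0, 0);

    /* ...overlap more computation here... */

    /* Block until the tagged transfer completes before reusing the buffer */
    mfc_write_tag_mask(1 << TAG);
    mfc_read_tag_status_all();
}

The point being that the put runs asynchronously, so it really can overlap an XDR-side memory operation, as Panajev says.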
 
deathkiller said:
If you don't put the backbuffer in XDR memory, you won't be able to use Cell for post-processing.

Look at how RSX can write out to XDR. Presumably, if you want your backbuffer to be processed by Cell, you would have RSX copy it out to XDR rather than have Cell do the copy or operate on it directly in GDDR3?

Also, look at Dave's comment (which is fitting in more than one way). This is really just about who is wearing the britches when it comes to FlexIO. You might even say - gasp - that the PS3 is pretty GPU-centric as far as memory access goes. When RSX is the client on FlexIO, things are impressively "go go go!"; when Cell is, not so much.
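
A quick size check on why the copy direction matters (my own back-of-the-envelope numbers for a 720p 32-bit backbuffer; only the 16 MB/s figure comes from the slide):

Code:
#include <stdio.h>

int main(void)
{
    double frame_bytes = 1280.0 * 720 * 4; /* 720p, 32bpp: ~3.7 MB per frame */

    /* Copy traffic if RSX pushes the buffer out to XDR at 60 fps */
    printf("copy traffic: %.0f MB/s\n", frame_bytes * 60 / 1e6);

    /* Frame rate if Cell instead pulled the buffer out of GDDR3 at 16 MB/s */
    printf("Cell pulling from GDDR3: %.1f frames/s\n", 16e6 / frame_bytes);
    return 0;
}

That's roughly 221 MB/s of copy traffic - pocket change for any of RSX's GB/s-class paths - versus about 4 fps if Cell tried to read the buffer straight out of GDDR3.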
 
Well, this is all about the devkit status back in March, three months ago. Since nobody is complaining right now, and the E3 games didn't seem to have any problems at all, I presume this is the usual Inq bulls***.
 
Nemo80 said:
Well, this is all about the devkit status back in March, three months ago. Since nobody is complaining right now, and the E3 games didn't seem to have any problems at all, I presume this is the usual Inq bulls***.

Going by the RSX GDDR3 bandwidth figures, it would appear they are thinking of a 550 MHz RSX with a 700 MHz memory clock.
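
That reading checks out arithmetically, assuming the expected 128-bit bus:

Code:
#include <stdio.h>

int main(void)
{
    double mem_clock = 700e6;     /* 700 MHz GDDR3 */
    double bus_bytes = 128.0 / 8; /* assumed 128-bit bus */

    /* GDDR3 is double data rate: two transfers per clock */
    printf("GDDR3 bandwidth: %.1f GB/s\n", mem_clock * 2 * bus_bytes / 1e9);
    return 0;
}

700 MHz x 2 transfers/clock x 16 bytes = 22.4 GB/s, exactly the figure on the slide.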
 
Shifty Geezer said:
It strikes me as odd that RSX can access XDR at a high rate, but Cell can't access GDDR at anything like its actual rate. That slide is tricky to understand, with no definition of what Local Memory is. Main Memory appears to be XDR based on the quoted speeds, with RSX access going through FlexIO with the 35 GB/s aggregate figure we know. Local Memory appears to be GDDR from the figures, RSX having 22 GB/s, so the name Local Memory seems misapplied - GDDR is not local to Cell. Unless they're talking about RSX while showing this :???:

But how is the Cell access so restricted? We're looking at roughly 1/1000th of the read performance the memory is capable of! I wouldn't have thought any bus could be that slow. Even if the GDDR isn't supposed to be read by Cell, if you're going to have that ability at all, how do you manage to have it be so slow?

Some of this stems back to the “PC design” comments.

What NVIDIA have done is basically keep the same command processor as G70/71, which is fundamentally designed for PCI Express bandwidths (i.e. 4GB/s in one direction and 4GB/s in the other), and turn two of the four 64-bit memory channels that would go to DDR on G70/71 back onto FlexIO (so two go to DDR and two go to FlexIO), which is why you see 22.4GB/s of FlexIO bandwidth (these two memory channels will be running at the same speed as the two going to the DDR memory - if the DDR memory changes, then the FlexIO bandwidth is likely to change). [Edit] - Sorry, FlexIO read is 20.4GB/s, so there is a little difference in what FlexIO can read from main memory.

The bandwidth for RSX's FlexIO operation is mainly for texturing, as this doesn't need to pass through the command processor. If vertex data is being produced by Cell, then it has to go through the command processor, so the read performance here is 4GB/s; but, as the document states, that's probably going to be setup-limited before the command processor's bandwidth is saturated.
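
If I'm following Dave's description correctly, the FlexIO figure falls straight out of the channel arithmetic, and it scales with whatever clock the DDR side runs at (the 800 MHz case below is purely illustrative, not a rumoured spec):

Code:
#include <stdio.h>

/* Two 64-bit channels, double data rate, at a given memory clock */
static double two_channel_bw(double clock_hz)
{
    return 2 * (64.0 / 8) * 2 * clock_hz; /* channels * bytes * DDR * clock */
}

int main(void)
{
    printf("at 700 MHz: %.1f GB/s\n", two_channel_bw(700e6) / 1e9); /* 22.4 */
    printf("at 800 MHz: %.1f GB/s\n", two_channel_bw(800e6) / 1e9); /* 25.6 */
    return 0;
}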
 
Panajev said:
It is faster, but it steals cycles from memory operations between the CBE chip and XDR for data needed by programs running on the CBE chip itself.
If you are so starved for CPU XDR bandwidth that you can't share it at all, you have bigger problems than trying to assist the GPU, IMO.

A write initiated by CELL to GDDR3 could be a win if the data comes from an LS, as that could allow it to run in parallel with a memory operation on XDR memory.
Personally, I pretty much consider any larger CPU-initiated write to external memory (that isn't meant to be a highly cloned/reused data instance) a loss, period.
Depending on the machine/memory architecture, it may or may not be a loss that you can live with.
Maybe if external memory weren't so damn slow compared to our computing architectures I would reconsider my stance, but it doesn't seem like that's likely to happen anytime soon.

But I digress. Anyway, looking at GDDR to alleviate XDR bandwidth seems like an upside-down perspective on things to me.
 
700 MHz memory should be a slam dunk; it's already in the 360. But of course, the 360 had issues with RAM shortages, according to the rumor mill. And I wonder if the 360 is contending with the PS3 for any limited supply of GDDR?
 
sonyps35 said:
700 MHz memory should be a slam dunk; it's already in the 360. But of course, the 360 had issues with RAM shortages, according to the rumor mill. And I wonder if the 360 is contending with the PS3 for any limited supply of GDDR?

The PS3 has two memory pools, one for the CPU and one for the GPU, each with its own bandwidth.

The 360's main memory bandwidth is shared between the CPU and the GPU, and to get the real performance out of it you must use the eDRAM in Xenos.
 
sonyps35 said:
And I wonder if the 360 is contending with the PS3 for any limited supply of GDDR?
700MHz was still reasonably new when the 360 started manufacturing; it's a year on now and yields should be good for these memories - graphics cards are shipping with 900MHz and 800MHz RAM, and with demand for production at those speeds, 700MHz should be fine.

Urian said:
The 360's main memory bandwidth is shared between the CPU and the GPU, and to get the real performance out of it you must use the eDRAM in Xenos.
It's not a "choice" - Xenos will always use eDRAM and there is nothing you can do to stop it (save just never rendering a pixel!).
 
Dave Baumann said:
It's not a "choice" - Xenos will always use eDRAM and there is nothing you can do to stop it (save just never rendering a pixel!).
How efficient is that compared to the PS3, which has separate bandwidth for the CPU and GPU?
 