"PS3 & Xbox360 in the Real World"

Essentially, that massive TLP is about the only avenue left for squeezing power out of these machines, and it's still very much an unknown area in gaming. For that matter, there's an extent to which you can't really exploit it, simply because games are so linearly dependent (i.e. the order in which things happen matters). Solving that will be a huge problem in the long run, but it's a necessary one to solve considering everybody's going multi-core. In the end, the two are within "spitting distance," as you put it, in terms of actual CPU and GPU power.
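To make the dependency point concrete, here's a trivial sketch (plain C with pthreads, nothing console-specific, purely illustrative): the serial input -> logic -> animation chain can't be shortened by more cores, while the per-object update fans out across threads trivially.

Code:
/* Why order-dependent game code resists TLP: the dependent chain must
 * run serially, while independent per-object work spreads across
 * hardware threads. Plain C + pthreads, not console SDK code. */
#include <pthread.h>
#include <stdio.h>

#define NUM_OBJECTS 1024
#define NUM_THREADS 4

static float positions[NUM_OBJECTS];

/* Independent work: each object's update touches only its own slot,
 * so slices can run on separate threads with no ordering hazard. */
static void *update_slice(void *arg)
{
    long t = (long)arg;
    long begin = t * (NUM_OBJECTS / NUM_THREADS);
    long end   = begin + (NUM_OBJECTS / NUM_THREADS);
    for (long i = begin; i < end; i++)
        positions[i] += 0.016f;            /* integrate one timestep */
    return NULL;
}

int main(void)
{
    /* The dependent chain: input -> game logic -> animation. Each step
     * consumes the previous step's output, so it stays serial no matter
     * how many cores exist. */
    /* read_input(); run_game_logic(); animate(); */

    /* The independent fan-out: per-object physics/particle updates. */
    pthread_t th[NUM_THREADS];
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_create(&th[t], NULL, update_slice, (void *)t);
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_join(th[t], NULL);

    printf("positions[0] = %f\n", positions[0]);
    return 0;
}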

BTW, when referring to the "wait and see" aspects of Xenon: without giving anything specific away, I ought to make it clear that the beta kits really don't deliver on a lot of what was originally promised, and it remains to be seen how big a difference the final hardware will make.

While we are signed on to do a PS3 project, that was decided no more than two weeks ago... so no hardware yet. Since we're a multi-SKU developer and we don't license our tech, our core tech works around serious incompatibilities between platforms (within bounds). If you want to see something that really shows off the power of an individual platform, you're going to have to look to the first-party exclusives... more or less.

I also have to agree with XBD on one thing. If I were designing a new CPU, I'd probably do something like what the 360 has, if for no other reason than that a CELL-like design wouldn't be the first thing to pop into my head. And if I were doing a CELL-like design, I'd at least have made the SPEs scalar devices rather than SIMD; SIMD really doesn't see that much utilization. I'd also have thrown at least 4-way SMT into each of them, just so you can fill in some latencies with more TLP.
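To illustrate the SIMD-utilization gripe (plain C, no real SPE intrinsics; just the shapes of the code): the first function is what typical gameplay code looks like, branchy and pointer-chasing, with nothing that maps onto SIMD lanes; the second is the uniform, packed-lane work a SIMD unit actually wants.

Code:
/* Scalar vs SIMD shapes. Typical gameplay code is the first kind;
 * SIMD only pays off on the second kind. Illustration only. */
#include <stddef.h>

struct entity { struct entity *target; float health; };

/* Scalar, branchy, irregular: no SIMD lanes to fill here. */
float total_target_health(const struct entity *e, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        if (e[i].target)                    /* data-dependent branch */
            sum += e[i].target->health;     /* pointer-chasing load */
    return sum;
}

/* SIMD-friendly: the same operation on 4 packed lanes. The loop body
 * stands in for a single 4-wide vector instruction. */
void scale4(float dst[4], const float src[4], float k)
{
    for (int lane = 0; lane < 4; lane++)
        dst[lane] = src[lane] * k;
}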
 
my eyes hurt after reading the whole article...

but suffice to say it was a good read :)

not very technical, but it gets the job done for a layman like myself.

nothing new to contribute beyond what has already been posted/argued over in this particular forum.

but i think it's a good read for people who are just coming into the console scene... forum-wise.

i'm pretty sure some people will refute and bash the article...

but i'm quite glad that i read it.


-just my two cents :) -
 
ShootMyMonkey said:
PS3 is seemingly better suited to micro-thread type tasks where the operations themselves are small, but there are many of them that are completely independent.
I've heard the theory before, but I'm not sure that feeding the SPEs a bunch of threads is really the best use of them. To maximize performance, it seems like you'd want the PPE and SPEs to avoid hassling with context switches (which seem like they'd be especially painful); you want them single-purposed and crunching data. That's where we see the big numbers generated, like in that ray tracing article someone posted. Of course, it could be that to find a use for them in a gaming context, you have to accept the hit in efficiency in exchange for getting them to do something useful.
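Roughly the model I have in mind, as a sketch. The dma_get()/dma_wait() stubs below are hypothetical stand-ins (synchronous memcpy here so it compiles), not the actual Cell SDK calls, which would be asynchronous MFC transfers. The point is the pattern: double-buffer so the SPE crunches chunk i while chunk i+1 streams in, and never context-switches.

Code:
#include <string.h>

#define CHUNK 1024  /* floats per chunk */

/* Hypothetical stand-ins for real SPE DMA calls: synchronous memcpy
 * stubs so the sketch compiles. On real hardware these would be
 * asynchronous, tag-tracked transfers. */
static void dma_get(float *local, const float *src, size_t n, int tag)
{ (void)tag; memcpy(local, src, n * sizeof *local); }
static void dma_wait(int tag) { (void)tag; }

static void crunch(float *data, size_t n)   /* the SPE's one job */
{ for (size_t i = 0; i < n; i++) data[i] *= 2.0f; }

/* Double-buffered streaming: fetch chunk i+1 while crunching chunk i,
 * so the SPE never idles and never context-switches. */
void stream_kernel(const float *src, int nchunks)
{
    static float buf[2][CHUNK];
    int cur = 0;

    dma_get(buf[cur], src, CHUNK, cur);               /* prefetch chunk 0 */
    for (int i = 0; i < nchunks; i++) {
        int next = cur ^ 1;
        if (i + 1 < nchunks)                          /* overlap: start i+1 */
            dma_get(buf[next], src + (size_t)(i + 1) * CHUNK, CHUNK, next);
        dma_wait(cur);                                /* chunk i has landed */
        crunch(buf[cur], CHUNK);                      /* compute while i+1 streams */
        cur = next;
    }
}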

Overall, what I read seemed like a pretty good article; it's not an obvious puff piece, anyway, and that should always be encouraged. :)

The one thing that stands out, which you might want to look at, is the memory latency numbers for a cache miss; I thought ERP said they were both ~500 cycles. I'll see if I can search it up...

http://www.beyond3d.com/forum/showpost.php?p=499275&postcount=7

In the context of that thread, and a later DeanoC comment, it seems pretty obvious you shouldn't expect much better numbers from the PS3.
 
Data dating back to 2003 regarding XDR DRAMs at 250 MHz suggested latencies getting as high as ~310 CPU clock cycles (for a 3.2 GHz CPU, not including cache latency). Since the PS3 uses 400 MHz DRAMs (3.2 GHz signaling rate), it will probably fare a little better.
It would be better if we knew his source, but... anyone have a link to an XDR benchmark?
 
Guden Oden said:
Of course it can, don't be ridiculous.

No, it can't. Cell can access the XDR and RSX, but it cannot access the GDDR-3.


The two chips also share buses between them with 20 GB/s write and 15 GB/s read bandwidth (seen from Cell's perspective). I don't see why this bandwidth should be ignored, when it is clearly required to take advantage of both the main and video memory pools and reach maximum utilization of system resources.

Tell me, would you then also include the 21.6 GB/sec bandwidth between the 360's CPU and GPU? And we should include the additional 32 GB/sec bandwidth between the GPU parent die and the eDRAM, since those are also required to take advantage of all the memory pools and reach maximum utilization, correct? While we're at it, shouldn't we also include the 1 GB/sec bandwidth to the Southbridge controller?

Oh, and since we're being technical, the 256 GB/sec bandwidth is between the eDRAM and the logic controller in the daughter die of the 360's GPU, and that is technically a point-to-point data transfer, is it not?


This type of math is silly, and just plain wrong. You don't just go adding bus bandwidths together and comparing totals. It's bad, bad, bad.


What do you know about "disturbing"? This is completely your own fabrication.

You might as well say PC graphics cards have "disturbing" latency when accessing main RAM in AMD64 systems too, since they have to go through the CPU's memory controller in that case as well. In practice, though, AMD64 systems usually lead in games performance compared to systems that use the traditional architecture of a memory controller on the same chip as the graphics bus controller, so obviously there's not necessarily any inherent problem with this design setup.

If you wish to claim the situation is otherwise for PS3, you better show some cold hard facts to back up your position...

Physics.

It takes longer to do a two-point data transfer than a single-point one. In other words, it takes less time to go from the GPU to local RAM and back than it does to go from the GPU to the HyperTransport chipset, to the CPU, to system RAM, back to the CPU, back to the HyperTransport chipset, and back to the GPU, to use your example.

Every hop in that process introduces an additional 30-80 nanoseconds of delay.
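Taking those per-hop numbers at face value, the back-of-envelope comparison looks like this (illustrative arithmetic, not measured figures):

Code:
#include <stdio.h>

int main(void)
{
    double hop_lo = 30.0, hop_hi = 80.0;   /* ns per hop, per the figure above */

    /* GPU -> local RAM and back: one hop out, one hop back. */
    printf("local VRAM round trip:  %.0f-%.0f ns extra\n", 2 * hop_lo, 2 * hop_hi);

    /* GPU -> chipset -> CPU -> system RAM and back: three hops each way. */
    printf("system RAM round trip:  %.0f-%.0f ns extra\n", 6 * hop_lo, 6 * hop_hi);
    return 0;
}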

Now, if you are talking about Nvidia's TurboCache feature, it should be noted that part of the reason it appears to work so well is that they slowed the video card down to mask the latency: they extended the internal hardware pipelines in the GPU so that it would lag as much as TurboCache does.


It should also be noted that Nvidia only achieves about 4 GB/sec of bandwidth over PCI-Express x16 using TurboCache, and it is limited to only 112 MB of accessible RAM (in systems with 512 MB of system memory, twice what the PS3 has). Since they are using similar hardware in the RSX, one should not assume the GPU can access the full bandwidth of the XDR memory, nor the full 256 MB of XDR; it is likely limited in both.

Perhaps. Or perhaps it's just your own misconceptions and misinterpretations that made you think it is. :devilish:

Perhaps, but doubtful.
 
It would be better if we knew his source, but... anyone have a link to an XDR benchmark?
It was a Toshiba release about them sampling 250 MHz XDR DRAMs, and it quoted total latencies in the 95-100 ns range (DRAM alone). When I said "not including cache latency," I should have said it was for the DRAM only; it doesn't include latencies local to the CPU. It also assumes that the latencies for higher-clocked XDR would be the same in DRAM cycles, which is really not the safest assumption to make, but it's the best I can do without hardware.
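For anyone checking the math, converting that 95-100 ns DRAM figure into 3.2 GHz CPU cycles lands right around the ~310 number quoted earlier:

Code:
#include <stdio.h>

int main(void)
{
    const double ghz = 3.2;   /* a 3.2 GHz core runs 3.2 cycles per ns */

    for (double ns = 95.0; ns <= 100.0; ns += 5.0)
        printf("%.0f ns -> %.0f cycles\n", ns, ns * ghz);  /* 304 and 320 */
    return 0;
}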

The one thing that stands out, which you might want to look at, is the memory latency numbers for a cache miss; I thought ERP said they were both ~500 cycles.
That doesn't sound too surprising. I expected only a little bit better at most; 525 seems to be the typical number for the X360. I figured that even if it were a fair bit better, you've got 8 cores instead of 3 contending for that one bus, so it still won't be anything great in practice.
 
Powderkeg said:
No, it can't. Cell can access the XDR and RSX, but it cannot access the GDDR-3.

How do you mean, exactly?

The CPU must be able to read from or write to the GDDR3. Do you think CPUs in PCs can't access memory on GPUs?

Both RSX and Cell can read from and write to anywhere in system memory (XDR+VRAM). They may go through each other's memory controllers to do so, but that's a given. It's no different than the XeCPU accessing the X360's main memory through Xenos's memory controller.

dukmahsik said:
wow, so what's the hoopla about other devs/Sony saying the PS3 is 2-3x as powerful?

I don't remember any dev saying that (at least not in the general case), but some devs apparently do see a more significant difference. As with everything, though, you'll have varying opinions, especially when so many still don't have (PS3) hardware.
 
Titanio said:
How do you mean, exactly?

The CPU must be able to read from or write to the GDDR3. Do you think CPUs in PCs can't access memory on GPUs?

Both RSX and Cell can read from and write to anywhere in system memory (XDR+VRAM). They may go through each other's memory controllers to do so, but that's a given. It's no different than the XeCPU accessing the X360's main memory through Xenos's memory controller.

You left out a step, and it's the one major difference between the PS3 and Xbox 360.

In order for Cell to access the GDDR3, it has to go through its own memory controller, to the RSX memory controller, and then through that to the GDDR3. It cannot access it directly. The same goes for the RSX accessing the XDR: it goes through its own memory controller, then through Cell, then to the XDR. It's a 3-step process either way (6 to actually receive the data).

And this doesn't include the extra clock cycles that the memory controllers themselves require.

The Xbox 360 GPU includes the memory controller for the entire system, so the CPU goes to the memory controller, then to RAM. It's only a 2-step process (4 to receive data), which results in lower-latency data transfers.
 
Powderkeg said:
The Xbox 360 GPU includes the memory controller for the entire system, so the CPU goes to the memory controller, then to RAM. It's only a 2-step process (4 to receive data), which results in lower-latency data transfers.

The last part, about the 360 getting lower latency due to a 4-step process, is true, but you forgot to mention:

1. The added latency of its memory type anyway.
2. That the PS3's XDR bandwidth will more than make up for the extra time required for Cell to access the VRAM.
 
scatteh316 said:
The last part, about the 360 getting lower latency due to a 4-step process, is true, but you forgot to mention:

1. The added latency of its memory type anyway.
2. That the PS3's XDR bandwidth will more than make up for the extra time required for Cell to access the VRAM.

Bandwidth doesn't make up for latency, and vice versa.
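Some rough numbers behind that statement (illustrative figures only): by Little's law you need bandwidth times latency bytes in flight just to keep a bus busy, while a chain of dependent accesses pays the full latency every time, no matter how fat the pipe is.

Code:
#include <stdio.h>

int main(void)
{
    double bw_gb_s    = 25.6;   /* XDR-class peak bandwidth, GB/s */
    double latency_ns = 100.0;  /* one DRAM round trip */

    /* Little's law: bytes that must be outstanding to saturate the bus.
     * GB/s * ns works out to bytes exactly (1e9 B/s * 1e-9 s). */
    printf("bytes in flight to fill the bus: %.0f\n", bw_gb_s * latency_ns);

    /* A dependent chain (each access needs the previous result) pays
     * full latency every time; bandwidth never enters into it. */
    printf("1000 dependent misses: %.0f us\n", 1000.0 * latency_ns / 1000.0);
    return 0;
}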
 
Powderkeg said:
In order for Cell to access the GDDR-3
...Which you just claimed it couldn't do, period... :rolleyes:

it has to go through its own memory controller, to the RSX memory controller, and then through that to the GDDR3.
Where did you dream this up? No it doesn't!

The RSX isn't even physically attached to the memory controller; it sits on the FlexIO interface, physically on the opposite side of the chip from the XDR DRAM controller.

The same goes for the RSX accessing the XDR: it goes through its own memory controller, then through Cell, then to the XDR.
What a bunch of fairytales! I don't know where you cooked up this load of hooey, but none of it is true. First of all, when you access off-chip memory, why on EARTH would you have to go through the on-chip memory controller? The on-chip controller only accesses THAT chip's memory! What you're describing is completely nutty and WRONG.

And don't say "that's the way they designed the hardware" or some bull like that, because they DIDN'T. You're making stuff up. Off-chip memory accesses in the PS3 go through the bus unit connected to the other chip, and then to that other chip's memory controller. Same as XeCPU<->Xenos. And there's no limit on which chip can read the other chip's memory, either; we're not in the dark ages of computing here, you know.
 
Guden Oden said:
...Which you just claimed it couldn't do, period... :rolleyes:

Where did you dream this up? No it doesn't!

The RSX isn't even physically attached to the memory controller; it sits on the FlexIO interface, physically on the opposite side of the chip from the XDR DRAM controller.
I think what he's saying is that it can't directly access the XDR, but has to go through the EIB, which is attached to the XDR memory controller. The RSX has to have some sort of memory interface to attach to the EIB; otherwise, the two memory pools couldn't be used by both devices. What you end up with is more latency than if you were going to the "native" memory pool. The thread I referenced earlier had some of the big boys talking about it, and it did sound like latency would be an issue developers had to work around (edit: if they want to use the XDR for GPU stuff).
 
Personally... I do not see Cell storing and retrieving information from the RSX's memory pool (GDDR3), and there are only a limited number of things the RSX can store and retrieve in Cell's memory pool (XDR)... aside from the latency issues... think about this for a moment.

Do you REALLY want Cell to be accessing information ACROSS the GPU and its memory pool while the game is active and graphics are being processed? Do you realize how much memory bandwidth the Cell CPU eats? I would imagine games would lag HORRIBLY if you were to store or retrieve information in the RSX's memory pool while actually playing the game. The same applies the other way... normally the RSX would be limited to using system memory for textures (think of this like AGP/PCI texturing), but what if we were to use Cell's memory pool for video data? Do you realize how much memory bandwidth on Cell's side that would eat? I would imagine that instead of graphics lag you would get processing lag in the actual game, since you'd be eating a lot of system memory bandwidth. Regardless... even if it were feasible to do such a thing on the PS3, the disadvantages would be significant... and I would just as soon leave the two memory pools be.

I personally think some of these comments from Sony are hyperbole... especially their claims that Cell and the RSX can access each other's memory pools (which they can, but not in the context Sony implies, and in a far more limited way) and that Cell and the RSX can share workloads and work together on tasks (which they can't).

I am still trying to find the PS3's version of MEMEXPORT.
 
Actually, I can see a lot of reasons for Cell to write into RSX's GDDR3 pool: particles, Bezier patches, instanced rendering, etc. All the good stuff Cell is supposed to excel at.
Now, all the 'static' stuff (static geometry, textures, etc.) should be uploaded once into VRAM and stay there.
On the other hand, I don't see much need for the RSX to read/write into XDR; not to mention, I'm pretty sure Cell will max out that bandwidth as soon as you get all the SPUs churning data.
The PS3 is clearly a non-UMA system; I don't know why people try so hard to use it as one.
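A sketch of the pattern I mean; the vram_alloc()/vram_write() helpers below are hypothetical stand-ins (plain malloc/memcpy so it compiles), not a real PS3 API. Static assets get uploaded once and stay resident; Cell-generated data gets rewritten into a VRAM buffer each frame for the GPU to source.

Code:
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins -- not a real PS3 API. On the real machine the
 * dynamic write would be a Cell DMA into the GDDR3 pool over FlexIO. */
static void *vram_alloc(size_t bytes) { return malloc(bytes); }
static void  vram_write(void *dst, const void *src, size_t n) { memcpy(dst, src, n); }

struct particle { float x, y, z, size; };

/* Static assets: uploaded once at load time, resident in VRAM forever.
 * Dynamic data: Cell-generated particles rewritten into VRAM per frame,
 * which the GPU then sources directly -- the pattern described above. */
void frame(const struct particle *generated, size_t count)
{
    static void *dynamic_vb;                 /* lives in the GDDR3 pool */
    if (!dynamic_vb)
        dynamic_vb = vram_alloc(4096 * sizeof(struct particle));

    vram_write(dynamic_vb, generated, count * sizeof *generated);
    /* ...kick the GPU to draw from dynamic_vb plus the resident static
     * geometry; no per-frame re-upload of the static data. */
}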
 
I apologise to this forum, but this character has had it coming for a looong time. The_game_master.
The GameMaster said:
Personally... I do not see Cell storing and retrieving information from the RSX's memory pool (GDDR3)... [snip: full post quoted above]

I understand why you talk like this on teamxbox.com: nobody there understands these things, and you just have to string the words "PS3... inefficient... not effective..." together in the same post to have them look at you as a god, because you say what they already believe (X360 > PS3, no matter what) with technical talk; ergo, it must be true.

The thing is, this is Beyond3D. That flawed talk doesn't win fans here, and I pity those who are kept in the dark by your talk at teamxbox. The positive things about a competing console are no good to talk about, right? That would mean no respect over there.

With this guy, every good thing about the little-known PS3 architecture is a flaw compared to the X360.
The only difference in your speech here is that you keep the 360 out of the deal so nobody suspects.
And when somebody with brains at teamxbox points to this forum's links that say otherwise, you come up with a reformulated 500-word essay (one you've already written hundreds of times, across endless topics about the "good things in the PS3") about how the RSX is inefficient and slow because it's based on the G70 (which happens to be the top GPU today), and you keep up the trash talk about how it will never hit 550 MHz because you say so, even though it's made at 90 nm, a fact you never mention.

After that essay, full of speculation and omitting most of the good things, you go and trash-talk this article ( http://forum.teamxbox.com/showthread.php?t=364632 ) written by someone far better placed in videogames than you are, and you keep saying the author must be crazy to talk about this stuff when the RSX isn't even out yet...
...This being said, let me remind you again of your 500-word essay of the same speculation (but way, way more negative) about how the RSX is going to suck with its huge limitations and never reach 550 MHz, while Xenos's 500 MHz is no problem for teamxbox, no sir. The PS3 must be equal or worse by all means: 550 is impossible for the RSX, but 500 is super doable for Xenos, no doubt about it.
Oh, and I forgot to mention that he says the RSX is going to be 420 MHz FOR SURE, even though it's a 90 nm chip with all the thermal advantages that brings.

I have high hopes that someone at teamxbox catches this post... maybe then you'll start giving them things as they really are about the good points of the opposition.

Once again, I apologize for this uncalled-for post. I'm new here, but I was caught by surprise when I saw this guy bringing his short-sighted view of the other guy's console into this respectable forum.

And yes, NUMA is a very, very good solution for bringing down the typical bandwidth usage between CPU and GPU (such as a GPU back buffer generates), because it keeps the CPU from eating bandwidth in the RAM pool on the other side. Each has its own memory to use without touching the main channel (which is then free to carry the important stuff).
As for the CPU and GPU accessing/writing each other's RAM: no harm there. The GPU couldn't care less about latency, and I see no need for the GPU to write into Cell's RAM.
 
I don't believe Cell is going to use the GDDR3 much. It can use it, and will probably push generated textures to it.

But for its main data it will use the XDR RAM. It's my understanding that Cell loves low-latency RAM, and that's certainly what XDR is (or should be, from my understanding). It also requires fewer hops; I believe Cell has to go through the RSX to reach the GDDR3 RAM, which will increase latency.

The RSX, however, I believe will access both the remaining XDR RAM (which will vary depending on the game) and the GDDR3 RAM.

The main advantage I see in Xenos is its 4x FSAA + HDR. The PS3 will have its higher fillrate and more powerful CPU.

My real question is about the compression schemes available to each. I remember reading somewhere that the Xbox 360 has 24-32 different compression schemes, and I wonder how that will affect things.
 
dskneo said:
I apologise to this forum, but this character has had it coming for a looong time... [snip: full post quoted above]


What?
 