Observations, thoughts and questions about X360 and PS3

BlueTsunami said:
:LOL: It sucks. I know. I also hate that I can read through your post...some or a lot may not be right (as DeanoC stated, a lot) and not be corrected. :(

Can DeanoC be a little more specific about what he was referring to? The post immediately previous, or some posts, or all? :p

MPR = Microprocessor Report, I think
 
rendezvous said:
I knew I shouldn't trust MPR. :cry:

No, to be fair, the bits from MPR are largely right...

But really, you don't know what version of Cell is being used, what RSX is, or how Cell <-> RSX works. How likely are you to be right?

Trying to estimate things like bandwidth with the information you currently have is futile.
 
DeanoC said:
No, to be fair, the bits from MPR are largely right...

But really, you don't know what version of Cell is being used, what RSX is, or how Cell <-> RSX works. How likely are you to be right?

Trying to estimate things like bandwidth with the information you currently have is futile.

This means...give up for now right? ...I'm so impatient though. I didn't mean any harm. Just wanted to talk about things a little. Bummer...I tried hard too.

I gather Cell has changed since DD2 was done maybe? I know you can't say.

Thanks for even bothering to look in the first place though :)

edit:
Can't seem to find a way to lock this so if a mod sees it just delete it or something.
 
scificube said:
Cell must have its own memory controller to access the XDR RAM it uses for main RAM. RSX must have its own memory controller to access its pool of GDDR3. What is interesting is that RSX also has access to the XDR RAM in the system. This allows RSX to access up to 512 MB of RAM minus whatever Cell consumes, which makes perfect sense (no different than what Xenon consumes of the GDDR3 in X360). RSX has 22.4GB/s bandwidth to its pool of GDDR3. RSX can write 20GB/s to Cell and read 15GB/s from Cell. Cell does not have access to the GDDR3 in the system. What is most significant is that via Cell's memory controller RSX has an additional 25.6GB/s read/write bandwidth with the XDR RAM in the system. This provides RSX with 48GB/s to read/write from memory in the system on top of the read/write bandwidth between it and Cell.

I don't understand how you can calculate the bandwidth for RSX as 48GB/s by adding the VRAM and XDR figures together.

As I see it, since the bandwidth of the VRAM is underwhelming, you could "lock" a certain portion of the XDR memory and read from there as your framebuffer, but as I understand it you can't get more than the write bandwidth (15GB/s) that FlexIO supports, and you would still need to share this bandwidth with the CPU (if my understanding of the bus is right, of course).

So in a game where the dev would need around 30GB/s of bandwidth, you still have around 5-7GB/s left for Cell "info" to be sent to the RSX. That's how I have understood the hardware at least; if someone knows more, well, then I'm wrong.
 
Titanio said:
edit - Here's a Kutaragi quote re. Cell accessing GDDR3:

"CELL and RSX have close relationship and both can access the main memory and the VRAM transparently. CELL can access the VRAM just like the main memory, and RSX can use the main memory as a frame buffer. They are just separated for the main usage, and do not really have distinction."
Let's hope Kutaragi is right :cool:
 
overclocked said:
I don't understand how you can calculate the bandwidth for RSX as 48GB/s by adding the VRAM and XDR figures together.

As I see it, since the bandwidth of the VRAM is underwhelming, you could "lock" a certain portion of the XDR memory and read from there as your framebuffer, but as I understand it you can't get more than the write bandwidth (15GB/s) that FlexIO supports, and you would still need to share this bandwidth with the CPU (if my understanding of the bus is right, of course).

So in a game where the dev would need around 30GB/s of bandwidth, you still have around 5-7GB/s left for Cell "info" to be sent to the RSX. That's how I have understood the hardware at least; if someone knows more, well, then I'm wrong.

I made a bad assumption. I think you're right. I was thinking about how fast Cell could access XDR when I should have been thinking about how fast the memory controller could pipe data to RSX. It has also been pointed out that a 7800 can gobble up 38GB/s of bandwidth. In retrospect, it would appear RSX has a good chance of being hungry for bandwidth too.

I seriously want to scrap this thread and never show my face again. I'm disgusted with myself.
 
scificube said:
I seriously want to scrap this thread and never show my face again. I'm disgusted with myself.

Geez scifi, relax, it's been a good thread. I've liked it anyway :)

nAo said:
Let's hope Kutaragi is right

I guess this ain't how it's working in dev kits right now, but..yeah, I hope he's right when it comes to the final system!

edit - about Cell<->RSX, doesn't RSX write to Cell at 15GB/s and read from it at 20GB/s? That's enough to allow it to consume XDR's bandwidth entirely if it wanted (XDR is 12.8GB/s read, 12.8GB/s write), and still leave some Cell<->RSX bandwidth for Cell to read and write directly from chip to chip (7.2GB/s to RSX and 2.2GB/s to read from it), but of course, you'd have no XDR bandwidth left over for Cell ;) So assuming no Cell bandwidth usage, you could say RSX had 48GB/s to use. In reality it will be "up to" that figure, depending on CPU usage and how much CPU<->GPU bandwidth you require for things other than direct memory transactions....?
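
For anyone who wants to check the arithmetic in that edit, here is a rough Python sketch. It only uses figures quoted in this thread, and the even 12.8/12.8 split of XDR bandwidth is the post's own assumption, not a confirmed spec. The function and constant names are made up for illustration.

```python
# Figures quoted in this thread, in GB/s. None of these are confirmed specs.
GDDR3_BW = 22.4             # RSX <-> its local GDDR3 pool
XDR_READ_BW = 12.8          # assumed: half of XDR's 25.6 GB/s
XDR_WRITE_BW = 12.8         # assumed: the other half
RSX_READ_FROM_CELL = 20.0   # FlexIO, Cell -> RSX direction
RSX_WRITE_TO_CELL = 15.0    # FlexIO, RSX -> Cell direction


def rsx_budget(xdr_read=XDR_READ_BW, xdr_write=XDR_WRITE_BW):
    """Return (total memory bandwidth RSX could use,
    FlexIO left over towards RSX, FlexIO left over from RSX)."""
    if xdr_read > RSX_READ_FROM_CELL or xdr_write > RSX_WRITE_TO_CELL:
        raise ValueError("XDR traffic cannot exceed the FlexIO link it crosses")
    total = GDDR3_BW + xdr_read + xdr_write
    spare_to_rsx = RSX_READ_FROM_CELL - xdr_read
    spare_from_rsx = RSX_WRITE_TO_CELL - xdr_write
    # Round to one decimal so float noise does not clutter the result.
    return round(total, 1), round(spare_to_rsx, 1), round(spare_from_rsx, 1)


# Best case for RSX: it takes all of XDR and Cell uses none.
print(rsx_budget())
```

Passing smaller values for `xdr_read` and `xdr_write` models Cell keeping some XDR bandwidth for itself, which is the "up to" caveat in the post.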
 
DeanoC said:
No, to be fair, the bits from MPR are largely right...

But really, you don't know what version of Cell is being used, what RSX is, or how Cell <-> RSX works. How likely are you to be right?

Trying to estimate things like bandwidth with the information you currently have is futile.

So MPR is largely right.
It is somewhat in line with what you wrote in your presentation, and both ways seem to make sense.
I just wish you could be clearer ;) but I respect your NDA, and time will tell eventually anyway.

And no, I don't really know what version of Cell is being used. For all I know it could be a totally revamped design with PPEs featuring out-of-order execution, advanced tournament branch predictors and an espresso maker.
What we mortals have to go on is what is publicly available on the current generations of Cell, which isn't much on the PPE.
As for the RSX, I think we have even less info, which is why I refrain from speculating on it and how it works together with Cell.

I appreciate that you say if something is wrong. I want nothing more than to see a decrease in the amount of wrong information.
 
scificube said:
I seriously want to scrap this thread and never show my face again. I'm disgusted with myself.

There's nothing wrong with the thread or the speculation that goes with it; it's an interesting read, as long as everybody knows the limits. Some people unfortunately assume that they know more than they do. Just don't fall into that trap and you'll be fine...


Deano
 
Gotta hand it to you fellas at B3D. You're a good bunch.

Titanio:

Just checked IGN (again) and they have it 20GB/s read from Cell and 15GB/s write to Cell for RSX.
 
scificube said:
Just checked IGN (again) and they have it 20GB/s read from Cell and 15GB/s write to Cell for RSX.

Yeah, I checked against the conference vid. It's biased toward the GPU getting data, which seems logical.

ps35xx.jpg


I could be wrong, but RSX could saturate the XDR bandwidth if it really wanted (i.e. 48GB/s to itself), and still leave some FlexIO over for Cell. Of course, that leaves Cell with no main memory bandwidth of its own.

One question I have: can the SPUs load data and/or SPU code directly off the southbridge i.e. a disc or HD or camera or whatever, without that data/code having to pass through XDR? Can you treat them as their own little computers with their own RAM, and a connection to the southbridge? Of course, latency would be through the roof. That could save a little memory bandwidth in certain instances (starting up, where all data has to come off the southbridge anyway, perhaps?), theoretically, although I'm not sure how practical it'd be.

edit - thinking about it further, the savings would be MEASLY. I guess it's just a theoretical point ;)
 
one said:
Cell can handle 9 HW threads and Xenon can handle 6 HW threads
To be clear on what this really means though, TTBOMK XeCPU can have 3 threads actually executing at any one time and Cell can have 8. The hardware threads on XeCPU and PPE are an optimization for context switches on stalls or task-switching.

This is on the understanding that the functional units on a core cannot be shared between threads concurrently when resources would otherwise go unused (for example, thread 2 using the VMX while thread 1 is using the core's integer ALU), which I think is the case but am hazy on. Even if XeCPU can share such resources, that'll only be part-time, when they're free. In terms of program threads in execution at a given clock cycle then, it's 3 for XeCPU, 8 for Cell.
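
The counting in that post can be written down as a tiny Python sketch. The core counts are assumptions that happen to match the 9 and 6 figures quoted above (one dual-threaded PPE plus seven usable SPEs for Cell, three dual-threaded cores for XeCPU), and the "one thread issues per core per cycle" rule is the post's own hazy understanding, not a confirmed fact. The `Cpu` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    name: str
    two_thread_cores: int  # cores exposing 2 hardware threads (PPE-style)
    one_thread_cores: int  # cores exposing 1 hardware thread (SPE-style)

    def hardware_threads(self) -> int:
        return 2 * self.two_thread_cores + self.one_thread_cores

    def threads_issuing_per_cycle(self) -> int:
        # Assumption from the post: a core issues from one thread at a time,
        # so each core contributes exactly one executing thread per cycle.
        return self.two_thread_cores + self.one_thread_cores


# Assumed core counts, chosen to match the figures quoted in the thread.
cell = Cpu("Cell", two_thread_cores=1, one_thread_cores=7)
xenon = Cpu("XeCPU", two_thread_cores=3, one_thread_cores=0)

for cpu in (cell, xenon):
    print(cpu.name, cpu.hardware_threads(), cpu.threads_issuing_per_cycle())
```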
 
Shifty Geezer said:
In terms of program threads in execution at a given clock cycle then, it's 3 for XeCPU, 8 for Cell.
Thanks. I posted without reading most of what scificube wrote, so it seems my reckless post was out of context. Anyway, the superscalar PPE has some hardware resources to support 2-way fine-grained MT, if not the full execution resources for true SMT, therefore they can't be "logical" or software threads, as described in the MPR article rendezvous probably indicated (sorry if it's a different article):
The Cell Power core has hardware fine-grain multithreading. The multithreading design supports fine-grained multithreading with round-robin thread scheduling. If both threads are active, the processor will fetch an instruction from each thread in turn. When one thread cannot issue a new instruction or is not active, the other active thread will be allowed to issue an instruction every cycle. Threading does add some burden to the die size (around 7% in this case), as there must be duplicated register files, program counters, and parallel instruction buffers (before the decode stage).
But I'm still wondering about what the Crytek guy said about the difference between PPE threading and Xbox 360 CPU threading.
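
As a rough illustration of the round-robin scheme in the MPR passage quoted above, here is a toy Python simulation. It is a big simplification and not how the real PPE pipeline works: it assumes one issue slot per cycle, two threads, and a caller-supplied set of cycles in which each thread is stalled. All names are hypothetical.

```python
def simulate_round_robin(cycles, stalled_a=frozenset(), stalled_b=frozenset()):
    """Count the issue slots each thread gets over `cycles` cycles.

    stalled_a / stalled_b are sets of cycle numbers in which that thread
    cannot issue. Toy model: one issue slot per cycle.
    """
    issued = {"A": 0, "B": 0}
    turn = "A"
    for cycle in range(cycles):
        can_issue = {"A": cycle not in stalled_a, "B": cycle not in stalled_b}
        other = "B" if turn == "A" else "A"
        if can_issue[turn]:
            issued[turn] += 1
            turn = other           # both active: alternate between threads
        elif can_issue[other]:
            issued[other] += 1     # stalled thread's slot goes to the other one
        # if both are stalled, the slot is simply lost
    return issued


# Both threads always ready: the slots are shared evenly.
print(simulate_round_robin(8))
# Thread B stalled the whole time: thread A gets every slot.
print(simulate_round_robin(8, stalled_b=frozenset(range(8))))
```

Feeding in different stall patterns shows how the share of slots shifts between the two threads.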
 
When you guys say that current PC GPUs have ~40GB/s of bandwidth, what does that mean? Bandwidth to what? I was under the impression that the CPU<=>GPU bandwidth in the next-generation consoles was streets ahead of current PCs.
 
Titanio said:
Geez scifi, relax, it's been a good thread. I've liked it anyway :)

edit - about cell<->rsx, doesn't rsx write to cell at 15GB/s and read from it at 20GB/s? That's enough to allow it to consume XDR's bandwidth entirely if it wanted (XDR is 12.8GB/s read, 12.8GB/s write), and still leave some cell<->rsx bandwidth for cell to read and write directly from chip to chip (7.2GB/s to RSX and 2.2GB/s to read from it) , but of course, you'd have no xdr bandwidth left over for cell ;) So assuming no Cell bandwidth usage, you could say RSX had 48GB/s to use. In reality it will be "up to" that figure depending on CPU usage and how much cpu<->gpu bandwidth you require for things other than direct memory transactions....?

I was also wrong about that one; it's 15GB/s write and 20GB/s read for RSX.
Heh, I thought the XDR was 25.6GB/s read. That changes the picture, I guess. In a good case you could have 15GB/s for the framebuffer and 5GB/s left for Cell to send over the 20GB/s FlexIO, but that's wrong then (if it's 25.6GB/s).

Edit

Are you sure about the 12.8GB/s read/write for XDR? On the diagram you showed, the bandwidth to/from RSX is on two arrows, while the XDR interface shows 25.6GB/s both ways.
 
Gholbine said:
When you guys say that current PC GPUs have ~40GB/s of bandwidth, what does that mean? Bandwidth to what? I was under the impression that the CPU<=>GPU bandwidth in the next-generation consoles was streets ahead of current PCs.

Bandwidth between the GPU and its GDDR3 alone.
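
For reference, peak memory bandwidth figures like these are normally derived from bus width times effective data rate. A minimal Python sketch follows. The bus widths and data rates below are commonly quoted public figures, not numbers from this thread, so treat them as assumptions; the function name is made up.

```python
def peak_bandwidth_gb_s(bus_width_bits, effective_rate_gt_s):
    """Peak bandwidth in GB/s = bytes per transfer * transfers per second (in GT/s)."""
    return round(bus_width_bits / 8 * effective_rate_gt_s, 1)


# Assumed configurations (commonly quoted, not confirmed in this thread):
configs = {
    "GeForce 7800 GTX GDDR3": (256, 1.2),  # 256-bit bus, 1.2 GT/s effective
    "RSX GDDR3":              (128, 1.4),  # 128-bit bus, 1.4 GT/s effective
    "Cell XDR":               (64, 3.2),   # 64-bit bus, 3.2 GT/s effective
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gb_s(width, rate)} GB/s")
```

Under those assumptions the results line up with the roughly 38GB/s, 22.4GB/s and 25.6GB/s figures mentioned earlier in the thread.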
 
one said:
...therefore they can't be "logical" or software threads, as described in MPR article rendezvous probably indicated...

If both threads are active, the processor will fetch an instruction from each thread in turn.
Wow, I ought to read that article. This explains Deano's comments on PPE being used as a dual 1.6 GHz core, as if you have both threads active then they're interleaved. Have I missed the same sort of details on XeCPU's cores as well? I checked the Ars explanation that only seemed to confirm the second thread remains inactive unless the first becomes inactive, but I'm not absolutely sure that's the case.
 
overclocked said:
Are you sure about the 12.8GB/s read/write for XDR? On the diagram you showed, the bandwidth to/from RSX is on two arrows, while the XDR interface shows 25.6GB/s both ways.

I think it only shows the XDR bandwidth on one, bi-directional arrow, because that figure is split evenly going up and down. If it was 25.6GB/s both ways, we'd be talking about 51.2GB/s of bandwidth to XDR ;)

And that is an interesting quote from MPR; I had missed that too. So basically, if one thread is blocked, it's like a 3.2GHz PPE, but if neither thread is blocked, it's like two 1.6GHz PPEs? Interesting..

In reality, the split would be arbitrary I guess, depending on the blocking behaviour of the threads. One might get 2.5 billion cycles, the other 0.7 billion, etc.

That might also explain the Crytek guy's comments if Xenon is different in this respect?
 