Observations, thoughts and questions about X360 and PS3

scificube · Sep 30, 2005

I have made many observations about the PS3 and X360 and have come to form some opinions about what I see. I would like to lay out what I'm seeing, check whether my thinking is at least justified, and find out whether I need to consider other things. A warning to those who might get upset about it: yes, I do see a lot of advantages for the PS3. I expect this may rub some people the wrong way, so I invite you all to give me your perspective on things. It's not as if I'm going to disagree with facts, and I don't have my mind made up. The primary purpose of this thread is to see whether I have framed things correctly, and secondarily to see whether anything interesting pops up during the discussion. I hope so, I really do.

Oh... I have to apologize for how long this post is. Much of it is stuff you may already know, so again, apologies in advance. I felt it was better to explain where I was coming from than to simply spout off a string of declarative statements. For one, I am not so brash as to think I am an authority on anything, so making declarations just seems wrong for me to do. The other reason is that if my thinking is indeed off, I need to have it out there so people can see exactly where it went bad and correct me at the right spot, so that I can better understand what they're saying.

Next-Gen CPUs:

What I would like to note is that Xenon has 3 cores while Cell has 8 active cores (1 PPE plus 7 SPEs). What I have begun to focus on is that this means Xenon can handle 3 HW threads while Cell can handle 8 HW threads. I think this is important when thinking about execution resources: logical threads on a core end up fighting for execution resources, while HW threads do not. Each of Xenon's cores can handle 2 logical threads, allowing Xenon to handle 6 threads at a time. Cell's PPE supports 2 logical threads, but its seven SPUs support only one HW thread each. As I don't know the PPC equivalent, I am forced to describe Xenon's cores and Cell's PPE as being hyper-threaded; the classification seems to fit in my mind as well.

This seems very significant to me when comparing Xenon and Cell. What I gather is that all 6 threads on Xenon, if used, have to fight for execution resources: if a resource is not free, that particular thread must wait or be swapped out (unlikely). However, 7 of the 9 threads Cell can handle (the threads on the SPUs) have full rein over the resources on a core. I feel this may be the most, or at least one of the most, significant observations we can draw about these two CPUs: threads on Cell have more resources of various types available to them. I gather two things from this. Threads with more resources available to them can get more work done, and threads that don't have to wait for resources to free up can get more work done.
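A minimal Python sketch of the thread counting above, using only the figures in this post (3 Xenon cores with 2 contexts each; one PPE with 2 contexts plus 7 SPEs with 1 each). The `Core` class and the notion of an "exclusive" thread are illustrative bookkeeping made up for this sketch, not anything from Sony's or Microsoft's tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    contexts: int  # hardware thread contexts this core can hold at once


def total_threads(cores: list[Core]) -> int:
    """Total hardware thread contexts on a chip."""
    return sum(c.contexts for c in cores)


def exclusive_threads(cores: list[Core]) -> int:
    """Threads that never share a core's execution units (single-context cores)."""
    return sum(c.contexts for c in cores if c.contexts == 1)


xenon = [Core(f"Xenon core {i}", 2) for i in range(3)]
cell = [Core("PPE", 2)] + [Core(f"SPE {i}", 1) for i in range(7)]

for name, chip in (("Xenon", xenon), ("Cell", cell)):
    # Xenon: 3 cores, 6 threads, 0 with a core to themselves
    # Cell: 8 cores, 9 threads, 7 with a core to themselves
    print(f"{name}: {len(chip)} cores, {total_threads(chip)} threads, "
          f"{exclusive_threads(chip)} with a core to themselves")
```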

Critical thoughts:
Threads on Cell can better leverage the power of the silicon available because of the structure of the HW. There is also more silicon to take advantage of.

Pitfalls:
1. One would note that if 2 threads are executing on a core but demand different resources in the pipelines, this increases efficiency rather than decreasing it.
2. SPUs in Cell can only task on execution element at a time putting them at a disadvantage.

I agree. In this situation the logical threads can be considered just as efficient as HW threads doing the same thing. However, one must realize that situations where this is not the case will arise quite often.

SPUs don't need to do more than one thing at a time. A thread on an SPU has the whole core to itself, so while one is doing its thing there are six others doing theirs. With Xenon one can only guarantee that 3 threads are executing, while 3 more can execute if resources are available to do work. (Work != threads still being present... the threads aren't going anywhere.)

Real world basis:
X2 vs. P4 with HT. The X2 routinely wins because it has 2 actual cores that give 2 threads more execution units to work with, whereas the P4 with HT does not and thus cannot. Note that HT on average provides only a 10-20% speed bump, whereas it does not seem uncommon for the X2 to get a 100% bump.

Bandwidth:

There is a surprising disparity here!

Before I go on, I need to caution that I am unsure whether RSX has access to the PS3's XDR. I know I've read it a bunch of times, but I can't recall where I saw it officially, or at least in solid form such as a technical document. If this is not the case, then most of what I say next is null and void. I am confident, however, that this is the case, as it makes good sense.

Xenos serves as the memory controller in the X360. Xenos can read/write 22.4GB/s from the GDDR3 in the system and can read/write 10.8GB/s from/to Xenon. I've seen no mention yet of a south bridge, so I will assume for now that I/O for system components eats into the bandwidth the GDDR3 provides; 2GB/s or less seems reasonable.

Cell must have its own memory controller to access the XDR RAM it uses for main RAM. RSX must have its own memory controller to access its pool of GDDR3. What is interesting is that RSX also has access to the XDR RAM in the system. This allows RSX to access up to 512MB of RAM, minus whatever Cell consumes, which makes perfect sense (no different from what Xenon consumes of the GDDR3 in the X360). RSX has 22.4GB/s of bandwidth to its pool of GDDR3. RSX can write 20GB/s to Cell and read 15GB/s from Cell. Cell does not have access to the GDDR3 in the system. What is most significant is that, via Cell's memory controller, RSX has an additional 25.6GB/s of read/write bandwidth to the XDR RAM in the system. This gives RSX 48GB/s to read/write from memory in the system, on top of the read/write bandwidth between it and Cell.
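Where these peak figures come from: peak bandwidth is just bus width in bytes times transfers per second. The sketch below assumes a 128-bit GDDR3 interface at 1.4 GT/s (700MHz, double data rate) and a 64-bit XDR interface at 3.2 GT/s, which reproduce the 22.4GB/s and 25.6GB/s quoted above. Treat those interface parameters as assumptions, and note that these are peaks, not sustained rates.

```python
def peak_bandwidth_gbs(bus_width_bits: int, gigatransfers_per_s: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times billions of transfers per second."""
    return bus_width_bits / 8 * gigatransfers_per_s


# Assumed interface parameters, chosen to match the figures quoted above.
gddr3 = peak_bandwidth_gbs(128, 1.4)  # 128-bit GDDR3 at 700MHz, double data rate
xdr = peak_bandwidth_gbs(64, 3.2)     # 64-bit XDR at 3.2 GT/s

print(f"GDDR3 {gddr3:.1f} GB/s + XDR {xdr:.1f} GB/s = {gddr3 + xdr:.1f} GB/s peak for RSX")
```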

It is not difficult to see how this is possible when looking at how two Cell chips would communicate via a crossbar. The crossbar makes communication between Cells possible, while FlexIO handles communication with the XDR that Cell uses. I suspect RSX merely sits on the other side of one of the crossbars in the PS3... and now, by looking at the link between Cell and RSX, we know how much data Cell chips can communicate between each other in such a setup. There also appears to be a south bridge in the PS3 that will keep I/O from other system components from stealing bandwidth from Cell and RSX.

Critical thoughts:
Where Xenos would appear to be headed for bandwidth-limited days, having less bandwidth available to it than a PC GPU, RSX on the other hand has more bandwidth available to it than any GPU part seen to date... at least by me. The only reason I don't classify this as too much bandwidth is that RSX is clocked at 550MHz, so in the end that extra bandwidth should come in handy in trying to feed RSX. (And I've heard there's no such thing as too much bandwidth.) It looks like a good fit: RSX isn't cruising towards being bandwidth limited, but doesn't have bandwidth to spare either.

Pitfalls:
MS says its DirectX compression will give them 50% more bandwidth to work with. I see this statement as a PR move until proven otherwise. 50% more bandwidth due to what being compressed, and when? Where will the uncompressed data be stored, and won't storing it eat up bandwidth just the same? Is this compression HW accelerated? DX calls eat up CPU time with their overhead, not bandwidth. It seems bogus until someone explains to me how this could be possible.
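One way to frame the question: if a fraction f of the bytes crossing the bus travel compressed at ratio r:1, and are only expanded on-chip, each logical byte costs (1 - f) + f/r bus bytes, so effective bandwidth is B / ((1 - f) + f/r). This is generic arithmetic, not a description of Microsoft's actual scheme; the fractions and ratios below are hypothetical.

```python
def effective_bandwidth(raw_gbs: float, fraction: float, ratio: float) -> float:
    """Effective bandwidth when `fraction` of the traffic crosses the bus compressed at `ratio`:1.

    Assumes data is decompressed on-chip, so the uncompressed form never uses the bus.
    """
    if not 0.0 <= fraction <= 1.0 or ratio < 1.0:
        raise ValueError("need 0 <= fraction <= 1 and ratio >= 1")
    bus_bytes_per_byte = (1.0 - fraction) + fraction / ratio
    return raw_gbs / bus_bytes_per_byte


def fraction_needed(gain: float, ratio: float) -> float:
    """Fraction of traffic that must compress at `ratio`:1 to multiply bandwidth by `gain`."""
    if ratio <= 1.0:
        raise ValueError("no gain is possible without compression")
    # Solve (1 - f) + f / ratio = 1 / gain for f.
    return (1.0 - 1.0 / gain) / (1.0 - 1.0 / ratio)


print(effective_bandwidth(22.4, 2 / 3, 2.0))  # about 33.6 GB/s, i.e. +50%
print(fraction_needed(1.5, 2.0))              # about 0.67 of all traffic at 2:1
print(fraction_needed(1.5, 4.0))              # about 0.44 of all traffic at 4:1
```

A `fraction_needed` result above 1.0 would mean the claimed gain is impossible at that compression ratio.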

The intelligent memory of Xenos removes a good amount of the bandwidth load from main RAM by holding the frame buffer itself instead. However, Xenos still needs to fetch textures from main RAM and move "tiles" from/for the frame buffer when rendering at 1080i. Textures should be pretty large, and larger still if things are rendered at 1080i rather than scaled to 1080i. These thoughts are what make me think Xenos "could" be bandwidth limited. Xenos is also the X360's memory controller, forcing it to share the bandwidth from the GDDR3 at all times. This moves me from "could be" bandwidth limited to thinking that it probably is, especially when rendering at 1080i.
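A rough sketch of how much tiling the 10MB daughter die implies, assuming 4 bytes of colour plus 4 bytes of depth/stencil per sample and counting 10MB as 10 x 2^20 bytes. More tiles mean more passes over the scene, and each finished tile has to be copied out to main RAM, which is traffic over the shared GDDR3 bus discussed above.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # Xenos daughter die: 10MB of eDRAM


def tiles_needed(width: int, height: int, samples: int = 1,
                 color_bytes: int = 4, depth_bytes: int = 4) -> int:
    """Screen tiles needed so each tile's colour + Z/stencil fits in eDRAM.

    The 4+4 bytes per sample default is an assumption; HDR formats would need more.
    """
    footprint = width * height * samples * (color_bytes + depth_bytes)
    return math.ceil(footprint / EDRAM_BYTES)


for w, h, aa in [(1280, 720, 1), (1280, 720, 4), (1920, 1080, 1), (1920, 1080, 2)]:
    print(f"{w}x{h} {aa}xAA -> {tiles_needed(w, h, aa)} tile(s)")
```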

RSX will not have constant access to 48GB/s of bandwidth. In using Cell's memory controller to get to the XDR pool, it actually consumes bandwidth Cell could use. There is an interesting possibility here, though: feed Cell with data from RSX (its pipes) while RSX is accessing XDR RAM. This would ensure Cell doesn't completely starve for data to work on during that interval. Just an idea.

Something to keep in mind is that RSX has 22.4GB/s of bandwidth guaranteed and can variably pull from another 25.6GB/s as needed or when possible. Xenos is always sharing its 22.4GB/s of bandwidth to memory with Xenon, however much the daughter die eases things.
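A toy model of that asymmetry, with made-up CPU loads. It assumes the X360's CPU traffic and I/O come straight out of the single 22.4GB/s GDDR3 pool, and that RSX gets its own 22.4GB/s plus whatever XDR bandwidth Cell leaves unused. It ignores the 10.8GB/s Xenon link, the Cell/RSX link, eDRAM traffic and all arbitration overhead.

```python
def xenos_available(cpu_gbs: float, io_gbs: float = 0.0, gddr3_gbs: float = 22.4) -> float:
    """X360: the GPU's share of the single GDDR3 pool after CPU and I/O traffic."""
    return max(0.0, gddr3_gbs - cpu_gbs - io_gbs)


def rsx_available(cell_gbs: float, gddr3_gbs: float = 22.4, xdr_gbs: float = 25.6) -> float:
    """PS3: RSX's dedicated GDDR3 plus whatever XDR bandwidth Cell leaves unused."""
    return gddr3_gbs + max(0.0, xdr_gbs - cell_gbs)


for cpu_load in (0.0, 5.0, 10.0):  # hypothetical CPU memory traffic, GB/s
    print(f"CPU traffic {cpu_load:4.1f} GB/s -> Xenos {xenos_available(cpu_load):.1f} GB/s, "
          f"RSX {rsx_available(cpu_load):.1f} GB/s")
```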

CPU-GPU relations:

Xenos can send/receive data at 10.8GB/s to/from Xenon. They will communicate primarily to aid each other in the task of rendering.

RSX can send data to Cell at 20GB/s and receive data from Cell at 15GB/s. They will also work together primarily on the task of rendering.

Here is the way I'm looking at it. From my perspective, one of the parts is working on the task at hand... rendering, while the other is supplying it data to work on... a VERY intelligent memory, if you will. One could then look at the bandwidth between the two parts as that from the "graphics processor" to its memory and go from there.

Thinking like this, here are some observations. The bandwidth between Xenos and Xenon appears to be about 1/3 of what "graphics processors" enjoy now. Between Cell and RSX, the bandwidth appears to be about 1/2 to 2/3 of what "graphics processors" utilize today. (PC graphics parts get approx. 30GB/s of bandwidth.)
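The ratios above, worked out explicitly against the post's approximate 30GB/s figure for a PC graphics card. The `fraction_of_pc` helper is purely illustrative.

```python
PC_GPU_GBS = 30.0  # approximate PC graphics-card memory bandwidth used in the post


def fraction_of_pc(link_gbs: float, pc_gbs: float = PC_GPU_GBS) -> float:
    """Link bandwidth expressed as a fraction of a PC GPU's local memory bandwidth."""
    return link_gbs / pc_gbs


links = {
    "Xenon <-> Xenos": 10.8,
    "Cell -> RSX (RSX reads)": 15.0,
    "RSX -> Cell (RSX writes)": 20.0,
}
for name, gbs in links.items():
    print(f"{name}: {gbs:.1f} GB/s is {fraction_of_pc(gbs):.2f} of a PC GPU")
```

Plugging in the 38.4GB/s of the 7800 GTX that comes up later in the thread (`fraction_of_pc(10.8, 38.4)` is about 0.28) lowers every ratio.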

This is not so bad when you go back and look at "how" each part will contribute to rendering. Xenos and RSX are to do the heavy lifting, while Xenon and Cell are to provide the flexibility these parts don't have and then do what they can to aid in the task(s) at hand. These parts won't need to communicate THAT much data, but there are some interesting differences in just what can be done here.

I expect neither Xenon nor Cell to be particularly good at rasterization. I expect both Xenon and Cell to aid in vertex processing, particles, and post-processing effects. I am thinking along the lines of tessellation, displacement mapping, integrating particles into the physics simulation, etc. What can be done, and to what extent, is the question.

As things look now, I would say that Xenon and Xenos are at a disadvantage to Cell and RSX on these special tasks. Cell has more threads it can dedicate to these tasks, and each thread has more execution resources available for getting them done. The second issue is that Cell and RSX can communicate more data between one another. The last issue has to do with ease of use, but I feel it is significant. I cannot find where I saw it (so if you wish to dismiss this, I understand), but it would appear programmers can use Nvidia's Cg to program vertex work for Cell's SPUs. This does nothing to make the HW more powerful in relative terms, but it goes a long way toward turning the potential into the kinetic. To my knowledge there is no equivalent for doing the same with Xenon's cores.

Pitfalls:
I have not ignored that Xenos can do a heck of a lot of vertex processing on its own. Xenos is not a CPU, however, so it lacks flexibility in what it can do with its vertex processing power. Xenos should be able to throw up INSANE amounts of geometry, but it will not be applying physics etc. to it... these tasks still fall best on Xenon, and I've already spoken to the situation there. (The MEMEXPORT capability is a factor here, though, and shouldn't be ignored.) The other thing to consider is that when Xenos is in INSANE vertex-worker mode, it is also in "not doing pixel work" mode, and Xenon won't be able to pick up the slack there... Cell wouldn't be able to do it either.

Random thoughts:

I am curious about how well Xenos, its tessellator and Xenon can work together to make displacement mapping work. There is a ZBrush demo of this, I think. Could we see stuff like this in real time? Yes/no/maybe... what would it take of the tessellator etc.?

I think it might be a good thing to place RSX's frame/Z buffers in the XDR RAM. This would give Cell direct access to them, so that some interesting things may be possible. What could this allow to be done? This would guarantee RSX consumes some of Cell's bandwidth to main RAM, but could it also free up bandwidth between Cell and RSX? Would the trade-off be worth it?

Is there any news as to what changed with respect to Cell's PPE VSU? Any clue as to what Crytek means by saying that Cell's PPE has slightly/somewhat better "hyper-threading" than a core in Xenon?

Is it true that the dynamic scheduler used with Cell will find work for the SPUs if they aren't explicitly tasked with something?

I don't understand the issue with Xenos being triangle-setup limited. Could someone explain this to me? Is it an issue where not all the arrays could be tasked with vertex processing, or is this merely a limit on the amount of vertex processing each array could do?

What would be some interesting things Nvidia could do with RSX's feature set? I just want to hear some interesting ideas.

MOST IMPORTANT THOUGHTS:

What do you think of anything I said?

What other factors may play a role when thinking about the overall performance of these machines, and would these factors be more significant than these? (I'd prefer HW-related issues, but if they tie into ease of use etc., that's OK.)

Don't worry about going over my head... learn me something! That's what I'm here for.

Titanio · Sep 30, 2005

I'm sure there'll be much more said about this, but I'll just raise two small informational points for you:

scificube said:
Cell does not have access to the GDDR3 in the system.

I don't think this is true; as far as I was aware, both Cell and RSX have access to each other's memory pools.

The issue of bandwidth is an interesting one though. I'm also curious about how things will materially pan out on PS3, especially if HDR and MSAA can't be done together. That could significantly cut down on the framebuffer bw requirements and leave more left over for non-framebuffer tasks (texturing etc.) than X360, after you factor out CPU consumption.

Also on another point you made, some PC GPUs have ~50GB/s of bandwidth at the moment, and of course they don't have to share with the CPU.

scificube said:
I cannot find where I saw it (so if you wish to dismiss this, I understand), but it would appear programmers can use Nvidia's Cg to program vertex work for Cell's SPUs.

This is by no means confirmed, AFAIK. A lot of people have speculated that it would make a lot of sense, but nothing certain yet.

edit - a third point: X360 does have a southbridge, with two PCI Express lanes, 500MB/s up and 500MB/s down.

Lysander · Sep 30, 2005

Scifi, you are very vague about the difference between logical and hardware threads. I insist that the X2 CPU has 6 hardware threads. Can someone briefly define the basic hardware difference between the SPE and the PPE (DD2)?

Titanio · Sep 30, 2005

Lysander said:
Scifi, you are very vague about the difference between logical and hardware threads. I insist that the X2 CPU has 6 hardware threads. Can someone briefly define the basic hardware difference between the SPE and the PPE (DD2)?

Xenon's threads are hardware threads, but they share the core. The SPU has one thread at a time that has the SPU to itself.

one · Sep 30, 2005

scificube said:
What I would like to note is that Xenon has 3 cores while Cell has 8 active cores. What I have begun to focus on is that this means Xenon can handle 3 HW threads while Cell can handle 8 HW threads.

Cell can handle 9 HW threads and Xenon can handle 6 HW threads though context switching may be more costly in Xenon than Cell PPE when a core has more than 2 threads.

scificube · Sep 30, 2005

What cards have 50GB/s? I'm just curious. That's a HECK of a lot of bandwidth. I guess I haven't kept up well enough.

If what you say about Cell being able to access the GDDR3 is correct, then that certainly eases the bandwidth load all around the system. I am curious as to how this could be possible, though. FlexIO talks only to XDR, no? So could the crossbar be where the link is... that would suggest some customization there, right? Actually, I have RSX sitting on the other side of the crossbar where another Cell chip normally would, so there would definitely have to be some further customization.

Perhaps it's best to ask where you saw that Cell would be able to access the GDDR3, and maybe I could figure out the answers for myself. I also need to find that PDF that detailed how the crossbar setup worked.

I was just throwing the Cg thing out there, of course. If it's not the case, I still see it as an ease-of-use issue. The potential will still be there but may be lost in the effort to pull it out.

It is interesting to think about MSAA + HDR. If they can be done together, bandwidth is saved in not doing both at the same time, right? The penalty is that it takes more time on the processing end to get the job done, right? Any ideas as to which is more desirable?

I also wonder if Nvidia has paid attention to what Valve has done with its Lost Coast expansion. If their solution works with AA, performs well and gives pretty good results, perhaps it would be best if both Nvidia and ATI went in the direction of promoting HDR usage in this fashion. It will of course require Nvidia to swallow its pride and relent on one of its prized marketing lines.

scificube · Sep 30, 2005

Lysander said:
Scifi, you are very vague about the difference between logical and hardware threads. I insist that the X2 CPU has 6 hardware threads. Can someone briefly define the basic hardware difference between the SPE and the PPE (DD2)?

I only wanted to highlight the difference between threads that do and do not have to share resources. I've seen multiple threads running on a core simultaneously referred to as "logical" threads presented to the OS. I consider all threads to be HW threads, but some distinction must be made to describe things better, so I used that convention. The naming convention really isn't what is important to me. It is the resources available to each thread that I take note of and think are significant.

Did that help? I couldn't explain myself out of a paper bag, ya know.

scificube · Sep 30, 2005

one said:
Cell can handle 9 HW threads and Xenon can handle 6 HW threads though context switching may be more costly in Xenon than Cell PPE when a core has more than 2 threads.

Again, I reiterate that what the threads are called is not what I'm noting. If you ignore the naming convention, you'll see that I did not take anything away from Xenon; I did make it clear it could handle 6 threads.

I really did not address context switching in the manner you describe. I have no idea of the penalty for doing this on either Cell or Xenon. For 2 threads on a Xenon core, switches between them should be very fast. Switches still need to be done, though, and cannot take place until execution resources free up. Switching one of these 2 threads out for another from elsewhere will certainly be more costly, but I've no idea whether it is more or less costly on a Xenon core than on Cell's PPE. Given the similarities between these two cores, I would imagine the cost is close to equal. (Perhaps cache issues could provide some separation, though.)

Titanio · Sep 30, 2005

scificube said:
What cards have 50GB/s? I'm just curious. That's a HECK of a lot of bandwidth. I guess I haven't kept up well enough.

Sorry, that should have been more like ~40GB/s. The 7800 GTX starts at 38.4GB/s, different implementations might have more if the clockspeed has been bumped up. And of course, that's not shared with the CPU.

scificube said:
If what you say about Cell being able to access the GDDR3 is correct, then that certainly eases the bandwidth load all around the system. I am curious as to how this could be possible, though. FlexIO talks only to XDR, no? So could the crossbar be where the link is... that would suggest some customization there, right? Perhaps it's best to ask where you saw that Cell would be able to access the GDDR3, and maybe I could figure out the answers for myself.

I'll try to look it up to be sure; I'm second-guessing that now.

I thought it was mentioned in some interview somewhere..

edit - Kutaragi did confirm Cell can access GDDR3, see the end of this post.

scificube said:
It is interesting to think about MSAA + HDR. If they can be done together, bandwidth is saved in not doing both at the same time, right? The penalty is that it takes more time on the processing end to get the job done, right? Any ideas as to which is more desirable?

Not quite sure what you mean, but I meant that if it was not physically possible for RSX to do MSAA and HDR together, then framebuffer bandwidth requirements, vs say Xenos, will be a lot lower. Also if you factor in colour compression. So it still has to use main memory bandwidth for that, but the requirement should be a lot lower than it otherwise would be if RSX could do MSAA and HDR together, and you were using both (I think, at least!?).
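To put rough numbers on the framebuffer side of that argument: an FP16 RGBA render target uses 8 bytes per pixel versus 4 for 8-bit RGBA, and 4x MSAA stores 4 samples per pixel, so HDR with 4x MSAA moves about 8 times the colour data of plain LDR rendering. The overdraw and frame-rate values below are hypothetical, and the model ignores colour/Z compression, depth traffic and resolves, so it overstates real traffic.

```python
BYTES_PER_PIXEL = {"RGBA8": 4, "FP16": 8}  # LDR vs. FP16 HDR colour formats


def color_traffic_gbs(width: int, height: int, fmt: str, samples: int = 1,
                      fps: int = 60, overdraw: float = 3.0, blending: bool = True) -> float:
    """Rough colour-buffer traffic in GB/s.

    Every sample is written `overdraw` times per frame; with blending each write
    is a read-modify-write, doubling the traffic. No compression is modelled.
    """
    bytes_per_frame = width * height * samples * BYTES_PER_PIXEL[fmt] * overdraw
    if blending:
        bytes_per_frame *= 2
    return bytes_per_frame * fps / 1e9


ldr = color_traffic_gbs(1280, 720, "RGBA8")
hdr_msaa = color_traffic_gbs(1280, 720, "FP16", samples=4)
print(f"LDR, no AA: {ldr:.2f} GB/s; FP16 HDR + 4x MSAA: {hdr_msaa:.2f} GB/s "
      f"({hdr_msaa / ldr:.0f}x)")
```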

scificube said:
I also wonder if Nvidia has paid attention to what Valve has done with its Lost Coast expansion. If their solution works with AA, performs well and gives pretty good results, perhaps it would be best if both Nvidia and ATI went in the direction of promoting HDR usage in this fashion. It will of course require Nvidia to swallow its pride and relent on one of its prized marketing lines.

Hadn't heard what Valve were up to - any links to more info?

edit - Here's a Kutaragi quote re. Cell accessing GDDR3:

"CELL and RSX have close relationship and both can access the main memory and the VRAM transparently. CELL can access the VRAM just like the main memory, and RSX can use the main memory as a frame buffer. They are just separated for the main usage, and do not really have distinction."

nelg · Sep 30, 2005

Every time there is a debate about Cell vs. Xenon, I am reminded of the article that Deano linked to. If I understand it correctly, the thrust of the article is that even though you can take a problem or task and break it down to utilize MP, the final execution speed will still be determined by the slowest process. So while Cell's 9 processing elements may be 50% more than Xenon's 6, in the real world the difference may be of little consequence. Or, on the other hand, I could be talking out of my ass.
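The effect described here is usually formalised as Amdahl's law: if only a fraction p of the work can be spread across n threads, the speedup is 1 / ((1 - p) + p / n). The sketch compares 6 and 9 threads for a few illustrative values of p. It treats every hardware thread as equally capable, which is a big simplification for SPEs vs. the PPE vs. Xenon's cores.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: speedup over one worker when only part of the work parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


for p in (0.5, 0.8, 0.95):
    s6, s9 = amdahl_speedup(p, 6), amdahl_speedup(p, 9)
    print(f"p={p:.2f}: 6 threads {s6:.2f}x, 9 threads {s9:.2f}x, ratio {s9 / s6:.2f}")
```

The 9-thread advantage is small when much of the work is serial and grows as p approaches 1.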

scificube · Sep 30, 2005

Titanio said:
Sorry, that should have been more like ~40GB/s. The 7800 GTX starts at 38.4GB/s, different implementations might have more if the clockspeed has been bumped up. And of course, that's not shared with the CPU.

Thanks. Well, now I know, and knowing... god, that show sucked.

Titanio said:
Not quite sure what you mean, but I meant that if it was not physically possible for RSX to do MSAA and HDR together, then framebuffer bandwidth requirements, vs say Xenos, will be a lot lower. Also if you factor in colour compression. So it still has to use main memory bandwidth for that, but the requirement should be a lot lower than it otherwise would be if RSX could do MSAA and HDR together, and you were using both (I think, at least!?).

It is known HDR is possible on the PS3. It's also a given that MSAA is as well. I was thinking that if the tasks could not be done simultaneously, then they must be done apart from one another. I doubt devs are going to give up on having HDR and AA together (perhaps not at the same time) in their PS3 games. I was thinking, I think in error, that this would save bandwidth and only cost extra processing time. Bandwidth should still be consumed by HDR and MSAA just the same if they are done separately... I'm reasoning logically on this, as I don't know for sure.

Titanio said:
Hadn't heard what Valve were up to - any links to more info?

I'll try to find you a quote. Apparently Valve has found a way for SM2.0 HW to do HDR and MSAA at the same time, and with good performance to boot. They supposedly tested four different methods and came up with the one they are going to use for the Lost Coast level. If SM2.0 HW can handle it, I've little doubt both Xenos and RSX will blast through HDR+MSAA using Valve's method... if Valve feels like sharing how they got it done. I think Humus is onto them anyway with his own experiments. Give me a moment and I'll try to find you a link.

Titanio said:
edit - Here's a Kutaragi quote re. Cell accessing GDDR3:

"CELL and RSX have close relationship and both can access the main memory and the VRAM transparently. CELL can access the VRAM just like the main memory, and RSX can use the main memory as a frame buffer. They are just separated for the main usage, and do not really have distinction."

KK said it? Well... it's better than nothing. I kid. I doubt he'd say something like this when devs would tear him a new one for lying to them about something so significant.

Titanio · Sep 30, 2005

nelg said:
Every time there is a debate about Cell vs. Xenon, I am reminded of the article that Deano linked to. If I understand it correctly, the thrust of the article is that even though you can take a problem or task and break it down to utilize MP, the final execution speed will still be determined by the slowest process. So while Cell's 9 processing elements may be 50% more than Xenon's 6, in the real world the difference may be of little consequence. Or, on the other hand, I could be talking out of my ass.

If you require things to be finished at the same time, you'd break your "slow" task up further or get your other tasks to do more. For example, if a particular task was the "slow" task on chip X, you might have the opportunity to split it up in a number of different ways and execute those parts concurrently on chip Y, thus ridding yourself of that bottleneck. Or, while you're waiting for a "slow" task to finish on one core, you could spend your other cores' "free" time on a multi-frame task, for example, or move them on to wholly independent tasks of the next frame, if you couldn't simply increase the amount they were doing on tasks for the current frame (but heh, I'm sure you could).

In other words, I wouldn't worry about the chip going underused just because there is a relatively slow task/thread.
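That suggestion can be sketched with a simple longest-job-first scheduler: the frame ends when the busiest thread finishes, so one long job caps the frame time until it is split. The job durations are hypothetical, and the split assumes the job divides perfectly with no synchronisation cost.

```python
import heapq


def frame_time(job_ms: list[float], workers: int) -> float:
    """Makespan of a longest-job-first greedy schedule, in ms.

    Each job runs on exactly one worker; the frame is finished when the
    busiest worker finishes.
    """
    loads = [0.0] * workers  # a list of equal values is already a valid heap
    for job in sorted(job_ms, reverse=True):
        lightest = heapq.heappop(loads)  # give the next job to the least-loaded worker
        heapq.heappush(loads, lightest + job)
    return max(loads)


def split_job(job_ms: list[float], index: int, parts: int) -> list[float]:
    """Replace job `index` with `parts` equal pieces (assumes a perfect split)."""
    piece = job_ms[index] / parts
    return job_ms[:index] + [piece] * parts + job_ms[index + 1:]


jobs = [12.0, 4.0, 4.0, 3.0, 3.0, 2.0]  # hypothetical per-frame jobs, ms
print(frame_time(jobs, workers=6))                   # 12.0: capped by the long job
print(frame_time(split_job(jobs, 0, 4), workers=6))  # 6.0 after splitting it four ways
```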

rendezvous · Sep 30, 2005

scificube said:
I only wanted to highlight the difference between threads that do and do not have to share resources. I've seen multiple threads running on a core simultaneously referred to as "logical" threads presented to the OS. I consider all threads to be HW threads, but some distinction must be made to describe things better, so I used that convention. The naming convention really isn't what is important to me. It is the resources available to each thread that I take note of and think are significant.

Did that help? I couldn't explain myself out of a paper bag, ya know.

What resources?
All threads on a system are sharing the memory, which is a system resource.
They are also sharing the execution units on the CPU, even on non-multithreaded processors, by time slicing.

One way of saying what I think you want to say is that the PPE and the cores in the Xenon have multiple (two) hardware contexts for multithreading.

And the equivalent to the capabilities in the PPE (and I presume the Xenon cores) in the PC world is not Hyper-Threading or SMT (Simultaneous Multi-Threading).
The difference is that in the case of the PPE you have fine-grained multithreading, where instructions from only one thread can be issued each clock cycle, whereas with SMT you are able to issue and execute from both threads simultaneously.
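A toy per-cycle comparison of the two schemes on a 2-issue core. Each list gives how many instructions a thread has ready in each cycle (hypothetical numbers). Single-threaded issue only sees thread A; fine-grained multithreading issues from one thread per cycle; SMT may fill the issue slots from both. Picking the better thread each cycle in the fine-grained case is an optimistic simplification, not how any particular chip arbitrates.

```python
def issued(ilp_a: list[int], ilp_b: list[int], width: int, policy: str) -> int:
    """Count instructions issued over a run of cycles on a 2-context core.

    ilp_a / ilp_b: instructions each thread has ready in each cycle (0 = stalled).
    policy: "single" - only thread A exists,
            "fine"   - one thread issues per cycle (fine-grained multithreading),
            "smt"    - both threads may issue in the same cycle.
    """
    total = 0
    for a, b in zip(ilp_a, ilp_b):
        if policy == "single":
            total += min(a, width)
        elif policy == "fine":
            total += min(max(a, b), width)  # optimistic: pick the thread with more ready
        elif policy == "smt":
            total += min(a + b, width)
        else:
            raise ValueError(f"unknown policy: {policy}")
    return total


# Hypothetical ready-instruction patterns over 8 cycles for a 2-issue core.
ilp_a = [2, 1, 0, 0, 2, 1, 0, 1]
ilp_b = [1, 0, 2, 1, 0, 1, 2, 0]
for policy in ("single", "fine", "smt"):
    print(policy, issued(ilp_a, ilp_b, width=2, policy=policy))  # 7, 12, 13
```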

The second pitfall in the first section seems to have a typo, so please forgive me if I misinterpreted it.

2. SPUs in Cell can only task on execution element at a time putting them at a disadvantage.

The SPU is able to execute on two execution units at a time.

DeanoC · Sep 30, 2005

To be blunt:

Wrong in lots of ways. But any more details would get me in trouble.

scificube · Sep 30, 2005

http://www.bit-tech.net/gaming/2005/09/14/lost_coast_screens/2.html

Here you go Titanio

The statement that they have HDR+MSAA working with SM2.0 HW is at the bottom of the page.

The whole article is a good read if you're interested, but be warned...it's full of Lost Coast spoilers in the form of HDR/no HDR comparison shots.

Titanio · Sep 30, 2005

scificube said:
It is known HDR is possible on the PS3. It's also a given that MSAA is as well. I was thinking that if the tasks could not be done simultaneously, then they must be done apart from one another. I doubt devs are going to give up on having HDR and AA together (perhaps not at the same time) in their PS3 games. I was thinking, I think in error, that this would save bandwidth and only cost extra processing time. Bandwidth should still be consumed by HDR and MSAA just the same if they are done separately... I'm reasoning logically on this, as I don't know for sure.

I don't think you can just do them separately. There might be other ways, as per your Valve example, but I'm sure there are catches. I'd be very interested to hear more about Valve's work on that, though.

edit - thanks for the link. Sounds interesting, pity there isn't more detail!

scificube · Sep 30, 2005

DeanoC said:
To be blunt:

Wrong in lots of ways. But any more details would get me in trouble.

That's not fair. Just use a word. Threading, bandwidth... whatever.

rendezvous · Sep 30, 2005

DeanoC said:
To be blunt:

Wrong in lots of ways. But any more details would get me in trouble.

I knew I shouldn't trust MPR.

BlueTsunami · Sep 30, 2005

scificube said:
That's not fair. Just use a word. Threading, bandwidth... whatever.

It sucks, I know. I also hate that I can read through your post, knowing some or a lot of it may not be right (as DeanoC stated, a lot), and not have it corrected.

scificube · Sep 30, 2005

rendezvous said:
What resources?
All threads on a system are sharing the memory, which is a system resource.
They are also sharing the execution units on the CPU, even on non-multithreaded processors, by time slicing.

I am only talking about execution units...FPU, VMX, etc.

rendezvous said:
One way of saying what I think you want to say is that the PPE and the cores in the Xenon have multiple (two) hardware contexts for multithreading.

Sounds good to me.

rendezvous said:
And the equivalent to the capabilities in the PPE (and I presume the Xenon cores) in the PC world is not Hyper-Threading or SMT (Simultaneous Multi-Threading).
The difference is that in the case of the PPE you have fine-grained multithreading, where instructions from only one thread can be issued each clock cycle, whereas with SMT you are able to issue and execute from both threads simultaneously.

I've seen discussions to this effect. Real SMT requires doubling the resources... might as well toss in another core, is what I gathered. From the perspective of the hardware contexts, I am looking at it like this: one thread is executing, and if it hangs up on something, a quick switch can be performed so that the other thread can execute, and the threads flip-flop back and forth at a really fast rate. This is what I thought, but someone suggested the other approach I laid out above was the more correct way of looking at this, so I went with what they said.

edit:
I think I get it... my friend is right about how HT works. Rendezvous, I missed it the first time. HT is not the correct way to look at the PPE and Xenon cores. Well... that was painful.
end edit:

I actually agree with your thinking but the guy is often so sharp I was scared to go against what he said.

In truth, a lot of what I said is nullified if in fact I am correct.

rendezvous said:
The second pitfall in the first section seems to have a typo, so please forgive me if I misinterpreted it.

The SPU is able to execute on two execution units at a time.

Oops... I knew that. Actually, that was a bad description of a different idea I was trying to convey. When talking again with friends, they noted that SPUs only work on one thread at a time, which puts them at a disadvantage. I had too many execution units running through my head.

I should add to the thinking about that pitfall, considering things again: the SPUs may in fact be at a disadvantage. When they hang, there is no other context available to switch to, and they must wait for the PPE to task them with something else. Xenon's cores, however, have another HW context to switch to, so in this respect they would have a good advantage.
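A toy simulation of that point: each thread computes for `run` cycles, then stalls for `stall` cycles (say, waiting on memory). With one context the core idles during every stall; with two, it switches to the other thread (switch cost assumed to be zero). The parameters are hypothetical, and the model ignores software techniques such as overlapping DMA with computation, which SPE code typically relies on to avoid exactly these stalls.

```python
def utilization(contexts: int, run: int, stall: int, cycles: int = 10_000) -> float:
    """Fraction of cycles in which a core does useful work.

    Each thread computes for `run` cycles and then stalls for `stall` cycles,
    forever. The core keeps running its current thread until it stalls, then
    switches (at zero cost) to any other ready context.
    """
    remaining_run = [run] * contexts  # compute cycles left in each thread's burst
    stall_left = [0] * contexts       # stall cycles left; 0 means ready
    current = 0
    busy = 0
    for _ in range(cycles):
        if stall_left[current] > 0:
            # Switch on stall: pick any ready context, else stay put and idle.
            current = next((t for t in range(contexts) if stall_left[t] == 0), current)
        just_stalled = None
        if stall_left[current] == 0:
            busy += 1
            remaining_run[current] -= 1
            if remaining_run[current] == 0:
                remaining_run[current] = run
                stall_left[current] = stall
                just_stalled = current
        # Stalled threads (other than one that stalled this cycle) count down.
        for t in range(contexts):
            if t != just_stalled and stall_left[t] > 0:
                stall_left[t] -= 1
    return busy / cycles


for contexts in (1, 2):
    print(f"{contexts} context(s): {utilization(contexts, run=3, stall=2):.2f} busy")
```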

-------------------------------------------

Did I find it DeanoC? anybody...c'mon someone tell me...
