Will Microsoft trump CELL by using Proximity Communication in XB720 CPU?

In a big CPU you get defects, which stop bits of the processor from working. If PS4's Cell is 300 mm^2 again, they'll be taking a hit on chips. I've no idea how many transistors we'll be at by then, but let's say for the sake of argument that 4 PPEs and 32 SPEs fit on 300 mm^2. Yields of 100%-working chips will be as low then as they are for 1:8 Cells now. If Sony are willing to lose up to 4 PPEs, yields will be much improved. Alternatively, they could go with a smaller chip with better yields, such as a 2:16 Cell at 150 mm^2, or even four 1:8 Cells networked together.
I think you mean SPEs, and while I see what you're saying, it's all speculation at this point. A 4x32 configuration may produce excellent yields in 2012 or whenever production begins.
 
If Sony release a PS4, I don't think they'll take a risk again, given all the negative responses to Cell now... Once programmers learn how to run Cell at full performance, probably around 2009-2010, I don't see any reason to move to another CPU... Here is the Cell BE roadmap:

[Image: Cell BE roadmap (16903bl1.jpg)]



3 PPEs + 32 SPEs by 2010... probably powerful enough to compete with its rivals (paired with a powerful GPU designed in from the beginning, not a last-second move like RSX)... My thoughts are based on Cell's current success, but if it fails it'll be dramatic for Sony, IMO...


Should be 2 PPEs + 32 SPEs I believe.

http://forum.beyond3d.com/showthread.php?t=36335

I think Sony is also working on some sort of optical interconnect, based on papers people posted, but I can't find them now (this part might be hype, though).
 
I think you mean SPEs, and while I see what you're saying, it's all speculation at this point. A 4x32 configuration may produce excellent yields in 2012 or whenever production begins.
Yes, SPEs :oops: And the 4:32 was only a figure. The real figure to note is the silicon size. 300 mm^2 has enough defects to warrant a disabled SPE. Going forwards, whatever configuration Cell is, 4:32 or 8:64, there'll be defects at 300 mm^2 which knock out bits of the CPU.

However, they'd probably not need to disable more than a couple of SPEs. If we take one defect per 300 mm^2 on average, that takes out an SPE currently, so Sony have to go with 7 working SPEs. In 2012 there'll still be one defect per 300 mm^2, which would still be one SPE. With many cores, yields actually should improve: where once a defect trashed the ALU or MMU and took out a whole CPU, now it takes out a core, and soon it'll just be disabling one core out of loads.
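To put rough numbers on that, here's a minimal sketch of the usual Poisson defect model. The one-defect-per-300-mm^2 figure and the assumption that every defect kills exactly one SPE are purely illustrative, not real fab data; the point is just how fast the sellable-die rate climbs once you allow spares.

[code]
/*
 * Rough Poisson yield sketch (illustrative only).
 * Assumes an average of `lambda` random defects per die and that every
 * defect disables exactly one SPE; real defects can also hit the PPE,
 * cache or wiring, so treat the numbers as a shape, not a prediction.
 * A die is "sellable" if no more than `spares` SPEs are lost.
 */
#include <math.h>
#include <stdio.h>

static double poisson(double lambda, int k)
{
    /* P(exactly k defects) = e^-lambda * lambda^k / k! */
    double p = exp(-lambda);
    for (int i = 1; i <= k; i++)
        p *= lambda / i;
    return p;
}

static double yield_with_spares(double lambda, int spares)
{
    double y = 0.0;
    for (int k = 0; k <= spares; k++)
        y += poisson(lambda, k);
    return y;
}

int main(void)
{
    double lambda = 1.0;  /* ~1 defect per 300 mm^2 die, purely illustrative */

    printf("0 spares: %.1f%%\n", 100.0 * yield_with_spares(lambda, 0)); /* ~36.8% */
    printf("1 spare : %.1f%%\n", 100.0 * yield_with_spares(lambda, 1)); /* ~73.6% */
    printf("2 spares: %.1f%%\n", 100.0 * yield_with_spares(lambda, 2)); /* ~92.0% */
    return 0;
}
[/code]

Under those assumptions a single spare SPE roughly doubles the number of sellable dies, which is essentially the 7-of-8 trade-off being discussed.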
 
That's true to an extent. You can definitely make three OoOE cores fit on one die if you must. Whether it can reach 3.2 GHz, have decent FP power and decent integer power, have reasonable power consumption, and can be fabbed by a non-dedicated fab is another question. The latter was the killer, and IBM was clearly in no position to make such a chip. At some point, probably early in the development cycle, it became clear that three in-order cores were much more practical than trying to force three OoOE cores to the same level of performance.

One need only look to the Pentium 4 for evidence that an OoO processor can reach 3.2+ GHz.

Since the Northwood core (at 130nm) had a similar number of pipe stages to the in-order PPE, and internally could issue 6 micro-ops a cycle, we can assume a narrower, more conservative OoO core at 90nm could reach that range.

An OoO core would have likely taken longer to develop, which ran counter to Microsoft's desire to have a longer lead time over the PS3.

The amount of money Microsoft was willing to pay per-chip is also a likely factor. An OoO variant of Xenon would have been bigger, though a conservative design would not be massively bigger.
If Microsoft could have stomached a larger die (possibly still smaller than Cell) prior to the 65nm shrink, as Sony did, it certainly sounds feasible to me.

Microsoft's internal and schedule constraints aside, I don't think there was a technical reason why such a chip couldn't be manufactured.
 
One need only look to the Pentium 4 for evidence that an OoO processor can reach 3.2+ GHz.

Since the Northwood core (at 130nm) had a similar number of pipe stages to the in-order PPE, and internally could issue 6 micro-ops a cycle, we can assume a narrower, more conservative OoO core at 90nm could reach that range.

Except that you can't even mention "conservative" and Northwood in the same sentence. :p The P4 line was an engineering freakshow and might as well have been put together by magic. Intel spent billions and billions of dollars to get the P4 line to the clock speeds they did, until they finally hit the genuine clock-speed brick wall in Prescott. IBM can't do that, or even think about doing that. They don't have a tiny fraction of the resources Intel threw at the P4 line.

An OoO core would have likely taken longer to develop, which ran counter to Microsoft's desire to have a longer lead time over the PS3.

The amount of money Microsoft was willing to pay per-chip is also a likely factor. An OoO variant of Xenon would have been bigger, though a conservative design would not be massively bigger.
If Microsoft could have stomached a larger die (possibly still smaller than Cell) prior to the 65nm shrink, as Sony did, it certainly sounds feasible to me.

Microsoft's internal and schedule constraints aside, I don't think there was a technical reason why such a chip couldn't be manufactured.

Totally agreed, except for the "conservative" OoO CPU. Not going OoO was in their best interest for the Xbox 360.
 
One need only look to the Pentium 4 for evidence that an OoO processor can reach 3.2+ GHz.

The amount of money Microsoft was willing to pay per-chip is also a likely factor. An OoO variant of Xenon would have been bigger, though a conservative design would not be massively bigger.
Is there much point to conservative OoO, though? If it's slimmed down, its gains will be too, perhaps to the point of not benefitting much. We hear Xenon hasn't the greatest implementation of features like branch prediction. To create a proper, well-rounded OoO processor that benefits from the OoO features, you'd be looking at bigger cores. I'd be surprised at more than dual-core in that case, which, if games do become vector-heavy, would put the CPU at a considerable disadvantage.
 
There are certain aspects of code, including game code, which require single-threaded performance and cannot benefit greatly from multiple threads (so I understand, anyway). I'm not saying "Cell sucks", I'm saying it's not better at everything. When you require single-threaded performance, Cell will be poor compared to a modern x86 core. It's no good saying the devs wouldn't code for single-threaded performance if that particular piece of code cannot be spread across multiple threads.
I don't think we're talking about the same thing here. I'm saying that if you're writing a "software system" for a multi-core architecture, then you'd write it using multi-threading to take any kind of advantage of the performance of the chip (even to a reasonable extent). And since CPUs in both the PC and console space are rapidly going multi-core, gone are the days when anyone will choose to write single-threaded software systems in the hope of taking advantage of the architecture.

Sure there are "algorithms" and "computations" which can't benefit from parallelisation due to things like a necessary high degree of synchronisation and cross-thread communication (which would likely slow down the process rather than offer any kinds of performance gains) but there's different ways to skin a cat and if your smart and you're writing a multi-threaded software system which contains such processes, you'd relegate these processes into a single-thread and design the software to perform as little cross-thread communication as possible.. It all depends on how you decide to break down the components of the system in to seperate threads.. Some processes will need to communicate alot, some very little and some parts of the system can very effectively work completely independantly from the rest of the code..
But as I said before, no software system would ever benefit from single-threaded execution on a multi-core system without at least some form of labour devision [onto seperate threads] (There will always be some portion of computation that can benefit from parallelization)..
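As a concrete illustration of that last point, here's a minimal sketch using plain POSIX threads (not any particular engine or console API): the unparallelisable subsystem lives on its own thread, and the rest of the engine hands it work in whole batches, so the two sides synchronise once per batch instead of once per item. The subsystem, batch size and workload are all invented for the example.

[code]
/*
 * Keep the inherently serial work on one thread and batch the
 * communication with it.  Illustrative sketch only; the "items" and
 * batch size are placeholders, not a real engine's data.
 */
#include <pthread.h>
#include <stdio.h>

#define BATCH 64

static struct shared_box {
    pthread_mutex_t lock;
    pthread_cond_t  ready;    /* signalled when a full batch is waiting  */
    pthread_cond_t  drained;  /* signalled when the batch has been taken */
    int             items[BATCH];
    int             count;
    int             done;
} box = {
    PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_COND_INITIALIZER,
    PTHREAD_COND_INITIALIZER
};

/* The serial subsystem: wakes once per batch, not once per item. */
static void *serial_subsystem(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&box.lock);
        while (box.count == 0 && !box.done)
            pthread_cond_wait(&box.ready, &box.lock);
        if (box.count == 0 && box.done) {
            pthread_mutex_unlock(&box.lock);
            return NULL;
        }
        int local[BATCH];
        int n = box.count;
        for (int i = 0; i < n; i++)         /* copy the batch out, then */
            local[i] = box.items[i];        /* release the lock quickly */
        box.count = 0;
        pthread_cond_signal(&box.drained);
        pthread_mutex_unlock(&box.lock);

        long sum = 0;                       /* stand-in for the real work */
        for (int i = 0; i < n; i++)
            sum += local[i];
        printf("serial subsystem handled %d items (sum %ld)\n", n, sum);
    }
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, serial_subsystem, NULL);

    for (int frame = 0; frame < 4; frame++) {
        /* The rest of the frame runs elsewhere in parallel; we only touch
           the shared box once per frame, with a whole batch of work. */
        pthread_mutex_lock(&box.lock);
        while (box.count != 0)              /* previous batch not taken yet */
            pthread_cond_wait(&box.drained, &box.lock);
        for (int i = 0; i < BATCH; i++)
            box.items[i] = frame * BATCH + i;
        box.count = BATCH;
        pthread_cond_signal(&box.ready);
        pthread_mutex_unlock(&box.lock);
    }

    pthread_mutex_lock(&box.lock);
    while (box.count != 0)
        pthread_cond_wait(&box.drained, &box.lock);
    box.done = 1;
    pthread_cond_signal(&box.ready);
    pthread_mutex_unlock(&box.lock);

    pthread_join(tid, NULL);
    return 0;
}
[/code]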

Sure, and I don't dispute that. But your post read to me as though you were saying any code can be rewritten to take advantage of Cell's superior vector performance, and thus run better on Cell than an x86, regardless of what that code is.

Well, that's simply not what I was trying to say.

I'm saying that some code simply won't benefit all that much, and that's where Cell's advantages evaporate. It may still be comparable or even superior in other areas, but the point is that it's not "an order of magnitude" better in everything "as long as you optimize your code". Some code just won't work all that well on Cell (relatively speaking). The big question is whether games code fits into this category; not GPU-style graphics code, but the type of code you can't run on a GPU and need a separate CPU for.

I don't dispute that some sections of code would be difficult to parallelise. For example, it would be incredibly difficult to get good performance from a game engine where the core logic and physics resided on separate threads (since a very high degree of synchronisation between the two is necessary), but this doesn't stop you from finding other elements of the overall system which can be split off into separate threads, allowing you to pick up your performance gains that way. Rendering, IO processing, networking etc. can all be split off and processed independently, and some aspects don't necessarily even have to be processed every frame. There will also be areas within these which can be broken down even further, and even more performance can be gained from doing this (an example would be having several threads for rendering).
It's very rare (if not impossible) to find a situation where you couldn't gain at least some kind of performance from multi-threading in such large, complex, real-time systems. Like I mentioned earlier, you just need to be smart about how you go about it.
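To make that kind of decomposition concrete, here's a minimal sketch (again plain POSIX threads, nothing console-specific): each subsystem runs on its own thread and only synchronises with the main loop at frame boundaries via barriers. The subsystem names and the per-frame "work" are placeholders.

[code]
/*
 * Minimal subsystem-per-thread sketch using POSIX threads.
 * Each subsystem runs on its own thread and synchronises with the main
 * loop only at frame boundaries, via two barriers.  The subsystem names
 * and the per-frame "work" are placeholders, not a real engine.
 */
#include <pthread.h>
#include <stdio.h>

#define NUM_SUBSYSTEMS 3
#define NUM_FRAMES     3

static pthread_barrier_t frame_start, frame_end;

static void *subsystem(void *arg)
{
    const char *name = arg;
    for (int frame = 0; frame < NUM_FRAMES; frame++) {
        pthread_barrier_wait(&frame_start);            /* main loop says go */
        printf("frame %d: %s update\n", frame, name);  /* the actual work   */
        pthread_barrier_wait(&frame_end);              /* hand results back */
    }
    return NULL;
}

int main(void)
{
    const char *names[NUM_SUBSYSTEMS] = { "physics", "animation", "audio" };
    pthread_t tids[NUM_SUBSYSTEMS];

    pthread_barrier_init(&frame_start, NULL, NUM_SUBSYSTEMS + 1);
    pthread_barrier_init(&frame_end,   NULL, NUM_SUBSYSTEMS + 1);

    for (int i = 0; i < NUM_SUBSYSTEMS; i++)
        pthread_create(&tids[i], NULL, subsystem, (void *)names[i]);

    for (int frame = 0; frame < NUM_FRAMES; frame++) {
        /* main thread: input, game logic, then release the workers */
        pthread_barrier_wait(&frame_start);
        /* ...subsystems run in parallel here... */
        pthread_barrier_wait(&frame_end);
        /* consume results, submit rendering, etc. */
    }

    for (int i = 0; i < NUM_SUBSYSTEMS; i++)
        pthread_join(tids[i], NULL);
    pthread_barrier_destroy(&frame_start);
    pthread_barrier_destroy(&frame_end);
    return 0;
}
[/code]

The design point is that the expensive synchronisation happens only twice per frame, regardless of how much work each subsystem does in between.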

Of course it couldn't, but it doesn't change the fact that gaming performance sells a lot of CPUs at all levels, and not just because of gaming, but because of the mind share a CPU gets from being perceived as faster, as measured in the majority of benchmarks sites publish when a new CPU launches. If it were as simple as "add 20% more FP performance, add 20% more gaming performance", then I'm sure AMD or Intel would be jumping at it. Or we could just look at the classic example of a 3 GHz Celeron vs a 2 GHz A64: the Celeron has the higher peak FLOPS yet much lower games performance. Clearly there are other, more important factors at play.
Again, I don't know why you're making the point that FLOPS performance is all I think matters, when I have clearly never said anything of the sort.
I did, however, attempt to put across that with regard to Cell, aside from the high FLOPS figure, the processor overall is actually very good at much more general computing of the kind targeted towards game systems, mainly because of the scope for parallelisation and also because it's bloody fast. The FLOPS metric is meaningless without additional information to provide context. In Cell's case we can see how good the performance is, not because of FLOPS but because it's been tested, put through its paces and is already showing incredibly positive performance in game-related processing (with respect to other CPUs), whilst also having the scope and capacity for an incredible degree of optimisation (e.g. more (and more efficient) micro- and macro-parallelism, greater use of low-level SPE intrinsics, etc.) above and beyond what has been done at such an early stage.

I don't think anyone can deny that..
 
I'm not sure why people still throw their toys out of the cot when it comes to flops and Cell. It's been shown in a number of contexts now that relative floating-point performance and relative performance are two different things, in a good way for Cell. The key to Cell's performance advantages seems to have as much if not more to do with memory and bandwidth - its memory model - as with computational horsepower. We've seen numerous examples now where Cell has outperformed other processors by a larger factor than the gap in paper floating-point capability would suggest, in some cases by a much larger factor, and even in workloads we initially thought would simply suck on Cell and that appear to have little relevance to FP capability. Equally, a workload might not get anywhere close to using Cell's theoretical floating-point capability and still significantly outperform implementations on other processors.
 
The key to Cell's performance advantages seems to have as much if not more to do with memory and bandwidth - its memory model - as with computational horsepower.
Yep. Flops stuck as the defining concept, perhaps because it was a metric people were already familiar with. But Cell's main strength is probably data throughput. Whether you're doing Float maths or Int maths, or no maths at all, it can mash through data. Performance computing is all about getting the data into the CPU for it to crunch, and Cell's very good at that.
 
I don't dispute that some sections of code would be difficult to parallelise. For example, it would be incredibly difficult to get good performance from a game engine where the core logic and physics resided on separate threads (since a very high degree of synchronisation between the two is necessary), but this doesn't stop you from finding other elements of the overall system which can be split off into separate threads, allowing you to pick up your performance gains that way. Rendering, IO processing, networking etc. can all be split off and processed independently, and some aspects don't necessarily even have to be processed every frame. There will also be areas within these which can be broken down even further, and even more performance can be gained from doing this (an example would be having several threads for rendering).
It's very rare (if not impossible) to find a situation where you couldn't gain at least some kind of performance from multi-threading in such large, complex, real-time systems. Like I mentioned earlier, you just need to be smart about how you go about it.

I think this ties in nicely with what I'm saying. I agree that yes, you can always make use of your extra cores and threads in some way; you would of course never rely on one single thread in a multithreaded architecture. However, the point I'm making is that there may be one thread which cannot be further split out and which is a bottleneck to the rest of the game, i.e. you're only as fast as your slowest thread. Yes, you can save time by running other, less critical threads in parallel, but if they aren't heavy-lifting threads then you may not gain much, and you will still be constrained by your primary thread.

For example, if you have one primary thread that limits the overall speed of the game and then a bunch of others for things like audio, networking, physics etc., the primary thread will run faster on a single core of the AX2 than it will on Cell's PPE. So as long as the other processes aren't too much for the second core to handle, to the point where the primary thread is held back, the AX2 will be faster overall. I don't know how common that situation would be in a games environment where CPU rendering isn't a factor, but I'm just pointing it out as an example where single-threaded performance is important.

I discount rendering, btw, because it can be assumed that whatever Cell is doing in that area could simply be transferred to a more powerful GPU in a different system. So I'm just comparing performance in the areas where the CPU is the bottleneck.
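The "only as fast as your slowest thread" argument is essentially Amdahl's law, and a toy calculation shows how hard that cap bites. The 40% serial fraction below is a made-up figure for illustration.

[code]
/*
 * Toy Amdahl's-law calculation: if one "primary" thread's work cannot
 * be split any further, total speedup is capped by its share of the
 * frame, no matter how many cores run everything else.
 */
#include <stdio.h>

static double speedup(double serial_fraction, int cores)
{
    /* classic Amdahl: S = 1 / (s + (1 - s) / N) */
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores);
}

int main(void)
{
    double serial = 0.40;  /* assume 40% of the frame is stuck on one thread */

    for (int cores = 1; cores <= 8; cores *= 2)
        printf("%d core(s) -> %.2fx\n", cores, speedup(serial, cores));
    return 0;
}
[/code]

Even with eight cores the example only gets about a 2.1x speedup, which is exactly the scenario being described: the primary thread dominates.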

In Cell's case we can see how good the performance is, not because of FLOPS but because it's been tested, put through its paces and is already showing incredibly positive performance in game-related processing (with respect to other CPUs), whilst also having the scope and capacity for an incredible degree of optimisation (e.g. more (and more efficient) micro- and macro-parallelism, greater use of low-level SPE intrinsics, etc.) above and beyond what has been done at such an early stage.

I don't think anyone can deny that..

I'm not sure that's true. Yes, we have seen Cell excel in some applications, but none of them have been applicable to a real-world gaming scenario as far as I'm aware (again discounting graphics rendering, which is not a CPU bottleneck in other systems). Cell seems to be very good at some scientific applications, especially those that require massive vector performance or that fit into the LS, but as a CPU for a games system I don't think we have seen much evidence of its superiority yet.

That's not to say I don't expect to see any demonstrations, just that I don't think we have seen any yet. If it's going to happen, then I guess the areas where we will see it will be AI and physics beyond what any other system is capable of. What else? Perhaps better animation and more world activity?

If Cell reaches peaks in any or all of those areas that are simply not possible on a modern dual-core x86, then I would consider that compelling evidence. But then the question moves on to 4- and 8-core systems. How long before anything Cell can do game-wise, from a non-rendering perspective, can be done on a multicore x86?
 
The current disabled SPE won't be an issue going forward, though redundancy for yields on massively multicore chips will likely be a reality, and the same goes for GPUs.

Ahh, I'm sorry, I meant the PS3, not PS4. I can't edit my post to reflect that. A PS3 with four Cells and one RSX vs. a 360 with four Xenoses and one Xenon.

I'm not sure of the direction Sony will go with the PS4. I will say, however, that I highly doubt they would put 4 CPUs in the system. They would rather just add a more powerful Cell than add 4.

I really don't understand all this focus on CPUs, though. When we're talking about next-gen consoles after this gen, I assume we are in the era of 1080p rendering. Most of the focus should be on the GPU, IMO.
 
However, the point I'm making is that there may be one thread which cannot be further split out and which is a bottleneck to the rest of the game, i.e. you're only as fast as your slowest thread. Yes, you can save time by running other, less critical threads in parallel, but if they aren't heavy-lifting threads then you may not gain much, and you will still be constrained by your primary thread.
Though true in theory, I'd have to wonder about that single thread and the whole game-engine design. There's a great deal that can be split off into separate 'modules'. If you think about it, animation, physics, AI, audio, etc. can be moved onto separate threads. What that leaves is control code: reading IO, main loops, UI, etc. None of that should be too demanding. Now, if you are unable to separate your AI or animation from the main loop, that single thread will dominate, but that strikes me as a design issue.

These super-strength monolithic cores are taxed on the components that can be separated onto other processors. The moment you remove physics, AI and animation from the dual-core Pentium, it really hasn't got a great deal of work to do, and its OoO and issue width and such just aren't much use.
 
I really don't understand all this focus on CPUs, though. When we're talking about next-gen consoles after this gen, I assume we are in the era of 1080p rendering. Most of the focus should be on the GPU, IMO.

Depends on the roles the GPU can/will play. For pure rendering, if we see smaller returns for greater power, the focus would presumably shift to other areas in order to push things forward and differentiate games. We arguably have much further to go in areas other than rendering, so the GPU's importance in that light might depend on how many roles it can adopt besides 'just' doing graphics.

We're already seeing some traditionally graphics-focussed developers (e.g. Carmack) come out and say that we're getting close to a point where technical competency versus artistic competency is becoming less and less important..that it will become more and more difficult to wow based on new rendering techniques alone. In 5 years time we may well be at a point where the big leaps are no longer to be made in graphics rendering. That's not to say improvement cannot still be made past that point, just that it may take a backseat to progress elsewhere (we're arguably still at an infancy stage with things like AI for example, and we still have some ways to go with physics etc.).

I think we'll already be at a point with this generation of hardware where the rendering quality will be so high that if you cannot back that up with believable behaviour (be it environmental, physical, character etc.), then you have to wonder what the point of making things just look realistic is.
 
Depends on the roles the GPU can/will play. For pure rendering, if we see smaller returns for greater power, the focus would presumably shift to other areas in order to push things forward and differentiate games. We arguably have much further to go in areas other than rendering, so the GPU's importance in that light might depend on how many roles it can adopt besides 'just' doing graphics.

We're already seeing some traditionally graphics-focussed developers (e.g. Carmack) come out and say that we're getting close to a point where technical competency versus artistic competency is becoming less and less important..that it will become more and more difficult to wow based on new rendering techniques alone. In 5 years time we may well be at a point where the big leaps are no longer to be made in graphics rendering. That's not to say improvement cannot still be made past that point, just that it may take a backseat to progress elsewhere (we're arguably still at an infancy stage with things like AI for example, and we still have some ways to go with physics etc.).

Well said. :smile:
 
I really don't understand all this focus on CPUs, though. When we're talking about next-gen consoles after this gen, I assume we are in the era of 1080p rendering. Most of the focus should be on the GPU, IMO.
GPUs on their own aren't much good. Four Xenoses in XB360 would spend most of their time idle, because they've only got 20 GB/s of bandwidth to use, shared with the CPU. Four Cells in PS3 would be a lot more effective if you can fit the workloads and data in LS and pass data between SPEs; they'd be using their internal bandwidth.

Consider a console with 2 G80s. You'd need a crazy amount of bandwidth to feed them, and that means expensive RAM, which is a serious issue. The price of including fast RAM is often overlooked, but it's why we have 128-bit buses on these consoles when PC GPUs are at 256-bit and over. You're not going to have the memory available to feed multiple uber-GPUs in a console. CPUs, on the other hand, can do a lot of work in less bandwidth, and if you can turn them to creating content procedurally, they can save RAM. The only point at which multiple GPUs make sense is if they're not rendering but handling CPU-type tasks, as rendering is so bandwidth-intensive.
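For a feel for the numbers, here's a back-of-the-envelope sketch of raw framebuffer and texture traffic for a single GPU at 1080p. Every figure in it (bytes per pixel, overdraw, texture factor, frame rate) is an illustrative assumption, not a measurement.

[code]
/*
 * Back-of-the-envelope framebuffer + texture traffic for ONE GPU at
 * 1080p.  Every figure here (bytes per pixel, overdraw, texture factor,
 * frame rate) is an illustrative guess, not a measurement.
 */
#include <stdio.h>

int main(void)
{
    double pixels      = 1920.0 * 1080.0;  /* 1080p                        */
    double bytes_px    = 4.0 + 4.0;        /* 32-bit colour + 32-bit depth */
    double overdraw    = 3.0;              /* average writes per pixel     */
    double fps         = 60.0;
    double texture_mul = 2.0;              /* crude texture-read factor    */

    double gb_per_s = pixels * bytes_px * overdraw * fps * texture_mul / 1e9;
    printf("rough traffic for one GPU: %.1f GB/s\n", gb_per_s);
    return 0;
}
[/code]

That lands around 6 GB/s for one GPU before vertex data, AA or render-to-texture passes are counted, which is why several big GPUs sharing a 20-odd GB/s bus would mostly sit idle.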
 
I think this ties in nicely with what I'm saying. I agree that yes, you can always make use of your extra cores and threads in some way; you would of course never rely on one single thread in a multithreaded architecture. However, the point I'm making is that there may be one thread which cannot be further split out and which is a bottleneck to the rest of the game, i.e. you're only as fast as your slowest thread. Yes, you can save time by running other, less critical threads in parallel, but if they aren't heavy-lifting threads then you may not gain much, and you will still be constrained by your primary thread.

Chances are it'll be the vector-math-heavy processing that requires the most "heavy lifting". Fortunately, that tends to be the easiest area to parallelise.

I'm not sure that's true. Yes, we have seen Cell excel in some applications, but none of them have been applicable to a real-world gaming scenario as far as I'm aware (again discounting graphics rendering, which is not a CPU bottleneck in other systems). Cell seems to be very good at some scientific applications, especially those that require massive vector performance or that fit into the LS, but as a CPU for a games system I don't think we have seen much evidence of its superiority yet.

The benefits of Cell are, as Shifty put it, in the memory and bandwidth area. Even non-vector processing can be pushed through the chip in much greater volumes than was otherwise possible on conventional processors. The trick is making your algorithms and computations Cell-friendly, which is easier to do than many believe, considering the scope for optimisation (even the smallest optimisations can give you massive gains over processing the same computational load on, and even optimised for, more conventional architectures).
Simply put, Cell is a throughput monster, and the question isn't "how can one take the computational load of a PS2 game and make it run faster on Cell?" but "how can one take a much greater computational load than was ever previously possible in a console game, using sophisticated processing subsystems (advanced animation) and data-heavy computations (iterating over very large data sets), and set it up in a way that maximises the throughput of Cell?"
This is the big question that will make the difference in showcasing the full capabilities of the system.
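The canonical way of "setting it up to maximise throughput" on Cell is to stream data through each SPE's local store with double-buffered DMA: fetch block N+1 while computing on block N. The sketch below only shows the shape of that pattern; the dma_* helpers are plain memcpy stand-ins invented for illustration, not the real MFC intrinsics (on an actual SPE you'd use mfc_get() and the tag-status calls, and the transfers would genuinely overlap with the compute).

[code]
/*
 * Shape of double-buffered streaming: fetch block N+1 while computing
 * on block N, so the compute units don't sit waiting on memory.
 * The dma_* helpers are memcpy stand-ins for this sketch only.
 */
#include <stdio.h>
#include <string.h>

#define BLOCK   256                          /* elements per "DMA" block */
#define NBLOCKS 16

static float main_memory[NBLOCKS * BLOCK];   /* stands in for system RAM */
static float local_store[2][BLOCK];          /* two in-flight buffers    */

static void dma_start(float *ls, const float *ea, size_t n)
{
    memcpy(ls, ea, n * sizeof(float));       /* placeholder for mfc_get() */
}

static void dma_wait(void)
{
    /* placeholder for waiting on the DMA tag group to complete */
}

static float process(const float *block, size_t n)
{
    float sum = 0.0f;                        /* trivial stand-in workload */
    for (size_t i = 0; i < n; i++)
        sum += block[i] * block[i];
    return sum;
}

int main(void)
{
    for (int i = 0; i < NBLOCKS * BLOCK; i++)
        main_memory[i] = (float)i;           /* dummy data */

    float total = 0.0f;
    int buf = 0;

    dma_start(local_store[buf], &main_memory[0], BLOCK);             /* prime */
    for (int b = 0; b < NBLOCKS; b++) {
        dma_wait();                                     /* block b is ready   */
        if (b + 1 < NBLOCKS)                            /* prefetch block b+1 */
            dma_start(local_store[buf ^ 1], &main_memory[(b + 1) * BLOCK], BLOCK);
        total += process(local_store[buf], BLOCK);      /* compute on block b */
        buf ^= 1;                                       /* swap buffers       */
    }
    printf("checksum: %f\n", total);
    return 0;
}
[/code]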

Any system or sub-system that is game-trivial (networking, IO, general game logic) will not be a problem for Cell any more than it has been for games developed in the past on the EE and its ancestors, and thus could never truly bottleneck the system enough to matter.

That's not to say I don't expect to see any demonstrations, just that I don't think we have seen any yet. If it's going to happen, then I guess the areas where we will see it will be AI and physics beyond what any other system is capable of. What else? Perhaps better animation and more world activity?

We've seen plenty: the Warhawk demonstration at GDC last year, the Edge demonstration this year. Even some of the computer-vision stuff that some of the guys I'm working with have been doing is pretty darn incredible, to say the least.

If you haven't seen much, then you haven't really been following all that closely.

If Cell reaches peaks in any or all of those areas that are simply not possible on a modern dual-core x86, then I would consider that compelling evidence. But then the question moves on to 4- and 8-core systems. How long before anything Cell can do game-wise, from a non-rendering perspective, can be done on a multicore x86?
Cell has already superseded modern x86 dual-core CPUs in many areas directly and indirectly related to games. There are even many processes that allow Cell to outmatch 4- and 8-core systems, purely on the scale of concurrent threads available. The evidence is here and it's compelling. If you're waiting for a game that showcases dramatically sophisticated processing at a capacity only Cell could give, then you're in for a pretty long wait, because we're only at the start of the learning curve. But then again, if you're looking for the same thing from a 4-core x86 CPU, the same applies.

It's funny how some people seem to display this attitude of impatience. It's like, "yeah, we know Cell can be used to process thousands and thousands of rigid bodies in real time... and yes, we know Cell is capable of processing over 700+ independent animation systems in a scene concurrently... and yes, we know Cell can process sophisticated computer-vision algorithms whilst handling a multitude of other tasks on the go, but that doesn't mean Cell is better than my AX264 until I see what it can do in a game!"

If you really don't have any idea of the potential Cell offers in terms of scope for games development, then just sit tight and wait for much more elaborate showcases of it.
Granted, not every game will push it that far, but when have we ever had a console where the technology is pushed to the limits from the moment you boot up the game to when the credits roll at the end?

As they say "the proof is in the pudding" and the pudding is in the oven.. ;)
 
GPUs on their own aren't much good. Four Xenoses in XB360 would spend most of their time idle, because they've only got 20 GB/s of bandwidth to use, shared with the CPU. Four Cells in PS3 would be a lot more effective if you can fit the workloads and data in LS and pass data between SPEs; they'd be using their internal bandwidth.

I'm enjoying reading this thread, as I suspect it will be about as applicable as speculation about the PS3 back when the PS2 was released. ;) Can I play, too?

If there are still gamedevs reading this, I wonder which would be better received:
more memory?
more bandwidth to the GPU?
more bandwidth to Cell?
more bandwidth between Cell and GPU?
more GPU computational resources?
more Cell computational resources?

[y'all can pretend that's divided up in a sensible way, too -- include an increased ROP count with increased GPU bandwidth, that sort of thing]

I'm guessing that the balance between bandwidth and flops in the GPU already favors flops, and that the OS-reserved memory size might be a source of obvious improvement.... [or, to make it more applicable to the conversation underway, that the gpu probably does deserve a crack at improvement as Cell is already pretty darned good, but that the improvement is likely to be less about computational ability and more about bandwidth]
 
I vote for lighting, animation, and physics - in that order. Whatever combination of chips (cpu/gpu) can be used to produce the most realistic displays in those areas is probably what we'll see next generation. The age of flops and polygons is over!
:D
 