Predict: The Next Generation Console Tech

Why not? Retail prices have little to do with manufacturing costs, especially when you consider said $99 players are often loss-leaders.
http://www.digitimes.com/news/a20100325PD203.html
That estimate is off; it overestimates the BD-ROM cost while it probably underestimates some of the other costs. Sony themselves said that it's a double-digit dollar loss for the PS3 that could well be single digits based on currency exchange rate fluctuation.
I also doubt Newegg is selling much more advanced 4X BD-ROMs for $59 at a loss. iSuppli only gives estimates, and they can be way off, just like VGChartz.
 
Aaron please take it down a notch - I know it's only type, but there's no need to 'yell,' so to speak. Some people do think that the Cell architecture has redeeming qualities, and has been a net positive contribution for computing. I think it speaks to the benefits of the architecture that on a Flop/watt basis, clusters built around it are still the 'greenest' in the Top500. And I say "still" as an achievement in its own right, since as we know the architecture is one that has essentially been frozen in time.

The "power" of the x86 competitors - I mean quite obviously - stems in large part simply from, in Intel's case at least, their edge on all competitors in terms of fabrication and process node, to say nothing of the constant yearly architecting and platform tweaks that go into their architectures. Simply if Cell were fabbed at Intel, right there is a huge leap in both relevance and relative market advantage by the further reductions in TDP and cost, as well as obviously staying increasingly competitive in an evolving landscape. To say nothing again of the fact that the architecture has remained static for like four/five years now. I feel you are attacking the architecture via the context of its industry position, but that does the actual architecting and design direction a disservice IMO. There are plenty of cases where the Cell has found itself as a superior compute alternative to x86.

I'm not saying that this is in the world of consoles per se, or that the project was a 'wise' undertaking, but I mean how much hatred for this architecture is reasonable to have?
 
Most of Sony's billions did not go into the Cell, since that was a joint effort.

Cell cost billions, but most of that was indirect, you might say. The project itself (purportedly) cost Sony around $500M, but the investments into fabs at both Nagasaki and Fishkill were - and I forget the exact number - around $3-4B total. Now they've divested themselves of Nagasaki, and at Fishkill I think the agreement with IBM still remains unclear, but it's reasonable to assume a loss on those investments.

Remember that the world of semiconductors going into the Cell project looked a hell of a lot different than it did after the money had been invested and prior to launch. We had the whole 'Prescott' wake-up call on thermals and node shrinks, and clear signs of overcapacity in general began to emerge.


Let's just in general save ourselves the hassle of analyzing the cost predictions of iSuppli and others.

I think we have more or less blacklisted said docs around here anyway. :)
 
Carl B said:
Let's just in general save ourselves the hassle of analyzing the cost predictions of iSuppli and others.

I think we have more or less blacklisted said docs around here anyway. :)
That's fine with me ;) My point was more to continue the discussion Alstrong started about what could be built now within a reasonable power envelope.
Unreliable sources aside, I don't think that x86 was an option in 2005/6 (and actually before, when decisions were made); one low-clocked single-core x86 would not cut it.
Back then the CPU ate up ~40% (rough approximation) of the manufacturers' silicon budget. Now the game is different: say you stick together an i3 and a Juniper, the CPU's "cost" is lower.

Squilliam, I completely agree; I don't think that manufacturers will put out systems as costly as the PS360 were. And it's not only cost: I think power consumption, and its even more problematic consequence, thermal dissipation, will also prevent them from going "all out" even if their budget would have allowed for it. That's pretty much Alstrong's point, if I have not lost the plot in this airy discussion :LOL:
Cypress may be over the top in this regard.

In regard to x86 relevance, I'll stick with the above system to give some meat to the point Aaron was making. We're speaking of a system worth ~1.5 billion transistors. I remember reading that an Intel Nehalem core (vs. uncore, in Intel parlance) is somewhere between 45 and 50 million transistors; IBM's Xenon cores were below 30 million. Say IBM offers something more performant than Xenon at ~40 million transistors: does it offset the x86 advantages? Using today's processes, a 10 million difference in transistor count is close to peanuts. I think that's the heart of Aaron's argument: you can pick an "off-the-shelf" x86 that is likely to be more performant and that comes with the best tools, compilers, etc. Even if the x86 comes out as a slightly more expensive solution on a per-chip basis, you save a lot of time and money elsewhere. What Aaron means is that now that silicon budgets are healthy, x86 is indeed tough to beat on overall value.
I don't know if manufacturers will go that route, nor do I think Aaron pretends to know, but I think he tried to make the x86 advantages clear and somehow the message got a bit lost.
 
Aaron please take it down a notch - I know it's only type, but there's no need to 'yell,' so to speak. Some people do think that the Cell architecture has redeeming qualities, and has been a net positive contribution for computing.

Cell has contributed effectively nothing. It's largely a rehash of already known dead ends.


I think it speaks to the benefits of the architecture that on a Flop/watt basis, clusters built around it are still the 'greenest' in the Top500. And I say "still" as an achievement in its own right, since as we know the architecture is one that has essentially been frozen in time.

Hard to say if they are the 'greenest', since we don't really know how good they are at actually delivering that performance on real codes. And the architecture is effectively frozen in time because it's designed in such a way that any change breaks the whole code base.

To say nothing again of the fact that the architecture has remained static for like four/five years now. I feel you are attacking the architecture via the context of its industry position, but that does the actual architecting and design direction a disservice IMO. There are plenty of cases where the Cell has found itself as a superior compute alternative to x86.

I'm not attacking Cell via its industry position. It has been a bad dead-end design from day one. It isn't scalable, it takes significant programming effort to even use, and it has been completely rejected by the industry. I was saying this on day one, along with many other people with computer architecture knowledge who looked at it. It takes the viewpoint of a programming model that wasn't accepted in the 1970s/80s and foists it upon the world. The die area would have been better spent if they had removed all the SPUs and replaced them with a comparable area of PPUs.


I'm not saying that this is in the world of consoles per se, or that the project was a 'wise' undertaking, but I mean how much hatred for this architecture is reasonable to have?

I don't have any hate for it. It's merely a bad architecture, designed in a vacuum with no knowledge of nor learning from things past. The whole model was flawed. That model caused them to be late, and caused the software to take a lot of time and resources to develop, the majority of which is also a dead end because there are better ways to do things that aren't so bound by such a broken architecture. Cell is part of the defining reason why Sony went from first to last in consoles.
 
The die area would have been better spent if they had removed all the SPUs and replaced them with a comparable area of PPUs.
While I partially agree with you on Cell being, for many reasons, a dead end, you really don't want to replace the SPUs with PPU cores. They are easier to program for, but they are so insanely slow... I am sure there are in-order cores around that are much better than a PPU. An OOOE core or two would make me happy though... :)
 
Yeah, I don't think Cell contributed nothing. I think the security "engine" and the way Cell co-operates with RSX to implement high-quality AA are very valuable. In R&D, it is not uncommon for people to revisit old approaches. I think it is amazing that they could pack so much into so little space (e.g., running DVR in parallel with a resource-hungry PS3 game).

I don't think replacing SPU with PPU is necessarily better (Doesn't solve the memory wall and bandwidth issues). Since it has been deployed in RoadRunner, I also don't think one can say Cell doesn't scale either. You need to specify the applications.
 
I don't think replacing SPU with PPU is necessarily better (Doesn't solve the memory wall and bandwidth issues). Since it has been deployed in RoadRunner, I also don't think one can say Cell doesn't scale either. You need to specify the applications.

Being deployed in a super, especially a super cluster, doesn't mean your architecture scales. It means that someone built PCIe onto the side of it.

And they would probably have been better off using GPUs: more performance in the limited workloads SPUs help with and, strangely enough, likely a much better dev environment.
 
Well, scaling is a big word. You can usually scale at a different level. From the application's point of view, as long as the speedup is great, the scientists got what they paid for. Some of the Cell apps scale superlinearly because of the way memory is accessed in Cell.
 
Cell has contributed effectively nothing. It's largely a rehash of already known dead ends.

I disagree with you on this though. Now, whether you want to say heterogeneous is a boondoggle, in-order is a waste, and the memory model was likewise a bad move, here was an architecture aimed at the mainstream that brought them all to the fore, and I honestly think what matters most is that cases can be made for the positives of each of these facets as well.

When you consider the programming model and the markets it was aimed at, indeed if GPGPU hadn't come along exactly when it did, Cell would have had the highly parallel commodity compute market essentially all to itself - and even then Cell competes very well in a number of those areas. For a cheap cluster to run some physics ops, for example, chained PS3's have become fairly commonplace. And although one may want to qualify their placement in the Top500 (both for green and otherwise), I think it's overly cynical to assume that it's all Flops and no utility - I believe papers have already emerged from some of the work accomplished by Roadrunner before it was committed to nuclear sims 24/7, for example.

The conversation seems almost redundant in a sense, because the literature and reports are clearly there of Cell proving itself across a number of work types, particularly in signal processing and financial modeling. I would argue in fact that if the financial crisis hadn't come and sideswiped the industry when it did, IBM would have had a good bit of success selling the XCell 8i's into the financial and energy spaces, where they were being targeted. But with the floor falling out of both at exactly the time of launch, well...

So far as IBM itself is concerned, I think Cell has been a huge saving grace, as it got them into the many-core mindset much earlier than they might have been otherwise, made them active in OpenCL as a result (which supports Cell, it should be mentioned), and they are supposedly carrying a good bit of their experiences there into HPC architectures going forward. If they had gotten what they had originally pushed for vs. the SPE model, it would have been just another non-event, without even this argument to be had as to its merits.

I'm not attacking Cell via its industry position. It has been a bad dead-end design from day one. It isn't scalable, it takes significant programming effort to even use, and it has been completely rejected by the industry. I was saying this on day one, along with many other people with computer architecture knowledge who looked at it. It takes the viewpoint of a programming model that wasn't accepted in the 1970s/80s and foists it upon the world. The die area would have been better spent if they had removed all the SPUs and replaced them with a comparable area of PPUs.

I disagree that it isn't scalable - on the contrary, SPE-offloaded tasks are almost linearly so. As for rejection, I suppose it is all relative. It is not omnipresent, no, and you and others were saying as much back in the day. But it also made its way into the first Petaflop computer on Earth, and by the same token I doubt you would have believed that would come to pass had someone posited such back then, along with the decent traction for off-the-shelf clustering in academia. It has failed, but it is not an epic failure - it claimed some laurels before the market squeezed it out, and I think even some of its critics have gone on to adopt some of its design directions in their roadmaps.

That model caused them to be late, and caused the software to take a lot of time and resources to develop, the majority of which is also a dead end because there are better ways to do things that aren't so bound by such a broken architecture. Cell is part of the defining reason why Sony went from first to last in consoles.

I think Blu-ray had more to do with the above than Cell, personally. Cell was late(ish), large, and hot - but Blu-ray was way behind schedule.
 
Roadrunner, the supercomputer with 12,960 IBM PowerXCell 8i and 6,480 AMD Opteron dual-core processors?

What is the reason each Opteron has a Cell attached to it, as in an Opteron cluster with Cell accelerators or some such? Just wondering.
 
So far as IBM itself is concerned, I think Cell has been a huge saving grace, as it got them into the many-core mindset much earlier than they might have been otherwise, made them active in OpenCL as a result (which supports Cell, it should be mentioned), and they are supposedly carrying a good bit of their experiences there into HPC architectures going forward. If they had gotten what they had originally pushed for vs. the SPE model, it would have been just another non-event, without even this argument to be had as to its merits.

IBM has already been in this space with their Blue Gene products for quite a while.


I disagree that it isn't scalable - on the contrary, SPE-offloaded tasks are almost linearly so. As for rejection, I suppose it is all relative. It is not omnipresent, no, and you and others were saying as much back in the day. But it also made its way into the first Petaflop computer on Earth, and by the same token I doubt you would have believed that would come to pass had someone posited such back then, along with the decent traction for off-the-shelf clustering in academia.

making it into supers really isn't that big of a deal, be cheap, be somewhat efficient, be cheap, have an interface to IB, be really really cheap. Did I mention be cheap?



I think Blu-ray had more to do with the above than Cell, personally. Cell was late(ish), large, and hot - but Blu-ray was way behind schedule.

Even with BR being behind schedule, the ramp to get anything usable out of Cell was long and painful. The problem is that the Cell programming model is extremely limited, which requires significant acrobatics from the programmers to get things to work.
 
Roadrunner, the supercomputer with 12,960 IBM PowerXCell 8i and 6,480 AMD Opteron dual-core processors?

What is the reason each Opteron has a Cell attached to it, as in an Opteron cluster with Cell accelerators or some such? Just wondering.

Basically, they use the Opteron to do the control flow, communication, etc., and just offload matmuls (etc.) to the Cells. The PPU becomes a glorified DMA engine, shuttling data between main memory and the Cell's memory, where it is DMA'd yet again in smaller chunks into/out of the SPUs.
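To make that offload pattern concrete, here is a minimal sketch of the SPE-side loop, assuming the Cell SDK's spu_mfcio.h intrinsics (mfc_get/mfc_put and the tag-status calls); the 16 KB chunk size, the control-block layout, and the placeholder kernel are illustrative assumptions, not anything from Roadrunner's actual code.

```c
/* Minimal sketch of the SPE side of the offload pattern described above.
 * Assumes the Cell SDK's spu_mfcio.h intrinsics; the control block layout,
 * chunk size and kernel are illustrative, not real Roadrunner code.
 * Built with spu-gcc; the host side launches this and passes 'argp'. */
#include <spu_mfcio.h>
#include <stdint.h>

#define CHUNK 16384  /* 16 KB: the maximum size of a single MFC DMA transfer */

typedef struct {
    uint64_t ea_in;   /* effective address of the input in main memory   */
    uint64_t ea_out;  /* effective address of the output in main memory  */
    uint32_t bytes;   /* total payload size, assumed a multiple of CHUNK */
    uint32_t pad;
} control_block_t;

static control_block_t cb __attribute__((aligned(128)));
static float buf[CHUNK / sizeof(float)] __attribute__((aligned(128)));

static void wait_tag(unsigned int tag)
{
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();
}

int main(uint64_t speid, uint64_t argp, uint64_t envp)
{
    (void)speid; (void)envp;
    const unsigned int tag = 0;

    /* Pull in the control block the host prepared for this SPE. */
    mfc_get(&cb, argp, sizeof(cb), tag, 0, 0);
    wait_tag(tag);

    for (uint32_t off = 0; off < cb.bytes; off += CHUNK) {
        /* DMA one chunk from main memory into the 256 KB local store. */
        mfc_get(buf, cb.ea_in + off, CHUNK, tag, 0, 0);
        wait_tag(tag);

        /* Compute on data that now lives entirely in local store. */
        for (uint32_t i = 0; i < CHUNK / sizeof(float); i++)
            buf[i] *= 2.0f;  /* placeholder for the real kernel */

        /* DMA the result back out to main memory. */
        mfc_put(buf, cb.ea_out + off, CHUNK, tag, 0, 0);
        wait_tag(tag);
    }
    return 0;
}
```

Real code overlaps the DMA with compute instead of blocking on every chunk, but the bookkeeping above is exactly what the "glorified DMA engine" comment is about.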
 
Roadrunner, the supercomputer with 12,960 IBM PowerXCell 8i and 6,480 AMD Opteron dual-core processors?

What is the reason each Opteron has a Cell attached to it, as in an Opteron cluster with Cell accelerators or some such? Just wondering.

Because the PPU in the Cell is weak, it's used to feed the SPUs with data as well as to handle communications between clusters. The PPU is the weak spot in Cell, not the SPUs, if you listen to the US Army, US Air Force and nuclear physicists at LLNL. If you listen to Aaron Spink, SPUs are a waste.
 
IBM has already been in this space with their Blue Gene products for quite a while.

Yes they have, but I believe that future evolutions in the HPC space are going to leverage the lessons learned with Cell, and more specifically I would imagine we'll be seeing a re-envisioned take on the SPEs. Anyway, David Turek has alluded to as much - we'll see soon enough either way.

making it into supers really isn't that big of a deal, be cheap, be somewhat efficient, be cheap, have an interface to IB, be really really cheap. Did I mention be cheap?

At $125M, though, I wouldn't say it was cheap, and the XCell 8i wasn't commodity by any stretch. It certainly wasn't an Nvidia (or PS3) cluster in terms of price. I agree it has to be 'somewhat' efficient; I would just take it further and say that the architecture can be very efficient given the proper workloads, and that translates to HPC very well. I think it speaks to the design that the albatross was the PPE from the start and not the SPEs - if there had been a decent 'control' core from the outset, Roadrunner could have had all of those Opteron blades supplanted as well, and the architecture would have gained even more on the price and efficiency metrics, and likely on performance too.

I know which one is more credible.

The thread is for discussion, not personal attacks, guys. Today's anonymous voice may be yesterday's Air Force CS guy or tomorrow's DOD supercomputer liaison; let's just stick to debating the technology and stay out of arbitrary attributions of credibility.
 
Some points you missed:

1. 2015.
2. GPUs of 2010 are in the 2 TFLOPs range; do we expect a 25x FLOP increase in less than 5 years?
3. GPU architecture is very far away from running game code.
4. CPUs are still lagging behind in the 200 GFLOP range.
5. Yes, OOOe is irrelevant for significant amounts of FLOPs (especially graphics, which, when combining programmable FLOPs with the non-programmable ones found in TMUs and such, is already in the tens of TFLOPs) but ...
6. but ... the game loop, as non-FLOP intensive as it may be, can hold you back a lot.
7. And parallelizing it isn't so easy/efficient.
8. Creating a parallel game loop on GPU-like "cores" would be a nightmare. Or even SPEs.
1. That's less than 5 years away, and with motion controls just coming out, this generation will be extended.
2. Even if it only progresses to 10 TFLOPS, the percentages will still be very close.
3. GPGPUs are new; Fermi just came out, give it time.
4. Yes, that's why the push for GPGPU.
5. Agreed
6. Even the current generation's CPUs aren't holding them back. When you compare console games vs. PC games, the difference isn't in AI, it's in graphics, and in some cases physics, when used with another (Nvidia) GPU to run the physics. There is no evidence of anything CPU-bound holding back consoles.
7. Yes, but the effort started a while ago; the Source engine was multithreaded and that was in 2004, I think.
8. Subsystems are already being moved to separate threads, so you can compute AI and physics simultaneously and get the benefit of multithreading (see the sketch below). Obviously it won't scale 100%, maybe not even 50%.
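Point 8 is the coarse-grained "one subsystem per thread" approach. A minimal sketch of a single frame under that model, using pthreads; update_ai, update_physics and render_frame are hypothetical stand-ins for real engine subsystems:

```c
/* Sketch of point 8: AI and physics for one frame run on their own threads,
 * then join before the serial tail of the frame. The subsystem functions are
 * hypothetical placeholders, not from any real engine. */
#include <pthread.h>

typedef struct {
    float dt;  /* frame delta time */
    /* ... per-frame shared state the subsystems read/write ... */
} frame_state_t;

static void update_ai(frame_state_t *f)      { (void)f; /* AI work */ }
static void update_physics(frame_state_t *f) { (void)f; /* physics work */ }
static void render_frame(frame_state_t *f)   { (void)f; /* draw calls */ }

static void *ai_job(void *arg)      { update_ai(arg);      return NULL; }
static void *physics_job(void *arg) { update_physics(arg); return NULL; }

void run_frame(frame_state_t *frame)
{
    pthread_t ai_thread, physics_thread;

    /* Fork: AI and physics work concurrently on this frame's data. */
    pthread_create(&ai_thread, NULL, ai_job, frame);
    pthread_create(&physics_thread, NULL, physics_job, frame);

    /* Join: both must finish before their results are consumed. */
    pthread_join(ai_thread, NULL);
    pthread_join(physics_thread, NULL);

    /* The serial tail (render, logic that depends on both) is what keeps
     * this from scaling 100%, as noted above. */
    render_frame(frame);
}
```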


The market is growing decidedly in the mobile and "arcade"-like sectors. Likewise, a slew of software is done by smaller studios. Not to mention development expectations are increasing while the turnaround rate has stabilized. There is a premium on turning games around on time and on budget, not on getting the most out of esoteric hardware.
The current console CPUs are fast enough to run whichever downloadable game is thrown at them. When making their console, MS listened to Cliff B, not Jonathan Blow (Braid), and it'll remain that way. The big money makers are always the big-name titles.

That said, I do think we may see some compromises where there will remain a small number of very efficient, fast, serial-oriented CPUs (like x86 OOOe processors) and a consolidation of the "FLOP" resources into many very simple cores a la GPUs. Performance per mm^2 is very high for GPU cores, so if simple, peak FLOPs are what you are going for, that is the direction you would want to go.

A 2nd or 3rd generation Llano-style CPU (a handful of very fast OOOe CPU cores--meets the needs for the serial game loop, "deadline non-efficient code," indie devs, etc.) with the vector hardware built on (could use extensions on the same die, similar to old-style on-die FPUs; giving you your high peak FLOP performance on die, as well as a setup and/or post-processing monster for the GPU, plus physics and such if the libraries ever catch up), and then a normal GPU. Down the road, 5-7 years after this style of system, we could see single-chip solutions with a fast OOOe core (or cores) on a sea of GPU-styled cores.
I don't know about x86, since Intel makes it so expensive, but that's the idea here. Have 2-4 small GP cores in a sea of GPGPU cores.


Anyhow, after reading all the PS3 owners shrug over losing Linux because it was piss slow at basic tasks (like web browsing), I am not sure how arguing for even simpler, more basic cores than the PPE in the Cell processor, and that serial performance isn't important, holds up ... my 1.4GHz Core Solo netbook runs FF faster ;)
That's because of a lack of optimization and only 256 MB of memory. Your netbook probably has at least 1GB.


While ND and DICE may not need a faster main processor, I think we have heard many developers note that speeding up their core loop and having some "forgiveness" for some bad code when crunch hits could really make a difference in a lot of games. Just because the industry is moving in one direction doesn't mean it should be done overnight--probably the biggest problem with Cell. Sony tried to use their market share to force the industry in a particular direction. But they didn't anticipate the strength of the competition or the importance of tool chains, were late (and half-baked), and the industry didn't buy into their vision. Right direction, but wrong road it seems.
No CPU in 2003-4, when next-gen decisions were made, would have been better than Cell for a console. I agree with the rest about the tool chain etc.; Sony was really unprepared for this.
 
I just wanted to add that without Cell (and the awesome programmers, most of all) we might not have had the !amazing! anti-aliasing quality (and with less of a hit?) we have in God of War, and hopefully in future titles.

Plus I think it was a good step... a learning experience, and it certainly has its positives. You've got to give them props for taking technological gambles. We'll see where Sony goes from here.

Seriously though... best IQ this gen :D Thanks Santa Monica!

Anyways... just my opinion.
 
That said, I do think we may see some compromises where there will remain a small number of very efficient, fast, serial-oriented CPUs (like x86 OOOe processors) and a consolidation of the "FLOP" resources into many very simple cores a la GPUs. Performance per mm^2 is very high for GPU cores, so if simple, peak FLOPs are what you are going for, that is the direction you would want to go.

I think we actually agree on this to a decent extent. I just think that the FP/GPU cores will be concentrating on the graphics side for the most part.

I agree with the rest about the tool chain etc.; Sony was really unprepared for this.

But that was the problem with Cell. It was a totally different programming model with significant restrictions. If they had made it anything but a control/data store memory model, it would have been much easier to adapt tools and programs to it, but instead you effectively have to jump through hoops to get anything outside of a matmul to run on it.
 
But that was the problem with Cell. It was a totally different programming model with significant restrictions. If they had made it anything but a control/data store memory model, it would have been much easier to adapt tools and programs to it, but instead you effectively have to jump through hoops to get anything outside of a matmul to run on it.
The SPE ISA is general purpose. It's in no way restricted to "matmul" duty. The memory model is the difference here, but it's also the trait that makes SPEs small and scalable enough that you can pack a whole bunch of them on one chip and not lose manufacturability or clock headroom.
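To show both sides of that point, here is a sketch of the double-buffering idiom the local-store model pushes you toward: the "hoops" are the explicit DMA and tag management, but the payoff is that the next chunk is already in flight while the current one is being processed. Again this assumes the Cell SDK's spu_mfcio.h intrinsics; the chunk size and function names are illustrative.

```c
/* Double-buffered streaming on an SPE: while buffer 'cur' is being processed,
 * the DMA for the next chunk is already in flight. Assumes the Cell SDK's
 * spu_mfcio.h intrinsics; names and sizes are illustrative. Write-back of
 * results is omitted to keep the sketch short. */
#include <spu_mfcio.h>
#include <stdint.h>

#define CHUNK 16384  /* 16 KB per MFC DMA transfer */

static float buf[2][CHUNK / sizeof(float)] __attribute__((aligned(128)));

static void wait_tag(unsigned int tag)
{
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();
}

/* Stream 'bytes' (a multiple of CHUNK) from effective address 'ea', applying
 * 'kernel' to each chunk once it has landed in local store. */
void stream_process(uint64_t ea, uint32_t bytes,
                    void (*kernel)(float *data, uint32_t nfloats))
{
    uint32_t nchunks = bytes / CHUNK;
    unsigned int cur = 0;  /* buffer index, doubled up as the DMA tag */

    /* Prime the pipeline: start fetching chunk 0. */
    mfc_get(buf[cur], ea, CHUNK, cur, 0, 0);

    for (uint32_t i = 0; i < nchunks; i++) {
        unsigned int next = cur ^ 1;

        /* Kick off the DMA for the following chunk before touching this one. */
        if (i + 1 < nchunks)
            mfc_get(buf[next], ea + (uint64_t)(i + 1) * CHUNK, CHUNK, next, 0, 0);

        /* Block only on the chunk we are about to process. */
        wait_tag(cur);
        kernel(buf[cur], CHUNK / sizeof(float));

        cur = next;
    }
}
```

On a conventional cached core the same loop is just a pointer walk; the explicit version above is the trade the local-store model asks for, and it's the crux of the disagreement in this thread.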
 