Predict: The Next Generation Console Tech

Again, Larrabee cores are also limited to 256 KB per core, so if devs don't like the SPU local stores they aren't going to like Larrabee either.

Developers will have to learn to divide their data into chunks that can be worked on in parallel, and to make sure there is enough work to do on each chunk, but still... if you can use cache hints and/or lock cache lines to avoid thrashing the cache with transient data, you get the benefit of both predictable latency and not having to manage data and instructions manually (somewhere Mike Acton is giving me a pitiful look for mentioning those two as separate items :(...).
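
To make that concrete, here's a minimal sketch of the chunked, hinted style I mean, in plain C++ with GCC/Clang's __builtin_prefetch standing in for a cache hint. The Particle struct, chunk sizing and thread count are all invented for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct Particle { float px, py, pz, vx, vy, vz; };

// Process one chunk, hinting upcoming data into cache while we work
// (a rough stand-in for the explicit DMA you'd schedule on an LS design).
static void integrate_chunk(Particle* p, std::size_t count, float dt) {
    for (std::size_t i = 0; i < count; ++i) {
        if (i + 16 < count)
            __builtin_prefetch(&p[i + 16]);   // GCC/Clang cache hint
        p[i].px += p[i].vx * dt;
        p[i].py += p[i].vy * dt;
        p[i].pz += p[i].vz * dt;
    }
}

// Split the data set into independent chunks and hand one range to each core.
void integrate_all(std::vector<Particle>& particles, float dt, unsigned cores) {
    const std::size_t per_core = (particles.size() + cores - 1) / cores;
    std::vector<std::thread> workers;
    for (unsigned c = 0; c < cores; ++c) {
        const std::size_t begin = c * per_core;
        if (begin >= particles.size()) break;
        const std::size_t count = std::min(per_core, particles.size() - begin);
        workers.emplace_back(integrate_chunk, particles.data() + begin, count, dt);
    }
    for (auto& w : workers) w.join();
}
```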
 
I think it only makes sense if Larrabee turns out to be the bee's knees as a GPU. If the programmability pays huge dividends, and the tools make it easy to harness, choosing to use Larrabee as a standalone GPU makes sense, and then you leave Cell to do the program code. That's not really what LRB is about though. If you're going to use Larrabee at all in a system, like yourself I think you'd want just LRB. There is still a case for a Cell+LRB system though, in this current era of unproven tech and guesswork!

If you have a high-clocked 32-core LRB GPU... you want a CPU that does not suck at task-parallel workloads (maybe a dual or quad core CPU with high single-thread performance on each of the available cores).

A 32 nm Sandy Bridge based CPU would be quite nice for PS4 if paired with LRB.

My PS4 system would have a UMA set-up... Considering the huge internal bandwidth of LRB (ring-bus) and the fat local caches per LRB core (256 KB, rendering everything to a local tile), as well as the fact that PS4 will likely push 1080p as the standard resolution instead of going to PC-like resolutions (aside from whatever 3D TVs will demand of PS4)... maybe GDDR5+ or, better, XDR2 (unless GDDR5+ and GDDR6 arrive quite soon and XDR2 is not competitive with them by the time the PS4 architecture is finalized).

I still think that Rambus develops amazing technology, and their efforts with PS2 and PS3 prove that.
 
If Larrabee is viewed as a versatile GPU rather than a CPU/GPU hybrid, a Cell+Larrabee PS4 would allow developers to port existing code and practices, provide BC, and still use Larrabee as the GPU. If you lose Cell completely, developers are starting completely from scratch yet again!


Hogwash!

Almost all software on the PS3 is multiplatform. Only PS3-exclusive developers are losing anything. As for GOOD design practices, memory management, etc., that knowledge won't be lost.

What people are arguing for comes from a fascination with Cell, problems and all, and amounts to asking Sony to stupidly create a console with 3 distinct CPUs :!: How about that for blunt?

So answer this question: Why a CELL+Larrabee and not Larrabee+Larrabee?

You could use the Larrabee cores however you want--not just graphics. Why tell developers, "You have 2 CPU types and 1 GPU type, you must use them in that way" instead of "Here is a general pool of Processors--go at it!"

upnorthsox means the SPEs are there for CPU workloads, not graphics rendering. Leave that to whatever GPU is used, nVidia, AMD or Larrabee.

And my point is that if you go with CELL and Larrabee, CELL (the SPEs) becomes unnecessary--even a negative.

3 CPU types is far too much specialization and complexity.

The CPU aspect is well known and understood at this point. Writing Larrabee code is a complete unknown. At best you just chuck shader code at it like any other GPU.

If the SPEs are well known and understood, then Cell is a failure (look at all the CPU-limited software on the PS3). You would argue, as in another thread: No it isn't! The issue is lead platforms and utilization ;) Just because something is well known doesn't make it well utilized.

That is where unification gets you: better utilization through simplicity. As for Larrabee being "unknown", it uses x86 cores similar to the old in-order Pentiums. The vector extensions are new, and having TMUs next to more traditional CPU cores is new, but the reality is it's a whole lot of x86 cores that can also run GPU shader code. Nothing hard or new about that--but extracting peak performance will be hard.
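
As a rough illustration of what "chucking shader code" at plain cores looks like: a toy N-dot-L diffuse loop written as ordinary C++, the sort of straight-line, vectorizable code a Larrabee-style core would map onto its wide vector unit. The array layout and names are made up for the example:

```cpp
#include <cstddef>

// A toy "pixel shader" body as an ordinary loop: N-dot-L diffuse lighting over
// a row of pixels. On a Larrabee-style core this is the kind of loop the
// compiler (or hand-written intrinsics) maps onto the wide vector unit; on an
// SPE it would be SoA data plus SIMD intrinsics instead.
void shade_row(const float* nx, const float* ny, const float* nz,
               float lx, float ly, float lz,
               float* out, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        const float ndotl = nx[i] * lx + ny[i] * ly + nz[i] * lz;
        out[i] = ndotl > 0.0f ? ndotl : 0.0f;   // clamp, like saturate() in HLSL
    }
}
```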

Same mantra as Cell. The difference is that on the PS3 you have 1xPPE, 7xSPEs, and 1xRSX with 24 shader "cores" and 8 vertex "cores". I don't think developers want that level of complexity with multi-million-line codebases and 2-year development windows.

Because the Larrabee's are busy rendering graphics.

If you only have Larrabees they won't all be rendering graphics.

Anyhow, the question was addressing, imo, the stupid idea of Sony using CELL and Larrabee.

I still haven't seen a good argument for using CELL as the CPU and Larrabee as the GPU when you could JUST use 2xLarrabees.

But IMO a Larrabee-only system at this point is going to be as bottlenecked by code as PS3 was. Not quite so bad, because Cell has got everyone thinking many-core and parallelising tasks, but still.

I don't disagree that you'll have performance bottlenecks. And I don't think Larrabee cores would be faster per mm^2 than SPEs when both have optimized code.

But this isn't a science project. There is a lot to be said for having one resource type where code runs everywhere and you optimize what needs to be optimized--keep it simple, stupid. The bottleneck this generation is all those "lazy developers"--giving them these really complex heterogeneous CPUs and complex GPUs isn't smart.

As it is, GPUs have already eaten a lot into SPE work. In practice Xenos, on the same budget as RSX, does the same work as RSX plus some SPE resources.

Why have 3 different CPU types that share the same workload capabilities? That is a LOT of specialization.

As for changing LS to cache--you lose a ton of performance. Why not then change the SPEs to PPCs as well? At least then you can share the code between the PPE and SPEs ;)

Oh wait, that is essentially a standard CPU (PPC+Cache) with a Vector Unit. Sounds a lot like... Larrabee! :p
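
For reference on the LS-versus-cache point above: the whole win of the local store is that you schedule the memory traffic yourself, so the latency is hidden by construction. A rough sketch of the classic double-buffered DMA loop on an SPU, assuming the Cell SDK's spu_mfcio.h interface; the chunk size and the process() callback are invented, and the transfer size is assumed to be a multiple of the chunk:

```cpp
#include <spu_mfcio.h>
#include <stdint.h>

#define CHUNK 4096                                 // bytes per DMA transfer (illustrative)
static char buf[2][CHUNK] __attribute__((aligned(128)));

extern void process(char* data, unsigned size);    // hypothetical work function

// Stream 'total' bytes (assumed a multiple of CHUNK) from effective address
// 'ea', overlapping the DMA for one buffer with the compute on the other.
// Latency is hidden by construction -- the predictability you give up when a
// hardware cache decides what to evict and when.
void stream_and_process(uint64_t ea, unsigned total) {
    int cur = 0;
    mfc_get(buf[cur], ea, CHUNK, cur, 0, 0);                     // kick off first fetch
    for (unsigned done = 0; done < total; done += CHUNK) {
        const int next = cur ^ 1;
        if (done + CHUNK < total)                                // prefetch the next chunk
            mfc_get(buf[next], ea + done + CHUNK, CHUNK, next, 0, 0);
        mfc_write_tag_mask(1 << cur);                            // wait only on our buffer
        mfc_read_tag_status_all();
        process(buf[cur], CHUNK);
        cur = next;
    }
}
```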
 
So answer this question: Why a CELL+Larrabee and not Larrabee+Larrabee?
I've already answered that. I'm not going to get into a big debate because I'm not siding with Cell+Larrabee, but if, hypothetically, LRB turns out to be a really good GPU, you'd want it, and if you want code portability with what you've developed this gen, you'll want Cell, and so there's an argument for including both cores. Of course one could choose Cell and a different GPU instead, or LRB only. All are options.

You could use the Larrabee cores however you want--not just graphics. Why tell developers, "You have 2 CPU types and 1 GPU type, you must use them in that way" instead of "Here is a general pool of Processors--go at it!"
If you have Cell and LRB in a box, and the devs use LRB for game processing, what exactly are you going to render the graphics on? :p If the choice is made to include Cell as an architectural advance on the current system, then whatever other processor is in there is going to have to do graphics, because Cell isn't up to it.

But this isn't a science project. There is a lot to be said for having one resource type where code runs everywhere and you optimize what needs to be optimized--keep it simple, stupid. The bottleneck this generation is all those "lazy developers"--giving them these really complex heterogeneous CPUs and complex GPUs isn't smart.
You're preaching to the converted, man! I was in support of the single core back with Cell's announcement! I advocated Cell everywhere as a scalable architecture; a single hardware platform for consoles, mobiles, TVs etc, that'll run whatever code and scale to the purpose. LRB fits that job nicely too. It's the future. May not happen next-gen though, just as it didn't happen this gen, because other solutions will offer better returns in other areas and the simplicity factor isn't so valued as performance or BC or cost or goodness-knows-what other excuses are raised!
 
I've already answered that. I'm not going to get into a big debate because I'm not siding with Cell+Larrabee

Then why the contradictory responses to my point? My post was addressing the Cell+Larrabee angle some were taking. ;)

I agree, if you have a Cell+Larrabee system, you should use Larrabee mainly for graphics. But that wasn't my point. My point is why would anyone ever WANT a Cell+Larrabee configuration?

That is a Frankenstein design. I can understand a Cell+GPU design, as they are still quite different. I can understand an OOOe CPU + GPU. I even "get" a Fusion design + GPU (moving a shader array onto the CPU die). But SPEs and Larrabee cores are too similar and add too much complexity. I don't think such a system would justify the complexity.

You're preaching to the converted, man! I was in support of the single core back with Cell's announcement! I advocated Cell everywhere as a scalable architecture

The problem, of course, is that Cell is already "2 core types" and lacks performance in basic graphics tasks (like texturing).

As for Larrabee, don't get me wrong, I think it is going to have a HARD time competing on performance. Per mm^2, AMD/NV may have a 4x performance advantage. That is a big difference in "pretty pixels." I also think those who claim one huge die can compete with 2 similar-sized chips in terms of cost or performance are wrong: a single big chip is going to have lower yields (everything from a critical bad spot, to less effective redundancy, to wafer utilization, especially at the edges) and will have a harder time managing power and heat issues. So splitting the silicon budget across 2 chips makes sense, at which point the CPU/GPU split works well (or 2xLarrabee). I don't think we are quite at the point where a single chip/single core type can make up enough room in unproven "gains of efficiency" to out-duel the proven CPU/GPU model.
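
To put rough numbers on the yield half of that argument: under a textbook Poisson defect model (yield ≈ e^(−defect density × die area); the defect density and die sizes below are illustrative, not actual Intel/IBM figures), two half-sized dies waste noticeably less silicon than one big one:

```cpp
#include <cmath>
#include <cstdio>

// Poisson yield model: fraction of dies that catch zero defects.
// d0 = defects per cm^2 (assumed), area in cm^2.
static double yield(double d0, double area_cm2) {
    return std::exp(-d0 * area_cm2);
}

int main() {
    const double D0    = 0.3;                       // illustrative defect density
    const double big   = yield(D0, 4.0);            // one 400 mm^2 die
    const double small = yield(D0, 2.0);            // one of two 200 mm^2 dies
    std::printf("400 mm^2 die yield: %.0f%%\n", big * 100.0);    // ~30%
    std::printf("200 mm^2 die yield: %.0f%%\n", small * 100.0);  // ~55%
    std::printf("good-silicon advantage of 2x200 over 1x400: %.2fx\n", small / big);
    return 0;
}
```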

My only contention is if you use the CPU+GPU model I don't think developers will like 3 or 4 CPU types running around. Keeping it as simple as possible while keeping performance competitive is the desirable zone IMO.

Now if Sony can turn out a Cell-a-bee I think a Cell+Cellabee makes some sense.
 
My only contention is if you use the CPU+GPU model I don't think developers will like 3 or 4 CPU types running around. Keeping it as simple as possible while keeping performance competitive is the desirable zone IMO.

In which case, they'd all probably prefer something closer to the PC setup. :p (And now rewind to 2001...)
 
That would actually upset developers :)

Hello,

as someone who is in the industry (and who has not talked about the PS3 in the quite vocal "it is different = it is wrong" way), what are your thoughts?

I am especially interested in two things (from the other contributors too):

a) Your favorite solution (ignoring cost etc.)
b) Your business solution (looking at cost, developer support, usability, hardware reliability etc.)

Because, IMO, the most interesting thing is to discuss the gap between these two solutions and find something which could be the perfect compromise.

What I could never understand is how companies make decisions that leave you asking whether they learned anything from the past (if they do not learn from the mistakes of others, at least, pretty please, learn from your own mistakes).

Cheers,

Mijo
 
In which case, they'd all probably prefer something closer to the PC setup. :p (And now rewind to 2001...)

Just rewind to 2005 ;) And most developers still find the "PC like setup" easier. As it stands both consoles are very PC like in a lot of ways. And we have heard the complaining about Cell which, according to some in this thread, isn't really different from a processor with Cache. If those lazy devs have a problem with LS, what are they going to think when you toss even more complex environments at them and say, "Now you need to use 64 cores, 12 of one kind, 12 of another, and 40 of another" :LOL:

I think movement is back to a more uniform design, but someone is going to have to prove it works first on a competitive field. Until then the CPU+GPU model is very familiar, VERY mature, and produces good results.
 
I agree, if you have a Cell+Larrabee system, you should use Larrabee mainly for graphics. But that wasn't my point. My point is why would anyone ever WANT a Cell+Larrabee configuration?
Because Cell is the best CPU ever that runs all my current code, and LRB is the best GPU ever that is lovely and flexible, so I want a system with both in.

It is not an argument I subscribe to! I think AMD would provide a better device as a discrete performance graphics part, coupled with Cell (if you're having Cell in there). However, it is a legitimate argument why someone would want Cell and LRB in PS4.
 
Because Cell is the best CPU ever that runs all my current code, and LRB is the best GPU ever that is lovely and flexible, so I want a system with both in.

It is not an argument I subscribe to! I think AMD would provide a better device as a discrete performance graphics part, coupled with Cell (if you're having Cell in there). However, it is a legitimate argument why someone would want Cell and LRB in PS4.

Fine, a developer can think that. But I don't think we can ignore history. Look at what happened this generation. Budgets will be bigger, staff larger, deadlines no longer, and something has to give. Tools, hardware, whatever.

Adding another level of complexity wouldn't make the ERPs, nAos, and Carmacks of the world cry--but it won't result in better products, either.

The biggest mistake a lot of people made at the beginning of this generation while discussing hardware was constantly abstracting the hardware away from the application environment. How many under-performing games did we need to see before the consensus became, "Cell is 3x faster than Xenon on paper, but in practice getting EQUAL performance from Cell is HARD. Let's not even talk about getting that 300% better performance."

If you listen really, really carefully to a couple of devs here, read their blogs, read their PPTs, you will see that a lot of them are moving in a direction contrary to PPE+SPE+GPU. And the handful I am thinking of like Cell and praise it. They may not get exactly what they want in 2012, but I don't think any of them, based on what I am reading, think adding yet another CPU (ISA, memory structure) to manage and exploit makes a better game.

Maybe I am too simplistic, but this is how I see it: HW is there to solve a problem. Throwing in "yet another core" is a solution looking for a problem.

GPUs became popular because there was a problem (3D gaming) that CPUs couldn't solve adequately.

Generalized GPUs are trying to solve a problem (CPUs are so general they have few execution units per unit area) that CELL is already addressing. Cell may be better in a LOT of workloads, but as these GPUs become more generalized (while staying directly attached to those graphics tasks, like texturing, that Cell/bare-bones x86 suck at) I don't think it makes any business sense to have this middle ground, especially in a console part.

BUT that doesn't mean we won't see Cell+High End GPU. Could be a killer system. Then again, if Fermi does everything NV hopes, in 2-3 years the "Next-Fermi-like-Jump" could make a lot of the SPE workloads that justify its existence next to the PPE redundant.

Removing redundant parts is good (TM).
 
As for the 256KB limit, besides not being cache (more work to manage LS), increasing the size means lowering LS performance.

I'm not a hardware engineer, so are you saying that there is no physical way to get half or even a full megabyte of local store with the same performance characteristics on a hypothetically modified SPE?
 
Removing redundant parts is good ... dedicating huge amounts of silicon to wide superscalar speculative processors with huge caches is bad.

A great big honking ~10 core CPU with ~10 MB cache is certainly not the ideal way to spend ~50% of the silicon budget in the next generation IMO. If you really want only CPU+GPU then putting a compact 2-4 core processor somewhere in the corner of the GPU core(s) would be more appropriate IMO.

Personally I'd like to see a narrower throughput/area optimized architecture like SPEs to continue to exist though ... because I think fundamentally they make a lot of sense, I think they strike the best balance overall. If SPEs had just had a small bit of hardware support to hide the local storage behind an illusion of flat memory I think it would have been in much better shape. Something like COMIC for instance could be implemented far more efficiently with a small TLB than in pure software.
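
For anyone who hasn't looked at what "in pure software" costs there: a software-managed cache over the local store boils down to a tag check on every remote access, with a DMA fill on a miss. Below is a deliberately simplified, direct-mapped sketch; the line size, table layout and the sw_cache_read helper are all invented, and COMIC's real runtime does a lot more (write-back, coherence, higher associativity):

```cpp
#include <spu_mfcio.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 128                     // illustrative software cache line size
#define NUM_LINES 64                      // 8 KB of the LS spent on the cache

static char     lines[NUM_LINES][LINE_SIZE] __attribute__((aligned(128)));
static uint64_t tags[NUM_LINES];          // effective address of each cached line
static int      valid[NUM_LINES];

// Read 'size' bytes at effective address 'ea' through the software cache
// (assumes the access doesn't straddle a line). Every access pays for the
// index math, tag compare and branch -- exactly the overhead a small TLB or
// hardware tag check would remove.
void sw_cache_read(void* dst, uint64_t ea, unsigned size) {
    const uint64_t line_ea = ea & ~(uint64_t)(LINE_SIZE - 1);
    const unsigned offset  = (unsigned)(ea - line_ea);
    const unsigned idx     = (unsigned)((line_ea / LINE_SIZE) % NUM_LINES);  // direct-mapped

    if (!valid[idx] || tags[idx] != line_ea) {            // miss: DMA the line in
        mfc_get(lines[idx], line_ea, LINE_SIZE, 0, 0, 0);
        mfc_write_tag_mask(1 << 0);
        mfc_read_tag_status_all();
        tags[idx]  = line_ea;
        valid[idx] = 1;
    }
    memcpy(dst, &lines[idx][offset], size);               // even the hit path isn't free
}
```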
 
How about using LRB as a CPU, since it's really multipurpose and I'm not quite sold on its GPU capabilities, and using a GPU from one of the 2 proven GPU vendors, most likely NV for BC purposes. I have a feeling a 16-core LRB should be able to emulate Cell.
Or...
If LRB comes through as an awesome GPU, you could have one CPU-like LRB (fewer cores but more CPU-like capability at the expense of the rasterizer) and one GPU-like LRB with more GPU-oriented cores. Later they could put both on the same die for the PS4 Slim, and Intel is by far the best manufacturer of chips with regard to power/heat/process shrinks.
If BC emulation is indeed impossible with this hardware, then a PS3 BC accessory that could be plugged into a port, like the PS2 HDD, would be the best idea. BC is one of those things that CAN be optional, unlike a hard drive, which should be mandatory.
 
History has shown that it is a very good thing to have control of the manufacturing of the CPU and GPU, because you can take them to whichever foundry offers the best price. Other arrangements have often turned out very costly. So having some kind of royalty-based IP contract seems to be the normal way of handling this today, where the IP targets a process that is supported by several foundries/companies.

So here is a question to everyone who thinks that some of the next-generation consoles will be based on a processor from Intel. How could such an arrangement be achieved, given that a lot of Intel's CPU edge lies in their cutting-edge process technology that no one else has?

I am not trying to patronize anyone, it's a serious question, because I cannot see how you could arrange it so you will be able to fully take advantage of all possible cost reductions from future die shrinks and such, cost reductions that can neither be foreseen nor given a price today. If you are depending on one single manufacturer for one major component, how can you be assured that he will not be taking advantage of you? There is a lot of trust involved in such a contract, and business does not work that way. Microsoft tried it once with Intel and Nvidia and it was costly.

Maybe this should be a separate thread, but I post it here because everyone suggesting Intel CPUs is posting here. Mods, feel free to move it.
 
I'm not a hardware engineer, so are you saying that there is no physical way to get half or even a full megabyte of local store with the same performance characteristics on a hypothetically modified SPE?
One of the reasons for them to choose 256 kB was that fetch latency was only 6(?) cycles.
The bigger the local store, the higher the latency would be; there were some early IBM documents that stated this clearly.

Currently they use SRAM, but DRAM might be an interesting solution, as it would fit more memory in the same area and perhaps give decent enough performance for a larger LS.
 
One of the reasons for them to choose 256 kB was that fetch latency was only 6(?) cycles.
The bigger the local store, the higher the latency would be; there were some early IBM documents that stated this clearly.

Some of that latency should be attributed to signal propagation latency, which should come down some with smaller geometries; remember Cell is designed to work at 4 GHz. I think they could up the LS on the current 45 nm process to, let's say, 512 kB without introducing additional latency.

A more interesting question, in my opinion, is whether the net gain from upping the LS is greater than using that extra die space for some more SPUs.
 
Some of that latency should be attributed to signal propagation latency, which should come down some with smaller geometries.

You've got that backwards. As geometries decrease, signal propagation delays increase.

The area you can reach in any given time interval is proportional to the pitch squared. As circuitry has less-than-perfect scaling when moving to a smaller process, the relative signal propagation delay increases.

Cheers
 
History has shown that it is a very good thing to have control of the manufacturing of the CPU and GPU, because you can take them to whichever foundry offers the best price. Other arrangements have often turned out very costly. So having some kind of royalty-based IP contract seems to be the normal way of handling this today, where the IP targets a process that is supported by several foundries/companies.

So here is a question to everyone who thinks that some of the next-generation consoles will be based on a processor from Intel. How could such an arrangement be achieved, given that a lot of Intel's CPU edge lies in their cutting-edge process technology that no one else has?

I am not trying to patronize anyone, it's a serious question, because I cannot see how you could arrange it so you will be able to fully take advantage of all possible cost reductions from future die shrinks and such, cost reductions that can neither be foreseen nor given a price today. If you are depending on one single manufacturer for one major component, how can you be assured that he will not be taking advantage of you? There is a lot of trust involved in such a contract, and business does not work that way. Microsoft tried it once with Intel and Nvidia and it was costly.

Maybe this should be a separate thread, but I post it here because everyone suggesting Intel CPUs is posting here. Mods, feel free to move it.

I think it's a very valid point you make here and is completely relevant to the discussion at hand imho.

In response to your question though, I'd make the assumption that since Larrabee is an unproven first outing into the discrete GPU space for Intel, along with the fact that NV & ATI pretty much have a stranglehold on the discrete GPU market, I would imagine that Intel will be falling over themselves to get Larrabee into a console, as it could potentially be the "Trojan Horse" they need to ensure Larrabee a successful launch.

Based on that assumption I would imagine that Intel will be trying to bend over backwards for Sony or MS, offering them all kinds of "too good to be true" deals in order to get Larrabee into as many consoles as possible (and in turn get the development community at large comfortable with developing for its rather unique architecture).

All assumption mind you... but if I was Intel I'd practically be looking to give LRB away ;)
 
History has shown that it is a very good thing to have control of the manufacturing of the CPU and GPU, because you can take them to whichever foundry offers the best price. Other arrangements have often turned out very costly. So having some kind of royalty-based IP contract seems to be the normal way of handling this today, where the IP targets a process that is supported by several foundries/companies.

So here is a question to everyone who thinks that some of the next-generation consoles will be based on a processor from Intel. How could such an arrangement be achieved, given that a lot of Intel's CPU edge lies in their cutting-edge process technology that no one else has?

I am not trying to patronize anyone, it's a serious question, because I cannot see how you could arrange it so you will be able to fully take advantage of all possible cost reductions from future die shrinks and such, cost reductions that can neither be foreseen nor given a price today. If you are depending on one single manufacturer for one major component, how can you be assured that he will not be taking advantage of you? There is a lot of trust involved in such a contract, and business does not work that way. Microsoft tried it once with Intel and Nvidia and it was costly.

Maybe this time Intel would be willing to offer a better deal, if they feel it's important for them to expand their business into the console realm, pushing Larrabee might make them feel that way.
 
Considering that the practices behind Intel's anti-trust troubles involved huge inducements to the likes of Dell etc., they're clearly no strangers to incentivising companies to make sure their chips are looked upon favourably.
 