Practicality of a CPU/GPU hybrid in consoles

nonamer

How feasible are CPU/GPU hybrids as the sole processing unit of a console? This is clearly a question for the [next] next generation of consoles, so we need to project to the 2010-2012 time period. So we have some clear limitations to think about. It must be performance-competitive but still cheap to fabricate. That pushes this thing into the 400-500 mm^2 region, since that's the combined size of a CPU and GPU right now and is still within reason, but that's still pretty damn big and possibly out of reach. Anyone who pulls this stunt will, I think, be stuck with a $600+ pricetag at launch no matter what, if not more if they want some other expensive feature in the console. It would also need to be fabbed on at the very least a 45nm process. Even so, that's not that impressive seeing how we are already starting to move to that process, or at least Intel is. The rest of the industry may take till 2008 at the earliest. It would be very desirable to see this thing at the 32nm node or an in-between node. Probably not possible for a 2010 launch, but it would be possible if it is more like 2011 or 2012.
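To put rough numbers on the process-node side of this (a back-of-the-envelope sketch; the 450 mm^2 starting area, the 90nm baseline, and the ideal square-law scaling are all assumptions - real chips shrink worse than this):

# Ideal die-area scaling across process nodes: area scales with the
# square of the feature-size ratio. Purely illustrative numbers.
def scaled_area(area_mm2, old_node_nm, new_node_nm):
    return area_mm2 * (new_node_nm / old_node_nm) ** 2

combined = 450.0  # assumed combined CPU+GPU area at 90nm, in mm^2
for node in (65, 45, 32):
    print(f"{node}nm: ~{scaled_area(combined, 90, node):.0f} mm^2")
# -> roughly 235, 112, and 57 mm^2

Even with these optimistic assumptions, today's combined transistor count only becomes a modest die at 45nm or below, which is why the launch window matters so much.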

The most important question is what exactly this thing is supposed to accomplish that separate CPUs and GPUs can't. An immediately obvious benefit is that you can have an insanely fast bus from the "CPU" to the "GPU", which allows for a degree of coordination not seen before. The problem is figuring out what this entails. Possibly you can have a very large scratchpad of eDRAM or Z-RAM that serves as both a framebuffer and a low-latency memory pool. I also imagine that you could do physics on this super-CPU on a level you couldn't before, but this may not apply to Cell. I'm sure other neat tricks and general improvements would be possible too. The last question is performance. There's no way to predict this accurately, but I imagine it would be in the teraFLOPS region, utilizing billions of transistors.

I still can't help but think that this is borderline nuts, though. With a chip that big, you'd better have pretty good fabs and be able to scale it down fast. Either that or consumers learn to accept $600 consoles that stay in that price range for 2 or more years (basically the PS3 is completely legitimized and then some). Maybe Sony would try it if the PS3 is a major success, but I'm not sure MS is willing to do that.
 
Combining the main processors onto one chip, even if it were relatively large, should only serve to make the console more affordable.

While fully programmable execution units could service both CPU and GPU workloads, specialized processing units for graphics tasks like tiling and texturing would likely still be needed for competitive performance.

For graphics, a two chipset solution might be best considering the current state of semiconductor technology. RAM chips with on-die logic for tiling might minimize the costs of binning, while shader/logic chips with on-die memory could process the tiles and minimize unnecessary calculation and memory access.
 
How feasible are CPU/GPU hybrids as the sole processing unit of a console? This is clearly a question for the [next] next generation of consoles, so we need to project to the 2010-2012 time period. So we have some clear limitations to think about. It must be performance-competitive but still cheap to fabricate. That pushes this thing into the 400-500 mm^2 region.
You've answered this yourself! Why didn't they do that this time around? Because the cost of an enormous slab of silicon like that would be crazy, especially when you factor in defects. For any CPU+GPU combination, you could split it into two chips at a fraction of the cost. High-speed communication probably isn't needed that much either: you won't need hundreds of GB/s, as much of the GPU's work has to be done from cached, complete, addressable data. The Cell<>RSX interconnect shows high-bandwidth external busses are doable, and those will improve in speed over time. Thus in the future you could have 2 separate dies with a 100 GB/s interconnect if you really want that, and have other options like split memory pools to get more system bandwidth.

I don't see combined CPU+GPU ever being a consideration for a console short of later cost reductions.
 
If a console is designed around being launched at a reasonable price point (*cough*), then it would make sense to unify the CPU and GPU on a single die, imo. Think of it as one 250mm² chip rather than two 125mm² ones, which remains quite reasonable given a bit of redundancy. In the current scheme of things, it'd be one 500mm² chip rather than two 250mm² ones though, which obviously doesn't work quite as well!
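A quick sketch of why one big die is so much worse than two half-size dies once defects enter the picture (illustrative only: the simple Poisson yield model Y = e^(-D*A), the defect density, and the usable wafer area are all assumed figures):

import math

D = 0.4          # assumed defect density, defects per cm^2
WAFER = 70000.0  # rough usable area of a 300mm wafer, in mm^2

def die_yield(area_mm2):
    # Simple Poisson yield model: Y = exp(-D * A)
    return math.exp(-D * area_mm2 / 100.0)

# One 500 mm^2 hybrid per console:
hybrids = (WAFER / 500) * die_yield(500)    # ~19 good dies per wafer

# A 250 mm^2 CPU plus a 250 mm^2 GPU: a defect only scraps the die
# it lands on, so bad CPUs and bad GPUs don't have to coincide.
pairs = (WAFER / 2 / 250) * die_yield(250)  # ~52 good pairs per wafer

print(f"{hybrids:.0f} hybrids vs {pairs:.0f} CPU+GPU pairs per wafer")

Under those assumptions the split design yields roughly 2.7x as many consoles per wafer, which is the whole game at launch.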

I'm skeptical it'll happen anytime soon, but I'd argue it does make some sense if the architecture is properly designed around it. Better collaboration between the CPU and the GPU *is* the future, arguably.


Uttar
 
That's true if you're aiming for well below cutting-edge power, though. Typically the super-power launch hardware needs the largest dies possible at launch, at the economic threshold of the current manufacturing node. If we take the current designs, 300 mm^2 seems to be about the big-chip size. For a unified CPU+GPU you'd need that 300 mm^2 shared between both, and a substantial drop in power. Unless that unification brings you super-duper performance boosts due to CPU<>GPU collaboration, your console is going to look much weaker next to your rivals with 600 mm^2 of transistors shared between two chips.

Then again, maybe the days of cutting edge tech are numbered, and in future a more economical, less powerful standard becomes the norm?
 
Assuming MS and Sony continue along the path they're going, their consoles will still consist of a fairly large but separate CPU and GPU. I think for backwards compatibility's sake they'll stay with the same architectural design and just increase the number of PPEs, SPEs, shader units, etc.

Nintendo may take the single-chip CPU+GPU model and stay within a certain die size to keep costs down, since they likely will not be aiming for the most powerful console. Also, Nintendo has already said they would like to continue to keep the size of the console small, so smaller chips would help in that regard.
 
The PS3 already seems to be a start down the same path as Nintendo, though not as extreme and without the cost benefits: somewhat less risky, trailing-edge processing designs.

Microsoft might be going it alone in the all-out performance market next time.
 
The PS3 already seems to be a start down the same path as Nintendo, though not as extreme and without the cost benefits: somewhat less risky, trailing-edge processing designs.
I disagree.
It was scheduled to be released within half a year of the Xbox 360. It has a very risky CPU; just think about the lost investment if Cell turns out to be a failure outside of the PS3 (not saying that I think it will) - there's a lot of stuff that could've been made simpler if the PS3 had been the only target. Die sizes are bigger than the Xbox 360's. And Blu-ray surely had a lot of hurdles.

Now for a more interesting discussion: perhaps having 2 hybrids instead of CPU+GPU (with roughly the same total transistor count) would bring some benefits, aside from having 1 component (+ manufacturing line) instead of 2. Think about "unified shaders", but instead you get 2 cores that can both act as a (vector-oriented) CPU or a GPU - just as you need it. For example, that could theoretically allow "classic" rasterized games to be run with 1 CPU & 1 GPU (maybe even dynamically load-balanced as 2 CPUs or 2 GPUs) and others to run via raytracing. I'm speaking of theoretical possibilities here; I don't even want to think about the troubles such a hybrid would bring ;)
 
Wasn't this done on the PS2 slims?
Yes, but that's different, as it's consolidating old tech to make it cheaper. For comparison, if the PS2 had launched in 2004 it could have launched with CPU+GPU combined, and with technology 4 years 'out of date.' Likewise, if the PS3 were to launch with a combined CPU+GPU at current specs, it'd need to launch around 2009-2010 to be able to get them onto a cost-effective chip.
 
I think before we ever see a CPU/GPU hybrid at a console launch (read: something of the PS3/360 ilk, not Wii, in relative terms) the first steps would be evolutionary: certain CPU elements being implemented at the GPU level and certain GPU elements moving to the CPU.

Arguably we have already seen exactly this, but it appears there are a lot of steps remaining between here and the "goal". And would a pure hybrid ever be good enough as an all-around processor to outperform 2 discrete architectures that have some overlap and specially designed high-speed interfaces? Maybe when we get to the point of realtime raytracing and GI, but we are a decade or more from that.

At the rate we are going, even if there is a point where we see solutions that are multifaceted, could we not still see multi-chip designs? E.g. if Cell had been acceptable for graphics as well as game code, we probably would have seen 2 Cell chips and not 1 2-PPE/16-SPE monster. Which raises the argument: if there will always be 2 or more chips due to die costs, why not specialize cores for their primary task while allowing significant overlap?

A lot of questions, a lot of possibilities to design around.
 
Well, a PPU chip could be added to a console, but now you have 3 chips. I guess instead of having a large, expensive CPU that handles a lot of physics you could have a cheap, small CPU and a cheap, small but separate PPU. Say a CPU that was 100mm^2 and a PPU that was 80mm^2. I could see these two being on the same package as separate dies, which may be cheaper than a single CPU that was 180mm^2. In terms of performance I wouldn't know which one would be better.
 
Would a fixed-use chip, like a PPU, be a worthwhile investment? Maybe Cell2, Tera-Scale, GPUs, etc. won't offer the same performance as a dedicated PPU, but they would obviously be much, much more versatile and do many things well instead of only one thing.
 
what is the expandability of Xenon in the future... I cannot see where they can go from here...

Well, what would stop them from appending special-purpose cores to the basic tri-core design? They could still improve clock speeds, increase the main cache, and add more functionality (complete dual threading instead of just dual issue? I never understood what was going on there).
 
Would a fixed-use chip, like a PPU, be a worthwhile investment? Maybe Cell2, Tera-Scale, GPUs, etc. won't offer the same performance as a dedicated PPU, but they would obviously be much, much more versatile and do many things well instead of only one thing.

Wouldn't this depend on how much time CPUs at the end of this gen are spending doing physics work?

If a large enough chunk of this time was dedicated to physics, it would seem worthwhile to invest in specialized silicon for the task. Similar to MS's decision to include eDRAM to offload the bandwidth requirements of the shared system RAM.

Isn't physics the biggest beneficiary of Cell's design for use in next-gen games?

Sure, not every game will make use of this tech, but then some games don't need bleeding-edge tech in this regard anyway. As long as it didn't take up too much of the transistor budget, I think a PPU would be a great way for MS to go. Cell is already designed to handle heavy physics calcs, so I would think in Sony's case this would be a waste of their time/budget.
 
More PPEs and larger cache.

That would be insane. :p

Xenon works (3 general CPU cores, 1 "large" shared cache) because there are so few cores.

Now imagine in 5 years, with the typical 8x increase in transistor budgets, trying to fit 24 cores on a chip with 1 memory controller and 1 large shared cache.

The I/O issues and inner chip communication are huge, huge hurdles that won't be easily overcome--and just going the Xenon approach would be utter disaster for utilization.

I recently posted some links to what Intel has in mind. They are looking at using nodes, or clusters. The chip would have 32 processors in 8 clusters. Each core would have 4 threads and access to a shared 512KB cache; each 4-core node would also have 3MB of cache (24MB across all 8 nodes) connected on a ringbus. Each node, importantly, would have its own memory controller at 12.8GB/s, for a total of 102.4GB/s of system memory bandwidth. While I am not suggesting any console will see such a chip, the point is this: note the strong emphasis on inner-chip communication as well as access to system memory.
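Just to make the totals in that design explicit (a trivial tally of the numbers quoted above; the variable names are mine, not Intel's):

NODES = 8
CORES_PER_NODE = 4
THREADS_PER_CORE = 4
L2_PER_NODE_MB = 3
BW_PER_NODE_GBPS = 12.8   # one memory controller per node

print("cores:  ", NODES * CORES_PER_NODE)                       # 32
print("threads:", NODES * CORES_PER_NODE * THREADS_PER_CORE)    # 128
print("cache:  ", NODES * L2_PER_NODE_MB, "MB")                 # 24 MB
print("mem BW: ", NODES * BW_PER_NODE_GBPS, "GB/s")             # 102.4 GB/s

Notice that both the cache and the memory bandwidth scale with the node count rather than hanging off one shared controller.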

Tera-Scale and Cell are other design choices that put a significant emphasis on overcoming communication and memory bottlenecks. E.g. Cell has the extremely fast EIB bus and Local Store memory for the SPEs (and toss in XDR, with its low pin count and high bandwidth, for good measure).

Wouldn't this depend on how much time CPUs at the end of this gen are spending doing physics work?

If a large enough chunk of this time was dedicated to physics, it would seem worthwhile to invest in specialized silicon for the task.

Why would it be worthwhile?

Do all games require extremely intensive physics?

Is physics even the largest user of resources?

Do physics chips even work? Are physics chips better (in power & FLEXIBILITY) at physics than other solutions (Cell, GPUs, etc.)?

They are an unproven concept with little to no funding. A "pie in the sky" chip with meager R&D is going to be exactly like the Ageia chip: old process (130nm?), low frequency, and poor performance.

Similar to MS's decision to include eDRAM to offload the bandwidth requirements of the shared system RAM.

eDRAM has been around a very long time. In fact, Wii and the Xbox 360 use eDRAM, and last gen the GCN and PS2 used eDRAM. And here is the kicker:

Every game utilizes the eDRAM and benefits from it.

It is like arguing that system memory offloads bandwidth and content storage.

Isn't physics the biggest beneficiary of Cell's design for use in next-gen games?

Why don't you tell us if it is?

Obviously some are obsessed with shoehorning Cell as only good at certain tasks, but that is superficial. One need only look at all the code Sony has generated to see how broadly Cell can be used. E.g. they have a hugely impressive demo for pathfinding. Likewise they have demonstrated raytracing on Cell, which, according to the benchmarks as well as others here (like mhoustan), it is quite impressive at.

If developers are to be believed, by the end of the PS3's lifecycle most tasks will be accomplished on the SPEs. They take new approaches to work with, but they are very fast and have extremely fast memory. So while physics, sound, geometry, pathfinding, compression/decompression, post-processing, and graphics tasks (like raycasting for volumetric clouds) may be the low-hanging fruit, AI, the renderer, and other tasks are not far behind.

Just what we see being used now is far outside the scope of "physics" -- and physics isn't the biggest beneficiary. Actually, many have argued that physics is not innately parallel -- at least not as it has been implemented in most software to this point.

And a minor factoid: I believe Epic commented that Cell was easily comparable to the PhysX chip in performance. Interestingly, they have a very similar design (the Ageia chip has 1 main processor, which I believe is MIPS-based, and I believe 8 smaller vector processors).

Sure, not every game will make use of this tech, but then some games don't need bleeding-edge tech in this regard anyway.

While you may not be saying this, your comment comes across as: If it isn't using physics it isn't cutting edge.

And here is why it comes across as such: by including dedicated silicon that can ONLY be used for physics, you are TELLING developers: do physics, or your game WON'T be bleeding edge, because, put simply, there is no performance left for anything else once the silicon budget is spent on the PPU.

There is more to games than physics. I would take cutting edge AI over physics.

As long as it didn't take up too much of the transistor budget, I think a PPU would be a great way for MS to go.

You have concluded what you have suggested without providing any evidence for why it is the correct conclusion ;)

Even in the above statement there is a major assumption: Transistor budget.

Can MS get a performant PPU that is on par with Cell or other solutions for physics with a very small transistor budget?

If it were possible Sony and Nintendo could just add one in!

Cell is already designed to handle heavy physics calcs

Was it designed to do this?

so I would think in Sony's case this would be a waste of their time/budget.

Not if it is as cheap as you are suggesting.

Btw, take a long look at Xenos and G80. Unified shaders. Why? More flexibility and higher utilization. Let's assume for a moment that a Xenos or G80 ALU is slower at vertex work than a G70/R580 vertex ALU. Let's say horribly slower -- by 50%.

So in theory G70/R580 walk away with vertex work, right? Nope, for the reason that G70/R580 have 8 vertex ALUs (or ~15% of their ALU resources) for vertex work, whereas Xenos and G80 can dedicate 100% of their ALU resources to it. So instead of being 2x as slow, they are MUCH faster.

So while G70/R580 are still twiddling away at vertices with their "uber-specialized vertex processors", Xenos and G80 have those SAME ALUs already working on other tasks.
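Here is that argument with the numbers made explicit (a sketch; the 48-ALU unified count is an assumed round figure consistent with the ~15% above, and the 50% per-ALU vertex penalty is the hypothetical from my earlier paragraph, not a measurement):

VERTEX_ALUS_FIXED = 8       # dedicated vertex ALUs on a G70/R580-style part
TOTAL_ALUS_UNIFIED = 48     # assumed unified ALU count
UNIFIED_VERTEX_RATE = 0.5   # each unified ALU assumed 50% as fast at vertex work

fixed   = VERTEX_ALUS_FIXED * 1.0                    # 8 units of vertex work/clock
unified = TOTAL_ALUS_UNIFIED * UNIFIED_VERTEX_RATE   # 24 units of vertex work/clock

print(fixed, unified)  # 8.0 vs 24.0: 3x faster despite the per-ALU handicap

And the moment the vertex burst ends, every one of those unified ALUs goes back to pixel (or any other) work instead of idling.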

The same principles apply to a PPU. Even if it were 5x faster, so what? You did physics 5x faster? But Cell is also accelerating AI and pathfinding, sound, geometry, post-processing, the renderer, compression/decompression, and so forth. Physics may only take up 20% of your execution time. Then what? You spent 30% of your silicon budget on it -- and it sits idle 80% of the time.

And in some DEMANDING games it isn't even used at all.
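The same point in Amdahl's-law form, using the illustrative figures above (physics = 20% of frame time, PPU = 5x faster at it):

def overall_speedup(fraction, speedup):
    # Amdahl's law: only the accelerated fraction gets faster
    return 1.0 / ((1.0 - fraction) + fraction / speedup)

print(f"{overall_speedup(0.20, 5.0):.2f}x")  # ~1.19x overall

A 5x physics chip buys you roughly a 19% overall gain, while the same silicon in a flexible design can accelerate whatever the frame actually spends its time on.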

When we actually see a working PPU come to market that, per transistor, can squash Cell by a factor of 10, it may become interesting. Until then there are many, many areas in game code that are currently too slow to go this route. And as it stands we don't even know if a fixed function PPU will be versatile enough to meet the demands of tomorrow and the problems developers see. Take fixed function graphics as a PERFECT example of how having inflexible hardware keeps developers in a box.
 
That would be insane. :p

Xenon works (3 general CPU cores, 1 "large" shared cache) because there are so few cores.

Now imagine in 5 years, with the typical 8x increase in transistor budgets, trying to fit 24 cores on a chip with 1 memory controller and 1 large shared cache.

The I/O issues and inner chip communication are huge, huge hurdles that won't be easily overcome--and just going the Xenon approach would be utter disaster for utilization.

I recently posted some links to what Intel has in mind. They are looking at using nodes, or clusters. The chip would have 32 processors in 8 clusters. Each core would have 4 threads and access to a shared 512KB cache; each 4-core node would also have 3MB of cache (24MB across all 8 nodes) connected on a ringbus. Each node, importantly, would have its own memory controller at 12.8GB/s, for a total of 102.4GB/s of system memory bandwidth. While I am not suggesting any console will see such a chip, the point is this: note the strong emphasis on inner-chip communication as well as access to system memory.

Tera-Scale and Cell are other design choices that put a significant emphasis on overcoming communication and memory bottlenecks. E.g. Cell has the extremely fast EIB bus and Local Store memory for the SPEs (and toss in XDR, with its low pin count and high bandwidth, for good measure).

I can't see why this would not be feasible. If you look at the PC desktop side, we'll likely be seeing 8-core CPUs before too long; we already have quad cores. Most likely we'll see a Xenon2 with about 6 PPEs and more cache, fabbed on 45 or 32nm. I guess for Xenon3 they might move to clustered cores, but that won't happen in 5 years - more like 9 years, IMO. 8-way SMP servers do exist, and since dual- and quad-core CPUs today are basically 2-way and 4-way SMP, I don't see the need for Xenon to go to clustering with fewer than 8 PPEs.
 
When we actually see a working PPU come to market that, per transistor, can squash Cell by a factor of 10, it may become interesting. Until then there are many, many areas in game code that are currently too slow to go this route. And as it stands we don't even know if a fixed function PPU will be versatile enough to meet the demands of tomorrow and the problems developers see. Take fixed function graphics as a PERFECT example of how having inflexible hardware keeps developers in a box.

First of all thanks for the lump on my head! :p

:cry:

Problem: MS has to come up with a box with comparable performance to Cell2 in the next five years.

Options:

A) license Cell2 tech
B) extend the Xenon architecture
C) license a new AMD/Intel/IBM CPU design
D) add Cell-like functionality with multiple chips

What types of operations does Cell hold an advantage over traditional architectures in?

Physics
Graphics processing
Decompression
Sound
Tessellation
Raytracing
etc.

If one believes Cell can do these tasks effectively in realtime for in-game use and still have enough time left to accomplish the other tasks a game requires, then you'd have to consider these abilities crucial for their next CPU design.

Choices:

A) They would have to pay Sony/IBM/Toshiba an unrealistic fee to license this tech, as Sony will not want MS to be able to compete head to head on price and will squeeze them on this. Or perhaps KK invites Bill over for milk and cookies and decides Cell2 adoption is more important than PS4 marketshare.

B) As explained in Joshua's post, there are many issues that would need to be addressed in order to have this design scale to comparable performance to Cell2, and it would likely cost significantly more. The advantages being that this design would be backwards compatible and more versatile. ;)

C) Expensive and risky. For traditional PC conversions it would be ideal to get a pseudo-Athlon or Core 2 Duo, but getting either company to design a custom version of these chips and feel comfortable licensing the design to MS for a reasonable sum is unlikely. If they want AMD/Intel, they will most likely have to buy the finished fabbed chips, which would be cost prohibitive (Xbox 1). IBM could design yet another custom chip for MS to compete with Cell2's performance. But again, with these designs there will be some tasks that Cell will simply excel at, and getting comparable performance out of these CPUs would require a much larger chip than Cell2, due to Cell's architectural advantages in cross-chip bandwidth and streaming design.

D) Assuming a PPU design could compete with Cell2 in physics calculations, I'd say this is the most cost-effective way to attain performance parity (+/-100%) with Cell2. The architecture, from what I've read, is similar to Cell. If this is true, one would imagine it may also be good at other tasks Cell currently has an advantage in, like raytracing. Not to say a PPU is equal to Cell, just that it may help enough in these tasks to bring system performance to competitive levels.

Of the advantages that Cell has over traditional architectures, raytracing and physics seem to be the only ones that cannot be effectively attained by the SIMD units already in Xenon or by the GPU. Both of these operations rely heavily on fast/wide internal bandwidth and fast DSP-like processors. The PPU design provides that. The PPU design/company, I imagine, would also be cheap to buy at this point.

Cost - Performance

Of the options MS has, I'd say the best bang for the buck would be a combination of "B" and "D".
Assuming they wait till a 32nm process is available and have a similar die-size budget to the Xbox 360, the design would be:
a 6-core Xenon with 2MB cache
a PPU expanded 8x over the existing design
Xenos expanded 8x in transistor/die size + future design considerations

This would place them in the same total die range as the current Xbox 360 design.

If MS decides to launch in 2010, I don't believe 32nm will be ready for cheap mass production yet. In that case I don't think this would be the best way to go, and they would probably be very GPU-centric in their design, with the hope that the GPU could offload some physics/raytracing work from the CPU.

This post assumes, of course, that a PPU is capable of similar performance to Cell in physics and raytracing.


Why Physics?

Physical interaction with the game world is one area that should not be overlooked. Graphics can improve to the point of being indistinguishable from a photo, but if those graphics do not behave realistically in their world, then the illusion is broken. Having accurate enough physics for game-world objects is important. Detailed collision detection and physics-based animation are two areas that could take gaming to the next level, but only when hardware is available to handle these tasks in realtime along with everything else it must do. Will some games not benefit from advanced physics abilities? Sure. But those games typically are not pushing the technical limits of the system in the first place.

MGS4
Heavenly Sword
GTA4
GT HD
Halo3
Gears of War
Madden
Forza2
PGR3
Motorstorm

All of these would benefit from improved physics ability, and I would consider all of them to be system-selling AAA titles. Exceptions exist, of course - namely RPGs - but even they could use physics to improve the game world. The point is that the titles that define and sell the console would/could/do benefit from improved physics. Not all games - but the ones that count.

AI:

I agree AI is important, but improvements in this field seem limited by the code, not by the hardware, and the need for improvements in this regard is somewhat reduced by the emergence of online play.
If a developer decided AI was much more important for his game than physics, could he not use the PPU for help in that regard?
 