Predict: The Next Generation Console Tech

Discussion in 'Console Technology' started by Acert93, Jun 12, 2006.

Thread Status:
Not open for further replies.
  1. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,519
    Likes Received:
    852
    For a PS4 chip, just put 8 SPEs in there; at 22nm they will be tiny, and they might even prove useful. You'd get 100% BC right there.

    Then fill the rest with regular high-performance (3+ wide, OOO) CPU cores with wide vector units (256/512 bits), developed with a dogmatic focus on power consumption (which could make P7 a poor fit). Each core should be accompanied by at least 1MB of last-level cache.
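
    As a rough illustration of what those wider vector units buy, here is a hypothetical sketch using x86 AVX intrinsics purely as a stand-in (not a claim about any actual console ISA): a 256-bit unit processes eight single-precision floats per instruction where a 128-bit VMX/SSE-class unit processes four.

      /* Hypothetical sketch: scaling a float array with 128-bit vs 256-bit SIMD.
         Assumes a C compiler with AVX enabled; leftover tail elements are
         ignored for brevity. */
      #include <immintrin.h>
      #include <stddef.h>

      /* 128-bit path: 4 floats per iteration. */
      void scale128(float *dst, const float *src, float k, size_t n)
      {
          __m128 vk = _mm_set1_ps(k);
          for (size_t i = 0; i + 4 <= n; i += 4)
              _mm_storeu_ps(dst + i, _mm_mul_ps(_mm_loadu_ps(src + i), vk));
      }

      /* 256-bit path: 8 floats per iteration, twice the work per instruction. */
      void scale256(float *dst, const float *src, float k, size_t n)
      {
          __m256 vk = _mm256_set1_ps(k);
          for (size_t i = 0; i + 8 <= n; i += 8)
              _mm256_storeu_ps(dst + i, _mm256_mul_ps(_mm256_loadu_ps(src + i), vk));
      }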

    Cheers
     
  2. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    I agree so much on that one. While more complex than just leaving the SPEs out, it allows BC, reuse of code, and some new code, while not going over the top (32 SPEs would be redundant next to a Fermi or similar GPU).

    Multiplatform developers that don't want to touch SPEs can use SPE-aware middleware; then it's no big deal, as there are the same huge pools of regular CPU and GPU gigaflops as on PC, X360 and X720.

    What may kill the idea is this: do console vendors even want BC? They are just as likely to leave BC out of the new consoles because of internal competition.
     
  3. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,597
    Likes Received:
    11,003
    Location:
    Under my bridge
    Any use of the GPU for GPGPU work means less graphics performance. And if you double up the GPU for graphics and non-graphics processing, you won't need vector processing on the CPU. So any console looking to utilise GPGPU should just go with a strong conventional CPU to support it. The only reason to have a Cell-like CPU with stream processing is if the GPU will be completely preoccupied rendering the visuals.


    As I see it, there are three identified workloads: graphics rendering with its peculiarities of memory access; vector workloads for specific tasks with high throughput; and general 'integer'/'control code' workloads, which can also include jobs that could be optimised for vector processing but where the cost means developers don't want to. I see three system-design options:
    1. Cell-like heterogeneous cores (everyday and vector processing) and GPU (graphics)
    2. Regular CPU (everyday processing) and GPGPU (vector and graphics)
    3. Larrabee-like homogeneous cores with graphics hardware (everything)
    Any mix of these strikes me as overengineering. Option 1 will of course have the possibility for GPGPU work whatever GPU is fitted, but you wouldn't use it because graphics rendering consumes all its cycles.
     
  4. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    Option 1 is still schizophrenic no matter what.

    Mind you, if the PS3 didn't exist, I believe it should definitely be option 2; including 8 SPEs is a way to avoid throwing everything out. It is schizophrenic and costs square millimeters, but less so than including PS2 hardware in the PS3.

    In any case it's up to developers to use the resources as they please: do physics on the CPU, on the GPU, or both.

    Option 3, maybe for the PS5 ;)
     
  5. Hornet

    Newcomer

    Joined:
    Nov 28, 2009
    Messages:
    120
    Likes Received:
    0
    Location:
    Italy
    I think 6 out-of-order cores with a fair amount of cache would be more than enough to feed the command buffers of the GPU and run decompression, audio processing, physics simulation, AI and networking, even considering next-generation requirements. Such a CPU could easily fit in 10^9 transistors, which is going to be cheap at 32nm and extremely cheap at 22nm. This CPU could be paired with a two or three times bigger GPU (Fermi is 3*10^9 transistors). This transistor budget is probably too large at 32nm but completely doable at 22nm.

    What tasks do you expect to run on something like 32 SPEs? If developers are going to do stuff like fluid and particle simulation, they can use the GPU. Of course using the GPU takes away performance which could be used for rendering, but I doubt this kind of simulation is going to be used in many games, and hardware design should take typical usage into account.
     
  6. ADEX

    Newcomer

    Joined:
    Sep 11, 2005
    Messages:
    231
    Likes Received:
    10
    Location:
    Here
    There seems to be an implicit idea that all the tasks SPEs do can just be moved over to GPUs. This isn't true; GPUs can only handle highly parallelisable tasks. If your task can't be split across thousands of threads, a GPU won't be of much use.

    CPUs, on the other hand, can do the same jobs an SPE can do, but not as well; there are many benchmarks that show this.

    So, from a performance and power-consumption point of view, 1 is the optimal design.
    However, it is clearly not the winner on ease of programming, and that is Cell's downside.

    2 is what most systems are and have been since the early 80s (or even 70s!).

    3 won't work; you are assuming that Larrabee cores, being general purpose, will be able to handle all the control work. They might be able to run it, but just don't expect them to be fast at it; it's better to think of Larrabee as a GPU that just happens to have a CPU's ISA. Even Intel are not pushing this sort of design.


    The big thing that could change all this, though, is OpenCL. It means you can write code once that can run on pretty much any sort of core. This could be a godsend for Cell developers and could mean Sony and possibly even others go with 1.
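
    As a rough sketch of what that "write once, run anywhere" idea looks like in practice, here is a minimal, hypothetical OpenCL host program (generic OpenCL 1.x C API, error handling omitted, not tied to any console SDK): the same kernel source is compiled at runtime for whatever device the platform exposes, whether that's a CPU, a GPU or an accelerator.

      /* Minimal OpenCL sketch: build and run one kernel on whatever device
         the platform offers. Error checking is omitted for brevity. */
      #include <CL/cl.h>
      #include <stdio.h>

      /* Kernel source is plain text; the runtime compiles it for the chosen
         device, so the host code stays identical across architectures. */
      static const char *src =
          "__kernel void scale(__global float *buf, float k) {"
          "    size_t i = get_global_id(0);"
          "    buf[i] *= k;"
          "}";

      int main(void)
      {
          cl_platform_id plat;
          cl_device_id dev;
          clGetPlatformIDs(1, &plat, NULL);
          /* CL_DEVICE_TYPE_DEFAULT: take whatever the platform provides. */
          clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);

          cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
          cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

          float data[1024] = { 1.0f };
          cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                      sizeof(data), data, NULL);

          cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
          clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
          cl_kernel k = clCreateKernel(prog, "scale", NULL);

          float factor = 2.0f;
          clSetKernelArg(k, 0, sizeof(buf), &buf);
          clSetKernelArg(k, 1, sizeof(factor), &factor);

          size_t global = 1024;
          clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
          clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof(data), data, 0, NULL, NULL);
          printf("data[0] = %f\n", data[0]);
          return 0;
      }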
     
  7. Hornet

    Newcomer

    Joined:
    Nov 28, 2009
    Messages:
    120
    Likes Received:
    0
    Location:
    Italy
    What videogame-related algorithms are suited to a Cell-like CPU but not to a Nehalem-like CPU and, at the same time, not to a Fermi-like GPU? Can you give some practical examples? Most of the things I've read about being implemented on the Cell are easily doable on a modern CPU (AI, physics, sound processing, animation) or GPU (motion blur, depth of field, skinning, particle systems). Of course we are talking about parts with a fairly larger transistor count compared to the Cell, but when planning for the next generation I don't see the need for a Cell-like CPU, given the advances in GPU programmability and the ability to easily get a fair number of out-of-order CPU cores within the targeted transistor budget.
     
  8. SedentaryJourney

    Regular

    Joined:
    Mar 13, 2003
    Messages:
    476
    Likes Received:
    27
    The point is that for a given silicon budget a Cell-type architecture performs better than a complex x86 core. I think the GPU should be left as-is instead of being some sort of remedy for CPU performance. BTW, is fancy OOOE even necessary for a game console, assuming somewhat optimized code?
     
  9. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    Yes, it is necessary, as it's an optimisation that can only be done during execution, and the benefits are huge.
     
  10. Hornet

    Newcomer

    Joined:
    Nov 28, 2009
    Messages:
    120
    Likes Received:
    0
    Location:
    Italy
    Better performing at what kind of tasks? Are these tasks performance-critical for next-gen videogames? Can these tasks be run even faster on a different architecture, such as Fermi and/or Larrabee? There are a lot of complex questions involved. I don't see the point of having a Cell-like CPU to run stuff like skinning, motion blur or depth of field, especially when paired with a next-generation GPU.

    OOOE/SMT are two different techniques to hide memory latency and improve utilization of execution units. Pretty much every modern CPU design uses at least one of these techniques.
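
    As a toy illustration of the latency hiding involved (a hypothetical C sketch, not anything from the thread): an out-of-order core can keep issuing the independent work in the second version while one load is stalled on a cache miss, whereas the strict dependency chain in the first version leaves its execution units idle; SMT gets a similar effect by pulling independent work from a second hardware thread instead.

      #include <stddef.h>

      /* One long dependency chain: each add waits on the previous one, so a
         cache miss on a[i] stalls everything behind it. */
      float sum_chain(const float *a, size_t n)
      {
          float s = 0.0f;
          for (size_t i = 0; i < n; ++i)
              s += a[i];
          return s;
      }

      /* Four independent chains: while one load misses, an out-of-order core
         can keep issuing loads and adds from the other accumulators. */
      float sum_ilp(const float *a, size_t n)
      {
          float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
          size_t i = 0;
          for (; i + 4 <= n; i += 4) {
              s0 += a[i];
              s1 += a[i + 1];
              s2 += a[i + 2];
              s3 += a[i + 3];
          }
          for (; i < n; ++i)
              s0 += a[i];
          return (s0 + s1) + (s2 + s3);
      }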
     
  11. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    113
    Location:
    New Zealand
    Doesn't AMD already have a very compelling implementation for option 2? Bulldozer sheds a lot of the floating-point duplication between a GPGPU-type setup and the CPU core, as long as the two are on the same die (Fusion).

    I don't see having both a GPU and a CPU on the same die as a problem, as the cores themselves are becoming more power-dense with every die shrink. A next-generation console may not have much more than a 250mm^2 silicon budget before you hit your 100-120W power limit, so splitting it into two dies would be counterproductive: you have to set aside X% of each of the two smaller dies for the communication bus between them anyway, and a smaller die would quickly become pad-limited when shrunk, which would bring you back to the concept of combining them.

    Let's say they have an Xbox Next project in the works based upon a Fusion concept with Bulldozer cores plus a tweaked R9xx GPU architecture, just as the current Xbox 360 has a tweaked Xenos unified-shader GPU. If we take a release date either at the end of 2011 or in mid/late 2012, they can get samples of the new GPU architecture out to developers in the middle of 2009, a little after they tape out the new GPU, in the vein of the old PowerMac hardware that served as the early development kits for the 360. Then sometime in the middle of 2010 they can start releasing proper development kits based on early Xbox Next hardware and Fusion prototype designs, finalise the design in early 2011, tape out the finished chip, finish the final QA stages of development, and release towards the end of 2011/start of 2012 using the 28nm bulk process from Global Foundries.

    Does this sound plausible to you? I think the telling recent development between AMD and Intel was the fact that AMD doesn't have to pay royalties for using x86. This means they can start considering the use of x86 IP in consoles. Furthermore, AMD would stand to gain in large part because games would be optimised to run on their x86 Fusion design, which would give them an advantage similar to the one ATI had with the R300 GPU, as developers would be releasing software optimised for their hardware alone.
     
  12. Acert93

    Acert93 Artist formerly known as Acert93
    Legend

    Joined:
    Dec 9, 2004
    Messages:
    7,782
    Likes Received:
    162
    Location:
    Seattle
    Not if some of the CPU real estate that would have gone to stream-friendly GP work has been re-allocated to the GPU for those stream-friendly GP tasks.

    In theory the inverse is true now (Cell is doing GPU tasks for RSX that are done on Xenos alone in the 360). The only issue is that Cell needs to do this to make RSX competitive.

    Anyhow, the concept is no different from the ones proposed by some here of stripping the vertex units from RSX, giving it more pixel power, and tying vertex work more tightly to the SPEs (maybe with more real estate) for more flexible workloads coupled more closely to the CPU for new techniques.

    If GPUs prove better at some common physics tasks (or AI, post processing, etc) than CPUs, moving real-estate to the GPU isn't taking away from graphics but only investing in more GPU resources.

    Saying that using the GPU for GPGPU takes away from graphics assumes GPUs should only be utilized for graphics.

    I think that concept will die sooner rather than later as chips like Fermi blow C2D, PPEs, K10s, and SPEs out of the water in streaming tasks. The question is how much to gamble on it being the future. Being GP has a lot of advantages over saying, "Your code must be X big, your data format must be in Y construction, and you need to vectorize it, or your performance will suck."
     
  13. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,597
    Likes Received:
    11,003
    Location:
    Under my bridge
    Yes. As you said earlier, in reality we have a variety of computational problems to solve and a pool of processing resources to solve them. However, typically in a game, developers target visuals above everything else. I can't imagine many developers thinking to cut back on possible graphics to allow for other gameplay features. If they have access to a processor capable of making their game look better, they'll use it for that. For example, Cell is being used as GPU support for that very reason.

    Looking at PS3, there is a lot of processing power there, which is almost always thrown at graphics (animation, setup, post, etc.). How many developers have said, 'let's forget the graphics, just use RSX for that, and let's throw Cell at these sophisticated algorithms'? In that same vein, in the future will developers be thinking, 'let's create a new game that doesn't look as good as everything else out there but uses the GPU to calculate some amazing stuff' or 'we've got to aim for the best looking title possible, so let's not worry too much about AI and destructible materials and creating our organic universe'?

    Indeed, looking at XB360, how many games are using the GPU to help with processing? The principal interest is just in creating better graphics. And from a business POV that may be the right one, as graphics is one of the strongest selling points.

    That didn't go down too well with Cell, and it wasn't so demanding! However, everything is moving in that direction, so it'll be 'old news' and a common problem that all tool creators will be addressing. I dare say, though, that devs would prefer the current model. A GPU that could do everything asked of it, leaving a simple game engine on a fairly simple processor, would probably be the preferred choice for many developers. That's my guess anyhow.
     
  14. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    113
    Location:
    New Zealand
    We're approaching a point of diminishing returns in relation to graphics, aren't we? Once you can render Crysis 3 at 1920x1080 with 5-6x the computational resources available to the current-generation consoles, how are you going to distinguish your title from the other titles out there, especially with the increasing use of 3rd-party engines, which vastly lower the bar for all developers to produce appealing visuals?
     
  15. Billy Idol

    Legend Veteran

    Joined:
    Mar 17, 2009
    Messages:
    5,940
    Likes Received:
    768
    Location:
    Europe
    It seems to me that the most important aspect of the next-next-generation console tech is not related to hardware, but to software:

    Console manufacturers should allocate a lot more money to providing easy-to-use yet efficient/powerful development tools and sophisticated console operating systems right from the start of a new console. Judging by this gen (which seems to be...err...next-gen, according to the thread title :mrgreen:), easy accessibility to the console and its potential hardware power seems to be of the utmost importance...at least for 3rd party devs!
     
  16. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,597
    Likes Received:
    11,003
    Location:
    Under my bridge
    Remember the EA Madden trailer before this gen launched? We're miles from that at the moment, and it'll take a hell of a lot more performance than we have now to reach it. There is still loads of room for improvement, and just as this generation doesn't have spare cycles to toy around with, I don't think next-gen will either.
     
  17. neliz

    neliz GIGABYTE Man
    Veteran

    Joined:
    Mar 30, 2005
    Messages:
    4,904
    Likes Received:
    23
    Location:
    In the know
    Why do people always consider extreme possibilities for the next-gen consoles and not focus on hardware that is available now? That's a much more likely candidate for a future console.

    Why would Sony or Microsoft take a risk with LRB, for instance? The graphics part should (if it comes out at all this time) arrive in H2 2010 or in 2011. Why would they take a risk with a part that has no realistic release date and stake their own hardware on it?

    I think that Sony and Microsoft will take a leaf out of Nintendo's book and go with cheaper hardware (albeit more powerful than current gen) and expand on their touch/gesture/movement controllers. Why wait for certain 22nm parts when 32/28nm should be cheaper and less risky? And I think Sony doesn't want to repeat the PS3 launch.
     
  18. fehu

    Veteran Regular

    Joined:
    Nov 15, 2006
    Messages:
    1,441
    Likes Received:
    380
    Location:
    Somewhere over the ocean
    If we have a GPU with, let's say, 1000 advanced cores of some kind, and we want to calculate normal physics on it, how many cores do we have to use?
    If the number is something like 50, then freeing the CPU and reusing the data already floating around in the GPU cache can be better than separating the task out and using those 50 cores for graphics.
    Plus you can choose and balance how far you want to go between destructible, collapsible buildings and a gorgeous-looking forest.

    If the number is more like 100, well, then I must be wrong...
     
  19. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    113
    Location:
    New Zealand
    Yes, I do concede that we have a long way to go before we can truly stop convincing ourselves that it's photorealistic and actually produce photorealistic games. However, we are still at a point of diminishing returns where developers have options to make trade-offs for better physics, destructible terrain/buildings etc. without impacting the overall image quality too starkly.

    Having slightly better image quality may not sell a game better in the next generation if the differences are marginal. However, having proper destructible buildings, in the sense that your bullets can chip and crack bricks based on their trajectories, or having truly deformable/dynamic worlds: perhaps that's the kind of trade-off the decision makers will have to weigh up. Furthermore, a visual arms race is expensive, but using the extra performance to improve the game world in a dynamic sense may not be so.
     
  20. ADEX

    Newcomer

    Joined:
    Sep 11, 2005
    Messages:
    231
    Likes Received:
    10
    Location:
    Here
    Actually, there is a 4th option.

    Take Cell, add a couple of Power7-type cores and a bunch of SPEs, and add in some graphics-specific stuff for texture filtering. Also add a "cache mode" to the local store.

    It'd be able to do GPU stuff, GP stuff and anything in between. The PS4 could be a couple of these.
     