Predict: The Next Generation Console Tech

Cell may be a dead end, but it certainly opened the discussion and steered the market toward today's multi-core CPU and GPGPU universe. If it were recast with a much more powerful PPU, more SPUs, and more efficient and flexible memory access, at the same transistor count as an APU, it could be interesting for a next-gen console with little space and TDP to spare. The problem is that developers don't like the architecture... and honestly I can't tell whether the fault was the CPU per se or the poor arrangement of the PS3 board (Cell's low-bandwidth access to GDDR3).

That said, the PowerPC A2 seems to be what IBM has to offer for consoles, and it is perhaps much more interesting than Cell. An A2 combined with a good GPU could be the best sub-200-watt hardware that exists today.

As for options such as an AMD APU, frankly speaking, unless it comes with a GPU that is reasonably powerful, or at least enough for a five-year cycle (something like a Radeon HD 5870), I think it is the fast road to obsolescence.

Alternatively, an APU-like option would be customized ARM A15+ CPUs, freed from the extreme space and TDP limitations of cell phones (where CPUs and GPUs fit in under 20mm^2 and milliwatts), plus a PowerVR Series 6/Rogue GPU (32 cores, each producing 144 flops/cycle). That would use the same architecture as most cell phones and tablets, and form a new development alliance outside the PowerPC / Intel / AMD / NVIDIA universe.
 

Wouldn't ARM be too weak for a home console? I'm curious how ARM cores compare in performance against the XCPU and the Cell PPU. I expect that going ARM would kill any possibility of backwards compatibility too.

I wonder if, on the CPU side, game developers wouldn't want more single-threaded performance from a next-gen CPU than what we had this gen? I know there's the expectation that the GPU can also be used for GPGPU, but IMHO I'm convinced that next gen (as with this gen) GPU cycles will be completely preoccupied with graphics-related tasks, to the point that most GPGPU won't really be utilised much. Plus, for workloads with a lot of data dependency that aren't a good fit for the GPU, surely a faster CPU with better branch prediction and single-threaded performance would be more desirable?

I'm just not convinced that Sony or MS should be looking at a "many weak core" CPU design over a "multicore" (i.e. 2-6 core) fat and hefty CPU.

Perhaps any devs can shed some light on this?
 
Saving the planet isn't a topic for this thread. Take it to RSPCA.

Is there any indication that energy-efficiency is a consideration in next-gen console designs?

Maybe if they use an ARM-based design.

US is imposing more stringent Energy Saver requirements for big-screen TVs. Maybe similar regulations are coming for other electronics, including consoles?

There are also new laws which are going to make incandescent light bulbs more scarce, in favor of CFLs and LEDs, for instance.
 
Is there any indication that energy-efficiency is a consideration in next-gen console designs?
Design constraints for energy efficiency are of course welcome discussion. Whether people should or should not care, the potential ramifications of energy consumption, the economic implications, yada yada, most obviously aren't subjects for a console technology discussion!

Anyone with a link to CE device power constraint laws that'll affect the design of next-gen consoles, feel free to post them. ;)
 
Wouldn't ARM be too weak for a home console? I'm curious how ARM cores compare in performance against the XCPU and the Cell PPU. I expect that going ARM would kill any possibility of backwards compatibility too.

Interesting observations you bring up here.

We had a chance to debate the performance of an 8-core ARM Cortex-A15*, and the conclusion was that such a CPU could even offer performance interesting for a next-gen console (think 2/3 the performance of a Core 2 Quad QX9770, or at least 2 to 3 times the performance of the Xenon/X360 CPU).

On BC, both Sony and MS may have assessed the pros and cons and decided it is not too important to add: the 40GB PS3 that removed the PS2's GS, and the end of BC updates for the X360 in 2007, do not seem to have hurt sales of those consoles, and PS3 sales, if anything, even increased.

It is likely that developers want more single-thread performance, but my impression is that what they really want is a CPU with more cache and lower DMA latencies, one that is more efficient per cycle, rather than theoretical peak performance per se (theoretically, Xenon has "115 GFLOPS" SP and Cell 218 GFLOPS SP).

And what about the GPU... do we really need GPGPU in a next-gen console? To process physics, AI, networking, sound and game logic? That is work for a CPU flexible enough to fit the applications a next-gen game console requires. I also think these next-gen consoles need more specific, customized devices for maximum performance and efficiency than we would accept in PCs.

I personally think the CPU needs to be powerful enough for next-gen console games, not as powerful and general-purpose as those found in PCs.

As for the GPU, I think it needs to be as powerful as possible on the graphics side (minus GPGPU here) and as flexible and open to high- and low-level (assembler) programming as possible, so developers are free (without excesses like the PS2's Graphics Synthesizer, which had practically no libraries) to choose whichever API (or none at all), tools, etc. they want to meet their expectations... and as far as possible, manufacturers like Sony and MS should thoroughly ask third-party developers what they want, to make their lives easier.

*
http://forum.beyond3d.com/showpost.php?p=1565654&postcount=6605
http://forum.beyond3d.com/showpost.php?p=1566183&postcount=6635
 

I must admit I'm not as knowledgeable as some when it comes to this kind of stuff, but on a purely conceptual level, surely an 8-core ARM chip would be less desirable for a next-gen CPU, even if it's 2-3 times the power of the 3-core XCPU, than a 2-4 core IBM design or other such variant. I mean, I completely understand the power consumption concerns, but my worry would be gimping a system with a CPU that isn't fast enough to feed data to the GPU. I would imagine an 8-core ARM chip wouldn't have as much performance per core as, say, even a modified/updated CELL?

If any game code is parallelisable enough that getting it to run quickly on 8 cores could be done reasonably, then I would imagine that that specific workload would likely be more efficiently run on the GPGPU? So if you do that and only have more serialised code that isn't very parallelisable running on your CPU, wouldn't it be in your best interests to have, say, a tri-core CPU with more transistors and thus more performance per core, rather than having to struggle to port that code between 8 relatively weak CPU cores?

Again, I do not claim any expertise in this. You likely have a better insight into how these things work than I ;-)
 

Sounds like a reasonable assumption to me. When we talked about A15 performance vs modern x86 it was based on some VERY rough beer-mat calculations derived from a single comparison benchmark against an AthlonXP. The best-case scenario as I recall for an 8-core A15 at 2.5GHz was around 50-75% of the QX9770, but that was the top end of a range which started at something like 1/6th of the overall performance.

But even if it matched the performance of said QX9770, you are still talking about far less single-threaded performance, which I assume still has a big place in console games.
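(For what it's worth, the scaling behind that kind of beer-mat estimate is simple. Purely as an illustration, with r standing for an assumed per-core, per-clock ratio rather than any measured figure: on perfectly parallel code, 8 A15 cores at 2.5GHz versus 4 QX9770 cores at 3.2GHz gives roughly (8 × 2.5) / (4 × 3.2) × r ≈ 1.56 × r of the quad's throughput, so r ≈ 1/3 lands near the 50% end of that range and r ≈ 0.1 lands near the 1/6th end.)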
 

The problem is that the parallel and sequential sections of a given piece of code are intermingled. You often have iterations of some calculation over a big data set that is easy to parallelize, followed by a boundary condition check that is not. Running the parallel and sequential sections on different parts incurs a latency (and bandwidth and power) penalty, because your MPU has to submit the parallel work to a GPGPU or similar.

The solution to this is to make the GPGPU smart enough to resolve the sequential bits (even if it is slower than your MPU), or to increase the parallel computational resources of your MPU to handle the parallel bits (even if it is slower than your GPGPU).
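A toy sketch of the pattern being described, in C++ (purely illustrative; the smoothing step and function are made up for the example, not from any real engine): a data-parallel update pass over an array followed by a sequential convergence check. If the parallel pass runs on a separate GPGPU, that check forces a round trip between the two chips every iteration.

[code]
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Jacobi-style smoothing skeleton: the update pass is embarrassingly
// parallel, the convergence test is a sequential reduction plus a branch.
void solve(std::vector<float>& x, float tolerance)
{
    if (x.size() < 3) return;
    std::vector<float> next(x.size());
    float change;
    do {
        // Parallel section: every next[i] depends only on the old values in
        // x, so this loop could be split across cores, SPUs or a GPGPU with
        // no ordering constraints.
        next.front() = x.front();
        next.back()  = x.back();
        for (std::size_t i = 1; i + 1 < x.size(); ++i)
            next[i] = 0.5f * (x[i - 1] + x[i + 1]);

        // Sequential section: reduce to a single value and branch on it.
        // If the parallel pass ran on a discrete GPGPU, this is where the
        // host CPU waits for results before launching the next pass.
        change = 0.0f;
        for (std::size_t i = 0; i < x.size(); ++i)
            change = std::max(change, std::fabs(next[i] - x[i]));
        x.swap(next);
    } while (change > tolerance);
}
[/code]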

Cheers
 

If an A15 is going to have performance similar to a PPC or x86 solution, it is going to burn roughly the same amount of power. There is nothing magic about the ARM ISA that gives it any sort of advantage compared to other ISAs.

Cheers
 
Prophecy2k said:
I must admit I'm not as knowledgeable as some when it comes to this kind of stuff [...]

Me too ;) I worked in the field eons ago, and actually only as a mere apprentice (now even less than that), back in the days of the 8086...

On ARM, it is very likely that you are right, because something like 2/3 of an Intel Core 2 Quad QX9770 is probably not enough to keep up with a high-end GPU today (here I'm hoping for at least something on the level of a Radeon HD 5870...). But who knows: an ARM customized with something like AltiVec/VMX128-style SIMD and a higher clock, free of the limitations of the cell-phone universe, is perhaps not as weak as it seems.

The interesting thing about a closed box with ARM, beyond the similarity to cell phones, would be the low TDP and transistor count: the wattage and transistors saved on the CPU could be spent on a really powerful GPU, possibly with a large eDRAM.

Just a thought here, and probably many will not agree... In this scenario ARM would be something of a return to the old closed boxes we had up until the first Xbox (with its general-purpose 733MHz Pentium III), where the console CPU served exactly what the GPU needed (batching etc.), while still being sufficient to handle AI, game logic, physics, networking, sound and so on.

Would it be a step backwards to focus more of the effort on the GPU side? Perhaps, but it would likely simplify life greatly for developers, and manufacturers like Sony and MS would not need to spend hundreds of millions or even billions on facilities (as with the Cell Broadband Engine) that could instead go toward a powerful GPU, since a game console, as originally conceived, is designed particularly around graphics.

And indeed, a very interesting hypothesis: could parallel, customized code be run efficiently across both the CPU and the GPU? In that case we would have two processors working together, balancing performance and exchanging work whenever one of them became the limiting factor. But maybe in the end it would be better to go with an APU as a single package, or something like Cell + GPU? And would developers even want to program for two processors (they don't much like working with the SPUs as coprocessors)?
 
ARM-based SoCs may not equal the raw power of console architectures, but it seems we're on the verge of tablets and smartphones that will output 1080p graphics via HDMI. Not only that, but their graphics capability will improve every year.

The software pricing model for these devices (99 cents to a few dollars for tablet games) may not lend itself to the development of games which make the most of that graphics capability.

But it seems the ecosystem for these mobile APUs is far more competitive and dynamic than just about any other chip architecture out there.
 
Resolution doesn't mean anything if you don't have the processing power available to do something with it. Should we seriously be awed by a 2011 device being able to output HD resolutions? A decade+ old gf2mx can do resolutions higher than HD.
 
The solution to this is to make the GPGPU smart enough to resolve the sequential bits (even if it is slower than your MPU), or to increase the parallel computational resources of your MPU to handle the parallel bits (even if it is slower than your GPGPU).
It will be interesting to see which one will prevail in the long run. GCN will have an additional scalar pipeline for each SIMD/SIMT. I am not sure how exactly we are going to program that, since the threading model pretty much hides the SIMD. It would give us some nice options if a wavefront of threads could have an additional sequential "scalar" thread (with separate code) running in parallel to the SIMT calculation (with barrier synchronization with the SIMT + atomics, of course).

Intel is planning to expand AVX to 1024 bits some time in the future (according to their documentation). 1024 bit AVX would process 32 single precision floats in parallel. That's equal to current graphics hardware SIMT sizes. Add some scatter/gather instructions (memory load/store with SIMD register contents as pointers/indices) and we can easily port GPU<->CPU SIMD code.
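To make the lane arithmetic concrete, here is a minimal C++ sketch (the 1024-bit register width is hypothetical, per Intel's roadmap note above; the loop just shows in scalar code what a single hardware gather instruction would do in one operation):

[code]
#include <cstddef>
#include <cstdint>

// A 1024-bit SIMD register holds 1024 / 32 = 32 single-precision floats,
// the same width as a 32-thread SIMT warp on current GPUs.
constexpr std::size_t kRegisterBits = 1024;
constexpr std::size_t kLanes = kRegisterBits / (8 * sizeof(float)); // 32

// Scalar emulation of a "gather": a vector load where each lane has its own
// index. A real gather instruction would perform all 32 loads as one op.
// dst and indices are assumed to hold kLanes elements each.
void gather(const float* base, const std::int32_t* indices, float* dst)
{
    for (std::size_t lane = 0; lane < kLanes; ++lane)
        dst[lane] = base[indices[lane]];
}
[/code]

With scatter/gather in the ISA, the same index-driven access patterns that GPU code relies on would map directly onto CPU SIMD loops.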
According to some benchmarks, a four-core ARM keeps up with dual-core Intel Core CPUs (energy-efficient models clocked pretty low) if the code is properly threaded. That would mean around 1/3 of the single-core performance of Sandy Bridge, and Sandy Bridge frequency scales much higher. And nobody can really beat Intel when it comes to SIMD performance. If they implemented something as powerful as the current Sandy Bridge 256-bit AVX, a single FPU/SIMD unit would likely be bigger than the whole four-core ARM chip :)
 

Yes, I agree with you entirely. Intel will follow up with Ivy Bridge, promised to be 20% faster than Sandy, but ARM may have a trump card, because they don't want to lose supremacy in this segment to Intel so easily. The next moves will be interesting.
 

Performance-per-core is irrelevant these days as the vast majority of game-code most suited to a single-threaded-execution-model hasn't really changed or evolved since the days when we were running it on the PSX.

The rest of the code requires large-scale computation and just happens to fit the data-parallel execution model much better. However, you're still going to be limited by memory access latency, and as such it fits CPU [rather than, say, GPGPU] execution much better in most cases. Also, in console development you'll likely not have [nor have any inclination to free up] GPU cycles to spare for this, as the GPU will be flat out with rendering, which typically makes up the vast majority of your bottlenecks...

For more information on the general considerations made when architecting modern game code you want to google the Data-Oriented-Design paradigm (beware, if you don't have a programming background you might not get much from it...)
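As a rough illustration of the style that paradigm advocates (the names and layout below are made up for the example, not taken from any engine): instead of updating an array of heterogeneous objects, you keep the hot fields in contiguous arrays and sweep them linearly, which is friendly to caches, DMA and SIMD alike.

[code]
#include <cstddef>
#include <vector>

// Data-oriented layout: the fields the update loop actually touches live in
// their own contiguous arrays ("structure of arrays"), rather than being
// interleaved with everything else each entity owns ("array of structures").
struct Particles {
    std::vector<float> px, py, pz;   // positions
    std::vector<float> vx, vy, vz;   // velocities (same length as positions)
};

// One linear sweep over hot data; trivially splittable across cores or SPUs.
void integrate(Particles& p, float dt)
{
    for (std::size_t i = 0; i < p.px.size(); ++i) {
        p.px[i] += p.vx[i] * dt;
        p.py[i] += p.vy[i] * dt;
        p.pz[i] += p.vz[i] * dt;
    }
}
[/code]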

Oh & given the same silicon budget / transistor count, I'd rather have a 16-32 core CELL than an 8 core ARM chip any day of the week...
 
Excellent information.

But archangelmorph, do you believe developers would like to work with a Cell with 32 SPUs, even with a modernized and improved PPU? They were not very receptive (please correct me if I'm wrong here) to having to work with a CPU with two core types, i.e. asymmetric, and only 6 available SPUs (one reserved for the OS), and it was really only the extreme weakness of the RSX that dragged developers into using them (the most recent case being Crytek, who admitted they did not use the SPUs to aid deferred shading/lighting in Crysis 2, relying only on the PPU + RSX).

Let me daydream... Maybe a revamped Cell with 2 to 4 PPUs, each with 4 to 8 SPUs, could be interesting, because it would give developers the choice of whether or not to use the SPUs' capabilities while still having relatively good general-purpose CPUs (I know many don't like the PPUs... perhaps change them to POWER6 or even POWER7 cores) to make their lives easier.
 
I think the original patent's layout of 4 PPUs and 32 SPUs could be interesting, with the PPUs updated to deliver better single-thread performance.
 
I remember that patent; if I am not mistaken it is from 22 August 2002, and there is also another one that shows the "Visualizer" GPU with 4 PPUs + 16 SPUs + Pixel Engines (a set of shader cores?). I believe this was the original intention for the PS4, had Cell been well adopted by third-party developers on the PS3.

At the time of that patent many spoke of the Cell Broadband Engine Architecture clocked at 4GHz and reaching up to 1 TFLOPS single precision.

(Unfortunately I lost my picture/sheet showing Cell and the Visualizer together on one single die... a preview of the APU?)
 
It will be interesting to see which one will prevail in the long run. GCN will have an additional scalar pipeline for each SIMD/SIMT. I am not sure how exactly we are going to program that, since the threading model pretty much hides the SIMD. It would give us some nice options if a wavefront of threads could have an additional sequential "scalar" thread (with separate code) running in parallel to the SIMT calculation (with barrier synchronization with the SIMT + atomics, of course).

The scalar unit in GCN is invisible to the programmer, and it can't write to memory.
 