Predict: The Next Generation Console Tech

It's the most elegant business solution. You reduce the design suppliers to one (AMD) and the manufacturers to one (Global Foundries).

A six-core, 64-bit ARMv8 MS/AMD custom chip integrated with a high-performance AMD GPU in the HSA architecture is the sexiest console design imaginable. The low-power ARM cores would allow the power to go to the GPU, where it will count the most, while keeping the CPU's allocation of the TDP low.

MS could then turn around and not only run Windows 8 and Windows Phone apps on it, but reuse the entire IP for other Windows devices.

Given how low-performance the PPC GuTS core of the XCPU is, emulating it on ARMv8 might not be that much of a stretch, while AltiVec could be emulated on the GPU.
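FWIW, the core of any such emulator is just a fetch/decode/dispatch loop; the hard part is matching PPC (and AltiVec) semantics at speed. A toy C sketch of that shape, with a made-up three-instruction mini-ISA standing in for PPC, nothing real assumed:

Code:
#include <stdio.h>
#include <stdint.h>

enum { OP_ADDI, OP_ADD, OP_HALT };  /* invented opcodes, not real PPC */

int main(void) {
    uint32_t regs[32] = {0};
    /* toy "program": r1 = 5; r2 = 7; r3 = r1 + r2; halt */
    const uint32_t prog[] = {
        (OP_ADDI << 24) | (1 << 16) | 5,
        (OP_ADDI << 24) | (2 << 16) | 7,
        (OP_ADD  << 24) | (3 << 16) | (1 << 8) | 2,
        (OP_HALT << 24),
    };
    for (unsigned pc = 0;; pc++) {
        const uint32_t insn = prog[pc];
        const uint32_t op = insn >> 24, rd = (insn >> 16) & 0xff;
        switch (op) {
        case OP_ADDI: regs[rd] = insn & 0xffff; break;          /* load immediate */
        case OP_ADD:  regs[rd] = regs[(insn >> 8) & 0xff]
                               + regs[insn & 0xff]; break;      /* register add */
        case OP_HALT: printf("r3 = %u\n", regs[3]); return 0;   /* prints r3 = 12 */
        }
    }
}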

Do it, MS.


And why are you assuming an ARM CPU at console TDP levels (+20W?) will perform better than an AMD/Intel/IBM CPU at the same TDP levels?


This myth that ARM will undoubtedly do better than x86/PowerPC at their own game should disappear, at least until there are actually 64-bit ARMv8 chips out there.
 
This myth that ARM will undoubtedly do better than x86/PowerPC at their own game should disappear, at least until there are actually 64-bit ARMv8 chips out there.
Another thing people don't seem to realize is that moving from 32 to 64 bits won't change performance at all on its own. With x86, we got the performance from the increased register count combined with a greatly improved architecture (K8). For ARM to get any performance increase, it has to come from other improvements, not just from being able to work on 64-bit integers in GP registers.
 
And why are you assuming an ARM CPU at console TDP levels (+20W?) will perform better than an AMD/Intel/IBM CPU at the same TDP levels?

You're assuming I am. It's not that the ARM CPU will perform better; it's that it will take less of the TDP, allowing more of it to be allocated to the GPU, where more of the work is going to be offloaded in the HSA GPGPU architecture.

I don't know if things are at the point where the compute clusters on the GPU can start taking over tasks that would have been handled by a powerful CPU in a more traditional design (and handle them more efficiently). That is the question.

This myth that ARM will undoubtedly do better than x86/PowerPC at their own game should disappear, at least until there are actually 64-bit ARMv8 chips out there.
I'm not saying that. I'm saying that an ARM-based CPU solution will allow an overall more powerful console that is more GPU-processing-centric, if things are at that point yet (looking at whether HSA can be implemented in a console).

If you incorporate a discrete x86/PPC that is $60 of the BOM (throwing a number out) and takes 50 to 60 watts of the TDP, that all comes out of the cost and power budget remaining for the discrete GPU and RAM.
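Rough arithmetic of that trade, with every number an assumption (a 250 W console TDP for illustration, the midpoint of the 50-60 W discrete-CPU guess, and a ~10 W guess for six low-power ARM cores):

Code:
#include <stdio.h>

int main(void) {
    const double tdp       = 250.0;  /* assumed console TDP, watts */
    const double big_cpu_w = 55.0;   /* midpoint of the 50-60 W discrete-CPU guess */
    const double arm_cpu_w = 10.0;   /* assumption: six low-power ARM cores */

    printf("GPU+RAM budget with a discrete CPU: %.0f W\n", tdp - big_cpu_w); /* 195 W */
    printf("GPU+RAM budget with ARM cores:      %.0f W\n", tdp - arm_cpu_w); /* 240 W */
    return 0;
}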

If a GPGPU processing model like HSA is possible in time for the next gen, then light CPU cores will actually be the way to go, which is what makes ARMv8 an ideal candidate.

If MS doesn't try it, I hope Sony comes up with a tight, integrated GPGPU, SoC-based design for the PS4. They may end up close to MS in performance, but beat it in other factors like cost and form factor/size.
 
You're assuming I am. It's not that the ARM CPU will perform better; it's that it will take less of the TDP, allowing more of it to be allocated to the GPU, where more of the work is going to be offloaded in the HSA GPGPU architecture.

You say you aren't... then outright state that you are assuming that ARM has better perf/watt.
 
You say you aren't... then outright state that you are assuming that ARM has better perf/watt.

No, I'm not. Where did I say that? Point it out. I even repeated myself explaining it several times. Jesus.

A discrete ARM CPU is less powerful and has lower perf/watt than a discrete PPC or x86, okay...

The overall solution (see the difference): using ARM instead of PPC/x86 in a GPGPU console where the CPU and GPU are tightly integrated (and the GPU is taking more of the load), like in HSA, gets you the better perf/watt.

Another way to state it is this,

I propose that..

A discrete x86/PPC CPU + discrete GPU will have lower perf/watt (in a console with an approx. 250 W TDP and a $400 retail price)

than...

an integrated ARM CPU + GPU in an HSA GPGPU solution.

This is what I'm referring to when I'm talking about HSA (the next step in CPU-GPU integration), btw:

http://www.bit-tech.net/news/hardware/2012/02/03/amd-chip-roadmap-2013/1

The first Fusion APUs, which combine GPU and CPU technology on a single chip, have been available for some time, but Papermaster's vision goes further. In 2012, he explained, chips will be released that give the GPU access to the CPU memory for increased efficiency and data sharing between the two.

In 2013, this will be further enhanced by doing away with the concept of independent memory altogether. Instead, both the GPU and CPU will share a single unified memory space. The benefit: data can be processed by either subsystem without the delays associated with moving it to different areas of system memory.

Finally, 2014 will see the launch of the first HSA-compatible GPUs. Unlike current general-purpose GPU technology, Papermaster claims that the HSA-compatible GPU will be capable of switching its compute context on the fly, executing code on whichever processor makes the most sense from a performance perspective.
Is HSA a viable solution for next-gen consoles? I don't know; my point is to propose that possibility and discuss it while making the case that ARM would be well suited to it.

The ARM, AMD, MS trifecta may very well factor into the next Xbox. Some early signs point in that direction, though I don't think it is viable for a 2013 release.
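To make that 2013 unified-memory step concrete at the code level, here is a minimal sketch of the zero-copy idea. hsa_alloc_shared() and gpu_run() are hypothetical names I made up, with CPU stand-ins so the snippet actually compiles and runs; this is not a real API:

Code:
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins: the "shared" allocation is just malloc, and the
   "kernel" runs on the CPU, so the sketch runs anywhere. */
static void *hsa_alloc_shared(size_t n) { return malloc(n); }
static void gpu_run(float *data, size_t n) {
    for (size_t i = 0; i < n; i++) data[i] *= 2.0f;  /* pretend GPU kernel */
}

int main(void) {
    const size_t n = 4;
    /* One allocation visible to both CPU and GPU: no staging buffer,
       no copy to device memory and back. */
    float *buf = hsa_alloc_shared(n * sizeof *buf);
    for (size_t i = 0; i < n; i++) buf[i] = (float)i;
    gpu_run(buf, n);                    /* GPU would touch the same pages */
    printf("buf[3] = %.1f\n", buf[3]);  /* CPU reads the result directly: 6.0 */
    free(buf);
    return 0;
}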
 
DDR4 could be a possibility; I believe Samsung started sampling last year.

Indeed, they did. The standard was finalised last fall. But the speed does not reach GDDR5 levels, 2.133 - 3.2 Gbps, and modules only offer a 16-bit data bus. However, the power draw will be significantly lower, as it will be working at 1.2 V.
Using DDR4 would mean trading speed for larger size. If paired with a framebuffer in eDRAM, I think it could be a possible solution.
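Putting those figures into GB/s, with the 256-bit GDDR5 bus being my own assumption just to have a reference point:

Code:
#include <stdio.h>

int main(void) {
    /* per-pin rate (Gbps) x bus width (bits) / 8 = GB/s */
    const double ddr4  = 3.2 * 16 / 8;   /* top DDR4 rate on one 16-bit module */
    const double gddr5 = 6.0 * 256 / 8;  /* 6 Gbps GDDR5 on an assumed 256-bit bus */
    printf("DDR4 module: %.1f GB/s\n", ddr4);   /* 6.4 GB/s */
    printf("GDDR5 setup: %.0f GB/s\n", gddr5);  /* 192 GB/s */
    return 0;
}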
 
No, I'm not. Where did I say that? Point it out. I even repeated myself explaining it several times. Jesus.

A discrete ARM CPU is less powerful and has lower perf/watt than a discrete PPC or x86, okay...

The overall solution (see the difference): using ARM instead of PPC/x86 in a GPGPU console where the CPU and GPU are tightly integrated (and the GPU is taking more of the load), like in HSA, gets you the better perf/watt.

Another way to state it is this,

I propose that..

A discrete x86/PPC CPU + discrete GPU will have lower perf/watt (in a console with an approx. 250 W TDP and a $400 retail price)

than...

an integrated ARM CPU + GPU in an HSA GPGPU solution.

This is what I'm referring to when I'm talking about HSA (the next step in CPU-GPU integration), btw:

http://www.bit-tech.net/news/hardware/2012/02/03/amd-chip-roadmap-2013/1

Is HSA a viable solution for next-gen consoles? I don't know; my point is to propose that possibility and discuss it while making the case that ARM would be well suited to it.

The ARM, AMD, MS trifecta may very well factor into the next Xbox. Some early signs point in that direction, though I don't think it is viable for a 2013 release.
I still don't see how ARM is offering anything in that scenario. A Jaguar core is probably going to use like 1 watt or something.

Indeed, they did. The standard was finalised last fall. But the speed does not reach GDDR5 levels, 2.133 - 3.2 Gbps, and modules only offer a 16-bit data bus. However, the power draw will be significantly lower, as it will be working at 1.2 V.
Using DDR4 would mean trading speed for larger size. If paired with a framebuffer in eDRAM, I think it could be a possible solution.

Ya, I was only suggesting it as a possible alternative to DDR3. Faster and lower power, but it would likely be insufficient as the sole memory system.
 
I thought there was a roadmap for GDDR5 to go fully differential, doubling the bandwidth per pin? Is this a reality yet?
 
Another thing people don't seem to realize is that moving from 32 to 64 bits won't change performance at all on its own. With x86, we got the performance from the increased register count combined with a greatly improved architecture (K8). For ARM to get any performance increase, it has to come from other improvements, not just from being able to work on 64-bit integers in GP registers.

Note that ARMv8 also doubled the register count, to 32. More importantly, they cleared up some of the historical cruft: no more predicates on everything, and shifts baked into instructions are immediate-only.

ARMv8 CPUs should be easier to make fast than older ARM architectures. However, at this point it's unclear how much of the simplification translates into actual usable benefit, given that the actual CPUs will also implement ARMv7 and Thumb-2.
 
Not much news about it, really. No one seems to be jumping on board yet, at least.

I used to be a firm supporter of differential signalling, but not so much anymore.

Single-ended signalling requires lots of pins on the DRAM package for ground and power planes to manage noise. However, differential signalling requires two traces to be routed on the PCB per signal. As long as single-ended signalling is within half the per-pin bandwidth of differential signalling, the only advantage seems to be power usage.

Cheers
 
On the other hand, we've got issues with memory controllers supporting 6.0 Gbps GDDR5 and above, so what alternative is there but to go with differential signalling?
 
On the other hand, we've got issues with memory controllers supporting 6.0 Gbps GDDR5 and above, so what alternative is there but to go with differential signalling?

More on-die RAM?

The lower power consumption per bit transmitted might be important in mobile applications. As these devices grow in capability and capacity, we might see a migration to differential signalling, and then from mobile devices to high-end solutions (after all, everything is power-limited these days).

Cheers
 
I don't think it's zero-sum. SE won't get much more than 6 Gbps per line, probably ever, while DS can do 20 Gbps per pair. There's also much better signal integrity, less noise, and a simpler board layout; the PCB might cost less to design, test, and mass-produce.
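Per routed trace, those two figures come out like this (a rough sketch, nothing more):

Code:
#include <stdio.h>

int main(void) {
    /* Single-ended: one routed trace per signal. Differential: two per pair. */
    const double se_per_trace = 6.0 / 1.0;   /* ~6 Gbps per single-ended line  */
    const double ds_per_trace = 20.0 / 2.0;  /* ~20 Gbps per differential pair */
    printf("SE: %.0f Gbps/trace, DS: %.0f Gbps/trace\n",
           se_per_trace, ds_per_trace);
    return 0;
}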

Sony and IBM are working hard on optical interconnects between chips, but I would guess that's for the PS5 or Xbox 1440.
 
More on-die RAM?

The lower power consumption per bit transmitted might be important in mobile applications. As these devices grow in capability and capacity, we might see a migration to differential signalling, and then from mobile devices to high-end solutions (after all, everything is power-limited these days).
Cheers

The lower power consumption per bit transmitted is important in mobile applications; the question is how important it is versus other concerns. SPMT (link to consortium), (link to brief description) may be one of the contenders for upcoming mobile solutions. Or not. The right names are on the promoters list, but it has been awfully quiet.
 
Sure, but going from 8 (closer to 6 actually usable) to 16 is a significantly bigger deal than going from 16 to 32.
Absolutely. Although with ARM, only 13 of those were usable (the PC was a GPR). ARMv8 uses r31 for the SP and as a null register (depending on context), r30 for the link register, and r29 for the frame pointer. The rest are GPRs, with a really nice split in the calling convention they are promoting: 9 for args to/from functions, 10 callee-save, 9 temporaries, and a single one for persistent platform state (probably the heap allocation pointer in most languages).

Generally, I don't expect programs to get that much of a direct boost from the changes, but building faster processors should get easier.

x86 also doubled the number of SIMD registers; how about ARMv8?
They used to have 32 64-bit registers, also addressable as 16 128-bit regs. ARMv8 has 32 128-bit registers, and when used as 64-bit ones, they are all packed into the lower-order bits of the respective regs.

There's one bit of brain damage left in the calling convention: some of the floating-point regs are marked as callee-save, but the callee is only expected to save and restore the lowest 64 bits. This means that if you are using SIMD, there are effectively no callee-save regs.
 
Aren't fabs moving to 450 mm wafers, reducing costs for larger chips? Wouldn't that affect the initial chip sizes of next-gen consoles, as they're likely to benefit from such wafers down the road?
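The area arithmetic is the draw, roughly (ignoring edge losses, yield, and the higher per-wafer cost):

Code:
#include <stdio.h>

int main(void) {
    /* Gross dice per wafer scale roughly with wafer area. */
    const double scale = (450.0 * 450.0) / (300.0 * 300.0);
    printf("300mm -> 450mm area scaling: %.2fx dice per wafer\n", scale); /* 2.25x */
    return 0;
}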
 