Predict: The Next Generation Console Tech

Status
Not open for further replies.
A10 APU with a small dedicated GPU

Small, power efficient, cheap and potent.

An A10-5800K with a dedicated 6670 GPU pulls around 142 W during gaming on PC; factor that up to 160 W+ in a console, where every part of the hardware runs at full load.

Worrying parts are

- CPU performance is, well, abysmal. No idea how it will do in a console, but on PC the A10s really are slow and get trounced by the little Core i3s.

- Memory bandwidth as it stands will not be any higher than what the PS3 currently has!

- Tessellation performance: developers are really going to push this feature hard next generation, and if Microsoft's new machine is much faster, then the PS4 won't keep up.

- And overall the hardware is just slow and would struggle to keep up with an HD 6850.

I just hope Sony don't lumber the A10 with a dedicated GPU of equal performance to its IGP; something much stronger is needed!
 
I don't agree on the CPU part. It's not good for a gaming PC, but consoles are known for having weak CPUs and compensating with a strong GPU (and the SPEs, in Cell's case).

For a console I think it's a good out-of-order (OoO) core. Plus, the weaknesses can be "hidden" on a console, where devs can optimize the code better...
 
A10 APU with a small dedicated GPU

- CPU performance is, well, abysmal.

If things are the way I think they will be the CPU being weak isn't going to hold the system back because the GPGPU in the APU will be doing the heavy computing.
 
A10 APU with a small dedicated GPU

An A10-5800K with a dedicated 6670 GPU pulls around 142 W during gaming on PC.
There is quite a bit of redundancy present.
The A10 die measures 246 mm²; the HD 6670 (Turks) measures 118 mm² at 40 nm (and can't be shrunk that much because of the 128-bit interface). The A10 has a 128-bit DDR3 interface, 8 ROPs, 24 TMUs and 6 CUs (VLIW4). It provides a PCI-Express interface, display outputs, a UVD engine and so on. The Turks chip also has a 128-bit memory interface (albeit a combined DDR3/GDDR5 one), 8 ROPs, 24 TMUs and 6 CUs (but of the VLIW5 type, which makes the whole proposition highly unlikely from the start in my opinion), and again provides 16 PCIe lanes, display outputs, UVD...

That is going to be expensive to manufacture, requires a more complex layout of the board/package (if you want to put both on an MCM in the Wii U style), and simply contains a lot of unneeded stuff. If you start cutting parts out of both chips, you can just as well build something reasonable from the start.

If you want to keep the Bulldozer/Piledriver modules, two of them measure 61 mm² at 32 nm, or maybe around 45 to 50 mm² at 28 nm. Add the northbridge/uncore/glue of Trinity and you may arrive at 70 mm² for the 4 cores including the glue to the GPU part. Now add, just for the sake of the argument, a full Pitcairn chip, which includes 20 CUs with 80 TMUs (they could use 18 active CUs for high yields), a 256-bit GDDR5 interface, 32 ROPs, of course display outputs, a UVD engine and also 16 PCI-Express lanes. And all that in an additional 212 mm². You actually don't need the 16 PCIe lanes; two or four would be completely enough to connect to some southbridge (or you could integrate that on the die as well). The northbridge of Trinity also already includes a memory controller, which can be scrapped or, let's say, unified with the one from the GPU, saving a few mm². In the end, you would end up with a die only slightly larger than the A10, but with vastly higher performance.
The SoC solution would have
20 CUs (1280 SPs)
80 TMUs
32 ROPs
two-way setup with 32 pixels/clock rasterization
256 Bit GDDR5 interface

All in <300 mm² (probably doable in <280 mm²) at 28 nm. It would shrink to <200 mm² at 20 nm, but further shrinks would get complicated because of the 256-bit interface (that's an incentive for an alternative memory solution). And on top of that, one would have quite a few more resources than the combined ones of the APU+GPU combination, and it's easier to use them in an efficient way, too! One also spares oneself the complications of two different memory pools and such.
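As a quick sanity check, the area budget above can be added up in a few lines. The component figures are this post's own estimates; the saving from unifying the memory controller and dropping PCIe lanes is a guess:

```python
# Rough die-area budget for the hypothetical SoC sketched above.
# All figures are this post's estimates; the saving is a guess.
cpu_modules_mm2 = 50   # two Piledriver modules: ~61 mm^2 at 32 nm, shrunk to 28 nm
uncore_glue_mm2 = 20   # Trinity-style northbridge/uncore, reaching ~70 mm^2 CPU-side total
gpu_mm2 = 212          # full Pitcairn: 20 CUs, 80 TMUs, 32 ROPs, 256-bit GDDR5 PHY
saved_mm2 = 10         # unified memory controller / dropped PCIe lanes (guess)

total_mm2 = cpu_modules_mm2 + uncore_glue_mm2 + gpu_mm2 - saved_mm2
print(f"estimated SoC size: ~{total_mm2} mm^2")  # ~272 mm^2, inside the <280-300 mm^2 target
```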

Of course it relies on porting the Piledriver core to 28 nm (easiest would be TSMC's, as the GCN implementation already exists for this process), but that's rumored to happen anyway (getting the GPU parts of the APUs to run well on GF's 32 nm SOI process took a bit of time [and die space!]; maybe IBM's/GF's 28 nm bulk process makes it easier, or the engineers can use some of the experience from the 32 nm port). And by the middle of 2013 (when mass production should be running at full steam), the 28 nm process should be pretty mature and a lot cheaper than it was a year ago.

If one doesn't want to take the risk of porting the CPU cores to another process, and wants the additional benefit of a synthesizable core (which can be ported to another fab or shrunk much more easily than a BD derivative), AMD has the Jaguar core, which offers exactly that. They are quite small, so you can easily fit twice the number of cores in the same space (2.9 mm² in TSMC 28 nm; one "module" of four cores including 2 MB L2 is probably slightly smaller than a BD/PD module with the same amount of L2 when both are at 28 nm), offer comparable IPC (but at a lower maximum clock) and don't consume a lot of power. Within a ~35 W power budget for the CPU cores alone (leaving 100+ W for the GPU part [which allows decent clock speeds], if one limits the power consumption of the whole console to <200 W), it's probably even faster (because PD cores can't use their high clock ceiling in that power-constrained scenario) in a well-threaded engine (which should be the case).

My personal view is that one wouldn't build it exactly as described (basically slapping some CPU cores onto a Pitcairn); it should only demonstrate that one could get a vastly better overall solution than this proposed APU+GPU combo at the same power and a similar die-size or cost budget. But the solutions we will see in XBoxNext/720 and Orbis/PS4 could be somewhat close. One would probably see some fine-tuning of the size and layout of the GPU part, maybe removal of a few features (what does one need 6 display outputs for on a console?) or the addition of some others (a few additional HSA features of Sea Islands? As it is a closed system, one could get similar functionality in a leaner and meaner way). And there is of course the question of the memory interface. A 128-bit interface is probably cheaper in the long run, even if it may be more expensive in the short term when using eDRAM or stacked memory (which could get quite large and even has the potential to form the main memory pool) for compensation.
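To put the memory-interface question in numbers, here is a quick peak-bandwidth comparison. The transfer rates chosen below (DDR3-2133 and 5 GT/s GDDR5, roughly Trinity-class and Pitcairn-class parts) are illustrative assumptions, not anything confirmed for either console:

```python
# Peak-bandwidth comparison of the two interface options discussed above.
# Transfer rates are illustrative assumptions, not confirmed console specs.
def peak_bandwidth_gbs(bus_bits, transfers_per_sec_g):
    """Peak bandwidth in GB/s: bus width in bytes times transfers per second."""
    return bus_bits / 8 * transfers_per_sec_g

ddr3_128  = peak_bandwidth_gbs(128, 2.133)  # 128-bit DDR3-2133
gddr5_256 = peak_bandwidth_gbs(256, 5.0)    # 256-bit GDDR5 at 5 GT/s

print(f"128-bit DDR3:  {ddr3_128:.1f} GB/s")   # ~34 GB/s
print(f"256-bit GDDR5: {gddr5_256:.1f} GB/s")  # 160 GB/s
```

The roughly 5x gap is why the dedicated-6670-class rumor (itself on a 128-bit bus) looks so underwhelming next to a 256-bit GDDR5 SoC.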
 
There is quite a bit of redundancy present.

You're talking about doing custom work, and the leaks are pretty much indicating that Sony is avoiding this because it's expensive.

The days of expensive consoles with fully custom hardware are all but over, imo.

I find it funny and yet slightly annoying that leaks are indicating 'XX' hardware will be used and people are still coming up with massive assumptions that it will be modified beyond recognition.

Sony have obviously picked an APU for a reason and that reason is not because it's an easy chip to butcher and beef up.
 
Actually, using a Kabini (working samples exist) "just" with a vastly larger GPU part and a wider memory interface sounds like a relatively easy customization to me (comparable to producing a "custom" GPU with a differing number of physically present CUs), and it would also qualify as an APU or SoC if you want. AMD has all the components; it should be possible for them to put it together. It's not like they are going to produce something completely new from scratch. AMD is actually advertising that it has this mix-and-match capability.

In my eyes it does not make sense that Sony would try to win customers with an A10 + HD 6670 combo in the year 2014 (the VLIW4/VLIW5 mismatch does not help to convince me of the credibility of that rumor either), or they have lost their minds. I only care about rumors which make some sense, not the stupid ones. ;)
 
Actually, using a Kabini "just" with a vastly larger GPU part and a wider memory interface sounds like a relatively easy customization to me.

Do you have any idea how much heat that little heat spreader would be kicking out if AMD shoved a 7850+ level GPU in there with the CPU?

There would be such a concentration of heat in a small area that cooling it would be a complete pig.

And with the cost of actually putting a faster GPU on the die, R&D, testing, etc., it would just be easier and maybe somewhat cheaper to put a bigger dedicated GPU in there.

And do people not think that Sony could genuinely not want a mega-fast, mega-money-sucking console next generation?

Nintendo are not on top any more, but they make more money than Sony; maybe Sony want to try to make a machine that can be brought to profitability faster than the PS3 was.

An A10-5800K is £90, and for that you get a 3.8 GHz+ quad core and a decent GPU, all of that within 100 W.

A dedicated 6670 is £49, which more or less doubles your GPU power.

AMD don't allow an APU to run in CrossFire with any dedicated GPU higher than a 6670 in terms of power, as then you start getting to the point where the dedicated card alone would be faster, making the APU CrossFire pointless.

That applies to the PC space; would mixing an APU with a much beefier dedicated GPU be a waste of time, then? Who knows.

What I would like to know is: if the PS4 did have a beefier dedicated GPU, what could the APU do to assist it?

Could you move all the tessellation and post-processing over to the APU? Maybe also have all the physics moved over to the APU via OpenCL?

What sense would that APU make with a big dedicated GPU?
 
Do you have any idea how much heat that little heat spreader would be kicking out if AMD shoved a 7850+ level GPU in there with the CPU?

I think that's why we have had rumors of the move from Steamroller to Jaguar cores. If that's really the case, the CPU won't be running at 3.8 GHz, but somewhere in the 1.6-2.0 GHz range while the GPU shaders run in the 0.75-1.0 GHz range. Then it's certainly manageable to have a 7850 GPU performance in an APU.

Instead of thinking of an APU as a CPU with an integrated GPU, think of it as a GPU with an integrated CPU. The difference is that the GPU is driving the performance requirements and not typical desktop workloads.
 
Do you have any idea how much heat that little heat spreader would be kicking out if AMD shoved a 7850+ level GPU in there with the CPU?
Roughly the same as a HD7870 does from a 212mm² die? :LOL:
I specifically gave some numbers (which I've chosen because they appeared reasonable to me) for the power consumption of the CPU cores and the remaining (let's call it "GPU") part. They add up to about 135W (+x, depending on how close to the 200W ceiling I assumed MS and Sony want to go). That isn't unheard of and quite manageable in my opinion.
And do people not that think that Sony could genuinely not want a mega fast, mega money sucking console next generation?
A SoC solution combining Jaguar cores with some slightly sub Pitcairn GPU coming at the end of 2013/beginning of 2014 is neither "mega fast" nor "mega money sucking". It's actually a reasonably cheap alternative which compromises performance for power consumption and manufacturability, especially in the long run. :rolleyes:
An A10 5800k is £90 and for that you get a 3.8Ghz+ quad core and a decent GPU and all of that within 100w.
A 4-core Kabini is undoubtedly slower, but does it in 25 W (already including the chipset, as it integrates the southbridge) for the top bin, and in a tiny die size compared to Trinity. You have plenty of space and power budget left to double up on the cores (to get the CPU performance on par [single-thread isn't that important!] with that power-sucking hog the 100 W A10 is) and increase the GPU resources until it easily defeats a Trinity even with the help of a small dedicated GPU.
A dedicated 6670 is £49 which more or less doubles your GPU power.
6 GCN CUs at 28 nm are ~30 mm². Add in a few ROPs, factor in the additional effort for distributing the work over the CUs (front end/work distribution), and you are looking at maybe 50 mm² of additional die size (on top of the GPU already present in the APU/SoC) to get this kind of performance (GCN CUs are usually quite a bit faster than VLIW ones, but let's forget about that here). 50 mm² per die results in a cost differential of ~$5 per die in a mature process (an order-of-magnitude number). It surely beats adding a $50 GPU (to be fair, it's going to be cheaper, as one saves the actual board [it gets soldered to the same mainboard, which gets more complex and expensive in exchange], the cooling solution saves you a bit, you cut out the margins of the OEMs and so on, but you get the idea). And it solves the problem of having two separate memory pools (afaik, the unified memory of the Xbox 360 is an advantage compared to the PS3).
The development work necessary to build such a customized APU/SoC is a one-time effort. The benefits of the simpler programmability and the lower production costs (which get more pronounced later with shrinks) add up continuously over the following multi-year lifetime.
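As a rough sanity check on that ~$5 figure, one can work from an assumed wafer price. The $5000 mature-process 300 mm wafer cost and 85% yield below are my assumptions, not quoted figures:

```python
# Order-of-magnitude cost of growing a die by ~50 mm^2, per the argument above.
# Wafer price and yield are rough assumptions for a mature 28 nm process.
import math

WAFER_COST_USD = 5000.0      # assumed mature-process 300 mm wafer price
WAFER_DIAMETER_MM = 300.0
YIELD = 0.85                 # assumed fraction of wafer area yielding good dies

wafer_area = math.pi * (WAFER_DIAMETER_MM / 2) ** 2  # ~70,686 mm^2
cost_per_good_mm2 = WAFER_COST_USD / (wafer_area * YIELD)
extra_die_cost = 50 * cost_per_good_mm2              # the ~50 mm^2 of added CUs/ROPs

print(f"~${extra_die_cost:.2f} per die for 50 mm^2 extra silicon")
```

With these assumptions the answer lands around $4-5 per die, i.e. the same order of magnitude as the post's estimate, and well under the price of a separate £49 card.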
AMD don't allow an APU to run in CrossFire with any dedicated GPU higher than a 6670 in terms of power, as then the dedicated card alone would be faster, making the APU CrossFire pointless.
Yes, you are right. It is pointless. Better go with a full size GPU (either dedicated or part of the APU/SoC). It earns you more (and a more consistent) performance while being more flexible and easier to program for.
That applies to the PC space, would mixing an APU with a much beefier dedicated GPU be a waste of time then? Who knows.

What I would like to know is if PS4 did have a beefier dedicated GPU what could the APU do to assist it?
An APU/GPU combo only makes sense, if the GPU part of the APU is not primarily used for rendering in games, but maybe for lowering idle power (like Enduro/switchable graphics) or for accelerating some compute stuff (game physics, emulation of Cell's SPEs, whatever, but this could also be done on the main GPU, especially if one has a larger monolithic SoC).
Could you move all the tessellation and post-processing over to the APU?
Post processing maybe. But this would give you a fixed split between the main rendering power and the post processing. I would prefer something more flexible (as one gets when putting the GPU resources into a single GPU).
And tessellation? I don't think so. You create a lot of data which needs to be shoved into the nearest CU. Routing that over global memory (in case of a somehow unified memory pool) or even PCIe to another GPU to do the further processing isn't going to help (maybe save for a very exotic and specialized use case).
Instead of thinking of an APU as a CPU with an integrated GPU, think of it as a GPU with an integrated CPU. The difference is that the GPU is driving the performance requirements and not typical desktop workloads.
Yes.
 
The reason it's usually 2x is that it's usually the easiest thing to do if the devkit is based on the retail unit, not because devs need 2x the amount of memory, though a lot of teams screw themselves by using it in development. Best part is you can always tell who fucked up this way at E3 because they run on devkits and not testkits and the only real reason for that is the game won't run in the final memory footprint.
I dunno, that seems like a stretch; oftentimes test kits don't come around until much further into a console's life cycle. For example, at EA we didn't get test kits until pretty close to launch for the PS3 and 360. Smaller studios might not get the test hardware until much later, especially at the beginning of the console's life cycle. Not only that, but sometimes the game isn't optimized for the smaller memory footprint just because of the stage of the project. There might not even be a "ship" or "release" version yet. That's not to say that the dev team screwed up or anything.

There are two versions of the 360 devkits. One version has 512 MB, and the other has 1 GB. I used both at work. I wouldn't put too much stock in the "devkits always have twice as much RAM" theory.
I would agree, and actually even with the standard retail amounts of ram, you can still deploy network builds, and do a reasonable amount of debugging.

However, one thing to consider in terms of the Xbox 360's debug units is that the console was originally destined to have only 256 MB of shared memory before it was upped to 512. So 512 MB *was* double the original spec.
 
Going back to the latest A10-5800K rumours, what is based on the A10-5800K... Richland?

This time frame would fit in almost perfectly with dev-kits 2, 3 and 4.

http://www.fudzilla.com/home/item/27823-amd-richland-28nm-apu-comes-in-q2-2013

We mentioned quite a few details about AMD’s Richland here and now we managed to get a bit more information, including a launch timeframe.

If AMD manages to stick to the current schedule, production candidate samples should be out in late Q4 2012. So, if all goes well, AMD might choose to show some demos at CES 2013.

Production candidate samples should work at the final frequency and should have all the features. After this development step, Richland should move to a production ready sample stage by mid-Q1 2013 and it should be ready for production in early Q2 2013.

It usually takes a while until GlobalFoundries manages to produce enough chips for launch. The current plan is to build up inventory and launch them in late Q2 2013. If AMD doesn’t hit any delays Computex or the first days of June 2013 seem like a good bet for a more accurate launch timeframe.

---------

I also noticed this, which could tie into the rumours of an underclocked Pitcairn. It mentions the compute units being similar. Could this mean that CrossFiring a Pitcairn GPU is easier with Richland than with Trinity? If there is indeed a 2nd GPU.

http://technewspedia.com/more-details-emerge-richland-future-amd-apu/


For several months we are getting some information about a mysterious amd apu codenamed Richland , APU that is not on the release schedule (roadmap) that AMD published in February this year



The calendar for 2013 AMD have the APU Kaveri (four cores Steamroller and IGP Graphics Core Next “GCN” with 512 shader processors), Kabini (four cores Jaguar and a GCN IGP) and Temash SoC (similar to Kabini, but with different and Southbridge integrated IGP) then what is the APU Richland?


The APU Richland would become similar to a new APU Kaveri, but more economical to produce, ie unlike current APUs AMD microprocessor and where the chips with some damaged units (x86 cores, cache sections, or shader processors IGP) are sold as cut and cheaper versions, AMD will have a production schedule similar to Intel, producing chip variants adapted to different market segments .


While Kaveri will consist of two modules Steamroller (four integer processing cores) and comprises two IGP GCN Compute Units (each with 256 shader processors); Richaland will consist of only one module Steamroller (two integer processing units ) and IGP GCN Compute Unit with a reorganized, which has 192 shader processors, and is very similar to Compute Units that make up the Cape Verde and Pitcairn GPUs.
 
http://technewspedia.com/more-details-emerge-richland-future-amd-apu/
technewspedia said:
While Kaveri will consist of two modules Steamroller (four integer processing cores) and comprises two IGP GCN Compute Units (each with 256 shader processors); Richaland will consist of only one module Steamroller (two integer processing units ) and IGP GCN Compute Unit with a reorganized, which has 192 shader processors, and is very similar to Compute Units that make up the Cape Verde and Pitcairn GPUs.
That appears to be a pile of crap, as they seem to confuse what CUs actually are. CUs with 256 or 192 SPs are very similar to Cape Verde or Pitcairn CUs? I don't think so. And it somehow sounds as if it was written in another language and then run through some online translator. :rolleyes:
Edit: The source they link at the end does not contain any of this.
 
You're talking about doing custom work, and the leaks are pretty much indicating that Sony is avoiding this because it's expensive.

Any chip(s) Sony does will be semi-custom; it's not all that expensive. As to the latest leak, it's an all-or-nothing leak. Either you believe that Sony has lost its mind and is going to go with an A10-caliber APU next gen, or the whole thing is BS. No way did the guy just "forget" to mention a discrete GPU. Personally, I think it's a healthy dose of FUD.


The days of expensive consoles with fully custom hardware are all but over, imo.
Sure, but it's just as senseless to go with strictly off-the-shelf parts. Again, semi-custom is not that expensive.

I find it funny and yet slightly annoying that leaks are indicating 'XX' hardware will be used and people are still coming up with massive assumptions that it will be modified beyond recognition.

Me too. Luckily, in about 2 months we'll have solid ideas on both the PS4 and Xbox 720.

Sony have obviously picked an APU for a reason and that reason is not because it's an easy chip to butcher and beef up.

There's lots of good reasons for Sony to go with an APU.
 
An A10-5800K is £90, and for that you get a 3.8 GHz+ quad core and a decent GPU, all of that within 100 W.

A dedicated 6670 is £49 which more or less doubles your GPU power.

I'm not sure why you're quoting retail prices, as Sony/AMD most assuredly won't be working a contract that way.

Using Gipsel's reasonable assumptions for an APU:

A 260 mm² chip on a 300 mm wafer gives you 217 full dies per wafer.

217 @ 80% yield gives you about 172 KGD.

$8000 per 28nm processed wafer / 172 = $46.50 per KGD.

a very generous $12 per KGD royalty @ an estimated 80mil units = $960,000,000 + $140,000,000 upfront design payment = $1.1 Billion AMD contract. Pretty damn good for a semi-custom design.

$1.1 Billion is right around what the R&D Chief for Sony said would be the cost for their next gen chip.

$46.50 + 12.00 + 1.50 (packing) = $60 per APU (CPU+GPU). That's pretty reasonable for a $399 console.
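The arithmetic above can be reproduced in a few lines. The 217 gross dies, 80% yield, $8000 wafer, $12 royalty and $140M upfront payment are the post's own figures (rounding makes the per-die cost land a few cents off the quoted $46.50):

```python
# Reproducing the semi-custom APU cost estimate from the post above.
# All input figures are taken from the post, not independently sourced.
DIES_PER_WAFER = 217           # full 260 mm^2 die candidates on a 300 mm wafer
YIELD = 0.80
WAFER_COST_USD = 8000.0        # assumed processed-wafer price at 28 nm

ROYALTY_PER_UNIT = 12.00       # "very generous" per-chip royalty to AMD
UNITS = 80_000_000             # estimated lifetime console shipments
UPFRONT_DESIGN = 140_000_000   # upfront design payment
PACKAGING = 1.50               # packaging cost per chip

kgd = int(DIES_PER_WAFER * YIELD)                     # known-good dies per wafer
cost_per_kgd = WAFER_COST_USD / kgd                   # ~$46 per die
contract = ROYALTY_PER_UNIT * UNITS + UPFRONT_DESIGN  # ~$1.1 billion total
unit_cost = cost_per_kgd + ROYALTY_PER_UNIT + PACKAGING

print(f"{kgd} KGD/wafer, ${cost_per_kgd:.2f}/die, "
      f"${contract / 1e9:.2f}B contract, ${unit_cost:.2f}/APU")
```

The ~$60 per-APU result and ~$1.1B contract match the post's totals, which in turn line up with the figure attributed to Sony's R&D chief.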
 
Indications are that at least one of Microsoft and Sony will be using a single APU processor and likely taking advantage of GPGPU functionality. This is quite a substantial departure from past console designs.

If they are willing to do this, I wonder why there have been no indications that they will attempt to use a mix of high-performance-per-watt (throughput-optimised) and high-single-threaded-performance (latency-optimised) CPU cores.
Going solely with steamroller/piledriver cores will deliver the latter but sacrifice the former, while the trade-off with Jaguar would be the opposite.

It would seem ideal to me to instead use, for example, one Steamroller/Piledriver module flanked by 4-8 Jaguar cores to achieve the best of both worlds.
AMD have shown their ability to mate radically different CPUs and GPUs, originally built for differing process nodes, on the same die, so how much more difficult would it be to combine multiple CPU core types?
Is it technically infeasible to have CPU cores running at different clock speeds working at the same time?
 
I'm not sure why you're quoting retail prices, as Sony/AMD most assuredly won't be working a contract that way.
As an indicator that even at retail the components are cheap enough to marry in the one device.

$46.50 + 12.00 + 1.50 (packing) = $60 per APU (CPU+GPU). That's pretty reasonable for a $399 console.
Pretty reasonable? More like damned cheap! That'd be a significant profit margin per box at $400.
 
Got bored and decided to build the rumoured PS4!!!

[Image: Capture.png — a parts list pricing out the rumoured spec at retail]


If I can build that with off-the-shelf PC parts at consumer retail prices, imagine how cheaply Sony could build it!

There are a lot of things whose cost can be reduced...

- Power supply
- PCB real estate for the GPU and RAM
- HDD price
- Blu-Ray drive price

Using parts at big bulk-discount prices you could maybe even do it for £200!
 