Predict: The Next Generation Console Tech

Status
Not open for further replies.
I was looking at the benchmarks on Tom's for the 8790M. AMD indicated that it should fit in the same power envelope as the 7690M, which was in the 20-25W range. If that's true, then we're looking at 384 GCN shaders @ ~900MHz in about 25W as well.

Assuming linear scaling: 384 shaders in 25W, 768 in 50W, 1152 in 75W, and 1536 in 100W. Depending on the power budget, I think anything from 768 to 1536 is doable in a console.
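That scaling assumption can be sketched in a couple of lines; this is a back-of-the-envelope model only, since real power draw doesn't scale perfectly linearly with shader count:

```python
# Linear power-scaling assumption: 384 GCN shaders fit in ~25 W
# (the 8790M-class figure cited above), scaled to bigger budgets.

SHADERS_PER_25W = 384

def shaders_for_budget(watts):
    """Shader count for a power budget under perfectly linear scaling."""
    return int(SHADERS_PER_25W * watts / 25)

for w in (25, 50, 75, 100):
    print(f"{w} W -> {shaders_for_budget(w)} shaders")
# 25 W -> 384, 50 W -> 768, 75 W -> 1152, 100 W -> 1536
```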

If both console manufacturers were targeting 100W for a CPU/GPU SoC, I could see something like this: 4 Jaguar cores + 1152 shaders, or 8 Jaguar cores + 768 shaders.

Is an APU + GPU setup better than an APU alone in a next-gen console, or vice versa?

Is the power saving greater with APU + GPU than with an APU alone? With GPU context switching, the external GPU can be switched off entirely, which saves more power than keeping it in a low-power state. If only an APU with 768 or 1152 SPs is used, then the main GPU must be "on" even for low-load operations like video playback, which draws more power than the 384-SP GPU in an APU would.
 
Turning off sections of the same die would surely be preferable to two chips, as in a low-power media mode.

Also, the motherboard would be simpler and easier to design. I think a single chip is the way forward.
 
http://www.jedec.org/category/technology-focus-area/3d-ics-0
The High Bandwidth Memory (HBM) task group in JC-42 has been working since March 2011 on defining a standard that leverages Wide I/O and TSV technologies to deliver products ranging from 128GB/s to 256GB/s. The HBM task group is defining support for up to 8-high TSV stacks of memory on a data interface that is 1024-bits wide. This interface is partitioned into 8 independently addressable channels to support a 32-byte minimum access granularity per channel. The specification is expected to be completed in late 2012 or early 2013.
I'm more and more optimistic about this. It looks like both HBM and HMC might be possible for a late 2013 launch.
It's kind of official now that they went with 1024-bit at 533MHz to 1066MHz DDR. So a pair of these chips would be 256GB/s to 512GB/s; they could even go with a single one and a split memory pool with DDR3.
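The per-stack numbers follow directly from the bus width and clock; a quick sketch of the arithmetic (DDR means two transfers per clock):

```python
# Peak bandwidth of a single HBM stack from the JEDEC figures above:
# a 1024-bit interface with DDR signalling (2 transfers per clock).

def hbm_bandwidth_gbs(bus_bits, clock_mhz, ddr=True):
    """Peak bandwidth in GB/s for one stack."""
    transfers = clock_mhz * 1e6 * (2 if ddr else 1)
    return bus_bits / 8 * transfers / 1e9

print(hbm_bandwidth_gbs(1024, 500))   # 128.0 GB/s, JEDEC's low end
print(hbm_bandwidth_gbs(1024, 1000))  # 256.0 GB/s, JEDEC's high end
print(hbm_bandwidth_gbs(1024, 533))   # ~136 GB/s at the 533MHz clock
```

Two stacks double these figures; the 256GB/s to 512GB/s pair estimate corresponds to the nominal 500MHz to 1000MHz clocks.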
 
No. That patent is talking about switching between high- and low-power GPUs in the same device to save on power draw when doing light workloads.

The meat of any patent is present in its first claim:
Pro tip: to understand any patent, jump straight to the first claim, and never take any journalist's interpretation of a patent as fact. ;)

Also listed....

Embodiments of the present invention as described herein may be extended to enable dynamic load balancing between two or more graphics processors for the purpose of increasing performance at the cost of power, but with architecturally similar GPUs (not identical GPUs as with SLI). By way of example, and not by way of limitation, a context switch may be performed between the two similar GPUs based on which one would have the higher performance for processing a given set of GPU input. Performance may be based, e.g., on an estimated amount of time or number of processor cycles to process the input.
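The selection rule in that claim (route a given piece of GPU input to whichever of two architecturally similar GPUs is estimated to process it faster) can be sketched in a few lines. Everything here is hypothetical for illustration: the names, the workload types, and the throughput rates are all invented, not from the patent:

```python
# Invented sketch of the patent's selection rule: pick the GPU with
# the lower estimated processing time for this particular input.

def estimated_time(workload, gpu):
    """Estimated seconds: cycles needed / this GPU's effective rate
    for this workload type."""
    return workload["cycles"] / gpu["rate"][workload["type"]]

def pick_gpu(workload, gpus):
    """Context-switch target: the GPU with the lower estimated time."""
    return min(gpus, key=lambda g: estimated_time(workload, g))

gpus = [
    # hypothetical low-power IGP: strong fixed-function video, weak 3D
    {"name": "igp",  "rate": {"video": 6e9, "3d": 1e9}},
    # hypothetical discrete GPU: much faster at 3D rendering
    {"name": "dgpu", "rate": {"video": 5e9, "3d": 8e9}},
]

print(pick_gpu({"type": "3d", "cycles": 1e9}, gpus)["name"])     # dgpu
print(pick_gpu({"type": "video", "cycles": 1e9}, gpus)["name"])  # igp
```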
 
Well, if they found a way to do load balancing between two GPUs (like Lucid does), and if they were to use this for their next-generation system, I still think they should have gone with 2 identical SoCs.
 
I don't see the upsides to two SoCs or SoC + GPU... You increase board complexity, cooling complexity, split the memory pool, or add latency to it by making the controller external and shared, and end up with a machine that's harder to get the most performance out of. I swear, some people just want exotic hardware for the sake of obscuring comparisons to existing, measurable systems so they can continue to live in a console-utopia la-la-land.

I can see a lot of Sony's blind cell-pimping fan base being very disappointed when they announce a system that's actually comparable to something existing and testable. They don't seem to understand that an average-in-theory, yet efficient system with an easy learning curve trumps a monstrous-in-theory, inefficient system with a stupidly steep learning curve that no one can hope to fully utilise. They just want to attach emotion to hardware and fantasise about godlike devs unlocking 100% of this generation's latest edition of inefficient hardware.

It's like entering a 5000hp quad turbo octal-rotary powered tractor into F1 and saying "yes, it's hard to make use of that extra power in a tractor, and the engine isn't tested to be reliable at all - but we have the best driver!".
 
I don't see the upsides to two SoCs or SoC + GPU... You increase board complexity, cooling complexity, split the memory pool, or add latency to it by making the controller external and shared, and end up with a machine that's harder to get the most performance out of. I swear, some people just want exotic hardware for the sake of obscuring comparisons to existing, measurable systems so they can continue to live in a console-utopia la-la-land.

I can see a lot of Sony's blind cell-pimping fan base being very disappointed when they announce a system that's actually comparable to something existing and testable. They don't seem to understand that an average-in-theory, yet efficient system with an easy learning curve trumps a monstrous-in-theory, inefficient system with a stupidly steep learning curve that no one can hope to fully utilise. They just want to attach emotion to hardware and fantasise about godlike devs unlocking 100% of this generation's latest edition of inefficient hardware.

It's like entering a 5000hp quad turbo octal-rotary powered tractor into F1 and saying "yes, it's hard to make use of that extra power in a tractor, and the engine isn't tested to be reliable at all - but we have the best driver!".

I don't see what's so wrong about a few people wishing for an APU+GPU as Sony's next console design. It ain't that exotic, and it goes in line with actual PC hardware. And while you may not see the upside, a lot of people on this very board do. One upside is that you would have two smaller chips to manufacture but might end up with a higher overall transistor budget, and thus more performance. It may produce better yields without having to be one single chip or MCM. Sure, it makes the board layout more complex, but it's been that way for the last 15 years. If it's a solution that provides more performance than an APU by itself, then that's even better. And I'm not sure why this would be such a pain to develop for, considering this is how PC gaming and consoles have worked for a while. I personally like the idea of APU+GPU, as it allows the APU's GPU to focus on non-graphical tasks. Not that bad of a solution, really. A super APU would be lovely as well, but may be too cost-prohibitive with poor yields. If they can get good yields then that'd be great too.
 
I don't see the upsides to two SoCs or SoC + GPU... You increase board complexity, cooling complexity, split the memory pool, or add latency to it by making the controller external and shared, and end up with a machine that's harder to get the most performance out of. I swear, some people just want exotic hardware for the sake of obscuring comparisons to existing, measurable systems so they can continue to live in a console-utopia la-la-land.
Well, my take on the matter is just production and R&D costs. I suspect that AMD will not have an off-the-shelf product that includes Jaguar cores and is not an APU. On their high-power range of products they don't have a chip of sane size that includes no GPU either. I can't see Sony, for example, using those high-power 4-module chips even with a disabled module.
So if you go with off-the-shelf parts you might end up with an APU. Either way you don't go with an off-the-shelf CPU.
If you don't go with an off-the-shelf CPU it costs money, and it can get worse if you also go with a non-off-the-shelf GPU. That is quite some money on top of already quite some money.
Thing is, I don't think that either MSFT or Sony would go with truly off-the-shelf parts. Money has to be spent, but my POV is that Sony is in bad shape; they could invest in only one chip from scratch, a chip of reasonable size with good yields and a sane price, and use two of them.
I would not bet on a shrink coming fast to lower the production price; it will require another massive R&D investment on an ever-costlier process.

I take it that you want a single SoC because two chips is too complex (well, the 360 launched with three, and most PCs still have two main chips, a CPU and a GPU). Thing is, I could see a relatively big (not insane) chip fitting the bill, but with shrinks being a more long-term goal than they used to be, I would be wary, especially for Sony. Then you have the bandwidth issue and the amount of RAM you want for your system.
I can't defend others, only my POV: Sony should aim for something cheap to produce. Two SoCs head to head (through a HyperTransport link), each linked to DDR3, should not be such a crazy complex thing; not more than the RSX and Cell combo with their separate pools of memory, or most PCs for that matter.
Just for reference, guesstimating based on AMD presentations, I think that a quad-core Jaguar setup (with cache) should be ~80mm², and if you add a Cape Verde class of GPU you are at ~200mm² (for the whole APU).
It is a bit light IMO; you may want to push a bit further than that, as you are already past 185mm² and going further seems to imply extra cost with no cheap reduction in size.
On the other hand you could go with a quad core and between 6 and 8 CUs on a reasonably sized chip and use two of them.

If you have enough money you could indeed go with a massive single chip, a custom CPU and custom GPU, etc., do the engineering masks, tests, etc. for all those chips, and have something better.
Another thing is that 4 Jaguar cores is not that much of a jump versus this generation of products; with a dual SoC you have CPU cores to spare for the OS and for possible resource-consuming future peripherals. You could also use defective parts: say one SoC is fully functional and the other has only 2 or 3 CPU cores active (out of four).
If they are confident in their load-balancing technique (and if it is something more than an empty patent) they could also pair "odd configurations" for the GPUs.
For example:
System1:
SoC 1: 4 cores 6 CUs
SoC 2: 2 cores 7 CUs
System2:
SoC 1: 4 cores 7 CUs
SoC 2: 2 cores 6 CUs

With systems 1 and 2 performing the same (or too close to call), while a fully functional SoC would be 4 cores and 7 CUs.
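The point of those pairings is that both systems bin to identical totals, so a die with a defective core or CU can still be paired into a full system; two lines verify it:

```python
# Both "odd" pairings above sum to the same system-wide totals.

def totals(socs):
    """Sum (cores, CUs) over a list of (cores, CUs) SoCs."""
    return (sum(c for c, _ in socs), sum(cu for _, cu in socs))

system1 = [(4, 6), (2, 7)]  # SoC 1, SoC 2
system2 = [(4, 7), (2, 6)]

print(totals(system1))  # (6, 13)
print(totals(system2))  # (6, 13), same as system 1
```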

My POV is focused on costs, or tries to be.

Overall, wrt software, I would think that a symmetric dual SoC would be a tad less tricky to begin with: identical GPUs, and until better use of resources is mastered (load balancing between the GPUs) you can still rely on an AFR kind of solution. On consoles, with v-sync on more often than not these days, that would not be much of an issue.

I can see a lot of Sony's blind cell-pimping fan base being very disappointed when they announce a system that's actually comparable to something existing and testable. They don't seem to understand that an average-in-theory, yet efficient system with an easy learning curve trumps a monstrous-in-theory, inefficient system with a stupidly steep learning curve that no one can hope to fully utilise. They just want to attach emotion to hardware and fantasise about godlike devs unlocking 100% of this generation's latest edition of inefficient hardware.

It's like entering a 5000hp quad turbo octal-rotary powered tractor into F1 and saying "yes, it's hard to make use of that extra power in a tractor, and the engine isn't tested to be reliable at all - but we have the best driver!".
That was funny; real geeks want a Larrabee clone :)
 
With Excavator, AMD wants to build a single chip that can do both CPU and GPU tasks, enabling HSA features. Can AMD design the PS4 APU (4-8 Jaguar cores + 384 GCN cores) on a single chip like Excavator? And is it possible to put an APU + GPU on an MCM? How much would the heat produced affect an MCM setup? Or identical GPUs with 4-8 CPU cores on an MCM?
 
With Excavator, AMD wants to build a single chip that can do both CPU and GPU tasks, enabling HSA features. Can AMD design the PS4 APU (4-8 Jaguar cores + 384 GCN cores) on a single chip like Excavator? And is it possible to put an APU + GPU on an MCM? How much would the heat produced affect an MCM setup? Or identical GPUs with 4-8 CPU cores on an MCM?
I don't know. Excavator shouldn't be here for quite a while, and we don't know how far GCN 2 will go as far as sharing the same memory space as the CPU.
AMD is quite late to that game, and HSA is for now a moving target to me. Intel IGPs already share the same memory space as their CPUs; though it is not exposed by 3D drivers because the APIs don't support it, it shows up wrt media encoding tasks. D. Kanter's articles at realworldtech are interesting on the matter ;)
ARM is already there too, between its A7/A15 cores and starting from the Mali 6xx line of GPUs. Same here: no support from existing APIs.

Same for the MCM: I don't know how heat affects it, though Intel had dual Pentium 4s on an MCM, so it can handle heat somehow.
Looking at what Nintendo uses in the Wii U, I would say that two reasonable chips could fit on such a package; it may simplify cooling, but I don't know either.
 
I read once that AMD would cancel Steamroller and launch Excavator instead, which makes some sense: dump the postponed design and fast-track the better one.
But then there was that troll article saying AMD was cancelling everything, keeping only Jaguar and ARM; AMD responded by denying it was cancelling anything at all. So a Steamroller in the PS4 is still likely.
 
Steamroller is delayed, which is sad but certain; they have Trinity v2 now for next year :(
 
I don't see what's so wrong about a few people wishing for an APU+GPU as Sony's next console design. It ain't that exotic, and it goes in line with actual PC hardware.

It's only used for actual PC hardware because that's all they can do with it... they can't make a big APU until there is a fast memory solution for it, and until it can be justified with non-gaming uses, because such a thing won't be attractive to gamers for quite some time. They have this fairly respectable lower-end GPU already there; it seems a waste to just use it for non-gaming tasks when you can add something similar and hack it up with AFR... well, that's how it works in theory anyway. I don't really think it's a great solution for the masses (for PCs), but in a laptop, where you don't really want your APU's fairly capable GPU just switching off when it could be helping out, it sort of makes sense.

Yes, consoles would do it a lot better than this. I think a solution with an APU that features a low-end GPU makes some sense IF it's paired with a mid-high-end discrete GPU. Overall I think a big APU is better, but if it means a 120mm² APU with a 230mm² GPU vs. a 300mm² APU, for example, then yeah, it's a nice solution to get more overall die area without venturing into that "too big for profit and yields" region. I think the idea of pairing two identical low-mid-end APUs is stupid, however. There just aren't enough good reasons for doing it vs. the downsides.
 
System1:
SoC 1: 4 cores 6 CUs
SoC 2: 2 cores 7 CUs
System2:
SoC 1: 4 cores 7 CUs
SoC 2: 2 cores 6 CUs

If a dual chip system is inevitable for getting the most overall transistors in the thing, then I'd rather this:

System3:
SoC: 6 cores 6 CUs
GPU: 12 CUs

If not, then this:

System4:
SoC: 8 cores 18 CUs

I don't see the point in having two almost identical chips just so you can "turn one off". It's not that big of a deal. If they must go dual-chip, make one SoC for the CPU with a low-end integrated GPU (for HSA advantages in physics and general processing) and one big GPU dedicated to game graphics. Don't make them rely on doing half the work each and bringing the results together in the middle.
 
If a dual chip system is inevitable for getting the most overall transistors in the thing, then I'd rather this:

System3:
SoC: 6 cores 6 CUs
GPU: 12 CUs

If not, then this:

System4:
SoC: 8 cores 18 CUs

I don't see the point in having two almost identical chips just so you can "turn one off". It's not that big of a deal. If they must go dual-chip, make one SoC for the CPU with a low-end integrated GPU (for HSA advantages in physics and general processing) and one big GPU dedicated to game graphics. Don't make them rely on doing half the work each and bringing the results together in the middle.

I vote for System3!
 
If a dual chip system is inevitable for getting the most overall transistors in the thing, then I'd rather this:

System3:
SoC: 6 cores 6 CUs
GPU: 12 CUs

If not, then this:

System4:
SoC: 8 cores 18 CUs

I don't see the point in having two almost identical chips just so you can "turn one off". It's not that big of a deal. If they must go dual-chip, make one SoC for the CPU with a low-end integrated GPU (for HSA advantages in physics and general processing) and one big GPU dedicated to game graphics. Don't make them rely on doing half the work each and bringing the results together in the middle.

Indeed. AMD and Nvidia have got idle power draw for flagship GPUs into the single digits, yet somehow we need some odd split power scheme. I think it will be an APU with the GPU off for non-gaming media tasks, and then the APU's GPU acting as a GPGPU during gaming. The interesting stuff will be the memory hierarchy and the packaging.
 
Can someone please explain how x CUs of GPGPU processing in an APU + y CUs of graphics processing in a discrete GPU is better than an x+y CU discrete GPU doing both in a more versatile package?
 
Just for reference, guesstimating based on AMD presentations, I think that a quad-core Jaguar setup (with cache) should be ~80mm², and if you add a Cape Verde class of GPU you are at ~200mm² (for the whole APU).

According to AMD, a single Jaguar core is 3.1 mm². If we're looking at a quad-core implementation, that's 12.4 mm² for the cores, and 2 MB of cache on TSMC's 28 nm process should be roughly 6 mm². Those numbers are probably a little optimistic, so let's say that a Jaguar quad-core implementation including cache is around 25 mm². An 8-core implementation would be around 50 mm² by those numbers.

BTW, that cache area calculation assumes a cache density of 3 Mbit/mm², which is pretty much the maximum possible on TSMC's 28 nm process and probably a little optimistic in this case. 2 Mbit/mm² is probably more realistic, which results in an L2 cache area of 8 mm². 25 mm² total for a quad-core implementation isn't so unrealistic, in other words.
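Those area figures are easy to reproduce; a quick sketch of the same arithmetic:

```python
# Jaguar quad-core area estimate from the figures above: 3.1 mm² per
# core, plus 2 MB of L2 at 2-3 Mbit/mm² on TSMC's 28 nm process.

CORE_MM2 = 3.1  # per AMD, for one Jaguar core

def l2_area_mm2(cache_mb, density_mbit_per_mm2):
    """L2 area in mm²: capacity in Mbit divided by bit density."""
    return cache_mb * 8 / density_mbit_per_mm2

cores = 4 * CORE_MM2                   # 12.4 mm²
l2_optimistic = l2_area_mm2(2, 3)      # ~5.3 mm² at the density ceiling
l2_realistic = l2_area_mm2(2, 2)       # 8.0 mm² at a realistic density
print(round(cores + l2_realistic, 1))  # 20.4, so ~25 mm² with overhead
```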

EDIT: My pie in the sky SoC would be this:
-24 CU (1536 SPs) 24 ROPs, 96/48 Texels filtered/clock (int/fp16) and a double rasterizer like on Tahiti. Size should be roughly 225 mm².
-8 Jaguar cores with 4 MB L2 cache total. Size roughly 50 mm².
-64 MB eDRAM L3 cache. Size roughly 45 mm².
-256-bit memory bus and other bits and pieces. Size ~65 mm²
Total Die size is 385 mm². Clocks should be 1.6 to 2 GHz for the CPU cores and 750 MHz for the GPU. TDP around 150 Watts.
8 GB of DDR4 memory at 3200 MHz for a 102 GB/s bandwidth, or 8 GB of GDDR5 memory at 6000 MT/s for 192 GB/s.

Well, as I'm saying, it's a bit pie in the sky, but one can hope :p.
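A quick tally of that spec, with the two memory options worked out; this just reproduces the numbers already stated, nothing new:

```python
# Tallying the pie-in-the-sky SoC above and its two memory options.

blocks_mm2 = {
    "gpu_24cu": 225,   # 24 CUs, 24 ROPs, double rasterizer
    "jaguar_8c": 50,   # 8 cores + 4 MB L2
    "edram_l3": 45,    # 64 MB eDRAM L3
    "bus_misc": 65,    # 256-bit bus and other bits and pieces
}
print(sum(blocks_mm2.values()))  # 385 mm², the stated total

def bw_gbs(bus_bits, mts):
    """Peak bandwidth in GB/s from bus width and transfer rate (MT/s)."""
    return bus_bits / 8 * mts * 1e6 / 1e9

print(bw_gbs(256, 3200))  # 102.4 GB/s for DDR4-3200
print(bw_gbs(256, 6000))  # 192.0 GB/s for GDDR5 at 6000 MT/s
```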
 
Can someone please explain how x CUs of GPGPU processing in an APU + y CUs of graphics processing in a discrete GPU is better than an x+y CU discrete GPU doing both in a more versatile package?

I think the only way APU + discrete GPU (maybe) makes sense is if you had multiple product lines that share the APU as the main processor. For example, XboxTV, a $149 set-top box for media and XBLA, runs off the APU alone; Xbox3 runs off the APU but has an additional GPU for high-end games.

But other than better individual chip yields, I don't think there's a benefit.
 
According to AMD, a single Jaguar core is 3.1 mm². If we're looking at a quad-core implementation, that's 12.4 mm² for the cores, and 2 MB of cache on TSMC's 28 nm process should be roughly 6 mm². Those numbers are probably a little optimistic, so let's say that a Jaguar quad-core implementation including cache is around 25 mm². An 8-core implementation would be around 50 mm² by those numbers.

BTW, that cache area calculation assumes a cache density of 3 Mbit/mm², which is pretty much the maximum possible on TSMC's 28 nm process and probably a little optimistic in this case. 2 Mbit/mm² is probably more realistic, which results in an L2 cache area of 8 mm². 25 mm² total for a quad-core implementation isn't so unrealistic, in other words.

EDIT: My pie in the sky SoC would be this:
-24 CU (1536 SPs) 24 ROPs, 96/48 Texels filtered/clock (int/fp16) and a double rasterizer like on Tahiti. Size should be roughly 225 mm².
-8 Jaguar cores with 4 MB L2 cache total. Size roughly 50 mm².
-64 MB eDRAM L3 cache. Size roughly 45 mm².
-256-bit memory bus and other bits and pieces. Size ~65 mm²
Total Die size is 385 mm². Clocks should be 1.6 to 2 GHz for the CPU cores and 750 MHz for the GPU. TDP around 150 Watts.
8 GB of DDR4 memory at 3200 MHz for a 102 GB/s bandwidth, or 8 GB of GDDR5 memory at 6000 MT/s for 192 GB/s.

Well, as I'm saying, it's a bit pie in the sky, but one can hope :p.

How much heat will the SoC dissipate?
 