Predict: The Next Generation Console Tech

According to AMD, a single Jaguar core is 3.1 mm². For a quad-core implementation, that's 12.4 mm² for the cores, and 2 MB of cache on TSMC's 28 nm process should be roughly 6 mm². Those numbers are probably a little optimistic, so let's say a Jaguar quad-core implementation including cache is around 25 mm². An 8-core implementation would then be around 50 mm² by those numbers.

BTW, that cache size calculation assumes a cache density of 3 Mbit/mm², which is pretty much the maximum density possible on TSMC's 28 nm process and probably a little optimistic in this case. 2 Mbit/mm² is probably more realistic, which results in an L2 cache area of 8 mm². In other words, 25 mm² total for a quad-core implementation isn't so unrealistic.
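
For anyone who wants to poke at the numbers, here's that arithmetic as a quick Python sketch; the per-core area and cache density are just the assumptions stated above, not official figures:

```python
CORE_AREA_MM2 = 3.1        # AMD's figure for one Jaguar core
CACHE_MBIT_PER_MM2 = 2.0   # conservative SRAM density on TSMC 28 nm (3.0 is the optimistic cap)

def jaguar_cluster_area_mm2(cores, l2_mbytes):
    """Rough die area for a Jaguar cluster: cores plus L2, no glue logic."""
    core_area = cores * CORE_AREA_MM2
    cache_area = (l2_mbytes * 8) / CACHE_MBIT_PER_MM2  # MB -> Mbit
    return core_area + cache_area

print(jaguar_cluster_area_mm2(4, 2))  # ~20.4 mm^2 -> ~25 mm^2 with glue logic
print(jaguar_cluster_area_mm2(8, 4))  # ~40.8 mm^2 -> ~50 mm^2 with glue logic
```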

EDIT: My pie in the sky SoC would be this:
- 24 CUs (1536 SPs), 24 ROPs, 96/48 texels filtered per clock (int/fp16), and a double rasterizer like on Tahiti. Size should be roughly 225 mm².
- 8 Jaguar cores with 4 MB of L2 cache total. Size roughly 50 mm².
- 64 MB of eDRAM L3 cache. Size roughly 45 mm².
- 256-bit memory bus and other bits and pieces. Size ~65 mm².
Total die size is 385 mm². Clocks should be 1.6 to 2 GHz for the CPU cores and 750 MHz for the GPU. TDP around 150 Watts.
8 GB of DDR4 memory at 3200 MT/s for 102 GB/s of bandwidth, or 8 GB of GDDR5 memory at 6000 MT/s for 192 GB/s.
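
The bandwidth figures are just transfer rate times bus width; a minimal sketch of the calculation:

```python
def peak_bandwidth_gbs(bus_bits, mt_per_s):
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return (bus_bits / 8) * mt_per_s / 1000

print(peak_bandwidth_gbs(256, 3200))  # DDR4-3200 on a 256-bit bus -> 102.4 GB/s
print(peak_bandwidth_gbs(256, 6000))  # 6000 MT/s GDDR5 on a 256-bit bus -> 192.0 GB/s
```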

Well, as I'm saying, it's a bit pie in the sky, but one can hope :p.

A Pitcairn is ~212 mm² with 20 CUs, including the 256-bit memory bus, right?
I don't understand why a 24-CU variation would be significantly bigger.

You're probably right. I was working from Tahiti die size, but maybe it would have been easier to work up from Pitcairn die size.

BTW, here are die shots of AMD's Southern Islands GPUs: http://news.softpedia.com/news/AMD-Releases-GPU-Die-Shots-of-Sothern-Islands-Series-289091.shtml
I calculated that the 384-bit memory interface takes up at least 65 mm², probably more than that, and I was working with 3/4 of a full Tahiti GPU. That worked out to 225 mm² without the memory interface. It's all rough guessing of course. The nextBox probably won't have all the GPGPU features of Tahiti and will probably be based on some evolution of GCN.

My guess would be that if you scale Pitcairn up to 24 CUs, you'd end up at roughly 240 mm², maybe even a bit smaller. You'd still have 32 ROPs, more than the 24 I put in my SoC, but at 240 mm² it would already include a 256-bit GDDR5 memory interface. If we work from that, my SoC would end up at roughly 340 mm², let's say 350 mm² to be on the safe side. The specs would be the same as the ones I claimed, just with 32 ROPs instead of 24.
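
Back-of-the-envelope, that scaling looks like this, assuming only the CU array grows and the uncore (memory interface and the rest) stays fixed; the ~7 mm² per CU is just a rough figure backed out of the 212 -> 240 mm² numbers above, not an AMD spec:

```python
PITCAIRN_AREA_MM2 = 212  # 20 CUs, including the 256-bit GDDR5 interface
AREA_PER_CU_MM2 = 7      # rough guess for one GCN CU plus its share of shader-engine logic

def scaled_gpu_area_mm2(target_cus, base_area=PITCAIRN_AREA_MM2, base_cus=20):
    """Scale a known die by adding or removing CUs, leaving the uncore untouched."""
    return base_area + (target_cus - base_cus) * AREA_PER_CU_MM2

print(scaled_gpu_area_mm2(24))  # ~240 mm^2, matching the estimate above
```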

Another thing: I think a 24-CU GPU at 750 MHz with a TDP of, say, 100 Watts should be possible without binning. The 20-CU Pitcairn mobile GPU at 850 MHz has a 75 Watt TDP, to put things into perspective. 100 Watts for the GPU would leave 50 Watts for the rest of the SoC. Still, I think my specs are very much pie in the sky. I'd be really happy if they were close to what we'll get, but I expect something slightly more pedestrian.
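
As a sanity check on the 100 Watt figure, here's the mobile Pitcairn TDP scaled linearly with CU count and clock; it's a crude model that ignores voltage scaling (dropping voltage along with clock would push it even lower), so take it as a rough estimate only:

```python
MOBILE_PITCAIRN_TDP_W = 75  # 20 CUs at 850 MHz

# Hypothetical console GPU: 24 CUs at 750 MHz, constant voltage assumed.
scaled_tdp = MOBILE_PITCAIRN_TDP_W * (24 / 20) * (750 / 850)
print(round(scaled_tdp, 1))  # ~79.4 W -- comfortably inside a 100 W budget
```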
 
A slightly off-topic question:
Has TSMC's 28 nm process matured (or been tamed) enough that a new GPU revision would be smaller or dissipate less heat?
 
If a dual-chip system is inevitable for getting the most overall transistors in the thing, then I'd rather have this:

System 3:
SoC: 6 cores, 6 CUs
GPU: 12 CUs

If not, then this:

System 4:
SoC: 8 cores, 18 CUs

I don't see the point in having two almost identical chips just so you can "turn one off". It's not that big of a deal. If they must go dual chip make one SoC for CPU with a low end integrated GPU for HSA advantages for physics and general processing and one big GPU dedicated to game graphics. Don't make them rely on doing half the work each and bringing the result together in the middle.
Given the choice I would go with System 4 ;)
Either way: 6 cores + 2-3 CUs on the SoC, plus a 14-CU GPU.

Well, with an APU you can indeed turn off the GPU, or an even more hypothetical second APU, but for me that is not the main point; the main point is really cost.
One billion dollars for the Broadway revision, that number echoes in my head a lot. It is more than what Sony invested in Cell; foundry revamping aside, my memory tells me that was around 400 million (it's old, I could be off, and I've been too lazy to check, to tell the truth). There were other contributors too, but the scale of the project was different.
As Nvidia said in its presentation, the cost of implementing logic is going up. Not being in the semiconductor industry (and hence I could be wrong), I think that putting together 2 different chips, even from already existing building blocks, could cost a lot of money. I have no idea how things can go wrong when we speak about billions of transistors vs. a couple hundred million (like this gen), but I would think that the odds of encountering a problem (not bad yields but a real design problem) don't scale linearly with the transistor count but faster. A problem in a critical part of the chip can be a pain in the ass.
So I think that designing, implementing, testing, etc. two different and complex chips could prove extremely costly, the whole process being more costly than it used to be to begin with.
That is why, instead of a hypothetical APU+GPU, I wonder about a dual SoC (I would not put the odds high, next to zero actually).
As an aside, I'm not speaking of almost identical SoCs but identical ones: the same chip with different parts fused off for the sake of maximizing the use of what comes back from the foundry (they would be the same chip, as the HD 7770 and HD 7750 are, for example).

According to AMD, a single Jaguar core is 3.1 mm². For a quad-core implementation, that's 12.4 mm² for the cores, and 2 MB of cache on TSMC's 28 nm process should be roughly 6 mm². Those numbers are probably a little optimistic, so let's say a Jaguar quad-core implementation including cache is around 25 mm². An 8-core implementation would then be around 50 mm² by those numbers.

BTW, that cache size calculation assumes a cache density of 3 Mbit/mm², which is pretty much the maximum density possible on TSMC's 28 nm process and probably a little optimistic in this case. 2 Mbit/mm² is probably more realistic, which results in an L2 cache area of 8 mm². In other words, 25 mm² total for a quad-core implementation isn't so unrealistic.
Well, I did not use those figures; I used a simpler guesstimate: Bobcat is 75 mm² with the GPU, the GPU being around half of the die when you take into account the I/O, etc. AMD said that Jaguar cores are mostly the same size as Bobcat's (though on a different process), so drop the GPU, double the core count => ~80 mm².
It might be safer than focusing on the core size alone, since the cache and all the surrounding glue can amount to a lot. It's a guesstimate anyway; it is not supposed to be accurate.
 
Because Sony cares so much about your power bill. I don't see how it is interesting at all in the context of a device that will always be plugged in.
 
Also listed....
It's from the description, which is pretty much inconsequential to the patent. This particular embodiment actually has nothing to do with the patent, because Claim 1 is explicit in describing a system and apparatus, yada yada, for architecturally dissimilar GPUs.

The intention here is basically to have a 'graphics controller' JIT compiler farm out jobs to two different GPUs. That would fit in with an APU and GPU, but there's still the issue of why bother?! ;) A big, off the shelf GPU will be simpler and easier to work with and more cost effective than splitting the GPU resources across two chips. In a laptop it makes sense to switch between low and high performance parts, but where power is no object you want to provide the easiest, cheapest, simplest set of resources.
 
Wouldn't it be simpler if they implemented some kind of CPU and GPU clock scaling?
The HD 7870 idles at less than 15 W.
They could just let the console idle at less than, say, 30 W if they had different profiles for gaming and browsing.

Did the PS3 or Xbox 360 have some kind of idle power saver? Looking at the ~200 W idle power, it doesn't look like they did.
 
Reasons to have a full-fledged GPU plus an APU with its own mini GPU:

- A power-conservative dashboard, which can run independently of the main GPU (tame the lag)
- A power-saving mode for background downloads when the device is "ready but not 'on'", watching movies, etc., to meet various national requirements
- SmartGlass/Wii U-like uses could be set up to use the APU and not touch main-screen visuals
- An entire "class" of Arcade/Indie games (maybe even "Xbox 360"-class visuals)
- Mobile portability (forward looking) to dovetail with the above and expand the market/brand

I won't say these are all easy to design in, but these, and others, may be reasons for both an APU and a GPU.
 
[attached image: 30n7w9k.png]

Two of the same GPU?
 
Can't many of those ideas be solved by GPU throttling (either clocks or CUs, if that's possible)? As for Wuu-like off-screen rendering (which I consider unlikely, as I expect smart devices other than Wuu to use smart hardware over dumb screens), the same results can be achieved by rendering the small-screen graphics on the GPU and then following that with rendering the main screen. If you have 20 CUs in your system, you have 20 CUs' worth of work per frame. Whether that's 4 CUs rendering for 1/30th of a second and 16 CUs rendering for 1/30th of a second, or 20 CUs rendering the small screen for 1/150th of a second followed by rendering the main screen for 4/150ths, it amounts to the same thing. Same with GPGPU workloads. The only advantage I can see for an APU is closer integration between CPU and GPU when sharing workloads, but I don't know how valuable that is with the sorts of implementations we're likely to see.
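
That equivalence is easy to check numerically: both ways of splitting the GPU spend the same total CU-time per frame (a quick sketch using the figures above):

```python
FRAME = 1 / 30  # seconds per frame at 30 fps

# Option A: split the GPU spatially, both partitions render for the whole frame.
spatial = 4 * FRAME + 16 * FRAME            # 4 CUs on the small screen, 16 on the main screen

# Option B: split temporally, all 20 CUs time-slice within the frame.
temporal = 20 * (1 / 150) + 20 * (4 / 150)  # 1/150 s small screen, 4/150 s main screen

print(spatial, temporal)  # both ~0.667 CU-seconds per frame
```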
 
Well, I did not use those figures; I used a simpler guesstimate: Bobcat is 75 mm² with the GPU, the GPU being around half of the die when you take into account the I/O, etc. AMD said that Jaguar cores are mostly the same size as Bobcat's (though on a different process), so drop the GPU, double the core count => ~80 mm².
It might be safer than focusing on the core size alone, since the cache and all the surrounding glue can amount to a lot. It's a guesstimate anyway; it is not supposed to be accurate.

Here is a die shot of the Ontario SoC: http://chipdesignmag.com/lpd/pangrle/files/2012/08/barry1.png
The CPU cores are in between the 2 L2 cache blocks that are in the lower left and upper left corners of the SoC. According to my measurements, those 2 Bobcat cores along with their cache are around 16.5 mm². A quadcore Bobcat with 2MB of L2 cache total would be 33 mm² on TSMC's 40 nm process. (That's assuming Ontario is 75 mm² in total.) All in all, I'd say that 50 mm² for an 8-core Jaguar implementation with 4 MB of L2 cache is pretty generous.
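
To spell out why 50 mm² looks generous (a rough sketch; the 16.5 mm² is my own measurement off the die shot, and the ~2x area shrink from 40 nm to 28 nm is an assumption):

```python
BOBCAT_PAIR_MM2 = 16.5  # 2 Bobcat cores + 1 MB L2, measured off the Ontario die shot (40 nm)

eight_core_40nm = 4 * BOBCAT_PAIR_MM2    # 8 cores + 4 MB L2 on 40 nm -> 66 mm^2
eight_core_28nm = eight_core_40nm / 2    # assume a ~2x area shrink from 40 nm to 28 nm
print(eight_core_40nm, eight_core_28nm)  # 66.0 33.0 -- so 50 mm^2 leaves plenty of margin
```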

BTW, here is the article that I got that die shot from: http://chipdesignmag.com/lpd/pangrle/2012/08/09/amd%E2%80%99s-bobcat-processor/ It's pretty interesting actually.
 
Here is a die shot of the Ontario SoC: http://chipdesignmag.com/lpd/pangrle/files/2012/08/barry1.png
The CPU cores are in between the 2 L2 cache blocks that are in the lower left and upper left corners of the SoC. According to my measurements, those 2 Bobcat cores along with their cache are around 16.5 mm². A quadcore Bobcat with 2MB of L2 cache total would be 33 mm² on TSMC's 40 nm process. (That's assuming Ontario is 75 mm² in total.) All in all, I'd say that 50 mm² for an 8-core Jaguar implementation with 4 MB of L2 cache is pretty generous.

BTW, here is the article that I got that die shot from: http://chipdesignmag.com/lpd/pangrle/2012/08/09/amd%E2%80%99s-bobcat-processor/ It's pretty interesting actually.
Well, if it is smaller, all the better; I did not really care for accuracy, I rounded 75 mm² to 80 mm² to begin with ;).
The point still stands: OK, you can get a quad-core Jaguar and a Cape Verde class of GPU just below the 185 mm² "limit", but that is still a bit short (by itself) to power a next-generation design; either way you go past 185 mm² or use another chip.
 
Reasons to have a full-fledged GPU plus an APU with its own mini GPU:

- SmartGlass/Wii U-like uses could be set up to use the APU and not touch main-screen visuals

Yes, the whole game-within-the-game thing could be done better. Also high-quality content on in-game screens, monitors, holo-panels, etc.
Even a proper 3D HUD.
 
I was thinking that one of the uses would be the case of multiple applications active at once.

For example:
Music, Browser, Other Applications that may or may not interact with the game.

I know that currently, with both the PS3 and Xbox 360, there are some games that really push the systems, and this affects system UI performance. Things like the Xbox Guide slowing down, or not being able to do anything with the PS3 XMB while things are loading in games, are examples of this.
 