How is that post supposed to be helpful? I mean, it seems that you don't know how much overhead there is to putting those memory cells together.
I don't know, MrFox doesn't know either; it is not an issue in itself, but I don't see why you answered in such a mocking tone... At least MrFox dug into an article (which I actually read too while I was myself searching for information before posting) and brought some extra information to the conversation.
Your post, on the other hand, is, to borrow your wording, hilariously useless; and while my point is not to sound harsh, you can see for yourself that it is an unpleasant way to comment on someone's post.
40mm^2 would indeed be the figure without overhead (a useless bunch of bare memory cells), but how big is the overhead? Even that is unclear, as the memory cell size in TSMC's own test chip and the size measured in actual GPUs differ, as MrFox pointed out.
For a cache it has to be really big. A 50% overhead would give ~60mm^2; 100% gets you to 80mm^2.
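Just to make that overhead arithmetic explicit (a minimal sketch; the 40mm^2 raw-cell figure and the overhead ratios are the assumptions from the discussion above, not measured numbers):

```python
# Rough area estimate for an on-die memory block: raw cell area plus
# a fractional overhead for sense amps, decoders, routing, redundancy...
def memory_block_area(raw_cells_mm2: float, overhead: float) -> float:
    return raw_cells_mm2 * (1.0 + overhead)

print(memory_block_area(40.0, 0.5))  # 50% overhead  -> 60.0 mm^2
print(memory_block_area(40.0, 1.0))  # 100% overhead -> 80.0 mm^2
```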
My guess is that Durango is made of 2 chips:
1) the GPU, which includes 32MB of ESRAM.
Looking at Cape Verde and what could be the size of the scratchpad (let's go with 80mm^2), I could see the chip being in the same ballpark as Pitcairn. That would be with 12 CUs and something based on GCN.
There would be no trouble fitting the IO (a 256-bit bus to the RAM and a reasonably fast link to the CPU, ~30GB/s).
2) the CPU; 8 Jaguar cores and the L2 would not take much room, even with extra special-purpose units (security, sound, whatnot). If there are only CPU cores, cache, and the IO, 80mm^2 is definitely enough. Adding stuff would not make the chip "big" any way I look at it.
Overall, if I follow that line of thinking, I could see the whole setup being pretty affordable. The silicon budget would not be much higher than what we have in the last Xbox revision.
Down the line, the plan would be to shrink the system only once, putting the CPU and GPU on the same SoC.
Still, the GPU (/north bridge) would be above 185mm^2 and as such a tad more costly than it could be. Overall, the system design seems really biased toward "low" production costs, to the point where it gets me wondering whether MSFT could have pushed further.
The tiniest GPU ever sold with a 256-bit bus was barely above 190mm^2. I wonder whether it would be doable to fit all the IO in a chip that is 185mm^2 or just below.
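To illustrate why shaving those last few mm^2 matters, here is a toy per-die cost model (every number in it, wafer cost, wafer size, yield, is a placeholder I made up for the example, not an actual foundry figure): a bigger die means fewer candidates per wafer, so the cost grows at least proportionally with area.

```python
import math

# Toy cost model: dies per wafer shrinks with die area (edge losses
# ignored), and the wafer cost is spread over the good dies only.
def cost_per_good_die(die_area_mm2: float,
                      wafer_cost: float = 5000.0,      # placeholder
                      wafer_diameter_mm: float = 300.0,
                      yield_rate: float = 0.7) -> float:  # placeholder
    wafer_area = math.pi * (wafer_diameter_mm / 2.0) ** 2
    dies_per_wafer = wafer_area / die_area_mm2
    return wafer_cost / (dies_per_wafer * yield_rate)

print(cost_per_good_die(185.0))  # the "cheap" target
print(cost_per_good_die(220.0))  # the GCN-based chip discussed above
```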
I do not know how DDR3 compares to GDDR5 with regard to the number of pins you need; I guess some people here know.
So, going further down my line of thinking, I could see MSFT using something that is neither GCN nor a previous AMD GPU architecture.
I think that a significant share of the transistors AMD spent on GCN may turn out to be useless for MSFT's primary use, which is graphics.
The GCN architecture has to keep track of more threads (vs. VLIW4/5 architectures), the amount of "memory" on chip has increased to meet DX requirements, you have the ACEs, etc.
How much space all of that amounts to is unknown, though looking at the transistor counts there was a beefy increase from, say, Juniper to Cape Verde; it turned out well in both performance and transistor density, though.
When all is said and done, I could see MSFT cutting corners and taking parts from different GPU architectures to try to get the GPU as tiny as they can.
If I look at the SIMDs alone, the difference in efficiency between VLIW4 and GCN is not much. GCN dealing with graphics should keep 100% of its ALUs busy (in an ALU-bound scenario, obviously), whereas according to AMD's own data, on average (in the same ALU-bound scenario) VLIW4 keeps 3.8 ALUs busy out of 4 => 95%.
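A quick sanity check on that utilization gap (the 3.8-out-of-4 figure is the one quoted from AMD's data; the 100% GCN figure is the assumption stated above):

```python
# VLIW4 issues bundles of 4 ALU slots; AMD's quoted average is 3.8
# slots filled in graphics shaders. GCN is assumed fully busy here.
vliw4_utilization = 3.8 / 4.0   # = 0.95
gcn_utilization = 1.0           # assumed, ALU-bound graphics workload

gap = gcn_utilization - vliw4_utilization
print(f"VLIW4 keeps {vliw4_utilization:.0%} of its ALUs busy, "
      f"so GCN's edge on the SIMDs alone is only ~{gap:.0%}")
```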
I could see them using smaller "global data share" and "local data share" blocks, disregarding the requirements they set in the PC realm.
I could see them passing on the improvements GCN GPUs brought with regard to tessellation; in a closed environment, I could see Cayman-level performance being enough. MSFT has enough grip on publishers to enforce a proper level of adaptive tessellation (the level matching the performance of their hardware).
I could see them passing on what seems to be a reworked block in GCN GPUs, which might include the command processor and the ACEs.
The one thing they should definitely take from GCN (not sure of my wording here... I mean use the GCN ROPs) is the ROPs, which seem way more efficient than the ones in previous architectures.
Overall, I could see something in many regards closer to the HD 69xx (or what is in Trinity) than to the discrete GPUs AMD ships nowadays.
If my goal were to be really cheap, I would make quite a few trade-offs to get the chip to 185mm^2 (or just below). Those trade-offs save money, and that could be the part engineers could not compromise on, whereas the amount of performance lost would be minimal in percentage terms vs., say, the ~220mm^2 (GCN-based) GPU/chip I was discussing at the beginning of my post.
Actually, I would go as far as cutting the number of SIMDs to 10 if needed, which, looking at the rumors we have, could mean that 2 SIMDs are included on the CPU die. Again, it would possibly be less optimal, but with costs as the primary driver for the system...
Edit: actually, I could see them using GCN, as compute performance could be more relevant.
In which case, MSFT could end up with something like this (I'm not pretending that it would have no impact on performance; I'm arguing the impact would be reasonable):
1) one ~185mm^2 chip including the GPU and the scratchpad memory.
2) one 80/90mm^2 chip including the 8 Jaguar cores, the L2, and a 2-SIMD GPU.
Now, along with DDR3, I could see MSFT pretty much replacing the existing 360 SKUs (maybe not the Arcade, if they want the HDD to be standard, which I would wish for) with the new system, keeping the pricing structure and subsidizing the difference in BOM between those products.
To me, the kind of specs we hear about Durango hints at a pretty aggressive pricing strategy.
Edit: if the IO is an issue, and if it is doable to connect 8GB to a 192-bit bus (looking at some PCs shipping with a Bobcat APU, hence a 64-bit bus, along with 4GB of RAM, it should be doable), I would make that trade-off too. Definitely, looking at the GPU (the number of ROPs and ALUs, and the amount of bandwidth the scratchpad is supposed to provide), it is unclear to what extent less bandwidth to the main RAM would hurt performance; 60+GB/s sounds a bit overkill.
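For reference, here is the peak-bandwidth arithmetic behind that trade-off (a sketch assuming DDR3-2133, i.e. 2.133 GT/s; swap in whichever speed grade the rumors actually point to):

```python
# Peak bandwidth = bus width in bytes * data rate in GT/s.
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gtps: float) -> float:
    return (bus_width_bits / 8) * data_rate_gtps

print(peak_bandwidth_gbs(256, 2.133))  # 256-bit DDR3-2133 -> ~68.3 GB/s
print(peak_bandwidth_gbs(192, 2.133))  # 192-bit DDR3-2133 -> ~51.2 GB/s
```

At those speeds a 192-bit bus still lands around 51GB/s, which is why the 60+GB/s figure looks like it leaves room to cut.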
Edit 2: none of those efforts/trade-offs may have been needed, depending on the actual size of the scratchpad + move engines; still, it could impact the IO.