Make educated guesses of Durango/Orbis die sizes, TDPs, and costs based on VGLeaks

It's probably a DRAM array with a SRAM interface of some sort.

It doesn't have to be Mosys' 1T-SRAM, which has very short latency; it could have longer latencies (and higher density).

Cheers
 
Some forumers over on SemiAccurate think the chip or SoC might be stacked on SRAM. I think that kinda explains how they can integrate the huge die with everything else... Charlie says it's an active interposer. In other words SRAM = interposer. Also, I don't think it needs to be 28nm in that case.

The primary reason to use SRAM over EDRAM for large arrays is that the processing steps required for DRAM compromise logic performance. If it is a different die altogether I would expect them to use DRAM, since you have at least four times as much capacity then.

Cheers
 
The primary reason to use SRAM over EDRAM for large arrays is that the processing steps required for DRAM compromise logic performance. If it is a different die altogether I would expect them to use DRAM, since you have at least four times as much capacity then.

Maybe they valued latency over capacity then? And any yield or sourcing issues too.
 
Maybe they valued latency over capacity then? And any yield or sourcing issues too.

There's a limit to what SRAM can do, or needs to do for latency if it's a separate die or if it's on the far side of the same chip. If it's not linked as tightly as an L1 or L2 cache, the latency of the memory is added on top of the miss latency of the nearer caches, signal transit time, and arbitration overhead.

IBM has shown that having EDRAM as a last-level cache can be mostly equivalent to an SRAM (just several times bigger) for the sorts of chips it makes, save some corner cases where the slower DRAM arrays and banking conflicts can cause latency to flare up to the point that IBM's design can't hide that they aren't SRAM.

In some circumstances, EDRAM for large memory pools can provide a benefit because its density is such that the physical distances signals need to travel are smaller.
If it's across an interface as well, then SRAM's benefits are swamped even further.
 
Right, that's why, if it's all in the same package as the interposer, there wouldn't be any latency problems. I think that's the only thing that can explain all the whys and hows of going with SRAM.
 
Is somebody like Aaron Pink around?
TSMC claims a peak density of 3900 Kgates/mm^2; I don't know how that translates into MB, and I guess it depends on the structure of that pool of memory.
I guess it could give us quite some reliable estimates based on that raw data.

My take is that it is plain SRAM on the same die as the GPU.

For reference, Intel's 22nm values:
“a high-density 0.092 µm2 cell, a low-voltage 0.108 µm2 cell, and a high-performance 0.130 µm2 cell. The SRAM operated at 4.6 GHz at 1 V.”
TSMC's high-performance cell @28nm is 0.127 µm2.
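
As a rough way to translate that quoted logic density into memory capacity, here is a minimal sketch (Python). The 4-transistors-per-gate (NAND2-equivalent) convention and the 6T cell are my assumptions, and a dedicated SRAM macro is laid out much denser than random logic, so this only gives an upper bound on area:

```python
# Rough translation of TSMC's quoted peak logic density into SRAM capacity.
# Assumptions: gate density counted in 2-input-NAND equivalents (~4 transistors
# per gate) and a 6T SRAM cell.
GATE_DENSITY_PER_MM2 = 3900e3      # 3900 Kgates/mm^2, as quoted above
TRANSISTORS_PER_GATE = 4           # NAND2-equivalent convention (assumption)
TRANSISTORS_PER_BIT = 6            # 6T cell

bits_per_mm2 = GATE_DENSITY_PER_MM2 * TRANSISTORS_PER_GATE / TRANSISTORS_PER_BIT
mb_per_mm2 = bits_per_mm2 / (8 * 2**20)
print(f"~{mb_per_mm2:.2f} MB/mm^2 at logic density "
      f"-> ~{32 / mb_per_mm2:.0f} mm^2 for 32 MB")
# ~0.31 MB/mm^2 and ~103 mm^2 -- a real SRAM array built from the 0.127 um^2
# cell is roughly 3x denser, which is why cell size is the better starting point.
```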
 
6 transistors/bit * 8 * 32,000,000 = ~1.5 billion transistors for the actual memory. That would probably be less than 120mm2 @28nm, but there would be other logic on there.
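
Spelling that arithmetic out as a quick sanity check (a minimal sketch; it uses 32 × 10^6 bytes as in the post above and the 0.127 µm2 test-chip cell, so the area is a cells-only floor with no decoders, sense amps or redundancy):

```python
# Sanity check of the "1.5 billion transistors" figure and the raw cell area.
BYTES = 32_000_000                  # 32 MB, counted as 32e6 bytes like the post
BITS = BYTES * 8
TRANSISTORS = BITS * 6              # 6T cell
CELL_UM2 = 0.127                    # TSMC 28nm HP test-chip cell

raw_area_mm2 = BITS * CELL_UM2 / 1e6    # 1 mm^2 = 1e6 um^2
print(f"{TRANSISTORS / 1e9:.2f} billion transistors, "
      f"~{raw_area_mm2:.0f} mm^2 of raw cells")
# -> 1.54 billion transistors and ~33 mm^2 of bare cells; anything beyond
#    that is array overhead plus whatever other logic sits on the die.
```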

Wow, that is really big, I never expected it to be so large! With ~70mm2 for the CPU and ~140mm2 for the GPU... they will almost reach the chip size of a Radeon 7950/70.
 
Durango may actually have a bigger SoC than Orbis. The eSRAM will "make up" for Orbis' CU advantage in die area, and then there will also be the DMEs too, but we don't know their transistor count or the die space they take.
 
http://www.chipworks.com/blog/technologyblog/2012/12/11/a-review-of-tsmc-28-nm-process-technology/
Earlier this year, we completed a limited analysis of the high density SRAM on the AMD Radeon™ HD 7970 215-0821060 graphics processor, which was fabricated with TSMC’s HP process. Our TEM analysis confirmed the 215-0821060 transistor structure was identical to that seen in the Altera Stratix V device, as would be expected since both are based on the TSMC 28 nm HP process. The 215-0821060 features a 0.16 µm2 6T-SRAM with the transistors arranged in a uniaxial layout. By contrast the 90 nm ATI 215PADAKA12FG graphics processor extracted from ATI Radeon X1950 Pro Graphics Card had a SRAM cell that is over five times bigger, at 0.86 µm2.
32MB would be only 40mm2?
 
Why is there such a big difference between the SRAM test chip (0.127) and an array on a real GPU (0.16)?
 
Hilariously wrong.
How is that post supposed to be helpful? I mean, it seems that you don't know how much overhead there is to put those memory cells together either.
I don't know, MrFox doesn't know either; it is not an issue by itself, but I don't see why you answered in such a mocking tone... At least MrFox dug into an article (which I actually read too while I was myself searching for information before posting) and brought some extra information to the conversation.
On the other hand, your post is hilariously useless, to follow your wording, and while my point is not to sound harsh, you can see for yourself that it is an unpleasant way to comment on someone's post.

40mm^2 would indeed be without overhead (a useless bunch of memory cells), but how big is the overhead? Even that is unclear, as the cell size in TSMC's own test chip and the size measured in a real GPU are different, as MrFox is pointing out.

For a cache, the overhead has to be really big. A 50% overhead would give ~60mm^2; 100% gets to 80mm^2.
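
The 40 / 60 / 80 mm^2 numbers in this exchange can be reproduced directly from the Chipworks cell size; a small sketch (the overhead percentages are the guesses from these posts, not measured values):

```python
# Reproduce the 40 / 60 / 80 mm^2 figures: Chipworks' measured 0.16 um^2 cell
# plus guessed array-overhead factors (the overheads are pure speculation).
BITS = 32_000_000 * 8
CELL_UM2 = 0.16                     # 6T cell measured on the HD 7970 (Chipworks)

raw_mm2 = BITS * CELL_UM2 / 1e6
for overhead in (0.0, 0.5, 1.0):    # 0%, 50%, 100% overhead
    print(f"{overhead:>4.0%} overhead: ~{raw_mm2 * (1 + overhead):.0f} mm^2")
# -> ~41, ~61, ~82 mm^2 for 32 MB of 6T SRAM
```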

My guess is that Durango is made of 2 chips:
1) the GPU, which includes 32MB of eSRAM.
Looking at Cape Verde and what could be the size of the scratchpad (let's go with 80mm^2), I could see the chip being in the same ballpark as Pitcairn. That would be with 12 CUs and something based on GCN.
There would be no trouble fitting the IO (a 256-bit bus to the RAM and a reasonably fast link to the CPU, ~30GB/s).
2) the CPU: 8 Jaguar cores and the L2 would not take much room, even with extra units serving specific purposes (security, sound, what not). If there are only CPU cores, cache and IO, 80mm^2 is definitely enough. Adding stuff would not make the chip "big" any way I look at it.

Overall, if I follow that line of thinking, I could see the whole set-up being pretty affordable. The silicon budget would not be much higher than what we have in the last Xbox revision.
Down the line the plan would be to shrink the system only once, putting the CPU and GPU on the same SoC.

Still, the GPU(/north bridge) would be above 185mm^2 and as such a tad more costly than it could be. Overall the system design seems really biased toward "low" production costs, to the point where it gets me to wonder if MSFT could have pushed further.
The tiniest GPU that sold with a 256-bit bus was barely above 190mm^2. I wonder if it would be doable to get all the IO into a chip that is 185mm^2 or just below.
I do not know how DDR3 vs GDDR5 compare wrt the number of pins you need; I guess some people here know.

So, going further down my line of thinking, I could see MSFT using something that is neither GCN nor a previous AMD GPU architecture.
I think that a significant amount of the transistors AMD spent on GCN may be deemed useless for MSFT's primary use, which is graphics.
The GCN architecture has to keep track of more threads (vs the VLIW4/5 architectures), the amount of "memory" on chip has increased to meet DX requirements, etc. You have the ACEs.
How much space that amounts to is unknown, though looking at the transistor count there was a beefy increase from, say, Juniper to Cape Verde; it turned out well in both perf and transistor density, though.
When all is said and done, I could see MSFT cutting corners and taking parts of different GPU architectures to try to get the GPU as tiny as they can get it to be.

If I look at the SIMDs alone, the difference in efficiency is not that big between the VLIW4 architectures and GCN. GCN dealing with graphics should keep 100% of its ALUs busy (in an ALU-bound scenario, obviously), whereas according to AMD's own data, on average (in the same ALU-bound scenario) the VLIW keeps 3.8 ALUs out of 4 busy => 95%.

I could see them use smaller "global data share" and "local data share" pools, disregarding the requirements they set in the PC realm.
I could see them pass on the improvements GCN GPUs brought wrt tessellation; in a closed environment I could see Cayman-level performance being enough. MSFT has enough grip to enforce on publishers a proper level of adaptive tessellation (the level matching the performance of their hardware).

I could see them pass on what seems to be a reworked block in GCN GPUs which might include the command processor and the ACEs.

The one thing they should definitely take from GCN (not sure of my wording here... I mean use the GCN ROPs) is the ROPs, which seem way more efficient than the ones in previous architectures.

Overall I could see something in many regards closer to the HD69xx (or what is in Trinity) than to the discrete GPUs AMD ships nowadays.

If my goal were to be really cheap, I would make quite some trade-offs to get the chip to 185mm^2 (or just below). Those trade-offs save money, and that could be the part the engineers could not compromise on, whereas the amount of performance lost would be minimal in percent vs, say, the ~220mm^2 GCN-based GPU/chip I was discussing at the beginning of my post.

Actually I would go as far as cutting the number of SIMDs to 10 if needed. Which, looking at the rumors we have, could mean that 2 SIMDs are included on the CPU die. Again it would possibly be less optimal, but looking at costs as the primary driver for the system... Edit: actually I could see them using GCN, as compute performance could be more relevant.

In which case, MSFT could end up with something like this (I'm not pretending that it would have no impact on performance, I'm arguing it would have a reasonable impact):
1) a ~185mm^2 chip including the GPU and the scratchpad memory.
2) an 80/90mm^2 chip including the 8 Jaguar cores, the L2 and a 2-SIMD GPU.
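
Adding up the guesses in this post as a rough silicon budget (a sketch only: every figure below is one of the speculative numbers assumed above, and the split between GPU logic and scratchpad on the big die is my own rough partition to make the ~185mm^2 total work):

```python
# Rough silicon budget for the two-chip guess above. Every number here is one
# of the speculative figures from this post (or my own partition of them),
# not a measurement.
gpu_chip_mm2 = {
    "GPU logic (GCN-ish, ~12 CUs) + 256-bit DDR3 IO + CPU link": 105,  # my split
    "32 MB eSRAM scratchpad incl. overhead": 80,                       # guess above
}
cpu_chip_mm2 = {
    "8 Jaguar cores + L2 + glue/IO": 70,   # guess above
    "2 extra SIMDs + misc units": 15,      # guess
}

print(f"GPU / north-bridge die: ~{sum(gpu_chip_mm2.values())} mm^2")
print(f"CPU die:                ~{sum(cpu_chip_mm2.values())} mm^2")
# -> ~185 mm^2 and ~85 mm^2, matching the ballparks in the post.
```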

Now, along with DDR3, I could see MSFT pretty much replacing the existing 360 SKUs (maybe not the Arcade, if they want the HDD to be standard, which I would wish for) with the new system, keeping the pricing structure and subsidizing the difference in BOM between those products.
To me the kind of specs we hear about Durango hints at a pretty aggressive pricing strategy.

Edit: If the IO is an issue, and if it is doable to connect 8GB to a 192-bit bus (looking at some PCs shipping with a Bobcat APU, so a 64-bit bus, along with 4GB of RAM, it should be doable), I would make that trade-off too. Definitely, looking at the GPU (# of ROPs and ALUs, and the amount of bandwidth the scratchpad is supposed to provide), it is unclear to what extent less bandwidth to the main RAM would alter performance; 60+GB/s sounds a bit overkill.
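
For the bus-width trade-off in that edit, peak DDR3 bandwidth scales linearly with bus width; a quick sketch (the DDR3-2133 speed grade is my assumption, not something from the rumours):

```python
# Peak DDR3 bandwidth vs. bus width (the DDR3-2133 speed grade is an assumption).
DDR3_MT_PER_S = 2133                # mega-transfers per second, per pin
for bus_bits in (64, 192, 256):
    gb_per_s = DDR3_MT_PER_S * 1e6 * bus_bits / 8 / 1e9
    print(f"{bus_bits:>3}-bit bus: ~{gb_per_s:.0f} GB/s peak")
# -> ~17, ~51, ~68 GB/s; dropping from 256-bit to 192-bit costs ~17 GB/s of
#    peak main-RAM bandwidth, before counting whatever the scratchpad provides.
```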

Edit 2: None of these efforts/arbitrations might be needed, depending on the actual size of the scratchpad + move engines; still, it could impact the IO.
 
How is that post supposed to be helpful?

It wasn't. It was posted to let him know he was wrong. If I was paid to teach people things I might bother to make a long post why, but since I am not being paid, I'll let people do their own research and figure it out.
 
It wasn't. It was posted to let him know he was wrong. If I was paid to teach people things I might bother to make a long post why, but since I am not being paid, I'll let people do their own research and figure it out.
The spirit of discussion does promote an open sharing of knowledge and ideas without material recompense. You're under no obligation of course, but a lack of at least a line of broad correction moves your contribution outside of the scope of discussion. I can understand not doing that for trolls, but MrFox is partaking in a legitimate line of conversation.
 
It wasn't. It was posted to let him know he was wrong. If I was paid to teach people things I might bother to make a long post why, but since I am not being paid, I'll let people do their own research and figure it out.

According to the Chipworks die shot the big edram block on the Wii U is ~40 mm^2 (or just under). Are you saying that it's not, or that it's not got 32 MB of edram? Or am I misunderstanding completely?
 
It wasn't. It was posted to let him know he was wrong. If I was paid to teach people things I might bother to make a long post why, but since I am not being paid, I'll let people do their own research and figure it out.
Ninjaprime, I'm laughing at the superior intellect.

Previously I brought up that Mosys describes their 1T cell size with the overhead included, making it easy to calculate memory density. Afterwards, I wanted to give the 0.16 figure as a better starting point than 0.127 because it's based on a real implementation (not a test chip). I wasn't sure what the norm was for calculating the overhead, or what needed to be included when talking about density (because I assumed it's not linear, and depends on the chosen granularity, width, and probably other things), so I was expecting others to add that information, or give a rule of thumb similar to what Mosys gives for their memory... but I don't have money to give you, because that information wants to be free and you are holding it hostage.

Anyway, for the difference between 0.127 and 0.16:
http://www.realworldtech.com/iedm-2010/6/
The 0.127um2 cells are tuned for maximum density, but require 1.1V for operation. Trading off density for lower operating voltage (e.g. in SRAMs used with logic), TSMC also provides a 0.155um2 cell that requires ~0.7V.
So even if we had the "test chip" density, it's probably not a useful figure. It also explains why the GPU had a 0.16 cell size.

It would be a gain, it's just not possible. These chips are most likely 28nm, which is pretty much the same scale-wise as Intel's 32nm. The largest server chip Intel made on 32nm was Westmere-EX, which had 10 cores and only 30MB of L3 cache. The L3 array takes up ~40% of the chip, and it's a giant 513mm^2 chip. It would be ridiculous for a console to use up 200mm^2 for a 32MB chunk of cache.
EDIT: I maintain my educated guess of 40mm2 with an overhead of 50%, so 60mm2. It's less than a third of your 200mm2 guess, I think I'm closer.
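
To see why the Westmere-EX comparison and the 40-60mm^2 estimate can both be self-consistent, here is a quick sketch (the 40% L3-area share and the cell size are the figures quoted in this thread; a server L3 also carries tags, ECC and far more periphery than a plain scratchpad would):

```python
# Compare the implied density of Westmere-EX's L3 with a bare 28nm SRAM array.
WEX_DIE_MM2 = 513
WEX_L3_SHARE = 0.40                 # "~40% of the chip", as claimed above
WEX_L3_MB = 30

wex_l3_mm2 = WEX_DIE_MM2 * WEX_L3_SHARE
print(f"Westmere-EX L3: ~{wex_l3_mm2:.0f} mm^2 "
      f"-> ~{wex_l3_mm2 / WEX_L3_MB:.1f} mm^2 per MB")

CELL_UM2 = 0.16                     # Chipworks, TSMC 28nm
raw_mm2_per_mb = 8 * 2**20 * CELL_UM2 / 1e6
print(f"TSMC 28nm cells only: ~{raw_mm2_per_mb:.2f} mm^2 per MB "
      f"(~{raw_mm2_per_mb * 2:.2f} with 100% overhead)")
# The server L3 spends ~5x more area per MB than bare 28nm cells (and ~2.5x
# more than the 100%-overhead guess), so 32 MB needn't cost anywhere near 200 mm^2.
```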
 
According to the Chipworks die shot the big edram block on the Wii U is ~40 mm^2 (or just under). Are you saying that it's not, or that it's not got 32 MB of edram? Or am I misunderstanding completely?

eDRAM vs SRAM, and probably dense low-perf eDRAM on the Nintendo part, if I had to guess.
 