We already know how "dense" it is: 40mm^2 at 45nm for 32MB.
Also, we know Durango's is slow at 100GB/s; heck, that would be slow even for an eDRAM bank.
When including overhead, at 45nm Mosys 1T-SRAM-Q (an expensive process, 4x density) would end up at ~34mm^2 and IBM eDRAM at ~45mm^2 (an even more expensive process, 3x density). SRAM on the same process would be 133mm^2 at 45nm, so it would be below 60mm^2 at 28nm. The same data from Mosys shows a 50% overhead for SRAM at 90nm, 65nm and 45nm, so I'm using it for 28nm too. It all fits; the numbers still give me 60mm^2 no matter how I slice them, even when using a die shot of the WiiU and crunching the numbers in reverse.
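To make the arithmetic explicit, here is a minimal sketch of those figures; the 4x/3x density multipliers and the 50% SRAM overhead are the numbers quoted above, while the ideal (28/45)^2 area scaling is my own assumption:

```python
# Back-of-envelope check of the 32MB memory-area figures quoted above.
sram_45 = 133.0                      # mm^2, 32MB of SRAM at 45nm, overhead included
mosys_1t_q = sram_45 / 4             # 4x density -> ~33mm^2 (quoted as 34mm^2)
ibm_edram = sram_45 / 3              # 3x density -> ~44mm^2 (quoted as 45mm^2)
sram_28 = sram_45 * (28 / 45) ** 2   # assumed ideal area scaling -> ~52mm^2, "below 60mm^2"

print(f"1T-SRAM-Q @45nm: {mosys_1t_q:.0f} mm^2")
print(f"IBM eDRAM @45nm: {ibm_edram:.0f} mm^2")
print(f"SRAM @28nm:      {sram_28:.0f} mm^2")
```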
It's the wrong reference. Your mistake comes from not differentiating between a last-level cache and a simple pool of RAM; that's why you end up with such a hilariously wrong number. You need a better explanation for the WiiU's RAM area than "it's dense".
I can't elaborate on your estimate, but if the scratchpad is indeed in that ballpark, ~60mm^2, it would make sense to me to cut corners and try to get the chip under 185mm^2.
A Cap Verde with that amount of scratchpad would be in that ballpark. Though looking at Durango, there are 2 more CUs and more memory controllers, and as I see it you also have to fit the I/O (as in Xenos) to feed the CPU, etc. It would definitely miss the mark.
It really makes me wonder which GPU architecture MSFT chose for its GPU.
Juniper was ~1 billion transistors and Cap Verde is 1.5 billion, so the scaling is almost perfect wrt transistor density. I wonder how many transistors a 12 SIMD GPU based on the Cayman architecture would "weigh".
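As a quick sanity check of that density claim, using the commonly quoted die sizes (166mm^2 for Juniper, 123mm^2 for Cap Verde):

```python
# Transistor density: Juniper (40nm) vs Cap Verde (28nm).
juniper_tr, juniper_mm2 = 1.04e9, 166.0
verde_tr, verde_mm2 = 1.50e9, 123.0

d_juniper = juniper_tr / juniper_mm2 / 1e6   # ~6.3 Mtransistors/mm^2
d_verde = verde_tr / verde_mm2 / 1e6         # ~12.2 Mtransistors/mm^2
print(f"density ratio: {d_verde / d_juniper:.2f}x")  # ~1.95x, close to an ideal 2x full-node jump
```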
It is worth noticing that the 4 SIMD (256sp, VLIW4) version of Trinity beats in most cases (or at worst matches) the highest-end Llano part, which features 5 SIMD (400sp, VLIW5).
It is quite a feat; too bad there weren't that many products released based on that architecture, so it is tough to gauge how good its "perf per transistor" is.
I wish we could compare, say, Juniper, a 10 SIMD part based on the Cayman architecture, and Cap Verde wrt performance and performance per transistor. Looking at both Cayman and Trinity, I would assert that a 10 SIMD / 16 ROP part would be in the ballpark of Juniper wrt transistor count while outperforming it in every scenario. I would also bet that it would perform closer to Cap Verde than to Juniper.
Overall I wonder if MSFT could have looked at the existing AMD architectures, asked for an estimate of how "big" a VLIW4 design would be on TSMC's 28nm process, and decided it was good enough.
Overall the transistor density could be lower than in Cap Verde, but it could still be a "win" with regard to die size. Say the scaling is 0.6 (GCN pulls a perfect 0.5): assuming a 1 billion transistor chip (~Juniper) with 10 SIMDs and 16 ROPs (VLIW4 design), you get a 100mm^2 chip vs 123mm^2 for Cap Verde. Saving 23mm^2 may not sound like much, but MSFT could definitely be after that kind of "win".
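A one-liner version of that estimate (the 0.6 area scaling is the assumption stated above):

```python
# Hypothetical 28nm VLIW4 part: ~Juniper (10 SIMDs, 16 ROPs, ~1B transistors)
# shrunk from 40nm with an assumed 0.6 area scaling (GCN managed ~0.5).
juniper_mm2, verde_mm2 = 166.0, 123.0
vliw4_28nm = juniper_mm2 * 0.6   # ~100mm^2
print(f"VLIW4 @28nm: ~{vliw4_28nm:.0f}mm^2, saving ~{verde_mm2 - vliw4_28nm:.0f}mm^2 vs Cap Verde")
```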
If they could get two chips, <=185mm^2 for the GPU and <=80mm^2 for the CPU, I could definitely see the system selling for cheap, as I said, pretty much replacing the existing SKUs (roughly the silicon budget of Xenos alone at the 360 launch).
They may do like Nintendo and put the two chips on an MCM with a single cooling solution.
Power consumption could be surprisingly low:
25 Watts for the CPU doesn't sound out of place.
45-55 Watts for the GPU (extrapolating from the Cap Verde-based HD 7750, which has fewer SIMDs, the same number of ROPs, a tad higher clock speed, and a more power-hungry memory controller, though Durango has more of them; the figure sounds "right").
~80 Watts for the heart of the system.
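Tallying up those guesses:

```python
# Core silicon power budget, summing the estimates above.
cpu_w = 25
gpu_w_lo, gpu_w_hi = 45, 55
print(f"core silicon: {cpu_w + gpu_w_lo}-{cpu_w + gpu_w_hi} W")  # 70-80 W
```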
EDIT
All this is not super clear; to put it another way, I wonder if AMD itself, for Trinity, may have vouched for its VLIW4 design based on its own merits more than on timeline issues.
It seems that a lot of the wins in GCN GPUs were already baked into the VLIW4 design. Looking at how Juniper compares to Cap Verde, I would assert that a "GCNed" Redwood could have ended up at ~900 million transistors (up from ~600 million, scaling by the same growth ratio as Juniper to Cap Verde). That's a beefy increase in silicon budget, and it is not clear to what extent it would have beaten the VLIW4 design in Trinity, which (in transistors) weighs mostly the same as the Redwood integrated in Llano and has an extra SIMD to play with.
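The Redwood estimate, spelled out; the assumption is that the GCN transistor overhead observed between Juniper and Cap Verde applies proportionally to Redwood:

```python
# "GCNed" Redwood: scale Redwood's ~627M transistors by the Juniper -> Cap Verde growth ratio.
redwood_tr = 0.627e9
growth = 1.50e9 / 1.04e9                                  # ~1.44x
print(f"~{redwood_tr * growth / 1e9:.2f}B transistors")   # ~0.90B, i.e. ~900 million
```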
Looking at the choices made by Nvidia from Fermi to Kepler, one has to wonder if MSFT could have come to the same conclusion while comparing AMD's VLIW4 and GCN designs, i.e. that the price paid for the massive increase in compute performance is too high.
I'm not saying that GCN doesn't bring improvements in the graphics department, rather that they are less than what extra SIMDs/ROPs could buy you.
It's a quite inaccurate way to view things, but take Cap Verde's die size and a 0.6 scaling (GCN achieved ~0.5) from 40nm to 28nm to get the size Cap Verde would have been at 40nm: a ~205mm^2 chip. Juniper is 166mm^2, which is ~20% smaller; the other way around, the resulting chip is ~23% bigger than Juniper.
Now consider that a VLIW4 part of ~170mm^2 would outperform Juniper. Grow that part (add SIMDs) until you reach a die area of around 205mm^2.
You would end up with roughly half an HD 6970: 12 SIMDs and 16 ROPs.
Shrink it (same 0.6 scaling as used before) and, instead of Cap Verde, you have a part with 12 SIMDs, a 20% increase in shading power (or, put the other way, Cap Verde has 83% of the shading power of that hypothetical part). Does GCN's improved efficiency make up for that? For graphics alone, I would bet in most cases no.
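The whole back-of-envelope in one place (same assumed 0.6 scaling; a sketch, not a simulation):

```python
# VLIW4 vs GCN at iso-area, as sketched above.
verde_mm2, juniper_mm2, scale = 123.0, 166.0, 0.6
verde_at_40nm = verde_mm2 / scale                 # ~205mm^2 "40nm equivalent"
print(f"Juniper is {(1 - juniper_mm2 / verde_at_40nm) * 100:.0f}% smaller")             # ~19%
print(f"the 40nm equivalent is {(verde_at_40nm / juniper_mm2 - 1) * 100:.0f}% bigger")  # ~23%

# Half a Cayman (12 SIMDs, 16 ROPs) fits that ~205mm^2 budget; shrunk at 0.6 it lands
# near Cap Verde's size with 12 SIMDs instead of 10:
print(f"shading power: {12 / 10:.1f}x Cap Verde (Cap Verde = {10 / 12:.0%} of it)")
```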