Wii U hardware discussion and investigation *rename

Status
Not open for further replies.
Can anyone find a HQ die shot of a 40nm, DP-less GPU like Redwood? That might help fill in the picture of possible densities.

Edit: I can't find one of a 55nm RV730 or RV710 either.
 
So you suspect DP was removed from wuugpu? If that is so, that would harm the GPGPU capabilities of the chip - which is a flag I'm not sure Nintendo ever really carried in public at least, but many fans certainly did.

Of course, AMD's VLIW architecture wasn't ever any damn good really for GPGPU apps, but no DP would just be another aspect where wuu continues to underwhelm.
 
I've nothing to add to this debate, but just thought I'd correct you there. Iwata introduced it as a GPGPU in the hardware Nintendo Direct and mentioned it quite a few times ;) Which is why everyone went apesh*t on it!
 
So you suspect DP was removed from wuugpu? If that is so, that would harm the GPGPU capabilities of the chip - which is a flag I'm not sure Nintendo ever really carried in public at least, but many fans certainly did.

Of course, AMD's VLIW architecture wasn't ever any damn good really for GPGPU apps, but no DP would just be another aspect where wuu continues to underwhelm.

I'm just spitballing for ways in which Latte might have higher shader density than Brazos and ... erm ... Llano.

Loss of DP might not eliminate usefulness for GPGPU entirely - Xenos doesn't have DP capability afaik but it's been used for GPU compute.
 
Cell's SPE DP is much slower than SP too, and the SIMD on Cell PPE and Xenon are SP only. I don't think DP is really important for most gaming tasks. With FMA you can more or less emulate the same precision of DP FMUL/FADD using SP operations (not the same dynamic range, but that's even less important), at a cost that isn't that much worse than what you get on ALUs that weren't optimized for DP throughput.
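A minimal Python sketch of the FMA trick mentioned above, usually called "double-single" or "float-float" arithmetic. It covers only the multiply step: an error-free product split into a float32 head and tail. The `f32` helper, the input values, and the use of binary64 arithmetic to stand in for a hardware single-precision FMA are all assumptions for illustration. A DP-like FADD would also need an error-free addition (TwoSum), and the exponent range stays that of float32, as the post says.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (IEEE binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_prod(a: float, b: float) -> tuple[float, float]:
    """Split the product of two float32 values into hi + lo with no error.

    On a GPU with single-precision FMA this would be:
        hi = a * b            (rounded to float32)
        lo = fma(a, b, -hi)   (the exact rounding error)
    Here binary64 arithmetic stands in for the FMA: two 24-bit significands
    multiply to at most 48 bits, so a * b is exact in binary64, and so is
    a * b - hi (assuming no overflow or underflow).
    """
    hi = f32(a * b)
    lo = f32(a * b - hi)
    return hi, lo


a = f32(1 / 3)
b = f32(3.1)
hi, lo = two_prod(a, b)
print(f"hi = {hi!r}, lo = {lo!r}")
print(hi + lo == a * b)   # True: the pair holds the exact 48-bit product
```

The (hi, lo) pair carries about 48 significant bits, close to DP's 53, which is the sense in which SP ALUs with FMA can approximate DP precision.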
 
http://i.imgur.com/mrWZj77.jpg
Scaled to an assumed die size of 8.3mm x 9mm, i.e. about 75mm^2 (some sites say 80mm^2; if that is the case, my Brazos measurements would be about 7% lower than they should be).
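Since every area in this method is a pixel fraction of the whole die multiplied by the assumed die area, the die-size assumption scales all results by the same factor. A small sketch; `px_to_mm2` is a hypothetical helper and the pixel counts are placeholders, since only the ratio matters here:

```python
def px_to_mm2(block_px: float, image_px: float, die_mm2: float) -> float:
    """Area of a block measured in pixels, assuming the image spans the whole die."""
    return block_px / image_px * die_mm2


assumed_die = 8.3 * 9.0   # ~74.7 mm^2, treated as "75 mm^2" in the post
alt_die = 80.0            # the figure some sites quote

# Placeholder pixel counts; the ratio below does not depend on them.
a75 = px_to_mm2(1000, 1_000_000, 75.0)
a80 = px_to_mm2(1000, 1_000_000, alt_die)

print(f"assumed die ~{assumed_die:.1f} mm^2")
print(f"80 mm^2 figures are {100 * (a80 / a75 - 1):.1f}% larger")
print(f"75 mm^2 figures are {100 * (1 - a75 / a80):.2f}% smaller")
```

So the ~7% above is the right size: the 75mm^2 figures are 6.25% below the 80mm^2 ones, or the 80mm^2 figures are about 6.7% above.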

gNkktjV.jpg

Wii U's SIMD blocks are larger, but not 2x as large; it looks like the caches occupy more area. If both are on TSMC's 40nm process, as believed, it looks like the Wii U GPU has more cache.

0tak1ZS.jpg

The cache blocks are these sizes. Using them, the total cache area inside one Brazos SIMD is 2656px; for the 2 Wii U blocks it is 3648px, so the cache area on die is 37% larger. In comparison, the whole SIMD block is 50% larger when comparing one Brazos SIMD to one Wii U SIMD.
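The percentage above is a plain ratio of pixel areas, so it does not depend on the 75 vs 80mm^2 die assumption, provided both crops share the same mm-per-pixel scale (an assumption the comparison relies on). A quick check using the post's pixel counts; `percent_larger` is a hypothetical helper:

```python
BRAZOS_SIMD_CACHE_PX = 2656   # total cache area inside one Brazos SIMD (post's figure)
WIIU_CACHE_PX = 3648          # the post's figure for the 2 Wii U cache blocks


def percent_larger(new: float, old: float) -> float:
    """How much larger `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


cache_growth = percent_larger(WIIU_CACHE_PX, BRAZOS_SIMD_CACHE_PX)
print(f"cache area: {cache_growth:.1f}% larger")   # cache area: 37.3% larger
```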
 
So you suspect DP was removed from wuugpu? If that is so, that would harm the GPGPU capabilities of the chip - which is a flag I'm not sure Nintendo ever really carried in public at least, but many fans certainly did.

Of course, AMD's VLIW architecture wasn't ever any damn good really for GPGPU apps, but no DP would just be another aspect where wuu continues to underwhelm.
Maybe they added DP to Wii U, and that is why it's larger than Brazos.
 
If it's Renesas's 40nm and not TSMC's, it may just not be as dense. TSMC's 40nm is very dense. Renesas offers a lot of the IP for modules this GPU would have (but TSMC probably does as well). Chipworks has looked at plenty of Renesas chips, so they'd probably have recognized it, but they didn't give insights in any official capacity, so the commentators aren't necessarily their best representatives.
 
Hmm, I thought Chipworks said it was TSMC, which is what the NeoGAF thread says on its first page.

The die is exactly 11.88 x 12.33mm (146.48mm²). It's manufactured at 40nm, apparently on an "advanced CMOS process at TSMC".
 
Hey folks.

Just a correction: it is not confirmed to be TSMC 40nm. Jim from Chipworks contacted me afterwards and said that it was just his guys' assessment. It could very well be Renesas' in-house fab. Of course, that means it could also be 55nm, but I still think the eDRAM density makes a good case against that.

Cheers!
 
How large is the large eDRAM pool in mm^2?

How do I edit/delete my post? That last post was pointless.

Anyways, I got my answer for the eDRAM die area: ~40mm^2 for 32MB of eDRAM, which means 6.4Mb per mm^2. That is pretty much on point with 55nm density, not 40nm, if that is the case. IBM's eDRAM at 45nm is 11Mb/mm^2.
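The density figure is just capacity in megabits over area. A sketch assuming binary prefixes throughout (1MB = 8Mbit); the second call uses the 38.68mm^2 whole-block measurement posted further down the thread:

```python
def mbit_per_mm2(capacity_mbyte: float, area_mm2: float) -> float:
    """Density in Mbit/mm^2; 1 MB = 8 Mbit (binary prefixes on both sides)."""
    return capacity_mbyte * 8 / area_mm2


print(mbit_per_mm2(32, 40.0))    # 6.4  (the ~40 mm^2 estimate)
print(mbit_per_mm2(32, 38.68))   # whole EDRAM block as measured later in the thread
```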
 
How large is the large eDRAM pool in mm^2?
The whole EDRAM block takes 38.68mm².

The memory banks alone (incl. sense amps) occupy 31.20mm², which leaves 7.47mm² of logic overhead (I/O buffers, decoders, wiring spacing, etc.), almost 20%.
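The overhead share is the difference of the two areas over the whole block. With the rounded figures as posted, the difference comes to 7.48mm², within rounding of the 7.47mm² above (presumably computed from unrounded measurements):

```python
TOTAL_MM2 = 38.68   # whole EDRAM block
BANKS_MM2 = 31.20   # memory banks incl. sense amps

overhead_mm2 = TOTAL_MM2 - BANKS_MM2
overhead_pct = 100 * overhead_mm2 / TOTAL_MM2
print(f"overhead: {overhead_mm2:.2f} mm^2 ({overhead_pct:.1f}% of the block)")
```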
 
A) The 11Mb/mm^2 figure is for 32nm.
B) eDRAM density can be different between foundries.
C) TSMC's 40nm eDRAM 1Mbit array figure seems to fit rather well (0.145mm^2/Mbit)
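Inverting point C's area per Mbit puts it in the same units as the 6.4Mb/mm^2 estimate. This is a sketch of the unit conversion only; the two figures are not strictly like-for-like, since the macro area covers one 1Mbit array while the 6.4 figure spreads the whole block's I/O and decoders over the capacity:

```python
TSMC_40NM_MM2_PER_MBIT = 0.145            # 1 Mbit macro figure from point C
WIIU_BLOCK_MBIT_PER_MM2 = 32 * 8 / 40.0   # 6.4, whole block incl. I/O and decoders

tsmc_mbit_per_mm2 = 1 / TSMC_40NM_MM2_PER_MBIT
print(f"TSMC 40nm macro: {tsmc_mbit_per_mm2:.2f} Mbit/mm^2")          # 6.90
print(f"Wii U whole block: {WIIU_BLOCK_MBIT_PER_MM2:.2f} Mbit/mm^2")  # 6.40
```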
Well, if it is not made by TSMC as suggested, 55nm does answer all the questions about the block areas. Sorry for misquoting 45nm; I just skimmed the article. I know density can differ between foundries, but I don't know by how much.

Does anyone have insight into whether the chip could be 55nm at Renesas?
 
I still think the eDRAM density makes a good case against that.

At least if we just look at the array blocks (none of the overhead circuitry or massive spacing between the arrays, just one group of yellow rectangles), the 32MB partition is composed of 16*8*256kB arrays, and 1Mbit is about 0.102mm^2 (pixel area/image area * 146.48mm^2). The smaller partition above is 2MB, yes? So, 16*128kB arrays, about 0.126mm^2 per Mbit.

Anyways, the density is in that sort of ballpark, of course lower if you just take the whole partition area including the I/O & misc circuitry.
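A quick check of the array arithmetic above, assuming binary kB/MB. Inverting the per-Mbit areas gives Mbit/mm^2, comparable with the 6.4Mb/mm^2 whole-block figure earlier in the thread (array-only numbers naturally come out denser):

```python
KB = 1024
MB = 1024 * KB

big_pool_bytes = 16 * 8 * 256 * KB   # 128 arrays of 256 kB
small_pool_bytes = 16 * 128 * KB     # 16 arrays of 128 kB
assert big_pool_bytes == 32 * MB
assert small_pool_bytes == 2 * MB

# Array-only areas per Mbit from the post, inverted to Mbit/mm^2.
densities = {
    name: 1 / mm2_per_mbit
    for name, mm2_per_mbit in (("32 MB pool", 0.102), ("2 MB pool", 0.126))
}
for name, density in densities.items():
    print(f"{name}: {density:.2f} Mbit/mm^2")
```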

---

Haven't checked Renesas eDRAM densities lately. Someone?


---

Still bizarre that the Brazos shader block is ~0.90mm^2 while the one on Wii U is ~1.46mm^2. :s The latter is closer to the 1.62mm^2 blocks in RV770.

:???:
 
How do I edit/delete my post? That last post was pointless.

Anyways I got my answer for the eDRAM die area. ~40mm^2 for 32MB of eDRAM, which means 6.4Mb per mm^2. Pretty much on point with 55nm density and not 40nm if that is the case. IBM's eDRAM at 45nm is 11Mb/mm^2.

Fer Chrissakes, when you post something that contradicts what has already been posted, it makes sense to read the source first. "The overall density for the 32nm eDRAM arrays was not disclosed but should be >11Mbit/mm2 density, based on a previous paper at VLSI Symposium".

Note: 32nm. (Also "was not disclosed" and "should be", which IMO is dubious, since the cell sizes come from IBM themselves and show a more reasonable scaling. Quoting the same article: "The eDRAM cells shrank from 0.0672um2 in 45nm down to 0.0394um2." So why 32nm should have almost three times the density of the 45nm node, when the 45nm cell is only 0.0672/0.0394 ≈ 1.7 times larger, is a bit of a mystery. I'd prefer to hear from IBM themselves, and for shipping silicon.)
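The cell-size argument as arithmetic: if array density tracked cell area alone (an assumption, since periphery and array efficiency also change between nodes), 32nm would be about 1.7x as dense as 45nm, not ~3x:

```python
CELL_45NM_UM2 = 0.0672   # IBM 45nm eDRAM cell, as quoted
CELL_32NM_UM2 = 0.0394   # IBM 32nm eDRAM cell, as quoted

cell_ratio = CELL_45NM_UM2 / CELL_32NM_UM2
print(f"45nm cell is {cell_ratio:.3f}x the 32nm cell")   # 1.706x
```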
 
At least if we just look at the array blocks (none of the overhead circuitry or massive spacing between the arrays, just one group of yellow rectangles), the 32MB partition is composed of 16*8*256kB arrays, and 1Mbit is about 0.102mm^2 (pixel area/image area * 146.48mm^2). The smaller partition above is 2MB, yes? So, 16*128kB arrays, about 0.126mm^2 per Mbit.

Anyways, the density is in that sort of ballpark, of course lower if you just take the whole partition area including the I/O & misc circuitry.

---

Haven't checked Renesas eDRAM densities lately. Someone?


---

Still bizarre with the Brazos shader block being ~0.90mm^2 and the one on WiiU is ~1.46mm^2. :s The latter is closer to the 1.62mm^2 blocks in rv770.

:???:

Oh boy...that should teach me to just parrot others' assessments without checking myself. That area for eDRAM density seems to actually be much more in line w/ Renesas' 55nm process. Cell size is apparently .12um^2 (40nm is .06um^2).

If this is true, I do not want to be the one to relay the implications!
 
My measurements tell me it's more like 1.89mm². :???:

hm... 4 shader blocks, 20 each, hence the 80 for bobcat. Each block is ~164x274 pixels out of the 2051x1821 image, and the die area is ~75mm^2, which gives me 0.90mm^2.
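The same pixel-fraction method, applied to the numbers in this reply; `px_to_mm2` is a hypothetical helper and assumes the 2051x1821 image covers exactly the whole ~75mm^2 die:

```python
def px_to_mm2(block_px: float, image_px: float, die_mm2: float) -> float:
    """Pixel area -> mm^2, assuming the image covers exactly the whole die."""
    return block_px / image_px * die_mm2


block_mm2 = px_to_mm2(164 * 274, 2051 * 1821, 75.0)
print(f"one shader block: {block_mm2:.2f} mm^2")   # one shader block: 0.90 mm^2
print(f"ALUs per block: {80 // 4}")                  # ALUs per block: 20
```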

no? :oops::(

Cell size is apparently .12um^2 (40nm is .06um^2).

No array density? There's still overhead in between when constructing the arrays. For example, the TSMC 40nm eDRAM cell size is ~0.0583um^2 (in the same ballpark as your 40nm figure), whereas the 1Mb macro arrays in the literature are 0.145mm^2. That's nearly 2.4x bloat in this case (you'd expect something closer to 0.0611mm^2 for 1024x1024 bits), though I've seen as low as 2x depending on who's manufacturing, the performance targets, etc.
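The bloat figure is the published macro area divided by the area the bare cells alone would take. A sketch treating 1Mb as a 1024x1024 array with zero overhead, as the post does:

```python
CELL_UM2 = 0.0583        # TSMC 40nm eDRAM cell
MACRO_MM2 = 0.145        # published 1 Mbit macro area
BITS = 1024 * 1024       # 1 Mbit as a 1024 x 1024 array

ideal_mm2 = CELL_UM2 * BITS / 1e6   # um^2 -> mm^2: cells only, no periphery
bloat = MACRO_MM2 / ideal_mm2
print(f"cells only: {ideal_mm2:.4f} mm^2, macro/cells: {bloat:.2f}x")
```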

The array sizes I calculated above were for a group of yellow tiles, not a single one, btw, so there's still spacing between those tiles. Hope that makes sense. :oops:
 