Wii U hardware discussion and investigation *rename

Status
Not open for further replies.
In this RV770 die shot a row of 80 shaders (arranged to the right of the TMUs in a 4 x 20 fashion) takes up ~13 mm^2:

http://techreport.com/r.x/radeon-hd-4870/die-shot.jpg
(Got the link from AlStrong's post on GAF).

In the Wii U die shot a similar row of four physical blocks of shaders takes up ~6 mm^2.

RV770 was on 55nm, and Wii U is almost certainly on 40nm (going by the eDRAM). This appears to show perfect (or perhaps slightly better than perfect) scaling from 55nm to 40nm. I *think* this means that it is safe to say that the Wii U has 2 rows of 4 x 20 shaders.

So, in summary, I think:
- 40 nm
- 32 MB edram
- 16 TMUs
- 160 shaders
- 8 ROPs
You should compare it with more recent versions of AMD GPUs. They changed the layout quite a bit (there are now only two groups per full size SIMD, each containing 40 [VLIW5] or 32 SPs [VLIW4]) as AlStrong already mentioned (but he deleted his post afterwards for whatever reason [edit]Ah, I just see he posted an expanded version[/edit]). Just compare with Llano and Trinity. Or with the Brazos die shot (40nm!). The Wii U resembles the newer layouts much more than the old 55nm RV770. Both visual comparison with newer versions and scaling the RV770 size give the same result. Full size SIMDs are very likely (i.e. 320 SPs in total).

If you don't think that fits, you have to realize that the SPs are actually quite small compared to the total size of the chip. As mentioned, the SP portion of a half size SIMD (40 SPs) of Brazos measures a measly ~1.8mm² or something. In the Wii U GPU it's ~3mm² for a full SIMD (or 1.5mm² for a half). And the SP portion of a full size SIMD in 55nm measured about 6.4 mm² (RV770 actually supports DP, which the Wii U certainly lacks). A full node shrink (as from 55 to 40 nm) should bring this down to 3.x mm². Combine this with the very low clock target (enables a slightly denser layout) and one is perfectly fine.
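
For anyone who wants to check the "3.x mm²" shrink figure, here is a rough sketch of the arithmetic. It assumes an ideal shrink where area scales with the square of the node ratio, which real processes only approximate (node names are not literal feature sizes). The function name `ideal_shrink_area` is made up for illustration; the 6.4 mm² input is the 55nm figure quoted above.

```python
def ideal_shrink_area(area_mm2, old_nm, new_nm):
    """Area after an idealized shrink: scales with the square of the node ratio."""
    return area_mm2 * (new_nm / old_nm) ** 2


# SP portion of a full size SIMD on 55nm RV770, as quoted in the post (mm^2).
rv770_simd_sp_area = 6.4

# Idealized 40nm equivalent; an assumption, not a measured value.
ideal_40nm = ideal_shrink_area(rv770_simd_sp_area, 55, 40)

print(f"ideal 40nm area: {ideal_40nm:.2f} mm^2")
```

That comes out at roughly 3.4 mm², so the ~3 mm² measured for a full SIMD on the Wii U die is in the same ballpark and a bit denser than an ideal shrink would give.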
 
You should compare it with more recent versions of AMD GPUs. They changed the layout quite a bit (there are now only two groups per SIMD, each containing 40 [VLIW5] or 32 SPs [VLIW4]) as AlStrong already mentioned (but he deleted his post afterwards for whatever reason). Just compare with Llano and Trinity.

Went to pull some die shot pixel numbers. :p But yes, the new blocks in Llano and Trinity are much more packed than rv770 shader blocks. :)
 
hm... I'm getting ~1.6-1.7mm^2 per shader block on rv770... and 1.48mm^2 in WiiU. I think you're off by 2x...

*rv770 die shot is 600x589 pixels, shader block is ~ 62x38 -> 1.7mm^2/256mm^2

WiiU GPU is 3000x3098, shader block is 290x320 -> 1.49mm^2/150mm^2

It seems like it goes beyond ideal scaling, but they can probably get a bit denser with 40 shaders per block (not unlike Llano).
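
The pixel measurements above can be turned into areas with a simple ratio: block pixels over die-shot pixels, times the die area. A minimal sketch, assuming the die shot covers exactly the whole die and that the 256 mm² and 150 mm² die sizes used above are right (the helper name `block_area_mm2` is just for illustration):

```python
def block_area_mm2(block_px, die_px, die_area_mm2):
    """Estimate a block's area from its share of the die-shot pixels."""
    block_w, block_h = block_px
    die_w, die_h = die_px
    return (block_w * block_h) / (die_w * die_h) * die_area_mm2


# Pixel sizes and die areas as given in the post; all are eyeballed estimates.
rv770_block = block_area_mm2((62, 38), (600, 589), 256.0)
wiiu_block = block_area_mm2((290, 320), (3000, 3098), 150.0)

print(f"RV770 shader block: {rv770_block:.2f} mm^2")
print(f"Wii U shader block: {wiiu_block:.2f} mm^2")
```

On the low-resolution RV770 shot a one-pixel error on a 62x38 block already shifts the result by a few percent, so these numbers are only good to about one decimal place.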

I think you might be right, I measured 8 RV770 shader blocks for greater accuracy then divided by ... four .... :(

So my RV770 figure is indeed wrong by 2x.
 
You should compare it with more recent versions of AMD GPUs. They changed the layout quite a bit (there are now only two groups per full size SIMD, each containing 40 [VLIW5] or 32 SPs [VLIW4]) as AlStrong already mentioned (but he deleted his post afterwards for whatever reason [edit]Ah, I just see he posted an expanded version[/edit]). Just compare with Llano and Trinity. Or with the Brazos die shot (40nm!). The Wii U resembles the newer layouts much more than the old 55nm RV770. Both visual comparison with newer versions and scaling the RV770 size give the same result. Full size SIMDs are very likely (i.e. 320 SPs in total).

If you don't think that fits, you have to realize that the SPs are actually quite small compared to the total size of the chip. As mentioned, the SP portion of a half size SIMD (40 SPs) of Brazos measures a measly ~1.8mm² or something. In the Wii U GPU it's ~3mm² for a full SIMD (or 1.5mm² for a half). And the SP portion of a full size SIMD in 55nm measured about 6.4 mm² (RV770 actually supports DP, which the Wii U certainly lacks). A full node shrink (as from 55 to 40 nm) should bring this down to 3.x mm². Combine this with the very low clock target (enables a slightly denser layout) and one is perfectly fine.

Thanks, I didn't think to look at Brazos. I just went to the most similar architecture to the rumoured Wii U chip, which looks like it was a mistake.
 
http://i.imgur.com/OoCdhKS.jpg

All the repeating macro-structures I've spotted.

B-group is obviously the SIMD multiprocessor array. The F-group is most probably the two ROP partitions (2x4 operators).

Stuff below E looks like two instantiations of some structure as well (mirrored).

It's fun watching the NeoGAF thread.. a lot of people seem to be clinging to this idea that there's all this fixed function graphics magic to make up for the relatively small amount of die area that's dedicated to shaders. What they're forgetting is that this isn't just a GPU, but it's an entire SoC sans CPU. So for instance it needs various peripheral interfaces (SD, USB, NAND) and may include fixed video decoders as well.
 
I'd just like to say "well done" Fourth Storm. You have done most excellently in pursuing this! I don't post at NeoGaf but I've been lurking in the Wii U thread and I think you deserve a round of handshakes and cheers for doing a service to gaming. :)

Most glad to be of service. Chipworks are the real heroes here, however, as they've provided a $2500 photo free of charge. I've taken a borderline unhealthy interest in finding out what's under the Wii U's hood, even knowing quite well I may not like what I find. This is certainly an excellent payoff.

http://i.imgur.com/OoCdhKS.jpg

All the repeating macro-structures I've spotted.

B-group is obviously the SIMD multiprocessor array. The F-group is most probably the two ROP partitions (2x4 operators).

Take a look at the block directly to the left of your first "A." It appears that it might be another TMU - only slightly shifted.
 
Yes, thanks Fourth Storm (and the others). ;)

I think you might be right, I measured 8 RV770 shader blocks for greater accuracy then divided by ... four .... :(

So my RV770 figure is indeed wrong by 2x.

Just as long as it makes sense. :)
 
Nope, it's a single block with mirrored SRAM bank layout inside.

How can you tell the difference between those two things? Is there really one?

Maybe I'm too color blind to tell but I can't get anything out of the areas between the SRAM arrays..
 
... and may include fixed video decoders as well.

It most certainly does. In addition, the Iwata Asks where they talked about the gamepad mentions that the GPU actually includes encoders that compress the video for streaming over the miranet, for reduced latency.
 
Stuff below E looks like two instantiations of some structure as well (mirrored).

It's fun watching the NeoGAF thread.. a lot of people seem to be clinging to this idea that there's all this fixed function graphics magic to make up for the relatively small amount of die area that's dedicated to shaders. What they're forgetting is that this isn't just a GPU, but it's an entire SoC sans CPU. So for instance it needs various peripheral interfaces (SD, USB, NAND) and may include fixed video decoders as well.

A lot of people? I've seen a lot of people ridiculing these multitudes of people "hoping for magic Nintendo special sauce" etc....but not seen many actually claiming that. A lot of solid discussion happening actually :)
 
wiiugpu_chipworksinfo.jpg


Thanks to Randy from Chipworks
 
Take a look at the block directly to the left of your first "A." It appears that it might be another TMU - only slightly shifted.
Thanks!

Updated: http://i.imgur.com/DFr0AUM.jpg
Exophase said:
How can you tell the difference between those two things? Is there really one?

Maybe I'm too color blind to tell but I can't get anything out of the areas between the SRAM arrays..
Mostly SRAM bank layout - count, structure, relative positions, pattern matching, etc.
 
At the risk of going insane and having my eyes fall out, Llano has 64 of what I think are sram "cells" in each shader block. The Wii U seems to have 32.

I have no idea what this means.
 
A lot of people? I've seen a lot of people ridiculing these multitudes of people "hoping for magic Nintendo special sauce" etc....but not seen many actually claiming that. A lot of solid discussion happening actually :)

This is the first time I've even read the forum so my opinions are all formed from today alone.. take that as you will. I won't bother to post it, but I can count about 11 posts in the die shot thread alone supporting a GPU advantage due to fixed function hardware (and mainly if not entirely from different posters). I'm not saying a majority of the forum is doing it but it's a lot of people.

If we're going to be talking about this potential I feel it's worth asking - what added fixed function hardware can help a remotely modern GPU design? I figure anything in the critical path for shaders is going to be a problem to go out of the shader array for, or will need additional hardware/software solutions to decouple it like TMUs are.

fellix said:
Mostly SRAM bank layout - count, structure, relative positions, pattern matching, etc.

Okay, so if you're only using SRAM like I am are you saying that it doesn't count as a copy of a block if one is mirrored vs the other? Because I've seen several die shots with complex blocks that are clearly duplicated (for instance, multiple CPU or GPU cores) but mirrored relative to each other.
 
So the new Chipworks annotations say that the big eDRAM array is slower and less dense.. is that really how this works? I figured you'd be trading density for speed, not losing both. Why use the less dense version then? Lower power consumption?

If the top eDRAM is used primarily for Wii BC it shouldn't need to be nearly as fast. I wonder if the annotation accidentally switched this.
 
At the risk of going insane and having my eyes fall out, Llano has 64 of what I think are sram "cells" in each shader block. The Wii U seems to have 32.

I have no idea what this means.
Llano's IGP "splices" two 4-way SIMDs into one 8-way block, as does Trinity. That's why the register bank count is doubled.
 