Wii U hardware discussion and investigation *rename

I think I can speak for everyone here on this: we all want much deeper insight into the inner workings of the AMD GPGPU, obviously. Insight that is certainly not going to come from Nintendo.
 
Many reasons, I think. Perhaps they're holding back on their killer 1st/2nd party titles so that they have some ammo to face the coming Xbox3/PS4 onslaught.

I don't really buy this, but there is an element of truth insofar as the second Xmas, when they are not supply-constrained, will be more important to them. They ought to sell out of pretty much anything they make through the launch Xmas.

Having said that, if MS/Sony are launching in 2013 with much better hardware, Nintendo isn't going to win that comparison on graphics, so I'd imagine the bulk of what they'll hold back for that window is the 1st party franchises.
 
People on GAF are saying that the Wii U's power draw is rated as 75W peak, 45W typical. If that's the case, the GPU can't be that powerful.

If the GPU gets half "the power" and we're looking at a 40nm part, then it's probably around a 300 gigaflop GPU. Typical ATI 40nm GPUs had GFLOPS/W ratios ranging from ~12 (HD 4770) to ~16 (HD 5970).
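A quick back-of-the-envelope sketch of that math (the 50/50 GPU power split and the 12~16 GFLOPS/W range are the assumptions above, not confirmed specs):

```python
# Rough GPU FLOPS estimate from the rumored power envelope.
# Assumptions (not confirmed): 45 W typical system draw, half of it
# going to the GPU, 12-16 GFLOPS/W for 40nm ATI parts.
typical_system_w = 45.0
gpu_w = typical_system_w * 0.5                 # ~22.5 W for the GPU

for gflops_per_w in (12.0, 16.0):              # HD 4770 .. HD 5970 efficiency range
    print(f"{gflops_per_w:.0f} GFLOPS/W -> ~{gpu_w * gflops_per_w:.0f} GFLOPS")
# prints ~270 and ~360 GFLOPS, i.e. "around 300 gigaflops"
```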

Unless this thing got upgraded to a 28nm GPU, at least in terms of raw power it's not a whole lot more powerful than current-gen consoles.

I'm sure thanks to more modern architectures in both GPU and CPU, they're getting much more efficient performance, but there's only so much you can do in a "typical 45W" power envelope.

Yep. Actually we discussed it on B3D, and the Wii U power supply specs were found way back in June by a GAF member who attended a Wii U event. It's a 75 watt max PSU. http://www.neogaf.com/forum/showpost.php?p=39306355&postcount=4531

So that means the "real" max of the console is going to be that 45 watts. Just like other consoles, actual draw falls well short of the PSU specs.

At the end of the day it's that spec that really crimps the power potential.

The E6760 that bg had thought was the GPU only draws 35 watts TDP (480 shaders). That said, it's a straight DX11 GPU, so I don't think that's it. I'd still put money on a downclocked RV730, which draws 70 watts TDP for the higher-end model (HD 4670) and 55 watts for the lower-end HD 4650 (320 shaders). Cut the clocks of that lower-end part as needed? Another possible hint in that direction: the E6760 takes GDDR5, so Nintendo would have had to have AMD rework the bus for DDR3, which is possible but would take $. The RV730 already works with DDR3. It seems the E6760 may have been in one version of the dev kit, begging the question why? But perhaps they were getting their SoC featuring the final RV730+CPU on one chip ready, or something like that.

The 4650 is clocked at 600MHz, so that would be your starting point for downclocking. I could guess perhaps 400MHz? On the negative side, it's not an enormously more grunty GPU than Xenos/RSX at that point, which, depending on your view of Wii U, could be positive evidence.
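For scale, rough peak numbers under that guess (peak here just means shaders × 2 FLOPs per clock; the 400MHz clock is purely the guess above):

```python
# Peak MADD throughput for a hypothetical downclocked RV730 vs. Xenos.
def peak_gflops(shaders, clock_ghz):
    return shaders * 2 * clock_ghz             # 1 MADD = 2 FLOPs per ALU per clock

print(peak_gflops(320, 0.600))   # stock HD 4650:        384 GFLOPS
print(peak_gflops(320, 0.400))   # guessed 400 MHz part: 256 GFLOPS
print(peak_gflops(240, 0.500))   # Xenos for comparison: 240 GFLOPS
```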

Overall I have to say the Wii U is coming out looking a bit gruntier than I thought. I think the alleged weak CPU can be worked around. The revelation of 2GB of RAM was obviously a big plus (if we assume Nintendo will eventually be able to use more of it for games). 32MB of eDRAM is obviously a hefty amount, well above Xbox 360. IF, huge if, the GPU actually comes in at 500+ GFLOPS, it would almost be that nice mid-gen machine I have so doubted, minus perhaps the unfortunate CPU rumors.

All that will still get it nowhere imo LOL (the same as I think a Wii twice as powerful as it actually was would have made no material difference, as it still would have lagged the HD twins hugely), and I'd rather the whole screen-controller thing just didn't exist, but I digress.
 
The thing is, even with this speculation and maybe a halfway decent GPU, the Wii U is an utter disappointment as far as hardware is concerned. The fact that first-gen games are not wiping the floor with PS360 is pathetic, sad, dismal, abysmal, and atrocious. Nintendo clearly does not mean to compete on a generational level. It's just sad in my eyes, because I see Nintendo's relevance continually waning year after year, even if the Wii was a huge, albeit temporary, spike a few years ago. To purposefully not release a powerful next-gen console (at least a step up like the DC was compared to the PS1/Saturn/N64) is shameful, and they deserve to get their butts handed to them in the marketplace. And I don't buy the tablet excuse either; if that thing is the reason the machine has so little oomph behind it, then Nintendo failed in getting me to buy their machine.

Still need to know more about the CPU. What exactly is its architecture, and what is it based on? Is it really Power7, or is it really an enhanced Broadway? I've always felt IBM's PR with the whole "same technology in Watson" line is ultimately referring to the eDRAM in Power7 and not the core itself. Would a cut-down tri-core Power7 be weaker than Xenon or CELL (CELL obviously because of the SPEs) in many situations? Would it not be faster in most situations? And the need to offload some tasks to the GPU is also somewhat alarming to me, because it means the CPU is a little weak for some tasks, or maybe the GPU is just better suited to them. I still expect PS4 and X720 CPUs to assist in some graphical tasks just because there will still be ways to make them useful.

32 MB of eDRAM is splendid though, and should give Wii U a definite advantage in many tasks compared to PS360. But 1 GB of RAM dedicated to the OS is entirely wasteful. I can understand the rumors of MS's 8 GB monster having maybe 2-3 gigs dedicated to the OS and the "application" processor/cores, with another ~6 GB to spare. But half of total memory dedicated to the OS? No thanks.
 
Given the 32MB of eDRAM is supposed to be from IBM, and given the tiny size of the Broadway cores, could we be looking at a single integrated chip with the CPU on the GPU? A discrete package for the CPU sounds fairly redundant to me.

I've been banging this drum since the IBM press release last E3, shouting into the wind about IBM's wording, the history of the players involved, the lack of GPU fab being touted while IBM were chest beating, the performance of the platform, the power envelope, the ... 4cm fan. Etc, etc.

160~240 shaders at 500~700 MHz on a SoC. It could still happen. It. Could. Still. It's the dream. Embrace everything that you are, Nintendo.

And a handful of us merrie men called it on the power consumption as soon as we saw the fan. Never deny the truth presented by the gloriously small Nintendo case fan.
 
160-240 shaders at 500~700MHz would most likely be unable to produce even XB360-level detail while also rendering the 480p screen; it's higher than that for sure, especially taking into account the reports of several multiplatform titles actually being 1080p + 480p on Wii U.
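Just to put raw numbers on that range (peak MADD FLOPS only, ignoring the cost of the second 480p screen; Xenos' ~240 GFLOPS shown for reference):

```python
# Peak FLOPS for the quoted 160-240 shader / 500-700 MHz range vs. Xenos.
xenos = 240 * 2 * 0.5                                   # ~240 GFLOPS
for shaders in (160, 240):
    for mhz in (500, 700):
        gf = shaders * 2 * mhz / 1000
        print(f"{shaders} SPs @ {mhz} MHz -> {gf:.0f} GFLOPS (Xenos ~{xenos:.0f})")
# low end (160 @ 500) lands below Xenos, high end (240 @ 700) above it
```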
 
Going somewhat off-topic:

I did want to say this. No matter what the U's final specs are, I will get one for myself. I think the features of the OS and the controller are enough to warrant a purchase within the next 6 months. I'm intrigued by the gameplay possibilities, and I want one *just* so I can play Tank! Tank! Tank!, a follow-up to one of my favorite games of all time, Tokyo Wars. I can look beyond the specs. The amount of RAM alone means the OS could be extremely interesting, to me anyway. I've always been someone to enjoy machines other than PCs for web browsing. I bought a Dreamcast well before the U.S. launch on 9.99.99 (although well after the Japanese launch in late 98).

I took another look at Wii U's Metroid Blast, part of Nintendo Land, and boy, it does look like a lot of fun. Key word there, fun. With today's many revelations, I've nearly forgotten about the E3 2011 real-time demos. They almost seem like a distant memory. Don't they? I can only hope & pray that Nintendo and AMD haven't gutted the U's visual performance down from what we saw in the Bird & Zelda demos. That would be completely outrageous if they have...


Alright, now back on-topic:

Anyone care to guess how many render back-ends/ROPs the U's GPU has: 4? 8? Also, can someone tell me how many transistors 32MB of eDRAM takes up?
 
It's most likely 8 back ends, since the popular assumption is that the GPU is some form of RV730.
 
Also, can someone tell me how many transistors 32MB of eDRAM takes up?
Since DRAM uses one transistor per bit, that'd be 8 × 32M ≈ 256M. However, there's undoubtedly some overhead there: buffers and various other kinds of logic associated with the DRAM arrays raise that sum somewhat. That part should be fairly inconsequential on the whole compared to the eDRAM cells themselves.
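In plain numbers (the overhead percentage is a guess, just to show it doesn't change the order of magnitude):

```python
# One transistor per DRAM bit: 32 MB = 32 * 2**20 bytes * 8 bits.
cell_transistors = 32 * 2**20 * 8
print(f"cells alone: ~{cell_transistors / 1e6:.0f} M transistors")      # ~268 M (= 256Mi)
# Sense amps, decoders and other array logic add some overhead on top;
# assume ~10% here purely for illustration.
print(f"with ~10% overhead: ~{cell_transistors * 1.1 / 1e6:.0f} M")
```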
 
Ah thanks Grall!
 
So can we assume the 32MB eDRAM's inclusion is for multi-monitor support more than anything? I wonder how fast the main memory is, too. It would be a relief to find out it's something like GDDR5, or at least high-speed DDR3 on a 128-bit bus. A 64- or 96-bit bus would make GDDR5 or XDR the only practical option, but I don't think you could have 2GB of RAM on a 96-bit bus without some weird module config.

Also, is the 32MB of eDRAM likely used to "emulate" the environment of the 24MB of 1T-SRAM and 3MB of eDRAM found in the GC and Wii?
 
I wonder about that; the last rumors stated MEM1 = 32MB, MEM2 = 1GB.
So it could very well be that the eDRAM is accessible to both the CPU and the GPU.

The problem is which kind of interface links the CPU, the GPU and the eDRAM. Depending on that, we could try to guesstimate how much bandwidth the GPU has to play with.
No matter the size of the eDRAM, the bandwidth could be less than in the 360, where the ROPs are tightly integrated into the eDRAM. It could explain why AA has not been spotted so far in pre-release shots.
If the GPU were connected to that much eDRAM through a wide IO, there would be no reason to pass on AA. If shading power is not a bottleneck, for a forward renderer 2xAA should be free at 1080p, and the same is true for 4xAA at 720p.

I'm close to betting that the link between the eDRAM/scratchpad memory and the CPU and GPU is not that fast. I don't know what an average figure is for a low-power crossbar, but that may give us a clue.

As a result I searched for existing data... Lazy me :LOL:
I found that the PLB6 bus links the PowerPC 476 core to the L2 and other things. The max speed of that bus is 800MHz and there can be up to 8 segments/nodes on the bus, so according to IBM's documents that's up to 204.8 GB/s of aggregate bandwidth (see page 11; so 25.6 GB/s per segment).
Honestly I don't get the vocabulary they use; if somebody could, for the sake of pulling me out of ignorance, explain what IBM means by masters, slaves and devices in that context, that would help :)

Till somebody explains it more precisely, I will make the following assumption: the three cores already take 3 segments, so we are left with 5 segments to play with and up to 128 GB/s of bandwidth.
The IBM docs also state this (too bad they don't let me copy and paste... so I have to type it):
"The PLB6 bus ordering mechanics allows easy attachment to high performance PCI type devices."
So I wonder if, from the GPU's point of view, there is no memory controller and VRAM; instead the GPU could see the eDRAM and access it through multiple "PCI Express-like" links (with way lower latencies) rather than through its own memory controller (which is tied to the ROPs in off-the-shelf RV7xx parts).

For the class of GPU Nintendo seems to be using, I would say in the Redwood/Turks ballpark, so I could see them using 8 ROPs. I think 3 links would be enough to make the most of such ROPs; that would be around 77 GB/s of bandwidth (3 × 25.6), and 4 would be for sure (if not overkill).

So from the bus's point of view, I wonder if things could look like this:
the bus is connected through six or seven points to 35MB of eDRAM.
One point connects to a 2MB slice (the main cores)
Two points to two 512KB slices
2, 3 or 4 points connected to 2, 3 or 4 slices of x MB each (could be 16/16, 12/12/8 or 4 × 8).

So from the point of view of the bus, the GPU has 3 or 4 points of connection.

-----------------

That's just speculation, but it gives an idea. A worst-case scenario would be 2 points, with the GPU having 51.2 GB/s of bandwidth to play with; the best-case scenario is 4 points and ~102 GB/s.
Even 51.2 GB/s would not be that bad; it all depends on the number of ROPs and how potent they are.
If the ROPs retain their caches, etc., they may do plenty well with 51.2 GB/s of bandwidth, but it would explain why AA is not there for now.
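Putting those scenarios side by side (the 25.6 GB/s per segment comes from the IBM PLB6 doc quoted above; the 2/3/4-link GPU attachment is pure speculation on my part):

```python
# PLB6: 800 MHz, 25.6 GB/s per segment, up to 8 segments.
per_segment = 25.6
print(f"aggregate (8 segments): {8 * per_segment:.1f} GB/s")     # 204.8 GB/s
for gpu_links in (2, 3, 4):                                      # speculative GPU attach points
    print(f"{gpu_links} links -> {gpu_links * per_segment:.1f} GB/s")
# 2 -> 51.2, 3 -> 76.8, 4 -> 102.4 GB/s for the GPU
```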

I could see a lot of the work done by AMD consisting in decoupling the ROPs from the memory controller (like in newer designs) and linking them to the PCI Express-like link interface(s), which would in fact connect to 2, 3 or 4 points on the bus.

It could be completely different from that and use a crossbar; I haven't searched for data on that. Insight welcome. It could also be something else altogether (if what I describe doesn't make sense, which is quite possible).
I think, though, that it gives a reference point for the kind of bandwidth a potent interconnect within a SoC can provide.

Feel free to criticize and speculate, people :)

-----------------------------------------

With regard to the main memory, I would bet on a single-channel (64-bit bus) connection to the DDR3, with two 4Gb memory chips linked to that bus. Bandwidth is constrained (you have to feed the CPUs and the textures through it), so Nintendo reserved as much as 1GB (so only one chip) for the OS, as it's unlikely to steal much bandwidth from the games.
Nintendo may have ended up with 1GB for the OS because it could have been the cheapest option: 4Gb memory chips are mass-produced and cheap now, and in the long run they might turn out cheaper than lower-grade memory chips. For the RAM I would bet on DDR3-1600.
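For reference, the peak figure that setup would give (assuming the single-channel DDR3-1600 guess above):

```python
# Peak bandwidth of a 64-bit, DDR3-1600 interface: transfers/s * bus width.
transfers_per_s = 1600e6        # DDR3-1600 = 1600 MT/s
bus_width_bytes = 64 // 8
print(transfers_per_s * bus_width_bytes / 1e9, "GB/s peak")   # 12.8 GB/s, shared by CPU and GPU texturing
```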

EDIT

I also wondered if the GX2 name (supposedly the ISA for the GPU) may have something to do with what I describe (whether it's correct/sensible or not).
I wondered if it could be linked to the idea of doing the rendering through 2x PCI Express-like links.
So the GPU would have 2 points of connection to the bus and, as such, 51.2 GB/s of bandwidth to play with. It could make sense from the GPU's point of view, as the class of GPU Nintendo seems to be using usually has two memory controllers. I've no idea, but maybe internally it helps to keep things "alike".
 
As I read that some were wondering about the size of 32MB of eDRAM, I made a gross measurement on a Blue Gene/Q chip.
I found ~118 mm^2 or ~100 mm^2 for the L2 (there were borders on the picture I used; not sure if they were part of the chip or not).
It's an L2, though, so it should definitely be more complex than some scratchpad memory.
Another gross estimate would be to remove the parts that read "L2 0x" or "L2 1x" (the darker parts).
That's roughly 1/3 of the cache area (I already removed the Xbar switch, in blue in the picture).
Removing that third leaves roughly 2/3 of 100~118 mm^2, so around 70 to 80 mm^2 for the 32MB of eDRAM.

When all is said and done you still need logic, but I would guesstimate 100 mm^2 is the maximum size, and it's likely to be less than that.

EDIT:
Great post liolio - I'm reading it over coffee as I pore over all of today's news :smile:
Thank you, not sure if it's deserved though, but
why do you drink coffee past one in the morning? :LOL:

EDIT 2
If I go by my theory, here's how I would see the system specs:
1 SoC
Single channel (64-bit bus) to 2GB of DDR3-1600.

Digging further into the PSU capacity and Bobcat's die size (on a pretty close process):
I would say the three cores take ~10 mm^2
all the eDRAM (32MB scratchpad + the L2 cache) ~90 mm^2
Redwood as implemented in Llano takes ~90 mm^2 of silicon (my own estimate, without the memory controller or any IO)

So if the Wii U has a Redwood-class GPU, once you "stuff in" the memory controller and various other IO, glue, bus(es) or crossbar, etc., you may end up with a chip above 200 mm^2.

I remember reading that IBM wanted Cell below 190 mm^2 for some production/lithography reason. I think Nintendo has no Ken Kutaragi in its ranks and they may have listened to IBM, so I would bet on a chip smaller than 190 mm^2.
I think the Llano mobile parts are a good reference. The GPU runs at a base clock of 400MHz and turbos up to 600MHz, and those parts are high bins.
If Nintendo had gone with such a GPU I would have expected a clock speed in the 400+ MHz territory.

So I think Nintendo went for a lesser GPU clocked higher. I would hope for a bastard child of Caicos and Redwood, so 2/3/4 SIMDs and 8 ROPs clocked @ 800MHz.
It could make the SoC design simpler if there are only two clock domains: 1.6GHz for the CPUs, 800MHz for everything else (bus, cache, scratchpad memory, GPU).

Sadly, looking further at the Llano die with the constraint of a tiny chip, and also given Nintendo's extreme focus on embedded memory versus everything else, I would go with this for the GPU:
3 SIMDs, 4 ROPs @ 800MHz and a lot of bandwidth (relative to the processing resources).

That may be doable while keeping the SoC tiny enough.
-------------
Final specs:
1 SoC
64-bit bus, 2GB of DDR3-1600 (1GB for the OS)
3 CPUs @ 1.6GHz
3MB of L2 @ 800MHz
3 SIMDs @ 800MHz (~360 GFLOPS; one will notice the strong resemblance to the 360 in Nintendo's design choices)
4 ROPs @ 800MHz
32MB of eDRAM
51.2 GB/s of bandwidth for the GPU (could be less if the CPU somehow reads from the eDRAM too).
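A quick sanity check of those guessed numbers (assuming Redwood/Caicos-style SIMDs with 80 VLIW5 ALUs each, which is itself an assumption):

```python
# Headline numbers implied by the guessed spec sheet above.
simds, alus_per_simd, gpu_ghz = 3, 80, 0.8
print(f"GPU peak: {simds * alus_per_simd * 2 * gpu_ghz:.0f} GFLOPS")    # ~384 GFLOPS, i.e. the '360-ish' figure
print(f"eDRAM link (2 PLB6 segments): {2 * 25.6:.1f} GB/s")             # 51.2 GB/s
print(f"main memory (64-bit DDR3-1600): {1600e6 * 8 / 1e9:.1f} GB/s")   # 12.8 GB/s
```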
 