Wii U hardware discussion and investigation *rename

I'd think the GPU most likely is not hitting the TDP or anything. It most likely has a TDP of ~30W and uses 15-20W or so during normal gaming. Only things like Futuremark or GPU burn tests can actually hit the TDP.

Yes, I was using a 25W TDP, like the embedded R700 E4690, which IIRC is also 320SP but clocked at 600MHz. As you say, the GPU is not likely using its TDP rating, just like every other card on the wiki page people are getting performance numbers from. 7GFLOPs per watt of TDP is lower than you would expect from an embedded R700 design, especially since the one that does exist is ~16GFLOPs per watt.
 
What number are you using? As already posted, this whole GPU die has about 15 watts to work with. There is more than just a GPU core on this chip.

If everything else takes a watt, that leaves you at 176 GFLOPs @ 12.57 GFLOPs per watt or 352 GFLOPs @ 25.14 GFLOPs per watt.

Looking at AMD GPUs at 40nm, e.g. the Radeon HD 5550 [320:16:8 @ 550MHz]: 9.03 GFLOPs per watt.
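
To spell out the arithmetic behind those per-watt figures (assuming the widely reported 550MHz core clock, 2 FLOPs per ALU per clock, and ~14W left for the shader core):

\[ 160 \times 2 \times 0.55\,\mathrm{GHz} = 176\ \mathrm{GFLOPs}, \qquad 176 / 14\,\mathrm{W} \approx 12.57\ \mathrm{GFLOPs/W} \]
\[ 320 \times 2 \times 0.55\,\mathrm{GHz} = 352\ \mathrm{GFLOPs}, \qquad 352 / 14\,\mathrm{W} \approx 25.14\ \mathrm{GFLOPs/W} \]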
Assuming the GPU has a TDP of 30W with normal operating power of 20W, you get about 6GFLOPs/W for 160 ALUs and 12GFLOPs/W for 320 ALUs. All with TDP.

55nm 4550 has 3.8GFLOPs/W TDP
55nm 3870 has 4.7GFLOPs/W TDP
55nm 4870 has 7.5GFLOPs/W TDP

40nm 5450 has 5.4GFLOPs/W TDP
40nm 6450 has 8.9GFLOPs/W TDP
40nm 4670 has 8.1GFLOPs/W TDP
40nm 4770 has 12GFLOPs/W TDP

As you move down the performance scale, it looks like perf/W goes down. Wii U looks like a 160 ALU part at 40nm with a few extra watts for the extra logic and RAM, based solely on the GPU performance per watt.
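
For reference, the 6 and 12 GFLOPs/W figures above follow from dividing the same 176 and 352 GFLOPs numbers by the assumed 30W TDP:

\[ 176 / 30\,\mathrm{W} \approx 5.9\ \mathrm{GFLOPs/W}, \qquad 352 / 30\,\mathrm{W} \approx 11.7\ \mathrm{GFLOPs/W} \]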
 
I have to wonder how well many of these multi-platform games are utilizing the FPU in Espresso.
Remember this is just food for thought.
Don't quote me on this (please), but if what I had read once is correct then I would imagine developers wouldn't bother saving floating-point registers at all on the CPU.
So we know the chip has paired singles. The registers are 64-bit doubles, so when people talk about paired singles you assume you'll need to split the register in two.
No, the registers are actually 96 bits wide, a double and a single. To load it you have to load in your single, do a merge operation to move it into the upper 32 bits, then load your other one. This makes stacks explode since it takes 3 operations or 12 bytes to save a floating-point register in the callee no matter what.
That's simply what I had heard though, and I certainly won't say it's true.
 
Remember this is just food for thought.
Don't quote me on this (please), but if what I had read once is correct then I would imagine developers wouldn't bother saving floating-point registers at all on the CPU.
So we know the chip has paired singles. The registers are 64-bit doubles, so when people talk about paired singles you assume you'll need to split the register in two.
No, the registers are actually 96 bits wide, a double and a single. To load it you have to load in your single, do a merge operation to move it into the upper 32 bits, then load your other one. This makes stacks explode since it takes 3 operations or 12 bytes to save a floating-point register in the callee no matter what.
That's simply what I had heard though, and I certainly won't say it's true.
A merge op is required only for swizzle/permute types of move. There's a normal* load paired-singles op.

* Actually, it's a very powerful op as it can quantize from integer formats.

ps: I'm blu on gaf.
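
As a rough illustration of what that quantized load buys you: the paired-single quantized load can fetch two packed integers (e.g. 8- or 16-bit fixed point) and turn them into a pair of scaled floats in a single instruction. Here is a hedged C sketch of the equivalent work done in software; the types and names are illustrative only, not an SDK API.

Code:
#include <stdint.h>

/* Software analogue of a paired-single quantized load: the hardware produces
 * two scaled floats from two packed integers in one instruction, while plain
 * C needs several operations per element. */
typedef struct { float ps0, ps1; } paired_single;

static paired_single dequantize_load_s16(const int16_t *src, int scale_shift)
{
    paired_single r;
    const float scale = 1.0f / (float)(1 << scale_shift); /* 2^-scale_shift */
    r.ps0 = (float)src[0] * scale;
    r.ps1 = (float)src[1] * scale;
    return r;
}

int main(void)
{
    int16_t packed[2] = { 16384, -8192 };                /* 2.14 fixed point */
    paired_single ps = dequantize_load_s16(packed, 14);  /* 1.0f and -0.5f */
    return (ps.ps0 == 1.0f && ps.ps1 == -0.5f) ? 0 : 1;
}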
 
Assuming the GPU has a TDP of 30W with normal operating power of 20W, you get about 6GFLOPs/W for 160 ALUs and 12GFLOPs/W for 320 ALUs. All with TDP.

55nm 4550 has 3.8GFLOPs/W TDP
55nm 3870 has 4.7GFLOPs/W TDP
55nm 4870 has 7.5GFLOPs/W TDP

40nm 5450 has 5.4GFLOPs/W TDP
40nm 6450 has 8.9GFLOPs/W TDP
40nm 4670 has 8.1GFLOPs/W TDP
40nm 4770 has 12GFLOPs/W TDP

As you move down the performance scale, it looks like perf/W goes down. Wii U looks like a 160 ALU part at 40nm with a few extra watts for the extra logic and RAM, based solely on the GPU performance per watt.

55nm 4670 has 8.1GFLOPs/W TDP, is a 320SP part and is clocked at 750MHz, which gives it 480GFLOPs, and it is oddly enough codenamed "Mario"... all while being a desktop GPU.

The embedded part E4690 is 55nm as well. It does ~16GFLOPs/W, is 320SP and is clocked at 600MHz, which gives 384GFLOPs, and it has a TDP of 25 watts... It is also, of course, codenamed "Mario".

Moving "Mario" down to 40nm should allow even the desktop part to see the performance we are getting here, and this doesn't account for Latte being on an MCM with Espresso, which shrinks power draw further.

For Wii U's GPU to have 160 ALUs, it would have a ridiculously high power draw given it is embedded on an MCM.

Just an estimation, but the HD 4670 at 40nm should draw ~40% less wattage (based on IBM's processor shrinks), which puts it just under 36 watts TDP with a 750MHz clock. Wii U's GPU is clocked at 550MHz and should have a TDP of ~25 watts (this isn't what it draws, this is the design draw, which is always higher). Again, this is just a way to compare what R700 would be like with 320SP at 40nm.
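
Spelling that estimate out (taking the commonly quoted ~59W TDP for the desktop HD 4670 and assuming, very roughly, linear scaling with clock; both are back-of-the-envelope assumptions, not measurements):

\[ 59\,\mathrm{W} \times (1 - 0.40) \approx 35\,\mathrm{W}\ \text{at 750MHz}, \qquad 35\,\mathrm{W} \times \tfrac{550}{750} \approx 26\,\mathrm{W}\ \text{at 550MHz} \]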
 
Here is an interesting little test performed by a user named blu on NeoGAF.

http://www.neogaf.com/forum/showpost.php?p=47593495&postcount=3295

It compares the Wii's Broadway CPU at 729MHz to an AMD Bobcat core running at 1.33GHz. Normalized for clockspeed, Broadway completed the test ~26% faster than the Bobcat core. For this particular workload, then, a Broadway @ 1.243GHz would perform similarly to a 1.6GHz Bobcat. Use of paired singles can make quite the difference, and I have to wonder how well many of these multi-platform games are utilizing the FPU in Espresso.

That's very interesting, I would have expected them to perform very similarly per clock on matrix multiply.

I wonder how much access latency affected the results. Nintendo have gone big on low-latency memory pools since the GC, whereas Bobcat has to contend with DDR3 and a half-speed L2.

It's about 10 years since I used a command line compiler - I've only used easymode Visual Studio since - but I'd quite like to try running that benchmark on my old Athlon 64 with 64-bit SIMD and see how the old beast stacks up against Broadway and Bobcat.
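
For anyone who wants to try the same thing, here is a minimal stand-in for that kind of matrix-multiply timing test. It is not blu's actual benchmark, just a sketch that any command-line compiler will build, with a hypothetical matrix size N.

Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 512  /* matrix dimension; adjust to taste */

/* Naive N x N matrix multiply: one multiply-add per inner iteration, so the
 * arithmetic grows as N^3 while the data touched only grows as N^2. */
static void matmul(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int k = 0; k < n; k++)
                acc += a[i * n + k] * b[k * n + j];
            c[i * n + j] = acc;
        }
}

int main(void)
{
    float *a = malloc(sizeof(float) * N * N);
    float *b = malloc(sizeof(float) * N * N);
    float *c = malloc(sizeof(float) * N * N);
    if (!a || !b || !c) return 1;

    for (int i = 0; i < N * N; i++) { a[i] = (float)(i % 7); b[i] = (float)(i % 5); }

    clock_t t0 = clock();
    matmul(a, b, c, N);
    clock_t t1 = clock();

    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    double flops = 2.0 * N * N * N;  /* one mul + one add per inner iteration */
    printf("%.3fs, %.2f GFLOPS, c[0]=%f\n", secs, flops / (secs * 1e9), c[0]);

    free(a); free(b); free(c);
    return 0;
}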
 
So, the idea is that the GPU would be inefficient at 320SP, because of a slow CPU and slow RAM, so it makes more sense for the GPU to be 160SP.

Actually, the main idea is that it performs nothing like the 320 shader parts that we see out in the wild, and that while even a memory-crippled 444MHz Llano (same flops, lower tri/fill) can usually comfortably surpass the 360 despite all the PC factors, the Wii U can't.

The simplest explanation is that it doesn't actually have a HD 5550 level GPU in there. The more convoluted explanation is that a combination of factors is conspiring in different ways to make every game support the idea that it doesn't actually have a HD 5550 level GPU in there.
 
So, the idea is that the GPU would be inefficient at 320SP, because of a slow CPU and slow RAM, so it makes more sense for the GPU to be 160SP. Yes, this seems much more efficient: an R700 part that performs at 7GFLOPs per watt... How again does this make sense? Or does 14GFLOPs per watt make better sense? Ports performing badly should never be evidence of something like this, some of which were done in only 6 months. If you could do something like this then we could look to the HD remakes of PS2 games running on PS3 and 360 at 60+fps, at 1080p. The logic here is in the gutter.
If you don't go with the 160 SP idea, then there are two outcomes. 1) It's a 320 SP part completely gimped by the rest of the system. 2) Devs with considerable experience of AMD DX10 GPUs are unable to use those 320 SPs. The whole 'launch game' argument doesn't work any more because the basics of the architecture are no longer completely new, unlike trying to get PS2 games working on very different PS3 hardware. BLOPS and Batman etc. run on similar 320 SP architecture on PC with much better results. How can the same GPU achieve only half as much in Wii U other than the devs being pretty incompetent? Or Nintendo making things crazily difficult (perhaps not such a bad idea if we're to believe devs aren't even told how powerful the hardware is! :p)

The logic is not in the gutter. It's inconclusive, but sound. The comparisons are fair as they recognise differences and similarities, whereas the old 'launch games don't count' argument ignores the changes that have happened with software in this industry.
 
If you don't go with the 160 SP idea, then there are two outcomes. 1) It's a 320 SP part completely gimped by the rest of the system. 2) Devs with considerable experience of AMD DX10 GPUs are unable to use those 320 SPs. The whole 'launch game' argument doesn't work any more because the basics of the architecture are no longer completely new, unlike trying to get PS2 games working on very different PS3 hardware. BLOPS and Batman etc. run on similar 320 SP architecture on PC with much better results. How can the same GPU achieve only half as much in Wii U other than the devs being pretty incompetent? Or Nintendo making things crazily difficult (perhaps not such a bad idea if we're to believe devs aren't even told how powerful the hardware is! :p)

The logic is not in the gutter. It's inconclusive, but sound. The comparisons are fair as they recognise differences and similarities, whereas the old 'launch games don't count' argument ignores the changes that have happened with software in this industry.

Actually, taking the 360 game and trying to make it work on Wii U is what we've been hearing from developers, not the PC version of said game. Working to 360's strengths and avoiding its weaknesses is common sense, but when you port it quickly over to Wii U (in some cases in as little as 6 months) we would naturally see these problems arise. Just look at 360 games that play sloppily on a PC, many examples there, GTA4 for instance... and look at COD Black Ops on PC http://www.youtube.com/watch?v=t-3uLNa3e7s; that is with an HD 4670, but with a superior CPU and more RAM, and while the GPU is 320SP, it is clocked at 800MHz. How exactly are ports evidence again?

Here is the real question: we know the bird demo at E3 2011 was running on Wii Us that were unfinished, so the GPU had to be downclocked to 400MHz. Is the bird demo, which is displaying 2 different scenes at and above 360 quality, even possible with only 128GFLOPs? To me that is much more unlikely than Wii U housing a GPU with 320SPs.
 
Actually, taking the 360 game and trying to make it work on Wii U is what we've been hearing from developers, not the PC version of said game. Working to 360's strengths and avoiding its weaknesses is common sense, but when you port it quickly over to Wii U (in some cases in as little as 6 months)

Taking the 360 version of a game and throwing it across to the PC on a card like the HD 5550 results in big performance increases. We've heard from developers on this very forum just how little hardware-specific optimisation PC ports get, and yet games still run faster and/or at higher resolutions when you plug in a significantly faster GPU than the 360 has.

We would naturally see these problems arise. Just look at 360 games that play sloppily on a PC, many examples there, GTA4 for instance... and look at COD Black Ops on PC http://www.youtube.com/watch?v=t-3uLNa3e7s; that is with an HD 4670, but with a superior CPU and more RAM, and while the GPU is 320SP, it is clocked at 800MHz. How exactly are ports evidence again?

The BLOPS video you linked to is running at more than twice the resolution of the Xbox 360 version of the game and with higher settings, and it's not even the GDDR5 version of the card. The stuttering is because the game is being captured in FRAPS.

Normal Gameplay: 40-52
Recording: 18-25


When it's not capturing it runs between 40-52 FPS, which is around the same as Wii U BLOPS 2 but with a higher minimum frame rate. In other words, this is massively exceeding what the 360 achieves with the game and massively exceeding what the Wii U achieves with BLOPS 2.

You don't even understand that you're posting something which further undermines the idea of a 320 shader Wii U.

Here is the real question: we know the bird demo at E3 2011 was running on Wii Us that were unfinished, so the GPU had to be downclocked to 400MHz. Is the bird demo, which is displaying 2 different scenes at and above 360 quality, even possible with only 128GFLOPs? To me that is much more unlikely than Wii U housing a GPU with 320SPs.

You have no idea how the bird demo would have run on the 360, or of the demands of the demo. This is just fanboy hand waving.
 
Taking the 360 version of a game and throwing it across to the PC on a card like the HD 5550 results in big performance increases. We've heard from developers on this very forum just how little hardware-specific optimisation PC ports get, and yet games still run faster and/or at higher resolutions when you plug in a significantly faster GPU than the 360 has.



The BLOPS video you linked to is running at more than twice the resolution of the Xbox 360 version of the game and with higher settings, and it's not even the GDDR5 version of the card. The stuttering is because the game is being captured in FRAPS.

Normal Gameplay: 40-52
Recording: 18-25


When it's not capturing it runs between 40-52 FPS, which is around the same as Wii U BLOPS 2 but with a higher minimum frame rate. In other words, this is massively exceeding what the 360 achieves with the game and massively exceeding what the Wii U achieves with BLOPS 2.

You don't even understand that you're posting something which further undermines the idea of a 320 shader Wii U.



You have no idea how the bird demo would have run on the 360, or of the demands of the demo. This is just fanboy hand waving.

Right, it was my point that the GPU is 800GFLOPs, has 3 times the RAM available to it, and a much faster CPU. Your GDDR5 comment makes no sense btw, as 360 and Wii U lack this as well. I'm not even sure you understand how to properly run logic in your head. I am the one on NeoGAF who originally posted the 176GFLOPs, so I know how to be realistic, but you don't even know how to come to terms with the clusters being far too big for 20 ALUs. Higher density and low clocks are about the only way this makes any sense.

I am starting to think you are another poster on NeoGAF, just come here to sound more technically apt. Not sure it is working though, as your inability to read posts properly and notice that the PC has much more power than the Wii U should give you some pause before stating that it hurts my point. That build is clearly struggling with the game with over twice the power we are assuming Wii U to have, well, 4 times the power in your case. Even given the resolution change, you can't make up that much ground with only 352GFLOPs and a much slower CPU.

The reality is, the only thing that points to 160SPs right now and completely dismisses 320SPs is a group of fanatics that clearly have never ported games to different platforms and would like nothing more than for Wii U to lack the basic ability to outperform last-generation consoles. I've left you to your ridiculous theories of R600 integration and 160SP; from developers' comments we have been told that it is R700, Matt from NeoGAF clarified it to me directly in the Latte thread, and quite a number of developers have said the GPU is about 50% better. Well... considering 320 is about 33% more SPs than 240, and 160 is about 33% in the opposite direction, I think it is premature to say Wii U is 160SPs and that 320SPs is out of the question.
 
Right, it was my point that the GPU is 800GFLOPs, has 3 times the RAM available to it, and a much faster CPU. Your GDDR5 comment makes no sense btw, as 360 and Wii U lack this as well. I'm not even sure you understand how to properly run logic in your head. I am the one on NeoGAF who originally posted the 176GFLOPs, so I know how to be realistic, but you don't even know how to come to terms with the clusters being far too big for 20 ALUs. Higher density and low clocks are about the only way this makes any sense.
...

Friendly advice before the mods get involved. You really need to tone down your posting style a couple of notches otherwise I doubt whether your stay here will be that long.
 
I am starting to think you are another poster on NeoGAF, just come here to sound more technically apt...
You joined July last year to talk about Wii U and that's been your only contribution to the board. Function has been here since 2003 as a valued, intelligent contributor. What does that tell us about your ability to find and interpret facts? :p

Function is very capable of talking sensibly and logically. Engage in a proper argument using comparable data (numbers can be used in different ways, so let's use them to show other interpretations rather than complaining about others seeing things differently). Certainly do not take a high-horse position when you don't have the reputation to support it.
 
Friendly advice before the mods get involved. You really need to tone down your posting style a couple of notches otherwise I doubt whether your stay here will be that long.

If one's argument can be dismissed by simply saying he is a fanboy, I don't know if I'd continue to post there. However, I should apologize to Function for not understanding how logic works in his head; it's not his fault that his string of logic seems foreign to mine. I guess I was expecting his argument to be laid out, rather than shot out onto the page.
 
I guess I was expecting his argument to be laid out, rather than shot out onto the page.
People can't spend all day crafting well-considered documents for discussion. That's where an idea can be challenged with a proper question or counterpoint. The posts above yours discussing the hardware photos show exactly that, with everyone involved willing to reconsider their in-flux opinion of what Wii U is. It's interesting too to reflect on one's previous ideas. Your post history shows you expected >50 GB/s RAM BW, didn't discount GDDR5, expected a DX11 card due to a DOF effect, etc. Some of your ideas have been proven wrong. Anyone can be wrong, and being wrong isn't a crime. Blind arguments, as many console apologists make, are destructive to the conversation and so do get lambasted, but otherwise the population here (those that don't get booted out) can be trusted on the whole to consider things sensibly and without prejudice.
 
If you don't go with the 160 SP idea, then there are two outcomes. 1) It's a 320 SP part completely gimped by the rest of the system. 2) Devs with considerable experience of AMD DX10 GPUs are unable to use those 320 SPs. The whole 'launch game' argument doesn't work any more because the basics of the architecture are no longer completely new, unlike trying to get PS2 games working on very different PS3 hardware. BLOPS and Batman etc. run on similar 320 SP architecture on PC with much better results. How can the same GPU achieve only half as much in Wii U other than the devs being pretty incompetent? Or Nintendo making things crazily difficult (perhaps not such a bad idea if we're to believe devs aren't even told how powerful the hardware is! :p)

The logic is not in the gutter. It's inconclusive, but sound. The comparisons are fair as they recognise differences and similarities, whereas the old 'launch games don't count' argument ignores the changes that have happened with software in this industry.
This is completely unscientific, but strictly looking at Razor's Edge, the reported slowdowns seem to have nothing to do with the GPU. While they occur only in certain situations, they don't consistently manifest. They're very much random. Digital Foundry mentioned heavy slowdown in a miniboss fight against several gunships for example. I've played that chapter at least five times, and the slowdown only happened on one playthrough. So maybe the slowdowns in BLOPS and Batman are caused by something else.
 
You joined July last year to talk about Wii U and that's been your only contribution to the board. Function has been here since 2003 as a valued, intelligent contributor. What does that tell us about your ability to find and interpret facts? :p

Function is very capable of talking sensibly and logically. Engage in a proper argument using comparable data (numbers can be used in different ways, so let's use them to show other interpretations rather than complaining about others seeing things differently). Certainly do not take a high-horse position when you don't have the reputation to support it.
Nothing at all. And to your second paragraph: fair enough, I didn't pay attention to his join date, and NeoGAF members are plenty intelligent. The Wii U tech discussion there is at least comparable to this one. So what did he do with the numbers I gave him? He looked at what supported his view and disregarded the rest. He paid no mind that the PC I pointed out was running with 800GFLOPs, a much higher clocked CPU and much more RAM, yet can only run at a higher resolution than the 360 version and struggles to climb into the 50s for FPS. Yet somehow my point that PC ports are not a good measurement of power was dismissed because I question whether the bird demo, displaying 2 scenes, could even run on a 128GFLOPs machine.

That is the post you are defending.
People can't spend all day crafting well-considered documents for discussion. That's where an idea can be challenged with a proper question or counterpoint. The posts above yours discussing the hardware photos show exactly that, with everyone involved willing to reconsider their in-flux opinion of what Wii U is. It's interesting too to reflect on one's previous ideas. Your post history shows you expected >50 GB/s RAM BW, didn't discount GDDR5, expected a DX11 card due to a DOF effect, etc. Some of your ideas have been proven wrong. Anyone can be wrong, and being wrong isn't a crime. Blind arguments, as many console apologists make, are destructive to the conversation and so do get lambasted, but otherwise the population here (those that don't get booted out) can be trusted on the whole to consider things sensibly and without prejudice.
While I am not an engineer, I do understand what I am talking about; bottlenecks happen in many different places, for various reasons. Early ports built from 360 games show that the Wii U isn't a 360, and that it has different weaknesses and strengths; little else can be said, especially about rushed ports. But if you want to continue that line of reasoning, I don't think I have the ability to explain why you are wrong.
 
These W/GFLOP estimates aren't worth an awful lot. It's not like FLOPs are the only thing the chip is doing or like the rest of the work scales perfectly with the FLOP load, nor do we even know how well games are utilizing it (the ISA isn't exactly easy to get very high utilization out of).

The 160SP idea seems bizarre (in a "why would they do that" kind of way) but I'm inclined to agree with function here; the performance we're seeing fits it much better. You can find HD 6450 (a 160SP part, although with a higher core clock than Wii U) results that seem competitive with Xbox 360 too, although I haven't looked that deeply into it.

function said:
That's very interesting, I would have expected them to perform very similarly per clock on matrix multiply.

I wonder how much access latency affected the results. Nintendo have gone big on low-latency memory pools since the GC, whereas Bobcat has to contend with DDR3 and a half-speed L2.

It's about 10 years since I used a command line compiler - I've only used easymode Visual Studio since - but I'd quite like to try running that benchmark on my old Athlon 64 with 64-bit SIMD and see how the old beast stacks up against Broadway and Bobcat.

A good matrix multiply kernel will push down main memory access overhead asymptotically towards zero because it grows at n^2 while FMADDs grow at n^3. It's a good test for when you want to try to show off something near peak FLOP performance.
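
Put as a quick back-of-the-envelope version of that argument, an n x n multiply performs n^3 multiply-adds over only 3n^2 matrix elements, so the work per element grows with n and the memory overhead per FMADD shrinks towards zero:

\[ \frac{\mathrm{FMADDs}}{\mathrm{elements}} = \frac{n^3}{3n^2} = \frac{n}{3} \]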

I'd like to see generated assembly for both cases (I saw blu offered, yes I'm interested!) but it's not hard to imagine how Broadway could get better IPC than Bobcat here. It has the advantage of FMADDs, three-way addressing, and more registers (well, more register addressing flexibility anyway, 32 2x32-bit registers vs 16 4x32, this is assuming x86-64 was used). While the peak FLOPs are similar, the FMADDs free up a dual-issue opportunity for loads, stores, and flow control stuff, plus it can issue branches outside the normal dual-issue. x86 can soak up some of that back with load + op, but the compiler might be afraid to do that since IIRC it requires alignment for SSE.

This doesn't mean that it applies to general purpose code though, although it might somewhat transfer to some other FP heavy stuff.

Remember this is just food for thought.
Don't quote me on this (please), but if what I had read once is correct then I would imagine developers wouldn't bother saving floating-point registers at all on the CPU.
So we know the chip has paired singles. The registers are 64-bit doubles, so when people talk about paired singles you assume you'll need to split the register in two.
No, the registers are actually 96 bits wide, a double and a single. To load it you have to load in your single, do a merge operation to move it into the upper 32 bits, then load your other one. This makes stacks explode since it takes 3 operations or 12 bytes to save a floating-point register in the callee no matter what.
That's simply what I had heard though, and I certainly won't say it's true.

If this is really just a higher-clocked Broadway then it's nothing like that; the user manual from IBM describes the CPU in great detail. And if they changed anything at all, I can't fathom it'd be a move to some weird 96-bit format. So I don't think that developer really understood what he was talking about.
 
128 GFLOPs? Where is that number coming from? The lowest talked about is 176 GFLOPs.
The kits the first demos were running on reportedly had the GPU clocked at 400MHz. Therefore, both the Zelda and the Japanese Garden demo were running on a 128GFLOPs GPU, if the system really only has 160 ALUs.
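
That 128 figure is just the same ALU arithmetic at the reduced clock:

\[ 160 \times 2 \times 0.40\,\mathrm{GHz} = 128\ \mathrm{GFLOPs}, \qquad 160 \times 2 \times 0.55\,\mathrm{GHz} = 176\ \mathrm{GFLOPs} \]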
 