Wii U hardware discussion and investigation *rename

It's not the process node, it's the numbers. 240 shaders would be 60 shaders per SIMD block which doesn't fit with the number of SRAM blocks per SIMD block.

4 SRAM blocks per VLIW5 vector unit is the only pattern we've seen with AMD PC GPUs - Brazos, Llano and all the dedicated cards AFAIK. I'm assuming this is the case with Wii U too - hence the 160 shaders. 240 shaders would be 2.67 SRAM banks per VLIW5 vector unit, which doesn't fit.
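For what it's worth, here is that ratio argument as a quick sanity check in Python. The total of 128 SRAM banks is my inference from the 2.67 figure above (128 / 48 is about 2.67), not something I have counted off the die shot myself:

# Sanity check of the SRAM-banks-per-VLIW5-unit argument.
# ASSUMPTION: ~128 SRAM banks across the shader blocks, inferred from
# the 2.67 figure quoted above (128 / 48 = 2.67).
observed_sram_banks = 128

for shader_count in (160, 240, 320):
    vliw5_units = shader_count // 5                 # 5 ALUs per VLIW5 unit
    banks_per_unit = observed_sram_banks / vliw5_units
    print(f"{shader_count} shaders -> {vliw5_units} VLIW5 units, "
          f"{banks_per_unit:.2f} SRAM banks per unit")

# Only 160 shaders reproduces the 4-banks-per-unit layout seen in Brazos,
# Llano and the discrete parts; 240 gives a non-integer 2.67, and 320
# would need 2 banks per unit.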
 
I see. I'm still kinda thinking that you guys are overestimating how powerful 160 R700-based shaders are, but for now I'll accept 160 as the truth.
 
To be honest we don't know the "truth" - in the true sense of the word. We only have opinions which are based on the evidence we have, and which are no doubt weighted in some way by our own personal biases.

The overwhelming evidence, IMO, says 160 or 320 shaders. And I think 160 is by far the more likely of the two. Entropy sees things differently (as he is perfectly entitled to do) and is confident of 320. Weigh things up as best you can then pick whichever seems most likely, but keep in mind that we can't be absolutely certain about these things. So enjoy the ride!
 
Seems unbelievable that no developer has revealed the graphics details.

Most likely Nintendo simply does not reveal the key specs, even to hands-on programmers, so they're left to guesstimate. A few probably have an idea, but not too many, which keeps the leaks plugged.
 
If Jim Morrison at Chipworks makes the flat-out written statement "This chip is fabricated in a 40 nm advanced CMOS process at TSMC.", well, that is pretty much as factual as it gets. This is what they do in the industry. He knows. If he didn't, he wouldn't make that statement about one of their jobs, and he hasn't retracted or qualified it afterwards; he has let it stand.

Except his flat-out written statement is only partially correct. It is not fabbed at TSMC; it is fabbed at Renesas. He can't tell TSMC vs. Renesas from the work they did, but he can tell 40nm vs. 55nm.

Here is a source for it being fabbed at Renesas:
http://www.eetasia.com/ART_8800678216_499489_NT_f90242a2.HTM


If you don't have a login, you can reach the same article via Google:
http://www.google.com/url?sa=t&rct=...3DfFwTTWt5pAKSA&bvm=bv.48293060,d.cGE&cad=rja
 
Seems unbelievable that no developer has revealed the graphics details.

Most likely Nintendo simply does not reveal the key specs, even to hands-on programmers, so they're left to guesstimate. A few probably have an idea, but not too many, which keeps the leaks plugged.

Now why would internal Nintendo developers leak the specs of the Wii U? Let's face it, those are pretty much the only folks developing for the platform: Nintendo themselves.
 
I don't know. I just really can't see 160 SPs matching current-gen, especially if some form of GPGPU is being used to make up for the pathetic CPU.

The same guy who gave us the CPU specs tells us that it is on par with the Xbox 360 CPU in performance, and probably able to cope with more flexible code too. And you even get a bit more performance from what is offloaded to the DSP.

Still, 160 SPs would be about 170 GFLOPS, which means that in the best-case scenario the Wii U would be around 60-70% as powerful as Xenos. Even if the real-world efficiency of Xenos is low and the Wii U GPU runs at 100%, it is very hard to believe that any dev could make up a 30-40% deficit; and yet many of the ports are even running at native 720p.
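For reference, here is the back-of-envelope behind those numbers. The 550 MHz clock is the rumoured figure, and 2 FLOPs per ALU per cycle (one multiply-add) is the usual counting convention; treat it as an estimate, not a measurement:

# Rough peak-FLOPS comparison (estimate, not measurement).
# ASSUMPTIONS: rumoured 550 MHz Wii U GPU clock, 160 ALUs, 2 FLOPs per
# ALU per cycle (one MAD). Xenos: 240 ALUs at 500 MHz.
def gflops(alus, mhz, flops_per_alu_per_clock=2):
    return alus * flops_per_alu_per_clock * mhz / 1000.0

wiiu = gflops(160, 550)    # ~176 GFLOPS
xenos = gflops(240, 500)   # 240 GFLOPS, the headline figure
print(f"Wii U: {wiiu:.0f} GFLOPS, Xenos: {xenos:.0f} GFLOPS, "
      f"ratio: {wiiu / xenos:.0%}")   # roughly 73% of the headline Xenos number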


By "custom" I was thinking along the lines of "between 4xxx and 5xxx" with possibly some kind of GPU compute features added (like Xenos' memexport). But firmly rooted in the PC VLIW5 line of technology.

There is a lot of custom work. Not only can we not find anything identical in other GPUs (close, but not equal), but looking at this photo even the best guesses leave 10+ blocks that nobody knows the purpose of; the NeoGAF thread I linked above has a lot of guesses.

[Image: wiiudie_blocks.jpg (annotated Wii U GPU die shot)]



Anyway, one of the advantages is that it looks like it can cope with more kinds of code, allowing for more diverse engine designs.

Anyway, here are some comments from Wii U devs:

On Wii U you can take many different approaches to tackle a problem.

We didn’t have such problems. The CPU and GPU are a good match. As said before, today’s hardware has bottlenecks with memory throughput when you don’t care about your coding style and data layout. This is true for any hardware and can’t be only cured by throwing more megahertz and cores on it. Fortunately Nintendo made very wise choices for cache layout, ram latency and ram size to work against these pitfalls. Also Nintendo took care that other components like the Wii U GamePad screen streaming, or the built-in camera don’t put a burden on the CPU or GPU.



DSPs in current hardware are mainly used to take tasks away from the CPU. As we take audio in our games very seriously we were happy to see that the DSP can handle all the tasks we throw at it. We use it for 3D audio, lowpass filtering and many other things.

We can’t be too specific on the Wii U hardware but you can’t compare anyway an OpenGl/DirectX driver version to the actual Wii U GPU. I can only assure that the Wii U GPU feature set allows to do many cool things that are not possible on any current console. The Wii U has enough of potential for the next years to create jaw-dropping visuals. Also remember the immense improvement we saw on the PS3 and XBOX360 over the years. I’m really excited to see what developers will show on the Wii U in the years to come.

http://www.notenoughshaders.com/2012/11/03/shinen-mega-interview-harnessing-the-wii-u-power/

The Wii U GPU is several generations ahead of the current gen. It allows many things that were not possible on consoles before. If you develop for Wii U you have to take advantage of these possibilities, otherwise your performance is of course limited. Also your engine layout needs to be different. You need to take advantage of the large shared memory of the Wii U, the huge and very fast EDRAM section and the big CPU caches in the cores. Especially the workings of the CPU caches are very important to master. Otherwise you can lose a magnitude of power for cache relevant parts of your code. In the end the Wii U specs fit perfectly together and make a very efficient console when used right.

http://hdwarriors.com/wii-u-specs-f...gpu-several-generations-ahead-of-current-gen/
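(Not from the interview, just a generic illustration of the cache/data-layout point the developers above keep making: the same arithmetic done over a contiguous array versus a strided one drags a very different amount of memory through the cache.)

# Toy illustration of memory-layout effects, in the spirit of the quote above.
# Same work either way; only the access pattern changes.
import time
import numpy as np

n = 4_000_000
aos = np.random.rand(n, 4)         # "array of structs": x,y,z,w interleaved
soa = np.ascontiguousarray(aos.T)  # "struct of arrays": all x values contiguous

t0 = time.perf_counter(); sx_strided = aos[:, 0].sum();    t1 = time.perf_counter()
t2 = time.perf_counter(); sx_contiguous = soa[0].sum();    t3 = time.perf_counter()

print(f"strided sum of x:    {t1 - t0:.4f} s")
print(f"contiguous sum of x: {t3 - t2:.4f} s")
# The strided walk drags the whole 128 MB array through the cache to read
# one field; the contiguous walk reads only the 32 MB it actually needs.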

We only know that you need to treat the Wii U differently than other consoles, because of a very different and in our view more accessible architecture. There is a lot of power to unleash in the Wii U. Enough power for many years to come, at least from our point of view.

http://hdwarriors.com/wii-u-has-a-lot-of-power-to-unleash-power-for-years-to-come/

Tessellation itself is not resource heavy on recent GPUs but it depends on actual usage. Although even previous consoles had these features you saw it only very rarely used. People often think of it as an easy way to get free ‘level of detail’. That doesn’t work. It’s because of certain visual problems associated with adaptive tessellation

http://hdwarriors.com/shinen-on-the-practical-use-of-adaptive-tessellation-upcoming-games/



Given Monolith's X I am ready to believe him, although I would still love to know more about it :D I doubt we will ever know unless some dev leaks some official documentation.

Whatever it does, it does it better, at lower power, and very, very differently.
 
Still, 160 SPs would be about 170 GFLOPS, which means that in the best-case scenario the Wii U would be around 60-70% as powerful as Xenos. Even if the real-world efficiency of Xenos is low and the Wii U GPU runs at 100%, it is very hard to believe that any dev could make up a 30-40% deficit; and yet many of the ports are even running at native 720p.
You're assuming the Wii U clocks are the same as Xenos. Have clock speeds been confirmed?
 
That is at the rumored 550 MHz. For them to be on par, Xenos would probably need to achieve no more than 50% real-world efficiency (that's assuming 80-90% for the Wii U GPU, which is easier to believe given Nintendo's history and the eDRAM design), and at the very least the more optimized engines should be getting better than that out of Xenos.
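Just mechanically plugging those utilisation guesses into the earlier peak figures (176 and 240 GFLOPS are the back-of-envelope numbers from before, and the percentages are the assumptions above, not measurements):

# Break-even arithmetic for the utilisation argument above.
# Peaks from the earlier estimate; utilisation figures are guesses.
wiiu_peak, xenos_peak = 176.0, 240.0   # GFLOPS
for wiiu_util in (0.8, 0.9):
    breakeven = wiiu_peak * wiiu_util / xenos_peak
    print(f"Wii U at {wiiu_util:.0%} utilisation matches Xenos "
          f"running at {breakeven:.0%} of its peak")
# Prints roughly 59% and 66%, so the break-even sits in that region.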

Although, like I said, I still think there is more than one way of optimizing the Wii U GPU, given some of those comments.


BTW this is what I meant by DX10(.1) being more efficient a few pages ago.

http://www.youtube.com/watch?v=a7-_Uj0o2aI

That is a 10-25% improvement "for free", even with the same GFLOPS.
 
Except his flat-out written statement is only partially correct. It is not fabbed at TSMC; it is fabbed at Renesas. He can't tell TSMC vs. Renesas from the work they did, but he can tell 40nm vs. 55nm.

Thanks. Do we know for sure that it's Renesas 40nm?

Even if it is, a different process and possibly a different chip layout tool and/or team could have a big impact on die area and transistor density, I guess. Assuming the SRAM banks are the same capacity - and I don't see why we shouldn't - then transistor density does seem to be lower.

[Image: brazos_wiiu_simdfeuev.jpg (Brazos vs. Wii U SIMD block comparison)]
 
Still, 160 SPs would be about 170 GFLOPS, which means that in the best-case scenario the Wii U would be around 60-70% as powerful as Xenos. Even if the real-world efficiency of Xenos is low and the Wii U GPU runs at 100%, it is very hard to believe that any dev could make up a 30-40% deficit; and yet many of the ports are even running at native 720p.

As I have recently learned (late to the party), Xenos is supposed to be 216 GFLOPS. A 172 GFLOP Wii U would be within easy striking distance of Xenos, especially with higher fillrate, no eDRAM resolve penalty, no tiling-related penalties, larger texture caches, etc.
 
Isn't that with the OS reserve?

Presumably the Wii U GPU would have an OS reserve too.
 
I think that's just the raw capabilities of the machine, peak theoretical FLOPS and that.

I always heard 240 GFLOPS for Xenos...

So when you said 216, I assumed it was 240 - 10% (reserves) = 240 - 24 = 216...

The numbers fit too pat to be a coincidence.

I think every console has to have some GPU reserve, so the Wii U will too.
 
Xenos is (Vec4 + scalar) * 16 * 3 = 240 SPs. My memory is foggy, but if doing MADs you might only be able to use the vector unit, which would result in 192 GFLOPS at 500 MHz. There might be some case where there's enough input (GPR) bandwidth to use the scalar at the same time, but it probably doesn't happen in practice as often as it could with VLIW5.

There's no way to reserve 10% of the SPs, so any "reservation" is likely time slicing, and it's a hand-wavy estimate.
 
Regarding FLOPS, searching around I found this post here on B3D:

http://forum.beyond3d.com/showpost.php?p=519618&postcount=3

It gives a link to this article:

http://game.watch.impress.co.jp/docs/20050520/x360_g.htm

And includes a translation of the relevant part of the article. I couldn't follow the translation in that post, so I tried a newer translation, thinking modern translation services might be better:

3 blocks in 16-based unit, shader unit of 48 based in total. It is a vector image that consists of floating point numbers in (FP) called (SIMD) computing unit as an entity. Can be carried out in the (clock) 1 cycle at the same time (one element FP) scalar FP and operations (product Wasan) vector operations of four elements. 360-GPU × 48 = 432FLOP next to (2 +1 Math scalar element 4 ×), Xbox 1 per cycle because it is 500MHz drive, peak performance will be 432 * 500MHz = 216GFLOPS just shaders.

I still can't follow the last bit of that. :(
 
48 vec4+scalar ALUs was correct, but I don't know where the 432 came from. Seems like bad math. Anyway I guess this is slightly off topic.
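For what it's worth, the 432 does drop out of one particular counting convention. Here are the three Xenos figures that have come up in this thread, all at 500 MHz; this is just arithmetic, not a claim about what the hardware sustains:

# Xenos GFLOPS under different counting conventions (pure arithmetic).
# 48 ALUs, each a vec4 + scalar pair, at 500 MHz.
alus, mhz = 48, 500

vec_mad_only   = alus * (4 * 2)     * mhz / 1000   # 192: vec4 MADs only
vec_mad_plus_1 = alus * (4 * 2 + 1) * mhz / 1000   # 216: scalar counted as 1 FLOP
everything_mad = alus * (5 * 2)     * mhz / 1000   # 240: vec4 + scalar all MADs

print(vec_mad_only, vec_mad_plus_1, everything_mad)
# The article's 432 is 48 x (4 x 2 + 1) FLOPs per clock, which at 500 MHz
# gives the 216 GFLOPS figure; the usual 240 counts the scalar as a MAD too.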
 