Wii U hardware discussion and investigation *rename

Isn't R600 what's inside the 360?

No, it's quite a bit behind. This article will tell you more about it:

http://www.beyond3d.com/content/articles/4/1

One of the big differences between Xenos's shaders and R600's is that Xenos is vec4 + scalar while R600 is VLIW5 with a fair amount of independence between the five instructions. The latter gives more scheduling flexibility than the former, so you can probably get higher average ALU utilization out of it.
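To make that scheduling point concrete, here's a toy packing model (my own illustration, not how a real shader compiler or either chip actually schedules work): both a vec4+scalar unit and a VLIW5 unit can retire five MAD-class ops per clock, but the VLIW5 unit can fill its slots with any five independent scalar ops, whereas the vec4+scalar unit needs the work to line up as one vector plus one scalar. The instruction mix below is made up.

```python
import math

# Toy workload: each number is the vector width the shader author actually
# used for an independent group of scalar ops (lots of vec3 colour math etc.)
workload = [3, 3, 1, 4, 1, 2, 2]

def cycles_vec4_scalar(groups):
    # Xenos-like issue: one vector op (width <= 4) per cycle, plus a
    # co-issued scalar only if the next op happens to be a lone scalar.
    cycles, i = 0, 0
    while i < len(groups):
        i += 1                                   # vector slot consumed
        if i < len(groups) and groups[i] == 1:   # co-issue a scalar if available
            i += 1
        cycles += 1
    return cycles

def cycles_vliw5(groups):
    # VLIW5-like issue: pack any five independent scalar ops per cycle
    # (best case, assuming the full independence the post describes).
    return math.ceil(sum(groups) / 5)

print("vec4+scalar cycles:", cycles_vec4_scalar(workload))  # 5
print("VLIW5 cycles:      ", cycles_vliw5(workload))        # 4
```

Same peak rate per clock; the VLIW5 model just wastes fewer lanes when the compiler has independent work to pack.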
 
IIRC the low end HD2400/HD3400 series used a conventional 64-bit crossbar rather than a ring bus, as it was simpler and used less power. The ring bus was not integral to the R6xx architecture.
Actually, the whole HD3000 series ditched the ring bus. Only the R600 (HD2900) itself had it; the smaller incarnations (including the HD2600 with RV630) and the RV670 (HD3870) did not.
 
Just a theory that a more modern architecture from 2010 would be a lot more efficient than one from 2004.

Judging by the ports, your theory would be wrong. You would think a console from 2012 should easily outperform the 360, but the bottlenecks prevent it from doing so.
 

Not necessarily. We can't be conclusive on that without knowing the development time/resources that went into each 'port' for a start.
His theory could still be correct. It's possible that although there are bottlenecks for code designed for a CPU-heavy system, the Wii U can still 'outperform' the 360 in certain tasks which take advantage of its strengths. (I've highlighted 'possible' there so as not to corner myself ;))

I'm not saying you're wrong, but it's very difficult to use multi-platform game performance as a conclusive metric for hardware 'performance' overall - especially given the rather drastic differences in CPU 'power' between those two systems. What the ports may tell us is that there's evidence of the Wii U's slower/less meaty CPU and possibly the slower RAM - but not whether the Wii U is outperformed by the 360 or not. I'm not even sure how you'd measure that tbh. It's not like we can just run 3DMark on them :)


I miss the good old days of polygon counting.
 

These bottlenecks that the Wii U has are not even present in consoles made in 2005, so wouldn't that debunk the theory?
 
Judging by the ports, your theory would be wrong. You would think a console from 2012 should easily outperform the 360, but the bottlenecks prevent it from doing so.
Unless we understand how that thing even works, we won't know if there actually are bottlenecks. It's more likely that most devs simply haven't figured out how to deal with stuff such as the strange memory architecture yet.
 
These bottlenecks that the Wii U has are not even present in consoles made in 2005, so wouldn't that debunk the theory?

Not really; improvements have been made since 2005-2006 in memory, software stacks and graphics efficiency.

When someone finally spills the beans on the hardware, we can either confirm or deny the current best rumors.
 
Unless we understand how that thing even works, we won't know if there actually are bottlenecks. It's more likely that most devs simply haven't figured out how to deal with stuff such as the strange memory architecture yet.

I think it's more likely RAM bandwidth and the CPU being crap. Who knows if they will ever figure this GPU out; it's been 3 days and we still can't confirm if it's 160 SP or 320 SP.
 
Wii U speculation: the gift that keeps on giving

My current preferred hypothesis is:

The Wii U GPU is 40nm, the shaders are in 8 blocks of 20, and the reason they're big / low density is that they're shrinks of 80/65 nm R6xx-based designs originally mooted for an "HD GameCube" platform that never came to pass. It's the only current hypothesis that adequately explains:

- Shader block sizes (why they look 55nm)
- Number of register banks
- edram density
- Marcan's "R600" references
- The "old" architectures for both the CPU and GPU
- The very Xbox 360 like level of performance

It might also explain the transparency performance if Nintendo decided to ditch the b0rked MSAA resolve / blend hardware in R6xx, or if the edram was originally intended to be on an external bus.

If Nintendo had intended to release a "GC HD" in 2006/2007, then R6xx on 80/65 nm and a multicore "overclocked" Gekko on 90 or 65 nm are precisely what they would have gone for. 90% of the 360 experience at 1/2 of the cost and 1/3 of the power consumption. Use 8~16 MB of edram and 256~512 MB of GDDR3 2000 and you've almost got Xbox 360 performance in a smaller, quieter machine that doesn't RRoD and vibrate.

Would have been quite something.

Edit: for anyone on NeoGaf reading this, I'm not suggesting that the Wii U design was lying around for years, I'm suggesting that the core technology used in the Wii U may have been decided on a long time ago when Nintendo had different objectives. Specifics of the Wii U design, such as clocks, edram quantity, process, main memory type, all that custom stuff for handling two displays etc would be much newer.
 
Any way to prove this theory? I guess I'll wait for the smart people to agree or disagree with you.
 
Any way to prove this theory?

Only way is to dig into the hardware and see which GPU generation it most closely resembles. If it's R6xx it's probably true, if it's R7xx it's probably not, and if it's anything later it's definitely not.

Nintendo had no way of knowing if the Wii bet would pay off and support their home console business. Up until it was clear they had a winner on their hands, parallel development of a more powerful competitor to the PS3 and Xbox 360 (the specs for which were leaked in early 2004) would have been a very smart thing for them to do. After that, there would be no point in letting the R&D go to waste for their next novel-controller based system especially as they were once again not targeting cutting edge graphics.
 
Function, that is one far-fetched hypothesis, to put it mildly. The last 3 points in particular:

- The Yamato stack on my netbook has R600 "references" in the shape of various strings in the stack binaries - that does not make the Yamato R600-related in any shape or form, at least not more than R700 is. That just means the codebase used for building the stack has legacy/cross-device AMD tech, as many elements from the stack (e.g. the compiler) are shared across AMD's entire codebase, and many elements that have first appeared in a given AMD design have retained their names through later generations, etc.

- "Old" architecture of CPU and GPU - frankly, I don't even know what that refers to. Am I missing some context here?

- The "very 360-like level or performance" - Latte is not a 1TFLOP GPU - what exactly should the prima-vista level of performance be - nothing like the 360? Or do you have some precise measurements you base that on?

As re the first 3 points - aren't those still subject to debate? Perhaps I've missed some newsbit, but have there been confirmations on the fab tech of the Latte die?
 
I think it's more likely RAM bandwidth and the CPU being crap. Who knows if they will ever figure this GPU out; it's been 3 days and we still can't confirm if it's 160 SP or 320 SP.
  • RAM bandwidth is unknown, could be several hundred GB/s if used correctly.
  • What exactly makes the CPU "crap" in your opinion?
  • I think we would have seen much, much bigger issues with the games so far if there were only 160 ALUs.
 
Being ported from a larger process doesn't make you end up with large transistors. You can look at several examples of shrinks where the layout looks extremely similar but they still get a huge density improvement. There's especially no reason for the SRAM banks to be larger than they should be.

If XBox 360 and Wii U have similar texel bandwidth and a game is texture fetch limited then all the extra ALU power in the world won't help it. And it's not surprising that cross platform games using the same engine will use shaders that are suitably balanced for the TMU:ALU capabilities of XBox 360 and PS3.

We don't really know if this is or isn't the case; with what we've seen so far both the hypothetical 160SP or 320SP variants are feasible (or something else entirely different of course). If unsigned code exploits are really already working on Wii U then it won't be that long before someone can run some tests and figure it out themselves.
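For rough context on what the 160 vs 320 question means in raw numbers, here's a back-of-envelope sketch. It assumes the ~550 MHz GPU clock discussed elsewhere in the thread and one MAD (two FLOPs) per SP per clock; the Xenos reference uses the usual 48 × vec4+scalar × 500 MHz math. These are peak rates only and say nothing about real utilisation.

```python
def peak_gflops(sps, clock_ghz, flops_per_sp_per_clock=2):
    # Peak rate only: assumes every SP retires one MAD (2 FLOPs) per clock.
    return sps * flops_per_sp_per_clock * clock_ghz

for sps in (160, 320):
    print(f"{sps} SPs @ 0.55 GHz: {peak_gflops(sps, 0.55):.0f} GFLOPS")  # 176 / 352

# Xenos reference: 48 vec4+scalar ALUs * 5 lanes * 2 FLOPs * 0.5 GHz
print(f"Xenos reference: {48 * 5 * 2 * 0.5:.0f} GFLOPS")                 # 240
```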

As far as eDRAM bandwidth is concerned, we can see there are 1024 columns present. It's a reasonable assumption that you get one bit at a time out of them, and probably no more than one per clock. So that'd give an upper limit of 70.4GB/s if a 550MHz clock is used. Of course, an asynchronous clock could be used, but I think they'd avoid this if they can since it adds buffering requirements and latency if the clock isn't an integer multiple. If we're talking DDR3 bandwidth then there's really no question that the upper limit is 12.8GB/s.
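Written out, the two ceilings from that paragraph look like this. The 1024 columns, the one-bit-per-column-per-clock assumption and the 550 MHz synchronous clock come from the post above; the DDR3 figure assumes DDR3-1600 on a 64-bit bus, which is what the 12.8 GB/s number implies.

```python
# eDRAM ceiling: 1024 columns, one bit per column per clock, synchronous clock.
edram_columns = 1024
gpu_clock_hz  = 550e6
edram_bw_gbs  = edram_columns * gpu_clock_hz / 8 / 1e9
print(f"eDRAM upper bound: {edram_bw_gbs:.1f} GB/s")   # ~70.4 GB/s

# DDR3 ceiling: assumed DDR3-1600 on a 64-bit bus.
ddr3_mtps      = 1600e6
bus_width_bits = 64
ddr3_bw_gbs    = ddr3_mtps * bus_width_bits / 8 / 1e9
print(f"DDR3 upper bound:  {ddr3_bw_gbs:.1f} GB/s")    # 12.8 GB/s
```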
 
These bottlenecks that the Wii U has are not even present in consoles made in 2005, so wouldn't that debunk the theory?


What theory? I'm not presenting any theories, I don't think. And of course those bottlenecks "existed" in 2005... I'm not quite sure what you mean there.

The point I'm making is that something having a bottleneck (wow, saying that word too much now, sounds weird) simply means there's some aspect of the hardware which is holding back the performance of another. That can and will happen on any piece of hardware to some extent. The potential problem when porting a game from platform A to platform B is that your code may be optimised to take advantage of one aspect of platform A which isn't present on platform B. Maybe that's easy to overcome if you have the time/money/inclination, but maybe you didn't have those things or maybe it just isn't easy to overcome. We don't know those variables, so we can't really use that scenario as a metric for judging system B's overall 'performance'. IMO.
 
Function, that is one far-fetched hypothesis, to put it mildly.

Yeah, but I wanted to try something that might justify having large 20 shader blocks as opposed to super dense 40 shader blocks that blow past Brazos and Llano for density.

- The Yamato stack on my netbook has R600 "references" in the shape of various strings in the stack binaries - that does not make the Yamato R600-related in any shape or form, at least not more than R700 is. That just means the codebase used for building the stack has legacy/cross-device AMD tech, as many elements from the stack (e.g. the compiler) are shared across AMD's entire codebase, and many elements that have first appeared in a given AMD design have retained their names through later generations, etc.

Well that's interesting to know, and means that what marcan found isn't proof of my hypothesis.

- "Old" architecture of CPU and GPU - frankly, I don't even know what that refers to. Am I missing some context here?

Old as opposed to new (or current). The CPU has been stated to be basically the same as a 1998 model in terms of features and performance/clock, just with higher clocks. Something like Bobcat should be massively faster and you don't need three Broadways for BC. As for the GPU, VLIW5 is several generations old now and R7xx wasn't even the most recent version of it. Time for customisation may be an issue that means newer designs couldn't be used, but even so such an old technology base seems a little strange when the highly customised Durango is looking at GCN+.

- The "very 360-like level or performance" - Latte is not a 1TFLOP GPU - what exactly should the prima-vista level of performance be - nothing like the 360? Or do you have some precise measurements you base that on?

Not exceeding the 360 in a 2012 design is quite something, when even a half crippled Llano from 2011 can do it on the CPU side and a mobile version can easily do it on the GPU side. I'm looking for a deeper reason for such an incredibly low level of performance although I'll freely admit that there doesn't actually have to be one.

As re the first 3 points - aren't those still subject to debate? Perhaps I've missed some newsbit, but have there been confirmations on the fab tech of the Latte die?

Nope, but the only fab size that seems to make sense wrt the 32 MB of edram is 40nm. I may have messed up the sums (wouldn't be the first time), but I don't think 55nm could provide the density, and I don't think anyone else's sums showed that it could either.
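For what it's worth, the sums go roughly like this. The cell sizes and overhead factor below are my own ballpark assumptions (loosely based on published embedded DRAM figures for those generations), and the ~150 mm² die size is the figure that's been floating around from the die shots, so treat the output as an illustration rather than a measurement.

```python
BITS = 32 * 1024 * 1024 * 8            # 32 MB of eDRAM in bits

# Assumed eDRAM cell areas in um^2 per bit (rough ballparks, not confirmed):
cell_area = {"55 nm": 0.11, "40 nm": 0.06}
OVERHEAD  = 1.5                        # guess for sense amps, decoders, redundancy

for node, cell_um2 in cell_area.items():
    raw_mm2   = BITS * cell_um2 / 1e6  # um^2 -> mm^2
    total_mm2 = raw_mm2 * OVERHEAD
    print(f"{node}: ~{raw_mm2:.0f} mm^2 of cells, ~{total_mm2:.0f} mm^2 with overhead")

# Against a ~150 mm^2 die, the 55 nm case leaves very little room for the
# rest of the GPU, which is the density argument in a nutshell.
```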
 
Old as opposed to new (or current). The CPU has been stated to be basically the same as a 1998 model in terms of features and performance/clock, just with higher clocks.
We don't know anything about the performance/clock, but the pretty massive cache alone should bring substantial improvements over Broadway. Feature wise, even Gekko was not the same as the 1998 model PPC750, and Espresso at the very least adds SMP on top of that, which no prior PPC750 ever had.
 
Being ported from a larger process doesn't make you end up with large transistors. You can look at several examples of shrinks where the layout looks extremely similar but they still get a huge density improvement. There's especially no reason for the SRAM banks to be larger than they should be.

I had thought that existing designs shrunk to smaller nodes tended to be less dense than new designs specifically targeting that node (Xenos/Xenon shrinks for example, vs "native" 65 nm GPUs). I was also intrigued by Fourth Storm's comments here:

Something that struck me was that R600 SPUs were practically identical to their R700 counterparts but less dense on the same process. Hmmm.

If true, I could imagine R600 SPUs at 40nm being less dense than those of Brazos, perhaps making the shader blocks look like they were super dense 40's or less dense 55nm 20's (the current debate). No idea if the SRAM blocks in Fourth Storm's example were larger though.
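To put some made-up numbers on that ambiguity: a perfect optical shrink from 55 nm to 40 nm only buys about a 0.53x area factor, and a lazy port that keeps most of the old layout gets less than that, so a loosely shrunk 40 nm block and a dense 55 nm block can end up uncomfortably close in size. The shrink-efficiency values below are illustrative, not measurements.

```python
ideal_area_scale = (40 / 55) ** 2            # ~0.53x area for a perfect shrink

block_55nm = 1.00                            # normalised area of a dense 55 nm block
for efficiency in (1.0, 0.8, 0.6):           # fraction of the ideal shrink actually achieved
    area_40nm = block_55nm * (1 - efficiency * (1 - ideal_area_scale))
    print(f"shrink efficiency {efficiency:.0%}: 40 nm block is ~{area_40nm:.2f}x the 55 nm block")
```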

If XBox 360 and Wii U have similar texel bandwidth and a game is texture fetch limited then all the extra ALU power in the world won't help it. And it's not surprising that cross platform games using the same engine will use shaders that are suitably balanced for the TMU:ALU capabilities of XBox 360 and PS3.

Yeah, and I made that point last night. Interestingly even a 160 shader Wii U could pull ahead of the 360 in TMU bound situations thanks to its higher clock, despite possibly having less main memory BW.

I still think that a 160 shader Latte might be able to outperform the 360.
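Quick numbers on that texturing point, under the assumption that Latte matches Xenos's 16 TMUs (which is exactly the unconfirmed bit) and runs at the ~550 MHz clock mentioned earlier in the thread:

```python
XENOS_TMUS, XENOS_CLK = 16, 500e6   # Xbox 360: 16 TMUs at 500 MHz
LATTE_TMUS, LATTE_CLK = 16, 550e6   # assumed TMU count; ~550 MHz per the clock figures above

print(f"Xenos: {XENOS_TMUS * XENOS_CLK / 1e9:.1f} Gtexels/s")            # 8.0
print(f"Latte (assumed): {LATTE_TMUS * LATTE_CLK / 1e9:.1f} Gtexels/s")  # 8.8
# ~10% in Latte's favour from clock alone, independent of the 160/320 SP question.
```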

We don't really know if this is or isn't the case; with what we've seen so far both the hypothetical 160SP or 320SP variants are feasible (or something else entirely different of course). If unsigned code exploits are really already working on Wii U then it won't be that long before someone can run some tests and figure it out themselves.

Yeah, and I'm not claiming one way or the other. I'm flip-flopping on 160 / 320 shaders on a day-to-day basis depending on which seems more likely. I'm certainly not claiming to have a definite answer. I'm hoping that someone like marcan really gets to the bottom of the hardware soon!
 