Wii U hardware discussion and investigation *rename

Not only is that die much smaller but it's most likely made on an older process. So RV710 sounds way too pessimistic.

It is, however, possible that Nintendo is blowing a little of that die space on including Hollywood for backwards compatibility purposes. I'm not saying Nintendo had to do this, but I wouldn't put it past them. In that case I would at least hope that the eDRAM is shared with it, but it'd still cost a couple dozen mm^2 or so.
 
But all of those components are modular, there is no need for it to be based on a particular retail GPU.

This is true of all the recent AMD GPUs. You all should really not get hung up on finding an exact retail GPU version of the WiiU. It doesn't exist. The specific combination of the different components in WiiU does not match any other instance of the family it is based on. Anyone who says "it's clearly an rv750!" or "no way, it's obviously an rv720!" is wrong.

Both of these statements are equally true:
rv710 is a variant of rv770.
rv770 is a variant of rv710.

If the WiiU is a 7xx, then does it really matter if it started with 710 and a few numbers were increased or if it started with 770 and a few numbers were reduced?


Nintendo will never release the exact config or clockspeed, but both will eventually be unofficially figured out via developer leaks and die shot analysis.
 
Remind me not to try to type replies on my phone.

The die size of the Wii U GPU is more than double that of the RV710 you mention.

Even accounting for the EDRAM and other logic it would be way off.

Indeed, even with eDRAM there's still room for an RV730 in that chip (knowing that the 730 was done at 55nm and this is 40nm, of course).
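As a rough back-of-the-envelope check of that claim, here's a small sketch. The RV730 area is the approximate public die size from memory, the ideal scaling factor assumes a perfect linear shrink (real shrinks never hit it), and the 156 mm^2 figure is the one quoted in this thread:

```python
# Rough die-area scaling, 55 nm -> 40 nm. Approximate figures, not measurements.

rv730_area_55nm = 146.0            # mm^2, RV730 (HD 4650/4670) at 55 nm, approx.
ideal_scale = (40.0 / 55.0) ** 2   # perfect shrink of both dimensions, ~0.53

rv730_area_40nm = rv730_area_55nm * ideal_scale
print(f"Ideal RV730 shrink to 40 nm: ~{rv730_area_40nm:.0f} mm^2")   # ~77 mm^2

wiiu_gpu_die = 156.0               # mm^2, the figure discussed in this thread
print(f"Left for eDRAM/other logic: ~{wiiu_gpu_die - rv730_area_40nm:.0f} mm^2")  # ~79 mm^2
```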
 
Nintendo will never release the exact config or clockspeed, but both will eventually be unofficially figured out via developer leaks and die shot analysis.
Is it likely that the config can be guessed at from a general arrangement of components? E.g. the 710 is:

80SP's
4 ROP's
8 TMU's

If the SP:ROP:TMU ratios are kept the same, assuming they are balanced like this as a minimum spec (high-end chips could add more SPs, but you'll want a minimum for texturing and rendering at 720p), and we assume 8 ROPs to match PS360, that'd be a config of

160 SPs
8 ROPs
16 TMUs

That'd be a little too large and hot I think, so perhaps they'd then reduce it to 120 SPs. Would they touch the TMUs? If BW is an issue, perhaps so. We can be confident that this GPU+eDRAM adds up to the 156mm^2, only without knowledge of whether Hollywood is included. I think that's enough for the more well-informed to present an R7xx config that'd fit the 120ish mm^2 at 40 nm. Bit of guesswork involved, but it should be the closest we've yet got.
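Here's a minimal sketch of the ratio argument above: take RV710's 80:8:4 SP:TMU:ROP baseline and scale every unit count by the same factor. The strict fixed-ratio assumption is just the one made in this post, not a hardware rule:

```python
# Scale an RV710-style baseline by integer-ish factors, keeping ratios fixed.

BASELINE = {"SPs": 80, "TMUs": 8, "ROPs": 4}   # RV710

def scaled(factor):
    return {unit: int(count * factor) for unit, count in BASELINE.items()}

print(scaled(1))     # {'SPs': 80,  'TMUs': 8,  'ROPs': 4}  -- RV710
print(scaled(1.5))   # {'SPs': 120, 'TMUs': 12, 'ROPs': 6}
print(scaled(2))     # {'SPs': 160, 'TMUs': 16, 'ROPs': 8}  -- the guess above
```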
 
160 SPs
8 ROPs
16 TMUs

That'd be a little too large and hot I think, so perhaps they'd then reduce it to 120 SPs. Would they touch the TMUs? If BW is an issue, perhaps so. We can be confident that this GPU+eDRAM adds up to the 156mm^2, only without knowledge of whether Hollywood is included. I think that's enough for the more well-informed to present an R7xx config that'd fit the 120ish mm^2 at 40 nm. Bit of guesswork involved, but it should be the closest we've yet got.

Would 160SP's or even 120SP's for that matter make the GPU many times faster than Xenos with 48SP's?

8 ROP's are a must, as otherwise it would be seriously crippled compared to Xenos and RSX.

On the comment of it being smaller than Wii U's GPU: do we have any confirmation that Wii U's GPU is indeed on a 40nm process, or is that just people guessing?

Surely, as well as the eDRAM sucking up extra transistors, there would also be added logic for the tablet?
 
160 SPs
8 ROPs
16 TMUs

That'd be a little too large and hot I think,
No, it would not (compare with RV730 and think of a shrunk version, or look at the difference between Cedar and Caicos at 40nm, or start with Redwood).
so perhaps they'd then reduce it to 120 SPs. Would they touch the TMUs?
Yes, they would. The TMUs are an integral part of the SIMDs/CUs starting with the R700 generation. AMD implemented two options: half-size SIMDs (8*5=40 SPs, wavefront size 32) with 4 TMUs (RV710 and RV730) and full-size SIMDs (16*5=80 SPs, wavefront size 64) with 4 TMUs (RV770 and later RV790 and RV740).

Edit:
Would 160SP's or even 120SP's for that matter make the GPU many times faster than Xenos with 48SP's?
Xenos basically has almost R600-style SIMDs, which should count as 240 SPs for a comparison (3 full-size 16*5=80 SP SIMDs). One just needs to keep in mind that they are less flexible, as they are not VLIW5 but vec4+1.
I guess 160 SPs at the same clock would have a hard time being significantly faster than Xenos (it would sometimes struggle just to keep up, and one would likely need 16 TMUs like Xenos for it). But the ROPs work quite a bit differently on recent GPUs, which adds uncertainty.
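The peak-MADD arithmetic behind that comparison, counting Xenos' 48 vec4+1 ALUs as 240 lanes as described above; the non-Xenos clocks are placeholders, not known Wii U figures:

```python
# Peak GFLOPS = lanes * 2 flops (MADD) * clock in GHz.

def peak_gflops(lanes, clock_ghz):
    return lanes * 2 * clock_ghz

print(peak_gflops(48 * 5, 0.5))   # 240.0 -- Xenos at 500 MHz
print(peak_gflops(160,    0.5))   # 160.0 -- 160 SPs at the same clock: slower on paper
print(peak_gflops(160,    0.75))  # 240.0 -- 160 SPs would need ~750 MHz just to match
```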
 
Xenos basically has almost R600-style SIMDs, which should count as 240 SPs for a comparison (3 full-size 16*5=80 SP SIMDs). One just needs to keep in mind that they are less flexible, as they are not VLIW5 but vec4+1.
I guess 160 SPs at the same clock would have a hard time being significantly faster than Xenos (it would sometimes struggle just to keep up, and one would likely need 16 TMUs like Xenos for it). But the ROPs work quite a bit differently on recent GPUs, which adds uncertainty.

Wouldn't the architectural improvements that AMD made with R700 over R600 cover some of the issues that could be faced when porting from 360 to Wii U?
 
Xenos basically has almost R600-style SIMDs, which should count as 240 SPs for a comparison (3 full-size 16*5=80 SP SIMDs). One just needs to keep in mind that they are less flexible, as they are not VLIW5 but vec4+1.
To my understanding they're closer to R5xx vertex shaders than R600 shaders, even though they're unified like R600's
 
Wouldn't the architectural improvements that AMD made with R700 over R600 cover some of the issues that could be faced when porting from 360 to Wii U?
Besides the LDS, the major difference is the arrangement of the TMUs. And that was mainly done to enable easier scaling to higher SIMD counts (quite successful if you look at RV730 or RV770). It is not inherently more efficient (the TMU load balancing is probably even slightly better with the R600 design, but it doesn't scale well).
If you come from Xenos, the largest difference from an arithmetic performance point of view is probably the VLIW5 shaders, which are more efficient for general purpose shader code than the quite rigid vec4+1 setup. But the latter is probably quite close for "traditional" shader code. Another large potential difference is the ROPs, imo.
To my understanding they're closer to R5xx vertex shaders than R600 shaders, even though they're unified like R600's
The R5xx vertex shader part is the vec4+1 arrangement instead of VLIW5. But for comparisons with later GPUs, it's better to think of it like slightly less flexible R600 SIMDs (and the TMU arrangement of Xenos was also close to R600).
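As a rough illustration of the VLIW5 vs. vec4+1 point above, here's a toy model. The instruction mix and the cycle counting are made up for illustration and ignore dependencies, co-issue restrictions and the transcendental unit:

```python
import math

def cycles_vliw5(scalar_ops, vec4_ops):
    # VLIW5: up to 5 independent scalar slots per cycle, any mix of work.
    return math.ceil((scalar_ops + 4 * vec4_ops) / 5)

def cycles_vec4_plus_1(scalar_ops, vec4_ops):
    # vec4+1: one 4-wide vector op plus one scalar op per cycle, in parallel.
    return max(vec4_ops, scalar_ops)

# "Traditional" vec4-heavy shading: roughly a wash.
print(cycles_vliw5(10, 10), cycles_vec4_plus_1(10, 10))   # 10 10
# Scalar-heavy, general purpose shader code: VLIW5 pulls clearly ahead.
print(cycles_vliw5(50, 0), cycles_vec4_plus_1(50, 0))     # 10 50
```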
 
AMD realized two possibilities: half sized SIMDs (8*5=40 SPs, Wavefront size 32) with 4 TMUs (RV710 and RV730) and full size SIMDs (16*5=80 SPs, Wavefront size 64) with 4 TMUs (RV770 and later on RV790 and RV740).
Why would a halved multiprocessor alter the wavefront size? All of AMD's VLIW architectures execute wavefronts over 8 cycles -- two WFs interleaving each other for simple and effective utilization.

Full-sized SIMD: 16 lanes * 8 cycles / 2 WF = 64 WF size.
A halved SIMD setup will simply keep the 8-cycle latency, minus the interleaving: 8 lanes * 8 cycles = 64 WF size.
 
Why would a halved multiprocessor alter the wavefront size? All of AMD's VLIW architectures execute wavefronts over 8 cycles -- two WFs interleaving each other for simple and effective utilization.

Full-sized SIMD: 16 lanes * 8 cycles / 2 WF = 64 WF size.
A halved SIMD setup will simply keep the 8-cycle latency, minus the interleaving: 8 lanes * 8 cycles = 64 WF size.
No, it doesn't. GPUs with halved SIMD sizes (RV610/620 even had quarter size ;)) still interleave 2 wavefronts. The wavefronts get smaller with reduced SIMD size. One can actually ask some APIs (like CAL) for the wavefront size, and it returns 32 for GPUs with half-size SIMDs and 64 for full-size SIMDs (edit: and iirc it is also officially documented somewhere; edit2: AMD's Mica Villmow confirms it here).
The reason is probably some pipelining issue with the register file accesses or branches or whatever. It's simply easier to reduce the wavefront size, as everything else stays the same that way.
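The two models from this exchange, written out as plain arithmetic (a pair of interleaved wavefronts over 8 cycles, i.e. 4 cycles per wavefront):

```python
def wf_size_interleaving_kept(simd_lanes, cycles=8, interleaved_wfs=2):
    # The 2-wavefront interleave stays, so WF size tracks SIMD width.
    return simd_lanes * cycles // interleaved_wfs

def wf_size_interleaving_dropped(simd_lanes, cycles=8):
    # The assumption questioned above: a single WF fills all 8 cycles by itself.
    return simd_lanes * cycles

print(wf_size_interleaving_kept(16))    # 64 -- full-size SIMD (RV770 etc.)
print(wf_size_interleaving_kept(8))     # 32 -- half-size SIMD (RV710/RV730)
print(wf_size_interleaving_dropped(8))  # 64 -- what the original question assumed
```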
 
Thanks for the clarification.

My puzzlement came from a statement Nvidia made to developers a few years ago, that they would continue to keep the warp size at 32 for all their GPU implementations. I thought AMD was sort of on the same bandwagon.
 
Thanks for the clarification.

My puzzlement came from a statement Nvidia made to developers a few years ago, that they would continue to keep the warp size at 32 for all their GPU implementations. I thought AMD was sort of on the same bandwagon.
They are now: starting with the R900/Northern Islands generation, all GPUs (even the smallest, Caicos) have a wavefront size of 64 (in R800/Evergreen, only Cedar had half-size SIMDs, plus Ontario/Zacate if one counts APUs), and future incarnations of GCN will probably keep this.
 
I think RV710, Cedar, or even Caicos are out of the question when you consider the die size and the performance they provide. Caicos comes close and, at 750 MHz, matches the 240 GFLOPS of Xenos, but it still only has 8 TMUs and 4 ROPs. RV730 or Redwood continue to make the most sense, unless Nintendo had AMD build something custom like a 240 SP, 24 TMU, 12 ROP part with a nice clock speed like 750 MHz or so to guarantee a truly substantial boost over Xenos. Though shrinking RV730 to 40 nm or using Redwood would just make more financial sense, I think. Fun to speculate, though, but even considering the eDRAM, the die size doesn't add up with RV710, Cedar, or Caicos.
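For reference, a quick numbers check on the parts named above, using typical retail unit counts and clocks from memory (approximate) plus this post's hypothetical custom part; peak GFLOPS = SPs * 2 * GHz as before:

```python
parts = {
    # name:                (SPs, TMUs, ROPs, clock in GHz) -- approximate retail specs
    "Xenos (as 240 SPs)":   (240, 16,  8, 0.500),
    "RV710":                ( 80,  8,  4, 0.600),
    "Caicos":               (160,  8,  4, 0.750),
    "RV730":                (320, 32,  8, 0.750),
    "Redwood":              (400, 20,  8, 0.775),
    "Hypothetical custom":  (240, 24, 12, 0.750),
}

for name, (sps, tmus, rops, ghz) in parts.items():
    print(f"{name:22s} {sps * 2 * ghz:5.0f} GFLOPS  ({sps} SPs, {tmus} TMUs, {rops} ROPs)")
```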
 
Even if they had to include a whole Hollywood chip into the design (which they didn't according to the Iwata asks) this would probably leave something like 100mm² for the GPU alone (without eDRAM). 400 or 480 shader units aren't out of the question IMO, so something like Redwood or Turks (I know these are one resp. two generations newer, but just to get a feeling).
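The die-budget arithmetic behind that "~100 mm² for the GPU logic" guess, as a sketch. The Hollywood and eDRAM areas below are placeholder assumptions picked only to reproduce that figure, not known numbers:

```python
total_die      = 156.0   # mm^2, the figure quoted earlier in the thread
hollywood_area =  20.0   # mm^2, guess for a Hollywood-class block carried along
edram_area     =  36.0   # mm^2, guess for the eDRAM macro

print(f"Left for GPU shader/TMU/ROP logic: ~{total_die - hollywood_area - edram_area:.0f} mm^2")
# ~100 mm^2
```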
 
Based on the latest DF article, anyone got thoughts on this type of crazy argument:
- the Wii-U compresses the video and sends it to the tablet.
- if the image sent to the tablet is torn, the stream gets artifacts and looks ugly.

That would leave developers stuck between a rock and a hard place in terms of features if they intend to use the tablet, and might explain why the Wii-U seems to perform below its spec?
 
Even if they had to include a whole Hollywood chip into the design (which they didn't according to the Iwata asks) this would probably leave something like 100mm² for the GPU alone (without eDRAM). 400 or 480 shader units aren't out of the question IMO, so something like Redwood or Turks (I know these are one resp. two generations newer, but just to get a feeling).
While I would be pleased by the idea that the WiiU has a Redwood under the hood, I find it startling that the system performs so badly. I run a Redwood in the laptop I'm currently typing on, and it easily outperforms the 360.
To me, looking at Redwood or Turks or an AMD APU, the only conclusion I can draw from watching how a game like CoD struggles while rendering at 880x720 is that the overall design sucks badly. It will get better, but such crappy results should not happen to begin with.

Anybody else who released such a hardware platform (not even as a console) would get mocked, and for a reason. I guess Nintendo is something special, like Apple, with the difference that Apple comes with good hardware; their latest CPU is as good as it gets within its power budget.
 
Even if they had to include a whole Hollywood chip into the design (which they didn't according to the Iwata asks) this would probably leave something like 100mm² for the GPU alone (without eDRAM).

Could you link me to the interview that referred to Wii GPU support? Or preferably, could you give me the snippet that addresses it?
 
Based on the latest DF article, anyone got thoughts on this type of crazy argument:
- the Wii-U compresses the video and sends it to the tablet.
- if the image sent to the tablet is torn, the stream gets artifacts and looks ugly.

That would leave developers stuck between a rock and a hard place in terms of features if they intend to use the tablet, and might explain why the Wii-U seems to perform below its spec?
There may be something to that. The compression is probably MJPEG, which isn't going to work if you suddenly change three quarters of the framebuffer while it's supposed to be being compressed. This could see Nintendo enforcing a vertical sync to ensure the FB is complete before being compressed and broadcast.
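A minimal sketch of that idea: only hand complete frames to the encoder by grabbing them at vsync, instead of compressing a buffer that is mid-update. All names and the encoder/sender hooks here are illustrative speculation, not Nintendo's actual API:

```python
import queue

frames_to_encode = queue.Queue(maxsize=2)

def on_vsync(framebuffer):
    """Called once per display refresh; the buffer is guaranteed consistent here."""
    try:
        frames_to_encode.put_nowait(bytes(framebuffer))  # snapshot the frame
    except queue.Full:
        pass  # drop a frame rather than stall the renderer

def encoder_loop(encode, send):
    """Compress (MJPEG-style, per the speculation above) and broadcast each frame."""
    while True:
        send(encode(frames_to_encode.get()))
```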

Anyone know how remote play on PSP/Vita works by comparison on games that tear?
 