Wii U hardware discussion and investigation *rename

Commenter,
PC games suffer from an immense amount of driver overhead compared to consoles, you can't compare straight off like that. Each draw call on a PC brings with it waitstates on the order of 30k cycles or something incredibly ridiculous like that due to caches getting flushed etc while the driver switches back and forth between kernel and user mode, additional layers of hardware abstraction that a console simply does not need, and so on.
 
Commenter,
PC games suffer from an immense amount of driver overhead compared to consoles, you can't compare straight off like that. Each draw call on a PC brings with it waitstates on the order of 30k cycles or something incredibly ridiculous like that due to caches getting flushed etc while the driver switches back and forth between kernel and user mode, additional layers of hardware abstraction that a console simply does not need, and so on.



It doesn't sound that bad.
http://forum.beyond3d.com/showpost.php?p=1725711&postcount=413
So yeah, I'm not buying the argument that you can't do pretty much whatever you want already on PC. Like I said, some of the best looking stuff (Frostbite, etc) uses a few thousand draw calls, and you could use 10x more and still be fine. Sure the overhead will be lower on consoles, but prove to me that it matters.
Though as you say there are other factors as well.
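
Just to put rough numbers on it (a back-of-envelope sketch; the CPU clock and draw-call count here are assumptions for illustration, not measurements):

```python
# Back-of-envelope check on the quoted draw-call overhead figure.
# All inputs are assumptions for illustration, not measurements.
cpu_hz          = 3.0e9      # assumed ~3 GHz CPU core
cycles_per_call = 30_000     # the per-draw-call overhead figure quoted above
draw_calls      = 3_000      # "a few thousand", Frostbite-class
frame_budget_s  = 1 / 60     # 60 fps target

overhead_s = draw_calls * cycles_per_call / cpu_hz
print(f"driver overhead per frame: {overhead_s * 1e3:.0f} ms "
      f"({overhead_s / frame_budget_s:.0%} of a 60 fps frame)")
# ~30 ms, i.e. nearly two whole 60 fps frames. If the 30k-cycle figure were
# literally true, engines issuing thousands of draw calls couldn't hit their
# frame rates at all, which is exactly why the linked post disputes it.
```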
 
Just for reference: an A4-3300 comes with a 160-SP GPU, and it seems to be struggling quite badly with Assassin's Creed 3 at low resolution and settings. Worse than the frame rates on the Wii U, at least.

http://www.youtube.com/watch?v=Cyljv8IUWns

The Wii U GPU is clocked some 24% faster than an A4-3300. It should also have more bandwidth thanks to the edram, very likely has double the ROPs of the A4-3300 (8 vs 4), and might even have double the TMUs. Oh and it's not running Windows, which probably helps a lot too.

Still looking very good for 160 shaders IMO.
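
For what it's worth, the peak-ALU arithmetic behind that comparison looks like this (published clocks; the 160-SP count for Latte is the hypothesis under discussion, not a known fact):

```python
# Peak ALU throughput, assuming VLIW5 SPs doing one FMA (2 flops) per clock.
def gflops(shaders, mhz):
    return shaders * 2 * mhz / 1000

latte_160 = gflops(160, 550)   # Wii U "Latte" at its reported ~550 MHz
a4_3300   = gflops(160, 444)   # A4-3300's HD 6410D: 160 SPs at 444 MHz

print(latte_160, a4_3300, latte_160 / a4_3300)
# -> 176.0 vs ~142.1 GFLOPS, a ~1.24x gap, i.e. the "some 24% faster" above,
# before counting the eDRAM bandwidth and the extra ROPs/TMUs.
```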
 
I hoped someone else would do this, but...
The consensus has been 160 ever since someone suggested the possibility.

No.

Back in February, it was quite conclusively shown that the die area of the "Latte" SPU blocks closely matches a set of 320 SPUs. The comparison with Zacate shows that it could even accommodate 320 of its DX11 VLIW5 version, with the small difference accounted for by the fact that Latte was launched with another two years of experience with the process as well as lower clock targets.

Alternatively, those who preferred to base their estimates on the earlier version of the architecture scaled the SPU blocks from the RV770 55nm die shot (since no RV740 die shots are available) and came to the same conclusion.

The 160 SPU claim requires that you turn a blind eye to the actual physical evidence and instead look at a small set of games ported from the PS360, assume that any frame-rate issues you observe are due to a lack of GPU ALU FLOPS rather than any other hardware difference such as CPU architecture or memory bandwidth, and ignore all the issues inherent in porting existing code to a new platform on a commercial schedule.

Those of us who have no problem accepting the physical evidence, have simply not had any reason to participate in a discussion based on such ... odd ... premises.
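
For anyone who wants to see the shape of that area argument, here is the logic as a sketch; the block areas below are placeholders, not the actual die-shot measurements from the February analysis:

```python
# Area-scaling logic only; the mm^2 figures are hypothetical placeholders.
ref_node_nm, latte_node_nm = 55, 40                  # RV770 reference vs Chipworks' 40 nm claim
ideal_shrink = (latte_node_nm / ref_node_nm) ** 2    # ~0.53x area for the same logic

rv770_80sp_block_mm2  = 10.0                         # placeholder: one 80-SP RV770 SIMD block
expected_80sp_at_40nm = rv770_80sp_block_mm2 * ideal_shrink

print(f"ideal shrink factor: {ideal_shrink:.2f}")
print(f"expected 80-SP block at 40 nm: {expected_80sp_at_40nm:.1f} mm^2 (placeholder input)")
# The February comparison put the measured Latte SPU blocks against numbers
# derived like this (and against Zacate's 40 nm blocks). If the measured blocks
# land near the full 80-SP size rather than half of it, the area points at
# 4 x 80 = 320 SPs, and the whole chain rests on the 40 nm process claim.
```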
 
Yup, seems like 160 SP.

I mean, if 160 SPs can account for Wuu perf levels, why do we think Nintendo would put a more expensive part in?

It's just like how the fanboy dreams of POWER7 PPC cores came crashing down when that marcan homebrew guy revealed it was just three overclocked Broadway cores (which some people had jokingly suggested as a CPU choice, more to troll the Nintendo fanboys than anything).


Why has it got to be something to do with "fanboys"? As far as I'm aware the people theorising 160 SP now were the same people who originally theorised 320 SP earlier on. There was no agenda from "fanboys" trying to make it look better than it was: just clever people progressively* trying to work out the details from what little evidence they have.

No need to bring that sort of rubbish into the discussion, imo.

*it's also still in progress, as neither 160 SP nor 320 SP has been categorically proven. Could be more, could be less. But I'm not clever enough to have an input there ;)
 
I hoped someone else would do this, but...


No.

Back in February, it was quite conclusively shown that the die area of the "Latte" SPU blocks closely matches a set of 320 SPUs. The comparison with Zacate shows that it could even accommodate 320 of its DX11 VLIW5 version, with the small difference accounted for by the fact that Latte was launched with another two years of experience with the process as well as lower clock targets.

Alternatively, those who preferred to base their estimates on the earlier version of the architecture scaled the SPU blocks from the RV770 55nm die shot (since no RV740 die shots are available) and came to the same conclusion.

The 160 SPU claim requires that you turn a blind eye to the actual physical evidence and instead look at a small set of games ported from the PS360, assume that any frame-rate issues you observe are due to a lack of GPU ALU FLOPS rather than any other hardware difference such as CPU architecture or memory bandwidth, and ignore all the issues inherent in porting existing code to a new platform on a commercial schedule.

Those of us who have no problem accepting the physical evidence, have simply not had any reason to participate in a discussion based on such ... odd ... premises.

So the number of sram banks doesn't class as physical evidence? Can you point me to any AMD VLIW5 variant that matches your proposed "twin ported sram (iirc)" 320 shader arrangement?

Also you don't know what process the chip is made on - there's only a single uncorroborated statement by Chipworks who don't actually appear to have put the chip under an electron microscope.

Nah, I'll take the 160 shader option as my preferred hypothesis. That's based on the weight of evidence, and with no directly contradicting physical evidence, at least as I see it.
 
So the number of sram banks doesn't class as physical evidence? Can you point me to any AMD VLIW5 variant that matches your proposed "twin ported sram (iirc)" 320 shader arrangement?
The SRAM bank arrangement is orthogonal to the issue of the density of the SPU array itself.
(And besides, there is no reason to assume that the SRAM would be arranged in exactly the same way on a new GPU).

Also you don't know what process the chip is made on - there's only a single uncorroborated statement by Chipworks who don't actually appear to have put the chip under an electron microscope.

If Jim Morrison at Chipworks makes the flat-out written statement: "This chip is fabricated in a 40 nm advanced CMOS process at TSMC.", well, that is pretty much as factual as it gets. This is what they do in the industry. He knows. If he didn't, he wouldn't make that statement about one of their jobs, and he hasn't retracted or qualified it afterwards either; he has let it stand.

In fact, this is a much stronger data point than, for instance, the VLIW5 assumption or the amount of eDRAM. Since we know the process, we can measure and calculate densities directly. There is no question that it comes out to 320 SPUs, assuming a reasonably close relation to AMD's other products.

Nah, I'll take the 160 shader option as my preferred hypothesis.
You're free to prefer whatever the heck you want. Active minds have been chewing on bad data based on flawed premises for months, and that's OK and presumably provides some satisfaction or pleasure for the parties involved. But someone like me hasn't had much interest in this thread since the first week or so after the die shot was published, once what could be extracted from it had been done. Personally, I'm interested in the CPU-to-GPU interface, but I wouldn't dream of trying to draw conclusions about it from the frame rates of a Batman port... particularly if I already had physical data telling me the answer.

But this is a discussion forum, and people will discuss even in the absence of useful data. And I'm fine with that, as long as I'm not implied to be part of some "consensus".
 
Yes, this is the first time I've seen it too.

http://www.neogaf.com/forum/showthread.php?t=511628

Back in February, it was quite conclusively shown that the die area of the "Latte" SPU blocks closely matches a set of 320 SPUs. The comparison with Zacate shows that it could even accommodate 320 of its DX11 VLIW5 version, with the small difference accounted for by the fact that Latte was launched with another two years of experience with the process as well as lower clock targets.


Personally looking again at this

http://uk.ign.com/videos/2013/06/12/rewind-theater-x-e3-2013-trailer

Makes it hard to believe that the Wii U GPU is only 30-50% better than the Xbox 360, which is the best case for 320 SPUs. I mean, that trailer shows a bigger landscape than anything last gen and, at the same time, more detail than anything last gen, all of that with some impressive soft self-shadowing in the open outdoor scenes, which is something I haven't seen any game from last gen do, not even The Last of Us (it is very easy to spot on the mechs, and also on the big weapons the characters carry), plus other shading effects. It certainly surpasses what is shown in The Last of Us, and considering the difference in experience with shading and with their respective consoles between the two teams, that is huge.
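
(For reference, the "30-50% better" best case comes straight out of the peak-ALU arithmetic below; the clocks are the commonly cited ones, and peak FLOPS says nothing about texturing, bandwidth or feature set.)

```python
# Peak ALU comparison behind the "best case for 320 SPUs" figure.
xenos = 48 * 5 * 2 * 500 / 1000    # Xenos: 48 units, 5-wide, FMA, 500 MHz -> 240 GFLOPS
latte = 320 * 2 * 550 / 1000       # 320 SPs, FMA, ~550 MHz                -> 352 GFLOPS
print(latte / xenos)               # ~1.47, i.e. roughly +47% peak ALU over Xbox 360
```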

The DX11 GPU would also take a lot of strain off the CPU (even without doing complex GPGPU), a CPU which wouldn't be much good for number crunching (for animation, audio and the like) but actually has a nice architecture for things like AI, data searching, thread management... That said, we didn't see any of the "typical DX11 shading effects" that we have been shown in tech demos from the likes of AMD or Nvidia; maybe it just doesn't have the raw power for that.

What about 320 DX10 SPUs? Or even 400? How would that compare to the Zacate?

That's without even talking about the big parts of the die that we can't even speculate about (we know there is a tessellator somewhere, an audio DSP, and an ARM core too, but would those take up that much space?).


Anyway, quite unusual chips on both the CPU and GPU...
 
Well, don't forget the Wii U has 1 GiB of memory for games alone, whereas the PS360 each have 512 MiB minus what the OS takes.
Also, the X360 CPU should really be considered a 6-core, in-order (that part matters a lot performance-wise), 1.6GHz chip, and some of that is reserved.
(Not sure what Nintendo does for the Wii U.)

It looks good given the field of view for sure, but that might also come from "advanced" (if you forget they existed in hardware 10 years ago already :p) GPU features such as Virtual Texturing, that would save even more memory (at the expense of bandwidth).

It is probably beyond what a PS360 could do, but it's quite hard to tell for sure, and even harder to state why (memory ? computing power ? additional GPU features ?)
 
The SRAM bank arrangement is orthogonal to the issue of the density of the SPU array itself.
(And besides, there is no reason to assume that the SRAM would be arranged in exactly the same way on a new GPU).

I disagree. There is a strong reason to expect the number and size of the SRAM banks to resemble other VLIW5 designs, because anything else would have required engineering resources that haven't been seen reused anywhere else, and for no apparent performance increase (same register size and bandwidth per SIMD unit) but with increased design complexity.

I also don't accept that SRAM banks of a given (memory) size are orthogonal to the die area they take up (I don't understand your thinking on this). In fact, I'd say the die area of SRAM banks of a given size is likely to be very strongly influenced by process node and transistor density. I would actually expect them to have a more consistent die area across different GPUs produced on the same node than the SIMD units themselves, which on the Wii U may include modified designs and additional features and be laid out using different tools (etc.).

This post (and the one it quotes) describes why the SRAM banks are an issue for your "320 shader" hypothesis. It is very much based on the kind of "physical evidence" that you indicated you were unique in considering, and that's why I felt compelled to respond.

http://forum.beyond3d.com/showpost.php?p=1703033&postcount=4524
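
To make explicit why the bank sizing matters (a rough sketch, assuming AMD's documented 256 KB of GPRs per 80-SP VLIW5 SIMD and that GPR capacity scales linearly with SP count; neither is a die-shot measurement):

```python
# Register-file SRAM implied by each hypothesis, under the assumptions above.
GPR_KB_PER_80_SPS = 256
for total_sps in (160, 320):
    print(total_sps, "SPs ->", GPR_KB_PER_80_SPS * total_sps // 80, "KB of GPR SRAM")
# 160 SPs -> 512 KB, 320 SPs -> 1024 KB: the two hypotheses imply register
# arrays differing by a factor of two, which is why counting and sizing the
# SRAM banks (as in the linked post) is treated as physical evidence here.
```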

If Jim Morrison at Chipworks makes the flat-out written statement: "This chip is fabricated in a 40 nm advanced CMOS process at TSMC.", well, that is pretty much as factual as it gets.

We perhaps disagree somewhat on what a "fact" is. He may of course be right, but we don't know how he came to this conclusion. He might know for sure (a trusted inside source, or a test of some kind conducted on the silicon) and be relaying a fact, or he might be making an assumption and presenting an opinion. Or possibly a bit of both (e.g. it's definitely TSMC, but on their 55nm eDRAM-friendly process). I don't know how you would determine the node for sure without using an electron microscope, but then again I might just be being very naive.

How would one know the process node just by looking at a die shot? Assuming it doesn't say on the chip, of course.

In fact, this is a much stronger data point than, for instance, the VLIW5 assumption or the amount of eDRAM. Since we know the process, we can measure and calculate densities directly. There is no question that it comes out to 320 SPUs, assuming a reasonably close relation to AMD's other products.

I think SRAM is the strongest piece of evidence, as it represents how the chip actually functions, but I guess we'll just have to agree to disagree and all that.

But this is a discussion forum, and people will discuss even in the absence of useful data. And i'm fine with that, as long as I'm not implied to be part of some "consensus".

That's fine, you shouldn't feel your name is attached to something that you don't agree with.
 
Well, don't forget the Wii U has 1 GiB of memory for games alone, whereas the PS360 each have 512 MiB minus what the OS takes.
Also, the X360 CPU should really be considered a 6-core, in-order (that part matters a lot performance-wise), 1.6GHz chip, and some of that is reserved.
(Not sure what Nintendo does for the Wii U.)

It looks good given the field of view for sure, but that might also come from "advanced" (if you forget they existed in hardware 10 years ago already :p) GPU features such as Virtual Texturing, that would save even more memory (at the expense of bandwidth).

It is probably beyond what a PS360 could do, but it's quite hard to tell for sure, and even harder to state why (memory ? computing power ? additional GPU features ?)

In your opinion would a 6450 with 8 ROPs, console clocks and no real bandwidth constraints be able to hang with the PS360 in a closed box, console environment?
 
The Wii U GPU is clocked some 24% faster than an A4-3300. It should also have more bandwidth thanks to the edram, very likely has double the ROPs of the A4-3300 (8 vs 4), and might even have double the TMUs. Oh and it's not running Windows, which probably helps a lot too.

Still looking very good for 160 shaders IMO.

The Wii U is using an older architecture, though. A 24% increase in clocks wouldn't make up for that.

Though, it's not like they can be compared directly either way.

I don't know. I just really can't see 160SPs matching current-gen, especially if some form of GPGPU is being used to make up for the pathetic CPU.
 
The Wii U is using an older architecture, though. A 24% increase in clocks wouldn't make up for that.

Though, it's not like they can be compared directly either way.

I don't know. I just really can't see 160SPs matching current-gen, especially if some form of GPGPU is being used to make up for the pathetic CPU.

That's an interesting point; it might be good to track down some VLIW5 benchmarks from across GPU generations. Although the Wii U might not strictly adhere to any particular generation (somewhere in the middle? some custom stuff?).
 
It's gonna be VLIW or GCN (and not GCN, obviously). Creating a whole new GPU architecture is costly and makes no sense - AMD already had viable options for Nintendo to pick from.
 
It's gonna be VLIW or GCN (and not GCN, obviously). Creating a whole new GPU architecture is costly and makes no sense - AMD already had viable options for Nintendo to pick from.

By "custom" I was thinking along the lines of "between 4xxx and 5xxx" with possibly some kind of GPU compute features added (like Xenos' memexport). But firmly rooted in the PC VLIW5 line of technology.
 
Also, the X360 CPU should really be considered a 6-core, in-order (that part matters a lot performance-wise), 1.6GHz chip, and some of that is reserved.

I don't agree with this. It can execute an instruction every cycle for a single thread at 3.2GHz. The threads can be executed round-robin, but that's not the only mode.

It's a terrible 3.2GHz CPU though. Being in-order is only the tip of the iceberg.
 
Personally, all things considered, I'm going with 240SPs. Seems to make the most sense.

Don't think that could be made to fit with the 4 SIMD blocks and the number of SRAM banks in each SIMD array I'm afraid. At this point only 160 (or 320 if you're confident about transistor density and ignore the sram issue) fit the die shot.
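
Roughly why 240 is awkward, if the 4-block reading is right (just the division, nothing more):

```python
# How each candidate SP count divides across 4 SIMD blocks of 5-wide VLIW units.
for total_sps in (160, 240, 320):
    per_block  = total_sps // 4    # SPs per SIMD block
    vliw_units = per_block / 5     # 5-wide VLIW units per block
    print(total_sps, per_block, vliw_units)
# 160 -> 40 SPs (8 units) per block and 320 -> 80 SPs (16 units) per block both
# match SIMD widths AMD actually shipped; 240 -> 60 SPs (12 units) per block
# would, as far as I know, need a custom SIMD width on top of everything else.
```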
 
Don't think that could be made to fit with the 4 SIMD blocks and the number of SRAM banks in each SIMD array I'm afraid. At this point only 160 (or 320 if you're confident about transistor density and ignore the sram issue) fit the die shot.

You can't really say that without knowing what process is used. You're assuming 55nm. If it's actually 45 or 40nm, it might be possible.
 