Wii U hardware discussion and investigation *rename

Regarding the LoT comparison: actually pretty impressive (knowing what we know about the Wii U's innards) to maintain roughly the same framerates as the other two whilst being v-sync locked. Shame about the loading times, but I'd be interested to see whether a DD version solves that. Yeah, they've left the texture pack out on 360, but if anything that would have decreased the performance, wouldn't it? Makes it an unfair comparison though, yes.

I'm not sure how to read the framerate test, but if clip 1 has 5471 frames of which 29% were torn on X360, it produced 3884 unique frames. The Wii U, on the other hand, has 5223 unique frames. Given the average FPS and frame counts, this clip is 207 seconds on X360 and 192 seconds on Wii U.

So, does this mean that X360 was able to produce 3884 frames in 207 seconds and the Wii U 5223 frames in 192 seconds (which indicates ~1.4 times more power)?
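
Spelling out that arithmetic (a rough sketch that treats every torn frame as a wholly dropped one, which may well be the wrong way to count them):

Code:
# back-of-envelope only: treats each torn frame as a complete drop
x360_frames  = 5471                 # frames in clip 1 on X360
x360_torn    = 0.29                 # 29% of them showed a tear
x360_unique  = x360_frames * (1 - x360_torn)   # ~3884
x360_seconds = 207                  # clip length implied by the average FPS
wiiu_unique  = 5223                 # v-sync locked, so every frame is unique
wiiu_seconds = 192

x360_fps = x360_unique / x360_seconds   # ~18.8 unique frames/s
wiiu_fps = wiiu_unique / wiiu_seconds   # ~27.2 unique frames/s
print(round(wiiu_fps / x360_fps, 2))    # ~1.45 -> the "~1.4 times" above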
 
A torn frame represents a partial frame drop, not a complete one as you are counting it. If the tear happens in the top 10% of the frame, you are losing 10% of the unique pixels of the new frame. If it happens in the middle, you are losing half. Only completely dropped frames count the way you are counting them. If we take a random 33% tear position, one third of 29% of frames are not completely drawn. That means 1/3 of 29% is the difference, or about 10% - the Wii U would be drawing about 10% more pixels.

I'm not even sure that's right, because the non-drawn part of the framebuffer is still rendered and displayed the next frame. It's the movement of the tear point that determines how much is being lost. The number of dropped pixels isn't something that can be calculated without recording every tear position.
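
To put rough numbers on the guesswork anyway (a sketch under assumed average tear positions, not a calculation; the real loss depends on where each individual tear landed):

Code:
# fraction of the clip's pixels "lost" to tearing, for a few assumed
# average tear positions (fraction of the frame above the tear line)
torn_fraction = 0.29                    # 29% of X360 frames tore
for avg_tear_pos in (0.10, 0.33, 0.50):
    lost = torn_fraction * avg_tear_pos
    print(avg_tear_pos, round(lost, 3))
# 0.33 gives ~0.10, i.e. the "about 10%" figure above; without recording
# every tear position this stays a guess rather than a measurement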
 
Different systems have different architectures, and therefore different strengths and weaknesses; running the same code on them will end up with different results. It's so obvious I didn't see the point in saying it, but you're most likely right to write it down for everyone to learn.

One point was that using multi-platform ports to make statements about the WiiU GPU specifically is foolish. Neither advances in programmability nor advances in raw computational power get any opportunity to play a role, unless the game is recoded to be something new from a graphics point of view. There are bottlenecks in other areas that will most likely hamper performance on the WiiU, and which are hard to avoid for titles that target the PS360.

Another point is that Single Figure of Merit thinking will trip you up and sharply limit your understanding in architectural discussions. Basically, anyone who makes a statement along the lines of "console X is 30% more powerful than console Y" demonstrates their cluelessness. And when the Single Figure of Merit of the day in forum wars is GPU marketing FLOPS, the one thing that is most decidedly NOT measured by WiiU multi-platform ports, the WiiU discussion specifically gets both embarrassing and bizarre.

I'm still hoping for some "best practices" input from developers on how to make the most out of the WiiU architecture, maybe even with some comments on how this compares to the other consoles.
 
A torn frame represents a partial frame drop, not a complete one as you are counting it. If the tear happens in the top 10% of the frame, you are losing 10% of the unique pixels of the new frame. If it happens in the middle, you are losing half. Only completely dropped frames count the way you are counting them. If we take a random 33% tear position, one third of 29% of frames are not completely drawn. That means 1/3 of 29% is the difference, or about 10% - the Wii U would be drawing about 10% more pixels.

I'm not even sure that's right, because the non-drawn part of the framebuffer is still rendered and displayed the next frame. It's the movement of the tear point that determines how much is being lost. The number of dropped pixels isn't something that can be calculated without recording every tear position.

I'm not sure. Though reevaluating this makes me think they reported the unique frames per second. That boils down to a ~150 second clip in both the PS3 and X360 cases. I'll look at it again when I get home.
 
Looking at the tests for Splinter Cell, it seems like the biggest handicap isn't the lack of TMUs or even the CPU; it's the lack of fast storage for streaming - something that having more main RAM can't necessarily make up for. The 360 Arcade has thrown the Wii U a lifeline in terms of multiplatform games.

God knows how screwed the Wii U would be with a fast moving, free roaming sandbox game like GTA V that uses mandatory installs for streaming both textures and geometry.

Regarding the TMUs, it seems possible now that a number of port issues could be down to simply not having enough of them. Issues like missing vegetation that used transparencies, frame rate drops when there was lots of transparent texture overdraw (fire and smoke in Blops), a reduction in motion blur passes for Tekken, less-than-ideal aniso, and things like that could have been due to the performance impact of needing lots of samples in a short period versus not having many texture mapping units. It's hard to say that any one of those things is down to having 8 TMUs, but maybe now there's a pattern starting to emerge?

Other than the 8 TMUs thing, and possibly a slight disadvantage with some BW-heavy ROP action vs Xenos (or possibly not), it does seem like the Wii U GPU should be better though. Which is cool, just as long as you don't dwell on just how incredibly ancient Xenos is, and the fact that 4Bone are launching in a couple of months.
 
Different systems have different architectures, and therefore different strengths and weaknesses; running the same code on them will end up with different results. It's so obvious I didn't see the point in saying it, but you're most likely right to write it down for everyone to learn.
Sometimes there are unexpected strengths to them... As you can read in this article, an almost-blind little girl can play video games because she uses the Wii U's tablet to play; otherwise she can't see what's happening on the TV screen at all.

http://www.zenspath.com/home/2013/8...my-daughter-play-her-first-game-and-shes.html

p.s. if this is off-topic, I don't mind having mods moving the post, but I didn't know where to add the news tbh.
 
darkblu's rather excellent research into how fast a PPC 750 CL is at floating-point operations compared to a variety of other CPUs turned up some surprising results (to me anyway): it showed that the 750 CL is faster than Bobcat clock for clock, and also that the PPE is a bit of a beast (at least in this particular synthetic bench). His results can be found here:

http://www.neogaf.com/forum/showpost.php?p=50767125&postcount=3756

I thought it would be interesting to look for more general information on processor performance, in particular int performance.

NBench is a single-threaded benchmark from back before time began. Here is the Wikipedia page, followed by the NBench results that someone has collated over many years (there are lots):

http://en.wikipedia.org/wiki/NBench
http://www.tux.org/~mayer/linux/results2.html

Ignoring for a moment the differences in Linux version and compiler version and so on (which do sometimes seem to result in some noteworthy differences), and just focusing on int performance (there doesn't seem to be any SIMD optimisation going on, given the benchmark's age), there are some interesting results. Here are just a few:

(These are just the int results, with the Linux and gcc version data cut out to make it easier to read; and no, I don't know how to tab-align things):

Intel Atom N270 @ 1.60GHz                       7.267
Dual Intel Atom N270 @ 1.60GHz                  9.094 (<-- WAT)
4x Intel Atom 330 @ 1.60GHz [64-bit mode]       8.725
4x Intel Atom 330 @ 1.60GHz [32-bit mode]       8.422
Dual Cell BE (Sony PS3) @ 3192MHz               6.803
IBM Broadway 750CL (Nintendo Wii) @ 729MHz      5.863

There are so many variables that no doubt you can't read anything definitive into these results, but from the limited data available, it appears that a 1.25 GHz 750 CL might very well outperform a 3.2 GHz PPE on the int portion of this (admittedly single-threaded) benchmark. Unless I'm reading this wrong, this makes the 750 CL look pretty mean compared to other "lightweight" CPU cores, and a good choice given the kind of die area and power consumption that Nintendo were looking at.
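
For what it's worth, here's the naive per-clock comparison I'm doing in my head (linear scaling with clock is a big assumption, since memory latency doesn't scale with core clock):

Code:
# naive linear-with-clock scaling of the NBench integer indexes above
broadway_int, broadway_mhz = 5.863, 729    # IBM Broadway (Wii, 750CL)
ppe_int,      ppe_mhz      = 6.803, 3192   # Cell PPE (PS3)

print(round((broadway_int / broadway_mhz) /
            (ppe_int / ppe_mhz), 2))       # ~3.8x per clock in the 750's favour

# hypothetical 1.25 GHz 750CL, same (very naive) linear scaling
print(round(broadway_int * 1250 / broadway_mhz, 2))   # ~10.1 vs the PPE's 6.8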

An even bigger surprise is possibly that Atom is smashing a PPE running at twice the clock, though. I mean, I didn't see that coming. And both of those CPUs get big gains from multithreading too, so neither can claim the single-threaded-benchmark handicap over the other.

Could anyone who knows about this stuff put these results into any kind of perspective?
 
Shifty Geezer said:
A torn frame represents a partial frame drop, not a complete one as you are counting it. If the tear happens in the top 10% of the frame, you are losing 10% of the unique pixels of the new frame. If it happens in the middle, you are losing half. Only completely dropped frames count the way you are counting them. If we take a random 33% tear position, one third of 29% of frames are not completely drawn. That means 1/3 of 29% is the difference, or about 10% - the Wii U would be drawing about 10% more pixels.
To get back to it: yes, you are right. Your explanation applies when a number of consecutive frames are torn. Thanks for pointing that out. I assumed a well-distributed spread of torn frames across the entire set, for example a pattern of 2 good frames and 1 torn frame. In that case my calculation is correct, if I'm not mistaken. Glancing at the much better Digital Foundry report, it looks like both situations apply.

Shifty Geezer said:
I'm not even sure that's right, because the non-drawn part of the framebuffer is still rendered and displayed the next frame. It's the movement of the tear point that determines how much is being lost. The number of dropped pixels isn't something that can be calculated without recording every tear position.
No, you are right: the difference in the position of the tear line tells you how much extra time it took to render compared to the baseline. However, if the next frame 'fixes' the torn situation, the 3 frames displayed count as 2 unique ones.

Function said:
Looking at the tests for Splinter Cell, it seems like the biggest handicap isn't the lack of TMUs or even the CPU; it's the lack of fast storage for streaming - something that having more main RAM can't necessarily make up for. The 360 Arcade has thrown the Wii U a lifeline in terms of multiplatform games.
Agreed. I wonder if they stream these textures from main memory or eDRAM; the former would be more plausible, I guess.

Function said:
God knows how screwed the Wii U would be with a fast moving, free roaming sandbox game like GTA V that uses mandatory installs for streaming both textures and geometry.
'X' is expected to be such a game, and it features next-gen-ish shading. Some people claim this would be impossible on PS360.

Function said:
Regarding the TMUs, it seems possible now that a number of port issues could be down to simply not having enough of them. Issues like missing vegetation that used transparencies, frame rate drops when there was lots of transparent texture overdraw (fire and smoke in Blops), a reduction in motion blur passes for Tekken, less-than-ideal aniso, and things like that could have been due to the performance impact of needing lots of samples in a short period versus not having many texture mapping units. It's hard to say that any one of those things is down to having 8 TMUs, but maybe now there's a pattern starting to emerge?
Well, one of the things I noticed in the PC part of DF's Splinter Cell review is that it does vertex processing on multiple CPU cores - something the Wii U won't be good at. This could be an explanation for slowdowns as well. Rendering smoke particles on the CPU may result in bad performance too. Perhaps far-fetched? Anyway, the original Wii didn't suffer from transparency issues (never tested it, but to me the documentation seemed to imply that the pixel engine performs a framebuffer blend per clock per ROP).

Function said:
Other than the 8 TMUs thing, and possibly a slight disadvantage with some BW-heavy ROP action vs Xenos (or possibly not), it does seem like the Wii U GPU should be better though. Which is cool, just as long as you don't dwell on just how incredibly ancient Xenos is, and the fact that 4Bone are launching in a couple of months.
Yes it should; it's a pity it doesn't really show, though. :) Lower ROP performance could be because of a lower-than-originally-thought clock.
 
Agreed. I wonder if they stream these textures from main memory or eDRAM; the former would be more plausible, I guess.

With the streaming issues I mean streaming from the optical drive into main memory. It's not just the lower peak transfer rate from optical, it's the much greater access latency (laser head seek and rotational latency) of optical. When you add the two together, it's quite possible that accessing data from an optical disc is several times slower than accessing the same data installed from that disc to a mechanical HDD.
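
A crude illustration of why the two add up (none of these numbers are measured Wii U or 360 figures, just ballpark assumptions for an optical drive versus a laptop-class HDD):

Code:
# effective throughput when streamed reads are scattered across the disc
def effective_mb_per_s(seek_ms, transfer_mb_s, chunk_mb):
    seconds_per_chunk = seek_ms / 1000.0 + chunk_mb / transfer_mb_s
    return chunk_mb / seconds_per_chunk

# assumed: optical ~150 ms average seek, ~20 MB/s sustained read
# assumed: 2.5" HDD  ~15 ms average seek, ~80 MB/s sustained read
for chunk_mb in (0.5, 2.0, 8.0):
    print(chunk_mb,
          round(effective_mb_per_s(150, 20, chunk_mb), 1),   # optical
          round(effective_mb_per_s(15, 80, chunk_mb), 1))    # HDD
# for small scattered reads the HDD ends up several times faster, and it
# is mostly the seek cost, not the raw transfer rate, that does it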

'X' is expected to be such a game, and it features next-gen-ish shading. Some people claim this would be impossible on PS360.

Unless X is built around a mandatory install onto a much faster device than a Blu-ray drive, it's unfortunately not possible for the game to go through as much unique, streamed data as fast as a game like GTA 5 can. Greater repetition of what you already have in memory and slower transitions to new environmental assets are what you'll see as a result... or what you would see, if there was anything to compare it to directly (but there isn't).

X will look great, but it won't be able to do what GTA 5 can do. Even BF3 and Splinter Cell couldn't bank on a mandatory install for anything that needed it, as they also had to have a disc-only version.

In the future, Wii U games could mandate an install to a flash drive of a certain speed over a USB port. That would actually be a cheap and pretty awesome upgrade for the Wii U, and if you mandated flash you could potentially leapfrog the PS360 and really make use of the 1GB of main RAM for streaming, free-roaming environments. Think a $20 add-on like the N64 memory expansion.

A fast, tiny little USB thumb drive could really do wonders for the Wii U - it would be the most accessible upgrade path ever. Probably won't happen though; the user base is small enough as it is without fragmenting it.

Well, one of the things I noticed in the PC part of DF's Splinter Cell review is that it does vertex processing on multiple CPU cores - something the Wii U won't be good at. This could be an explanation for slowdowns as well. Rendering smoke particles on the CPU may result in bad performance too. Perhaps far-fetched? Anyway, the original Wii didn't suffer from transparency issues (never tested it, but to me the documentation seemed to imply that the pixel engine performs a framebuffer blend per clock per ROP).

I think there might be something in the CPU thing. Aggressive culling during high-LOD cutscenes, and lots of ray casting, path finding and collision detection during major firefights, might be FLOP-heavy and SIMD-friendly work that simply overwhelms Espresso.

When I was referring to transparencies I meant the texture sampling stage rather than the ROP stage, btw. If you're overdrawing a single pixel with lots of simple, transparent textured polys then you should end up with a high ratio of texture reads to ALU operations. I was thinking that maybe that hurts an 8 TMU console more than a 16 or 24 TMU console.
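
Putting rough peak numbers on that (pure rate arithmetic, assuming one bilinear sample per TMU per clock, the 8 TMU / 550 MHz figures floated for Latte, Xenos' 16 TMUs at 500 MHz, and ignoring bandwidth and ALU limits entirely):

Code:
# peak texture sampling budget per pixel per frame at 720p30
def samples_per_pixel(tmus, clock_hz, pixels, fps):
    return (tmus * clock_hz) / (pixels * fps)

pixels_720p = 1280 * 720
for name, tmus, clock_hz in (("Latte (assumed 8 TMU @ 550 MHz)", 8, 550e6),
                             ("Xenos (16 TMU @ 500 MHz)",        16, 500e6)):
    print(name, round(samples_per_pixel(tmus, clock_hz, pixels_720p, 30)))
# ~159 vs ~289 peak samples per pixel per frame; layers of transparent
# overdraw (smoke, foliage) eat into that budget fast, and the gap is
# roughly 1.8x in Xenos' favour before you even consider RSX's 24 TMUs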

Yes it should; it's a pity it doesn't really show, though. :) Lower ROP performance could be because of a lower-than-originally-thought clock.

I wonder if Latte could use asymmetrical clocks for different parts of the GPU? Could marcan's 550 MHz figure not be giving us the entire picture of speeds across the GPU and eDRAM?
 
Probably won't happen though; the user base is small enough as it is without fragmenting it.
A flash drive would easily fit inside a game disc case, and flash is cheap enough these days to not actually impact the sticker price either, at least if Nintendo were willing to take a small revenue hit for the good of its own flagship console. The wuu has loads of USB ports; users could just leave the flash drive plugged in all the time.

This is actually a great idea, which means it's a 100% certainty that Nintendo won't EVER DO IT.

I was thinking that maybe that hurts an 8 TMU console more than a 16 or 24 TMU console.
You really think the wuu has fewer TMUs than the ancient PS360? It would be a strange move to do something like that; it would risk tripping up game portability between platforms, for example. But then again, Nintendo hasn't ever really cared much about third parties, so who the hell knows with them... :p

I wonder if Latte could use asymmetrical clocks for different parts of the GPU?
No evidence exists that it does AFAIK, so all that remains is baseless speculation...
 
With the streaming issues I mean streaming from the optical drive into main memory. It's not just the lower peak transfer rate from optical, it's the much greater access latency (laser head seek and rotational latency) of optical. When you add the two together, it's quite possible that accessing data from an optical disc is several times slower than accessing the same data installed from that disc to a mechanical HDD.
Ah, I didn't get that, my apologies. X doesn't show a large variety of textures, which supports your point.

In the future, Wii U games could mandate an install to a flash drive of a certain speed over a USB port. That would actually be a cheap and pretty awesome upgrade for the Wii U, and if you mandated flash you could potentially leapfrog the PS360 and really make use of the 1GB of main RAM for streaming, free-roaming environments. Think a $20 add-on like the N64 memory expansion.
That would be a great idea, if it supports USB 3.0 of course.

When I was referring to transparencies I meant the texture sampling stage rather than the ROP stage, btw. If you're overdrawing a single pixel with lots of simple, transparent textured polys then you should end up with a high ratio of texture reads to ALU operations. I was thinking that maybe that hurts an 8 TMU console more than a 16 or 24 TMU console.
I'm not sure about that. I think that, considering a 160 SP GPU, a simple tri with up to 4 texture samples per pixel could have a perfect balance between TMUs and ROPs. But it wouldn't be possible to do stuff like TEX1*TEX2 + TEX3 in one go.

I wonder if Latte could use asymmetrical clocks for different parts of the GPU? Could marcan's 550 MHz figure not be giving us the entire picture of speeds across the GPU and eDRAM?
I never got how he retrieved those numbers anyway. He must have had access to a devkit, if you ask me (unless Nintendo actually ships a kernel with debug symbols in it). The Wii devkit defines some constants for bus speed and multiplier; perhaps the Wii U devkit does too? Still not sure how he determined the GPU clock, then.

(* Perhaps I've degraded myself to the stone age again, since I think a lot of noobish stuff, of course.)
 
Was it ERP who mentioned that he had overheard some higher-up say the WiiU CPU was something like 90% of the PS3's? Perhaps they were talking WiiU core to PPE core FP performance in a similar test? The data that darkblu gathered, which function posted, might seem to corroborate that (~86%).
 
Was it ERP who mentioned that he had overheard some higher-up say the WiiU CPU was something like 90% of the PS3's? Perhaps they were talking WiiU core to PPE core FP performance in a similar test? The data that darkblu gathered, which function posted, might seem to corroborate that (~86%).

This test shows Broadway, not Espresso, so the Wii CPU is at 86% of the PS3's PPE. If we blindly interpolate this figure based on Marcan's ~1.2GHz speed findings, an Espresso core could be ~1.4 times faster than a PS3 PPE core. It seems unlikely that it actually is, but it could be.
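
The interpolation spelled out, using the NBench numbers above and the ~1.2GHz figure (naive linear scaling with clock again):

Code:
# scale Broadway's NBench int index up to Espresso's clock, then compare
broadway_int, broadway_mhz = 5.863, 729
ppe_int                    = 6.803
espresso_mhz               = 1200     # the ~1.2GHz figure quoted above

print(round(broadway_int / ppe_int, 2))                  # 0.86 -> the 86%
scaled = broadway_int * espresso_mhz / broadway_mhz      # ~9.65
print(round(scaled / ppe_int, 2))                        # ~1.42 -> the ~1.4x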
 
Hurm, the PPE has 128-bit SIMD @ 3.2GHz, Espresso has paired singles (64-bit wide) @ ~1.2GHz. Doesn't quite add up, does it. :)

Of course, integer performance is a different matter. Espresso should stomp PPE handily there.
 
Hurm, the PPE has 128-bit SIMD @ 3.2GHz, Espresso has paired singles (64-bit wide) @ ~1.2GHz. Doesn't quite add up, does it. :)

Of course, integer performance is a different matter. Espresso should stomp PPE handily there.
Agreed; (dark)blu's test points out that the PPE is almost 5 times faster at FP, which is in line with the numbers you mention. [EDIT] Function posted the NBench integer results, which show that a 729MHz Broadway performs only slightly worse than a 3200MHz PPE.
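
For reference, the peak-rate arithmetic behind that (counting a fused multiply-add as two flops, 4-wide VMX on the PPE versus 2-wide paired singles on Espresso; peak figures only, not measurements):

Code:
# peak single-precision GFLOPS per core, FMA counted as 2 flops per lane
ppe_gflops      = 3.2 * 4 * 2     # 3.2 GHz, 4-wide VMX, FMA      -> 25.6
espresso_gflops = 1.2 * 2 * 2     # ~1.2 GHz, paired singles, FMA -> ~4.8
print(round(ppe_gflops / espresso_gflops, 1))   # ~5.3x at peak, per core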
 
When looking at these comparisons just remember that they're using pretty old versions of GCC on a uarch that very desperately relies on good code ordering and avoiding a lot of problem scenarios, and needs the SIMD instructions to be explicitly utilized. I don't know the quality of whatever compiler commercial games used (IBM's? I have heard good things about it), but I do have every confidence that this version of GCC would have been very bad at it and had little to no vectorization. And game developers very much did fall back on hand-coded assembly when push came to shove.

This is also ignoring that nbench is a pointless and pretty awful set of micro-benchmarks.

So I don't really think this is all that illustrative of game performance.
 
When looking at these comparisons just remember that they're using pretty old versions of GCC on a uarch that very desperately relies on good code ordering and avoiding a lot of problem scenarios, and needs the SIMD instructions to be explicitly utilized.

Are you talking about the 750 or PPE here?

No, that's not sarcasm btw, it's me being clueless. I would have thought that the PPE would rely more on good code ordering than the 750, which is sort-of OoOE (a bit). But maybe not, I dunno ...

I don't know the quality of whatever compiler commercial games used (IBM's? I have heard good things about it), but I do have every confidence that this version of GCC would have been very bad at it and had little to no vectorization. And game developers very much did fall back on hand-coded assembly when push came to shove.

I was assuming no vectorisation whatsoever in the NBench results, which is why I only looked at the int benchmarks (for whatever that might be worth).

This is also ignoring that nbench is a pointless and pretty awful set of micro-benchmarks.

You're such a killjoy! :p

Couldn't be much different to darkblu's interesting flop micro benchmark though? Hardly comprehensive but better than nothing surely? Or maybe not ... ?
 
Are you talking about the 750 or PPE here?

No, that's not sarcasm btw, it's me being clueless. I would have thought that the PPE would rely more on good code ordering than the 750, which is sort-of OoOE (a bit). But maybe not, I dunno ...

Talking about the PPE.

I was assuming no vectorisation whatsoever in the NBench results, which is why I only looked at the int benchmarks (for whatever that might be worth).

This CPU has integer SIMD too.
 