Wii U hardware discussion and investigation *rename

Discussion in 'Console Technology' started by TheAlSpark, Jul 29, 2011.

Thread Status:
Not open for further replies.
  1. DRS

    DRS
    Newcomer

    Joined:
    May 22, 2009
    Messages:
    135
    Likes Received:
    0
    I'm not sure how to read the framerate test, but if clip 1 has 5471 frames of which 29% were torn on X360, it produced 3884 unique frames. The Wii U, on the other hand, has 5223 unique frames. Given the average FPS and frame counts, this clip is 207 seconds on X360 and 192 seconds on Wii U.

    So, does this mean that X360 was able to produce 3884 frames in 207 seconds and the Wii U 5223 frames in 192 seconds (which indicates ~1.4 times more power)?
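
    The arithmetic in this post can be reproduced with a short sketch. It only restates the poster's own assumption (every torn frame counts as a fully dropped frame) using the figures quoted above; the variable and function names are illustrative and nothing here is a new measurement.

```python
# Figures quoted in the post; names are illustrative only.
x360_frames = 5471          # frames in clip 1 on X360
x360_torn_fraction = 0.29   # share of those frames reported as torn
x360_seconds = 207          # clip length derived in the post
wiiu_unique_frames = 5223
wiiu_seconds = 192


def unique_frames(total, torn_fraction):
    """Count only untorn frames, i.e. treat each torn frame as fully dropped."""
    return round(total * (1 - torn_fraction))


x360_unique = unique_frames(x360_frames, x360_torn_fraction)
x360_rate = x360_unique / x360_seconds        # unique frames per second
wiiu_rate = wiiu_unique_frames / wiiu_seconds
ratio = wiiu_rate / x360_rate

print(x360_unique, round(x360_rate, 1), round(wiiu_rate, 1), round(ratio, 2))
```

    Under that assumption the ratio lands between 1.4 and 1.5, matching the "~1.4 times" in the post. Whether a torn frame should count as a dropped frame at all is a separate question.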
     
  2. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,106
    Likes Received:
    16,898
    Location:
    Under my bridge
    A torn frame represents a partial frame drop, not a complete frame as you are counting. If the tear happens in the top 10% of the frame, you are losing 10% of the unique pixels of the new frame. If it happens in the middle, you are losing half. Only completely dropped frames count as you are counting them. If we take an average tear position a third of the way down the frame, then 29% of frames each lose a third of their pixels. That means the difference is 1/3 of 29%, or about 10% - Wii U would be drawing about 10% more pixels.

    I'm not even sure that's right, because the non-drawn part of the framebuffer is still rendered and displayed the next frame. It's the movement of the tear point that determines how much is being lost. Dropped pixels is not something that can be calculated without recording every tear position.
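
    The estimate in the first paragraph can be written out as a tiny sketch. It assumes, as that paragraph does, that a torn frame loses the share of the frame above the tear line and that tear positions average out to one fixed value. The function name is hypothetical, and the caveat in the second paragraph (tear movement matters) is not modelled.

```python
def lost_pixel_share(torn_fraction, mean_tear_position):
    """Expected share of new-frame pixels lost across all frames.

    torn_fraction: share of frames that are torn (0..1)
    mean_tear_position: average tear height as a fraction of the frame (0..1)
    """
    return torn_fraction * mean_tear_position


estimate = lost_pixel_share(0.29, 1 / 3)      # a third of 29%
upper_bound = lost_pixel_share(0.29, 1.0)     # every torn frame fully lost

print(f"{estimate:.1%} lost with a 1/3 average tear position")
print(f"{upper_bound:.1%} lost if torn frames count as dropped frames")
```

    The second call reproduces the "torn frame equals dropped frame" counting as the extreme case of the same formula.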
     
  3. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
    One point was that using multi platform ports to make statements about the WiiU GPU specifically is foolish. Neither advances in programmability nor in raw computational power has any opportunity to play a role, unless the game is recoded to be something new from a graphics point of view. There are bottlenecks in other areas that most likely will hamper performance on the WiiU, which are hard to avoid for titles that target PS360.

    Another point is that Single Figure of Merit thinking will trip you up, and sharply limit your understanding in architectural discussions. Basically, anyone who makes a statement along the lines of "console X is 30% more powerful than console Y", demonstrates their cluelessness. And when the Single Figure of Merit of the day in forum wars is GPU marketing FLOPS, the one thing that is most decidedly NOT measured by WiiU multi platform ports, the WiiU discussion specifically gets both embarrassing and bizarre.

    I'm still hoping for some "best practices" input from developers on how to make the most out of the WiiU architecture, maybe even with some comments on how this compares to the other consoles.
     
  4. DRS

    DRS
    Newcomer

    Joined:
    May 22, 2009
    Messages:
    135
    Likes Received:
    0
    I'm not sure. Though reevaluating this makes me think they reported the unique frames per second. That boils down to a ~150 second clip in both PS3 and X360 cases. I'll look at it again when I get home.
     
  5. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,411
    Location:
    Wrong thread
    Not on LoT, but the ever-super Digital Foundry have now delivered the goods!

    http://www.eurogamer.net/articles/digitalfoundry-splinter-cell-blacklist-face-off

    It's a mixed bag for the Wii U, with some good news and some bad news, and a pattern of strengths and weaknesses (probably) continuing to emerge (maybe).
     
    #5385 function, Aug 27, 2013
    Last edited by a moderator: Aug 28, 2013
  6. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,411
    Location:
    Wrong thread
    Looking at the tests for Splinter Cell, it seems like the biggest handicap isn't the lack of TMUs or even the CPU, it's the lack of fast storage for streaming - something that having more main RAM can't necessarily make up for. The 360 Arcade has thrown the Wii U a lifeline in terms of multiplatform games.

    God knows how screwed the Wii U would be with a fast moving, free roaming sandbox game like GTA V that uses mandatory installs for streaming both textures and geometry.

    Regarding the TMUs, it seems possible now that a number of port issues could be down to simply not having enough of them. Issues like missing vegetation that used transparencies, frame rate drops when there was lots of transparent texture overdraw (fire and smoke in Blops), a reduction in motion blur passes for Tekken, less than ideal aniso, and stuff and things like that could have been due to the performance impact of lots of samples required in a short period vs not so many texture mapping units. Hard to say that any one of those things is down to having 8 TMUs but maybe now there's a pattern starting to emerge?

    Other than the 8 TMUs thing, and possibly a slight disadvantage with some BW heavy ROP action vs Xenos (or possibly not) it does seem like the Wii U GPU should be better though. Which is cool, just as long as you don't dwell on just how incredibly ancient Xenos is, and the fact that 4Bone are launching in a couple of months.
     
  7. Cyan

    Cyan orange
    Legend

    Joined:
    Apr 24, 2007
    Messages:
    9,734
    Likes Received:
    3,460
    Sometimes there are unexpected strengths to them... As you can read in this article an almost blind little girl can play videogames because she is using the Wii U's tablet to play, otherwise she can't see what's happening on the TV screen at all.

    http://www.zenspath.com/home/2013/8...my-daughter-play-her-first-game-and-shes.html

    p.s. if this is off-topic, I don't mind having mods moving the post, but I didn't know where to add the news tbh.
     
  8. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,411
    Location:
    Wrong thread
    darkblu's rather excellent research into how fast a PPC 750 CL was at floating point operations compared to a variety of other CPUs turned up some surprising results (to me anyway) and showed that the 750 CL was faster than Bobcat clock for clock and also that the PPE is a bit of a beast (at least at this particular synthetic bench). His results can be found here:

    http://www.neogaf.com/forum/showpost.php?p=50767125&postcount=3756

    I thought it would be interesting to look for more general information on processor performance, in particular int performance.

    NBench is a single threaded benchmark from back before time began. Here is the wikipedia page, followed by the Nbench results that someone has collated over many years (there are lots):

    http://en.wikipedia.org/wiki/NBench
    http://www.tux.org/~mayer/linux/results2.html

    Ignoring for a moment the differences in Linux version and compiler version and stuff (which do sometimes seem to result in some noteworthy differences), and just focusing on int performance (there doesn't seem to be any SIMD optimisation going on given the benchmark's age), there are some interesting results. Here are just a few:

    (These are just the int results, all data on Linux and gcc ver cut out to make easier to read, and no I don't know how to even tab):

    Intel Atom CPU N270 (1.60GHz) 7.267
    Dual Intel Atom N270 @ 1.60GHz 9.094 (<--WAT)
    4 CPU Intel Atom 330 @1.60GHz [64bit mode] 8.725
    4 CPU Intel Atom 330 @1.60GHz [32bit mode] 8.422
    Dual Cell BE (Sony PS3) 3192MHz 6.803
    IBM Broadway Nintendo Wii 750CL 729MHz 5.863

    There are so many variables that no doubt you can't read anything definitive into these results, but from the limited data available, it appears that a 1.25 GHz 750 CL might very well outperform a 3.2 GHz PPE on the int portion of this (admittedly single threaded) benchmark. Unless I'm reading this wrong, this makes the 750 CL look pretty mean compared to other "lightweight" CPU cores and a good choice given the kind of die area and power consumption that Nintendo were looking at.

    An even bigger surprise is possibly that Atom is smashing the twice-as-fast PPE though. I mean, I didn't see that coming. And both of those CPUs get big gains from multithreading too, so neither can claim the single-threaded-benchmark handicap over the other.

    Could anyone who knows about this stuff put these results into any kind of perspective?
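
    For what it's worth, the per-clock comparison implied above can be sketched as follows. The scores and clocks are the ones listed in this post; the labels are shortened, and the projection assumes the integer index scales linearly with clock speed, which is a naive assumption (memory latency and compiler differences are ignored).

```python
# (label, clock in GHz, NBench integer index) taken from the list in the post.
results = [
    ("Atom N270", 1.600, 7.267),
    ("Cell BE PPE (PS3)", 3.192, 6.803),
    ("Broadway 750CL (Wii)", 0.729, 5.863),
]


def per_ghz(clock_ghz, score):
    """Integer index normalised by clock speed."""
    return score / clock_ghz


normalised = {label: per_ghz(clock, score) for label, clock, score in results}
for label, value in normalised.items():
    print(f"{label:22s} {value:5.2f} per GHz")

# Naive linear projection of a 750CL-class core to 1.25 GHz.
projected_750cl = normalised["Broadway 750CL (Wii)"] * 1.25
print(f"750CL projected to 1.25 GHz: {projected_750cl:.2f}")
```

    On these numbers the projected score comes out above the PPE's 6.803, which is all the post claims. It says nothing about how either core behaves on real game code.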
     
  9. DRS

    DRS
    Newcomer

    Joined:
    May 22, 2009
    Messages:
    135
    Likes Received:
    0
    To get back on it, yes you are right. Your explanation applies when a number of subsequent frames is torn. Thanks for pointing that out. I assumed a well distributed spread of torn frames across the entire set of frames, thus, for example a pattern of 2 good frames and 1 torn frame. In that case my calculation is correct if I'm not mistaken. Glancing at the much better Digital Foundry report it looks like both situations apply.

    No you are right, the difference of the position of the tear line tells how much extra time it required to render compared to the baseline. However, if the next frame 'fixes' the torn situation, the 3 frames displayed count as 2 unique ones.

    Agreed, I wonder if they stream these textures from main memory or eDram. The former would be more plausible I guess.

    'X' is expected to be such a game. And features nextgen-ish shading. Some people refer to this as impossible on PS360.

    Well one of the things I noticed in DF's PC part of the Splinter Cell review is that it does vertex processing on multiple CPU cores, something the Wii U won't be good at. This could be an explanation for slowdowns as well. Rendering smoke particles on the CPU may result in bad performance too. Perhaps far-fetched? Anyway, the original Wii didn't suffer from transparency issues (I never tested it, but to me the documentation seemed to imply that the pixel engine performs a framebuffer blend per clock per ROP).

    Yes it should, it's a pity it doesn't really show though :) Lower ROP performance could be because of a lower than originally thought clock.
     
  10. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,411
    Location:
    Wrong thread
    With the streaming issues I'm meaning streaming from the optical drive and into main memory. It's not just the lower peak transfer rate from optical, it's the much greater access latency (laser head seek and rotational latency) of optical. When you add the two together it's quite possible that accessing data from an optical disk is several times slower than accessing the same data installed from the same disk but to a mechanical HDD.
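
    A rough way to see how latency and transfer rate add up is the sketch below. Every number in it is a placeholder chosen only for illustration, not a measurement of the Wii U drive or of any HDD, and the function name and request sizes are hypothetical.

```python
def fetch_time_ms(num_requests, bytes_per_request, access_latency_ms, throughput_mb_s):
    """Total time for a batch of scattered reads.

    Each request pays the access latency (seek plus rotation) and then the
    transfer time for its payload at the sustained throughput.
    """
    transfer_ms = bytes_per_request / (throughput_mb_s * 1e6) * 1e3
    return num_requests * (access_latency_ms + transfer_ms)


# Placeholder values, NOT measured figures for any real device.
requests, size = 100, 256 * 1024
optical_ms = fetch_time_ms(requests, size, access_latency_ms=150.0, throughput_mb_s=20.0)
hdd_ms = fetch_time_ms(requests, size, access_latency_ms=15.0, throughput_mb_s=40.0)

print(f"optical: {optical_ms:.0f} ms, hdd: {hdd_ms:.0f} ms, "
      f"ratio: {optical_ms / hdd_ms:.1f}x")
```

    With small scattered reads the latency term dominates, so a 2x gap in throughput can turn into a much larger gap in total time. That is the point being made about streaming many small assets.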

    Unless X is built around a mandatory install on a much faster device than a bluray drive it's unfortunately not possible for the game to go through as much unique, streamed data as fast as a game like GTA 5 can. Greater repetition of what you already have in memory and slower transitions to new environmental assets are what you'll see as a result ... or what you would see, if there was anything to compare it to directly (but there isn't).

    X will look great, but it won't be able to do what GTA 5 can do. Even BF3 and Splinter Cell couldn't bank on a mandatory install for anything that needed it, as they also had to have a disk only version.

    In the future Wii U games could mandate an install on a flash drive of a certain speed over a USB port. That would actually be a cheap and pretty awesome upgrade for the Wii U, and if you mandated flash you could potentially leapfrog the PS360 and really make use of the 1GB of main RAM for streaming, free roaming environments. Think a $20 add-on like the N64 memory extension.

    A fast, tiny little USB thumb drive could really do wonders for the Wii U - it would be the most accessible upgrade path ever. Probably won't happen though, user base is small enough as it is without fragmenting it.

    I think there might be something in the CPU thing. Aggressive culling during high LOD cutscenes, and lots of ray casting, path finding and collision detection during major firefights might be flop heavy and SIMD friendly work that simply overwhelms Espresso.

    When I was referring to transparencies I was meaning at the texture sampling stage rather than at the ROP stage btw. If you're overdrawing a single pixel with lots of simple, transparent textured polys then you should end up with a high ratio of texture reads to ALU operations. I was thinking that maybe that hurts an 8 TMU console more than a 16 or 24 TMU console.
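
    As a back-of-the-envelope sketch of that idea: if the work is purely texture-fetch bound, frame cost scales with samples divided by TMU throughput. The 8 TMU count and the 550 MHz clock are the speculative figures from this thread, the overdraw scenario is made up, and one sample per TMU per clock is an assumption.

```python
def texture_time_ms(width, height, overdraw_layers, samples_per_layer, tmus, clock_mhz):
    """Lower bound on per-frame texture sampling time.

    Assumes every TMU delivers one sample per clock and that nothing else
    (bandwidth, ALU, ROPs) is the bottleneck.
    """
    samples = width * height * overdraw_layers * samples_per_layer
    samples_per_ms = tmus * clock_mhz * 1e6 / 1e3
    return samples / samples_per_ms


# Hypothetical scene: full-screen 720p smoke with 20 layers of overdraw.
scene = dict(width=1280, height=720, overdraw_layers=20, samples_per_layer=1)
t8 = texture_time_ms(**scene, tmus=8, clock_mhz=550)
t16 = texture_time_ms(**scene, tmus=16, clock_mhz=550)

print(f"8 TMUs: {t8:.2f} ms, 16 TMUs: {t16:.2f} ms")
```

    At equal clocks, halving the TMU count doubles this lower bound. Whether real ports actually hit it depends on everything the sketch leaves out.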

    I wonder if Latte could use asymmetrical clocks for different parts of the GPU? Could marcan's 550 MHz figure not be giving us the entire picture of speeds across the GPU and eDRAM?
     
  11. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,176
    Location:
    La-la land
    A flashdrive would easily fit inside a game disc case, and flash is cheap enough these days to not actually impact the sticker price either, at least if Nintendo was willing to take a small revenue hit for the good of its own flagship console. The wuu has loads of USB ports, users could just have the flashdrive plugged in all the time.

    This is actually a great idea, which means it's 100% certainty Nintendo won't EVER DO IT.

    You really think wuu has less TMUs than ancient PS360? It would be a strange move to do something like that, it would risk tripping up game portability between platforms for example. But then again, Nintendo hasn't ever really cared much about third parties, so who the hell knows with them... :razz:

    No evidence exists that it does AFAIK, so all that remains is baseless speculation...
     
  12. DRS

    DRS
    Newcomer

    Joined:
    May 22, 2009
    Messages:
    135
    Likes Received:
    0
    Ah I didn't get that, my apologies. X doesn't show a large variety in textures, which reinforces your statement.

    That would be a great idea. If it supports USB3 of course.

    I'm not sure about that. I think considering a 160SP GPU, a simple tri with up to 4 texture samples per pixel could have a perfect balance between TMUs and ROPS. But it wouldn't be possible to do stuff like TEX1*TEX2 + TEX3 in one go.

    I never got how he retrieved those numbers anyway. He must have had access to a devkit if you ask me (unless Nintendo actually ships with a kernel that has debug symbols in it). The Wii devkit defines some constants for bus speed and multiplier, perhaps the Wii U devkit does too? Still not sure how he determined the GPU clock then.

    (* Perhaps I degraded myself to the stone age again, since I think a lot of noobish stuff of course)
     
  13. AstoundingHolmes

    Newcomer

    Joined:
    Jan 31, 2013
    Messages:
    118
    Likes Received:
    0
    Was it ERP who mentioned that he had overheard some higher up say the WiiU CPU was something like 90% of the PS3? Perhaps they were talking WiiU core to PPE core FP performance in a similar test? The data that darkblu gathered, which function posted, might seem to corroborate that (~86%).
     
  14. DRS

    DRS
    Newcomer

    Joined:
    May 22, 2009
    Messages:
    135
    Likes Received:
    0
    This test shows Broadway, not Espresso. So Wii CPU is 86% of PS3 PPE. If we blindly extrapolate this figure based on Marcan's 1.2GHz speed findings, an Espresso core could be ~1.4 times faster than a PS3 PPE core. Seems unlikely that it actually is, but it could be.
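
    Spelled out, and assuming the ~86% is the ratio of the NBench integer figures posted earlier in the thread (5.863 for Broadway, 6.803 for the PPE), the blind clock scaling looks like this. It assumes an Espresso core performs exactly like Broadway per clock, which the thread has not established.

```python
broadway_score, broadway_mhz = 5.863, 729   # NBench int figure from the thread
ppe_score = 6.803                           # NBench int figure from the thread
espresso_mhz = 1200                         # the 1.2GHz figure quoted above

broadway_vs_ppe = broadway_score / ppe_score
# Blind linear scaling with clock; no architectural differences modelled.
espresso_vs_ppe = broadway_vs_ppe * espresso_mhz / broadway_mhz

print(f"Broadway vs PPE: {broadway_vs_ppe:.2f}")
print(f"Espresso core vs PPE (scaled): {espresso_vs_ppe:.2f}")
```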
     
  15. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,176
    Location:
    La-la land
    Hurm, PPE has 128-bit SIMD @3.2GHz, Espresso has paired singles (64 bit) @~1.2GHz. Doesn't quite add up, does it. :)

    Of course, integer performance is a different matter. Espresso should stomp PPE handily there.
     
  16. DRS

    DRS
    Newcomer

    Joined:
    May 22, 2009
    Messages:
    135
    Likes Received:
    0
    Agreed, (Dark)Blu's test points out that PPE is almost 5 times faster regarding FP, which is in line with the numbers you mention. [EDIT] Function posted the NBench integer results, which show that a 729MHz Broadway performs only slightly worse than a 3200MHz PPE.
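
    The "in line with the numbers" remark can be checked with a crude peak-rate sketch: single-precision lanes times clock, using the widths and clocks quoted in the previous post. It assumes both cores issue one SIMD operation per cycle, so it is a theoretical ceiling rather than a benchmark result.

```python
def peak_lane_rate(lanes, clock_ghz):
    """Single-precision lanes processed per nanosecond at peak (lanes x GHz)."""
    return lanes * clock_ghz


ppe = peak_lane_rate(lanes=4, clock_ghz=3.2)        # 128-bit SIMD = 4 x 32-bit floats
espresso = peak_lane_rate(lanes=2, clock_ghz=1.2)   # 64-bit pair = 2 x 32-bit floats
ratio = ppe / espresso

print(f"PPE / Espresso peak FP ratio: {ratio:.2f}")
```

    That puts the theoretical gap a little above 5x, close to the "almost 5 times" figure from the FP micro benchmark.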
     
  17. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    When looking at these comparisons just remember that they're using pretty old versions of GCC on a uarch that very desperately relies on good code ordering and avoiding a lot of problem scenarios, and needs the SIMD instructions to be explicitly utilized. I don't know the quality of whatever commercial games used (IBM's compiler? I have heard good things about it) but I do have every confidence that this version of GCC would have been very bad at it and had little to no vectorization. And game developers very much did fall back on hand-coded assembly when push came to shove.

    This is also ignoring that nbench is a pointless and pretty awful set of micro-benchmarks.

    So I don't really think this is all that illustrative of game performance.
     
  18. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,411
    Location:
    Wrong thread
    Are you talking about the 750 or PPE here?

    No, that's not sarcasm btw, it's me being clueless. I would have thought that the PPE would rely more on good code ordering than the 750, which is sort-of OoOE (a bit). But maybe not, I dunno ...

    I was assuming no vectorisation whatsoever in the Nbench results, which is why I only looked at int benchmarks (for whatever that might be worth).

    You're such a killjoy! :p

    Couldn't be much different to darkblu's interesting flop micro benchmark though? Hardly comprehensive but better than nothing surely? Or maybe not ... ?
     
  19. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Talking about the PPE.

    This CPU has integer SIMD too.
     
  20. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,411
    Location:
    Wrong thread
    Okay, thought so. Thanks.

    Oh boy, of course it does. I messed up.

    http://en.wikipedia.org/wiki/AltiVec

    So does the 750 CL's paired singles allow what is effectively 2 x 32-bit int simd?
     