Wii U hardware discussion and investigation *rename

Discussion in 'Console Technology' started by TheAlSpark, Jul 29, 2011.

Thread Status:
Not open for further replies.
  1. shinobi

    Banned

    Joined:
    Jan 26, 2013
    Messages:
    64
    Likes Received:
    0
    thanks! really appreciate your time.
     
  2. shinobi

    Banned

    Joined:
    Jan 26, 2013
    Messages:
    64
    Likes Received:
    0
thanks for the answer, and yeah, I asked again because some guy on NeoGAF was saying 550 GFLOPS, and he was supposed to be some kind of tech expert whom everybody believed.
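For reference, peak-GFLOPS figures like the disputed 550 are normally derived as shader count × 2 ops (one fused multiply-add) × clock. A minimal sketch, assuming the widely reported 550 MHz GPU clock; the SP counts are thread speculation, not confirmed specs:

```python
# Peak-GFLOPS back-of-the-envelope for a VLIW GPU.
# SP counts are thread speculation; 550 MHz is the widely reported clock.

def peak_gflops(shader_processors, clock_ghz, ops_per_cycle=2):
    """Peak single-precision GFLOPS, assuming one fused
    multiply-add (2 flops) per SP per cycle."""
    return shader_processors * ops_per_cycle * clock_ghz

GPU_CLOCK_GHZ = 0.550  # widely reported Latte clock

for sps in (160, 320):  # candidate SP counts from the thread
    print(f"{sps} SPs -> {peak_gflops(sps, GPU_CLOCK_GHZ):.0f} GFLOPS")

# The 550 GFLOPS claim would need this many SPs at 550 MHz:
print(round(550 / (2 * GPU_CLOCK_GHZ)))  # 500
```

On these assumptions, 160 or 320 SPs give roughly 176 or 352 GFLOPS; a 550 GFLOPS figure would imply a ~500-SP part at this clock.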
     
  3. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,503
    Likes Received:
    420
    Location:
    Varna, Bulgaria

    Scaled comparison of single banks from both EDRAM pools.
     
  4. Li Mu Bai

    Regular

    Joined:
    Oct 18, 2003
    Messages:
    540
    Likes Received:
    7
    Location:
    AZ
I believe this is due to architectural differences, unfamiliarity with how to maximize/optimize the unorthodox Espresso chip configuration, immature library documentation & toolsets, optimal BW use, engine adaptation & porting, etc. Also, iirc, the most mature SDK was released in November of 2012. Essentially, without somewhat extensive retooling, PS3 & 360 ports will always suffer by comparison. I can assure you that 'X', Bayonetta 2 (may steal Nintendo's E3 show), 3D Mario, Zelda, Retro's project (my personal system seller along with 'X'), SMTxFE, as well as other proprietary software will not suffer from the aforementioned issues. The CPU is not a major bottleneck, btw. Ancel, Shin'en, Frozenbyte, Gearbox, as well as my own sources haven't cited the CPU as a significant hurdle to extracting performance from the hardware. Even Pikmin will have no problem with transparencies, nor complex environmental geometry, as will be shown in some of the aforementioned software.

Stronger than current gen; these bottlenecks are grossly exaggerated. As in any system, engines must be tailored to exploit platform strengths while minimizing weaknesses. As I've said before, the CPU is indeed weaker than the 360's, but this is a heavily GPU-reliant system. Also, is slower RAM unusable now? A truly functional tessellation unit, better texture IQ, 50%+ more dedicated system RAM, advanced realtime lighting & shadowing capabilities: taken as a whole, this bests the current generation, albeit not by any vast margin. In these discussions I find we always discount the tablet's capabilities: rendering a totally unique game scene completely independent of the main screen, replete with shading & geometry.
     
    #4484 Li Mu Bai, Feb 5, 2013
    Last edited by a moderator: Feb 5, 2013
  5. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    42,316
    Likes Received:
    13,915
    Location:
    Under my bridge
    This is a scary notion from DF:

     
  6. Li Mu Bai

    Regular

    Joined:
    Oct 18, 2003
    Messages:
    540
    Likes Received:
    7
    Location:
    AZ
I had no idea the documentation was that dire, yikes! Though this does explain some of the porting issues... this will definitely negatively impact 3rd-party support.
     
    #4486 Li Mu Bai, Feb 5, 2013
    Last edited by a moderator: Feb 5, 2013
  7. Brad Grenz

    Brad Grenz Philosopher & Poet
    Veteran

    Joined:
    Mar 3, 2005
    Messages:
    2,531
    Likes Received:
    2
    Location:
    Oregon
    No wonder the official specs for Nintendo hardware never leak. All they give the devs is a black box!
     
  8. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
The small pool is probably just 2 MB in total; it simply carries more overhead.
But have you noticed that the rightmost blocks of the small pool have a different size? They are not composed of 16×16 blocks, but are 18 wide. Just redundancy (but if that's the case, why doesn't the large pool show this? It should be more important there), or some sign of parity/ECC protection?
     
    #4488 Gipsel, Feb 5, 2013
    Last edited by a moderator: Feb 5, 2013
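One way to weigh the parity/ECC idea: two extra columns per 16 is exactly the overhead of one parity bit per byte, but well short of full SECDED. A rough sketch of the arithmetic (the column counts follow the die-shot description; the interpretation is speculation):

```python
# Overhead arithmetic for the 18-wide eDRAM macro observation.
# Column counts follow the die-shot description; the interpretation
# (redundancy vs. parity vs. ECC) is speculation.

DATA_COLS = 16   # nominal bank width
TOTAL_COLS = 18  # observed width of the rightmost small-pool blocks

extra_cols = TOTAL_COLS - DATA_COLS   # 2 spare/check columns
overhead = extra_cols / DATA_COLS     # 0.125 -> 12.5%

# One parity bit per byte costs exactly 2 bits per 16: a perfect fit.
parity_bits_per_16 = 16 // 8          # 2

# Full SECDED on a 16-bit word needs 6 check bits (37.5% overhead),
# so 2 extra columns per 16 cannot carry word-level SECDED.
secded_bits_for_16 = 6

print(f"{overhead:.1%}")  # 12.5%
```

So the 18-wide block is consistent with simple parity or column redundancy, but not with full error correction at word granularity.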
  9. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,191
    Likes Received:
    2,364
    Location:
    Wrong thread
Oh well, there goes my "4 MB fast buffer" hypothesis. It's probably some kind of cache (a big, high-BW texture cache?) or it's there for BC.

    Nice spot!

Perhaps there are redundant elements within the larger blocks, so no additional ones are needed? Or they just settle for 31.9 MB (for example) and round up to 32 when talking about it in the dev kits (if the OS or pad functions reserve some eDRAM, it's not like developers would be able to tell anyway)?
     
  10. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,191
    Likes Received:
    2,364
    Location:
    Wrong thread
It's normal for newer architectures to be more efficient than older ones, but I'm wondering if it might still be easier to get high effective utilisation out of Xenos than Latte.

The 360 has more TMUs per vector unit, and a "texture data crossbar" (thanks, B3D article) that should mean each vector unit has a higher maximum and average input of bilinear-filtered texels available. No doubt the newer Latte architecture is more efficient, but would this be enough to overcome a big relative deficit of TMUs per vector unit and no crossbar?

The same might be true for the ROPs affecting GPU utilisation too: on the 360 they can always run at full tilt, and there are more of them relative to the number of vector units. The Wii U has relatively fewer ROPs, and they are unlikely to be as efficient as the 360's "magic ROPs" embedded in the daughter die with effectively unlimited BW.

Just to be clear, I'm talking about % utilisation and not about rawr powah, where the Wii U should always be ahead (more shaders, higher clock). The assumption is that the Wii U should always be a lot more efficient, but maybe it's not always that clear cut.

Edit: In other words, it might be easier to keep the Xenos ALUs busy, based on a very simplistic "paper specs" comparison. Or maybe not. Maybe someone with experience of Xbox 360 and R7xx development can say?
     
    #4490 function, Feb 5, 2013
    Last edited by a moderator: Feb 5, 2013
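The paper-spec comparison can be made concrete. A hedged sketch: Xenos's 16 TMUs at 500 MHz and 48 vector ALUs are documented figures, while the Latte TMU count and VLIW-group count used here are thread speculation only:

```python
# Paper-spec texel-rate comparison. Xenos figures (16 TMUs @ 500 MHz,
# 48 vector ALUs) are documented; Latte's 16 TMUs / 320 SPs are
# thread speculation used purely for illustration.

def bilinear_rate_gtexels(tmus, clock_mhz):
    """Peak bilinear-filtered texels/s in Gtexels, at one
    filtered texel per TMU per clock."""
    return tmus * clock_mhz / 1000.0

xenos_rate = bilinear_rate_gtexels(16, 500)   # 8.0 Gtexels/s
xenos_tmus_per_alu = 16 / 48                  # ~0.33 TMUs per vector unit

latte_rate = bilinear_rate_gtexels(16, 550)   # 8.8 Gtexels/s
latte_tmus_per_group = 16 / 64                # 0.25 (64 VLIW5 groups if 320 SPs)

print(xenos_rate, latte_rate)
```

On these assumed numbers Latte's raw texel rate is slightly higher, but each vector unit has proportionally fewer TMUs feeding it, which is exactly the utilisation concern raised above.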
  11. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Some new comments from a Chipworks employee were posted on the NeoGAF thread:

    This resulted in lots of fanfare that'll probably spill over here - but I think there's one really serious misconception going around. When he says that this is custom he means that it's not using licensed hard macros. That doesn't mean that it's using a totally new GPU microarchitecture, and that information isn't something you can tell just by looking at the die. You don't need to have markings for an implementation of IP licensed in RTL form. For comparison, he also says Apple's A6 is completely custom; we know that Apple is licensing plenty of IP, like the GPU and media encode/decode.

    So we don't know anything about how similar or different it is from other GPUs. There's zero question that Nintendo is licensing the GPU from AMD: http://blogs.amd.com/play/2012/09/2...ertainment-with-proud-technology-partner-amd/ and that technology can be based on anything. There could be some customizations to fit Nintendo's needs but I can't fathom that AMD really designed a whole new GPU just for Wii U...
     
  12. swaaye

    swaaye Entirely Suboptimal
    Legend

    Joined:
    Mar 15, 2003
    Messages:
    8,549
    Likes Received:
    628
    Location:
    WI, USA
    What's with his excitement over embedded DRAM? Consoles have been using it since Gamecube and PS2.
     
  13. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,174
    Location:
    La-la land
Not sure there's a whole lot of custom logic in that GPU; GPUs are mostly written in a high-level RTL language. One of the few major exceptions was Nvidia's shader processors, from the 8800 GTX series up until the 580 GTX.

Looking at the die shot, apart from the SRAM and DRAM arrays and I/Os, it's almost entirely "sea of transistors"-type logic, i.e. not a custom layout...

Also, I think Nintendo would be unrealistically lucky selling anywhere near 30 million Wuu/yr. 15, maybe, but the Wuu looks more like a GameCube successor to me, and what was that, around 22 million total lifetime sales...? Seeing that post-Xmas sales curve crash doesn't inspire a lot of confidence in the console's success.

Btw... anyone spotted the ARM "Starlet" successor core yet? Assuming it's there, of course. Might be near-impossible to find, I suppose, considering how small it ought to be. It could masquerade as almost anything on that die. :) Also, there's supposed to be a DSP somewhere too, right?
     
  14. DuckThor Evil

    DuckThor Evil Anas platyrhynchos
    Legend Veteran

    Joined:
    Jul 9, 2004
    Messages:
    5,832
    Likes Received:
    847
    Location:
    Finland
    Jim Morrison :))) seems to be a bit out of his element with those comments.
     
  15. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
I agree. But I must also say that the GPU and the layout of the SIMDs look a bit strange. The size of the SIMD blocks would be consistent with a ~15% higher-density layout than one sees in Brazos. Not completely impossible given the maturity of 40nm, AMD's experience with it, and the low clock target, especially if it uses an older iteration of the VLIW architecture (DX10.1 R700 generation instead of DX11 R800 generation) as a base.
But there is more. I think function already noticed the halved number of register banks in the SIMDs compared to other implementations of the VLIW architecture. I glossed over that by saying that each one simply holds twice the amount of data (8 kB instead of 4 kB) and everything is fine. It's not like the SRAM takes significantly less space on the Wii U die than it does on Brazos (it's roughly in line with the assumed generally higher density).
But thinking about it, each VLIW group needs parallel access to a certain number (four) of individually addressed register banks each cycle. The easiest way to implement this is to use physically separate banks. That saves the hassle of implementing multiported SRAM (but is also the source of some register read-port restrictions of the VLIW architectures). Anyway, if each visible SIMD block were indeed 40 SPs (8 VLIW groups), there should be 32 register banks (as there are on Brazos as well as Llano and Trinity [btw., Trinity's layout of the register files of the half-SIMD blocks looks really close to the register files of GCN's blocks containing two vALUs]). But there are only 16 (though obviously twice the size, if we are going with the 15% increased density). So either they are dual-ported (then the increased density over Brazos is even more amazing) or something really fishy is going on. Before the Chipworks guy said the GPU die is 40nm TSMC (they should be able to tell), I would have proposed thinking again about that crazy-sounding idea of a 55nm die (with then only 160 SPs, of course). :shock:
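The bank arithmetic above can be sketched out; the SIMD width, bank counts, and bank sizes here are the thread's assumptions, not confirmed specs:

```python
# Register-bank count arithmetic for one SIMD block.
# All figures are the thread's assumptions, not confirmed specs.

VLIW_GROUPS_PER_SIMD = 8   # 8 x VLIW5 = 40 SPs per SIMD block
BANKS_PER_GROUP = 4        # parallel, individually addressed reads per cycle

expected_banks = VLIW_GROUPS_PER_SIMD * BANKS_PER_GROUP  # 32, as on Brazos
observed_banks = 16                                      # counted on the die shot

# Capacity only works out if each observed bank holds twice the data...
brazos_bank_kb = 4
implied_bank_kb = brazos_bank_kb * expected_banks // observed_banks  # 8 kB

# ...and sustaining 4 reads per group per cycle from half as many
# banks would then require dual-ported SRAM.
implied_ports = expected_banks // observed_banks  # 2

print(expected_banks, implied_bank_kb, implied_ports)
```

That is, 16 observed banks only reconcile with 320 SPs if each bank is 8 kB and dual-ported, which is the "either amazing density or something fishy" dilemma.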
     
  16. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    I'd still make the distinction between synthesized RTL (with whatever constraints in place) and a full GPU hard macro placed on that die. I figure that's all he really meant by custom, with the remark that the latter would have die markings identifying it (does anyone know if this is really a given/requirement?)

    If we're talking hand layouts I don't think it applies to most of A6 either.

    30m/year is insane. Especially if we're talking on average, where even Wii hasn't come anywhere close to that. I don't think he's that familiar with game console sales.
     
  17. babybumb

    Regular

    Joined:
    Dec 9, 2011
    Messages:
    584
    Likes Received:
    7
It might be expensive, and if it is, that's very bad for Nintendo.
     
  18. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
I don't think he meant a custom or hand layout. I would interpret that guy as saying it's probably a custom-designed GPU in the sense that AMD created a fully custom RTL description for Nintendo (design owned by Nintendo, not AMD), not just some small adjustments to a licensed Radeon GPU design. I don't know if I should believe that.
     
  19. shinobi

    Banned

    Joined:
    Jan 26, 2013
    Messages:
    64
    Likes Received:
    0
160 SPs a good possibility? :shock:
     
    #4499 shinobi, Feb 5, 2013
    Last edited by a moderator: Feb 5, 2013
  20. This Way Out

    Newcomer

    Joined:
    Dec 6, 2012
    Messages:
    4
    Likes Received:
    0
As a complete novice when it comes to matters of such a technical nature, I have a question that may or may not make me look like an idiot. But since I'd like to know the answer, I'm sure you guys here are the best ones to answer it; plus, there is no such thing as a stupid question, right?

Anyway, given that so much of the die space is unaccounted for, one of the possible suggestions is that it is for fixed-function features that allow for greater performance, though I'm certainly in no place to suggest or confirm it either way.

That said, given the low power draw of the system, could it even be remotely possible that developers don't have access to the full power of every feature at once?

What I mean is: could it be that there are X number of units solely for simple vertex manipulation, and developers could use those if they wished, but then that would reduce the number of programmable shaders they had access to? The point being that they were faster at a specific operation, but it might not be usable in all games; in those cases you can rely on the other shaders, but you are ultimately sacrificing total performance for a custom appearance.

If everything were usable at once with maximum efficiency, surely, if there was a power gap to be exploited, there would be a shred of evidence to support it by now, correct?

Is that a silly thing to suggest, a possibility, or just how they work anyway?
     