Wii U hardware discussion and investigation *rename

Discussion in 'Console Technology' started by TheAlSpark, Jul 29, 2011.

Thread Status:
Not open for further replies.
  1. Esrever

    Regular

    Joined:
    Feb 6, 2013
    Messages:
    846
    Likes Received:
    647
    Not too much. 160 shaders with 8 ROPs and 16 TMUs is possible, just as 320 with 16 TMUs and 4 ROPs is. I am not 100% sure it's 20 SPs per block, but I think it's still very possible. 160:16:8 would be exactly double the numbers on an ATI 4550. 320:16:8 would be double the numbers on an AMD 6450.
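    As a quick sanity check on the doubling claim above, here is a minimal sketch. The HD 4550 and HD 6450 unit counts are the commonly listed retail specs (an assumption, not something stated in this thread), written in the same shaders:TMUs:ROPs order.

```python
# Unit counts as (shader ALUs, TMUs, ROPs).
# ASSUMPTION: these are the commonly listed retail specs for each card.
HD_4550 = (80, 8, 4)
HD_6450 = (160, 8, 4)


def doubled(config):
    """Return a configuration with every unit count doubled."""
    return tuple(2 * n for n in config)


candidate_160 = doubled(HD_4550)  # the 160-shader hypothesis
candidate_320 = doubled(HD_6450)  # the 320-shader hypothesis

print("2x HD 4550:", ":".join(map(str, candidate_160)))
print("2x HD 6450:", ":".join(map(str, candidate_320)))
```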


    There are probably lots of bottlenecks with workarounds that developers aren't familiar with yet. Even with 320 shaders, there could be problems with other parts of the system that require more optimization before games run as well as they do on current-gen consoles.
     
  2. lwill

    Newcomer

    Joined:
    Apr 11, 2007
    Messages:
    110
    Likes Received:
    0
    I understand your stance on considering 160sp from a technical point of view, but from looking at the game titles, I don't see how that is feasible. Developers already have to optimize their games for a CPU that is definitely weaker at the tasks the current-gen CPUs are strongest at, so I don't see how Wii U's launch ports would be as roughly on par as they are if the GPU were also lacking raw power compared to the other systems.

    I agree with the second part of your post. This GPU seems to have a very unusual architecture, so it is not surprising to see some weird issues even if the system has a stronger GPU.
     
  3. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    It couldn't have been write only. Otherwise it wouldn't have been possible to get the readback from atomic operations ;). Actually, it was explicitly labeled as R/W cache in Cypress block diagrams. There were 8 sections of 16 kB each, aligned to the address spaces handled by the 8 memory channels.

    edit:
    fellix posted the part of the block diagram I'm talking about already some time ago.
     
    #4643 Gipsel, Feb 9, 2013
    Last edited by a moderator: Feb 9, 2013
  4. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,493
    Likes Received:
    474
    Maybe we're talking about different things, but there's no read/write cache in Cypress. I was referring to the write combining cache used for non frame buffer writes. It was read/write in R600. In Cypress atomics were handled by the LDS/GDS.
     
  5. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    Atomics on global memory were handled by this cache. The GDS also has atomic operations, but they are not used for global atomics. The GDS atomics are used in OpenCL exclusively for the atomic counters, which are much faster than global atomics for this exact reason. Cypress actually has both: write combining buffers and an R/W cache for global memory/UAVs. I edited my post above already, but here I'll just borrow the part of the official Cypress block diagram that fellix posted 3 years ago.

    [Image: excerpt from the official Cypress block diagram]

    As I said, there are eight 16 kB slices (aligned to the memory channels) of R/W cache handling the global atomics, in addition to the write combining buffers (8 times 4 kB). It's kind of hard to make good use of it, but at least it serves the purpose of providing the global atomics (I think the only justification for calling it an R/W cache is that one can access cached memory through it with atomic ops, but that is neither very convenient nor extremely fast). The R600 R/W cache basically served no purpose, which is why I called it a glorified write combining buffer. ;)
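    To put numbers on the sizes above, here is a rough sketch of the totals and of what "aligned to the memory channels" could look like. The interleave granularity and the simple modulo mapping are purely hypothetical placeholders for illustration; the actual address-to-channel mapping is not given in this thread.

```python
NUM_CHANNELS = 8     # memory channels, one cache slice per channel
RW_SLICE_KB = 16     # R/W cache slice size per channel
WC_BUFFER_KB = 4     # write combining buffer size per channel

# HYPOTHETICAL: the real interleave granularity is not stated in the thread.
INTERLEAVE_BYTES = 256

total_rw_kb = NUM_CHANNELS * RW_SLICE_KB    # total R/W cache
total_wc_kb = NUM_CHANNELS * WC_BUFFER_KB   # total write combining buffers


def slice_for_address(addr):
    """Pick the slice for an address, assuming plain modulo interleaving."""
    return (addr // INTERLEAVE_BYTES) % NUM_CHANNELS


print(f"R/W cache: {total_rw_kb} kB, write combining: {total_wc_kb} kB")
print("slice for address 0x1300:", slice_for_address(0x1300))
```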
     
    #4645 Gipsel, Feb 9, 2013
    Last edited by a moderator: Feb 9, 2013
  6. DRS

    DRS
    Newcomer

    Joined:
    May 22, 2009
    Messages:
    135
    Likes Received:
    0
    8 texturemaps per texture unit and 4 rops?

    BTW, does the 360 support early Z checking? If there is a one way path from GPU to ROPS I'd guess not?
     
  7. AlphaWolf

    AlphaWolf Specious Misanthrope
    Legend

    Joined:
    May 28, 2003
    Messages:
    9,470
    Likes Received:
    1,686
    Location:
    Treading Water
    I know it's a long thread, but it has been covered many times. It has strengths and weaknesses compared to PS360. Ultimately the games themselves tell you what you need to know. If you're waiting for developers to take years to release the hidden power of the Wii U, you can stop. Games might get a bit better, but it doesn't look to have any magic packed into its 40W.
     
  8. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,493
    Likes Received:
    474
    I see what you mean. I never considered that to be a cache and didn't realize it was described as such.
     
  9. Blazkowicz

    Legend

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    They have the 3DS, with a terrible dual core CPU (ARM11, as in Raspberry Pi) and the GPU is a mixed bag but at least has vertex shaders and looks more advanced than the Wii one.
    http://en.wikipedia.org/wiki/PICA200

    Given the acute slowness of the 3DS CPU, I hope they have been smart enough to learn how to program it and thus have some multicore programming experience.
     
  10. XpiderMX

    Veteran

    Joined:
    Mar 14, 2012
    Messages:
    1,768
    Likes Received:
    0
    I really doubt that 3DS development can be compared to development for a multicore PowerPC and modern shaders.
     
  11. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Yes it supports early Z (and hierarchical Z). http://www.beyond3d.com/content/articles/4/5

    Early Z or late Z, the operations are still the same. The ROPs don't know or care when it's done, they just need to have the Z part decoupled from the color part.
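    The point that the depth operation is the same and only its position relative to shading changes can be sketched with a toy software pipeline. This is an illustration under simplifying assumptions (one fragment at a time, a shader that never modifies depth or discards), not a model of any real ROP; all names here are made up for the example.

```python
def render(fragments, early_z):
    """Toy pipeline. fragments: (pixel, depth, color) tuples; smaller depth is nearer."""
    depth_buffer = {}
    color_buffer = {}
    shader_invocations = 0

    def depth_test(pixel, depth):
        # Identical test in both modes; only *when* it runs differs.
        return depth < depth_buffer.get(pixel, float("inf"))

    for pixel, depth, color in fragments:
        if early_z:
            if not depth_test(pixel, depth):
                continue              # rejected before any shading work
            shader_invocations += 1   # stand-in for running the pixel shader
        else:
            shader_invocations += 1   # late Z: shade first...
            if not depth_test(pixel, depth):
                continue              # ...then throw the result away
        depth_buffer[pixel] = depth
        color_buffer[pixel] = color
    return color_buffer, shader_invocations


frags = [
    ((0, 0), 0.2, "red"),
    ((0, 0), 0.8, "blue"),    # hidden behind red
    ((0, 0), 0.1, "green"),   # in front of red
    ((1, 0), 0.5, "white"),
]
early_img, early_count = render(frags, early_z=True)
late_img, late_count = render(frags, early_z=False)
print(early_img == late_img, early_count, late_count)
```

    Both modes produce the same image; the early-Z path simply skips shading for the occluded fragment.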
     
  12. DRS

    DRS
    Newcomer

    Joined:
    May 22, 2009
    Messages:
    135
    Likes Received:
    0
    Thanks for the clarification. I reckon the operations are the same, but in the early-Z case triangle setup should query the ROP, while in the late-Z case the ROP itself can handle it.

    Anyway, to clarify my line of thought: if the 360 weren't able to do it, the Wii U GPU could get away with somewhat lower rendering power, and 160GFLOPS might be sufficient.


    Yes, and people also refer to 40W average as having all USB ports hooked up, which doesn't seem to be an average usage case to me. Also, games (such as Doom 3) copy vertex data from memory, animate it and write it to another location for the GPU to pick up, bumping up the bandwidth requirements.
     
    #4652 DRS, Feb 10, 2013
    Last edited by a moderator: Feb 10, 2013
  13. haihoo

    Newcomer

    Joined:
    Nov 2, 2012
    Messages:
    14
    Likes Received:
    0
  14. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,106
    Likes Received:
    16,898
    Location:
    Under my bridge
    Not very constructive, sadly. On a Nintendo fansite, where the developer wants to drum up interest in their new game, they're going to be in full PR mode. "Powerful" is unqualified. It'll be interesting to compare the game though to see what they manage. It was very much in need of better IQ on PS3.
     
  15. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,411
    Location:
    Wrong thread
    So who wants to take bets on it looking like the 360 version, possibly with a few improved textures, but a less stable frame rate?
     
  16. Barbarian

    Regular

    Joined:
    Jun 27, 2005
    Messages:
    289
    Likes Received:
    15
    Location:
    California, USA
    That's correct. The Xbox 360 has HiZ for sure, but unfortunately does not have EarlyZ. The per-pixel depth values are processed via LateZ (i.e. post pixel shader) directly in the ROPs.
     
  17. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Looks like that is the case.. I figured the eDRAM logic handled color and depth separately but I guess not. It's confusing since a lot of stuff out there refers to the hi-Z as a form of early-Z (which it is I guess)
     
  18. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,411
    Location:
    Wrong thread
    With all the peeping at die shots (which has been tremendous fun) I think we might have gotten tunnel vision and be losing the "big picture". The question of "320 vs 160" shaders is still unanswered and stepping back should help us answer it.

    The current popular hypothesis is that Latte is a 16:320:8 part @ 550 MHz. Fortunately, we can see how such a part runs games on the PC. You know, the PC, that inefficient beast that's held back by Windows, thick APIs, DirectX draw-call bottlenecks that break the back of even fast CPUs, and all that stuff. Here is an HD 5550, a VLIW5 GPU with a 16:320:8 configuration running at 550 MHz:

    http://www.techpowerup.com/reviews/HIS/Radeon_HD_5550/7.html

    And it blows past the 360 without any problems. It's not even close. And that's despite being on the PC!

    Now let's scale things back a bit. This is the Llano A8-3500M w/ Radeon 6620G - a 20:400:8 configuration GPU, but it runs @ 444 MHz, meaning it has almost exactly the same number of GFLOPS and TMU ops as the HD 5550, only it's got about 20% lower triangle setup and fillrate *and* it's crippled by a 128-bit DDR3-1333 memory pool *and* it's linked to a slower CPU than the above benchmark (so more likely to suffer from Windows/DX bottlenecks). No super fast pool of eDRAM for this poor boy!

    http://www.anandtech.com/show/4444/amd-llano-notebook-review-a-series-fusion-apu-a8-3500m/11
    http://www.anandtech.com/show/4444/amd-llano-notebook-review-a-series-fusion-apu-a8-3500m/12
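    The peak-rate comparison above can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch that assumes 2 FLOPs per ALU per clock (one multiply-add), one texel per TMU per clock and one pixel per ROP per clock. The 160-shader Latte entry is the hypothetical configuration under discussion, not a confirmed spec.

```python
def peak_rates(tmus, alus, rops, mhz):
    """Theoretical peaks for a TMU:ALU:ROP configuration at a given clock."""
    return {
        "gflops": alus * 2 * mhz / 1000.0,   # 2 FLOPs/ALU/clock (multiply-add)
        "gtexels_s": tmus * mhz / 1000.0,    # 1 texel/TMU/clock
        "gpixels_s": rops * mhz / 1000.0,    # 1 pixel/ROP/clock
    }


hd5550 = peak_rates(16, 320, 8, 550)
r6620g = peak_rates(20, 400, 8, 444)
latte_160 = peak_rates(16, 160, 8, 550)  # HYPOTHETICAL 160-shader Latte

# Fillrate (and clock-bound setup rate) of the 6620G relative to the HD 5550.
fill_ratio = r6620g["gpixels_s"] / hd5550["gpixels_s"]

# 128-bit bus at 1333 MT/s, in GB/s.
ddr3_bandwidth_gbs = 128 / 8 * 1333 / 1000

for name, rates in [("HD 5550", hd5550), ("6620G", r6620g), ("Latte@160", latte_160)]:
    print(name, rates)
print(f"6620G fillrate vs HD 5550: {fill_ratio:.1%}")
print(f"128-bit DDR3-1333: {ddr3_bandwidth_gbs:.1f} GB/s")
```

    Under these assumptions the two PC parts land within about 1% of each other on ALU and texture throughput, while the 6620G's fillrate comes out roughly 19% lower.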

    And it *still* comfortably exceeds the 360 in terms of the performance that it delivers. Now let's look again at the Wii U. Does it blow past the 360? Does it even comfortably exceed the 360? No, it:

    keeps
    losing
    marginally
    to
    the
    Xbox
    360

    ... and that's despite it *not* being born into the performance wheelchair that is the Windows PC ecosystem. Even if the Wii U can crawl past the 360 - marginally - in a game like Trine 2 it's still far below what we'd expect from a HD5550 or even the slower and BW crippled 6620G. So why is this?

    It appears that there are two options. Either Latte is horrendously crippled by something (API? memory? documentation? "drivers"?) to the point that even an equivalent or less-than-equivalent PC part can bounce its ass around the field, or ... it's not actually a 16:320:8 part.

    TL;DR version:
    Latte seems to be either:
    1) a horrendously crippled part compared to equivalent (or lower) PC GPUs, or
    2) actually a rather efficient 160 shader part

    Aaaaaaand I'll go with the latte(r) as the most likely option. Face it dawgs, the word on the street just don't jive with the scenes on the screens.
     
  19. Syferz

    Newcomer

    Joined:
    Jul 16, 2012
    Messages:
    157
    Likes Received:
    41
    I didn't know you did comedy, function. Looking at the Call of Duty port, it is fairly clear that they took the 360 game and forced it to run on Wii U. It runs at the exact same frame resolution and the same image quality. Now if it had an ALU problem vs 360, then it wouldn't be able to produce the multiplayer game on both the TV and gamepad in split screen, since that would require more polygon pushing power, not less... The reality is these ports are running modified 360 code, something similar to what Vigil did to get DS2 running on the Wii U in only a few weeks.
     
  20. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,411
    Location:
    Wrong thread
    So you think games are running massively more efficiently on the PC than on the Wii U?

    I don't actually think the Wii U has an "ALU problem" compared to the 360. I think it gets by okay compared to the PS360, especially considering the power draw. I think the "problem" is that, in a bizarre feat of back-pedalling and entrenchment, the 360 - a seven year old system - is seen as some kind of benchmark for "console power" for a brand new $350 / £300 games console pimped by Nintendo as being some kind of 3rd party dream box.
     
