Wii U hardware discussion and investigation *rename

Discussion in 'Console Technology' started by TheAlSpark, Jul 29, 2011.

Thread Status:
Not open for further replies.
  1. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,176
    Location:
    La-la land
    That's probably fruitless speculation, I'd think. It may also be that core 1 has 2MB of L2 only because the chip would otherwise have been pad-limited - this is nothing but speculation as well, I might add, but seeing how tiny the die is, it just could be true...

    Even so, there's an awful lot of die area in between the L2 and the cores in particular, as well as along the top edge of the image, that isn't labelled as anything specific. Do those portions actually serve a purpose, or are they for all intents and purposes just dead space?
     
  2. creaks

    Newcomer

    Joined:
    Apr 9, 2013
    Messages:
    81
    Likes Received:
    0






    Well, I was specifically referring to chips with a large number of cores...

    The more capacity your cache has, the less you have to resort to accessing main memory, so the closer your average memory access latency sits to the very fast latency of your cache, and the farther from the slow latency of main memory.

    The smaller your cache, the higher the probability that an access misses the cache (because the data isn't there). Missing a read from the instruction cache can grind the entire thread to a halt for as long as it takes to reach main memory, which can seriously cripple performance.
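    The trade-off described above is usually captured by the standard average memory access time (AMAT) model. A quick sketch, with purely made-up cycle counts for illustration (not measurements of any real chip):

```python
# Hedged sketch of the cache trade-off: the classic AMAT model.
# All numbers are illustrative, not measurements of any real chip.

def amat(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty (in cycles)."""
    return hit_time + miss_rate * miss_penalty

# A bigger cache lowers the miss rate, pulling the average toward
# the fast cache latency and away from slow main memory:
small_cache = amat(hit_time=3, miss_rate=0.10, miss_penalty=200)  # 23.0 cycles
large_cache = amat(hit_time=3, miss_rate=0.02, miss_penalty=200)  # 7.0 cycles
```

    Note how even a modest miss rate is dominated by the miss penalty, which is why an instruction-cache miss stalling the thread hurts so much.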

    But another core is an entire extra core that can process simultaneously, and that's a big deal. I mean, these are both really important to performance.

    On principle, on a chip that already has a lot of cores, I would absolutely choose more cache over another core. But IIRC, once you get to 1MB your miss probability is really, really low, and increasing it further just won't help much more.

    On a tri-core, though, I'd choose another core.

    Then again, I don't design custom CPUs, so...
     
  3. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
    While the overall reasoning is sound, I'd shy away from picking a cut-off number for cache size.
    First, because average hit rate (averaged over a wide set of applications) increases along a smooth function, making any cut-off point rather arbitrary (why 89%? Why not 92% or 94%?).

    Second, because cache hit rates are very application specific: they depend on the code AND the data set, on the average latency of a cache hit vs. the average latency of a cache miss, and on the extent to which the data bursted from main memory is useful or not... It's really complex, and doesn't let itself get boiled down to a single ideal cache size for L1, L2 or L3.
    At some point, spending additional die area on CPU cores vs. cache will indeed balance out fairly evenly when looked at over a great number of general applications, but since it is so application and data-access dependent, it is pretty damn difficult to pinpoint the exact optimum. However, I wouldn't sweat it too much, since the optimum seems to typically be fairly flat, i.e. overall the differences of going either way will be modest when you are close to it. And for general code, the optimum is typically quite low on cores. (For general x86 code it is lower than 4, and actually lower than 2, last I looked. Tons of caveats though, and a console CPU isn't running general x86 code either.)
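    The "smooth function" point can be sketched with the classic square-root rule of thumb, which says miss rate falls roughly with the square root of cache size (quadrupling the cache about halves the misses). This is purely illustrative; as noted above, real curves are workload-dependent.

```python
# Illustrative only: the "square-root rule" of thumb for cache miss
# rates. Real hit rates are application- and data-set-specific.

def est_miss_rate(cache_kb, base_rate=0.10, base_kb=64):
    """Assumed model: miss rate scales as 1/sqrt(cache size)."""
    return base_rate * (base_kb / cache_kb) ** 0.5

# Diminishing returns: each doubling buys less than the one before,
# so there is no natural cut-off point, just a flattening curve.
for kb in (256, 512, 1024, 2048):
    print(f"{kb:5d} KB -> ~{est_miss_rate(kb):.1%} estimated miss rate")
```

    Under this (assumed) model, going from 512KB to 1MB saves about one percentage point of misses, and the next doubling saves even less, which is consistent with the flat optimum described above.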
     
    #5103 Entropy, Jun 2, 2013
    Last edited by a moderator: Jun 2, 2013
  4. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Is it possible that it's area to run traces to cope with the eDRAM's latency without having to buffer it? I hope that doesn't sound totally silly.
     
  5. creaks

    Newcomer

    Joined:
    Apr 9, 2013
    Messages:
    81
    Likes Received:
    0


    heh, thanks for the info. And possibly a brain hemorrhage :p
     
  6. pc999

    Veteran

    Joined:
    Mar 13, 2004
    Messages:
    3,628
    Likes Received:
    31
    Location:
    Portugal
    Don't forget that the Wii CPU had 1MB of L2 too (IIRC); it may be there for BC and for easy porting of Nintendo engines!

    Plus, a large cache makes the console easier to develop for, cooler, and probably quite good for real-world performance (and framerate) too. I'd prefer less but stable performance over seeing it stutter each time it takes on a few more things and isn't really up to the task because of unpredictable performance.

    We spent years hearing how hard the PS360 memory (sub)systems hit performance; now we have a console that is 1.5-2x better in raw numbers and has a great memory system, plus four times the RAM. It should be quite interesting to see what they can extract from it.
     
  7. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Wii's Broadway CPU only has 256KB of L2 cache. It's a shrink of the old Gamecube Gekko CPU which also had 256KB of L2. 256KB of on-die L2 cache was standard for processors of that era - Coppermine P3, Willamette P4, and Palomino AthlonXP all had 256KB of L2.
     
  8. Blazkowicz

    Legend

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    And Pentium Pro, too.
     
  9. kalelovil

    Regular

    Joined:
    Sep 8, 2011
    Messages:
    568
    Likes Received:
    104
    The Pentium Pro's L2 cache was on-package but not on-die.
     
  10. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    I was thinking more of processors that were actually current in fall 2001 when the Gamecube came out :p The Pentium Pro's L2 cache was indeed on-package; 256KB would have been a lot to put on-die in 1995.
     
  11. ltcommander.data

    Regular

    Joined:
    Apr 4, 2010
    Messages:
    616
    Likes Received:
    15
    http://barefeats.com/g4up2.html

    For interest, I haven't seen any benchmarks between the 512kB-L2 PowerPC 750FX and the 1MB-L2 PowerPC 750GX, but there are some benchmarks available comparing the 512kB-L2 PowerPC 7447A and the 1MB-L2 PowerPC 7448. In the game results, doubling the L2 cache to 1MB yields a 13% fps boost in Doom 3 and an 11% boost in Halo, which is pretty effective. The caveat is that the QuickSilver Power Mac these G4 processor upgrade kits were tested in had a 133MHz FSB and PC133 SDRAM, so they were probably bottlenecked on system memory, which could exaggerate the benefits of more L2 cache.
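    For clarity on how such "% fps boost" figures are derived, here's the arithmetic; the fps values below are hypothetical stand-ins, not the actual barefeats.com measurements:

```python
# How a "% fps boost" figure is computed. The fps values used here
# are hypothetical examples, not the real benchmark numbers.

def pct_boost(old_fps, new_fps):
    """Percentage speedup of new_fps relative to old_fps."""
    return (new_fps - old_fps) / old_fps * 100

# e.g. going from a hypothetical 40.0 fps to 45.2 fps:
print(round(pct_boost(40.0, 45.2), 1))  # 13.0
```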
     
  12. creaks

    Newcomer

    Joined:
    Apr 9, 2013
    Messages:
    81
    Likes Received:
    0
    G4s? Don't those have AltiVec? I remember paired singles used to be pretty good, until things like AltiVec came along. I wonder if Nintendo improved the paired-singles vector math at all, or perhaps they see GPGPU as a solution/compromise.

    I actually have some 750CX vs. FX vs. GX documents around that detail differences in cache size, branch prediction, and whatnot. I'll dig them up.
     
  13. pc999

    Veteran

    Joined:
    Mar 13, 2004
    Messages:
    3,628
    Likes Received:
    31
    Location:
    Portugal
    Thanks for the correction.


    Isn't the main problem the latency? Didn't that increase with DDR3? Maybe a faster FSB helped, but on the other side, DDR3 may have made those improvements moot?
     
  14. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,176
    Location:
    La-la land
    AMD's VLIW5 arch is very impractical for GPGPU; performance is shit most of the time. It was designed for graphics, not compute. Considering how weak the Wuu's GPU is to begin with, I'm wagering we won't be seeing much GPGPU on it. Since it's based on R700-era tech, it most likely lacks all the GCN optimizations for co-scheduling compute and graphics tasks and so on; they probably won't mix well, kind of like running compute tasks on an older GPU in Windows and trying to use the desktop for anything at the same time...
     
  15. lwill

    Newcomer

    Joined:
    Apr 11, 2007
    Messages:
    110
    Likes Received:
    0
    Considering the customization done to this GPU from its original R700 base, I'm not sure we can be certain that whatever is in Latte is VLIW5.
     
  16. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,176
    Location:
    La-la land
    If it isn't VLIW5, then why did they start off with R700? It'd be like building a custom truck by taking an existing truck, tearing it all down and throwing everything away, then re-building a new truck with all-new components. Doesn't make sense, and I'm sure this isn't what they've done.
     
  17. lwill

    Newcomer

    Joined:
    Apr 11, 2007
    Messages:
    110
    Likes Received:
    0
    Nintendo apparently started developing Latte in 2009, when R700 was the current generation of AMD GPUs. IIRC, development of the chip was not done until 2011, so there was time for it to evolve from its base. Considering how different the internals of the processor look compared to the original R700, it's not something to completely ignore.
     
  18. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,176
    Location:
    La-la land
    It doesn't make sense to fixate on a piece of hardware that will be completely obsolete when you don't intend to launch for nearly half a decade. Did either Sony or MS build their consoles on obsolete tech? No. Methinks Nintendo picked R700 not because they'd been slow-cooking the Wuu GPU since 2009, but because it was cheaper than any other IP AMD could offer, and because actual hardware development took (way) less than four years.

    Shit, in four years Nintendo could have had AMD brew up an entirely novel architecture from start to finish rather than start off with an existing one.

    There's no reason to assume the base is anything but what it looks like, especially with Nintendo. VLIW5 is part of the fundamental workings of that hardware design; change that and you need to change pretty much everything. Even going for Cayman's slightly refined VLIW4 (which is also mostly crap for GPGPU, one might add) means you get DX11 as well, and we have no information whatsoever that the Wuu supports that.

    Custom layout is one thing; fundamental inner workings are something else. Occam's razor speaks against it, plus Nintendo's cheapskatedness as well.
     
  19. AzaK

    Newcomer

    Joined:
    Jun 10, 2012
    Messages:
    43
    Likes Received:
    0


    I think you need to read all of the Latte thread on NeoGAF before making such blanket statements. There are folks there spending hours upon hours looking into the chip and coming up with very interesting theories.

    You may still be right, but I don't think it's as black and white as you're making it out to be.
     
  20. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,411
    Location:
    Wrong thread
    Nintendo went all-in and spent years developing a very highly customised, almost unique GPU arch in order to do something that R7xx could do on its own: deliver Xbox 360-level performance.

    Nope. Really not feeling it.
     