NVIDIA Maxwell Speculation Thread

Discussion in 'Architecture and Products' started by Arun, Feb 9, 2011.

  1. Picao84

    Veteran

    Joined:
    Feb 15, 2010
    Messages:
    2,109
    Likes Received:
    1,196
    Thanks.

    Is anyone else finding this tidbit a bit too much for a 28nm 60W chip?

    EDIT 1 - If it is true, WOW having the power of GTX480 on a decent, not so expensive, laptop :D WANT!!!!

    EDIT 2 - However, with such low memory bandwidth, it will probably be quite a bit slower than GTX480 at high resolutions/4x AA.
     
    #861 Picao84, Feb 12, 2014
    Last edited by a moderator: Feb 12, 2014
  2. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Weird.

    So, the new GPC configuration is 5 multiprocessors now? How much for the big Maxwell -- 4xGPC & 2560 ALUs? That would make for 450~480mm² die on 28nm, if the GPC share is roughly 50% of the whole IC logic.
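
    The arithmetic behind that guess can be sketched as follows. This is only a back-of-the-envelope check: the 128 ALUs per multiprocessor is not stated above, it is simply what 2560 ALUs across 4 GPCs of 5 multiprocessors would imply, and the 50% GPC share is the post's own assumption. All names are illustrative.

```python
# Back-of-the-envelope check of the "big Maxwell" guess.
# Every input is a speculative figure from the post, not a confirmed spec.
GPCS = 4                    # guessed GPC count for the big chip
SMS_PER_GPC = 5             # multiprocessors per GPC, per the rumored layout
TOTAL_ALUS = 2560           # guessed ALU total
DIE_AREA_MM2 = (450, 480)   # guessed die size range on 28nm
GPC_SHARE = 0.5             # assumed fraction of the die taken by GPCs

# ALUs per multiprocessor implied by the guess.
alus_per_sm = TOTAL_ALUS // (GPCS * SMS_PER_GPC)

# Area per GPC implied by the die size range and the 50% share.
area_per_gpc = tuple(a * GPC_SHARE / GPCS for a in DIE_AREA_MM2)

print(alus_per_sm)   # 128
print(area_per_gpc)  # (56.25, 60.0)
```

    So the guess works out to 128 ALUs per multiprocessor and roughly 56-60 mm² per GPC, if the 50% share holds.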
     
  3. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Yes, I don't think it can quite reach GTX 480 performance in general; the numbers just don't add up. There might be some benchmarks where it's really close though.

    And I really had to laugh about this:
    You'd think the SMX reorganization would be a much bigger change than a (rather trivial) increase in cache size (and compared to GK208, which already quadrupled the L2 cache size per MC over GK107, it's only a doubling anyway)... It does seem to imply, though, that GPUs are going the way of CPUs: traditionally GPUs had tiny L2 caches (but lots of "cache" in the form of registers). Maybe it really helps with some new framebuffer compression tricks (I'd still be amazed if these products really use sub-6GHz GDDR5 memory). GF100 had just 768kB (and that was already considered a lot, as GT200 only had 256kB), so seeing 2MB in a midrange offering certainly ups the stakes. Heck, even Hawaii only has 1MB... That of course assumes the 2MB figure is actually true (I have no idea if this source is trustworthy, and it certainly sounds like a lot!).
     
  4. dnavas

    Regular

    Joined:
    Apr 12, 2004
    Messages:
    375
    Likes Received:
    7
    The hierarchy is logical, considering they care about data locality.
    If the light blue is cache, the darker blue is tex/ROP, red is dispatch and orange/yellow is the scheduler, where are the SFUs?
     
  5. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    797
    Likes Received:
    223
    My guess is 2 GPCs for GM206, 4 for GM204, and around 6 for GM200, depending on the number of SMMs per GPC (it changed from GK104 to GK110).

    How significant architecturally would the SMX reorganization be?

    Is it possible that there are no SFUs anymore?
     
  6. DSC

    DSC
    Banned

    Joined:
    Jul 12, 2003
    Messages:
    689
    Likes Received:
    3
    256KB - GK107
    384KB - GK106
    512KB - GK208
    512KB - GK104
    768KB - GF100/GF110
    1536KB - GK110

    Is it really possible to fit 2MB of L2 in a low-end GPU on the same 28nm without ballooning the die size?
     
  7. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    You never know with these journalists, but I'll charitably assume that this is a typo and that it should be 'memory' instead of 'GPU'. :wink:
     
  8. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Well, if they changed as much as those diagrams imply, that looks like quite a significant architectural change to me.

    I hope so, as I predicted that for Kepler already :).


    I can't see why not. Kabini's 2MB L2 cache (which I don't think is anything special or particularly dense) is below 20mm² including tags, I believe (I never saw a number for that, it's just a guess from the die shot). Granted, you'd probably need more cache bandwidth than what Kabini provides, but I don't think that should be a particular problem from a size point of view. The problem with large L2 caches in GPUs has just been that they presumably didn't offer much of a performance benefit (hence, instead of a larger L2 cache, they would rather put one more SMX on the die or something along those lines). But maybe they are good for perf/W...
     
  9. dnavas

    Regular

    Joined:
    Apr 12, 2004
    Messages:
    375
    Likes Received:
    7
    Seems more likely that they just aren't shown, but who knows. I'm also intrigued by "the number of instructions per clock cycle has been increased" because "holy hot clock cycle reincarnation" and "wait, this is like Fermi++, wth was Kepler then"....
     
  10. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
  11. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    Or "from" instead of "to".

    I don't think it's that big of a deal. If you look at a Kaveri die shot you'll see that the 4MB of L2 doesn't take up a very large part of the die, and that's for cache with tighter latency requirements than what you'd need in a GPU.

    Granted it's GloFo's 28nm process instead of TSMC's, but they probably have similar SRAM densities.


    http://cdn2.wccftech.com/wp-content/uploads/2014/01/AMD-Kaveri-Die-Shot1.jpg
     
  12. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    797
    Likes Received:
    223
  13. Picao84

    Veteran

    Joined:
    Feb 15, 2010
    Messages:
    2,109
    Likes Received:
    1,196
    What about:

    Looking at Fermi and Kepler, was this not the case? Was each SM a "monoblock"?

    [IMG]

    [IMG]

    If this is true, any idea about the consequences?
     
  14. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Since they support 48KB Shared Memory + 16KB L1 per 192 ALU SMX on Kepler, each 32 ALU SM will need to have at least 48KB Shared Memory for backwards compatibility. That's a LOT more shared memory (and associated bandwidth) than on Kepler!
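
    To put a rough number on "a LOT more", here is a minimal sketch of the per-ALU comparison. It assumes the 32 ALU unit really is a full SM with its own shared memory, as the leaked diagram is being read here, and it uses 48KB as the minimum needed for compatibility; the variable names are made up for illustration.

```python
# Shared memory per ALU: Kepler SMX vs. the speculated 32-ALU Maxwell SM.
# The Maxwell figures are the post's assumptions, not confirmed specs.
KEPLER_SHARED_KB = 48        # 48KB shared memory configuration on Kepler
KEPLER_ALUS = 192            # ALUs per Kepler SMX
MAXWELL_MIN_SHARED_KB = 48   # minimum needed for backwards compatibility
MAXWELL_ALUS = 32            # ALUs per SM, as read from the diagram

kepler_bytes_per_alu = KEPLER_SHARED_KB * 1024 / KEPLER_ALUS
maxwell_bytes_per_alu = MAXWELL_MIN_SHARED_KB * 1024 / MAXWELL_ALUS
ratio = maxwell_bytes_per_alu / kepler_bytes_per_alu

print(kepler_bytes_per_alu)   # 256.0
print(maxwell_bytes_per_alu)  # 1536.0
print(ratio)                  # 6.0
```

    Under those assumptions that is at least 6x the shared memory per ALU compared to Kepler.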
     
  15. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    805
    Likes Received:
    1,634
    Of course it is. With 6T SRAM and the same bank/port count (as in GK107), an additional 1792 KB of cache will require 88,080,384 transistors. 88 million transistors are very cheap on 28nm, and they are even cheaper in terms of area, since SRAM can have a denser layout than the rest of the chip; it's likely just 7-8 mm² of additional area on 28nm.
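
    The transistor figure can be reproduced with a few lines. This sketch only counts the data array cells at 6 transistors per bit, so tags and peripheral logic are left out, and the 1792 KB is taken as the rumored 2MB minus GK107's 256KB. The density line just shows what the 7-8 mm² estimate would imply; it is not a measured number.

```python
# Reproduce the 6T SRAM transistor count for the extra L2.
# Counts data cells only; tags and peripheral logic are ignored.
RUMORED_L2_KB = 2048         # the rumored 2MB L2
GK107_L2_KB = 256            # GK107's L2 size
TRANSISTORS_PER_BIT = 6      # 6T SRAM cell

extra_kb = RUMORED_L2_KB - GK107_L2_KB
extra_bits = extra_kb * 1024 * 8
transistors = extra_bits * TRANSISTORS_PER_BIT

# Density (million transistors per mm^2) implied by the 7-8 mm^2 estimate.
implied_density = tuple(round(transistors / area / 1e6, 2) for area in (8, 7))

print(extra_kb)         # 1792
print(transistors)      # 88080384
print(implied_density)  # (11.01, 12.58)
```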
     
  16. tviceman

    Newcomer

    Joined:
    Mar 6, 2012
    Messages:
    191
    Likes Received:
    0
    Maxwell is going to be a beast. If it's this good on 28nm, I can't wait to see it shrunk down to 20nm, or 16nm / finfet.
     
  17. tviceman

    Newcomer

    Joined:
    Mar 6, 2012
    Messages:
    191
    Likes Received:
    0
    I highly doubt we'll get big maxwell on 28nm. However, if 20nm isn't worth the trouble and finfets are still a ways off, then we'll definitely get GM104 on 28nm.
     
  18. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York

    It does seem like a throwback to simpler times.
     
  19. UniversalTruth

    Veteran

    Joined:
    Sep 5, 2010
    Messages:
    1,747
    Likes Received:
    22
    If the GTX 480 is as fast as a modern GTX 660, then by launch time the card (GTX 750 Ti) would actually be showing higher performance than what we have already seen, which put it below the GTX 650 Ti Boost. Looks good.
     
  20. tviceman

    Newcomer

    Joined:
    Mar 6, 2012
    Messages:
    191
    Likes Received:
    0
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.