NVIDIA Maxwell Speculation Thread

Discussion in 'Architecture and Products' started by Arun, Feb 9, 2011.

  1. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    I always read Damien Triolet's reviews first (despite the fact that online translation is painful, since I don't speak French), and this time was no different. If Monsieur Triolet is correct (which I have no real reason to doubt, since he's usually very well informed), then I'm not so sure there will be any further Maxwell chips on 28nm.
     
  2. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    I think the branding would give a definite indicator as to timing of other parts. In many respects this is very similar to Bonaire's launch.

    While the claims need to be proven (though they were for SHA-256/Bitcoin mining), dedicated Scrypt miners are being quoted at 5 MH/s for 70 W, and I've seen others quoting similar figures (0.9 MH/s for 5 W); by comparison, Hawaii is at ~0.8-0.9 MH/s.
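
A minimal sketch of the perf-per-watt arithmetic behind the figures quoted above. The two miner data points come from the post; the post gives no power figure for Hawaii, so `HAWAII_ASSUMED_WATTS` is a hypothetical placeholder chosen only to make the comparison runnable, and `mhash_per_watt` is an illustrative helper name.

```python
def mhash_per_watt(mhash_per_s: float, watts: float) -> float:
    """Return Scrypt hash-rate efficiency in MH/s per watt."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return mhash_per_s / watts


# Quoted (unverified) dedicated Scrypt miner claims from the post.
miner_large = mhash_per_watt(5.0, 70.0)  # 5 MH/s at 70 W
miner_small = mhash_per_watt(0.9, 5.0)   # 0.9 MH/s at 5 W

# ASSUMPTION: 250 W is a placeholder, not a figure from the thread.
HAWAII_ASSUMED_WATTS = 250.0
# Midpoint of the ~0.8-0.9 MH/s range quoted for Hawaii.
hawaii = mhash_per_watt(0.85, HAWAII_ASSUMED_WATTS)

for name, eff in [
    ("miner, 5 MH/s @ 70 W", miner_large),
    ("miner, 0.9 MH/s @ 5 W", miner_small),
    ("Hawaii (assumed power)", hawaii),
]:
    print(f"{name}: {eff:.4f} MH/s per W")
```

Swapping in a measured board power for Hawaii changes the absolute gap but, for any plausible GPU power draw, not the ordering.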
     
  3. Picao84

    Veteran

    Joined:
    Feb 15, 2010
    Messages:
    2,109
    Likes Received:
    1,196
    I was reading it, both through Google Translate and using my mostly rudimentary knowledge of French, and could not find that hint. Could you quote it, please?
     
  4. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    I've figured as much.

    Superb :p
     
  5. Picao84

    Veteran

    Joined:
    Feb 15, 2010
    Messages:
    2,109
    Likes Received:
    1,196
    I still cannot get why that means we won't see more 28nm Maxwells... :???:
    It's just a remark about it being limited to 1 triangle per clock, as expected from having just one GPC?
     
  6. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    http://forum.beyond3d.com/showpost.php?p=1827815&postcount=961

    Don't get too hung up on my ramblings, since I was essentially wrong: Maxwell still has dedicated FP units. However, the entire 2x efficiency thing sounded a wee bit strange, and I was reconsidering tviceman's questions about the interdie connect.

    Now, what Mr. Triolet implies here is that when you have 1 GPC (i.e. 1 raster/1 tri setup), you don't need any highly complex interdie connect and can cut back severely in that department. I assume that difference could be absorbed by a smaller process like 20SoC (+30% improvement over 28HP at best?), hence my gut feeling that there will be no bigger Maxwell cores on 28nm, since GM206 obviously cannot have just one GPC, can it?
     
  7. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    797
    Likes Received:
    223
    So it's GM106 not GM206? (or will there be both?)
     
  8. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Well, no larger 28nm Maxwell chips would mean no GM106, though there were rumors about GM108, which would still be possible. Still, I think it's a bit too far-fetched to conclude from that bit that there will be no bigger 28nm Maxwell chips (not that I think there really will be a 28nm GM106 chip).
    Btw, I'm wondering how large an SMM is vs. a gk1xx or gk2xx SMX. Presumably it ought to be a good bit smaller, but I wonder how large the difference really is.
     
  9. Picao84

    Veteran

    Joined:
    Feb 15, 2010
    Messages:
    2,109
    Likes Received:
    1,196
    OK, true.
     
  10. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
  11. DSC

    DSC
    Banned

    Joined:
    Jul 12, 2003
    Messages:
    689
    Likes Received:
    3
    #1091 DSC, Feb 18, 2014
    Last edited by a moderator: Feb 18, 2014
  12. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Why is that interesting? These features were already present in the gk20x series (gk208/gk20a, though the latter is just a guess), just like the higher bit shifter capability.
     
  13. DSC

    DSC
    Banned

    Joined:
    Jul 12, 2003
    Messages:
    689
    Likes Received:
    3
  14. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    I'm wondering what kind of new compression scheme NVIDIA uses. Anandtech's pixel fill number (http://www.anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell/20) is probably the most impressive benchmark I've seen yet. This test is 100% bandwidth limited on all the AMD cards you see there, and I'm pretty sure that's the case for the GTX 750 / 750 Ti as well. Yet the 750 manages a score which is 40% higher than the R7 260X, even though the latter has 20% more bandwidth available (and yes, Kepler was already more efficient there than SI/CI, but certainly not to that extent). I dunno, maybe the large cache helps, but surely this has to be the reason why this card doesn't seem to need all that much memory bandwidth to still perform quite well.
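
A rough sketch of what those two percentages imply when combined, taking the post's figures at face value (a 40% higher score, with the rival card having 20% more bandwidth). The function name `per_bandwidth_advantage` is illustrative, not from any tool mentioned in the thread.

```python
def per_bandwidth_advantage(score_ratio: float, rival_bandwidth_ratio: float) -> float:
    """Ratio of (score per GB/s) between card A and card B.

    score_ratio           = score_A / score_B
    rival_bandwidth_ratio = bandwidth_B / bandwidth_A
    (S_A / B_A) / (S_B / B_B) = (S_A / S_B) * (B_B / B_A)
    """
    if score_ratio <= 0 or rival_bandwidth_ratio <= 0:
        raise ValueError("ratios must be positive")
    return score_ratio * rival_bandwidth_ratio


# Figures as stated in the post: GTX 750 scores 40% higher than the R7 260X,
# while the 260X has 20% more memory bandwidth.
gtx750_vs_260x = per_bandwidth_advantage(1.40, 1.20)
print(f"fill per unit of bandwidth, GTX 750 vs R7 260X: {gtx750_vs_260x:.2f}x")
```

In other words, if both cards really are bandwidth bound in this test, the GTX 750 extracts roughly two thirds more fill rate from each GB/s than the 260X does.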
     
  15. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    It could be the effect of the 8-fold increase of the L2 size. :???:
    It's simply more efficient, but still only gets a bit closer to its theoretical maximum.

    This particular 3DMark test measures FP16 blending rate, so it hammers memory writes quite heavily on all architectures. It's a simple bandwidth deficit issue.
     
  16. dnavas

    Regular

    Joined:
    Apr 12, 2004
    Messages:
    375
    Likes Received:
    7
    Yeah, well, I took both sides of the issue and was wrong twice, so ;^/
    I find it interesting how the dp units are hanging out with the tmus. It's difficult for me to imagine that they'll adopt that for hpc...?

    As I'm in the apparent target market for this (I currently have a 5750, and I AM looking to upgrade), I should probably say a little something. As someone who upgrades infrequently, one of the things I look for is a feature set that will survive the next four years. No HDMI 2.0, no HEVC encode OR decode, optional DisplayPort, and a 128-bit bus made me sad. Low idle power and noise make me happy. :shrug: YMMV. I don't play games, I do video editing, so I'm not a perfect match.
     
  17. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Yes, that's why this test is so impressive. It requires 16 bytes / clock (8 read / 8 write). Hence gm107 reaching a bandwidth efficiency of 140% or so (40% over the theoretical maximum). That's what I call efficient :).
    It's possible the L2 cache helps, though traditionally it does not seem to do much. Or there's some rather impressive lossless compression going on. But maybe it really is just L2 cache size: Kepler also had higher bandwidth efficiency (in this test at least) compared to GCN, and it could be for this same reason (since GCN does not use the L2 cache for ROPs, and the ROP caches themselves are tiny).
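
A minimal sketch of how a "bandwidth efficiency" above 100% can be computed for this test, assuming the 16 bytes (8 read + 8 write) are the uncompressed traffic for each FP16-blended pixel. The fill rate and peak bandwidth in the example are illustrative placeholders, not measurements taken from the thread or from the review.

```python
BYTES_READ_PER_PIXEL = 8      # FP16 RGBA destination read for blending
BYTES_WRITTEN_PER_PIXEL = 8   # FP16 RGBA result written back


def bandwidth_efficiency(fill_gpixels_per_s: float, peak_bandwidth_gb_per_s: float) -> float:
    """Uncompressed traffic implied by the fill rate, divided by peak DRAM bandwidth.

    A value above 1.0 means the card moved more pixels than raw DRAM
    bandwidth alone would allow, so caching and/or compression must be
    absorbing part of the traffic.
    """
    if peak_bandwidth_gb_per_s <= 0:
        raise ValueError("peak bandwidth must be positive")
    bytes_per_pixel = BYTES_READ_PER_PIXEL + BYTES_WRITTEN_PER_PIXEL
    required_gb_per_s = fill_gpixels_per_s * bytes_per_pixel
    return required_gb_per_s / peak_bandwidth_gb_per_s


# ASSUMPTION: placeholder inputs picked so the result matches the
# "140% or so" figure quoted in the post.
example = bandwidth_efficiency(fill_gpixels_per_s=7.0, peak_bandwidth_gb_per_s=80.0)
print(f"bandwidth efficiency: {example:.0%}")
```

The same function with a card's real measured fill rate and rated bandwidth shows how far above or below the raw DRAM limit it lands.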
     
  18. revan

    Newcomer

    Joined:
    Nov 9, 2007
    Messages:
    55
    Likes Received:
    18
    Location:
    look in the sunrise ..will find me
  19. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    955
    Likes Received:
    52
    Location:
    LA, California
    I'm not sure I believe hardware.fr's diagrams on that point. I don't see any justification for their claims in the article, and they've also got the texture cache/L1 size at 24KB per SMM (half the amount per SMX), despite the fact that it is now apparently servicing memory reads/writes from the shader cores. Hopefully they follow up with details on how they came to their conclusions.
     
  20. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Since the L1 is now part of the texture cache, there could be some major changes to it that we don't know about for sure yet. It could be a larger unified pool or a split design again. In the first case, that would mean the texture units in Maxwell can now access the cache for both read and write ops.
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.