NVIDIA Maxwell Speculation Thread

Discussion in 'Architecture and Products' started by Arun, Feb 9, 2011.

  1. What's wrong with 64 ROPs?
    It's as much as Haiti. The next generation of GPUs will probably be branded as "4K GPUs".
    The marketing teams will say these are the graphics cards we'll want for playing games on 4K TVs or multi-monitor setups, while keeping up with the "per-pixel" quality that the new consoles pump out at 900p/1080p.
    It makes sense to start scaling the fillrate capabilities of the new GPUs.
     
  2. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    There's nothing inherently wrong with 64 ROPs. ROPs, though, eat bandwidth for breakfast, lunch and dinner, so it makes a lot of sense to have some fixed number per MC (disregarding issues such as some nvidia chips having more ROPs than the rasterizer or SMX can potentially keep busy in most cases).
    It should be relatively trivial to double the ROP number per MC without really changing the overall structure, so imho it's something which would theoretically be quite possible for GM2xx. And if they've got 4 GPCs which can rasterize 16 pixels each (like GM107 but unlike Kepler), plus 16 SMMs (4 pixels exported per clock each), it could potentially be quite useful. So, given the bandwidth efficiency improvements of Maxwell, I wouldn't rule this out completely (it would also help fp32 blend without changing the ROPs themselves, because for this the Kepler/GM107 ROPs are so slow that performance is way below bandwidth limits in any case).
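To put rough numbers on that, here is a minimal sketch of the per-clock pixel rate at each stage. The unit counts are the speculative ones from this post (4 GPCs at 16 pixels each, 16 SMMs at 4 pixels each); the function name `pixels_per_clock` and the 32 vs 64 ROP comparison are purely illustrative, and it assumes one pixel per ROP per clock, which only holds for full-speed formats.

```python
def pixels_per_clock(gpcs, raster_px_per_gpc, smms, export_px_per_smm, rops):
    """Return per-stage pixel rates (pixels/clock) and the slowest stage."""
    stages = {
        "raster": gpcs * raster_px_per_gpc,       # rasterizer output
        "smm_export": smms * export_px_per_smm,   # shader export rate
        "rop": rops,  # assumption: 1 pixel per ROP per clock
    }
    # min() returns the first stage with the lowest rate if several tie
    bottleneck = min(stages, key=stages.get)
    return stages, bottleneck


# Speculative GM204-like configuration, with 32 and then 64 ROPs
for rops in (32, 64):
    stages, bottleneck = pixels_per_clock(4, 16, 16, 4, rops)
    print(rops, stages, "slowest:", bottleneck)
```

Under these assumptions the rasterizers and SMMs could each deliver 64 pixels per clock, so 32 ROPs would be the narrowest stage while 64 ROPs would match the front end. Whether memory bandwidth can feed that is a separate question.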
    btw what is Haiti ;-).
     
    #1962 mczak, Sep 6, 2014
    Last edited by a moderator: Sep 6, 2014
  3. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    Even if they have 16 ROPs they will say that ... Hawaii has a 512-bit bus, which is basically what is needed behind 64 ROPs, but above all, Hawaii can deal with extremely large memory areas (16GB with really good efficiency). I don't know, maybe it could benefit the Maxwell architecture with its big cache and all. But seriously, I'm not sure of it.
     
    #1963 lanek, Sep 6, 2014
    Last edited by a moderator: Sep 6, 2014
  4. xDxD

    Regular

    Joined:
    Jun 7, 2010
    Messages:
    412
    Likes Received:
    1
    #1964 xDxD, Sep 6, 2014
    Last edited by a moderator: Sep 6, 2014
  5. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    Obviously conflicts with the last rumour in unit counts (this one being 32 ROPs, 2048 cores and 128 TMUs).

    I find these specs a little more believable tbh, since this is basically just a GTX680 built on the Maxwell architecture (granting the increase in cores per SMX), but if it's 10% faster than the 780Ti with those specs then WOW! That would be a huge efficiency boost, assuming it's running at something around 1GHz. Going from the Firestrike scores, it would make it around 70% faster than the 680 with the same number of SMXs, with a lower TDP to boot!

    I really hope this is true!

    EDIT: If we assume clocks are more likely to be in 770 territory than 680, then the performance increase drops to around 55% over GK104, but using only 74% of the TDP. Crazy!
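A quick back-of-envelope on the EDIT figures, as a sketch only: it treats TDP as a stand-in for real power draw, which may not hold, and the helper name `perf_per_tdp_gain` is made up for illustration. The 1.55 and 0.74 inputs are the rumoured ratios quoted above, not measurements.

```python
def perf_per_tdp_gain(perf_ratio, tdp_ratio):
    """Relative performance-per-TDP versus the baseline card."""
    return perf_ratio / tdp_ratio


# ~55% faster than GK104 at ~74% of its TDP (rumoured figures)
gain = perf_per_tdp_gain(1.55, 0.74)
print(f"{gain:.2f}x perf per TDP")
```

That works out to roughly a 2.1x improvement in performance per TDP watt, if the rumour and the TDP assumption both hold.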
     
  6. boxleitnerb

    Regular

    Joined:
    Aug 27, 2004
    Messages:
    407
    Likes Received:
    0
    TDP is not power consumption.
     
  7. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    798
    Likes Received:
    1,625
    It's 99.9% of it
     
  8. boxleitnerb

    Regular

    Joined:
    Aug 27, 2004
    Messages:
    407
    Likes Received:
    0
    No, because TDP is a static value. Real-world power is not: it depends on the load and the application.
     
  9. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,112
    Location:
    New York

    Good call. Didn't realize GM107 was pumping out 16 pixels per clock from its one rasterizer.

    nVidia's fp16 writes and blends are half-speed so doubling the ROPs is one way to close that gap. Kepler shows some benefit from L2 on blends but is pretty much ROP bound on all cards. Hawaii and Bonaire do much better given full speed fp16 and Tonga is in a whole different class.
     
  10. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    I could very well be, however try to convince me why you'd need 64 ROPs with relatively so little bandwidth, and while you're at it, can I then have 96 if not 128 ROPs on GM200?
     
  11. xDxD

    Regular

    Joined:
    Jun 7, 2010
    Messages:
    412
    Likes Received:
    1
    I thought you were referring to the entire specs (cc/tmu/rops), not only the ROPs.
     
  12. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    With the endless guessing for =/>15 SMMs for the GM204 someone eventually will hit the jackpot LOL :lol:
     
  13. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,112
    Location:
    New York

    If the tile/buffer currently being filled fits fully in L2 effective bandwidth for a blend is 2x off-chip bandwidth. Compression takes that even higher - Tonga's effective bandwidth on blends is over 200% theoretical max.
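A rough sketch of why blending leans so hard on bandwidth: each blended pixel needs a read of the destination plus a write of the result, so the traffic the ROPs generate is twice the pixel data. The function name, the 1 GHz clock and the 4-byte (RGBA8) pixel size below are illustrative assumptions, not known specs, and the model ignores compression and cache hits entirely.

```python
def blend_traffic_gbs(rops, clock_ghz, bytes_per_pixel, rate=1.0):
    """Read+write traffic (GB/s) generated by ROPs while blending.

    rate is the fraction of full speed the ROPs reach for the format,
    e.g. 0.5 for a half-speed format.
    """
    pixels_per_ns = rops * clock_ghz * rate
    return pixels_per_ns * bytes_per_pixel * 2  # one read plus one write


# Hypothetical 1 GHz part, RGBA8 blending at full rate
for rops in (32, 64):
    print(rops, "ROPs:", blend_traffic_gbs(rops, 1.0, 4), "GB/s")
```

With these made-up inputs, 32 ROPs would generate 256 GB/s and 64 ROPs 512 GB/s of blend traffic. Anything served out of L2 or saved by compression never reaches DRAM, which is how measured blend bandwidth can end up above the theoretical off-chip figure.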
     
  14. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Yes, but you don't have to stream out the amplified geometry. You can stream out just the patches.
     
  15. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    HSR obviously happens after tessellation.
     
  16. tviceman

    Newcomer

    Joined:
    Mar 6, 2012
    Messages:
    191
    Likes Received:
    0
    I'm sticking with 20 SMMs. GK104 was 4x the cores of GK107, and GF114 was 4x GF107...
     
  17. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
    16C @ ~1,2GHz, 2MiB L2, 256-Bit 7GHz graphics card in SiSoft DB:
    http://www.sisoftware.eu/rank2011d/...e6dbeacca499ac8af2cffed8bdd8e5d5f380bd85&l=en

    There were some early 6.6GHz memory test runs in May, when GM204 bring-up was shipped at Zauba.

    This card has some massive Cryptography (High Security) BW and 1:32 DP performance.


    Is it possible to read the L2 cache of NV cards through OpenCL/CUDA/NVAPI? There is even a 3GB / 1.5MiB version - maybe some BW scaling testing.
    2MiB L2 would be a bit below average expectation...
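For reference, a minimal sketch of what those SiSoft numbers would imply if they are read out correctly. The bus width, memory data rates, SMM count and clock come from the entry above; 128 cores per SMM is an assumption carried over from GM107, and the helper names are illustrative.

```python
def mem_bandwidth_gbs(bus_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s for a given bus width and per-pin rate."""
    return bus_bits / 8 * data_rate_gbps


def fp32_gflops(smms, cores_per_smm, clock_ghz):
    """Peak single-precision GFLOPS, counting an FMA as 2 FLOPs."""
    return smms * cores_per_smm * 2 * clock_ghz


# 256-bit bus at 7 GHz effective, and the earlier 6.6 GHz test runs
print(f"{mem_bandwidth_gbs(256, 7.0):.1f} GB/s")
print(f"{mem_bandwidth_gbs(256, 6.6):.1f} GB/s")

# 16 SMMs at ~1.2 GHz; 128 cores per SMM is assumed from GM107
sp = fp32_gflops(16, 128, 1.2)
dp = sp / 32  # 1:32 DP rate as reported
print(f"SP {sp:.0f} GFLOPS, DP {dp:.0f} GFLOPS")
```

That gives about 224 GB/s at 7 GHz (about 211 GB/s at 6.6 GHz), and on the assumed core count roughly 4.9 TFLOPS SP with only around 154 GFLOPS DP.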
     
    #1977 AnarchX, Sep 7, 2014
    Last edited by a moderator: Sep 7, 2014
  18. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    As I asked elsewhere: is the application even reading out data correctly?
     
  19. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,176
    Location:
    La-la land
    1:32 DP is very disappointing for distributed computing purposes.
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.