NVIDIA Maxwell Speculation Thread

Discussion in 'Architecture and Products' started by Arun, Feb 9, 2011.

  1. Novum

    Regular

    Joined:
    Jun 28, 2006
    Messages:
    335
    Likes Received:
    8
    Location:
    Germany
    DX12 should work on all Kepler and Maxwell chips: not with all the fancy features, but with all the CPU-side performance improvements.
     
  2. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,245
    Likes Received:
    4,465
    Location:
    Finland
    All the "CPU side performance improvements" work on Fermi and up, GCN and up, and Intel 7th or 7.5th gen and up (I can't remember the exact Intel gen for sure).
     
  3. Abwx

    Newcomer

    Joined:
    Sep 30, 2014
    Messages:
    5
    Likes Received:
    0
    IPC is about the same as the previous gen, while clocks were increased by 25% and 32% for the 980 and 970 respectively. Using the former as an example, with 2048 EUs you'll get the equivalent of 2048 x 1.25 = 2560 EUs at reference frequency, and likely with better scaling than increasing the EU count would give.
     
  4. keldor314

    Newcomer

    Joined:
    Feb 23, 2010
    Messages:
    132
    Likes Received:
    13
    You have to remember that Kepler's ALUs can only be fully utilized at IPC > 1.5 in the code (4 threads can dual issue, but only 6 ALUs behind it). This means that the 2560 case represents a best case for Kepler where the workload has lots of available ILP.

    In the real world, Kepler does worse, since a lot of the time there's no other instruction that can be dual issued. This is why the GTX 980 outperforms the GTX 780 Ti much of the time (and sometimes significantly), even though the 780 Ti has 2880 ALUs.

    In worst case Kepler code, we would have 2048 Maxwell ALUs * 1.25 clock * 1.5 from missing IPC on Kepler = 3840 Kepler ALUs. This is closer to what we see in some compute benchmarks.

    (Note: This is disregarding the memory system and assuming compute bound code. With memory bound... I have no idea. Kepler has more bandwidth, while Maxwell has a bigger cache. Depends on the workload!)
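
    The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope model under the post's own assumptions (compute-bound code, a 1.25x clock advantage for the GTX 980, and Kepler needing an IPC of 1.5 to keep all of its ALUs fed). The function name and its parameters are made up for illustration.

```python
def kepler_equivalent_alus(maxwell_alus, clock_ratio, kepler_ipc):
    """Rough number of Kepler ALUs needed to match a Maxwell part.

    kepler_ipc is the IPC each Kepler scheduler actually achieves:
    1.0 means no dual issue at all, 1.5 means every ALU is fed.
    Assumes compute-bound code and ignores the memory system.
    """
    full_utilization_ipc = 1.5  # IPC at which all Kepler ALUs are busy
    # Scale by clock, then by the share of Kepler ALUs left idle.
    return maxwell_alus * clock_ratio * full_utilization_ipc / kepler_ipc


best_case = kepler_equivalent_alus(2048, 1.25, 1.5)   # plenty of ILP
worst_case = kepler_equivalent_alus(2048, 1.25, 1.0)  # nothing to dual issue
print(best_case, worst_case)
```

    With the figures from this thread the two cases come out at 2560 and 3840, so the 780 Ti's 2880 ALUs only keep up when the code sits near the best case.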
     
    #2504 keldor314, Oct 28, 2014
    Last edited by a moderator: Oct 28, 2014
  5. spworley

    Newcomer

    Joined:
    Apr 19, 2013
    Messages:
    146
    Likes Received:
    190
    Kepler was actually limited by its operand collector, which can only fetch 3 registers per clock. That's OK if you're dual issuing say an add and a store or SFU. Or if you're doing two-argument ALU ops like adds, the collector can amortize the "spare" register and dual issue every other clock keeping all 6 ALUs busy. Kepler's SMX design tries to maximize the chance of dual issue, at the expense of potentially idle ALUs.

    But this leads to inefficiency if you're executing a 3-input FMA, which needs all three register reads and leaves no chance for any dual issue. (And as you note, that case leaves 1/3 of your ALUs idle.) So the IPC becomes crucially dependent on the code having a low FMA density. That density in practice may be pretty high for both graphics and GPGPU.

    Maxwell simplifies this. There's only one ALU per scheduler, so it can either dual issue an ALU op and a store/SFU, OR it can do an FMA. In both cases all the ALUs are always occupied. So it forgoes some dual-issue chances, but keeps all its ALUs busy.

    In hindsight, Maxwell's "use all the ALUs efficiently" design was a better design than Kepler's "use all the schedulers efficiently". It probably was not so obvious back in 2009 or so when Kepler was designed.
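
    A toy model of the Kepler register-read limit described above, as a sketch only. It assumes a stream of identical ALU instructions, takes the 3 reads per clock from this post and the 6 ALUs per 4 schedulers from the earlier post, and ignores everything else in the real hardware (register bank conflicts, operand reuse, non-ALU instructions). All names are hypothetical.

```python
from fractions import Fraction

REGISTER_READS_PER_CLOCK = 3          # operand collector limit, per the post
MAX_ISSUE_PER_CLOCK = 2               # dual issue
ALUS_PER_SCHEDULER = Fraction(6, 4)   # 6 ALUs shared by 4 schedulers


def kepler_alu_ipc(register_operands):
    """Sustained ALU instructions per clock for one scheduler.

    register_operands is the number of register inputs each
    instruction reads (2 for an add, 3 for an FMA).
    """
    limited_by_reads = Fraction(REGISTER_READS_PER_CLOCK, register_operands)
    return min(limited_by_reads, MAX_ISSUE_PER_CLOCK, ALUS_PER_SCHEDULER)


def alu_utilization(register_operands):
    """Fraction of the scheduler's share of ALUs that stays busy."""
    return kepler_alu_ipc(register_operands) / ALUS_PER_SCHEDULER


for name, operands in (("add", 2), ("fma", 3)):
    print(name, kepler_alu_ipc(operands), alu_utilization(operands))
```

    In this model two-input adds sustain 3/2 instructions per clock and keep every ALU busy, while three-input FMAs drop to 1 per clock and leave a third of the ALUs idle, which is the behaviour described above.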
     
  6. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,894
    Likes Received:
    4,548
    8GB 980 GTX looming .... latest rumour.

    http://www.gdm.or.jp/voices/2014/1029/90795
     
  7. xDxD

    Regular

    Joined:
    Jun 7, 2010
    Messages:
    412
    Likes Received:
    1

    I think that 8GB on a GTX 980 is useless: if you want to game at 4K (where 8GB could be useful), it's better to wait for GM200 (or even longer...)
     
  8. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,894
    Likes Received:
    4,548
    Agreed. I think the 8GB model is targeting gamers with 4K monitors, but it should also be useful for gamers using DSR to downsample 4K resolutions to their current resolution.
     
  9. Blazkowicz

    Legend

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    It feels useful for off-line 3D rendering and other specific uses.
     
  10. swaaye

    swaaye Entirely Suboptimal
    Legend

    Joined:
    Mar 15, 2003
    Messages:
    9,045
    Likes Received:
    1,119
    Location:
    WI, USA
    I haven't seen a recent game hit even 3GB VRAM utilization even with DSR 2720x1536 or MSAA/SSAA. 4GB seems the ideal amount to pay for with current GPU performance.
     
  11. spworley

    Newcomer

    Joined:
    Apr 19, 2013
    Messages:
    146
    Likes Received:
    190
    An 8GB GTX 980 will make many GPGPU developers very happy!


    Though that would be overshadowed by the chance that the Tesla M20 will be announced or perhaps even released on November 16, the first day of Supercomputing 2014. K20 launched at SC 12, though it was announced 6 months earlier.
     
  12. mustrum

    Regular

    Joined:
    Dec 26, 2002
    Messages:
    288
    Likes Received:
    0
    Playing at 5920x1080, I have many games that completely use up all 4 gigs of my R290.
    The last ones I've been playing that do this were Star Point Gemini, Skyrim (yeah, texture mods) and Star Citizen, but there are many more.
     
  13. max-pain

    Regular

    Joined:
    Feb 13, 2004
    Messages:
    309
    Likes Received:
    2
    Wolfenstein uses 3GB VRAM (max detail @ 1920x1080).
    Ryse uses even more, 3.4 GB VRAM (max detail without ssaa @ 1920x1080).
     
  14. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    I've seen over 6 (before a patch even over 8) Gigabytes of VRAM used in Watch Dogs - Ultra-HD, Ultra-Textures and 8x MSAA. :)
     
  15. Babel-17

    Veteran

    Joined:
    Apr 24, 2002
    Messages:
    1,073
    Likes Received:
    307
    Edit: Whoops, I was assuming that the new color compression methods improved memory utilization efficiency and not just bandwidth. That wouldn't be the case?
     
  16. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    It wouldn't be the case if you still need random access to the frame buffer. Which, I think, you still need.
     
  17. swaaye

    swaaye Entirely Suboptimal
    Legend

    Joined:
    Mar 15, 2003
    Messages:
    9,045
    Likes Received:
    1,119
    Location:
    WI, USA
    Maybe that's why it stutters so much.

    MSAA is so useless these days. I experimented with all of the options in Watch Dogs the other day and none of them are particularly impressive. I think TXAA is perhaps the most effective though. Sometimes I feel like just not using AA at all instead of all the halfway effective options.
     
  18. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    It could make it somewhat worse. Lossless compression will always have data it cannot compress, and then it must store at least some extra data saying it could not be compressed.
    Due to fluctuating compression rates, the safest course would seem to be allocating for worst-case consumption, rather than finding out at a bad time that there's no room for the overflowing framebuffer.
     
  19. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,182
    Likes Received:
    1,579
    Location:
    Beyond3D HQ
    You don't have to store data in the uncompressed case. The decompressor could look for block compression headers. If they're not there, pass the block on unchanged because it's uncompressed.
     
  20. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    What if the uncompressed pixels happen to have the same value as the headers?

    You can't escape the pigeonhole principle...
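
    The counting argument behind that remark can be sketched as follows. This is purely illustrative: the 8-bit block size is a made-up number chosen to keep the figures small, it says nothing about how any real GPU lays out compressed framebuffer blocks, and the function names are hypothetical.

```python
def raw_blocks(block_bits):
    """Number of distinct uncompressed blocks of the given width."""
    return 2 ** block_bits


def spare_patterns(block_bits, slot_bits):
    """Bit patterns left in a slot once every raw block has one.

    Only these leftovers are free to act as a 'this block is
    compressed' header without colliding with real pixel data.
    """
    return 2 ** slot_bits - raw_blocks(block_bits)


def shorter_strings(block_bits):
    """Bit strings strictly shorter than block_bits (lengths 0..n-1)."""
    return sum(2 ** k for k in range(block_bits))


n = 8
print(spare_patterns(n, n))              # slot no bigger than the block
print(spare_patterns(n, n + 1))          # one extra flag bit per block
print(shorter_strings(n), raw_blocks(n)) # shrinking targets vs. inputs
```

    With a slot exactly as wide as the block there are 0 spare patterns, so any header value is also a legal pixel value. One extra bit per block frees 256 patterns. And because there are only 255 strings shorter than 8 bits against 256 possible blocks, no lossless scheme can shrink every block.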
     