Nvidia's Next-Generation RTX GPU [3060, 3070, 3080, 3090]

Discussion in 'Architecture and Products' started by Shortbread, Sep 1, 2020.

  1. techuse

    Regular Newcomer

    Joined:
    Feb 19, 2013
    Messages:
    743
    Likes Received:
    440
    Perhaps Nvidia's multithreading outweighs any efficiency advantage. Without specific dev effort, AMD's DX11 CPU rendering stream is still fully single-threaded, no?
     
  2. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    2,061
    Likes Received:
    1,493
    Location:
    France
    If it's an L3 cache problem, the 109xx X CPUs on X299 should be impacted too, with their mesh architecture?
     
  3. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    2,218
    Likes Received:
    1,626
    Location:
    msk.ru/spb.ru
    DX11 maybe but what about Vulkan?

    Also, from the reports of asset loading issues on NV h/w when running CPU limited, I do wonder if the initial idea is the culprit here: the DX12 (FL12_0 at least?) resource binding model not being a good fit for NV h/w, with NV's DX12 driver doing some pre-processing on such titles to win more performance in GPU-limited scenarios. It would be interesting to see what happens in these games on a really old driver, although it can be hard to run an old GPU like a 980 Ti CPU limited even at 720p, I guess.
     
    PSman1700 likes this.
  4. troyan

    Regular Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    331
    Likes Received:
    636
    As far as I see it, in most games* DX12 has an overhead within the engine which doesn't exist with DX11 on nVidia hardware. So this is a fixed time which can't be reduced or eliminated with more cores, higher clocks or better IPC. When a game isn't hammering the DX11 driver with workload (Hitman 2 at the beginning of the first main mission), DX11 is more efficient from an overhead perspective in CPU-limited scenarios. And without proper multi-threading, as with Control and WoW, DX11 delivers much more performance on processors with >6 cores.

    * A positive example is "Pumpkin Jack". The developer switched to nVidia's UE4 branch for raytracing, and DX12 is 15% faster than DX11 at high framerates.
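    To put rough numbers on that fixed-overhead argument (the 2 ms figure below is purely illustrative, not a measurement from any of these games): a constant per-frame CPU cost barely registers at 30 fps but eats a large slice of the frame budget once you are CPU limited at high framerates.

    ```cpp
    // Illustrative only: shows how a fixed per-frame CPU cost (hypothetical 2 ms)
    // matters little at low framerates but dominates in CPU-limited, high-fps cases.
    #include <cstdio>

    int main() {
        const double overhead_ms = 2.0;                   // assumed fixed API/engine overhead per frame
        const double base_frame_ms[] = {33.3, 16.7, 6.9}; // ~30, ~60, ~144 fps without the overhead

        for (double base : base_frame_ms) {
            double with_overhead = base + overhead_ms;
            double fps_before = 1000.0 / base;
            double fps_after  = 1000.0 / with_overhead;
            std::printf("%.1f ms/frame -> %.1f fps becomes %.1f fps (%.0f%% slower)\n",
                        base, fps_before, fps_after,
                        100.0 * (fps_before - fps_after) / fps_before);
        }
        return 0;
    }
    ```

    With these assumed numbers the same 2 ms costs about 6% at 30 fps but over 20% once the base frame time drops to ~7 ms, which is why the overhead only shows up clearly in CPU-limited testing.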
     
    #804 troyan, Mar 17, 2021
    Last edited: Mar 17, 2021
    PSman1700, DavidGraham and iroboto like this.
  5. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,761
    Likes Received:
    6,896
    The DX11 and DX12 drivers aren't the same; they have different APIs. It could be that the way AMD's DX12 driver is written gives it a slightly more cache-friendly access pattern.
     
  6. techuse

    Regular Newcomer

    Joined:
    Feb 19, 2013
    Messages:
    743
    Likes Received:
    440
    Our best bet is that enough testing happens and Nvidia issues a response.

    GameGPU weighs in.
     
    #806 techuse, Mar 17, 2021
    Last edited: Mar 18, 2021
  7. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,761
    Likes Received:
    6,896
    @techuse Yah, I just read the gamegpu article. The translation I read was pretty poor, but their testing with a Ryzen 3600 showed pretty much the same thing, with a Vega 64 beating the RTX 3090 by quite a bit when lowering settings to be CPU limited. From HBU's testing the 5600X didn't really seem to have any problems, with the 3090 keeping close to the 6900 XT. I really think it's more than just clock speed and probably has to do with memory latency.
     
    PSman1700, Lightman and DavidGraham like this.
  8. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,511
    Likes Received:
    4,129
    Yeah here it is:
    https://gamegpu.com/блоги/sravnenie-protsessorozavisimosti-geforce-i-radeon-v-dx12

    This is a monumental WTF moment right there; a Vega 64 should under no circumstances be faster than a 3090, no matter what, but it happens in those DX12 games with the Ryzen 3600X. The question now becomes: does an old Core i7/i5 exhibit the same problems?
     
    dskneo, PSman1700, Lightman and 2 others like this.
  9. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,761
    Likes Received:
    6,896
    It does with an i3 10100 (64K L1, 256K L2, 6MB L3) and a Ryzen 3600 (64K L1, 512K L2, 16MB L3 with access latency penalties for half), but not really with a Ryzen 5600 (64K L1, 512K L2, 32MB L3 with no penalties), which is why I think it's cache related. (edit: also that anecdotal but very similar BFV user video with an old i7)

    Edit: Look at the gamegpu aida scores
    L1 1.3ns
    L2 3.9ns
    L3 14.5ns (will be worse if you read across CCX boundary)
    RAM 85.4ns

    All it would take is for the Nvidia driver to hit higher levels of cache more often, or RAM more often, and you can get a 10-20% performance difference.

    Modern games probably hit the caches hard, which will cause more misses for other threads. Open-world games like Watch Dogs would probably be the worst. You might not see the issues if you're GPU limited, because maybe the memory system is able to keep up while the CPU is waiting on the GPU. If you become CPU limited, suddenly the CPU threads start going as fast as they can, and maybe these smaller or higher-latency caches cause the Nvidia driver a little more pain.
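    A back-of-the-envelope sketch of that argument, using the AIDA latencies quoted above. The hit rates are invented purely for illustration; how much of this shows up as frame time depends on how memory-bound the driver thread actually is.

    ```cpp
    // Average memory access latency for two hypothetical cache-hit profiles,
    // using the AIDA latencies from the post above. The hit-rate splits are
    // made up: they only illustrate how a modest shift toward L3/RAM adds up
    // for a CPU-limited thread.
    #include <cstdio>

    struct Profile { double l1, l2, l3, ram; };  // fraction of accesses served by each level

    double avg_latency_ns(const Profile& p) {
        return p.l1 * 1.3 + p.l2 * 3.9 + p.l3 * 14.5 + p.ram * 85.4;
    }

    int main() {
        Profile friendly   { 0.94, 0.04, 0.015, 0.005 };  // mostly L1/L2 hits
        Profile unfriendly { 0.92, 0.05, 0.020, 0.010 };  // slightly more L3/RAM traffic

        double a = avg_latency_ns(friendly);
        double b = avg_latency_ns(unfriendly);
        std::printf("friendly:   %.2f ns/access\n", a);
        std::printf("unfriendly: %.2f ns/access (%.0f%% slower per access)\n",
                    b, 100.0 * (b - a) / a);
        return 0;
    }
    ```

    With these made-up splits the "unfriendly" profile is roughly a quarter slower per access; diluted by the non-memory-bound parts of the frame, that is the kind of thing that could plausibly land in the 10-20% range seen in the benchmarks.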
     
    #809 Scott_Arm, Mar 18, 2021
    Last edited: Mar 19, 2021
    BRiT likes this.
  10. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,044
    Likes Received:
    15,796
    Location:
    The North
    wait.. so the reason Nvidia is doing worse in CPU-limited scenarios all comes down to the CPU's cache?

    So the drivers are basically not cache-hit friendly? Weird. I would have figured that would have been the lowest-hanging fruit for them.
     
  11. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    682
    Likes Received:
    363
    Rockstar makes like a billion dollars a year on GTAV Online and never bothered to parse a single JSON file in a timely manner and so had ungodly load times for customers and the devs alike. At this point I just kind of assume the obvious can be missed even for huge, really successful companies.
     
    T2098, Kej, PSman1700 and 3 others like this.
  12. techuse

    Regular Newcomer

    Joined:
    Feb 19, 2013
    Messages:
    743
    Likes Received:
    440
    I think the most likely scenario is they just don't care enough to fix it. Given how benchmarks are conducted why would they?
     
    Cuthalu likes this.
  13. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,761
    Likes Received:
    6,896
    Just a guess, but from the 3600 to the 5600 the biggest advancements for gaming were improved cache and cache latency. Maybe AMD is just a little more efficient in terms of cache alignment and data access patterns because of all the time they spent working with the dog shit Jaguar CPUs in the consoles lol.
     
    Lightman likes this.
  14. Putas

    Regular Newcomer

    Joined:
    Nov 7, 2004
    Messages:
    533
    Likes Received:
    176
    Ryzen 3300X should be interesting for latencies.
     
    Lightman and CarstenS like this.
  15. Lurkmass

    Regular Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    309
    Likes Received:
    350
    FWIW I don't think any of the discussion about multithreading or the software vs hardware scheduler crap in the background is related to the reasons why NV sees higher overhead on D3D12 ...

    Root-level views exist in D3D12 to cover the use cases of the binding model that would otherwise be bad on their hardware, but nearly no developers use them because they don't have bounds checking, so they hate using the feature for the most part! This ties in with the last sentence: instead of using SetGraphicsRootConstantBufferView, some games will spam CreateConstantBufferView just before every draw, which adds even more overhead. It all starts adding up when developers are abusing all these defects behind D3D12's binding model.

    Bindless on NV (unlike AMD) has idiosyncratic interactions: they can't use constant memory with bindless CBVs, so they load the CBVs from global memory, which is a performance killer (none of this matters on AMD) ...
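    As an illustration of the two per-draw CBV patterns described here, a rough D3D12 sketch (not from the thread; it assumes the device, command list, descriptor heap, per-draw constant-buffer addresses and matching root signatures already exist, that SetDescriptorHeaps has been called, and it omits all error handling). Pattern A binds a root CBV per draw; pattern B creates a fresh descriptor before every draw and binds it through a table, which is the extra per-draw CPU work being complained about.

    ```cpp
    // Sketch only: contrasting root-level CBVs with per-draw descriptor creation.
    #include <d3d12.h>
    #include <vector>

    struct DrawItem {
        D3D12_GPU_VIRTUAL_ADDRESS cbAddress;  // per-draw constants, 256-byte aligned
        UINT                      indexCount;
    };

    // Pattern A: root-level CBV. One cheap call per draw, no descriptor written.
    void RecordWithRootCBV(ID3D12GraphicsCommandList* cmd, const std::vector<DrawItem>& draws)
    {
        for (const DrawItem& d : draws) {
            cmd->SetGraphicsRootConstantBufferView(/*RootParameterIndex=*/0, d.cbAddress);
            cmd->DrawIndexedInstanced(d.indexCount, 1, 0, 0, 0);
        }
    }

    // Pattern B: create a CBV descriptor before every draw and bind it through a
    // descriptor table - the "spam CreateConstantBufferView" pattern from the post.
    void RecordWithPerDrawCBV(ID3D12Device* device, ID3D12GraphicsCommandList* cmd,
                              ID3D12DescriptorHeap* cbvHeap, const std::vector<DrawItem>& draws)
    {
        const UINT stride =
            device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
        D3D12_CPU_DESCRIPTOR_HANDLE cpu = cbvHeap->GetCPUDescriptorHandleForHeapStart();
        D3D12_GPU_DESCRIPTOR_HANDLE gpu = cbvHeap->GetGPUDescriptorHandleForHeapStart();

        for (const DrawItem& d : draws) {
            D3D12_CONSTANT_BUFFER_VIEW_DESC desc = {};
            desc.BufferLocation = d.cbAddress;
            desc.SizeInBytes    = 256;                     // CBV sizes must be 256-byte multiples
            device->CreateConstantBufferView(&desc, cpu);  // extra CPU work on every single draw

            cmd->SetGraphicsRootDescriptorTable(/*RootParameterIndex=*/0, gpu);
            cmd->DrawIndexedInstanced(d.indexCount, 1, 0, 0, 0);

            cpu.ptr += stride;                             // step to the next heap slot
            gpu.ptr += stride;
        }
    }
    ```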
     
    T2098, Kej, PSman1700 and 9 others like this.
  16. techuse

    Regular Newcomer

    Joined:
    Feb 19, 2013
    Messages:
    743
    Likes Received:
    440
    Why do you consider the DX12 binding model defective?
     
  17. Lurkmass

    Regular Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    309
    Likes Received:
    350
    I don't in theory, but it's 'how' developers are 'using' it that makes it defective, since in practice it means the D3D12 binding model isn't all that different from Mantle's binding model, which was pretty much only designed to run on AMD HW. That makes it annoying for other HW vendors trying to emulate this behaviour to be consistent with their competitor's HW ...

    Microsoft revised the binding model with shader model 6.6, but I don't know if that was in response to what they saw of its potential to be misused in games ...
     
    Kej, PSman1700, iroboto and 5 others like this.
  18. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,761
    Likes Received:
    6,896
    @Lurkmass It does seem like this "issue" can pop up on dx11 games.



    This is an anecdotal account, and I'd like to see more testing of DX11, but this user has a huge performance regression after upgrading from an R9 390 to a 1660 Ti on his old i7, playing Battlefield V Firestorm in DX11. There's a follow-up video where he says he "fixed" his issue by giving his CPU a 2% overclock, overclocking his memory substantially and tightening the timings, and disabling the fullscreen optimizations setting for Battlefield V. He gets a massive improvement in performance, much greater than the sum of those changes would suggest.
     
  19. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,761
    Likes Received:
    6,896
    @Lurkmass Oh, global memory is heap allocated, so it's cache-unfriendly if you're spamming allocations.

    If anyone wants to know why cache hit rates matter, and why non-linear heap allocations matter, start here
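    For a concrete feel of that effect, here's a toy C++ microbenchmark (my own sketch, not from the linked video): it sums the same data once through a contiguous array and once by chasing pointers through individually heap-allocated nodes linked in random order, so the second traversal misses cache on almost every step.

    ```cpp
    // Toy microbenchmark: contiguous traversal vs pointer-chasing through
    // scattered heap allocations. The random link order defeats the prefetcher,
    // so the second loop pays cache/RAM latency on nearly every element.
    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <vector>

    struct Node { long value; Node* next; };

    int main() {
        const size_t n = 1 << 22;  // ~4M elements

        std::vector<long> linear(n, 1);

        // Individually allocated nodes, linked in shuffled order.
        std::vector<Node*> nodes;
        nodes.reserve(n);
        for (size_t i = 0; i < n; ++i) nodes.push_back(new Node{1, nullptr});
        std::vector<size_t> order(n);
        std::iota(order.begin(), order.end(), 0);
        std::mt19937 rng(42);
        std::shuffle(order.begin(), order.end(), rng);
        for (size_t i = 0; i + 1 < n; ++i) nodes[order[i]]->next = nodes[order[i + 1]];
        Node* head = nodes[order[0]];

        auto time_ms = [](auto&& fn) {
            auto t0 = std::chrono::steady_clock::now();
            long s = fn();
            auto t1 = std::chrono::steady_clock::now();
            std::printf("sum=%ld  ", s);
            return std::chrono::duration<double, std::milli>(t1 - t0).count();
        };

        double a = time_ms([&] { long s = 0; for (long v : linear) s += v; return s; });
        std::printf("contiguous:    %.1f ms\n", a);

        double b = time_ms([&] { long s = 0; for (Node* p = head; p; p = p->next) s += p->value; return s; });
        std::printf("pointer-chase: %.1f ms\n", b);

        for (Node* p : nodes) delete p;
        return 0;
    }
    ```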

     
    #819 Scott_Arm, Mar 19, 2021
    Last edited: Mar 19, 2021
    iroboto, Lightman, sonen and 2 others like this.
  20. T2098

    Newcomer

    Joined:
    Jun 15, 2020
    Messages:
    48
    Likes Received:
    100
    This one makes sense to me. Nvidia's herculean software engineering effort to extract parallelism/multi-threading in DX11 is amazing, but it can't possibly come for free.

    Especially if it's not hard-coded in the driver and it's analyzing everything on the fly at runtime, chopping things up into bits that it can spread across multiple CPU cores, you're basically running a small compiler at the same time as the game, and that compiler has its own memory and cache footprint.

    If you've got CPU cores, cache size, and/or CPU memory bandwidth to burn, then this is probably an excellent tradeoff. If any of those 3 things are in short supply, you now have 2 things running concurrently competing for those same resources.

    On a dual- or quad-core CPU, where the multi-threading built into the game engine plus driver was probably enough to fully load the CPU anyway, NV's multithreading magic is probably going to hurt performance a fair bit, burning resources that the game could have fully used under a more naive driver (Intel/AMD) that just lets the DX11 code run as is.

    On a powerhouse system with fast RAM, large caches, and lots of CPU cores (but still a hard ceiling on single core performance) then you want to let NV's DX11 driver run wild and try to spread the load across as many cores as possible, even if by doing so it consumes an entire CPU core worth of overhead.
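    Purely as a conceptual sketch of that trade-off (this is not Nvidia's actual driver architecture, just a generic producer/consumer illustration): a worker thread that drains and translates API calls recorded by the game thread takes work off the critical path, but it is one more thread with its own queue, memory and cache footprint competing with the game.

    ```cpp
    // Conceptual illustration only: a "driver" worker thread consumes API calls
    // recorded by the game thread. Translation cost moves off the game thread,
    // at the price of an extra thread and its own working set.
    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    struct ApiCall { int opcode; std::vector<unsigned char> args; };  // stand-in for a captured call

    class DeferredDriver {
    public:
        void Submit(ApiCall call) {                    // called from the game thread
            { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(call)); }
            cv_.notify_one();
        }
        void Run() {                                   // worker: translate to "hardware" commands
            for (;;) {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [&] { return !q_.empty() || done_; });
                if (q_.empty() && done_) return;
                ApiCall c = std::move(q_.front()); q_.pop();
                lk.unlock();
                Translate(c);                          // the extra CPU and cache footprint lives here
            }
        }
        void Finish() { { std::lock_guard<std::mutex> lk(m_); done_ = true; } cv_.notify_one(); }
    private:
        void Translate(const ApiCall&) { /* build GPU command packets */ }
        std::queue<ApiCall> q_;
        std::mutex m_;
        std::condition_variable cv_;
        bool done_ = false;
    };

    int main() {
        DeferredDriver drv;
        std::thread worker(&DeferredDriver::Run, &drv);
        for (int i = 0; i < 1000; ++i) drv.Submit({i, {}});  // game thread issuing draws
        drv.Finish();
        worker.join();
        return 0;
    }
    ```

    On a CPU with spare cores and cache this parallelism is nearly free; on a small or cache-starved CPU the worker simply steals the resources the game needed, which is the scenario the post above describes.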
     
    xpea, Lightman, Malo and 1 other person like this.