Nvidia's Next-Generation RTX GPU [3060, 3070, 3080, 3090]

Discussion in 'Architecture and Products' started by Shortbread, Sep 1, 2020.

  1. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    4,546
    Likes Received:
    2,084
    Intel still at the top with their i9.
     
  2. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    11,206
    Likes Received:
    1,775
    Location:
    New York
    Sweet, I just upgraded from a 4790k to a 5950x. Would be nice to see clock-normalized results too.

    Some of those numbers don’t look right though. Why is a 10700k so much faster than a 9900k? They’re basically the same chip.
     
    #342 trinibwoy, Dec 10, 2020
    Last edited: Dec 10, 2020
  3. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,018
    Likes Received:
    15,763
    Location:
    The North
  4. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    2,206
    Likes Received:
    1,599
    Location:
    msk.ru/spb.ru
    Note that the benchmark is done at 720p, and while it shows the CPU scaling, it's highly unlikely to be close to what you'll get out of these CPUs at 1080p and above - higher resolutions will be mostly GPU limited.
     
    Kyyla, BRiT, PSman1700 and 1 other person like this.
  5. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,018
    Likes Received:
    15,763
    Location:
    The North
    Indeed! I was just blown away by how fast CPUs have moved since my purchase.
     
    PSman1700 likes this.
  6. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    11,206
    Likes Received:
    1,775
    Location:
    New York
    Went to Microcenter yesterday and saw a guy walking out with the last 3060 Ti. They got a batch of cards in the morning but never updated inventory on the website. Seems the only way to snag a card at MSRP is to camp out every day or just get plain lucky.
     
    Lightman and PSman1700 like this.
  7. arandomguy

    Newcomer

    Joined:
    Jul 27, 2020
    Messages:
    125
    Likes Received:
    189
    They aren't all using the same memory speed (and I'd guess latency too, though timings aren't specified).

    It looks like they're testing each CPU at "stock" memory speeds (maybe even "stock" JEDEC timings? Not specified), except for the two OC results at the top.

    On a related note, the slight scaling for the >8c/16t CPUs may not necessarily be due to core count but to larger caches. It might need testing with "emulated" lower core counts, which would keep the cache intact.

    I always forget the x08 GPU for some reason.

    Nvidia might have another chip that was done at Samsung as well. It was never confirmed, but Orin is speculated to use Samsung's 8nm automotive node.
     
    #347 arandomguy, Dec 10, 2020
    Last edited: Dec 10, 2020
    glow, CarstenS and T2098 like this.
  8. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    11,206
    Likes Received:
    1,775
    Location:
    New York
    https://www.techspot.com/article/2164-cyberpunk-benchmarks/

    “We're also keen to look at CPU performance as the game appears to be very demanding on that regard. For example, we saw up to 40% utilization with the 16-core Ryzen 9 5950X, and out of interest we quickly tried the 6-core Ryzen 5 3600 as that’s a very popular processor. We're happy to report gameplay was still smooth, though utilization was high with physical core utilization up around 80%.”

    Coverage of RT and DLSS was reserved for a follow-up article. At least HWUB is consistent :-D
     
    #348 trinibwoy, Dec 11, 2020
    Last edited: Dec 11, 2020
    Lightman likes this.
  9. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    18,785
    Likes Received:
    21,087
  10. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    11,206
    Likes Received:
    1,775
    Location:
    New York
    My local Microcenter got a pretty big drop this morning. Lots of 3090s, 80s and 70s from different AIBs. I would guess at least 50 cards total.

    I snagged an ASUS 3090. Officially next gen ready!
     
    Kej, Florin, Babel-17 and 10 others like this.
  11. Man from Atlantis

    Regular

    Joined:
    Jul 31, 2010
    Messages:
    920
    Likes Received:
    734
    I just finished downloading Cyberpunk.

    ALU/FMA rate is around 1/2, slightly lower than Control's 3/5.

    [attached image: GPU profiler capture of a Cyberpunk 2077 frame]
     
    Lightman, McHuj, Scott_Arm and 10 others like this.
  12. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,286
    Likes Received:
    1,551
    Location:
    London
    Looking at the highlighted "SM Occupancy" row, the "background" behind the mostly green and mustard areas is either dark or light grey. The key seems to imply that those areas are "Active SM unused warp slots" and "Idle SM unused warp slots". Is that the correct interpretation?

    There's 3.6ms of "Dispatch Rays", which is about 14.5% of the frame time.
     
    Lightman, sonen and Man from Atlantis like this.
  13. Man from Atlantis

    Regular

    Joined:
    Jul 31, 2010
    Messages:
    920
    Likes Received:
    734
    Yeah, that seems correct. Also, BuildRaytracingAccelerationStructure takes 2.01ms, which together with Dispatch Rays comes to 22.3% of the frame time. DLSS Quality (barrier 958) takes 0.94ms as well.
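    As a rough cross-check, using the ~3.6ms / 14.5% Dispatch Rays figures above: 3.6 / 0.145 ≈ 24.8ms of frame time, and (3.6 + 2.01) / 24.8 ≈ 22.6%, which lines up with that 22.3% within rounding.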
     
    sonen, pharma and OlegSH like this.
  14. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    11,206
    Likes Received:
    1,775
    Location:
    New York
    I'm surprised at the relatively low memory activity during ray dispatch. VRAM is barely touched and L2 hit rates aren't spectacular. A piece of the puzzle seems to be missing.

    Edit: actually L2 rates look decent. I suppose that means the BVH doesn’t require that much space and can fit into the relatively small L2.
     
    sonen and LeStoffer like this.
  15. Wesker

    Newcomer

    Joined:
    May 3, 2008
    Messages:
    158
    Likes Received:
    76
    RE: Cyberpunk 2077 performance:

    There's a fix/edit for those with Ryzen CPUs to increase minimum frames by around 10-15%:



    tl;dr: there seems to be an issue with the game running a CPU check based on GPUOpen code. It basically causes the game to give non-Bulldozer AMD CPUs fewer scheduler threads. People suspected it was some kind of gimping due to ICC, but that's not the case.

    Hopefully this'll be addressed in an upcoming patch.
     
    sonen likes this.
  16. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,286
    Likes Received:
    1,551
    Location:
    London
    If we were developers with the code at hand, I suppose we could go into more detailed profiling here, e.g. all of the different ray tracing settings, singly and combined, versus off, and various resolutions and DLSS settings.

    It would turn into quite a quagmire!

    There are three "can't launch" metrics:
    • 16.7% of SM Pixel Warp Can't Launch
    • 7.9% of SM Pixel Warp Can't Launch, Register Limited
    • 18.3% of SM Compute Warp Can't Launch, Register Limited
    I'm sad to see that "register limited" is still a non-trivial thing on Ampere, 26.2%.
     
    Lightman likes this.
  17. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    2,047
    Likes Received:
    1,477
    Location:
    France
    What does the GPU or the driver do when a "can't launch" event happens?
     
  18. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,286
    Likes Received:
    1,551
    Location:
    London
    Rys woz there:

    Updated getDefaultThreadCount() to better represent other vendor posi… · GPUOpen-LibrariesAndSDKs/cpu-core-counts@49a6e73 (github.com)

    from 27 September 2017 (about half a year after Ryzen first launched). This commit inverts the behaviour on Intel CPUs (is that good, was that intentional?) and seems to knobble Ryzen in this game (isn't there a thread about Cyberpunk 2077 performance yet?). Maybe original Ryzen liked it this way?

    At least developers can change the code.

    Pull request has been opened:

    Fix for incorrect thread count reporting on zen based processors by samklop · Pull Request #3 · GPUOpen-LibrariesAndSDKs/cpu-core-counts (github.com)

    which references this page and slide deck:

    CPU Core Count Detection on Windows® - GPUOpen
    gdc_2018_sponsored_optimizing_for_ryzen.pdf (gpuopen.com)

    with a justification: "But games often suffer from SMT contention on the main thread".

    It's notable that "The sample code exists for Windows XP and Windows 7 now, with Windows 10 coming soon. The Windows 10 sample will take into account how CPU sets are used in Game Mode, a new feature released as part of the big Windows 10 Creators Update earlier this year."

    No change for Windows 10 was ever made. There's no indication of the reason: perhaps it wasn't worthwhile, or made no difference... The commit:

    Updated Readme to indicate Windows 10 support · GPUOpen-LibrariesAndSDKs/cpu-core-counts@7c2329a (github.com)

    merely implies that Windows 10 is supported.
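
    For anyone curious, here's a minimal sketch of what that logic looks like as I read it - paraphrased, not the repo's verbatim code; the physical/logical counts are taken as parameters rather than reproducing the Win32 query, and the exact family check is my assumption:

```cpp
// Sketch of the cpu-core-counts idea described above, not the repo's actual code.
// Requires MSVC on x86/x64 for the __cpuid intrinsic.
#include <intrin.h>
#include <cstring>

// True for AMD "Bulldozer"-family parts (family 0x15), the one case where the
// sample is said to prefer the logical processor count.
static bool isAmdBulldozerFamily() {
    int regs[4] = {};
    __cpuid(regs, 0);                         // leaf 0: vendor string in EBX, EDX, ECX
    char vendor[13] = {};
    std::memcpy(vendor + 0, &regs[1], 4);
    std::memcpy(vendor + 4, &regs[3], 4);
    std::memcpy(vendor + 8, &regs[2], 4);
    if (std::strcmp(vendor, "AuthenticAMD") != 0)
        return false;
    __cpuid(regs, 1);                         // leaf 1: family/model in EAX
    unsigned baseFamily = (regs[0] >> 8) & 0xF;
    unsigned extFamily  = (regs[0] >> 20) & 0xFF;
    unsigned family = (baseFamily == 0xF) ? baseFamily + extFamily : baseFamily;
    return family == 0x15;
}

// Post-commit behaviour as I understand it: default to physical cores (the
// "SMT contention on the main thread" rationale) and use logical processors
// only on Bulldozer. Zen therefore also gets physical cores, which is what
// the Cyberpunk hex-edit works around by forcing the logical count.
unsigned getDefaultThreadCount(unsigned physicalCores, unsigned logicalProcessors) {
    return isAmdBulldozerFamily() ? logicalProcessors : physicalCores;
}
```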
     
    Lightman, Wesker and Malo like this.
  19. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    11,206
    Likes Received:
    1,775
    Location:
    New York
    I believe the can’t launch numbers represent the difference between the number of active warps that have been launched and the theoretical max. It’s not necessarily a bad thing. You can have full utilization of the chip without launching the max number of warps possible though more is usually better to ensure you have a lot of memory requests in flight.

    Note this only refers to the number of warps being tracked by the SMs. It doesn't refer to instruction issue each clock, which is the actual utilization of the chip.
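
    To put a rough number on the register-limited case - a back-of-the-envelope sketch only: the 64K registers and 48 warp slots per SM are GA10x limits, the 96 registers per thread is a made-up figure rather than anything pulled from the capture above, and register allocation granularity is ignored:

```cpp
// Back-of-the-envelope: how register usage caps resident warps on one SM.
#include <algorithm>
#include <cstdio>

int main() {
    const int regsPerSM      = 65536; // 32-bit registers per SM (GA10x)
    const int maxWarpsPerSM  = 48;    // warp slots per SM (GA10x)
    const int threadsPerWarp = 32;
    const int regsPerThread  = 96;    // hypothetical shader, for illustration

    // The register file, not the warp-slot count, becomes the limit here.
    const int threadsByRegs = regsPerSM / regsPerThread;            // 682
    const int warpsByRegs   = threadsByRegs / threadsPerWarp;       // 21
    const int resident      = std::min(warpsByRegs, maxWarpsPerSM); // 21 of 48

    std::printf("resident warps %d/%d -> %d%% of warp slots can't launch (register limited)\n",
                resident, maxWarpsPerSM,
                100 * (maxWarpsPerSM - resident) / maxWarpsPerSM);
}
```

    Even at 21 of 48 resident warps the SMs can still be issuing every clock if those warps aren't stalled, which is exactly the warp-slot vs instruction-issue distinction above.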
     
    Rootax likes this.
  20. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,286
    Likes Received:
    1,551
    Location:
    London
    In theory the GPU and the driver can each notice these problems occurring and alter the way that the individual units of the GPU are being used. This might be as simple as changing the rules for priority. It's an extremely difficult subject, because the time intervals are miniscule: it might not be possible to change something quickly enough to get a useful improvement.

    You could argue that GPUs are "too brittle" and there are dozens of opportunities for bottlenecks to form due to the design of the hardware and the count of each type of unit (rasteriser, SM, ROP and all the little buffers). The picture posted earlier shows dozens of metrics. Most of those represent an opportunity for a bottleneck. There are so many little machines inside the "GPU machine", and any of them can cause serious bottlenecks.

    This is why I was a big fan of Larrabee. The opportunities for bottlenecks are almost entirely, solely, software. Sure, you have limitations on clock speed, core count and cache sizes, but there's also a massive opportunity to work smarter, not harder. Hardware, on the other hand, does some simple things extraordinarily efficiently, and it's really hard to make software keep up; the opportunities for bugs and unexpected behaviour generally seem worse with software. Hardware is efficient partly because it forces a simplified usage model.

    The original design late last century for triangle-based rendering was about a simplified usage model to make a few million transistors do something useful. As time goes by and there are more transistors to play with, it tends to make sense to build GPUs from pure compute (like Larrabee). We're still getting there...

    This is similar to the "mesh shader" question. It's difficult for a developer to create mesh shaders that perform as well as the hardware's vertex shader functionality (and including tessellation and geometry shader stages). But if you're smart with your mesh shader design, instead of merely trying to directly replicate what the hardware units do, you can get substantial performance gains. A lot of developers won't want to write mesh shaders, so there needs to be a solution for them. AMD seems to be using "primitive shaders" at least some of the time for developers who don't want to write mesh shaders. Not forgetting that primitive shaders and mesh shaders are much newer designs than pretty much every game that's ever been released.
     
    pharma, Lightman, PSman1700 and 2 others like this.