NVidia Ada Speculation, Rumours and Discussion

Discussion in 'Architecture and Products' started by Jawed, Jul 10, 2021.

  1. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    780
    Location:
    EU-China
    T2098, TopSpoiler, pharma and 5 others like this.
  2. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,109
    Location:
    New York
    Nice analysis. They estimated a small 10% increase in SM area. That doesn’t seem good enough for a significant boost in RT performance. I hope it’s higher. Compute capability of 8.9 does imply overall SM architecture hasn’t changed much vs Ampere.
     
  3. techuse

    Veteran

    Joined:
    Feb 19, 2013
    Messages:
    1,424
    Likes Received:
    908
    Doesn’t give me much confidence in the 2x+ performance rumors.
     
  4. Rootax

    Veteran

    Joined:
    Jan 2, 2006
    Messages:
    2,400
    Likes Received:
    1,845
    Location:
    France
    They can't just throw more transistors at a problem. At some point, doing better/smarter with the same amount of resources is the key factor.
     
    Entropy, Picao84 and PSman1700 like this.
  5. Samwell

    Newcomer

    Joined:
    Dec 23, 2011
    Messages:
    149
    Likes Received:
    183
    I would believe otherwise. 10% SM size increase would be massive, if most of it belongs to RT improvements. RT units are just a small part of the SM.
     
    Jawed and PSman1700 like this.
  6. Man from Atlantis

    Regular

    Joined:
    Jul 31, 2010
    Messages:
    960
    Likes Received:
    853
    If everything else scales accordingly, performance should roughly double: 18432 vs 10752 cores and 2.25GHz vs 1.90GHz works out to about 2x. I doubt it will, but hopefully RT performance does.
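
    The back-of-envelope estimate in this post can be sketched as below. It assumes performance is simply proportional to core count times clock, which is the post's own "if everything else scales" premise and not a measured result; the function name is made up for illustration.

```python
def relative_throughput(cores_new, cores_old, clock_new_ghz, clock_old_ghz):
    """Ratio of (cores x clock) between two parts.

    Assumption: perfect scaling. Memory bandwidth, cache and power
    limits are ignored entirely.
    """
    return (cores_new / cores_old) * (clock_new_ghz / clock_old_ghz)


# Rumoured figures quoted in the post above.
ratio = relative_throughput(18432, 10752, 2.25, 1.90)
print(f"{ratio:.2f}x")  # prints 2.03x
```

    The core count alone contributes about 1.71x and the clock about 1.18x, so most of the rumoured uplift rests on the wider chip actually being kept busy.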
     
    PSman1700 likes this.
  7. Picao84

    Veteran

    Joined:
    Feb 15, 2010
    Messages:
    2,109
    Likes Received:
    1,195
    If all this is correct, I hope the generation after Ada brings more efficiency, since power seems to scale linearly with performance. Having AD104 perform like a 3090 while consuming the same power as the latter sounds terrible. I don't plan to change my power supply just to have 2x performance. Horrific.
     
    Rootax likes this.
  8. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,088
    It's not that we direly need a 2x increase in raster performance....
     
  9. arandomguy

    Regular Newcomer

    Joined:
    Jul 27, 2020
    Messages:
    251
    Likes Received:
    355
    Raster performance has far more implications further down the product stack. Unless there's some major departure in approach, performance scaling in general should be fairly uniform across workload types. On paper, the gap between AD102 and AD104 is significantly larger than that between GA102 and GA104. Based on the numbers, AD102 should be on the order of 2x the performance of AD104 across a broad set of workloads, whether that be raster or ray traced gaming. The issue here is that if AD102 is not in the 2x range (I think we should also note that the real numbers at this point should be taken as rough) and is only in, say, the 1.5x range, then that has implications for how fast AD104 is relative to GA102 and GA104. So while AD102 doesn't "need" that 2x gain over GA102 in raster, AD104 does "need" that 1.5x gain in raster over GA104.

    Based on the current information, the configuration of each GPC is the same between GA102 and AD102. This means there is a 1.7x increase in hardware units and about a 1.2x clock speed increase (ballparking current numbers), which together work out to about 2.05x from a high-level perspective, hence that 2x number. On a simplistic level, the question is then whether the memory subsystem can feed that adequately. Bandwidth based on the 24Gbps GDDR6X announcements would be about 1.25x or 1.15x, depending on what you compare against. It's worth noting that if we compare Navi 21 against Navi 10, there's a ~2.35x increase fed by a 1.15x increase in bandwidth plus cache, for what I guess are real gains in raster in the 2x range.
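
    A rough sketch of the "can the memory subsystem feed that" question, using only the ratios quoted in this post. The metric (compute scale divided by raw bandwidth scale) is an illustrative assumption, not an established figure of merit, and it deliberately ignores cache, which is the very thing the Navi comparison suggests matters.

```python
def compute_per_bandwidth(compute_scale, bandwidth_scale):
    """How much more compute each unit of raw DRAM bandwidth has to feed."""
    return compute_scale / bandwidth_scale


# Ratios quoted in the post; all are rough, rumour-based numbers.
unit_scale = 1.7    # increase in hardware units, new chip vs GA102
clock_scale = 1.2   # ballpark clock increase
compute_scale = unit_scale * clock_scale

ada_vs_1_25 = compute_per_bandwidth(compute_scale, 1.25)
ada_vs_1_15 = compute_per_bandwidth(compute_scale, 1.15)
navi21_vs_navi10 = compute_per_bandwidth(2.35, 1.15)

print(f"compute scale:               {compute_scale:.2f}x")
print(f"per bandwidth (1.25x case):  {ada_vs_1_25:.2f}x")
print(f"per bandwidth (1.15x case):  {ada_vs_1_15:.2f}x")
print(f"Navi 21 vs Navi 10:          {navi21_vs_navi10:.2f}x")
```

    On these numbers the rumoured chip would lean on raw bandwidth less heavily (about 1.6x to 1.8x) than Navi 21 did relative to Navi 10 (about 2.0x), so a large cache closing the gap is at least plausible.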
     
    #489 arandomguy, Apr 16, 2022
    Last edited: Apr 16, 2022
  10. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,109
    Location:
    New York
    By raster do you literally mean rasterizing triangles? What games today are even close to raster limited on a 3070 Ti? Based on the leaked specs AD104 actually has one less rasterizer than GA104. It’s probably a wash given higher clocks on Lovelace.

    Shading performance benefits RT just as much as raster or maybe even more so given lower efficiency due to divergence. That’s the thing to keep an eye on.
     
    PSman1700 likes this.
  11. arandomguy

    Regular Newcomer

    Joined:
    Jul 27, 2020
    Messages:
    251
    Likes Received:
    355
    In the context of the conversation in general, I'm using it (and I think everyone else is?) simply to divide gaming performance into non-ray tracing, or "raster", and ray tracing.

    At least that's what I thought the context of the discussion was: whether or not AD102 would have 100% improvements in gaming tests without ray tracing and >100% in gaming tests with ray tracing over GA102.
     
  12. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,109
    Location:
    New York
    Got it. If we assume that Ampere is severely memory starved then there could be a lot of upside to the expanded L2 cache even if the raw flops and bandwidth numbers don’t increase as much. The 6900 XT is ~30% faster than the 3070 Ti in rasterization at 4K and it has 20% less bandwidth. Infinity cache is helping a lot.
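
    Taking the two figures in this post at face value, the performance delivered per unit of raw memory bandwidth can be worked out as follows. This is only arithmetic on the quoted ratios; attributing the whole difference to Infinity Cache is the poster's reading, not something the calculation proves.

```python
# 6900 XT relative to 3070 Ti, as quoted in the post.
perf_ratio = 1.30        # ~30% faster in rasterization at 4K
bandwidth_ratio = 0.80   # 20% less raw memory bandwidth

perf_per_bandwidth = perf_ratio / bandwidth_ratio
print(f"{perf_per_bandwidth:.3f}x performance per unit of bandwidth")
```

    That is roughly 1.6x as much performance out of each GB/s of DRAM bandwidth.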
     
  13. techuse

    Veteran

    Joined:
    Feb 19, 2013
    Messages:
    1,424
    Likes Received:
    908
    Ampere certainly didn't scale accordingly. The 3090 is only 13% faster than the 3080 with 20% more cores and 23% more bandwidth. Scaling is only likely to get worse at even higher core counts.
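
    The scaling claim here can be put as a simple efficiency figure: the fraction of the added resources that shows up as added performance. The definition below is an assumption chosen for illustration, and the inputs are the percentages quoted in the post.

```python
def scaling_efficiency(perf_gain, resource_gain):
    """Added performance divided by added resources (both as fractions)."""
    return perf_gain / resource_gain


# 3090 vs 3080, figures from the post.
core_efficiency = scaling_efficiency(0.13, 0.20)
bandwidth_efficiency = scaling_efficiency(0.13, 0.23)

print(f"vs extra cores:     {core_efficiency:.0%}")
print(f"vs extra bandwidth: {bandwidth_efficiency:.0%}")
```

    On these numbers only about two thirds of the extra cores, and a bit over half of the extra bandwidth, translate into measured performance.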
     
    #493 techuse, Apr 16, 2022
    Last edited: Apr 16, 2022
    trinibwoy likes this.
  14. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    Is that with or without RT?

    RT scaling is more important, in the end.

    Additionally, if the rumours about larger L2 are real then it could mean that the architecture is re-balanced somewhat to take advantage of the L2 boost.
     
  15. techuse

    Veteran

    Joined:
    Feb 19, 2013
    Messages:
    1,424
    Likes Received:
    908
    It’s a similar level of scaling with or without RT if you sample a large number of games. Maybe 1 or 2% higher with RT.
     
  16. arandomguy

    Regular Newcomer

    Joined:
    Jul 27, 2020
    Messages:
    251
    Likes Received:
    355
    Are you comparing the 3090 FE against the 3080 FE? We need to be mindful here because that isn't strictly the same as comparing GA102-300 (RTX 3090) against GA102-200 (RTX 3080) and how well those extra 20% cores etc. scale.

    The RTX 3090 FE only has a roughly 10% higher power limit than the 3080 FE (350W vs 320W). In practice this means there is effectively a clawback, as the 3090 FE ends up clocked lower. That results in it being closer to 10% faster, as opposed to the 20% faster you would expect in theory given the hardware differences.
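
    The power-limit argument can be made concrete by looking at the power budget available per core. The sketch below uses the board power limits and the 20% core difference quoted in this thread; spreading board power evenly across cores is a simplifying assumption (memory and other components draw power too).

```python
# Board power limits quoted in the post, in watts.
power_3090_fe = 350
power_3080_fe = 320

core_ratio = 1.20  # 3090 has about 20% more cores, per the thread

power_ratio = power_3090_fe / power_3080_fe
power_per_core_ratio = power_ratio / core_ratio

print(f"power limit ratio:    {power_ratio:.3f}")
print(f"power per core ratio: {power_per_core_ratio:.3f}")
```

    Each core on the 3090 FE gets roughly 9% less power budget than on the 3080 FE, which is consistent with the card settling at lower clocks.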

    TPU unfortunately doesn't review the 3090 FE but if we look at the Zotac model they reviewed which uses the same 350w TDP and stock boost table - https://www.techpowerup.com/review/zotac-geforce-rtx-3090-trinity/30.html

    You can see it has noticeably lower clocks than the RTX 3080 FE - https://www.techpowerup.com/review/nvidia-geforce-rtx-3080-founders-edition/32.html

    Whereas if you use this Asus model - https://www.techpowerup.com/review/asus-geforce-rtx-3090-strix-oc/30.html - you see the clock speeds end up in line. With clocks roughly normalized this way, performance is closer to the 20% range you would expect given the hardware differences - https://tpucdn.com/review/asus-geforce-rtx-3090-strix-oc/images/relative-performance_3840-2160.png

    I think in the context of Ada versus Ampere we also need to be mindful of the above. AD102 potentially being 2x faster than GA102 is not the same as the RTX 4090 being 2x faster than the RTX 3090. Likewise, AD104 being 1.5x faster than GA104 isn't the same as the RTX 4070 being 1.5x faster than the RTX 3070.
     
    T2098, xpea, DegustatoR and 3 others like this.
  17. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,887
    Likes Received:
    4,534
    Nvidia Uses GPU-Powered AI to Design Its Newest GPUs | Tom's Hardware (tomshardware.com)
     
    Father_Murphy, PSman1700 and Jawed like this.
  18. del42sa

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    208
    Likes Received:
    137
  19. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,109
    Location:
    New York
  20. Seanspeed

    Newcomer

    Joined:
    Apr 23, 2021
    Messages:
    137
    Likes Received:
    204