NVidia Ada Speculation, Rumours and Discussion

Discussion in 'Architecture and Products' started by Jawed, Jul 10, 2021.

  1. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
    Absolutely, and it depends on the workload. With RT and DLSS enabled, Ampere has much better FLOPS per watt (while being a full node behind!!!) and FLOPS per transistor than RDNA2. Now if you look at pure rasterization performance, RDNA2 has the edge. But we are in 2021, not 2019 anymore. Pure rasterization is not a problem with this generation.
    The same can be said of MI200. An FP64 monster that looks good at first sight, but it targets the dying traditional HPC market, where the vast majority of workloads are being replaced by AI/ML. It looks like AMD is always one step behind...

    So what is important? What workload matters in 2021 to judge the FLOPs/watts or FLOPs/transistor metrics on a high-end GPU?
     
    PSman1700 likes this.
  2. techuse

    Veteran

    Joined:
    Feb 19, 2013
    Messages:
    1,426
    Likes Received:
    909
    A full node behind? And rasterization performance is not “fine”. We can still make use of many times more. It’s not a solved issue.
     
  3. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    PSman1700 and xpea like this.
  4. techuse

    Veteran

    Joined:
    Feb 19, 2013
    Messages:
    1,426
    Likes Received:
    909
    The density and power advantages I've seen stated here for RDNA’s 7nm over Ampere’s 10nm are not those of a full node shrink.
     
  5. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,451
    Likes Received:
    471
    After the launch of MI100, AMD stated that customers had asked for a powerful FP64 solution, because there were none available and they needed to upgrade. But maybe I'm wrong and you know their customers better than they do.

    As for "the dying HPC market":

    https://www.globenewswire.com/news-...e-of-Cloud-Computing-is-Driving-Industry.html
     
    Lightman likes this.
  6. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
    Correct. That's what I wanted to say: RDNA2 on TSMC 7nm vs Ampere on Samsung 8nm, which is a slightly improved 10nm (like Turing's 12nm was a slightly improved 16nm).
    TSMC 7nm is widely considered a full node improvement over the Samsung 10nm derivative. In terms of peak density, it's ~94 MTx/mm2 for TSMC 7nm vs ~51 MTx/mm2 for Samsung 10/8nm. Of course we can argue that historically a full node was 4 times the density, but those days are over...
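    As a quick sketch, the quoted peak-density figures work out to a bit under a 2x gap. The MTx/mm2 numbers here are simply the ones cited in this post, not independently verified:

    ```python
    # Compare the quoted peak logic densities (MTx/mm^2).
    # Figures are the ones cited in the post above, treated as approximate.
    tsmc_7nm = 94.0     # TSMC 7nm peak density (quoted)
    samsung_8nm = 51.0  # Samsung 10/8nm-class peak density (quoted)

    ratio = tsmc_7nm / samsung_8nm
    print(f"TSMC 7nm vs Samsung 8nm peak density: {ratio:.2f}x")
    ```

    That ~1.84x is short of the classical full-node expectation, which is the crux of the disagreement above.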
     
    PSman1700 likes this.
  7. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    That's true, but then you are comparing different architectures from different companies at different fabs on different processes. Also design plays a large role. Higher clocks often need some additional transistor investments for example.

    We've got one clue though: Compare transistor density between A100-Ampere and RDNA2, which are at least from the same fab and the same process class - but still there are process variants for 7 nm class.
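    As a rough illustration of that A100-vs-RDNA2 comparison (a sketch using commonly reported transistor counts and die areas, which vary slightly by source and are assumptions here, not figures from this thread):

    ```python
    # Achieved transistor density for two 7nm-class TSMC chips,
    # from commonly reported specs: (transistor count, die area in mm^2).
    chips = {
        "A100 (GA100)":        (54.2e9, 826.0),
        "RX 6900 XT (Navi 21)": (26.8e9, 520.0),
    }

    for name, (transistors, area_mm2) in chips.items():
        density = transistors / 1e6 / area_mm2  # MTx per mm^2
        print(f"{name}: {density:.1f} MTx/mm^2")
    ```

    The resulting gap (~66 vs ~52 MTx/mm2) reflects the design mix (SRAM vs logic vs I/O) and clock targets as much as any process variant, which is exactly the caveat above.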
     
  8. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
    Yeah, compared to:
    https://www.idc.com/getdoc.jsp?containerId=prUS48127321
    If we link these two reports, by the time Hopper launches the AI/ML market will already be more than 10 times the size of the traditional HPC market... and the difference will continue to grow quickly.
     
    #228 xpea, Aug 5, 2021
    Last edited: Aug 5, 2021
    DavidGraham and PSman1700 like this.
  9. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Is that really more than marketing? I mean, Nvidia is saying the same thing about AI. And how much better does MI100 fare wrt FP64 than A100? Is the difference enough for their customers to go from nay to yay?

    There's a whole lot of "cloud" and "services" there. Are you sure they refer to HPC as the classical "FP64-or-bust" segment?
     
  10. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,451
    Likes Received:
    471
    I'm sorry, but the fact that the AI/ML market is bigger doesn't support your opinion that the HPC market is dying. It doesn't say anything about the evolution of the HPC market at all. The article clearly states that the HPC market is growing, so your statement was invalid.

    Of course the entire HPC market isn't based on FP64 accelerators. But the same applies to the AI/ML market; it isn't based purely on GPU accelerators either. The point is that demand for FP64 accelerators hasn't disappeared. Nvidia doesn't care, so it makes sense for AMD to take advantage of that. The AI/ML market is bigger, but the competition is much stronger. Anyway, MI200 is going to be quite an interesting solution even for AI.
     
    Lightman likes this.
  11. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    Ehhh. Minor update to matrix cores.
    MI300 yes.
    Even nV dudes like that one!
     
  12. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
    We don't see the same thing. 10 years ago, FP64 HPC was the only market for accelerators. Today, AI/ML has replaced the vast majority of FP64 workloads, to the point that AI/ML is already 8.3 times bigger than FP64 HPC. Whether FP64 HPC is growing or not, it has become insignificant compared to AI/ML, hence my term "dying".
     
  13. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    I don't know. Going by the specs, FP32 and FP16 throughputs are the same on Ampere. Then you should presumably be able to run them concurrently (you'd need two async workloads, of course). But how this actually happens and at what speeds would need to be tested, and I haven't seen any data on this.
     
    PSman1700 likes this.
  14. troyan

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    605
    Likes Received:
    1,126
    Only FLOPS per watt matters, because transistors are cheap and compute units are very efficient. The biggest problem is data movement.

    FP64 is an inefficient way to calculate data. Using mixed precision in cases where FP64 isn't necessary increases efficiency many times over. Why settle for 1 exaflop when you can scale to 32 exaflops?

    That makes single-purpose products like AMD's CDNA less competitive and cost-ineffective for most companies and cloud providers. nVidia's datacenter business exploded with Volta (HPC and DL training) and Turing (DL inference); now with GA100 they can tackle every workload with one product.

    The same reason why RDNA2 failed: being good at "pure" rasterizing isn't good enough today.
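    The mixed-precision scaling argument in this post maps roughly onto A100's published peak rates (a sketch; the TFLOPS figures are from NVIDIA's public spec sheet as commonly reported, not from this thread):

    ```python
    # Peak throughput ratio behind the "1 exaflop vs 32 exaflops" framing,
    # using commonly reported A100 peak rates in TFLOPS.
    fp64_vector = 9.7    # A100 FP64, non-tensor
    fp16_tensor = 312.0  # A100 FP16 tensor core, dense

    speedup = fp16_tensor / fp64_vector
    print(f"FP16 tensor vs FP64 vector: {speedup:.0f}x")
    ```

    In practice, techniques like iterative refinement let mixed-precision solvers recover FP64-level accuracy for some workloads at a fraction of the FP64 cost, which is what makes this ratio more than a paper number.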
     
    #234 troyan, Aug 5, 2021
    Last edited: Aug 5, 2021
  15. JoeJ

    Veteran

    Joined:
    Apr 1, 2018
    Messages:
    1,523
    Likes Received:
    1,772
    Agree. But the question remains: what to do with an insane 75 TF GPU?
    Scaling console games up this far is pointless.
    Multisampling is not efficient.
    Maxing out RT just to bring it to its knees is not efficient either, I guess (we'll see if / how they improve).
    So we need to add something new that isn't present in the console game we aim to port.
    Which could be (summing up my previous proposals): volumetric stuff (fog simulation, lighting), a layered framebuffer to address the shortcomings of screen-space hacks, fancy SM-based area shadow techniques. And of course GI, if compute can do this better than RT. What else?
    No matter what, there should be more than enough async compute work around to compensate for the speculated issues of running the traditional gfx pipeline on chiplets. So even if there is a problem at all, it feels pretty rhetorical to me (that would change if chiplets move to the entry/mid level).
    Even if we just scale up RT, the BVH building work on very detailed geometry alone would already provide shitloads of async compute work.

    So I don't think there'll be a problem utilizing the GPU, *if* we do this extra work.
    It depends on how many such GPUs get sold to gamers, which should depend on the visual improvements we can achieve by cranking things up, relative to the high price of the HW.
    Feels crazy, because on the other hand we surely can sell more games by putting the focus on scaling down (Series S, Steam Deck, Switch, poor man's PC).
    The expected issues from chiplets, yes or no, won't be a problem, but the increasing variety of over- and underspecced HW is. Multi-gen and multi-platform games become even more expensive to make and more compromised, while the lower and higher ends of HW become more niche, so it's hard to say what's worth it.
     
  16. yuri

    Regular

    Joined:
    Jun 2, 2010
    Messages:
    283
    Likes Received:
    296
    TBH, aiming CDNA (Vega) at pure HPC is not that weird, given the SW side of the business. Targeting AI/ML requires top-notch SW, and AMD's SW is traditionally far from that.
     
    DavidGraham and xpea like this.
  17. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    There is no such question. You forget that a 75 TF top end means a ~25 TF low end, and even that will not be enough to run games from last year at maximum settings. The lineup isn't made out of one GPU.

    And even beyond that, scaling RT and compute-based raster is far from over. Games aren't really hitting the point at which we can say "well, we don't need better graphics now".
     
    DavidGraham, Jawed and PSman1700 like this.
  18. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    It means everything gets more expensive.
     
  19. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    MI200 probably will be, yes. Depending, of course, on the competition at the time when it actually goes to market, and not on preliminary shipments for deployment tests. But that's neither Lovelace nor Hopper.
     
  20. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    Nah.
    That's now.
     