NVidia Ada Speculation, Rumours and Discussion

Discussion in 'Architecture and Products' started by Jawed, Jul 10, 2021.

  1. trinibwoy

    trinibwoy Meh Legend

    I don’t see how AD102 gets anywhere near 100 TF. H100 is 60 TF @ 700W on the same node with HBM3. If it does, though, I would bet on going wide, not fast. Nvidia knows they’re up against MCM, so AD102 could be another >750 mm² chip like TU102.

    This is random speculation, but adding two more partitions to each SM would do the trick: 144 SMs x 6 x 32 = 27648 “cores”. An 1800 MHz boost clock gets you close to 100 TF. Occupancy would be a challenge, though, unless L1 gets a capacity bump.
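The estimate above uses the usual peak-FP32 formula: FP32 lanes x 2 FLOPs per FMA x clock. A minimal Python sketch, assuming the speculative 6-partition, 32-lane SM from the post (a hypothetical configuration, not a confirmed AD102 layout):

```python
def fp32_tflops(sms: int, partitions_per_sm: int, lanes_per_partition: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPS, counting one FMA per lane per clock as 2 FLOPs."""
    lanes = sms * partitions_per_sm * lanes_per_partition
    # lanes * 2 FLOPs * GHz gives GFLOPS; divide by 1000 for TFLOPS
    return lanes * 2 * clock_ghz / 1000.0

# Hypothetical SM with 6 partitions of 32 FP32 lanes (Ampere has 4)
cores = 144 * 6 * 32
print(cores)                                    # 27648
print(round(fp32_tflops(144, 6, 32, 1.8), 1))   # 99.5
```

At 1.8 GHz this lands at roughly 99.5 TF, which is the "close to 100 TF" in the post.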
     
    PSman1700 likes this.
  2. Qesa

    Qesa Newcomer

    Not that I necessarily think AD102 will be 100 TF, but H100's 60 TF is for double precision, and it expends a ton of power on nvlink. Sorta like saying A100 is 20 TF at 400 W, so GA102 over 30 TF is unrealistic.
     
  3. no-X

    no-X Veteran

    Sorry, that's a fallacy. The discussion was about the manufacturing cost of the RTX 3090 Ti. The fact that Nvidia sold some fully enabled GA102s in 2020 as part of a $4650 product (A6000) doesn't prove that they are cheaper to manufacture than fully enabled Navi 21s, which were sold in 2020 as part of a $999 product.
     
  4. Dangerman

    Dangerman Newcomer

    A100 was 19 FP32 TFLOPS, while a 3090 Ti is technically just under 40 TFLOPS. Saying AD102 can never get to just above 100 TF FP32 because of H100 doesn't make sense. If anything, H100's 60 TF FP32 makes a 103 TF AD102 SKU more likely, though it would still be a hungry, hungry, hungry hippo at 600W.

    Though we don't know whether AD102 has undergone extensive redesigns; it seems Nvidia made a lot of changes to Lovelace last year, probably to compete against RDNA 3. I mean, we could see a "4090" reach something like 150 TF by changing the SMs again while still pushing 600 watts.
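The "just under 40 TFLOPS" figure for the 3090 Ti can be checked with the same peak formula (cores x 2 FLOPs x boost clock). A small sketch, assuming the published specs of 10752 CUDA cores at a 1.86 GHz boost clock and NVIDIA's 19.5 TF peak FP32 figure for A100 (the post rounds it to 19):

```python
def fp32_tflops(cuda_cores: int, boost_ghz: float) -> float:
    # Peak FP32: each core retires one FMA (2 FLOPs) per clock
    return cuda_cores * 2 * boost_ghz / 1000.0

rtx_3090_ti = fp32_tflops(10752, 1.86)  # assumed published 3090 Ti specs
a100_fp32 = 19.5                        # assumed A100 peak FP32, TFLOPS

print(round(rtx_3090_ti, 3))              # 39.997
print(round(rtx_3090_ti / a100_fp32, 2))  # 2.05
```

So the consumer GA102 part roughly doubled the datacenter A100's FP32 peak, which is the pattern the argument leans on.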
     
  5. DegustatoR

    DegustatoR Veteran

    Retail prices rarely have any relation to manufacturing costs.
     
    PSman1700 likes this.
  6. trinibwoy

    trinibwoy Meh Legend

    Not quite. GA102 already played the cheap double FP32 card. There isn’t a similar cheap upgrade available for AD102. H100 also played that card and only managed 144 SMs in 815mm^2. Of course H100 has a ton of FP64, no RT cores and half the L2 cache of the rumored AD102 but it’s a better comparison point than A100.

    Not saying it’s impossible but 100 TF at 600mm^2 and 600W sounds like a pipe dream. At best it’s some silly “Liquid Cooled” edition.
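One crude way to see why 100 TF at 600W looks ambitious is to compare peak FP32 per watt. A sketch using the thread's H100 figure (60 TF FP32 at 700W) and a hypothetical 100 TF / 600W AD102; board power covers far more than the FP32 ALUs (HBM, NVLink, caches), so this is only a rough ratio:

```python
def tflops_per_watt(tflops: float, watts: float) -> float:
    return tflops / watts

h100 = tflops_per_watt(60.0, 700.0)          # H100 figure cited in the thread
ad102_rumor = tflops_per_watt(100.0, 600.0)  # hypothetical AD102 part

print(f"H100:  {h100:.3f} TF/W")                  # H100:  0.086 TF/W
print(f"AD102: {ad102_rumor:.3f} TF/W")           # AD102: 0.167 TF/W
print(f"ratio: {ad102_rumor / h100:.2f}x")        # ratio: 1.94x
```

The hypothetical part would need nearly twice H100's peak FP32 per watt on the same node.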
     
  7. no-X

    no-X Veteran

    There is no need to rebut statements that are not contained in my posts.

    If you are convinced that the GeForce RTX 3090 Ti is cheaper to manufacture than the Radeon RX 6900 XT, I'm interested in any evidence supporting that statement. The problem is that all the direct evidence (memory capacity and type, cooler) and the indirect evidence (die size, number of products based on partially disabled dies, Samsung's reputation, etc.) indicate quite the opposite.
     
  8. DegustatoR

    DegustatoR Veteran

    AFAIR we were talking about chips, which don't include memory, coolers, etc. You're trying to prove that a GA102 chip isn't cheaper to produce than an N21 chip because the 3090 is more expensive than the 6900XT at retail. But that's irrelevant. Products cost what they do because the market is willing to buy them at those prices, not because of how much they cost to produce. And all the data we have points to all Ampere chips being cheaper to produce than RDNA2 chips of comparable performance.
     
    DavidGraham, PSman1700 and Qesa like this.
  9. Qesa

    Qesa Newcomer

    The whole argument on cost is a sidetrack anyway. No-X tried to argue that a downclocked 3090 Ti could only be faster and more efficient than a 6900 XT by going wide and slow, à la an undervolted Fiji vs. the GTX 980, while totally ignoring that GA102 isn't actually much larger than Navi 21 and is on a significantly worse node, which more than explains the size difference and handicaps its perf/W (and is also why Ampere is cheaper to produce).

    To actually argue the yield point though, AMD sells a single fully-enabled Navi 21 SKU, and it's immediately undercut by a cut down chip that is 35% cheaper and only 5% slower. Only a tiny fraction of people would go for the extra expense, which would indicate AMD isn't getting many fully yielding chips off the line. Nvidia meanwhile has 3 fully enabled SKUs and the professional ones, while niche, aren't immediately undercut by a far better value similarly-performing option.
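The value gap described above can be made concrete as relative performance per dollar. A minimal sketch using only the post's own numbers (cut-down card 35% cheaper and 5% slower than the full-die SKU):

```python
def relative_perf_per_dollar(rel_perf: float, rel_price: float) -> float:
    # Both inputs are relative to the fully enabled SKU (1.0 = same)
    return rel_perf / rel_price

# Cut-down card: 0.95x the performance at 0.65x the price
ratio = relative_perf_per_dollar(0.95, 0.65)
print(f"{ratio:.2f}x perf per dollar")  # 1.46x perf per dollar
```

A roughly 46% perf-per-dollar advantage for the cut-down part is why few buyers would pick the full-die card.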
     
    Last edited: May 3, 2022
    pharma, DavidGraham and PSman1700 like this.
  10. DegustatoR

    DegustatoR Veteran

    I mean 3070 is about on par with 6900XT when running modern games with RT. What does that tell us?
     
    PSman1700 likes this.
  11. Dangerman

    Dangerman Newcomer

    I do think there is an argument that the 3090 Ti FE has a lower BOM than the 3090 FE because its memory chips are all on one side of the PCB. That easily makes for a less complex PCB and cooling.
     
  12. DavidGraham

    DavidGraham Veteran

    What does that have to do with die manufacturing costs? The A6000 is a Quadro professional card; you pay extra for the drivers, the support, and the gigantic 48GB of VRAM! Heck, the weasel Radeon Pro W6800 costs $2,250, uses a heavily cut-down die that pales in comparison to the A6000 performance-wise, and ships with far less VRAM (only 16GB). In fact, AMD has never been able to put fully enabled Navi 21 chips in its professional line to this day, while NVIDIA had full GA102s in droves 18 months ago. What does that tell you about the manufacturing cost of RDNA2 chips?
     
    Last edited: May 3, 2022
  13. Kaotik

    Kaotik Drunk Member Legend

    It's 60 TFLOPS for FP64 tensors, not FP64. It's FP32 60 TFLOPS, FP64 30 TFLOPS, and those are for the 700W model.
    https://www.nvidia.com/en-us/data-center/h100/
     
  14. pharma

    pharma Veteran

    Yep, so we really have no clue about Ada's numbers at this point.
     
    PSman1700 likes this.
  15. Jawed

    Jawed Legend

    What's the die space per TFLOPS in A100? For FP32 and for TF32 (tensor FLOPS)?
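A first-order answer divides A100's whole die area by its peak throughput. This sketch assumes the published figures of 826 mm² for GA100, 19.5 TF peak FP32 and 156 TF dense TF32 tensor; because the whole die also carries caches, HBM PHYs, FP64 and NVLink, these numbers overstate the area actually spent per TFLOPS:

```python
A100_DIE_MM2 = 826.0      # assumed published GA100 die size
A100_FP32_TFLOPS = 19.5   # assumed peak FP32 (non-tensor)
A100_TF32_TFLOPS = 156.0  # assumed peak TF32 tensor, dense (no sparsity)

def mm2_per_tflops(die_mm2: float, tflops: float) -> float:
    # Whole-die area per peak TFLOPS; an upper bound, not ALU area
    return die_mm2 / tflops

print(f"FP32: {mm2_per_tflops(A100_DIE_MM2, A100_FP32_TFLOPS):.1f} mm^2/TF")  # FP32: 42.4 mm^2/TF
print(f"TF32: {mm2_per_tflops(A100_DIE_MM2, A100_TF32_TFLOPS):.2f} mm^2/TF")  # TF32: 5.29 mm^2/TF
```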
     
    techuse likes this.
  16. CarstenS

    CarstenS Legend Subscriber

    How relevant is that to AD10x? Very large caches, 6x 1024-bit HBM memory controllers, quite a bit of FP64, and the lack of RT circuitry and of most rasterizers/primitive setup are things that might set it apart.
     
    PSman1700 likes this.
  17. troyan

    troyan Regular

    You can use Ampere -> Hopper as a template.
     
    PSman1700 likes this.
  18. Jawed

    Jawed Legend

    We might have some fun guessing that 100 TFLOPS on AD102 at 2.5 GHz is at least X mm².

    Is there a die shot for A100?
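A starting point for that guess is how many SMs 100 TFLOPS at 2.5 GHz implies. A sketch assuming an Ampere-style SM with 128 FP32 lanes (the per-SM lane count is an assumption for AD102); multiplying the result by an estimated area per SM would then bound X from below:

```python
import math

def sms_needed(target_tflops: float, clock_ghz: float, fp32_lanes_per_sm: int = 128) -> int:
    # Lanes required so that lanes * 2 FLOPs * clock reaches the target
    lanes = target_tflops * 1000.0 / (2 * clock_ghz)
    return math.ceil(lanes / fp32_lanes_per_sm)

print(sms_needed(100.0, 2.5))  # 157
```

That is 20,000 FP32 lanes, i.e. about 157 Ampere-style SMs.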
     
  19. RecessionCone

    RecessionCone Regular Subscriber

    People forget that A100 has a big cache already.
     
    pharma and PSman1700 like this.
  20. xpea

    xpea Regular

    Most of Hopper's extra power goes into FP64 and 18 NVLink 4 links at 900 GB/s, which Ada won't have.
    Otherwise, if AD102 has the same SM as Ampere, 18432 CUDA cores at 2.5 GHz gives 92 TFLOPS.
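The 92 TFLOPS figure follows from the rumored 144-SM configuration, assuming each SM keeps Ampere's 4 partitions x 32 FP32 lanes (an assumption, not a confirmed AD102 spec):

```python
sms = 144
lanes_per_sm = 128                 # assumed Ampere-style SM: 4 partitions x 32 FP32 lanes
cuda_cores = sms * lanes_per_sm
tflops = cuda_cores * 2 * 2.5 / 1000.0  # 2 FLOPs per FMA, 2.5 GHz clock

print(cuda_cores)        # 18432
print(round(tflops, 1))  # 92.2
```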
     
    PSman1700 and pharma like this.