NVidia Ada Speculation, Rumours and Discussion

trinibwoy · May 3, 2022

Dangerman said:
I was doing some napkin math on AD102 at 2.8 GHz, since Greymon mentioned RDNA3's frequency: 2.8 GHz * 18432 * 2 = 103,219.2 GFLOPS, or 103.2192 TFLOPS. By my calcs that's 2.58x the FLOPS of the 3090 Ti.
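The napkin math above is the usual peak-FP32 formula: clock × FP32 lanes × 2, since a fused multiply-add counts as two floating-point operations. Here is a minimal Python sketch. It assumes the rumoured 18432-lane AD102 at 2.8 GHz, plus the RTX 3090 Ti's 10752 CUDA cores at its 1.86 GHz boost clock (those 3090 Ti figures are not from the thread):

```python
def peak_fp32_tflops(clock_ghz: float, fp32_lanes: int) -> float:
    """Peak FP32 throughput in TFLOPS, counting one FMA as 2 FLOPs per lane per clock."""
    gflops = clock_ghz * fp32_lanes * 2  # GHz x lanes x 2 = GFLOPS
    return gflops / 1000.0

# Rumoured AD102 configuration from the post (speculative).
ad102 = peak_fp32_tflops(2.8, 18432)
# RTX 3090 Ti: 10752 CUDA cores at a 1.86 GHz boost clock.
rtx_3090_ti = peak_fp32_tflops(1.86, 10752)

print(f"AD102 @ 2.8 GHz: {ad102:.1f} TFLOPS")
print(f"RTX 3090 Ti:     {rtx_3090_ti:.1f} TFLOPS")
print(f"Ratio:           {ad102 / rtx_3090_ti:.2f}x")
```

Peak numbers like these ignore occupancy, memory bandwidth and whether the card can actually hold that clock under load.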

I don’t see how AD102 gets anywhere near 100 TF. H100 is 60 TF @ 700W on the same node with HBM3. If it does, though, I would bet on going wide, not fast. Nvidia knows they’re up against MCM, so AD102 could be another >750mm^2 chip like TU102.

This is random speculation but adding two more partitions to each SM would do the trick. 144 SMs x 6 x 32 = 27648 “cores”. 1800 MHz boost clock gets you close to 100 TF. Occupancy would be a challenge unless L1 gets a capacity bump though.
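Spelling out that hypothetical: six 32-lane partitions per SM instead of Ampere's four. This is a sketch assuming 144 SMs, 6 partitions × 32 FP32 lanes, a 1.8 GHz boost clock, and an FMA counted as 2 FLOPs. The six-partition SM is pure speculation from the post, not a known Ada design:

```python
SMS = 144                 # SM count assumed in the post
PARTITIONS_PER_SM = 6     # hypothetical: two more than Ampere's four
LANES_PER_PARTITION = 32  # FP32 lanes per partition
BOOST_GHZ = 1.8

cores = SMS * PARTITIONS_PER_SM * LANES_PER_PARTITION
tflops = cores * 2 * BOOST_GHZ / 1000  # FMA counted as 2 FLOPs

print(cores)             # 27648
print(round(tflops, 1))  # 99.5
```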

Qesa · May 3, 2022

trinibwoy said:
I don’t see how AD102 gets anywhere near 100 TF. H100 is 60 TF @ 700W on the same node with HBM3. If it does, though, I would bet on going wide, not fast. Nvidia knows they’re up against MCM, so AD102 could be another >750mm^2 chip like TU102.

This is random speculation but adding two more partitions to each SM would do the trick. 144 SMs x 6 x 32 = 27648 “cores”. 1800 MHz boost clock gets you close to 100 TF. Occupancy would be a challenge unless L1 gets a capacity bump though.

Not that I necessarily think AD102 will be 100 TF, but H100's 60 TF is for double precision, and it expends a ton of power on nvlink. Sorta like saying A100 is 20 TF at 400 W, so GA102 over 30 TF is unrealistic.

no-X · May 3, 2022

DegustatoR said:
A6000 was launched in Oct 2020.

DavidGraham said:
Fully enabled GA102 is available since 2020.

Sorry, that's a fallacy. The discussion was about the manufacturing cost of the RTX 3090 Ti. The fact that Nvidia sold some fully enabled GA102s in 2020 as part of a $4,650 product (A6000) doesn't prove that they are cheaper to manufacture than fully enabled Navi 21, which was sold in 2020 as part of a $999 product.

Dangerman · May 3, 2022

trinibwoy said:
I don’t see how AD102 gets anywhere near 100 TF. H100 is 60 TF @ 700W on the same node with HBM3. If it does, though, I would bet on going wide, not fast. Nvidia knows they’re up against MCM, so AD102 could be another >750mm^2 chip like TU102.

This is random speculation but adding two more partitions to each SM would do the trick. 144 SMs x 6 x 32 = 27648 “cores”. 1800 MHz boost clock gets you close to 100 TF. Occupancy would be a challenge unless L1 gets a capacity bump though.

A100 was 19 FP32 TFLOPS, while a 3090 Ti is technically just under 40 TFLOPS. Saying AD102 can never get to just above 100 TF FP32 because of H100 doesn't make sense (if anything, H100's 60 TF FP32 kinda makes a 103 TF AD102 SKU more likely, though it would still be a hungry, hungry, hungry hippo at 600W).

Though we don't know whether AD102 has had extensive redesigns; it seems Nvidia made a lot of changes to Lovelace last year, probably to compete against RDNA 3. I mean, we could see a "4090" at something like 150 TF due to changing the SMs again while still pushing 600 watts.

DegustatoR · May 3, 2022

no-X said:
Sorry, that's a fallacy. The discussion was about the manufacturing cost of the RTX 3090 Ti. The fact that Nvidia sold some fully enabled GA102s in 2020 as part of a $4,650 product (A6000) doesn't prove that they are cheaper to manufacture than fully enabled Navi 21, which was sold in 2020 as part of a $999 product.

Retail prices rarely have any relation to manufacturing costs.

trinibwoy · May 3, 2022

Qesa said:
Not that I necessarily think AD102 will be 100 TF, but H100's 60 TF is for double precision, and it expends a ton of power on nvlink. Sorta like saying A100 is 20 TF at 400 W, so GA102 over 30 TF is unrealistic.

Dangerman said:
A100 was 19 FP32 TFLOPS, while a 3090 Ti is technically just under 40 TFLOPS. Saying AD102 can never get to just above 100 TF FP32 because of H100 doesn't make sense (if anything, H100's 60 TF FP32 kinda makes a 103 TF AD102 SKU more likely, though it would still be a hungry, hungry, hungry hippo at 600W).

Not quite. GA102 already played the cheap double FP32 card. There isn’t a similar cheap upgrade available for AD102. H100 also played that card and only managed 144 SMs in 815mm^2. Of course H100 has a ton of FP64, no RT cores and half the L2 cache of the rumored AD102 but it’s a better comparison point than A100.

Not saying it’s impossible but 100 TF at 600mm^2 and 600W sounds like a pipe dream. At best it’s some silly “Liquid Cooled” edition.
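To put rough numbers on the "pipe dream" point, here is a sketch comparing FP32 TFLOPS per watt. It uses H100's 60 TF FP32 at 700 W (the figure cited in this thread) against the hypothetical 100 TF at 600 W. Board power also covers memory, I/O and (for H100) NVLink, so this is a crude whole-board ratio, not a per-SM efficiency:

```python
def tflops_per_watt(tflops: float, watts: float) -> float:
    """Crude whole-board efficiency: peak TFLOPS divided by rated board power."""
    return tflops / watts

h100 = tflops_per_watt(60, 700)                 # H100, FP32, as cited in the thread
hypothetical_ad102 = tflops_per_watt(100, 600)  # the speculative 100 TF / 600 W part

print(f"H100:  {h100 * 1000:.1f} GFLOPS/W")
print(f"AD102: {hypothetical_ad102 * 1000:.1f} GFLOPS/W")
print(f"Required improvement: {hypothetical_ad102 / h100:.2f}x")
```

On these assumptions the hypothetical part would need roughly 1.94x H100's FP32 throughput per watt. As Qesa points out, H100's budget also pays for NVLink, which Ada won't carry.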

no-X · May 3, 2022

DegustatoR said:
Retail prices rarely have any relation to manufacturing costs.

There is no need to refute statements that aren't in my posts.

If you are convinced that the GeForce RTX 3090 Ti is cheaper to manufacture than the Radeon RX 6900 XT, I'm interested in all evidence supporting the statement. The problem is that all the direct evidence (memory capacity and type, cooler) and indirect evidence (die size, number of products based on partially disabled dies, Samsung's reputation, etc.) indicates quite the opposite.

DegustatoR · May 3, 2022

no-X said:
There is no need to refute statements that aren't in my posts.

If you are convinced that the GeForce RTX 3090 Ti is cheaper to manufacture than the Radeon RX 6900 XT, I'm interested in all evidence supporting the statement. The problem is that all the direct evidence (memory capacity and type, cooler) and indirect evidence (die size, number of products based on partially disabled dies, Samsung's reputation, etc.) indicates quite the opposite.

AFAIR we were talking about chips, which don't have memory capacity, coolers, etc. You're trying to prove that a GA102 chip isn't cheaper to produce than an N21 chip because the 3090 is more expensive than the 6900XT at retail. But that's irrelevant. Products cost what they do because the market is willing to buy them at those prices, not because of how much they cost to produce. And all the data we have points to all Ampere chips being cheaper to produce than RDNA2 chips of comparable performance.

Qesa · May 3, 2022

The whole argument on cost is a sidetrack anyway. no-X tried to argue that a downclocked 3090 Ti could only be faster and more efficient than a 6900 XT by going wide and slow, à la undervolted Fiji vs GTX 980, while totally ignoring that GA102 isn't actually much larger than Navi 21 and is on a significantly worse node. That node more than explains the size difference, is a handicap to its perf/W, and is also why Ampere is cheaper to produce.

To actually argue the yield point though, AMD sells a single fully-enabled Navi 21 SKU, and it's immediately undercut by a cut down chip that is 35% cheaper and only 5% slower. Only a tiny fraction of people would go for the extra expense, which would indicate AMD isn't getting many fully yielding chips off the line. Nvidia meanwhile has 3 fully enabled SKUs and the professional ones, while niche, aren't immediately undercut by a far better value similarly-performing option.
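The yield argument leans on relative value. This sketch uses only the relative numbers in the post: the cut-down part is 35% cheaper and 5% slower than the full Navi 21 SKU. Both price and performance are normalized so the full-die card is 1.0:

```python
full_price, full_perf = 1.00, 1.00
cut_price = full_price * (1 - 0.35)  # 35% cheaper
cut_perf = full_perf * (1 - 0.05)    # 5% slower

full_value = full_perf / full_price  # performance per unit price
cut_value = cut_perf / cut_price
print(f"Cut-down perf per dollar vs full die: {cut_value / full_value:.2f}x")
```

That works out to roughly 1.46x the performance per dollar for the cut-down chip, which is why few buyers would pay extra for the fully enabled one.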

DegustatoR · May 3, 2022

I mean 3070 is about on par with 6900XT when running modern games with RT. What does that tell us?

Dangerman · May 3, 2022

I do think there is an argument that the 3090 Ti FE has a lower BOM than the 3090 FE, since its memory chips are all on one side of the PCB. That easily makes for a less complex PCB and cooler.

DavidGraham · May 3, 2022

no-X said:
Sorry, that's a fallacy. The discussion was about the manufacturing cost of the RTX 3090 Ti. The fact that Nvidia sold some fully enabled GA102s in 2020 as part of a $4,650 product (A6000) doesn't prove that they are cheaper to manufacture than fully enabled Navi 21, which was sold in 2020 as part of a $999 product.

What does that have to do with die manufacturing costs? The A6000 is a Quadro professional card; you pay extra for the drivers, the support, and the gigantic 48GB of VRAM! Heck, the weasel Radeon Pro W6800 costs $2,250, is a heavily cut-down die that pales in comparison to the A6000 performance-wise, and ships with far less VRAM (only 16GB). In fact, AMD has never been able to put fully enabled Navi21 chips in its professional line to this day, while NVIDIA had full GA102 in droves 18 months ago. What does that tell you about the manufacturing cost of RDNA2 chips?

Kaotik · May 3, 2022

Qesa said:
Not that I necessarily think AD102 will be 100 TF, but H100's 60 TF is for double precision, and it expends a ton of power on nvlink. Sorta like saying A100 is 20 TF at 400 W, so GA102 over 30 TF is unrealistic.

It's 60 TFLOPS for FP64 tensors, not FP64. It's FP32 60 TFLOPS, FP64 30 TFLOPS, and those are for the 700W model.
https://www.nvidia.com/en-us/data-center/h100/

Deleted member 2197 · May 3, 2022

Qesa said:
Not that I necessarily think AD102 will be 100 TF, but H100's 60 TF is for double precision, and it expends a ton of power on nvlink. Sorta like saying A100 is 20 TF at 400 W, so GA102 over 30 TF is unrealistic.

Yep, so we really have no clue about Ada's numbers at this point.

Jawed · May 4, 2022

What's the die space per TFLOPS in A100? For FP32 and for TF32 (tensor FLOPS)?

CarstenS · May 4, 2022

Jawed said:
What's the die space per TFLOPS in A100? For FP32 and for TF32 (tensor FLOPS)?

How relevant is that for AD10x? Very large caches, 6x 1024-bit HBM memory controllers, quite a bit of FP64, lack of RT circuitry and most rasterizers/prim setup are things that might set it apart.

troyan · May 4, 2022

You can use Ampere -> Hopper as a template.

Jawed · May 4, 2022

CarstenS said:
How relevant is that for AD10x? Very large caches, 6x 1024-bit HBM memory controllers, quite a bit of FP64, lack of RT circuitry and most rasterizers/prim setup are things that might set it apart.

We might have some fun guessing that 100 TFLOPS on AD102 at 2.5GHz needs at least Xmm².
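One way to start that guess is to work backwards from the target. How many FP32 lanes, and how many SMs at Ampere's 128 FP32 lanes per SM, would 100 TFLOPS at 2.5 GHz need? This sketch assumes an FMA counts as 2 FLOPs and an Ampere-style 128-lane SM. Turning the result into mm² would still need a per-SM area figure that the thread doesn't have:

```python
import math

TARGET_TFLOPS = 100.0
CLOCK_GHZ = 2.5
LANES_PER_SM = 128  # assumption: Ampere-style SM (4 partitions x 32 FP32 lanes)

# TFLOPS = lanes * 2 * GHz / 1000  ->  lanes = TFLOPS * 1000 / (2 * GHz)
lanes_needed = TARGET_TFLOPS * 1000 / (2 * CLOCK_GHZ)
sms_needed = math.ceil(lanes_needed / LANES_PER_SM)

print(int(lanes_needed))  # 20000
print(sms_needed)         # 157
```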

Is there a die shot for A100?

RecessionCone · May 4, 2022

CarstenS said:
How relevant is that for AD10x? Very large caches, 6x 1024-bit HBM memory controllers, quite a bit of FP64, lack of RT circuitry and most rasterizers/prim setup are things that might set it apart.

People forget that A100 has a big cache already.

xpea · May 4, 2022

RecessionCone said:
People forget that A100 has a big cache already.

Most of Hopper's extra power goes into FP64 and 18 NVLink 4 links at 900 GB/s, which Ada won't have.
Otherwise, if AD102 has the same SM as Ampere, 18432 CUDA cores at 2.5 GHz gives 92 TFLOPS.
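As a check on that figure: a fully enabled 144-SM part with Ampere's 128 FP32 lanes per SM has 144 × 128 = 18432 lanes. At 2.5 GHz, with an FMA counted as two FLOPs, that comes to about 92 TFLOPS. This sketch assumes those numbers; 144 SMs is the rumoured AD102 count used earlier in the thread:

```python
SMS = 144                # rumoured full AD102 SM count
FP32_LANES_PER_SM = 128  # assumption: Ampere-style SM
CLOCK_GHZ = 2.5

cuda_cores = SMS * FP32_LANES_PER_SM
tflops = cuda_cores * 2 * CLOCK_GHZ / 1000  # FMA counted as 2 FLOPs

print(cuda_cores)        # 18432
print(round(tflops, 2))  # 92.16
```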
