NVidia Ada Speculation, Rumours and Discussion

GH100 got the Gaming-Ampere SMs, so you have to compare Hopper to Volta. For example, A100's FP32 and FP64 went up only ~30% over V100 while the transistor budget increased by 2.6x.

AD102 has 16x the L2 cache, 71% more compute units, 71% more GPCs (rasterizers and ROPs), improved RT Cores (2x the ray-triangle intersection throughput, two new hardware features), new geometry processing (Micro-Meshes), improved Tensor Cores, a new shader execution reordering (SER) feature, a new optical flow accelerator (OFA), a new video encoder, 40%+ higher clocks and, I guess, a lot of Hopper's compute features. Looking at the raw performance of a "full" AD102, it doesn't look so bad.
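
As a rough sanity check of that last point, here is a minimal sketch (assuming only the two scaling factors quoted above, 71% more SMs and 40%+ higher clocks, and ignoring everything else) of how a full AD102's raw FP32 throughput would scale over a full GA102:

```python
# Rough scaling estimate: full AD102 vs full GA102 raw FP32 throughput.
# Both factors are the ones quoted in the post above, not official figures.
sm_scaling = 1.71     # 71% more compute units (SMs)
clock_scaling = 1.40  # "40%+ higher clocks"

raw_fp32_scaling = sm_scaling * clock_scaling
print(f"Estimated raw FP32 scaling: {raw_fp32_scaling:.2f}x")  # ~2.4x
```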
 
I really want to know where all of the transistor budget went. A100 has 54 billion transistors and 19.5 TF of FP32 compute; fast forward to H100 and it has 80 billion transistors and more than triple the FP32 compute, at 67 TF. Yet Ada barely doubled FP32 despite spending close to 3x the transistor count.


A100: 54 billion transistors, 19.5 TF FP32
H100: 80 billion transistors, 67 TF FP32

Ampere (GA102): 28 billion transistors, 40 TF FP32
Ada (AD102): 76 billion transistors, ~90 TF FP32

Something doesn't add up.
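
To put those figures side by side, here is a minimal sketch that simply divides the quoted TF numbers by the quoted transistor counts (the ~90 TF Ada figure being an estimate, not an official spec):

```python
# FP32 TFLOPS per billion transistors, using the figures quoted above.
chips = {
    "A100":         (54, 19.5),
    "H100":         (80, 67.0),
    "Ampere GA102": (28, 40.0),
    "Ada AD102":    (76, 90.0),  # ~90 TF is an estimate
}

for name, (transistors_bn, tflops) in chips.items():
    print(f"{name:>12}: {tflops / transistors_bn:.2f} TF per billion transistors")
```

That works out to roughly 0.36 (A100), 0.84 (H100), 1.43 (GA102) and 1.18 (AD102) TF per billion transistors, which is the apparent mismatch being questioned here.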

Part of this is an issue of numbers and how they're perceived.

For instance, if you actually calculate the numbers out, AD102 is 2.7x the transistors of GA102 for 2.25x the FP32 TF (using your 90 TFLOP FP32 number), which is perceptually a much better ratio than roughly handling it as "close to 3x the transistor count" and "barely doubled FP32."

Now your ~90 TF FP32 number (for, I assume, a full AD102, i.e. this generation's 3090 Ti equivalent) is also likely low, at only 1.09x more than the RTX 4090 (82.58 TF). The RTX 3090 Ti (40 TF) vs the RTX 3090 (35.58 TF) is actually 1.12x more. Using that ratio, a hypothetical 4090 Ti would have 92.49 TF, or 2.31x more (we're creeping up).

The RTX 4090 is also more cut down relative to a full AD102 than the 3090 was relative to the 3090 Ti, so an actual full AD102 implementation may have an even higher ratio than that if not power restricted (and the 3090 Ti was generous with power over the 3090, with a higher boost clock spec used to calculate stock TFLOPS). If we, say, move that up to ~98 TF (using about the same 150 MHz boost delta as 3090 Ti vs 3090), then that ends up at 2.45x more TF.

That would also still be a boost clock of under 2.7 GHz, which, judging by the leaks, the silicon can likely exceed. There were rumours of 600 W configurations with over 100 TF, from what I remember? 100 TF would make it 2.5x.

So ultimately AD102 vs GA102 could be more along the lines of 2.7x transistors for 2.5x more TF FP32, which sounds a lot better than "barely doubled FP32" for "close to 3x the transistor count."
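
For reference, those TF figures can also be sanity-checked from core counts and clocks. A minimal sketch, assuming the RTX 4090's advertised configuration (16384 FP32 cores at a 2.52 GHz boost), a full AD102 with 18432 cores, and 2 FLOPs per core per clock (FMA):

```python
def fp32_tflops(cores: int, boost_ghz: float) -> float:
    """Peak FP32 throughput: cores x 2 FLOPs per clock (FMA) x clock."""
    return cores * 2 * boost_ghz / 1000

print(f"RTX 4090   (16384 cores @ 2.52 GHz): {fp32_tflops(16384, 2.52):.1f} TF")  # ~82.6
print(f"Full AD102 (18432 cores @ 2.52 GHz): {fp32_tflops(18432, 2.52):.1f} TF")  # ~92.9
print(f"Full AD102 (18432 cores @ 2.70 GHz): {fp32_tflops(18432, 2.70):.1f} TF")  # ~99.5

full_ga102_tf = 40.0  # the RTX 3090 Ti figure used above
print(f"Ratio vs full GA102: {fp32_tflops(18432, 2.70) / full_ga102_tf:.2f}x")    # ~2.5x
```

A full AD102 at around a 2.7 GHz boost lands right at the rumoured ~100 TF / 2.5x figure mentioned above.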
 
GH100 got the Gaming-Ampere SMs, so you have to compare Hopper to Volta. For example, A100's FP32 and FP64 went up only ~30% over V100 while the transistor budget increased by 2.6x.

AD102 has 16x the L2 cache, 71% more compute units, 71% more GPCs (rasterizers and ROPs), improved RT Cores (2x the ray-triangle intersection throughput, two new hardware features), new geometry processing (Micro-Meshes), improved Tensor Cores, a new shader execution reordering (SER) feature, a new optical flow accelerator (OFA), a new video encoder, 40%+ higher clocks and, I guess, a lot of Hopper's compute features. Looking at the raw performance of a "full" AD102, it doesn't look so bad.
Micro-Meshes are part of the RT core, not the geometry side. The OFA is also improved rather than new?
 
No, Micro-Meshes are a new way of processing geometry - from the whitepaper:
Oh, I tried looking for anything outside the RT core in the whitepaper; I must have missed that, since all the searches took me to RT stuff.
 
I really want to know where all of the transistor budget went. A100 has 54 billion transistors and 19.5 TF of FP32 compute; fast forward to H100 and it has 80 billion transistors and more than triple the FP32 compute, at 67 TF. Yet Ada barely doubled FP32 despite spending close to 3x the transistor count.


A100: 54 billion transistors, 19.5 TF FP32
H100: 80 billion transistors, 67 TF FP32

Ampere (GA102): 28 billion transistors, 40 TF FP32
Ada (AD102): 76 billion transistors, ~90 TF FP32

Something doesn't add up.
I think there's 56 billion for the SM cores and the other 20 billion transistors all went to the L2 cache and tensor cores.
 
Generally speaking, Nvidia usually undercuts its partners when it launches its GPUs. It comes out first with its Founder’s Edition cards, which have always been less expensive than add-in board (AIB) cards. Then, a few weeks later, the AIB boards come out with their own versions, which are usually more expensive than Nvidia’s cards. That’s because they have custom PCBs, advanced cooling, and so forth. This gives Nvidia the first bite at the apple and lets it suck up some early adopters. That doesn’t seem to be the case this time. Nvidia priced its RTX 4090 at $1,599, and on Newegg, some partner boards are listed at…$1,599. That’s a first, and a promising sign that GPU price gouging might finally be over. In the past, a GPU at that price would have debuted at $1,999 or higher.
[Screenshot: Newegg listings showing RTX 4090 partner cards priced at $1,599]

It’s an encouraging sign that AIBs are offering RTX 4090s for the same price as Nvidia’s cards. Sure, there are some overclocked models with huge coolers going for higher prices, but at least we seem to have options.
 
The era of crappy reference models is officially behind us. There isn't much that AIB models do that the Founder's Edition doesn't. It used to be that NVIDIA underclocked its GPUs relative to what they were truly capable of. See the 980/Ti clocked at 900-980 MHz when even bad ones could easily hit 1050 MHz+. It made sense not to get the reference cards with bad blower coolers and non-guaranteed OC performance. At the time, you could find some AIBs that were faster than the reference models by 10%+ out of the box, with lower temperatures and less noise.

Now, though? You're lucky to get a 5% boost with aftermarket models, and sometimes their coolers are even worse than the Founder's Edition's. To top it all off, NVIDIA has figured out how to maximize its clocks, leaving little incentive to get AIB models that aren't much better anyway.
 
The 4080 12 GB, which for all intents and purposes should be called a ’4070’, actually performing like a 3090 Ti makes that GPU quite good value and is promising for the GPUs further down the stack.
That's a 4070 (sometimes even called a 4060) performing like the fastest last-gen GPU, the 3090 Ti.
 