Not really. "Low frequency AD102" implies that it's not a 4090 but a low-clocked AD102. Seems the "Easy 2x" is the 4090.
But anyway all these "leakers" are fairly clueless right now.
You are looking at up to 384 ROPs on the next-gen flagship versus just 112 on the fastest Ampere GPU, the RTX 3090 Ti. There are also going to be the latest 4th Generation Tensor and 3rd Generation RT (Raytracing) cores infused on the Ada Lovelace GPUs which will help boost DLSS & Raytracing performance to the next level. Overall, the Ada Lovelace AD102 GPU will offer:
- 2x GPCs (Versus Ampere)
- 50% More Cores (Versus Ampere)
- 50% More L1 Cache (Versus Ampere)
- 16x More L2 Cache (Versus Ampere)
- Double The ROPs (Versus Ampere)
- 4th Gen Tensor & 3rd Gen RT Cores

Do note that clock speeds, which are said to be in the 2-3 GHz range, aren't taken into the equation, so they will also play a major role in per-core performance versus Ampere. The NVIDIA GeForce RTX 40 series graphics cards featuring the next-gen Ada Lovelace gaming GPUs are expected to launch in the second half of 2022 and are said to utilize the same TSMC 4N process node as the Hopper H100 GPU.
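For a rough sense of what those ROP counts mean, here's a quick back-of-the-envelope sketch; the AD102 clock is just an assumed value picked from the rumored 2-3 GHz range, not a known spec:

```python
# Rough theoretical pixel fill-rate comparison: rumored AD102 vs. GA102.
# The AD102 clock (2.5 GHz) is an assumption from the rumored 2-3 GHz range.
gpus = {
    "RTX 3090 Ti (GA102)": {"rops": 112, "clock_ghz": 1.86},  # official boost clock
    "AD102 (rumored)":     {"rops": 384, "clock_ghz": 2.5},   # assumed clock
}

for name, spec in gpus.items():
    # One ROP writes one pixel per clock, so peak fill rate = ROPs * clock.
    gpix = spec["rops"] * spec["clock_ghz"]
    print(f"{name}: {gpix:.0f} Gpixel/s")
```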
kopite thinks that Ada will now have 32 ROPs/GPC, so 384 ROPs in total. Is that really realistic?
Hypothetically, could you have more "simpler/smaller" ROPs, so you get better utilization and maybe some performance gain, without taking more die space, or something like that?
The way Nvidia and AMD count ROPs, it's one ROP = one pixel. So you can't really go any smaller than that.
Ampere has 16 ROPs per GPC but it doesn’t tell us much about the granularity of their operation. A global memory transaction on Nvidia hardware is 32 bytes which is enough for 8xINT8 pixels or 4xFP16 pixels. I’ve always assumed ROPs operated on quad granularity. So those 16 ROPs can work on 4 independent pixel quads instead of one contiguous tile of 16 adjacent pixels. This is just an assumption though as I couldn’t find any evidence to support it.
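To make the arithmetic in that post explicit, a small sketch (the per-pixel sizes are just the standard RGBA footprints):

```python
# How many pixels fit in one 32-byte memory transaction, per pixel format.
TRANSACTION_BYTES = 32

bytes_per_pixel = {
    "RGBA8 (INT8 channels)":   4,  # 4 channels x 1 byte
    "RGBA16F (FP16 channels)": 8,  # 4 channels x 2 bytes
}

for fmt, bpp in bytes_per_pixel.items():
    print(f"{fmt}: {TRANSACTION_BYTES // bpp} pixels per transaction")

# Quad granularity: 16 ROPs split into 2x2 quads gives 4 independent quads.
print(f"16 ROPs / 4 pixels per quad = {16 // 4} independent quads")
```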
Did Nvidia ever move to full-speed FP16 fill rate? The most recent numbers I could find are from Pascal, which was still half speed.
https://www.hardware.fr/articles/948-9/performances-theoriques-pixels.html
https://www.ixbt.com/3dv/amd-radeon-rx-6900xt-review.html
Feature Test 2: Color Fill
The second task is a fill-rate test. It uses a very simple pixel shader that does not limit performance. The interpolated color value is written to an off-screen buffer (render target) using alpha blending. It uses a 16-bit FP16 off-screen buffer, the format most commonly used in games with HDR rendering, so this test is quite modern.
The numbers from the second subtest of 3DMark Vantage usually show the performance of the ROP units without being limited by video memory bandwidth, so the test measures the performance of the ROP subsystem.
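One way to read such test numbers against the half-rate question above: compute both theoretical figures and see which one the measurement lands near. A minimal sketch, assuming 3090 Ti-like specs (112 ROPs, 1.86 GHz boost):

```python
# Sketch: expected FP16 color-fill throughput under full-rate vs. half-rate
# ROP assumptions, to compare against a measured color fill result.
# The specs below are illustrative assumptions, not measured figures.
rops, clock_ghz = 112, 1.86

for label, rate in [("full-rate FP16", 1.0), ("half-rate FP16", 0.5)]:
    print(f"{label}: {rops * clock_ghz * rate:.0f} Gpixel/s")

# If a measured color fill number lands near the half-rate figure,
# FP16 blending is still half speed, as it was on Pascal.
```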
Feeding three 32-wide SIMDs with one 32-thread dispatch port is what's completely unrealistic in this picture.
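As a quick sketch of why that layout can't work (just the issue-rate arithmetic implied by the diagram):

```python
# One dispatch port issues one 32-thread instruction per clock, but three
# 32-wide SIMDs need three instructions per clock to all stay busy.
simds = 3
dispatch_ports = 1

instructions_needed = simds           # per clock, one per SIMD
instructions_issued = dispatch_ports  # per clock

print(f"peak SIMD utilization: {instructions_issued / instructions_needed:.0%}")
# -> 33%: two of the three SIMDs would sit idle every clock on average
```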
Along the same lines, but I suppose "bigger": isn't this what happened with RDNA 2 versus RDNA 1? For VRS the colour rate was doubled, but other rates were left alone, I guess (RB+).
Yes, and more things that they don't disclose in public. The obvious one is the usage of their Selene supercomputer for intensive algorithm/RTL/floor-plan verification. Nvidia spends much more time than before in simulation, and their silicon verification lab is now a huge department, in order to shorten tape-out to HVM time. For small dies like GA106-107, tape-out to HVM was less than 5 months, and it should be even less for Ada.
Today got a surprising comment from my usual source about AD102 vs Navi31:
"Easy win in RT and Compute"
(for green team)
Did Navi31 perf leak? Jetbait game?
No idea but it's spicy!
RT was a given. Compute too, given the direction in architecture Nvidia started with Ampere.
What else is there?
Wonder what AMD messed up. We were expecting them to be the leader and Nvidia scrambling to match them. How did this do a complete 180?
Who expected them to be the leader? The expectation was that they may win in "rasterization" in the highest pricing tier.
Jetbait