NVidia Ada Speculation, Rumours and Discussion

DegustatoR · Mar 27, 2022

Jawed said:
Why do you conclude that the difference is greater here? Is there a specific feature of the tensor cores or cache that you're referring to or is there something else?

Well for one there's the fact that GH100 isn't a videochip as it lacks most of graphics h/w.

trinibwoy · Mar 27, 2022

DegustatoR said:
Well for one there's the fact that GH100 isn't a videochip as it lacks most of graphics h/w.

Whitepaper says: “Only two TPCs in both the SXM5 and PCIe H100 GPUs are graphics-capable (that is, they can run vertex, geometry, and pixel shaders)”.

It doesn’t say anything about the hardware though. Hopper still has a full complement of texture units and Nvidia is still using “GPC” and “TPC” to describe the architecture. I wonder if the rasterizers, ROPs and setup hardware are actually missing.

rSkip · Mar 28, 2022

H100 can still run compute workloads that require texture filtering, so the texture units are still there.
I guess the rasterizer and ROPs might only exist in the graphics-capable GPC, now that Nvidia has decoupled ROPs and MCs.

xpea · Mar 28, 2022

A question for the programmers: would the DPX (dynamic programming) instructions introduced in Hopper be beneficial on a gaming GPU like Lovelace?
https://blogs.nvidia.com/blog/2022/...s-dynamic-programming-using-dpx-instructions/
I was thinking of using DPX to avoid shader stalls, optimize warp occupancy, or get better branching performance

pcchen · Mar 29, 2022

xpea said:
A question for the programmers: would the DPX (dynamic programming) instructions introduced in Hopper be beneficial on a gaming GPU like Lovelace?
https://blogs.nvidia.com/blog/2022/...s-dynamic-programming-using-dpx-instructions/
I was thinking of using DPX to avoid shader stalls, optimize warp occupancy, or get better branching performance

There's no publicly available documentation for these DPX instructions right now (at least I can't find any).
But since they are about dynamic programming, I'd guess that they are probably about finding/indexing something in a vector, since a lot of dynamic programming algorithms are about finding a specific value or a minimum/maximum value in an array.
These instructions might be useful in some game logic (e.g. path finding), but of course we'll only find out once the documentation is public.
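To make the "minimum/maximum in an array" point concrete, here's a minimal sketch of a typical dynamic programming inner loop: plain edit distance, where every cell is a min over a few neighbouring cells plus an add. This is only an illustration of the kind of work such algorithms do. It is not the DPX API (which isn't documented), and the function name `edit_distance` is made up for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling one-row DP table."""
    # prev[j] = cost of turning the empty prefix of a into b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # cost of turning a[:i] into the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # The hot path: a three-way min over add results.
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Grid path finding with relaxation steps has the same min-plus-add shape, which is why the path finding guess seems plausible.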

TopSpoiler · Apr 9, 2022

https://twitter.com/x/status/1512313284237533189

What technology that only Hopper had could be integrated into Ada? I don't think Hopper has any technologies that could benefit a gaming architecture. It's not even MCM.

DegustatoR · Apr 9, 2022

TopSpoiler said:
I don't think Hopper has any technologies that could benefit a gaming architecture.

We don't really know that. Nvidia hasn't said anything about the changes in Hopper which would benefit gaming so far.
But I doubt that this tweet has anything in common with reality TBH.
Unless AD102 is MCM and that technology is what was "integrated" from GH202. But I don't see how that's "Hopper's new technology" really.

xpea · Apr 9, 2022

TopSpoiler said:
https://twitter.com/x/status/1512313284237533189

What technology that only Hopper had could be integrated into Ada? I don't think Hopper has any technologies that could benefit a gaming architecture. It's not even MCM.

For gaming:
New SM scheduler (TMA accelerator in Hopper terminology ie more efficient asynchronous execution)
DPX instructions
Thread block cluster
The biggest boost: 2xFP + INT pipeline (still not sure about this one)

And not in Hopper but 100% sure, 3rd gen RT cores that are much improved (compressed BVH pipeline that fits in L2 cache + new hyperfast intersection algo)

DegustatoR · Apr 9, 2022

xpea said:
The biggest boost: 2xFP + INT pipeline (still not sure about this one)

How would this even work though? Three 16-wide SIMDs with one 32-wide warp per clock? They'll need a separate port for INT h/w then, and the SM level scheduling will become complicated.
It also seems a bit excessive from a typical math mix POV, unless the mix has changed more in favor of FP32 now.
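Rough arithmetic behind that concern, as a sketch. It assumes each scheduler issues one warp instruction per clock and that a 32-thread warp occupies a 16-wide SIMD for two clocks. The helper names are hypothetical and this is a back-of-envelope model, not a description of any real SM.

```python
WARP_SIZE = 32  # threads per warp


def issue_rate_needed(num_simds: int, simd_width: int = 16) -> float:
    """Warp instructions per clock needed to keep every SIMD busy."""
    # A full warp takes WARP_SIZE / simd_width clocks on one SIMD (2 for 16-wide).
    clocks_per_warp_instr = WARP_SIZE / simd_width
    return num_simds / clocks_per_warp_instr


def max_utilization(num_simds: int, issue_per_clock: float = 1.0,
                    simd_width: int = 16) -> float:
    """Best-case fraction of SIMD capacity used at a given issue rate."""
    return min(1.0, issue_per_clock / issue_rate_needed(num_simds, simd_width))


if __name__ == "__main__":
    for n in (2, 3):
        print(n, issue_rate_needed(n), round(max_utilization(n), 3))
```

Under those assumptions two 16-wide SIMDs can be kept full by one issue per clock, while three would need 1.5 issues per clock, so something in the scheduling or issue ports would have to change.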

Rootax · Apr 9, 2022

xpea said:
For gaming:
New SM scheduler (TMA accelerator in Hopper terminology ie more efficient asynchronous execution)
DPX instructions
Thread block cluster
The biggest boost: 2xFP + INT pipeline (still not sure about this one)

And not in Hopper but 100% sure, 3rd gen RT cores that are much improved (compressed BVH pipeline that fits in L2 cache + new hyperfast intersection algo)

Do we have any hints about that?

trinibwoy · Apr 9, 2022

It always amazes me when things are cancelled that never existed. This next generation will just be bigger caches and faster RT. Bet on it.

McHuj · Apr 9, 2022

trinibwoy said:
It always amazes me when things are cancelled that never existed. This next generation will just be bigger caches and faster RT. Bet on it.

I've worked on plenty of SOCs that never existed as products but went far into development and were cancelled due to changing customer needs or demand. I don't know if it's true in this case, but this stuff happens all the time in the industry.

trinibwoy · Apr 9, 2022

McHuj said:
I've worked on plenty of SOCs that never existed as products but went far into development and were cancelled due to changing customer needs or demand. I don't know if it's true in this case, but this stuff happens all the time in the industry.

That’s fair but it’s hard to imagine Nvidia cancelling entire architectures these days given their market position and expertise in the game. Cancelling an SKU or two sure but they have a massive R&D machine that should “cancel” ideas well before they’re close to production.

DegustatoR · Apr 9, 2022

Chips can be cancelled prior to production, happens all the time, nothing weird about it.
Entire architectures are quite unlikely, and GH202 is Hopper so it's not the case anyway.
The early appearance of Blackwell chips does look a bit suspect though so it's possible that Nv decided to bring GB chips forward instead of producing GH202.

Picao84 · Apr 9, 2022

DegustatoR said:
Chips can be cancelled prior to production, happens all the time, nothing weird about it.
Entire architectures are quite unlikely, and GH202 is Hopper so it's not the case anyway.
The early appearance of Blackwell chips does look a bit suspect though so it's possible that Nv decided to bring GB chips forward instead of producing GH202.

Blackwell?

DegustatoR · Apr 9, 2022

Picao84 said:
Blackwell?

https://videocardz.com/newz/nvidia-ada-ad102-107-hopper-gh100-202-blackwell-gb100-102-gpu-confirmed

Kaotik · Apr 9, 2022

Picao84 said:
Blackwell?

Yeah, next gen server after Hopper (or possibly a separate line to co-exist with Hopper, specialized for something else?)
Kopite7Kimi hinted about it last summer and it was confirmed in the NVIDIA hack leaks.

nnunn · Apr 9, 2022

TopSpoiler said:
What technology that only Hopper had could be integrated into Ada?

With so many cores needing to be fed, maybe AD102 will fix memory speed and heat issues with skinny 512-bit HBM3? From NextPlatform.com:

nextplatform.com said:
"That is why we believe there will be versions of HBM with skinnier 512-bit buses and no interposer as well as those that have the 1,024-bit bus and an interposer."

Samwell · Apr 9, 2022

Rootax said:
Do we have any hints about that?

Not sure about the details, but big architectural improvements for RT in Lovelace are expected from a chip design perspective. For Turing, Nvidia was designing their RT in a vacuum, without game developer experience and suggestions. The same mostly happened for Ampere, as you can't change much in the last 2 years of an architecture's design. So only intersection speed and motion blur were improved. That's also the reason why AMD could only add very slow RT late in the process of the RDNA2 architecture.

Lovelace is the first architecture that should implement architectural improvements based on the first RT games and fix some of the easier-to-solve RT bottlenecks that appeared.

trinibwoy · Apr 9, 2022

Samwell said:
Lovelace is the first architecture that should implement architectural improvements based on the first RT games and fix some of the easier-to-solve RT bottlenecks that appeared.

And hopefully those improvements accelerate existing RT games and don’t require devs to do things differently.
