NVIDIA discussion [2024]

Given how well they've performed in games (Halo Infinite & Starfield)
I don't see how well they performed in any of these games; their performance level is the same, RDNA2 vs Ampere or RDNA3 vs Ada.


You are reaching; your conclusion is based on false or insufficient data.
and that there's publicized preliminary data from a big ISV (Epic Games) too
That publicized "beta" preliminary data contained no performance comparisons or even any noteworthy general performance gains; the jury is still out on this one.

They're behind feature/integration-wise, but by no means do they have 'bad' HW design when they can have a smaller physical HW implementation for a given performance profile, and they have more freedom to improve this aspect as well ...
These are generalized statements with no current data to back them up. No one said their HW design is bad, just that they are lagging in visual and performance features.
 
I don't see how well they performed in any of these games; their performance level is the same, RDNA2 vs Ampere or RDNA3 vs Ada.

You are reaching; your conclusion is based on false or insufficient data.
There are other GPU driven rendering centric games out there, like the recently released Sony titles on PC, and Starfield is still a good result for them...
That publicized "beta" preliminary data contained no performance comparisons or even any noteworthy general performance gains; the jury is still out on this one.
They showed profiling data where empty dispatches cost 0.2 usec and a demo like their City sample can potentially have over 3000 empty shading bins. A frametime reduction of ~0.6ms can be extracted ...
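To spell out the arithmetic behind that ~0.6ms figure (using only the numbers already quoted):
3000 empty bins × 0.2 usec per empty dispatch = 600 usec ≈ 0.6 ms per frame of pure empty-dispatch overhead.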
These are generalized statements with no current data to back them up. No one said their HW design is bad, just that they are lagging in visual and performance features.
Well, given the fact that they're not using exotic memory standards like GDDR6X and have an asymmetric process technology implementation per chiplet with a much smaller main graphics die, it is an achievement in and of itself that they're even able to get within the realm of the highest-performing parts very frequently. Their advantage in HW design simplicity manifests even in smaller parts ...
 
There are other GPU driven rendering centric games out there, like the recently released Sony titles on PC, and Starfield is still a good result for them...
We are practically arguing semantics here; the recently released Sony titles are based on old cross-platform titles that are constantly being hammered by the famous "DX12 tax", and are hardly "modern" by any standard. For that you need recent Sony games with modern features like Ratchet and Clank Rift Apart, Returnal and Helldivers 2, where RDNA3 gets decimated at max settings.

You seem to be ignoring the reality here: the PS5 Pro is releasing with increased RT capabilities, meaning all RDNA3/RDNA2 parts are going to be even more decimated through their lackluster RT performance. It won't even matter if they have a slightly more robust Work Graphs implementation (which is questionable) when they lose to NVIDIA by 50% margins.

They showed profiling data where empty dispatches cost 0.2 usec and a demo like their City sample can potentially have over 3000 empty shading bins. A frametime reduction of ~0.6ms can be extracted ...
I see the word "demo" and lots of "can" words; we shall see in 2 years, when this becomes reality in an actual game. Let's compare performance between vendors then.

Well, given the fact that they're not using exotic memory standards like GDDR6X and have an asymmetric process technology implementation per chiplet with a much smaller main graphics die, it is an achievement in and of itself that they're even able to get within the realm of the highest-performing parts very frequently.
The AD103/4080 Super is a much smaller die, and it trades blows with the 7900XTX in raster and blows past it in actual ray tracing loads, with vastly better power efficiency. As for the 4090, it's a cut-down part; NVIDIA didn't even bother to release a full-die 4090Ti, which could easily add 15% to 20% more raster performance. But why would NVIDIA do that when AD103 is more than enough? As for actual performance in modern PC games with ray tracing/path tracing, well, we don't have to talk about that, do we?
 
Well, given the fact that they're not using exotic memory standards like GDDR6X and have an asymmetric process technology implementation per chiplet with a much smaller main graphics die, it is an achievement in and of itself that they're even able to get within the realm of the highest-performing parts very frequently. Their advantage in HW design simplicity manifests even in smaller parts ...
Wow, that's some nice cherry-picking here. How can you take a single metric (raster) to evaluate a GPU arch and claim one is better? What about RT? What about 3D rendering? What about upscaling? What about power consumption? What about video encoders? What about the ecosystem that NV hardware enables? And what about the hot AI topic?
Just published yesterday, see how your "superior" arch fades in AI in a lot of workloads:

[chart: 21-Int-Overall-Score.png]

Basically, it's just embarrassing for AMD, and it shows how short-sighted this company is when Copilot+ PCs are all the rage nowadays.

We can argue about pricing if you only care about a single metric, but come on, don't say RDNA2 is the better arch when it's just a one (old) trick raster pony.
 
Given how well they've performed in games (Halo Infinite & Starfield) with the more pathological cases of the predecessor API (ExecuteIndirect) and that there's publicized preliminary data from a big ISV (Epic Games) too, it's not hard to see which vendor has the most robust implementation of GPU driven rendering functionality ...
Is it? Starfield got about +30-50% performance via patches after launch on Nv h/w, which should definitely tell you who's got what there.

All of that is their own self-inflicted problem to make HW not redundant in gaming ...
This "self-inflicted problem" is why they control 80% of the market right now.

They're behind feature/integration-wise, but by no means do they have 'bad' HW design when they can have a smaller physical HW implementation for a given performance profile, and they have more freedom to improve this aspect as well ...
Their h/w isn't smaller if you consider that it lacks 75% of what Nvidia h/w has and even then direct comparisons are often in Nvidia's favor in chip die area.

There are other GPU driven rendering centric games out there, like the recently released Sony titles on PC, and Starfield is still a good result for them...
You're cherry-picking badly optimized titles while completely ignoring the fact that it's the exact opposite in the vast majority of games being released.

They showed profiling data where empty dispatches cost 0.2 usec and a demo like their City sample can potentially have over 3000 empty shading bins. A frametime reduction of ~0.6ms can be extracted ...
Wow, what a win! Probably easily counterable by some driver-level optimization - even if we entertain the idea that it's actually a win and not some fluke of early benchmarking of beta s/w.

it is an achievement in and of itself that they're even able to get within the realm of the highest-performing parts very frequently

[chart: performance-pt-1920-1080.png]
 
You could potentially fall behind in other (gaming) performance metrics and you risk creating a bubble for consumer AI HW (what will happen to DLSS w/o AI HW?) ...

That’s giving a lot of weight to hypothetical negative ML outcomes while ignoring what’s actually happening today. If the market hysteria over AI pans out then there’s no bubble. It’s just another rendering feature in which Nvidia has a significant lead. DLSS has been successful even if you just look at the marketing benefit. Neural radiance caching may be the closest thing to magic we’ve seen in render tech.

You’re also implying that it’s a zero sum game where if you invest in ML rendering you need to sacrifice performance and features elsewhere. This isn’t true today given their lead in nearly every relevant metric - features, raster, RT, power efficiency. So I see no reason to assume that ML will negatively impact other areas. Nvidia can probably do ML and work graphs and RT and whatever next thing they come up with. They’re not short of cash.

I don't know if I'd describe the competition as 'lagging' when they've scored a major API win with GPU driven rendering and they could do it again with bindless or somewhere else, since the other player is distracted making AI work for rendering ...

You’re assigning wins based on PPT slides while dismissing actual wins in the real world. Let’s evaluate the state of GPU driven rendering and work graphs when those features actually come to market.
 
I am just speculating here, but what I would expect from an engineering perspective is that the "shared" parts of the design like the SM processor will be more and more optimised towards AI and the graphics engineering teams will struggle more to get features they want into the SM processor because they have to compete with AI features for the time of a relatively small number of specialised engineers/architects. On the other hand, if they want to design a new TPU with practically the same interface to the rest of the SM as the current one? Go wild. Want to improve raytracing, again without changing the rest of the SM too much? Again, go wild. The return-on-investment for improving perf/mm2 and perf/watt of the graphics-specific parts is still extremely good with ~$10B revenue per year... as long as it doesn't affect their AI roadmap.

SER likely needed some changes to the SM so it wasn’t all about AI.

What changes could they make to the core SM that would specifically benefit graphics? Smaller warp sizes could benefit RT but seems expensive. The bottlenecks are probably elsewhere - work distribution, memory pipeline etc. Larger register files perhaps? They’ve been at 64KB forever.
 
That’s giving a lot of weight to hypothetical negative ML outcomes while ignoring what’s actually happening today. If the market hysteria over AI pans out then there’s no bubble. It’s just another rendering feature in which Nvidia has a significant lead. DLSS has been successful even if you just look at the marketing benefit. Neural radiance caching may be the closest thing to magic we’ve seen in render tech.

You’re also implying that it’s a zero sum game where if you invest in ML rendering you need to sacrifice performance and features elsewhere. This isn’t true today given their lead in nearly every relevant metric - features, raster, RT, power efficiency. So I see no reason to assume that ML will negatively impact other areas. Nvidia can probably do ML and work graphs and RT and whatever next thing they come up with. They’re not short of cash.
As far as I can tell AI/ML has yet to affect *how* we do real-time rendering at a "deeper level", e.g. geometry, shadows, texturing, sparse rendering methods, unique data structures, implicit surfaces, acceleration structures, rendering techniques/systems/pipelines (deferred/forward/object-space/etc), global illumination, graphics programming, or 'exotic' modifications to the graphics pipeline ...

Neural rendering has only affected how we perform temporal filtering and reconstruction so far and it's possible that it may never make any breakthroughs in other applications to real-time rendering.

Real-time rendering is too big of a field to have all of its future improvements be pigeonholed into AI (besides RT). It is my hope that the industry finds more potential ways (like advanced GPU driven functionality) to displace AI HW so that we can truly move forward in a fundamental manner ...
 
As far as I can tell AI/ML has yet to affect *how* we do real-time rendering at a "deeper level"

You don’t consider light transport caching as part of the “how” of rendering? Either way ML doesn’t need to fundamentally change how rendering works in order to be useful. I really don’t know why you’re approaching this as an either/or thing. You can do ML plus all the other non-ML stuff too.

Real-time rendering is too big of a field to have all of its future improvements be pigeonholed into AI (besides RT).

Has someone claimed that all future improvements will rely on AI? Again it’s not AI versus everything else. It’s everything else + AI.
 
You don’t consider light transport caching as part of the “how” of rendering? Either way ML doesn’t need to fundamentally change how rendering works in order to be useful. I really don’t know why you’re approaching this as an either/or thing. You can do ML plus all the other non-ML stuff too.

Has someone claimed that all future improvements will rely on AI? Again it’s not AI versus everything else. It’s everything else + AI.
Then why doesn't the tune of the leading graphics vendor reflect this doctrine? What other killer rendering features are they even developing that are NOT based on AI?
 
Then why doesn't the tune of the leading graphics vendor reflect this doctrine? What other killer rendering features are they even developing that are NOT based on AI?

There’s no way for us to know that. Did they talk about distributed geometry processing before Fermi or upscaling before Turing? We only learned what they were up to after those architectures launched. It’s impossible for us to know what they’re up to now. You’re drawing a lot of conclusions off little data.
 
SER likely needed some changes to the SM so it wasn’t all about AI.

What changes could they make to the core SM that would specifically benefit graphics? Smaller warp sizes could benefit RT but seems expensive. The bottlenecks are probably elsewhere - work distribution, memory pipeline etc. Larger register files perhaps? They’ve been at 64KB forever.
SER is a good point - the Hopper & Ada SMs feel like they possibly diverged from the same GA102 baseline (with a lot more changes on the Hopper side than the Ada side).

It's hard to tell what would help graphics most, but I think the "obvious" weakness of the current NVIDIA SM is their instruction fetch/decode pipeline, which still has brute-force 128-bit fixed-length instructions (they are only "saved" by large instruction caches and *extremely* aggressive prefetching, but that doesn't work for more unpredictable code like raytracing) and which can still only decode 1 instruction/clk. Need a dynamic branch? That's a minimum of 3 instructions in total. So that means you lose 3 FMAs - there's simply no way around that on their current architecture.
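To put rough numbers on that (assuming one instruction issued per scheduler per clock, and counting a warp-wide FMA as 32 lanes × 2 FLOPs):
3 issue slots × 32 lanes × 2 FLOPs = 192 FLOPs of FP32 work forgone per warp for every dynamic branch.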

On the original V100, this wasn't an issue since FP32 could only use half the issue ports, so in a sense decode was "overspecced" except for that rare "perfectly balanced" workload between fp/int/control/etc... but since GA102 the opposite is now true. I think the 2xFP32 change was a very clear perf/mm2 win and it makes a lot of sense in isolation, but it just doesn't fit the rest of their pipeline very well, so they are definitely leaving some performance on the table. However... this doesn't matter *at all* for AI on Hopper as Tensor Cores are now asynchronous instructions that take 10s of cycles to execute. So fixing both the instruction length and instruction decode bottlenecks is understandably low priority for AI.
 
SER is a good point - the Hopper & Ada SMs feel like they possibly diverged from the same GA102 baseline (with a lot more changes on the Hopper side than the Ada side).
SER is an RT core change; there were some others in Ada (OMM, DMM, throughput improvements). The RT core is a part of the SM, but in the line of thinking of what changes were done to the main floating-point pipelines it is outside the scope.
 
Then why doesn't the tune of the leading graphics vendor reflect this doctrine? What other killer rendering features are they even developing that are NOT based on AI?

I'm sure you've seen this. What else would you like them to do?

 
SER is an RT core change; there were some others in Ada (OMM, DMM, throughput improvements). The RT core is a part of the SM, but in the line of thinking of what changes were done to the main floating-point pipelines it is outside the scope.

No, SER doesn't have anything to do with RT cores. It's sorting of SM warps - i.e. shader code.
 
I don't know what that means. Suggest you read the docs. SER is about shuffling threads running in the SM (e.g. hit shaders). It's not related to fixed function RT stuff.
All threads are running "in the SM". SER is applicable only to RT workloads because it is an RT h/w improvement. Otherwise it would be possible to use it for any shader in the SM.
 
All threads are running "in the SM". SER is applicable only to RT workloads because it is an RT h/w improvement. Otherwise it would be possible to use it for any shader in the SM.

Yes, the programmable shading RT workloads that run on the standard SM ALUs. That's what Arun was referring to.

You said "SER is an RT core change" which is incorrect.
 