NVIDIA discussion [2024]

Given how well they've performed in games (Halo Infinite & Starfield)
I don't see how well they performed in any of these games; their performance level is the same, RDNA2 vs Ampere or RDNA3 vs Ada.


You are reaching; your conclusion is based on false or insufficient data.
and that there's publicized preliminary data from a big ISV (Epic Games) too
That publicized "beta" preliminary data contained no performance comparisons or even any noteworthy general performance gains; the jury is still out on this one.

They're behind feature/integration-wise, but by no means do they have 'bad' HW design when they can have a smaller physical HW implementation for a given performance profile, and they have more freedom to improve this aspect as well ...
These are generalized statements with no current data to back them up. No one said their HW design is bad, just that they are lagging in visual and performance features.
 
I don't see how well they performed in any of these games; their performance level is the same, RDNA2 vs Ampere or RDNA3 vs Ada.

You are reaching; your conclusion is based on false or insufficient data.
There are other GPU driven rendering centric games out there, like the recently released Sony titles on PC, and Starfield is still a good result for them...
That publicized "beta" preliminary data contained no performance comparisons or even any noteworthy general performance gains; the jury is still out on this one.
They showed profiling data where empty dispatches cost 0.2 usec and a demo like their City sample can potentially have over 3000 empty shading bins. A frametime reduction of ~0.6ms can be extracted ...
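To spell out the arithmetic behind that ~0.6ms figure (using only the numbers already quoted):
3000 empty bins × 0.2 usec per empty dispatch = 600 usec ≈ 0.6 ms per frame of pure empty-dispatch overhead.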
These are generalized statements with no current data to back them up. No one said their HW design is bad, just that they are lagging in visual and performance features.
Well, given the fact that they're not using exotic memory standards like GDDR6X and have an asymmetric process technology implementation per chiplet with a much smaller main graphics die, it is an achievement in and of itself that they're even able to get within the realm of the highest-performing parts very frequently. Their advantage in HW design simplicity manifests even in smaller parts ...
 
There are other GPU driven rendering centric games out there, like the recently released Sony titles on PC, and Starfield is still a good result for them...
We are practically arguing semantics here; the recently released Sony titles are based on old cross-platform titles that are constantly being hammered by the famous "DX12 tax", and are hardly "modern" by any standard. For that you need recent Sony games with modern features like Ratchet and Clank Rift Apart, Returnal and Helldivers 2, where RDNA3 gets decimated at max settings.

You seem to be ignoring the reality here: the PS5 Pro is releasing with increased RT capabilities, meaning all RDNA3/RDNA2 parts are going to be even more decimated through their lackluster RT performance. It won't even matter if they have a slightly more robust Work Graphs implementation (which is questionable) when they lose to NVIDIA by 50% margins.

They showed profiling data where empty dispatches cost 0.2 usec and a demo like their City sample can potentially have over 3000 empty shading bins. A frametime reduction of ~0.6ms can be extracted ...
I see the word "demo" and lots of "can" words; we shall see in 2 years, when this becomes reality in an actual game. Let's compare performance between vendors then.

Well, given the fact that they're not using exotic memory standards like GDDR6X and have an asymmetric process technology implementation per chiplet with a much smaller main graphics die, it is an achievement in and of itself that they're even able to get within the realm of the highest-performing parts very frequently.
The AD103/4080 Super is a much smaller die, and it trades blows with the 7900XTX in raster and blows past it in actual ray tracing loads, with vastly better power efficiency. As for the 4090, it's a cut-down part; NVIDIA didn't even bother to release a full-die 4090Ti, which could easily add 15% to 20% more raster performance. But why would NVIDIA do that when AD103 is more than enough? As for actual performance in modern PC games with ray tracing/path tracing, well, we don't have to talk about that, do we?
 
Well, given the fact that they're not using exotic memory standards like GDDR6X and have an asymmetric process technology implementation per chiplet with a much smaller main graphics die, it is an achievement in and of itself that they're even able to get within the realm of the highest-performing parts very frequently. Their advantage in HW design simplicity manifests even in smaller parts ...
Wow, that's some nice cherry-picking here. How can you take a single metric (raster) to evaluate a GPU arch and claim one is better? What about RT? What about 3D rendering? What about upscaling? What about power consumption? What about video encoders? What about the ecosystem that NV hardware enables? And what about the hot AI topic?
Just published yesterday, see how your "superior" arch fades in AI in a lot of workloads:

[chart: 21-Int-Overall-Score.png]

Basically, it's just embarrassing for AMD, and it shows how short-sighted this company is when Copilot+ PCs are all the rage nowadays.

We can argue about pricing if you only care about a single metric, but come on, don't say RDNA2 is the better arch when it's just a one (old) trick raster pony.
 
Given how well they've performed in games (Halo Infinite & Starfield) with the more pathological cases of the predecessor API (ExecuteIndirect) and that there's publicized preliminary data from a big ISV (Epic Games) too, it's not hard to see which vendor has the most robust implementation of GPU driven rendering functionality ...
Is it? Starfield got about +30-50% performance via patches after launch on Nv h/w, which should definitely tell you who's got what there.

All of that is their own self-inflicted problem to make HW not redundant in gaming ...
This "self-inflicted problem" is why they control 80% of the market right now.

They're behind feature/integration-wise, but by no means do they have 'bad' HW design when they can have a smaller physical HW implementation for a given performance profile, and they have more freedom to improve this aspect as well ...
Their h/w isn't smaller if you consider that it lacks 75% of what Nvidia h/w has and even then direct comparisons are often in Nvidia's favor in chip die area.

There are other GPU driven rendering centric games out there, like the recently released Sony titles on PC, and Starfield is still a good result for them...
You're cherry-picking badly optimized titles while completely ignoring the fact that it's the exact opposite in the vast majority of games being released.

They showed profiling data where empty dispatches cost 0.2 usec and a demo like their City sample can potentially have over 3000 empty shading bins. A frametime reduction of ~0.6ms can be extracted ...
Wow, what a win! Probably easily counterable by some driver-level optimization - even if we entertain the idea that it's actually a win and not some fluke of early benchmarking of beta s/w.

it is an achievement in and of itself that they're even able to get within the realm of the highest-performing parts very frequently

[chart: performance-pt-1920-1080.png]
 
You could potentially fall behind in other (gaming) performance metrics and you risk creating a bubble for consumer AI HW (what will happen to DLSS w/o AI HW?) ...

That’s giving a lot of weight to hypothetical negative ML outcomes while ignoring what’s actually happening today. If the market hysteria over AI pans out then there’s no bubble. It’s just another rendering feature in which Nvidia has a significant lead. DLSS has been successful even if you just look at the marketing benefit. Neural radiance caching may be the closest thing to magic we’ve seen in render tech.

You’re also implying that it’s a zero sum game where if you invest in ML rendering you need to sacrifice performance and features elsewhere. This isn’t true today given their lead in nearly every relevant metric - features, raster, RT, power efficiency. So I see no reason to assume that ML will negatively impact other areas. Nvidia can probably do ML and work graphs and RT and whatever next thing they come up with. They’re not short of cash.

I don't know if I'd describe the competition as 'lagging' when they've scored a major API win with GPU driven rendering and they could do it again with bindless or somewhere else, since the other player is distracted making AI work for rendering ...

You’re assigning wins based on PPT slides while dismissing actual wins in the real world. Let’s evaluate the state of GPU driven rendering and work graphs when those features actually come to market.
 
I am just speculating here, but what I would expect from an engineering perspective is that the "shared" parts of the design like the SM processor will be more and more optimised towards AI and the graphics engineering teams will struggle more to get features they want into the SM processor because they have to compete with AI features for the time of a relatively small number of specialised engineers/architects. On the other hand, if they want to design a new TPU with practically the same interface to the rest of the SM as the current one? Go wild. Want to improve raytracing, again without changing the rest of the SM too much? Again, go wild. The return-on-investment for improving perf/mm2 and perf/watt of the graphics-specific parts is still extremely good with ~$10B revenue per year... as long as it doesn't affect their AI roadmap.

SER likely needed some changes to the SM so it wasn’t all about AI.

What changes could they make to the core SM that would specifically benefit graphics? Smaller warp sizes could benefit RT but seems expensive. The bottlenecks are probably elsewhere - work distribution, memory pipeline etc. Larger register files perhaps? They’ve been at 64KB forever.
 
That’s giving a lot of weight to hypothetical negative ML outcomes while ignoring what’s actually happening today. If the market hysteria over AI pans out then there’s no bubble. It’s just another rendering feature in which Nvidia has a significant lead. DLSS has been successful even if you just look at the marketing benefit. Neural radiance caching may be the closest thing to magic we’ve seen in render tech.

You’re also implying that it’s a zero sum game where if you invest in ML rendering you need to sacrifice performance and features elsewhere. This isn’t true today given their lead in nearly every relevant metric - features, raster, RT, power efficiency. So I see no reason to assume that ML will negatively impact other areas. Nvidia can probably do ML and work graphs and RT and whatever next thing they come up with. They’re not short of cash.
As far as I can tell AI/ML has yet to affect *how* we do real-time rendering at a "deeper level", e.g. geometry, shadows, texturing, sparse rendering methods, unique data structures, implicit surfaces, acceleration structures, rendering techniques/systems/pipelines (deferred/forward/object-space/etc), global illumination, graphics programming, or 'exotic' modifications to the graphics pipeline ...

Neural rendering has only affected how we perform temporal filtering and reconstruction so far and it's possible that it may never make any breakthroughs in other applications to real-time rendering.

Real-time rendering is too big of a field to have all of its future improvements be pigeonholed into AI (besides RT). It is my hope that the industry finds more potential ways (like advanced GPU driven functionality) to displace AI HW so that we can truly move forward in a fundamental manner ...
 
As far as I can tell AI/ML has yet to affect *how* we do real-time rendering at a "deeper level"

You don’t consider light transport caching as part of the “how” of rendering? Either way ML doesn’t need to fundamentally change how rendering works in order to be useful. I really don’t know why you’re approaching this as an either/or thing. You can do ML plus all the other non-ML stuff too.

Real-time rendering is too big of a field to have all of its future improvements be pigeonholed into AI (besides RT).

Has someone claimed that all future improvements will rely on AI? Again it’s not AI versus everything else. It’s everything else + AI.
 
You don’t consider light transport caching as part of the “how” of rendering? Either way ML doesn’t need to fundamentally change how rendering works in order to be useful. I really don’t know why you’re approaching this as an either/or thing. You can do ML plus all the other non-ML stuff too.

Has someone claimed that all future improvements will rely on AI? Again it’s not AI versus everything else. It’s everything else + AI.
Then why doesn't the tune of the leading graphics vendor reflect this doctrine? What other killer rendering features are they even developing that are NOT based on AI?
 
Then why doesn't the tune of the leading graphics vendor reflect this doctrine? What other killer rendering features are they even developing that are NOT based on AI?

There’s no way for us to know that. Did they talk about distributed geometry processing before Fermi or upscaling before Turing? We only learned what they were up to after those architectures launched. It’s impossible for us to know what they’re up to now. You’re drawing a lot of conclusions off little data.
 
SER likely needed some changes to the SM so it wasn’t all about AI.

What changes could they make to the core SM that would specifically benefit graphics? Smaller warp sizes could benefit RT but seems expensive. The bottlenecks are probably elsewhere - work distribution, memory pipeline etc. Larger register files perhaps? They’ve been at 64KB forever.
SER is a good point - the Hopper & Ada SMs feel like they possibly diverged from the same GA102 baseline (with a lot more changes on the Hopper side than the Ada side).

It's hard to tell what would help graphics most, but I think the "obvious" weakness of the current NVIDIA SM is their instruction fetch/decode pipeline, which still has brute-force 128-bit fixed-length instructions (they are only "saved" by large instruction caches and *extremely* aggressive prefetching, but that doesn't work for more unpredictable code like raytracing) and which can still only decode 1 instruction/clk. Need a dynamic branch? That's a minimum of 3 instructions in total. So that means you lose 3 FMAs - there's simply no way around that on their current architecture.
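To put rough numbers on that (assuming one instruction issued per scheduler per clock, and counting a warp-wide FMA as 32 lanes × 2 FLOPs):
3 issue slots × 32 lanes × 2 FLOPs = 192 FLOPs of FP32 work forgone per warp for every dynamic branch.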

On the original V100, this wasn't an issue since FP32 could only use half the issue ports, so in a sense decode was "overspecced" except for that rare "perfectly balanced" workload between fp/int/control/etc... but since GA102 the opposite is now true. I think the 2xFP32 change was a very clear perf/mm2 win and it makes a lot of sense in isolation, but it just doesn't fit the rest of their pipeline very well, so they are definitely leaving some performance on the table. However... this doesn't matter *at all* for AI on Hopper as Tensor Cores are now asynchronous instructions that take 10s of cycles to execute. So fixing both the instruction length and instruction decode bottlenecks is understandably low priority for AI.
 
SER is a good point - the Hopper & Ada SMs feel like they possibly diverged from the same GA102 baseline (with a lot more changes on the Hopper side than the Ada side).
SER is an RT core change; there were some others in Ada (OMM, DMM, throughput improvements). The RT core is a part of the SM, but in the line of thinking of what changes were done to the main floating-point pipelines it is outside the scope.
 
Then why doesn't the tune of the leading graphics vendor reflect this doctrine? What other killer rendering features are they even developing that are NOT based on AI?

I'm sure you've seen this. What else would you like them to do?

 
SER is an RT core change; there were some others in Ada (OMM, DMM, throughput improvements). The RT core is a part of the SM, but in the line of thinking of what changes were done to the main floating-point pipelines it is outside the scope.

No, SER doesn't have anything to do with RT cores. It's sorting of SM warps - i.e. shader code.
 
I don't know what that means. Suggest you read the docs. SER is about shuffling threads running in the SM (e.g. hit shaders). It's not related to fixed function RT stuff.
All threads are running "in the SM". SER is applicable only to RT workloads because it is an RT h/w improvement. Otherwise it would be possible to use it for any shader in the SM.
 
All threads are running "in the SM". SER is applicable only to RT workloads because it is an RT h/w improvement. Otherwise it would be possible to use it for any shader in the SM.

Yes, the programmable shading RT workloads that run on the standard SM ALUs. That's what Arun was referring to.

You said "SER is an RT core change" which is incorrect.
 