NVIDIA discussion [2024]

I'm sure you've seen this. What else would you like them to do?

How about they give us fully bindless hardware too (REAL pointers in shaders)? The recent Marvel's Spider-Man games in RT mode do ~1 million descriptor updates/copies per frame, and the SM6.6-style dynamic resource binding model (ResourceDescriptorHeap/SamplerDescriptorHeap) isn't good enough for that purpose ...
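
For anyone wondering what that descriptor churn looks like on the host side, here's a rough, hypothetical D3D12 sketch (the function name, handles and counts are placeholders, not the game's actual code):

```cpp
// Minimal sketch of per-frame descriptor churn in D3D12: each visible object's
// descriptors get copied from a CPU-side staging heap into the shader-visible heap.
// 'UpdateFrameDescriptors', the handles and the counts are assumptions for illustration.
#include <d3d12.h>

void UpdateFrameDescriptors(ID3D12Device* device,
                            D3D12_CPU_DESCRIPTOR_HANDLE stagingStart,     // non-shader-visible heap
                            D3D12_CPU_DESCRIPTOR_HANDLE gpuVisibleStart,  // shader-visible heap
                            UINT descriptorCount)  // on the order of 10^6 in the Spider-Man RT case cited above
{
    const UINT stride =
        device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

    // Many small copies per frame is where a "~1 million updates/copies" figure comes from.
    for (UINT i = 0; i < descriptorCount; ++i)
    {
        D3D12_CPU_DESCRIPTOR_HANDLE dst{ gpuVisibleStart.ptr + i * stride };
        D3D12_CPU_DESCRIPTOR_HANDLE src{ stagingStart.ptr + i * stride };
        device->CopyDescriptorsSimple(1, dst, src, D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
    }

    // With SM6.6 dynamic resources the shader indexes ResourceDescriptorHeap[i] directly,
    // which removes root-signature plumbing but still requires the descriptors to exist
    // in the shader-visible heap -- hence the wish for real pointers instead.
}
```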

It'd be a nice bonus to support RGBE-encoded render target formats as well, because Xbox developers have been raving about this feature too, even though it's only available on recent Xbox consoles ... (it was good enough that a PC D3D12 extension was exposed for it even if only one HW vendor currently supports it) ...
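
For reference, this is roughly what shared-exponent (RGBE) packing does, which is what shaders have to do by hand when no native render target format is exposed; a minimal sketch of the classic Radiance-style encoding, not the Xbox format spec:

```cpp
// Pack an HDR color into 8:8:8 mantissas plus an 8-bit shared exponent.
// A native RGBE render target format lets the hardware do this at output-merge time.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

struct RGBE { uint8_t r, g, b, e; };

RGBE EncodeRGBE(float r, float g, float b)
{
    const float maxc = std::max({r, g, b});
    if (maxc < 1e-32f)
        return {0, 0, 0, 0};

    int exponent;
    // Scale so the largest channel lands in [0.5, 1), then quantize to 8 bits.
    const float scale = std::frexp(maxc, &exponent) * 256.0f / maxc;
    return {
        static_cast<uint8_t>(r * scale),
        static_cast<uint8_t>(g * scale),
        static_cast<uint8_t>(b * scale),
        static_cast<uint8_t>(exponent + 128)   // biased shared exponent
    };
}

int main()
{
    const RGBE p = EncodeRGBE(3.2f, 0.7f, 0.05f);
    std::printf("%u %u %u e=%u\n", p.r, p.g, p.b, p.e);
}
```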
 
You’re assigning wins based on PPT slides
Not even AMD considers this a major win, so I don't know where the extreme enthusiasm is coming from? We've seen Variable Rate Shading falter and Mesh Shaders, Sampler Feedback and DirectStorage take ages to be adopted. Even DX12/Vulkan themselves were a trainwreck that arguably benefitted no one in PC gaming: they gave no advantage to any single hardware vendor while giving users/gamers a whole new set of headaches and inconveniences. What makes anyone think that Work Graphs would be any different?

The only somewhat successful API in recent years has been DXR, because it had the backing of NVIDIA behind it, which allowed it to spread quickly among AAA and AA developers and allowed tangible visual improvements to be made after DX12 stalled on that front for years.

How about
How about AMD steps up their game to achieve user experience parity with NVIDIA, because as of right now, the gulf of experience between NVIDIA and AMD is so vast it's not even funny.

An enthusiast gamer purchasing an RTX GPU will have vastly superior AI upscaling, AI denoising, AI HDR, significantly faster RT performance and lower latency, as well as frame gen, which is simply available in many, many more games. So now the user has faster frames, lower latency and better visuals (due to better upscaling, better denoising and better HDR) that are not available on AMD GPUs. This is the difference that counts to the user. Literally nothing else counts.

because the recent Marvel's Spider-Man games in RT mode do ~1 million
hmm, the same Spider Man where the 3080 is 65% faster than 6800XT using max RT settings, and 45% faster using medium RT settings?


Even without heavy scenes, the 3080Ti is 45% faster than 6900XT at both 1440p and 2160p using max RT settings.

like advanced GPU driven functionality
AI is GPU driven rendering; the ultimate goal is to detach rendering from relying on the CPU, with its added latency and threading issues, and AI is one way to do this.
 
Not even AMD considers this a major win, so I don't know where the extreme enthusiasm is coming from?

Yeah not holding my breath on this one either. It sounds good on paper and I hope it works out but that’s all it is right now. Paper. The most tangible advances in the past few years have come from dynamic GI driven by Nvidia and Epic.

GPU driven rendering

GPU driven in this context means work created on the GPU for the GPU. DLSS/upscaling are still CPU issued workloads that run on the GPU.
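
Roughly the distinction in D3D12 terms, as a hedged sketch (the command signature and buffers are assumed to be created elsewhere; names are placeholders):

```cpp
// Contrast between CPU-issued and GPU-driven work in D3D12.
#include <d3d12.h>

void RecordWork(ID3D12GraphicsCommandList* cmdList,
                ID3D12CommandSignature*    commandSignature,
                ID3D12Resource*            argumentBuffer,  // dispatch/draw args written by an earlier GPU pass
                ID3D12Resource*            countBuffer)     // actual command count written by the GPU
{
    // CPU-issued: the CPU decides the dispatch dimensions at record time
    // (e.g. a full-screen pass split into 16x16 thread groups).
    // DLSS/upscaling passes fall in this category even though they run on the GPU.
    cmdList->Dispatch(120, 68, 1);

    // GPU-driven: the CPU only records an upper bound; the GPU filled in the
    // arguments and the count itself, so the work was created on the GPU for the GPU.
    cmdList->ExecuteIndirect(commandSignature,
                             /*MaxCommandCount*/ 4096,
                             argumentBuffer, /*ArgumentBufferOffset*/ 0,
                             countBuffer,    /*CountBufferOffset*/    0);
}
```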
 
NVIDIA began selling H20 in large numbers in China, but they face stiff competition from Huawei. So, NVIDIA is cutting prices.

The H20 became widely available in China last month, with deliveries to clients in little over a month, the sources said
Some of China's technology giants have already made orders, with Alibaba ordering over 30,000 H20 chips
Dylan Patel, founder of research group SemiAnalysis, said close to a million H20 chips will be shipped to China in the second half of 2024 and Nvidia must compete with Huawei on pricing
"The H20 cost more than an H100 to manufacture due to its higher memory capacity," Patel said, adding that it is being sold, however, at half the price of the H100.

 
Yeah not holding my breath on this one either. It sounds good on paper and I hope it works out but that’s all it is right now. Paper.
It's been hailed as the "biggest thing since compute shaders" and even EA's SEED division thinks it's the holy grail of GPU driven rendering. There are a number of lock-based rendering algorithms out there that require forward progress guarantees (traditionally no gfx APIs/shading languages made a solid guarantee of this property), which are only afforded with that functionality ...
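
To make the forward-progress point concrete, this is the kind of pattern those lock-based algorithms depend on; plain C++ with std::atomic standing in for shader code, purely as an illustration:

```cpp
// Why forward-progress guarantees matter for lock-based algorithms.
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> lockFlag{0};
int sharedValue = 0;

void CriticalUpdate(int id)
{
    // Spin until we acquire the lock. On a CPU (or hardware that guarantees
    // forward progress) the thread holding the lock will eventually run and
    // release it. On hardware/APIs with no such guarantee, the scheduler may
    // keep running the spinning lane instead of the holder -> livelock.
    int expected = 0;
    while (!lockFlag.compare_exchange_weak(expected, 1, std::memory_order_acquire))
        expected = 0;

    sharedValue += id;                              // critical section
    lockFlag.store(0, std::memory_order_release);   // release the lock
}

int main()
{
    std::thread a(CriticalUpdate, 1), b(CriticalUpdate, 2);
    a.join(); b.join();
    std::printf("%d\n", sharedValue);   // prints 3
}
```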

Also, if you people want more 'data' on work graphs, apparently one of the authors of the VSM blogpost and an Activision employee are in unanimous agreement with the WB Games (ex-Ascendant Studios) employee who experimented on the feature with inline RT that Nvidia's implementation has bad/low occupancy ...
 
You can just ask them about it for more details instead but I'll leave it at that ...
You've linked to a private repo and a user account.

Generally speaking, if some code has low occupancy then it is badly optimized. The h/w may not allow the code to run more optimally due to its own issues, but then you have to ask yourself: why force the h/w to run something it is bad at? Applies to all vendors.
 
experimented on the feature with inline RT that Nvidia's implementation has bad/low occupancy ...
Any discussion about some code's "occupancy" has zero meaning without knowing how well said code is optimized for the h/w in question.
We've discussed the subject of "occupancy" before; low occupancy can mean good performance, and high occupancy can mean bad performance, depending on what you are doing of course.

So occupancy going up can mean performance goes down, because of that complex interplay between getting the most out of the GPU’s execution resources while balancing its ability to service memory requests and absorb them well inside its cache hierarchy. It’s such a difficult thing to influence and balance as the GPU programmer, especially on PC where the problem space spans many GPUs from many vendors, and where the choices the shader compiler stack makes to compile your shader can change between driver updates.
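
A back-of-the-envelope example of that balancing act (all hardware numbers below are hypothetical round figures, not any particular GPU):

```cpp
// Illustration of the occupancy trade-off: register pressure limits resident waves,
// and more resident waves is not automatically faster.
#include <algorithm>
#include <cstdio>

int main()
{
    const int maxWavesPerSIMD  = 16;    // hypothetical hardware limit
    const int registerFileRegs = 1024;  // registers available per lane (hypothetical)

    for (int regsPerThread : {32, 64, 128, 256})
    {
        // Fewer registers per thread -> more waves resident -> higher occupancy,
        // which helps hide memory latency...
        const int residentWaves = std::min(maxWavesPerSIMD, registerFileRegs / regsPerThread);
        const double occupancy  = 100.0 * residentWaves / maxWavesPerSIMD;

        // ...but more resident waves also share the same caches, so past some point
        // the extra occupancy can thrash the cache hierarchy and performance drops,
        // which is why "low occupancy" alone says nothing about actual speed.
        std::printf("%3d regs/thread -> %2d waves resident (%.0f%% occupancy)\n",
                    regsPerThread, residentWaves, occupancy);
    }
}
```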

 
We've discussed the subject of "occupancy" before; low occupancy can mean good performance, and high occupancy can mean bad performance, depending on what you are doing of course.
This is mostly relevant for things like async compute, where an async workload needs to fit into "free" h/w slots or it will affect main workload performance, which could result in worse overall performance. For the main workload though, the higher the occupancy, the better.
 
We've discussed the subject of "occupancy" before; low occupancy can mean good performance, and high occupancy can mean bad performance, depending on what you are doing of course.
Or it could just mean that a particular feature/code is hitting the driver/HW's "slow path" and it doesn't really matter how much you optimize it

To this day, even on their latest architectures, doing async compute STILL causes a WFI (wait for idle), which means they have an implicit barrier at every switch between graphics and compute dispatches ...
 
For the main workload though, the higher the occupancy, the better
Again no, this applies to the main workload as well as others. This is documented by many developers and by both IHVs. Check the links I posted.
Or it could just mean that a particular feature/code is hitting the driver/HW's "slow path" and it doesn't really matter how much you optimize it
Could be, impossible to determine without knowing the specifics of what we are talking about. But "low occupancy" as a general blanket statement is meaningless, because it can actually mean a good thing.
 
Like others you’re jumping to the conclusion that Nvidia doesn’t care about gaming and is willing to hand it over to the competition by pricing their cards out of control. This conclusion is based on their success in enterprise AI even though the two markets are unrelated. However there’s zero evidence to support this line of thinking. I’m not trying to convince you otherwise.
We will see when Blackwell pricing emerges.
 
It's been hailed as the "biggest thing since compute shaders" and even EA's SEED division thinks it's the holy grail of GPU driven rendering. There are a number of lock-based rendering algorithms out there that require forward progress guarantees (traditionally no gfx APIs/shading languages made a solid guarantee of this property), which are only afforded with that functionality ...

We aren’t questioning the potential benefits of work graphs. Let’s count those chickens after they hatch though. You implied that Nvidia isn’t paying it sufficient lip service because of their focus on AI (and presumably RT) but that’s not a very reliable metric.

If as you say it’s really a panacea then even Nvidia’s alleged apathy won’t stand in its way if the other IHVs push for it.
 
We aren’t questioning the potential benefits of work graphs. Let’s count those chickens after they hatch though. You implied that Nvidia isn’t paying it sufficient lip service because of their focus on AI (and presumably RT) but that’s not a very reliable metric.

If as you say it’s really a panacea then even Nvidia’s alleged apathy won’t stand in its way if the other IHVs push for it.
Totally Agree.
And am I the only one getting DX12 vibes again? When, half a decade after launch, most of the games ran better on the DX11 path ^_^
Sorry to be that guy, but I trust IHV HW engineers much more than the army of "smart" developers who promise the moon... The reality is, aside from some 3D decelerators from the pre-programmable-shader era (cough S3 Virge cough), hardware blocks on silicon are always faster than the equivalent software technique that produces the same effect. Eventually, years later, software takes advantage of the flexibility and performance bump offered by new hardware to catch up and do better, but that's always much, much later. And innovation is not a waiting game...
 
The problem is that no developer will do multiple paths for IHVs. So we get these unoptimized console implementations based on outdated IP. Path tracing is great because it's just brute force. Better hardware wins.
 
The problem is that no developer will do multiple paths for IHVs. So we get these unoptimized console implementations based on outdated IP. Path tracing is great because it's just brute force. Better hardware wins.

It’s not that simple. You can also optimize path tracing to run better on certain architectures. The ray tracing step is kinda brute force but there are BVH builds, ray sorting, alpha optimizations and of course the material shading that runs on the standard FP ALUs.
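
For example, ray sorting is essentially binning work by material before shading so lanes stay coherent; a hedged CPU-side sketch with made-up structures:

```cpp
// Sort/bucket hit records by material before the shading pass, so that adjacent
// lanes execute the same material code instead of divergent branches.
// Structures and fields are invented for illustration only.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

struct HitRecord
{
    uint32_t materialId;  // which material/shader the hit needs
    float    t;           // hit distance along the ray
    uint32_t rayIndex;    // which ray produced this hit
};

void SortHitsForCoherentShading(std::vector<HitRecord>& hits)
{
    std::sort(hits.begin(), hits.end(),
              [](const HitRecord& a, const HitRecord& b) {
                  return a.materialId < b.materialId;
              });
}

int main()
{
    std::vector<HitRecord> hits = {{2, 1.0f, 0}, {0, 3.5f, 1}, {2, 0.2f, 2}, {1, 5.0f, 3}};
    SortHitsForCoherentShading(hits);
    for (const auto& h : hits)
        std::printf("material %u (ray %u)\n", h.materialId, h.rayIndex);
}
```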
 
And am I the only one getting DX12 vibes again?
GPU work graphs are just another D3D12 feature, nothing more. They won't suddenly be used everywhere, and they won't always be a win in comparison to other scheduling approaches (even on the supposedly "superior" AMD h/w). The first iteration is also somewhat limited in what it can do and from where. It basically looks like a feature made at Epic's request to improve Nanite workloads at the moment.
 