NVIDIA discussion [2024]

I'm sure you've seen this. What else would you like them to do?

How about they give us fully bindless hardware too (REAL pointers in shaders)? The recent Marvel's Spider-Man games in RT mode do ~1 million descriptor updates/copies per frame, and the SM6.6-style dynamic resource binding model (ResourceDescriptorHeap/SamplerDescriptorHeap) isn't good enough for that purpose ...
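
For anyone wondering what that descriptor churn looks like on the host side, here's a rough, hypothetical D3D12 sketch (the function name, handles and counts are placeholders, not the game's actual code):

```cpp
// Minimal sketch of per-frame descriptor churn in D3D12: each visible object's
// descriptors get copied from a CPU-side staging heap into the shader-visible heap.
// 'UpdateFrameDescriptors', the handles and the counts are assumptions for illustration.
#include <d3d12.h>

void UpdateFrameDescriptors(ID3D12Device* device,
                            D3D12_CPU_DESCRIPTOR_HANDLE stagingStart,     // non-shader-visible heap
                            D3D12_CPU_DESCRIPTOR_HANDLE gpuVisibleStart,  // shader-visible heap
                            UINT descriptorCount)  // on the order of 10^6 in the Spider-Man RT case cited above
{
    const UINT stride =
        device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

    // Many small copies per frame is where a "~1 million updates/copies" figure comes from.
    for (UINT i = 0; i < descriptorCount; ++i)
    {
        D3D12_CPU_DESCRIPTOR_HANDLE dst{ gpuVisibleStart.ptr + i * stride };
        D3D12_CPU_DESCRIPTOR_HANDLE src{ stagingStart.ptr + i * stride };
        device->CopyDescriptorsSimple(1, dst, src, D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
    }

    // With SM6.6 dynamic resources the shader indexes ResourceDescriptorHeap[i] directly,
    // which removes root-signature plumbing but still requires the descriptors to exist
    // in the shader-visible heap -- hence the wish for real pointers instead.
}
```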

It'd be a nice bonus to support RGBE-encoded render target formats as well, because Xbox developers have been raving about this feature too, even though it's only available on recent Xbox consoles ... (it was good enough that a PC D3D12 extension was exposed for it even if only one HW vendor currently supports it) ...
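
For reference, this is roughly what shared-exponent (RGBE) packing does, which is what shaders have to do by hand when no native render target format is exposed; a minimal sketch of the classic Radiance-style encoding, not the Xbox format spec:

```cpp
// Pack an HDR color into 8:8:8 mantissas plus an 8-bit shared exponent.
// A native RGBE render target format lets the hardware do this at output-merge time.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

struct RGBE { uint8_t r, g, b, e; };

RGBE EncodeRGBE(float r, float g, float b)
{
    const float maxc = std::max({r, g, b});
    if (maxc < 1e-32f)
        return {0, 0, 0, 0};

    int exponent;
    // Scale so the largest channel lands in [0.5, 1), then quantize to 8 bits.
    const float scale = std::frexp(maxc, &exponent) * 256.0f / maxc;
    return {
        static_cast<uint8_t>(r * scale),
        static_cast<uint8_t>(g * scale),
        static_cast<uint8_t>(b * scale),
        static_cast<uint8_t>(exponent + 128)   // biased shared exponent
    };
}

int main()
{
    const RGBE p = EncodeRGBE(3.2f, 0.7f, 0.05f);
    std::printf("%u %u %u e=%u\n", p.r, p.g, p.b, p.e);
}
```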
 
You’re assigning wins based on PPT slides
Not even AMD considers this a major win, so I don't know where the extreme enthusiasm is coming from? We've seen Variable Rate Shading falter and Mesh Shaders, Sampler Feedback and DirectStorage take ages to be adopted. Even DX12/Vulkan themselves were a trainwreck that arguably benefitted no one in PC gaming: they gave no advantage to any single hardware vendor while giving users/gamers a whole new set of headaches and inconveniences. What makes anyone think that Work Graphs would be any different?

The only somewhat successful API in recent years has been DXR, because it had the backing of NVIDIA behind it, which allowed it to spread quickly among AAA and AA developers and allowed tangible visual improvements to be made after DX12 stalled on that front for years.

How about
How about AMD steps up their game to achieve user experience parity with NVIDIA, because as of right now, the gulf of experience between NVIDIA and AMD is so vast it's not even funny.

An enthusiast gamer purchasing an RTX GPU will have vastly superior AI upscaling, AI denoising, AI HDR, significantly faster RT performance and lower latency, as well as frame gen, which is simply available in many, many more games. So now the user has faster frames, lower latency and better visuals (due to better upscaling, better denoising and better HDR) that are not available on AMD GPUs. This is the difference that counts to the user. Literally nothing else counts.

because the recent Marvel's Spider-Man games in RT mode do ~1 million
hmm, the same Spider Man where the 3080 is 65% faster than 6800XT using max RT settings, and 45% faster using medium RT settings?


Even without heavy scenes, the 3080Ti is 45% faster than 6900XT at both 1440p and 2160p using max RT settings.

like advanced GPU driven functionality
AI is GPU driven rendering; the ultimate goal is to detach rendering from relying on the CPU, with its added latency and threading issues, and AI is one way to do this.
 
Not even AMD considers this a major win, so I don't know where the extreme enthusiasm is coming from?

Yeah not holding my breath on this one either. It sounds good on paper and I hope it works out but that’s all it is right now. Paper. The most tangible advances in the past few years have come from dynamic GI driven by Nvidia and Epic.

GPU driven rendering

GPU driven in this context means work created on the GPU for the GPU. DLSS/upscaling are still CPU issued workloads that run on the GPU.
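
Roughly the distinction in D3D12 terms, as a hedged sketch (the command signature and buffers are assumed to be created elsewhere; names are placeholders):

```cpp
// Contrast between CPU-issued and GPU-driven work in D3D12.
#include <d3d12.h>

void RecordWork(ID3D12GraphicsCommandList* cmdList,
                ID3D12CommandSignature*    commandSignature,
                ID3D12Resource*            argumentBuffer,  // dispatch/draw args written by an earlier GPU pass
                ID3D12Resource*            countBuffer)     // actual command count written by the GPU
{
    // CPU-issued: the CPU decides the dispatch dimensions at record time
    // (e.g. a full-screen pass split into 16x16 thread groups).
    // DLSS/upscaling passes fall in this category even though they run on the GPU.
    cmdList->Dispatch(120, 68, 1);

    // GPU-driven: the CPU only records an upper bound; the GPU filled in the
    // arguments and the count itself, so the work was created on the GPU for the GPU.
    cmdList->ExecuteIndirect(commandSignature,
                             /*MaxCommandCount*/ 4096,
                             argumentBuffer, /*ArgumentBufferOffset*/ 0,
                             countBuffer,    /*CountBufferOffset*/    0);
}
```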
 
NVIDIA began selling H20 in large numbers in China, but they face stiff competition from Huawei. So, NVIDIA is cutting prices.

The H20 became widely available in China last month, with deliveries to clients in little over a month, the sources said
Some of China's technology giants have already made orders, with Alibaba ordering over 30,000 H20 chips
Dylan Patel, founder of research group SemiAnalysis, said close to a million H20 chips will be shipped to China in the second half of 2024 and Nvidia must compete with Huawei on pricing
"The H20 cost more than an H100 to manufacture due to its higher memory capacity," Patel said, adding that it is being sold, however, at half the price of the H100.

 
Yeah not holding my breath on this one either. It sounds good on paper and I hope it works out but that’s all it is right now. Paper.
It's been hailed as the "biggest thing since compute shaders" and even EA's SEED division thinks it's the holy grail of GPU driven rendering. There are a number of lock-based rendering algorithms out there that require forward progress guarantees (traditionally no gfx APIs/shading languages made a solid guarantee of this property), which are only afforded with that functionality ...
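
To make the forward-progress point concrete, this is the kind of pattern those lock-based algorithms depend on; plain C++ with std::atomic standing in for shader code, purely as an illustration:

```cpp
// Why forward-progress guarantees matter for lock-based algorithms.
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> lockFlag{0};
int sharedValue = 0;

void CriticalUpdate(int id)
{
    // Spin until we acquire the lock. On a CPU (or hardware that guarantees
    // forward progress) the thread holding the lock will eventually run and
    // release it. On hardware/APIs with no such guarantee, the scheduler may
    // keep running the spinning lane instead of the holder -> livelock.
    int expected = 0;
    while (!lockFlag.compare_exchange_weak(expected, 1, std::memory_order_acquire))
        expected = 0;

    sharedValue += id;                              // critical section
    lockFlag.store(0, std::memory_order_release);   // release the lock
}

int main()
{
    std::thread a(CriticalUpdate, 1), b(CriticalUpdate, 2);
    a.join(); b.join();
    std::printf("%d\n", sharedValue);   // prints 3
}
```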

Also, if you people want more 'data' on work graphs, apparently one of the authors of the VSM blogpost and an Activision employee are in unanimous agreement with the WB Games (ex-Ascendant Studios) employee who experimented on the feature with inline RT that Nvidia's implementation has bad/low occupancy ...
 
You can just ask them about it for more details instead but I'll leave it at that ...
You've linked to a private repo and a user account.

Generally speaking, if some code has low occupancy then it is badly optimized. The h/w may not allow the code to run more optimally due to its own issues, but then you have to ask yourself: why force the h/w to run something it is bad at? Applies to all vendors.
 
experimented on the feature with inline RT that Nvidia's implementation has bad/low occupancy ...
Any discussion about some code's "occupancy" has zero meaning without knowing how well said code is optimized for the h/w in question.
We've discussed the subject of "occupancy" before; low occupancy can mean good performance, and high occupancy can mean bad performance, depending on what you are doing of course.

So occupancy going up can mean performance goes down, because of that complex interplay between getting the most out of the GPU’s execution resources while balancing its ability to service memory requests and absorb them well inside its cache hierarchy. It’s such a difficult thing to influence and balance as the GPU programmer, especially on PC where the problem space spans many GPUs from many vendors, and where the choices the shader compiler stack makes to compile your shader can change between driver updates.
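
A back-of-the-envelope example of that balancing act (all hardware numbers below are hypothetical round figures, not any particular GPU):

```cpp
// Illustration of the occupancy trade-off: register pressure limits resident waves,
// and more resident waves is not automatically faster.
#include <algorithm>
#include <cstdio>

int main()
{
    const int maxWavesPerSIMD  = 16;    // hypothetical hardware limit
    const int registerFileRegs = 1024;  // registers available per lane (hypothetical)

    for (int regsPerThread : {32, 64, 128, 256})
    {
        // Fewer registers per thread -> more waves resident -> higher occupancy,
        // which helps hide memory latency...
        const int residentWaves = std::min(maxWavesPerSIMD, registerFileRegs / regsPerThread);
        const double occupancy  = 100.0 * residentWaves / maxWavesPerSIMD;

        // ...but more resident waves also share the same caches, so past some point
        // the extra occupancy can thrash the cache hierarchy and performance drops,
        // which is why "low occupancy" alone says nothing about actual speed.
        std::printf("%3d regs/thread -> %2d waves resident (%.0f%% occupancy)\n",
                    regsPerThread, residentWaves, occupancy);
    }
}
```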

 
We've discussed the subject of "occupancy" before; low occupancy can mean good performance, and high occupancy can mean bad performance, depending on what you are doing of course.
This is mostly relevant for things like async compute, where an async workload needs to fit into "free" h/w slots or it will affect main workload performance, which could result in worse overall performance. For the main workload though, the higher the occupancy, the better.
 
We've discussed the subject of "occupancy" before; low occupancy can mean good performance, and high occupancy can mean bad performance, depending on what you are doing of course.
Or it could just mean that a particular feature/code is hitting the driver/HW's "slow path" and it doesn't really matter how much you optimize it

To this day, even on their latest architectures, doing async compute STILL causes a WFI (wait for idle), which means they have an implicit barrier at every switch between graphics and compute dispatches ...
 
For the main workload though, the higher the occupancy, the better
Again no, this applies to the main workload as well as others. This is documented by many developers and by both IHVs. Check the links I posted.
Or it could just mean that a particular feature/code is hitting the driver/HW's "slow path" and it doesn't really matter how much you optimize it
Could be, impossible to determine without knowing the specifics of what we are talking about. But "low occupancy" as a general blanket statement is meaningless, because it can actually mean a good thing.
 
Like others you’re jumping to the conclusion that Nvidia doesn’t care about gaming and is willing to hand it over to the competition by pricing their cards out of control. This conclusion is based on their success in enterprise AI even though the two markets are unrelated. However there’s zero evidence to support this line of thinking. I’m not trying to convince you otherwise.
We will see when Blackwell pricing emerges.
 
It's been hailed as the "biggest thing since compute shaders" and even EA's SEED division thinks it's the holy grail of GPU driven rendering. There are a number of lock-based rendering algorithms out there that require forward progress guarantees (traditionally no gfx APIs/shading languages made a solid guarantee of this property), which are only afforded with that functionality ...

We aren’t questioning the potential benefits of work graphs. Let’s count those chickens after they hatch though. You implied that Nvidia isn’t paying it sufficient lip service because of their focus on AI (and presumably RT) but that’s not a very reliable metric.

If as you say it’s really a panacea then even Nvidia’s alleged apathy won’t stand in its way if the other IHVs push for it.
 
We aren’t questioning the potential benefits of work graphs. Let’s count those chickens after they hatch though. You implied that Nvidia isn’t paying it sufficient lip service because of their focus on AI (and presumably RT) but that’s not a very reliable metric.

If as you say it’s really a panacea then even Nvidia’s alleged apathy won’t stand in its way if the other IHVs push for it.
Totally Agree.
And am I the only one getting DX12 vibes again? When, half a decade after launch, most of the games ran better on the DX11 path ^_^
Sorry to be that guy, but I trust IHV HW engineers much more than the army of "smart" developers who promise the moon... The reality is, aside from some 3D decelerators from the pre-programmable-shader era (cough S3 Virge cough), hardware blocks on silicon are always faster than the equivalent software technique that produces the same effect. Eventually, years later, software takes advantage of the flexibility and performance bump offered by new hardware to catch up and do better, but that's always much, much later. And innovation is not a waiting game...
 
The problem is that no developer will do multiple paths for IHVs. So we get these unoptimized console implementations based on outdated IP. Path tracing is great because it's just brute force. Better hardware wins.
 
The problem is that no developer will do multiple paths for IHVs. So we get these unoptimized console implementations based on outdated IP. Path tracing is great because it's just brute force. Better hardware wins.

It’s not that simple. You can also optimize path tracing to run better on certain architectures. The ray tracing step is kinda brute force but there are BVH builds, ray sorting, alpha optimizations and of course the material shading that runs on the standard FP ALUs.
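
For example, ray sorting is essentially binning work by material before shading so lanes stay coherent; a hedged CPU-side sketch with made-up structures:

```cpp
// Sort/bucket hit records by material before the shading pass, so that adjacent
// lanes execute the same material code instead of divergent branches.
// Structures and fields are invented for illustration only.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

struct HitRecord
{
    uint32_t materialId;  // which material/shader the hit needs
    float    t;           // hit distance along the ray
    uint32_t rayIndex;    // which ray produced this hit
};

void SortHitsForCoherentShading(std::vector<HitRecord>& hits)
{
    std::sort(hits.begin(), hits.end(),
              [](const HitRecord& a, const HitRecord& b) {
                  return a.materialId < b.materialId;
              });
}

int main()
{
    std::vector<HitRecord> hits = {{2, 1.0f, 0}, {0, 3.5f, 1}, {2, 0.2f, 2}, {1, 5.0f, 3}};
    SortHitsForCoherentShading(hits);
    for (const auto& h : hits)
        std::printf("material %u (ray %u)\n", h.materialId, h.rayIndex);
}
```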
 
And am I the only one getting DX12 vibes again?
GPU work graphs are just another D3D12 feature, nothing more. They won't suddenly be used everywhere, and they won't always be a win in comparison to other scheduling approaches (even on the supposedly "superior" AMD h/w). The first iteration is also somewhat limited in what it can do and from where. It basically looks like a feature made at Epic's request to improve Nanite workloads at the moment.
 