NVIDIA discussion [2024]

Thanks! Before anyone else feels stupid and asks, it's under "Settings" in the app. (Yup, still getting used to nVidia again)
Even those of us who've used NVIDIA for decades are getting used to NVIDIA again. The NVCP definitely had problems (it was bizarrely slow sometimes), but it was really nice having all the settings committed to muscle memory.

I've already discovered some weird behavior in the NV App. It's hard to say whether it's intentional, but certain combinations of features will cause them all to stop working. Overall it's an improvement, but even with the slowness I mostly thought the NVCP was fine (and familiar).
 
I'm currently listening to "The Nvidia Way" on audiobook. Very fascinating insight so far, especially the pre-Nvidia years and what its co-founders were doing that led up to Nvidia's formation and NV1.
 
Apple and NVIDIA are collaborating to improve LLM performance on NVIDIA GPUs.

In addition to ongoing efforts to accelerate inference on Apple silicon, we have recently made significant progress in accelerating LLM inference for the NVIDIA GPUs widely used for production applications across the industry.


 
It's closing in on two decades now. Almost as soon as it was announced in '06, there were university programs teaching it. Wen-mei Hwu and co at UIUC are the standout example, releasing all of the course material for free back in '07 or '08 if I remember rightly. Nvidia opened a highly regarded teaching centre back then, run by Hwu, and he literally co-wrote the book on CUDA with Dave Kirk.

I was invited to the announcement, which included a dinner with Jensen and Dave Kirk. It was crystal clear that teaching it had been part of their ecosystem development plan right from the internal inception of it all, with academia as close collaborators even before it was public.

Nvidia's biggest strength has always been their incredible ability to foster and develop rich and compelling software ecosystems that are deeply intertwined with, and not just sat loosely on top of, their hardware.
 
Dave Kirk, there is a name I haven't heard in a long time.
I remember reading an interview with him right before the G80 launch.

(just did a quick google, and this popped up):
 
NVIDIA has intentionally reduced the performance of the RTX 4090 by blowing an eFuse on the AD102 die, halving its FP16-with-FP32-accumulate performance; the RTX 6000 Ada doesn't suffer from this. An obvious market segmentation move.

"NVIDIA is so far ahead that all the 4090s are nerfed to half speed.

There's an eFuse blown on the AD102 die to halve the perf of FP16 with FP32 accumulate. RTX 6000 Ada is the same die w/o the blown fuse.

This is not binning, it's segmentation. Wish there was competition"

- George Hotz
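
For anyone wondering what "FP16 with FP32 accumulate" actually refers to, here is a minimal CUDA sketch of my own (not from the thread, and not a real benchmark): one warp issuing a 16x16x16 tensor-core multiply with half-precision inputs and a float accumulator, which is the mode the tweet says is halved on the 4090. Wrapping the kernel in a timing loop and swapping the accumulator fragment to __half would let you compare the two modes yourself. Compile with nvcc -arch=sm_70 or newer.

// fp16_fp32acc.cu : illustrative only; inputs are left uninitialized,
// this just exercises the FP16-input, FP32-accumulate tensor-core path.
#include <cuda_fp16.h>
#include <mma.h>
#include <cstdio>

using namespace nvcuda;

// One warp computes a single 16x16x16 tile: acc = A * B + acc, accumulating in FP32.
__global__ void fp16_fp32acc_tile(const __half* A, const __half* B, float* D)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, __half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, __half, wmma::col_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;  // float accumulator = the mode in question

    wmma::fill_fragment(acc, 0.0f);
    wmma::load_matrix_sync(a, A, 16);
    wmma::load_matrix_sync(b, B, 16);
    wmma::mma_sync(acc, a, b, acc);
    wmma::store_matrix_sync(D, acc, 16, wmma::mem_row_major);
}

int main()
{
    __half *A, *B; float *D;
    cudaMalloc((void**)&A, 16 * 16 * sizeof(__half));
    cudaMalloc((void**)&B, 16 * 16 * sizeof(__half));
    cudaMalloc((void**)&D, 16 * 16 * sizeof(float));

    fp16_fp32acc_tile<<<1, 32>>>(A, B, D);  // a single warp
    cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(A); cudaFree(B); cudaFree(D);
    return 0;
}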


 
Some people in that thread think it should be illegal to disable features on a die. Some people are incredibly stupid.
 
Since the 4090 is a gaming card and such performance has nothing to do with games, I don't see what the problem is.
 
As a 4090 owner I couldn't care less, since it does nothing for my gaming 🤷‍♂️
(I got this card for gaming, hint-hint)

I bet most of the people crying out would never buy a 4090 anyway.
 
I got the impression some people on the thread do think it impacts gaming. Otherwise yeah it makes no sense.

It's tensor core performance, which is not exposed in DirectX at all. So I don't see how it could impact gaming, other than maybe DLSS. But obviously DLSS does not need anywhere near that performance, so it's a moot point.
 
nVidia has been gimping tensor core performance since Ampere. This is nothing new, and it's four years too late...
 
FP16 is needed for training mostly, INT8 should be enough for inferencing. I do wonder if that's also halved though.
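
Whether the INT8 path is also halved could be checked the same way. Below is a sketch of the s8-input, s32-accumulate WMMA tile (again my own illustration, not taken from the thread); timing it on a 4090 versus an RTX 6000 Ada, or against an FP16 variant, would answer the question. Compile with nvcc -arch=sm_75 or newer.

// int8_s32acc.cu : illustrative only; exercises the INT8 tensor-core path.
#include <mma.h>
#include <cstdio>

using namespace nvcuda;

// One warp computes a 16x16x16 tile with signed 8-bit inputs and a 32-bit integer accumulator.
__global__ void int8_s32acc_tile(const signed char* A, const signed char* B, int* D)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, signed char, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, signed char, wmma::col_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, int> acc;

    wmma::fill_fragment(acc, 0);
    wmma::load_matrix_sync(a, A, 16);
    wmma::load_matrix_sync(b, B, 16);
    wmma::mma_sync(acc, a, b, acc);
    wmma::store_matrix_sync(D, acc, 16, wmma::mem_row_major);
}

int main()
{
    signed char *A, *B; int *D;
    cudaMalloc((void**)&A, 16 * 16);
    cudaMalloc((void**)&B, 16 * 16);
    cudaMalloc((void**)&D, 16 * 16 * sizeof(int));

    int8_s32acc_tile<<<1, 32>>>(A, B, D);  // a single warp
    cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(A); cudaFree(B); cudaFree(D);
    return 0;
}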
 
The product segmentation debate aside, Nvidia has not been marketing their consumer cards as solely for gaming for quite some time now, and plenty of people buy them for non-gaming purposes.

ultimate experience for gamers and creators.


But in general I don't agree with the blanket dismissal that GeForce cards are strictly for gaming, as Nvidia themselves specifically market and support them for non-gaming usage, and have consumer customers buying them for non-gaming usage as well.


Including for AI specifically -

 
Which software for creators needs tensor cores with FP32 accumulation? I don't know, but probably none.
And as already mentioned, when doing inference people tend to use FP8/INT8 instead of FP16. If you need FP16, 24GB is likely not enough anyway (for example, a 13B-parameter model already takes roughly 26GB for the weights alone at 2 bytes each).
To be honest, GeForce is gaming-oriented. It might perform well in professional work, but the professional product line exists for a reason. If you really need the capability, just buy the products that have it. I also want to point out one important distinction: the function is still there, it's just slower. So if you are a poor student who just wants to experiment with these functions, you can do that. It's just slower. I think this is a good balance.
 
The only non-gaming things I use my 4090 for are rendering videos and noise cancellation in video calls (besides driving my display).
The "outcry" rings false to me; I'm willing to bet it mostly comes from people who aren't buying a 4090 anyway.

Can you show me where this affects anyone, outside benchmarks?
 
And even in its slower state, it's faster than any consumer hardware out there. That was the point of the original tweet anyway. Competition is needed so that this segmentation is not a regular occurrence.
It has been like that since the 486SX/DX (the DX "upgrade" chip wasn't a real chip, it was just a "bypass"); it's only recently that it has become fashionable to try and change that.
Again, a false outcry from people who aren't buying 4090s themselves.

Show me where this affects me outside of benchmarks?
 