NVIDIA discussion [2024]

Thanks! Before anyone else feels stupid and asks, it's under "Settings" in the app. (Yup, still getting used to nVidia again)
Even those of us who've used NVIDIA for decades are getting used to NVIDIA again. The NVCP definitely had problems (it was bizarrely slow sometimes), but it was really nice having all the settings committed to muscle memory.

I've already discovered some weird behavior in the NV App. It's hard to say whether it's intentional, but certain combinations of features will cause them all to stop working. Overall it's an improvement, but even with the slowness I mostly thought the NVCP was fine (and familiar).
 
I'm currently listening to "The Nvidia Way" on audiobook. Very fascinating insight so far, especially the pre-Nvidia years and what its co-founders were doing that led up to Nvidia's formation and NV1.
 
Apple and NVIDIA are collaborating to improve LLM performance on NVIDIA GPUs.

In addition to ongoing efforts to accelerate inference on Apple silicon, we have recently made significant progress in accelerating LLM inference for the NVIDIA GPUs widely used for production applications across the industry.


 
It's closing in on two decades now. Almost as soon as it was announced in '06, there were university programs teaching it. Wen-mei Hwu and co at UIUC are the standout example, releasing all of the course material for free back in '07 or '08 if I remember rightly. Nvidia opened a highly regarded teaching centre back then, run by Hwu, and he literally co-wrote the book on CUDA with Dave Kirk.

I was invited to the announcement, which included a dinner with Jensen and Dave Kirk. It was crystal clear that teaching it had been part of their ecosystem development plan right from the internal inception of it all, with academia as close collaborators even before it was public.

Nvidia's biggest strength has always been their incredible ability to foster and develop rich and compelling software ecosystems that are deeply intertwined with, and not just sat loosely on top of, their hardware.
 
Dave Kirk, there is a name I haven't heard in a long time.
I remember reading an interview with him right before the G80 launch.

(just did a quick google, and this popped up):
 
NVIDIA has intentionally reduced the performance of the RTX 4090 by blowing an eFuse on the AD102 die, halving its FP16-with-FP32-accumulate performance; the RTX 6000 Ada doesn't suffer from this. An obvious market segmentation move.

"NVIDIA is so far ahead that all the 4090s are nerfed to half speed.

There's an eFuse blown on the AD102 die to halve the perf of FP16 with FP32 accumulate. RTX 6000 Ada is the same die w/o the blown fuse.

This is not binning, it's segmentation. Wish there was competition"

- George Hotz
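
For anyone wondering what "FP16 with FP32 accumulate" actually refers to, here is a minimal CUDA sketch of my own (not from the thread, and not a real benchmark): one warp issuing a 16x16x16 tensor-core multiply with half-precision inputs and a float accumulator, which is the mode the tweet says is halved on the 4090. Wrapping the kernel in a timing loop and swapping the accumulator fragment to __half would let you compare the two modes yourself. Compile with nvcc -arch=sm_70 or newer.

// fp16_fp32acc.cu : illustrative only; inputs are left uninitialized,
// this just exercises the FP16-input, FP32-accumulate tensor-core path.
#include <cuda_fp16.h>
#include <mma.h>
#include <cstdio>

using namespace nvcuda;

// One warp computes a single 16x16x16 tile: acc = A * B + acc, accumulating in FP32.
__global__ void fp16_fp32acc_tile(const __half* A, const __half* B, float* D)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, __half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, __half, wmma::col_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;  // float accumulator = the mode in question

    wmma::fill_fragment(acc, 0.0f);
    wmma::load_matrix_sync(a, A, 16);
    wmma::load_matrix_sync(b, B, 16);
    wmma::mma_sync(acc, a, b, acc);
    wmma::store_matrix_sync(D, acc, 16, wmma::mem_row_major);
}

int main()
{
    __half *A, *B; float *D;
    cudaMalloc((void**)&A, 16 * 16 * sizeof(__half));
    cudaMalloc((void**)&B, 16 * 16 * sizeof(__half));
    cudaMalloc((void**)&D, 16 * 16 * sizeof(float));

    fp16_fp32acc_tile<<<1, 32>>>(A, B, D);  // a single warp
    cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(A); cudaFree(B); cudaFree(D);
    return 0;
}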


 
Some people in that thread think it should be illegal to disable features on a die. Some people are incredibly stupid.
 
Since the 4090 is a gaming card and such performance has nothing to do with games, I don't see what the problem is.
 
As a 4090 owner I couldn't care less, since it does nothing for my gaming 🤷‍♂️
(I got this card for gaming, hint-hint)

I bet most of the people crying out would never buy a 4090 anyway.
 
I got the impression some people on the thread do think it impacts gaming. Otherwise yeah it makes no sense.

It's tensor core performance, which is not exposed in DirectX at all. So I don't see how it could impact gaming, other than maybe DLSS. But obviously DLSS does not need anywhere near that performance, so it's a moot point.
 
nVidia has been gimping tensor core performance since Ampere. This is nothing new, and it's four years too late...
 
FP16 is needed for training mostly, INT8 should be enough for inferencing. I do wonder if that's also halved though.
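
Whether the INT8 path is also halved could be checked the same way. Below is a sketch of the s8-input, s32-accumulate WMMA tile (again my own illustration, not taken from the thread); timing it on a 4090 versus an RTX 6000 Ada, or against an FP16 variant, would answer the question. Compile with nvcc -arch=sm_75 or newer.

// int8_s32acc.cu : illustrative only; exercises the INT8 tensor-core path.
#include <mma.h>
#include <cstdio>

using namespace nvcuda;

// One warp computes a 16x16x16 tile with signed 8-bit inputs and a 32-bit integer accumulator.
__global__ void int8_s32acc_tile(const signed char* A, const signed char* B, int* D)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, signed char, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, signed char, wmma::col_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, int> acc;

    wmma::fill_fragment(acc, 0);
    wmma::load_matrix_sync(a, A, 16);
    wmma::load_matrix_sync(b, B, 16);
    wmma::mma_sync(acc, a, b, acc);
    wmma::store_matrix_sync(D, acc, 16, wmma::mem_row_major);
}

int main()
{
    signed char *A, *B; int *D;
    cudaMalloc((void**)&A, 16 * 16);
    cudaMalloc((void**)&B, 16 * 16);
    cudaMalloc((void**)&D, 16 * 16 * sizeof(int));

    int8_s32acc_tile<<<1, 32>>>(A, B, D);  // a single warp
    cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(A); cudaFree(B); cudaFree(D);
    return 0;
}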
 
The product segmentation debate aside, Nvidia has not been marketing their consumer cards as solely for gaming for quite some time now, and plenty of people buy them for non-gaming purposes.

ultimate experience for gamers and creators.


But in general I don't agree with the blanket dismissal that GeForce cards are strictly for gaming, as Nvidia themselves specifically market and support them for non-gaming usage, and have consumer customers buying them for non-gaming usage as well.


Including for AI specifically -

 
Which software for creators needs tensor cores with FP32 accumulation? I don't know, but probably none.
And as already mentioned, when doing inference people tend to use FP8/INT8 instead of FP16. If you need FP16, 24GB is likely not enough anyway (for example, a 13B-parameter model already takes roughly 26GB for the weights alone at 2 bytes each).
To be honest, GeForce is gaming-oriented. It might perform well in professional work, but the professional product line exists for a reason. If you really need the capability, just buy the products that have it. I also want to point out one important distinction: the function is still there, it's just slower. So if you are a poor student who just wants to experiment with these functions, you can do that. It's just slower. I think this is a good balance.
 
The only non-gaming things I use my 4090 for are rendering videos and noise cancellation in video calls (besides driving my display).
The "outcry" rings false to me; I'm willing to bet it mostly comes from people who aren't buying a 4090 anyway.

Can you show me where this affects anyone, outside benchmarks?
 
And even in its slower state, it's faster than any consumer hardware out there. That was the point of the original tweet anyway. Competition is needed so that this segmentation is not a regular occurrence.
It has been like that since the 486SX/DX (the DX "upgrade" chip wasn't a real chip, it was just a "bypass"); it's only recently that it has become fashionable to try and change that.
Again, a false outcry from people who aren't buying 4090s themselves.

Show me where this affects me outside of benchmarks?
 