I don't know the exact situation with large models, but my experience in other fields tells me that this alone is probably already very hard, especially when a field is still in the experimental stage.
If a field is mature enough, most people will be using a few well-developed software packages, so problems with the underlying toolkits matter less: the vendor can just work with the developers of the best and most commonly used packages and solve most of the problems there. But when a field is still growing, the software landscape is more fragmented, and some people may even have to develop their own software. In that case, people are more likely to step on new minefields every day, so the stability of the underlying toolkits becomes very important. It's not just about bugs; the toolkit also needs to behave predictably, with as few surprises as possible.
I think CUDA benefits from the fact that it was developed a long time ago, with a lot of users, and so had time to become very mature and stable. That let it handle the AI mania relatively easily when it arrived. Many other AI hardware vendors, on the other hand, only entered the market after the mania started, so their toolkits haven't had time to mature.
From what I've heard, Google's TPUs also have a relatively good software stack, which has likewise been developed for quite a long time, although they don't have a lot of users. In AMD's case, it's unfortunate that they didn't invest enough resources in this when GPGPU first became a thing. Their main tools were all developed relatively recently, so the rough edges are to be expected. One day, when the AI software market matures, I think AMD will be able to catch up, but of course by then the profit margins won't be so pretty anymore.