DavidGraham:
Cloud customers are sticking with NVIDIA GPUs for inferencing as they like the CUDA stack, so Microsoft is struggling to keep up with demand (Tegus).
Which creator software actually needs tensor cores with FP32 accumulation? I don't know, but probably none.
And as already mentioned, when doing inference people tend to use FP8/INT8 instead of FP16. If you need FP16, 24GB is likely not enough anyway.
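To put rough numbers on that, here is a quick back-of-the-envelope sketch in Python (the 13B parameter count is just a hypothetical example, and it only counts weights, ignoring KV cache and activations):

def weight_memory_gb(params_billion, bytes_per_param):
    # VRAM needed for the model weights alone, in GiB.
    return params_billion * 1e9 * bytes_per_param / 1024**3

for fmt, bytes_per_param in [("FP16", 2.0), ("FP8/INT8", 1.0)]:
    print(f"{fmt}: ~{weight_memory_gb(13, bytes_per_param):.1f} GB of weights")

# FP16:     ~24.2 GB -> already fills a 24GB card before any KV cache/activations
# FP8/INT8: ~12.1 GB -> leaves headroom on the same 24GB card

Which is roughly why 8-bit (or lower) weight formats are the norm for local inference on 24GB cards.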
To be honest, GeForce is gaming oriented. It might perform well in professional workloads, but the professional product line exists for a reason. If you really need the capability, just buy the products that have it. I also want to point out one important distinction: the function is there, it's just slower. So if you are a poor student who just wants to experiment with these functions, you can do that. It's just slower. I think this is a good balance.
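And for anyone who wants to see the "it's there, just slower" point on their own card, here is a minimal PyTorch sketch (assuming a CUDA build; allow_fp16_reduced_precision_reduction is an existing PyTorch switch, though how large a gap you see depends on the cuBLAS backend and the GPU in question):

import time
import torch

def bench_matmul(n=8192, iters=50):
    # Rough FP16 matmul throughput in TFLOPS on the current CUDA device.
    a = torch.randn(n, n, device="cuda", dtype=torch.float16)
    b = torch.randn(n, n, device="cuda", dtype=torch.float16)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    return 2 * n**3 * iters / (time.time() - start) / 1e12

# FP16 inputs with reductions kept in FP32.
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False
print(f"FP32-accumulate path: {bench_matmul():.1f} TFLOPS")

# Allow the backend to use reduced-precision (FP16) reductions where supported.
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = True
print(f"FP16-accumulate path: {bench_matmul():.1f} TFLOPS")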
Since the 4090 is a gaming card and such performance has nothing to do with games, I don't see what the problem is.
So where does this affect me outside benchmarks? I specifically stated up front that I am not looking to debate product segmentation in general, or this specific decision, but am solely focusing on the idea that -
Or that "geforce is for gaming". That is not how Nvidia has been approaching the product line, nor how a significant segment of the consumer base has been purchasing these cards. This specific issue isn't the only time that dismissive response gets brought up (I don't mean these forums specifically either, but in general).
I also want to dispel the idea that anything non-gaming equals professional usage, which seems to be another lingering common sentiment. Non-gaming usage can very much be non-professional these days, and has been for a while. Outside of professional use cases, it's also not just of interest to students.
I disagree. "Geforce" is a gaming product line. There were non-gaming products which were also called "Geforce", but they were also called "Titans" - or at some points just called "Titans" - to differentiate them from plain "Geforce". People can and do buy Geforce cards for non-gaming applications, but this is "bonus" functionality which isn't promoted or specced by Nvidia in the case of "Geforce" SKUs. So complaining that some of them are cut down in comparison to SKUs which are sold and promoted as AI/proviz parts is disingenuous. If you care about that then you shouldn't have opted for a Geforce, and you did so because it is cheaper - well, that's the trade you've made, not Nvidia doing something to what you've bought.
That software moat is crazy. I wonder why Microsoft didn't take that into consideration with their projections. There must be a non-negligible cost to not using CUDA: ease of use, time to market, etc.
Do the other inherent qualities of a software stack really matter much in the case of inferencing large transformer models, outside of the fact that users mostly rely on it being well tested against any minefields that were stepped on in the past?
@trinibwoy posted this link in another thread, and something stood out to me on the 4090 perf/$ results:
AMD RDNA4 potential product value - forum.beyond3d.com
It costs about $900 more than it should relative to its realised gaming performance, if you assume their last flagship sets its relative perf/$ bar. I’m personally ambivalent as to whether that assumption holds since I’m not the target customer.
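For what it's worth, the arithmetic behind that kind of claim is just a rescaling of the previous flagship's price by the relative performance. A small Python sketch with deliberately made-up placeholder numbers (not TPU's figures or real prices), only to show the method:

def implied_price(last_flagship_price, relative_perf):
    # Price at which the new card would merely match the old flagship's perf/$.
    return last_flagship_price * relative_perf

# All three numbers below are hypothetical placeholders, not real data.
last_flagship_price = 1000.0   # old flagship's price
relative_perf = 1.6            # new card at 1.6x the old flagship's performance
new_card_price = 2500.0        # what the new card actually sells for

fair = implied_price(last_flagship_price, relative_perf)   # 1600.0
premium = new_card_price - fair                            # 900.0
print(f"perf/$-implied price: ${fair:.0f}, premium over that bar: ${premium:.0f}")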
In general, I'm not someone who thinks consumer electronics should be sold for BOM plus a small margin. You should charge whatever you can convince your customers it's worth, and so it's very impressive to me how broadly they've convinced the market of the 4090's value, if TPU's testing is representative.
No comment on whether it is representative - I haven't dug in past the graphs, and the metric is very simplistic - but if you believe it is, then you have to say the 4090 is clearly a Veblen good and they're probably leaving money on the table.
I don't know the exact situation with large models, but my experience in other fields tells me that this alone (being well tested against the minefields) is probably already very hard, especially when a field is in the experiment stage.
If a field is mature enough, most people will be using a few well-developed software packages, so problems with the underlying toolkits would be smaller, because the vendors can just work with the best and most commonly used software developers and solve most of the problems. However, when a field is still growing, the software situation is more fragmented and some people may even have to develop their own software. In this case, it's more likely that people will be stepping on new minefields every day. Thus, the stability of the underlying toolkits is very important. It's not just about bugs; the toolkit also needs to behave predictably, with as few surprises as possible.
I think CUDA benefits from the fact that it was developed a long time ago, with a lot of users, and has had time to become very mature and stable. So it can handle the current AI mania relatively easily. On the other hand, many other AI vendors only entered this market after the AI mania started, so their toolkits haven't had the time to become mature and stable.
From what I've heard, Google's TPUs also have a relatively good software stack, which has likewise been developed for quite a long time, although they don't have a lot of users. In AMD's case, it's unfortunate that they didn't spend enough resources on this when GPGPU became a thing. Their main tools were all developed relatively recently, so that's to be expected. One day, when the AI software market matures, I think AMD will be able to catch up, but of course by then the profit margins won't be so pretty anymore.
I'm not underplaying the importance of a software platform's stability, but I am somewhat dubious of the assessment that the other features behind the platform itself lead to any tangible ancillary advantages in the case of AI inferencing ...
I presume that the reason CUDA is currently dominant in AI is purely down to the fact that there's no perceived benefit (performance/cost/etc or otherwise) worth the opportunity cost of discovering those minefields on more specialized or fragmented platforms ...
Also, if your implication/belief is that growth in AI will come to a halt or even decline before the other players like AMD are able to profit from it, then where does that leave the leader of the field itself (Nvidia), which is clearly the most exposed to any deflationary phenomenon in that sector?
I think that there are other major factors that are out of the AI HW vendors' control, such as the (lack of) choice of suppliers among leading-edge IC manufacturers (only TSMC is seen as reliable in this regard) and high-performance memory module producers (the usual triopoly), so there's not a whole lot of room for competitors to differentiate in terms of cost structure or solutions, and they're ultimately left burning resources to get their software up to par ...

Yeah, it's quite possible that the stability of CUDA alone is enough for many people to select NVIDIA solutions. Although many analysts did conjecture that, since it's very difficult (not to mention expensive) to get NVIDIA GPUs, there's a big incentive for companies to look for alternative inference solutions. There were also many AI chip companies that tried to go into this field, but most failed. So it's definitely not for lack of effort. Therefore I suspect that most of these solutions (except maybe Google's TPU) are probably not up to snuff yet, for various reasons.
To look further into the future (which is of course a foolish endeavor), I believe even after this AI mania people will still need computational power, and CUDA is probably still going to be the go-to solution. In a way, this AI mania is actually a godsend for those who were behind, because the market can be large enough for them to develop their own solutions. But this takes time, and I think it's very important that these companies don't abandon their development just because they can't compete right now. Otherwise it'll just be the same when the next big thing requiring a lot of computational power comes along.

Just as predicting the future is a dangerous game, how is it any more reasonable from that perspective to see much higher potential growth in computational power (let alone GPU compute specifically) as a hedge against AI, especially with the looming "post-silicon era" (possibly as early as the start of next decade), where transistor area scaling will plateau?