NVIDIA discussion [2024]

In the Korean retail market, the RTX 4060 holds the biggest share, accounting for 27.7% of total sales during April 2024, slightly below its peak of 28.93% a month prior. The NVIDIA GeForce RTX 4060 Ti takes the second spot with a 21.63% sales share, also down from the 23.01% it achieved a month earlier. The RTX 4070 SUPER follows at 14.86%, with the NVIDIA GeForce RTX 4070 Ti SUPER at a 6.26% sales share and the RTX 4080 SUPER at 4.77%.

 
Dell's new PowerEdge XE9680L server supports up to 8 x liquid-cooled NVIDIA Blackwell AI GPUs, and Dell also offers businesses a turnkey solution to scale deployments rapidly, with future versions of the XE9680L to support an insane 72 x NVIDIA AI GPUs in a single rack.

Dell's new AI factory will use hardware and software from a variety of US-based companies, including networking products from Broadcom, PCs powered by Qualcomm Snapdragon processors running Microsoft software, and a partnership with Meta on deploying Llama 3 and other business solutions that use Microsoft's Azure cloud computing platform.
 
The process hasn't been confirmed yet.

Though I do believe there was a shipping manifest Nintendo obsessives found confirming both the RAM speed and the process to be Samsung's 7nm (the newest tool-compatible variant of their 8nm).

Is it old, and a bit silly? Yes. Is it the Nintendo way, which will almost certainly be successful anyway? Also yes.
 
Can I have a source for the 7nm claims? I didn't see this in the Famiboards breakdown.

The main argument against it being 8nm is the Famiboards analysis that a 12 SM part would end up hitting the voltage floor at Switch 1 undocked power levels, so Nintendo could have saved money without losing performance by building a smaller chip.
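To make that argument concrete, here's a back-of-the-envelope DVFS sketch. Dynamic power scales roughly as P ≈ C·V²·f, and voltage normally drops along with frequency, so power falls super-linearly as you downclock; once voltage is pinned at the process floor, though, downclocking only buys a linear reduction, and a wide 12 SM chip stops gaining efficiency from its extra width. Every constant below is an illustrative assumption, not a leaked spec.

```python
# Toy DVFS model: P ~ C * V^2 * f, with capacitance proportional to SM count.
# All numbers are made up for illustration.
V_MAX, V_FLOOR = 1.00, 0.60   # assumed voltage range for the node
F_MAX = 1000.0                # MHz at V_MAX

def voltage_at(f):
    """Assume voltage scales linearly with frequency, clamped at the floor."""
    return max(V_FLOOR, V_MAX * f / F_MAX)

def power(n_sm, f):
    v = voltage_at(f)
    return n_sm * v**2 * f    # arbitrary units

for f in (1000, 800, 600, 460):
    print(f"12 SM @ {f:4d} MHz -> V={voltage_at(f):.2f}, P={power(12, f):6.0f}")
```

Below the clock where V hits the floor (600 MHz in this toy model), power only falls linearly with frequency, which is why a chip forced down there is over-built for its power budget.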
 
Sounds very exaggerated. The 3x performance optimization is specifically for some ONNX LLM stuff on Windows. I spend way too much time on the LocalLLaMA subreddit and I don't see anyone talking about running LLMs in ONNX format.

Most RTX GPU home users are running llama.cpp or exllamav2 quants, and the new driver doesn't boost performance there (or if it does, it's a very small boost). AI image generation also falls under "AI performance", but there's definitely no 3x boost there either. Not sure who Wccftech is doing a favor for by parroting Nvidia's hype, or even inflating it further.
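For reference, this is the measurement those home users actually care about: tokens per second out of a quantized GGUF model via llama-cpp-python (the Python bindings for llama.cpp). The model path below is a placeholder; any llama.cpp-compatible quant works.

```python
import time
from llama_cpp import Llama

# Placeholder path: substitute any GGUF quant you have locally.
llm = Llama(model_path="llama-3-8b-instruct.Q4_K_M.gguf", n_gpu_layers=-1)

start = time.perf_counter()
out = llm("Explain the CUDA moat in one paragraph.", max_tokens=256)
elapsed = time.perf_counter() - start

# Completions return an OpenAI-style dict, including token usage counts.
print(f'{out["usage"]["completion_tokens"] / elapsed:.1f} tokens/s')
```

Run that before and after a driver update and you'll see whether anything actually changed for this workload.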
 
From the image in the article, it's pretty clear the performance improvement is more like 50~70% for int4 quants. The 3x is probably from comparing int4 against FP16, which is of course not very reasonable.

[EDIT] I think I found a possible source of this "3X" claim:


It compares the speed of ONNX Runtime against plain CUDA, and with batch size = 16 and 4096 tokens it's 3.42x as fast as using CUDA on an A100 80GB.
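If anyone wants to sanity-check a number like that, the measurement itself is simple. Here's a hypothetical timing sketch using ONNX Runtime's CUDA execution provider at that same batch-16 / 4096-token point; the model file and input name are made up for illustration.

```python
import time
import numpy as np
import onnxruntime as ort

# Hypothetical model file and input name, just to show the shape of the test.
sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
batch, seq_len = 16, 4096
tokens = np.random.randint(0, 32000, size=(batch, seq_len), dtype=np.int64)

sess.run(None, {"input_ids": tokens})  # warm-up: kernel JIT, allocator, autotune

start = time.perf_counter()
for _ in range(10):
    sess.run(None, {"input_ids": tokens})
elapsed = time.perf_counter() - start
print(f"{10 * batch * seq_len / elapsed:,.0f} tokens/s processed")
```

The same loop against a PyTorch CUDA baseline gives you the ratio, and whether it's anywhere near 3.42x.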
 
Raja on the CUDA moat and competitors:

"the programming model and the execution models of massively parallel/accelerated systems are tied at the hip (didn't intend to get RocM fans excited 😉).

Until you match the execution model of a CUDA GPU, it is a frustrating experience to get performance. The scars on my back are screaming... OpenCL, ROCm, oneAPI...

Triton is old news. If anything, Triton helped reduce friction to access CUDA, and is a boon for Nvidia. One can argue that CUDA stickiness has grown since mid-2021, not shrunk - kind of validated by Nvidia's market growth since 2021 as well.

Software standards don't solve the CUDA problem.

Simple, scalable, open and robust parallel HW architecture is the foundation needed to build a new software stack. There are some good attempts in the startup ecosystem, but they're way too fragmented.

Each small team is trying to build the entire stack, from chiplets through systems through compilers and runtimes and cloud orchestration layers, and failing.

There is a great opportunity for entrepreneurs with deep pockets, the likes of @a16z @vkhosla @elonmus, to orchestrate 5-6 startups to collaborate on building a whole new stack."
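His Triton point is easy to see in code: a Triton kernel is ordinary Python that JIT-compiles to PTX for NVIDIA GPUs, which is exactly why it lowers the barrier to CUDA hardware rather than routing around it. The canonical vector-add example from Triton's tutorials:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n                       # guard the ragged last block
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
add_kernel[(triton.cdiv(4096, 1024),)](x, y, out, 4096, BLOCK=1024)
assert torch.allclose(out, x + y)
```

No CUDA C++, no nvcc, but the result still runs on (and only benefits) the NVIDIA stack, which is the stickiness he's describing.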

 
I think I found a possible source of this "3X" claim:
It's most likely from this Nvidia marketing post from yesterday (Nvidia probably sends this kind of material directly to press too?):


Nvidia at least makes the silly 3x claim only in a use-case-specific context, while Wccftech hypes the 3x twice before specifying the niche use case.

The 3x is probably from comparing int4 against FP16, which is of course not very reasonable.
True, the old driver is actually faster! Nvidia RTX AI Performance reduced by 30%!! (old driver int4, new driver fp16) :giggle:
 
They guided for $28 billion in Q2.
The internal target is over $110 billion for this year... in DC revenue alone (Q1 / Q2 / Q3 / Q4 of $22B / $26B / $30B / $34B).
And next year will reach the sky with Grace Blackwell sales that are already fully booked until the end of 2025. I predict a stock price of $1,500 (= $150 after the split) by the end of 2025.
 