NVIDIA discussion [2024]

Oracle is building data centers with acres of NVIDIA GPUs around the world.

Oracle $ORCL Chairman and CTO Larry Ellison: "Oracle has 162 cloud datacenters in operation and under construction around the world. The largest of these datacenters is 800 megawatts and will contain acres of Nvidia $NVDA GPU Clusters for training large scale AI models. In Q1, 42 additional cloud GPU contracts were signed for a total of $3 billion."

 
Good read.
Acres and acres of Nvidia GPUs. Can you imagine a field of multi-million-dollar Nvidia GPU racks powering the next wave of super advanced AI? That's what Oracle just reported in a bombshell earnings report. I can't remember many earnings reports where a company literally flexes another company's technology as the reason it's accelerating and generating record-breaking revenue, but here we are.

But here is the thing people don't get about Nvidia's data center revenue model, and I think it bears repeating. It was something Jensen mentioned on Nvidia's last earnings call. An analyst asked him why Nvidia doesn't distribute its own chips directly. Jensen said no: "we work directly through our OEM/ODM partners to fulfill our distribution, and that's how it will always be."

But another analyst's question was even more peculiar, because it showed the analyst doesn't realize how Nvidia's data center business model works, and I think it needs repeating. When major cloud providers, including Oracle, purchase Nvidia GPU hardware, they have two options for what to do with those GPUs.

  1. They use the hardware for their own compute needs and produce an output through an offering such as an LLM, gaming, or other accelerated compute services directly. An end product, if you will.
  2. They lease out the GPUs as actual hardware to enterprises and businesses, such as startups, that want to do their own accelerated compute. The offering here is called DGX Cloud.
The second delivery method is, in many cases, recurring revenue. A startup doesn't have to worry about building out a data center and installing on-prem Nvidia hardware when it can just lease a node directly from any major cloud provider. Now here's the thing: when using Nvidia hardware, you purchase the underlying DGX platform capabilities, including CUDA.
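To make the buy-versus-lease trade-off concrete, here is a minimal break-even sketch. All prices below are hypothetical placeholders chosen for illustration, not actual NVIDIA or DGX Cloud rates:

```python
# Illustrative break-even sketch for "buy on-prem vs. lease a cloud GPU node".
# Every dollar figure here is an assumed placeholder, not a real price.

ON_PREM_CAPEX = 400_000        # hypothetical upfront cost of an 8-GPU server, USD
ON_PREM_OPEX_MONTHLY = 5_000   # hypothetical power/cooling/ops per month, USD
CLOUD_LEASE_MONTHLY = 37_000   # hypothetical monthly lease for a comparable node, USD

def cumulative_cost_on_prem(months: int) -> int:
    """Upfront hardware purchase plus ongoing operating cost."""
    return ON_PREM_CAPEX + ON_PREM_OPEX_MONTHLY * months

def cumulative_cost_cloud(months: int) -> int:
    """Pure recurring lease -- the revenue stream the post describes."""
    return CLOUD_LEASE_MONTHLY * months

# Find the month where owning becomes cheaper than leasing.
month = 1
while cumulative_cost_cloud(month) < cumulative_cost_on_prem(month):
    month += 1
print(f"Buying becomes cheaper than leasing after ~{month} months")  # ~13 months here
```

The point of the sketch is the shape, not the numbers: for a startup that may not need the node for years, the recurring lease wins, which is exactly the revenue model described above.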
...
Why is the "G" along with the "B" so important? Imagine: all of the revenue that Nvidia has generated TO DATE has come almost solely from the H100; not even the H200/GH200 AI factory systems. lol, think about that. ALL OF THESE BILLIONS and BILLIONS OF DOLLARS have come via the H100 chip alone. The H200/GH200 only recently came out, so while customers still need to purchase H200s, the real platform, the GH200 SuperPOD server systems, has probably not even begun to take hold, with a lot of anticipation for the more powerful GB200 systems.

So you see, when Jensen told that analyst that NO, they won't sell direct as a cloud vendor, it's because they don't have to: they are already delivering as a cloud provider via stronger contractual agreements, while allowing others to also profit and eat from the hardware purchase, which is exactly what Oracle reported today.

Others buy the hardware, and Nvidia reaps the benefit of that plus the platform instance leasing for the entire stack, including software, which will always be recurring revenue.

In this way, Nvidia won't have a hard landing and will in fact be one of the largest companies the world has ever seen; it already is. However, people just don't realize that Nvidia is a cloud company in its own way. It's just doing it in a way where everyone eats at its table. It's really amazing when you think about it.

There you have it folks, acres and acres of recurring revenue through DGX Cloud and CUDA software licensing.
 
NVIDIA has finally joined the Ultra Ethernet Consortium (UEC), signaling that it's growing more enthusiastic about Ethernet-based solutions for AI networking.
...
For its part, joining the UEC does not represent a change of mind, insists one NVIDIA spokesperson. NVIDIA simply decided to support the ecosystem. But the move might also demonstrate NVIDIA sees growing momentum behind Ethernet, which it also supports with its Spectrum-X brand of Ethernet networking products. Although NVIDIA controls the InfiniBand networking market, it has been hedging its bets by developing Ethernet networking in parallel with technology based on its Mellanox acquisition in 2019.
...
Many of the top Ethernet networking providers and OEMs, such as Arista Networks, Cisco, and Juniper Networks, got behind the UEC to build an Ethernet solution for AI infrastructure. Founding members of the UEC included AMD, Arista, Broadcom, Cisco, Eviden, HPE, Intel, Meta, Microsoft, and Oracle.
...
There is no reason to think NVIDIA is backing away from InfiniBand. Instead, it’s been focused on hedging with Ethernet all along, and its new membership in the UEC may signal additional momentum. NVIDIA is seeing a lot of success with its Ethernet products, including its Spectrum-X platform, which along with a BlueField-3 SuperNIC improves the performance of traditional Ethernet by 1.6X, NVIDIA claims.

Still, being a UEC member puts NVIDIA closer to the action, shoulder to shoulder with some of its rivals, providing not just a front-row view of the Ethernet spec creation but a way to contribute to it and control how it’s used in NVIDIA products.
 
Meta Platforms is putting the final touches on a cluster of more than 100,000 Nvidia H100 server chips in the US to train the next update to its generative AI model, Llama 4, The Information reports; the cost of the chips alone could be more than US$2 billion. The cluster will be finished by October or November, and comes as Elon Musk's xAI and others build similar systems.
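A quick sanity check on the ">$2 billion" figure. The per-chip price below is an assumed street price (reported H100 prices vary widely), not an official NVIDIA number:

```python
# Back-of-envelope check on "the cost of the chips alone could be more than US$2 billion".
# The per-unit price is an assumption; reported H100 prices range roughly $25k-$40k.

num_gpus = 100_000          # "more than 100,000" H100s, per the article
price_per_h100 = 25_000     # assumed USD per H100 at the low end of reported prices

total_usd = num_gpus * price_per_h100
print(f"Estimated chip spend: ${total_usd / 1e9:.1f}B")  # ≈ $2.5B, consistent with ">$2 billion"
```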


 
For that Oracle supercomputer, that's 5.2 exaflops FP64 (both vector and tensor), and zettascale using INT8/FP8: circa 1.2 zettaops/Zops? At 1000W TDP each, that alone is 131MW for GPUs, and with everything else (networking, power for cooling) on top, are they getting close to 200MW to run the entire thing? Do they get a free power station with their order on top of that? Engineered insanity
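The power figures in that post can be reconstructed with simple arithmetic. The GPU count and the PUE (power-usage-effectiveness) factor below are assumptions chosen to reproduce the quoted numbers, not disclosed Oracle specs:

```python
# Rough reconstruction of the power numbers in the post above.
# GPU count and PUE are assumptions, not published Oracle figures.

num_gpus = 131_072            # assumed cluster size implied by "131MW for GPUs"
tdp_watts = 1_000             # 1000W TDP per GPU, as quoted
pue = 1.5                     # assumed overhead factor for cooling, networking, etc.

gpu_power_mw = num_gpus * tdp_watts / 1e6
facility_power_mw = gpu_power_mw * pue
print(f"GPUs alone: {gpu_power_mw:.0f} MW, whole facility: ~{facility_power_mw:.0f} MW")
```

With a PUE of 1.5 the facility lands just under 200MW, which is where the "close to 200MW" guess comes from; a leaner PUE of 1.2 would put it around 157MW instead.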
 
are they getting close to 200MW to run the entire thing? Do they get a free power station with their order on top of that? Engineered insanity
3 small portable nuclear reactors!

Oracle has secured the permits to build three small modular reactors (SMRs) to power its AI data center. During its quarterly earnings call, the company said (via The Register) that it plans to use those tiny nuclear plants for a planned AI data center with at least one-gigawatt capacity.

SMRs are miniaturized reactors, similar in size to those used on naval vessels like submarines and aircraft carriers. However, since they do not have to be built inside the cramped space of a warship, SMRs do not have to be customized to the needs of a particular vessel. This means Oracle could find a supplier to mass-produce them at a lower cost than the Navy pays.

Furthermore, an SMR’s modular design means that it should, in theory, be cheaper to operate, especially as it no longer has the massive infrastructure often associated with traditional nuclear power plants.
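The arithmetic implied by "three SMRs for a gigawatt" is worth spelling out. The per-reactor output below is derived from the article's numbers, not from any specific SMR design:

```python
# Quick check on "three SMRs powering a data center with at least one-gigawatt capacity".
# Derived per-reactor output, not the spec of any particular SMR design.

target_capacity_mw = 1_000    # "at least one-gigawatt" data center
num_reactors = 3              # permits Oracle reportedly secured

per_reactor_mw = target_capacity_mw / num_reactors
print(f"Each SMR would need to supply ~{per_reactor_mw:.0f} MWe")
```

~333 MWe per reactor is at the very top of what is usually called an SMR (most announced designs are 300 MWe or less), so either the reactors would run near the SMR ceiling or they would cover only part of the load.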

 
I wonder if Oracle is already seeing enough demand to justify building such a massive complex or if they’re building it in the hope that demand will materialize. That’s the biggest question around all this AI spend right now.
 
The article mentions Oracle currently has 162 data centers running or under construction. But their plan calls for 2,000 data centers worldwide for Oracle alone! And that is just one data center company.
 
5.2 exaflops is a lot of computational power, and people always want more of it, even for non-AI-related work. This is one of the upsides of buying GPUs for AI instead of dedicated AI processors.
 
Oracle founder Larry Ellison admitted (via Barron’s) that he had to beg Nvidia CEO Jensen Huang to supply his company with its latest GPUs

"In Nobu Palo Alto, I went to dinner with Elon Musk and Jensen Huang, and I would describe the dinner as me and Elon begging Jensen for GPUs. Please take our money; no, take more of it. You're not taking enough of it; we need you to take more of our money, please," Ellison said during the call. "It went okay; it worked."


 
I'd like to say it's because they are worried about a potential war that might disrupt the supply chain greatly, so they need to grab as much as possible while they can. However, if that were the case, it'd be the US government grabbing all the GPUs, not private industry. So... maybe they see something ordinary people don't, such as an imminent breakthrough or something similar.
It's of course possible that these people are just drinking the AI kool-aid and investing unwisely. I really can't say which one is more likely to be correct, though personally I'd like to believe the former.
 
Is Nvidia or anyone else researching quantum AI computing devices yet? I see that as just an incredibly terrifying combo. :s
Think Nvidia is already involved.
By tightly integrating quantum computers with supercomputers, CUDA-Q also enables quantum computing with AI to solve problems such as noisy qubits and develop efficient algorithms.

CUDA-Q is an open-source and QPU-agnostic quantum-classical accelerated supercomputing platform. It is used by the majority of the companies deploying QPUs and delivers best-in-class performance.
 
"We can't do computer graphics anymore without artificial intelligence," he said. "We compute one pixel, we infer the other 32. I mean, it's incredible... And so we hallucinate, if you will, the other 32, and it looks temporally stable, it looks photorealistic, and the image quality is incredible, the performance is incredible."
 