NVIDIA discussion [2024]

Some more information on the Foxconn/Nvidia supercomputer in Taiwan.
With expected AI performance of over 90 exaflops, the machine would easily be considered the fastest in Taiwan. Foxconn plans to use the supercomputer, once operational, to power breakthroughs in cancer research, large language model development and smart city innovations, positioning Taiwan as a global leader in AI-driven industries.

Construction has started on the new supercomputer, housed in Kaohsiung, Taiwan. The first phase is expected to be operational by mid-2025, with full deployment targeted for 2026. The project will integrate NVIDIA technologies such as the Omniverse and Isaac robotics platforms for AI and digital twins to help transform manufacturing processes.

“Powered by NVIDIA’s Blackwell platform, Foxconn’s new 64-rack AI supercomputer is one of the most powerful in the world, representing a significant leap forward in AI computing and efficiency,” said Foxconn Vice President and Spokesperson James Wu.

The GB200 NVL72 is a state-of-the-art data center platform optimized for AI and accelerated computing. Each rack features 36 NVIDIA Grace CPUs and 72 NVIDIA Blackwell GPUs connected via NVIDIA’s NVLink technology, delivering 130TB/s of bandwidth. NVIDIA NVLink Switch allows the 72-GPU system to function as a single, unified GPU.
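
As a rough sanity check, the quoted figures hang together; note that the per-GPU bandwidth and per-rack exaflops numbers below are back-of-the-envelope assumptions, not values from the article:

```python
# Back-of-the-envelope check of the GB200 NVL72 figures quoted above.
# Only the 72-GPU count and the 130 TB/s aggregate come from the article;
# everything derived here is a rough assumption for illustration.

gpus_per_rack = 72            # Blackwell GPUs per NVL72 rack
nvlink_aggregate_tbps = 130   # total NVLink bandwidth per rack, TB/s

per_gpu_tbps = nvlink_aggregate_tbps / gpus_per_rack
print(f"NVLink bandwidth per GPU: ~{per_gpu_tbps:.1f} TB/s")  # ~1.8 TB/s

# Rough rack count needed for the quoted 90+ exaflops of AI performance,
# assuming ~1.4 exaflops (low-precision inference) per NVL72 rack;
# that per-rack figure is an assumption, not from the article.
exaflops_per_rack = 1.4
racks_needed = 90 / exaflops_per_rack
print(f"Racks for 90 EF: ~{racks_needed:.0f}")  # ~64, matching the 64-rack quote
```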

Taiwan-based Foxconn, officially known as Hon Hai Precision Industry Co., is the world’s largest electronics manufacturer, known for producing a wide range of products, from smartphones to servers, for the world’s top technology brands.

With a vast workforce and manufacturing facilities across the globe, Foxconn is key to supplying the world’s technology infrastructure. It is a leader in smart manufacturing and one of the pioneers of industrial AI, digitalizing its factories in NVIDIA Omniverse.

Foxconn was also one of the first companies to use NVIDIA NIM microservices in the development of domain-specific large language models, or LLMs, embedded into a variety of internal systems and processes in its AI factories for smart manufacturing, smart electric vehicles and smart cities.
 
The Weibo post that Wccftech cites as their source says (according to two different machine translations) that the SoC is entering mass production during H2 2025. So not H2 2025 "unveil" or "release" as Wccftech words it.

If the Weibo post is 100% on the money, actual product launch could also be early 2026.
 
The Weibo post that Wccftech cites as their source says (according to two different machine translations) that the SoC is entering mass production during H2 2025. So not H2 2025 "unveil" or "release" as Wccftech words it.

If the Weibo post is 100% on the money, actual product launch could also be early 2026.
Also, if it were in the "tape-out phase" now, it would be horribly outdated by then.
 
A Taiwanese financial newspaper has the AI chip taping out in the third quarter of 2024 and entering mass production in the first quarter of next year.
Sounds more logical, but at this point it's just rumors.
That would fit better with the supposed Qualcomm exclusivity in the Win11 space ending mid-2025.
 
I recently said that Nvidia would have some new proprietary technology coming. Sounds like it will be revealed in January at CES.

NVIDIA's GeForce RTX 50 "Blackwell" GPUs, including the RTX 5090, RTX 5080, and RTX 5070, will be announced at CES 2025, so expect a jam-packed session delivered by CEO Jensen Huang. We will also learn a bit more about next-generation AI technologies for gamers, which are going to be a major surprise for everyone.

Any guesses as to what it could be? :D
 
Some sort of image "enhancing" post-processing suite. We already have a first step in this with RTX HDR. The integration potential here would be to combine that with an LLM and have it automatically tune visuals to user preference somehow. I do wonder how something like this would be received. It would cause an even larger disconnect between benchmark numbers and user experience.

In terms of a next-gen DLSS 4, something major would be if they could make frame generation variable, as in you just set an FPS target, essentially eliminating the need for VRR or vsync.
 
In terms of a next-gen DLSS 4, something major would be if they could make frame generation variable, as in you just set an FPS target, essentially eliminating the need for VRR or vsync.
Perhaps not to the extent you want (and no, it doesn't eliminate the need for VRR), but this is already available with AFMF2: it works with Radeon Chill, which you can use to limit FPS. Not sure if it works with FSR3 too.
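
Purely as a thought experiment, and not how DLSS or AFMF actually work internally, an "FPS target" frame-generation mode boils down to deciding, per rendered frame, how many generated frames to insert and how to pace them. The callbacks in this sketch are hypothetical:

```python
import time

# Toy sketch of "frame generation with an FPS target"; a thought
# experiment only, not how DLSS or AFMF are implemented.

TARGET_FPS = 120
TARGET_FRAME_TIME = 1.0 / TARGET_FPS

def frames_to_generate(render_time_s: float) -> int:
    """Decide how many generated frames to insert after one rendered frame
    so the presented frame rate approaches the target."""
    # e.g. rendering at 40 FPS (25 ms) with a 120 FPS target:
    # 25 ms / 8.3 ms is ~3 presented frames per render, so 2 generated frames.
    presented_per_render = max(1, round(render_time_s / TARGET_FRAME_TIME))
    return presented_per_render - 1

def present_loop(render_frame, generate_frame, present):
    """render_frame / generate_frame / present are hypothetical callbacks."""
    while True:
        start = time.perf_counter()
        real = render_frame()                  # expensive real render
        render_time = time.perf_counter() - start
        frames = [real] + [generate_frame(real)
                           for _ in range(frames_to_generate(render_time))]
        for frame in frames:                   # pace output at the FPS target
            present(frame)
            time.sleep(TARGET_FRAME_TIME)
```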
 
Also, if it were in the "tape-out phase" now, it would be horribly outdated by then.

Not necessarily; that's just how long the bring-up process seems to take these days. For example, Intel has recently taped out Panther Lake, which is due to launch in H2'25 (likely Q4'25), a similar time frame. They took even longer with Meteor Lake (~18 months IIRC). AMD seemingly takes a bit less; Zen 5 taped out sometime in 2023.
 
Raja Koduri and Stas Bekman developed a tool that measures the effective compute performance extractable from a given AI GPU (resources, flops and bandwidth). The new metric is called MAMF, "Maximum Achievable Matmul FLOPS". All tests are done using the maximum software optimizations possible and the latest software. Results are interesting.

Measured performance (MAMF, in TFLOPS):
MI250X: 147
A100 SXM: 267
MI300X: 782
H100 SXM: 792
H200 SXM: 820

Efficiency (extracted performance / theoretical maximum):
MI250X: 38%
MI300X: 60%
A100 SXM: 85%
H100 SXM: 80%
H200 SXM: 83%
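
The efficiency rows are just measured MAMF divided by the card's theoretical dense peak. A quick sketch reproducing the percentages above; the peak TFLOPS figures are the commonly quoted BF16 dense numbers, filled in here as assumptions rather than taken from the post:

```python
# Reproduce the efficiency percentages: efficiency = MAMF / theoretical peak.
# MAMF values are from the post; the theoretical dense BF16 peaks are the
# commonly quoted vendor figures, added here as assumptions for illustration.

mamf_tflops = {          # measured Maximum Achievable Matmul FLOPS (TFLOPS)
    "MI250X":   147,
    "A100 SXM": 267,
    "MI300X":   782,
    "H100 SXM": 792,
    "H200 SXM": 820,
}

peak_tflops = {          # assumed theoretical dense BF16 peaks (TFLOPS)
    "MI250X":   383,
    "A100 SXM": 312,
    "MI300X":   1307,
    "H100 SXM": 989,
    "H200 SXM": 989,
}

for gpu, measured in mamf_tflops.items():
    eff = measured / peak_tflops[gpu]
    print(f"{gpu:9s} {measured:4d} TFLOPS -> {eff:.0%} of peak")
# Prints roughly the percentages quoted above (small differences are rounding).
```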

 
Raja Koduri and Stas Bekman developed a tool that measures the effective compute performance extractable from a given AI GPU (resources, flops and bandwidth). The new metric is called MAMF, "Maximum Achievable Matmul FLOPS". All tests are done using the maximum software optimizations possible and the latest software.
This seems a pretty useful measure, though there are some acknowledged caveats:

MAMF Finder benchmarks torch.mm with row-major matrices, but other memory layouts are possible and may allow higher peak performance. Also, in certain cases like fusing operations that may follow a matmul (e.g. Gated Linear Units, LSTM, FlashAttention) or using a sliding window (as in convolution or windowed attention) you can get a higher flops/bandwidth ratio than a standard matmul and might be able to exceed MAMF.


It's important to note that MAMF Finder uses N as the shared dimension, whereas BLAS GEMM uses K as the shared dimension. So read "best shape" with that in mind.
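
For reference, the core of such a measurement is just timing torch.mm over a sweep of shapes and converting elapsed time to TFLOPS. A minimal sketch along those lines follows; this is not the actual MAMF Finder code, and the shapes, dtype and iteration counts are illustrative assumptions:

```python
import time
import torch

# Minimal sketch of measuring achievable matmul TFLOPS with torch.mm.
# Not the actual MAMF Finder code; shapes, dtype and iteration counts
# are illustrative assumptions.

def achievable_tflops(m, n, k, dtype=torch.bfloat16, iters=100, device="cuda"):
    a = torch.randn(m, k, dtype=dtype, device=device)   # row-major, like MAMF Finder
    b = torch.randn(k, n, dtype=dtype, device=device)
    c = torch.empty(m, n, dtype=dtype, device=device)

    for _ in range(10):                                  # warm-up
        torch.mm(a, b, out=c)
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(iters):
        torch.mm(a, b, out=c)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    flops = 2 * m * n * k * iters                        # 2*M*N*K FLOPs per matmul
    return flops / elapsed / 1e12

if __name__ == "__main__":
    best = 0.0
    for shape in [(4096, 4096, 4096), (8192, 8192, 8192), (16384, 8192, 1024)]:
        tflops = achievable_tflops(*shape)
        best = max(best, tflops)
        print(f"M={shape[0]} N={shape[1]} K={shape[2]}: {tflops:.1f} TFLOPS")
    print(f"Best (MAMF-style): {best:.1f} TFLOPS")
```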
 
Board Channel rumor (KitGuru report linked below) that RTX 50 will launch on a relatively tight schedule - if true, it seems likely aimed at sucking any air left for RDNA 4 out of the market. I guess one could also look at this as close to “on-time”, and it’s the high-end cards that are actually late.

On a side note, I’ve noticed that most of the recent reports on release timing are coming from Chinese boards that refer explicitly to the 5090D, not the 5090 launching at CES. English media keeps reporting these “leaks” as applicable to the 5090 (or to both). That is not accurate reporting, even if common sense suggests it will probably end up being the case given a) it’s a Chinese board and they’re naturally going to be interested in the model they can buy; and b) it’s unclear when else the 5090 would launch (I can’t see Jensen doing a 5090 only release presentation then heading to CES a month or two later, but really who the hell knows). I expect this kind of poor reporting from click mills but not somewhere like KitGuru. Maybe it’s my expectations that are out of whack!

All that said, I’m cautious on whether this is true. According to the rumor, both the 5080 and 5090 will release in January - a tight window between presentation and release, given reviewers generally need at least a week prior to launch. Could be done - mid-January to late March would be absolutely packed with releases between NVIDIA, RDNA 4, and the rest of Zen 5.

 