Are cutting-edge GPU technologies memory bound?

There was never any magic solution to RT or AI other than "more memory performance." More fixed-function or other specialized HW logic alone won't net you major gains ...

Hmmmm, I wonder why Microsoft wants an entire nuke power plant to run GPUs if they are just sitting around underutilized waiting on memory accesses. It would seem to make more sense to load up on 3070s instead of $15K-a-pop H100s, no?
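For what it's worth, a minimal roofline-style back-of-envelope sketch (Python; the hardware specs are approximate published numbers and the kernel intensity is a made-up illustrative value) shows how both posts can be right: the same GPU can be memory-bound on a low-intensity kernel and still be worth buying for its compute and bandwidth headroom.

```python
# Back-of-envelope roofline check: is a given kernel memory-bound or compute-bound?
# Specs are approximate published numbers, used purely for illustration.

GPUS = {
    # name: (peak FLOP/s, memory bandwidth in bytes/s)
    "H100 SXM (BF16)": (989e12, 3.35e12),   # ~989 TFLOPS, ~3.35 TB/s HBM3
    "RTX 3070 (FP32)": (20e12, 0.448e12),   # ~20 TFLOPS, ~448 GB/s GDDR6
}

def ridge_point(flops, bw):
    """Arithmetic intensity (FLOP/byte) above which compute, not memory, is the limit."""
    return flops / bw

# A hypothetical low-intensity kernel (e.g. a memory-heavy decode/attention step).
kernel_intensity = 60.0  # FLOP per byte moved -- illustrative assumption

for name, (flops, bw) in GPUS.items():
    ridge = ridge_point(flops, bw)
    bound = "memory-bound" if kernel_intensity < ridge else "compute-bound"
    attainable = min(flops, kernel_intensity * bw)  # roofline: capped by whichever limit binds
    print(f"{name}: ridge ~{ridge:.0f} FLOP/B -> {bound}, "
          f"attainable ~{attainable / 1e12:.0f} TFLOP/s")
```

With these illustrative numbers the H100 is memory-bound on that kernel yet still attains roughly an order of magnitude more throughput than the 3070, which is compute-bound on the same kernel; that is why "memory bound" and "worth the $15K" aren't mutually exclusive.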
 
If your devs earn high six figures and demand the easiest-to-work-with hardware, what are you gonna do? I suspect the Chinese have far better training/inference architectures while Americans paper over poor architecture with hardware.

HBM/NVLink (and the also-ran equivalents, Infinity Fabric etc.) are the SMP of the modern age.
 

Cool stuff!

Now, with diffusion replacing autoregressive word prediction, one can generate tokens ~10x faster for a single (batch-of-1) request, with fewer hallucinations and improved metrics as a bonus. Another win for more mathematically dense algorithms.

Autoregressive prediction always felt inefficient (algorithmically) since it's serial. Speculative multi-token prediction was a step in the right direction, but not a true paradigm shift, and now we finally have one.
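To make the serial-vs-parallel point concrete, here is a toy sketch (not any real model's API; the "models" are random stand-ins): autoregressive decoding needs one forward pass per generated token, while a masked-diffusion-style decoder refines a whole block over a fixed number of denoising steps, so the count of sequential passes no longer scales with the number of tokens.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat"]

def fake_next_token(context):
    """Stand-in for one autoregressive forward pass: predicts a single next token."""
    return random.choice(VOCAB)

def fake_block_model(prompt, block):
    """Stand-in for one diffusion forward pass: re-predicts every block position at once."""
    return [random.choice(VOCAB) for _ in block]

def autoregressive_decode(prompt, n_tokens):
    out = list(prompt)
    for _ in range(n_tokens):            # n_tokens strictly sequential model calls
        out.append(fake_next_token(out))
    return out

def diffusion_decode(prompt, n_tokens, n_steps=4):
    block = ["<mask>"] * n_tokens
    for _ in range(n_steps):             # n_steps sequential calls, independent of n_tokens
        block = fake_block_model(prompt, block)
    return list(prompt) + block

print(autoregressive_decode(["the"], 8))  # 8 sequential forward passes
print(diffusion_decode(["the"], 8))       # 4 sequential forward passes for the same 8 tokens
```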
 
Lower latency is nice, but total token throughput in batched inference for DeepSeek V3 is around 5000 tok/s with open-source software at the moment on H200/MI300x, with the code likely being far from optimal.

This method isn't necessarily more efficient overall if it can't benefit from the same gains from batching.
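A quick sketch with made-up numbers (only the 5000 tok/s figure comes from the post above; the per-request rate and the batching factor are pure assumptions) of why a per-request speedup doesn't automatically translate into batched throughput:

```python
# Hypothetical numbers, only to illustrate latency vs. throughput.
ar_single_stream = 60        # tok/s for one autoregressive request (assumed)
ar_batched_total = 5000      # tok/s across a full batch (figure quoted above)

diff_single_stream = 10 * ar_single_stream   # the claimed ~10x per-request speedup
diff_batch_scaling = 3       # assumed: how much batching still helps the diffusion decoder

diff_batched_total = diff_single_stream * diff_batch_scaling

print(f"AR:        {ar_single_stream} tok/s per request, {ar_batched_total} tok/s batched")
print(f"Diffusion: {diff_single_stream} tok/s per request, {diff_batched_total} tok/s batched (assumed)")
# Per-request latency clearly favors the diffusion decoder, yet total serving
# throughput can still favor autoregressive inference unless batching scales comparably.
```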
 