How would NPUs improve on LLMs?
Would they improve the performance of LLM services like ChatGPT or CoPilot, which do all the processing in the cloud?
Or would they for local LLMs, in which case what is the point, unless maybe you're trying to develop specialized LLMs?
But you're referring to NPUs in the data center. When you are running stuff like ChatGPT inference at scale in a data center, power consumption is critical. And as good as the B100 is as a general-purpose GPU that needs to do both training and inference, it won't be able to compete in terms of tokens/watt with a dedicated inference NPU. There's a reason that companies like MS, Meta, and Amazon are developing their own inference chips for the cloud.
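For context, tokens/watt is just sustained throughput divided by power draw. A toy calculation to show the metric (the numbers below are made up, not measurements of any real chip):

```python
# Tokens/watt is throughput divided by power draw.
# All numbers here are hypothetical placeholders, not measured figures.
def tokens_per_watt(tokens_per_second: float, power_watts: float) -> float:
    return tokens_per_second / power_watts

# Hypothetical general-purpose GPU vs. hypothetical dedicated inference NPU
gpu_efficiency = tokens_per_watt(tokens_per_second=2000.0, power_watts=700.0)
npu_efficiency = tokens_per_watt(tokens_per_second=1500.0, power_watts=150.0)
print(f"GPU: {gpu_efficiency:.1f} tok/W, NPU: {npu_efficiency:.1f} tok/W")
```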
They're a better fit for SLMs (Small Language Models). At least some of Windows' new AI features use SLMs rather than LLMs, and they run locally. Pretty sure anything running locally on any low-power platform is using SLMs, tbh. But you're referring to NPUs in the data center.
What about the NPUs that companies are touting on these ARM SOC laptops and tablets? How do they enhance LLM performance on the client machines?
I understand the idea of SLMs for mobile devices.
But would they have the same appeal or hype as external services like ChatGPT or CoPilot?
Would people be content to use these local AI features or would they mostly use these commercial services which have all the visibility and attention?
I know one of the arguments for SLMs is privacy, but if LLMs offer features that you can't get locally, you wonder if the NPUs on these devices will end up orphaned.
This depends on how well "SLMs" perform compared to ChatGPT or other very large models.
The big question is: is it possible to make a relatively small model which might not be as knowledgeable but still retains reasonable ability for, say, reasoning? Microsoft has been trying to do that with Phi-3-mini, which only has 3.8 billion parameters.
One can argue that a local LLM (or "SLM") does not really need to be able to pass the bar exam. It's probably more useful if it knows about the OS you're using and can help you change some settings or fix some problems than if it knows some specific detail of some obscure law.
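To make that concrete, here's a minimal sketch of running something like Phi-3-mini locally with the Hugging Face transformers library. The model id and generation settings are illustrative, and quantization/hardware details are left out:

```python
# Minimal sketch: a ~3.8B-parameter SLM running locally via transformers.
# Model id and settings are illustrative; adjust for your hardware.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",  # small model, runs on modest hardware
    device_map="auto",                         # use a GPU/accelerator if one is available
)

# The kind of local, OS-specific question an SLM is arguably best suited for
prompt = "How do I change the default browser in Windows 11?"
result = generator(prompt, max_new_tokens=200, do_sample=False)
print(result[0]["generated_text"])
```

The point isn't the exact model; it's that a few-billion-parameter model is small enough to answer this kind of question on-device instead of round-tripping to the cloud.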
The problem is we don't know how badly an LLM's reasoning ability is affected by its size. However, there is research on improving it, for example by running multiple rounds with automated feedback, which may enable a smaller LLM to have better reasoning capability.
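Roughly, that looks like a bare-bones "generate, critique, revise" loop wrapped around whatever local model you have. In this sketch, generate() is a placeholder for your model's completion call, not a real API:

```python
# Sketch of a multi-round answer/critique/revise loop.
# generate() is a placeholder for a call into whatever local SLM you run.
def generate(prompt: str) -> str:
    raise NotImplementedError("plug your local model's completion call in here")

def solve_with_feedback(question: str, rounds: int = 3) -> str:
    answer = generate(f"Question: {question}\nAnswer step by step:")
    for _ in range(rounds):
        critique = generate(
            f"Question: {question}\nProposed answer: {answer}\n"
            "Point out any mistakes or missing steps:"
        )
        answer = generate(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Critique: {critique}\nWrite a corrected answer:"
        )
    return answer
```

The extra rounds cost latency and tokens, which is the trade-off: you spend more compute at inference time to squeeze better reasoning out of a smaller model.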
Is it really reasoning, or is it just recognizing word patterns and statistically determining what the most likely following words are?