LLMs may force Apple to integrate more RAM into A-series SOCs

wco81

Legend
Read an article this morning about Apple possibly looking for workarounds to run LLMs locally without embedding a lot of RAM in their next SOCs.

The iPhone 15 Pro Max has 8 GB of RAM; non-Pro models have 6 GB.

In comparison, the Pixel 8 Pro has 12 GB, and the new Pixel 9 phones may have some SKUs with 16 GB of RAM alongside their next Tensor SOC.

Apple SOCs for iPhones have remained at the top of mobile device performance despite lower RAM. But LLMs may demand more RAM.

So the article referenced a research paper about ways to keep parts of the LLM in flash memory and pull them in on demand, so that Apple devices with less RAM can get around the high RAM requirements of LLMs.

Turns out this research paper isn't new; it was publicized back in December.

Apple’s AI researchers say they’ve made a huge breakthrough in their quest to deploy large language models (LLMs) on Apple devices like the iPhone without using a ton of memory. Researchers are instead bringing LLMs to the device using a new flash memory technique, MacRumors reports.

In a research paper called "LLM in a Flash: Efficient Large Language Model Inference with Limited Memory," researchers note that in the world of mobile phones, flash storage is more prevalent than the RAM that is traditionally used to run LLMs. Their method works by employing windowing, which is reusing some of the data it has already processed to reduce the need for constant memory fetching, and by row-column bundling, which involves grouping data so it can be read faster from the flash memory.

According to the paper, this method allows AI models up to twice the size of the iPhone's available memory to run, with inference the researchers claim is 4-5x faster than naive loading on standard processors (CPUs) and 20-25x faster on graphics processors (GPUs).

The researchers note: “This breakthrough is particularly crucial for deploying advanced LLMs in resource-limited environments, thereby expanding their applicability and accessibility. We believe as LLMs continue to grow in size and complexity, approaches like this work will be essential for harnessing their full potential in a wide range of devices and applications.”
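For the curious, here is a toy sketch in Python of roughly what those two ideas (windowing and row-column bundling) could look like. This is purely illustrative, not Apple's actual implementation; the weight file, layer sizes, and caching policy are hypothetical.

```python
# Toy illustration of the two techniques described in the paper, not Apple's code.
# - "row-column bundling": store each FFN neuron's up-projection row and
#   down-projection column contiguously, so one sequential flash read fetches both.
# - "windowing": keep recently active neurons cached in RAM and only fetch the
#   newly needed ones from flash at each step.
import numpy as np

HIDDEN, FFN = 4096, 11008  # hypothetical layer sizes

# Bundled layout on "flash": one record per FFN neuron = [up_row | down_col]
bundled = np.memmap("ffn_bundled.bin", dtype=np.float16,
                    mode="r", shape=(FFN, 2 * HIDDEN))

cache = {}  # neuron id -> (up_row, down_col), i.e. the sliding window held in RAM

def load_active(active_ids):
    """Return weights for the currently active neurons, reading flash only for misses."""
    for i in active_ids:
        if i not in cache:
            record = np.asarray(bundled[i])                # one contiguous flash read
            cache[i] = (record[:HIDDEN], record[HIDDEN:])  # split the bundle apart
    # A real system would also evict neurons that fell out of the window.
    return [cache[i] for i in active_ids]
```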


Might be a workaround for older devices, but Apple, which has gradually been increasing RAM in its A-series SOCs, may have to accelerate that schedule.

Of course this might imply higher costs for devices with more RAM in the SOCs, translating to increased prices.
 
The title is a bit misleading (it should refer to LLMs specifically), as Apple has been doing machine learning on device for many years.

The biggest component being their computational photography workflow.
 
Changed the title.

You're right but of course it's LLMs powering all the hype right now, though image and video generation may become bigger part of "AI."

Apple has been able to skimp on RAM and do well with compute-intensive functions like Portrait mode photos, Night mode photos, etc.

They helped move devices but now the market is demanding some kind of AI strategy. So it will be interesting to see if different types of AI applications lead to design changes or changes in how the silicon budget is deployed on Apple SOCs for mobile devices.

For the M chips in Macs and iPad Pros, it looks like they have more room to integrate more RAM, not just for AI but for other types of applications.

But iPhone is the big moneymaker and expectations are that AI will drive phone sales, which have been slowing down in the last year or two.
 
They also do Face ID, autocorrect, facial recognition, the integration with the ISP (as mentioned above), and a bunch of smaller tasks.

Apple has been doing machine learning on device with dedicated hardware pretty much all the way back to the iPhone X.

They (Apple) have just released their own machine learning models - 8 in total - called OpenELM (Open-source Efficient Language Models) for running on device. You can take a gander at their transformer models on Hugging Face.
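If you want to poke at them, something along these lines should work with the Hugging Face transformers library. This is an untested sketch: it assumes the apple/OpenELM-270M checkpoint (which needs trust_remote_code) and the gated Llama-2 tokenizer the model card points to, so check the model cards for the exact requirements.

```python
# Untested sketch: load the smallest OpenELM checkpoint and generate a few tokens.
# Assumes access to apple/OpenELM-270M and to the Llama-2 tokenizer it reuses.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-270M", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("On-device language models are useful because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```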
 
Regarding LLMs, Microsoft recently released Phi-3 (still in beta), aiming to achieve the same quality with fewer parameters by using better training material.

There is also research focusing on the numeric precision of parameters. Llama-2 was released in FP16, but people found that quantizing it to 8 bits or even 4 bits still produces pretty good results. Now there is even 2-bit or 1-bit quantization, which allows people to run models like the recent Llama-3 70B on a consumer GPU with 24 GB of VRAM. The general idea seems to be that a model with higher precision but fewer parameters tends to perform worse than a model with lower precision but more parameters (e.g. a 13B model quantized to 4 bits tends to perform better than a 7B model quantized to 8 bits). Phi-3 quantized to 4 bits takes only ~2.2 GB of memory, which should be fine even by current phone standards.
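As a sanity check on those numbers, the usual rule of thumb is weight memory ≈ parameters × bits ÷ 8, ignoring the KV cache and quantization overhead (scales, zero points):

```python
# Back-of-the-envelope weight memory for a few model/precision combinations.
def weight_gb(params_billion: float, bits: int) -> float:
    return params_billion * bits / 8  # billions of params * bytes per param ≈ GB

for name, params, bits in [
    ("Llama-2 7B, FP16",        7.0, 16),  # ~14 GB
    ("Llama-3 70B, 4-bit",     70.0,  4),  # ~35 GB: still too big for 24 GB of VRAM
    ("Llama-3 70B, 2-bit",     70.0,  2),  # ~17.5 GB: fits a 24 GB consumer GPU
    ("Phi-3-mini 3.8B, 4-bit",  3.8,  4),  # ~1.9 GB, close to the ~2.2 GB cited above
]:
    print(f"{name}: ~{weight_gb(params, bits):.1f} GB")
```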

Furthermore, for something that's running on a phone for general usage, there's no need for it to be all-knowing. Llama-3 is good enough that it speaks multiple languages and can produce reasonable answers to prompts like "Write a Go program solving the n-queens problem", but normally people don't need that. Maybe we'll see more "mini" models in the future like Phi-3 and Google's Gemma.
 
Some speculation on how Apple may deploy different AI apps on their devices.


There are ELMs, or Efficient Language Models, which would take up smaller footprints on devices.

There are also image-editing models which might run on mobile devices as well.

However, Apple may still consider deploying LLMs from the big names, which might mean larger footprints.

In fact, even with all of its own model releases, Apple reportedly reached out to Google and OpenAI about bringing their models to Apple products.
 