Digital Foundry Article Technical Discussion [2022]

Dampf

Regular
The last question is very interesting, too. Unified memory is IMO the way to go on laptops because, unlike on desktops, upgradeability is not important in that form factor while efficiency is super important. With desktop GPU TGPs climbing ever higher, laptops will have to make the most of their closed design to get closer to their desktop counterparts again, and I assume unified memory plays a crucial role here.

There's also another major advantage aside from efficiency. You can basically have as much VRAM as you want, because it's all shared memory. If you configure your laptop with 64 or even 128 GB, you have way more VRAM than any desktop GPU is likely to have even in the far future, so in theory you never have to worry about video memory in games again and you can run basically any machine learning task you wish. That is all under the condition that the RAM is fast enough of course, but the M1 Ultra proves that LPDDR5 can deliver up to 800 GB/s of bandwidth. Maybe with this design you can even play for more than 1 hour on battery.

So yeah, this is definitely the future for laptops and I look forward to buying a new gaming laptop with unified memory when one releases. I hope it will be soon, but I agree with Alex here that it requires the whole industry to come together, so it might take a while. Intel and Nvidia, for example, would have to create a high-performance SoC that incorporates a high-end i7 as well as a high-performance RTX xx80 series GPU. Given this circumstance, I suppose AMD is going to be the first to show true high-performance SoCs in laptops, as they produce both CPU and GPU in-house. Their 680M is already quite a capable iGPU.
 
Apple showed the benefits of unified memory, so the industry has no excuse now. Just imagine an RTX 5070 laptop with 80 GB of ultra-fast VRAM 😏✅
 

Jay

Veteran
Really like the comparison at 35:10. Exactly as expected, Turing runs ML much faster than RDNA1 because of DP4a support, even before accounting for tensor core acceleration.

I assume this trend will continue when more and more ML in real time gets used in gaming!
Doesn't RDNA1 have DP4a?
Or is it SM driver issues?
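
For anyone unsure what DP4a actually does: it is a single instruction that takes two 32-bit registers, each holding four packed 8-bit integers, computes their dot product and adds it to a 32-bit accumulator. Here is a rough Python sketch of the signed variant. The helper names `pack_int8x4` and `dp4a` are made up for illustration and are not any vendor's API, and real hardware wraps the result to 32 bits, which this sketch does not model.

```python
import struct

def pack_int8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word (little-endian)."""
    return struct.unpack("<I", struct.pack("<4b", *values))[0]

def dp4a(a_packed, b_packed, acc):
    """Emulate a signed DP4a: 4-way int8 dot product plus an accumulator.

    Hypothetical helper for illustration; Python ints do not overflow,
    so the 32-bit wraparound of real hardware is not modelled here.
    """
    a = struct.unpack("<4b", struct.pack("<I", a_packed))
    b = struct.unpack("<4b", struct.pack("<I", b_packed))
    return acc + sum(x * y for x, y in zip(a, b))

a = pack_int8x4([1, -2, 3, -4])
b = pack_int8x4([5, 6, -7, 8])
result = dp4a(a, b, 10)
print(result)
```

A GPU without this instruction has to unpack, multiply and add the four pairs in separate operations, which is why int8 inference tends to be slower there.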
 

PSman1700

Legend
As long as there's enough bandwidth it could be a nice solution for laptops, though there are definitely disadvantages to unified RAM, like bandwidth contention, latency etc. For desktop I hope it never happens, for obvious reasons.
 
Apple showed the benefits of unified memory, so the industry has no excuse now. Just imagine an RTX 5070 laptop with 80 GB of ultra-fast VRAM 😏✅

Apple is quite a different beast than the more open x86 marketplace though. The most obvious difference is that they operate like a console vendor - their customers don't really have that much of a choice as to what to prioritize in terms of design. Every Windows laptop OEM has to justify their design choices against a massively more differentiated market (which can certainly be a negative at points). When the bulk of your sales go to the enterprise, you've got a hard sell to convince them that a big APU is worth the extra outlay.

Secondly, they have over 200 million iPhone sales every year, which really helps to 'grease the skids' with TSMC: they're starting from a very strong place when negotiating wafer allocation. Those previous fab contracts make the cost of shipping these gargantuan chips actually viable. If Apple had to negotiate based on ~10 million M1 sales per year alone, without years of being TSMC's most profitable customer, these chips might not be financially viable at these nodes with this density.

Third, they also make some of the most widely used pieces of software for their chip architecture. They're not releasing the equivalent of an $800 APU and saying "It will be great once developers tap into this"; they're releasing updated versions of stuff like Final Cut that can actually take advantage of all that extra custom silicon right at launch. They don't have to convince customers that they will 'eventually' get some benefit from their massive laptop APU, because the extra cost paid for that GPU power applies to apps they use beyond games. Games, mind you, that might actually have worse performance than a discrete solution available for cheaper. Apple doesn't have to worry about competition from discrete solutions at all.

These are just much bigger challenges for PC OEMs. I think we will eventually get there, but for AMD/Intel to really prioritize high-performance APUs, we need a high-volume case for them to be produced that makes them far more cost effective than discrete GPU solutions. For that to happen we need some way to make high-bandwidth memory that's unified and affordable, but also more software use cases for a beefy GPU that matter more than laptop gaming. I don't know if that market is large enough by itself to support producing these APUs at a scale that makes them financially feasible atm.
 

PSman1700

Legend
I think it will be like DF mentioned (things will probably never go there): the path between GPU, CPU and main memory gets shorter and faster, instead of 'downgrading' to UMA solutions, which would hurt performance/latency and make upgrading or replacing defective components harder. We kind of see things moving this way already.
 

Dampf

Regular
UMA is not a downgrade. The kind of LPDDR5 used in the M1 Ultra has similarly low latency to regular system RAM but also provides super fast bandwidth for the GPU. It's the "eierlegende Wollmilchsau" as we say in German (literally an "egg-laying wool-milk-sow", i.e. something that does everything at once). There is no need for separate pools anymore, and UMA would result in unprecedented amounts of video memory, as the LPDDR5 is both high-bandwidth GPU memory and low-latency RAM for the CPU at the same time.

This needs to happen and when it happens, it's glorious upgrade time for me.
 

PSman1700

Legend
Could see it happening for (some) gaming laptops for sure. In the desktop space? I think not (for the same reasons DF pointed out in the video).
 

pjbliverpool

B3D Scallywag
Legend
Presumably there's a big cost premium associated with it then? And how does the interface compare with current PC memory interfaces in terms of cost and complexity?

It's great that it can achieve both high bandwidth and low latency but if the memory is much more expensive and/or the interface bigger/more complex then for the same price bracket we could still be looking at less overall performance or smaller memory capacities.
 
While in an ideal world UMA would be good for PC it's important to remember we don't live in an ideal world.

The ability to use 'add-in' cards on PC creates huge technical and financial hurdles for using UMA, and if you remove that 'upgrade' ability to make UMA viable you've basically reduced a PC to a console.
 

pjbliverpool

B3D Scallywag
Legend
I'd prefer more of what we're currently getting in the PC space tbh, which is ever closer integration between discrete CPUs and GPUs to bring the advantages of both UMA and separate dedicated processors under one umbrella. Things like Resizable BAR, Smart Access Storage and HBCC are great examples of this. The tech just needs to become standardised so it can start being treated as the default by devs.
 

Dampf

Regular
Yeah, I assume desktop PCs will continue down this path and keep improving on it.

However, I assume huge frame drops when VRAM overflows will always be a thing there as a result. I can't see how this is ever going to change, as system DDR memory will always be much slower than GDDR. Dual-channel DDR5-6400, for example, only peaks at around 100 GB/s, a far cry from the 800 GB/s and more on high-end GPUs and the M1 Ultra.
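
For reference, the theoretical peak numbers fall straight out of transfer rate times bus width. A quick sketch, assuming a 128-bit dual-channel desktop DDR5-6400 setup and the 1024-bit LPDDR5-6400 interface of the M1 Ultra (the function name is made up, and measured real-world throughput will be lower than these peaks):

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes).

    transfer_rate_mts: transfers per second in millions (e.g. 6400 for DDR5-6400)
    bus_width_bits:    total width of the memory interface in bits
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9

# Assumed configurations, for illustration only
ddr5_dual_channel = peak_bandwidth_gbs(6400, 128)   # desktop, 2 x 64-bit channels
m1_ultra = peak_bandwidth_gbs(6400, 1024)           # LPDDR5 on a 1024-bit bus

print(f"DDR5-6400 dual channel: {ddr5_dual_channel:.1f} GB/s")
print(f"M1 Ultra LPDDR5:        {m1_ultra:.1f} GB/s")
print(f"Ratio:                  {m1_ultra / ddr5_dual_channel:.1f}x")
```

So the gap comes from the much wider bus rather than from faster memory chips, which is also why it is hard to replicate with socketed DIMMs.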
 